> We need to introduce a change in the stm module API and in the filters
> management: the memory should be allocated by the caller and should be
> possible to apply a filter using repeated calls. As in:
>
> /* create a stm and install filters for read... */
> /* allocate a 10k buffer... */
> /* read 10k and check for eof... */
> /* allocate 10k... */
> /* read 10k and check for eof... */
> ...
I don't understand this scheme: a stream tells you how long its input data segment is. In the case of an ASCII85 or ASCIIHEX stream, you only need to read 5 or 2 bytes at a time, respectively, to decode; the operating system will handle the file buffering. For the decoded stream, ASCII85 and ASCIIHEX have predictable sizes; in fact you always know the length of the decoded ASCIIHEX data and you can predict the ASCII85 result to within 4 characters.

The ASCII85 and ASCIIHEX filters do have a predictable output size. But the output size of some filters (such as flate-decode) cannot be determined before applying the filter to the entire data. The idea is to let the client speak in terms of filtered data: if we install some filters to decode a stream and tell a stm to get 10k, we are asking it to retrieve 10k of filtered data. That way the stm_read function will work much like fread.

Except for very large streams such as audio and video, I don't see a point in piecemeal memory allocation; that will only result in poor performance and a horribly fragmented memory table.

Streams in PDF files can be quite lengthy. Both audio and video data can be encoded in a PDF stream, and since PDF 1.5 there are also object streams. I think that, like fread, stm_read should let the user manage the memory used to return the data as they see fit.

I also don't understand why the caller should allocate memory when in principle the caller should be ignorant of the details of the filter behavior.

I don't understand. Asking for 10k of filtered data is quite a good way to hide the details of the filter behavior: we don't care about the length of the unfiltered data.
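To make the fread-like behaviour concrete, here is a rough sketch of what I have in mind. It is only an illustration: the stm_t layout, stm_open_hex, stm_eof and the exact stm_read signature are assumptions made up for this example, not the actual stm module API, and it handles only an in-memory ASCIIHEX source. The point is that the caller owns the buffer, asks for N bytes of filtered data, and calls again until eof.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stream over an in-memory ASCIIHEX-encoded source.
   The real stm structure may look nothing like this. */
typedef struct {
    const char *src;  /* encoded (unfiltered) data */
    size_t      len;  /* length of the encoded data */
    size_t      pos;  /* next unread encoded character */
    int         eof;  /* set once we tried to read past the end */
} stm_t;

static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;  /* not a hex digit */
}

/* Hypothetical constructor: attach the stream to an encoded string. */
void stm_open_hex(stm_t *stm, const char *encoded)
{
    stm->src = encoded;
    stm->len = strlen(encoded);
    stm->pos = 0;
    stm->eof = 0;
}

int stm_eof(const stm_t *stm)
{
    return stm->eof;
}

/* Store up to `size` bytes of *filtered* (decoded) data in the caller's
   buffer and return how many were stored. As with fread/feof, a short
   count plus stm_eof() != 0 means the stream is exhausted. */
size_t stm_read(stm_t *stm, unsigned char *buf, size_t size)
{
    size_t n = 0;

    assert(stm != NULL && buf != NULL);

    while (n < size && !stm->eof) {
        int digits[2] = {0, 0};  /* a missing second digit counts as 0 */
        int have = 0;

        while (have < 2) {
            /* '>' is the ASCIIHEX end-of-data marker */
            if (stm->pos >= stm->len || stm->src[stm->pos] == '>') {
                stm->eof = 1;
                break;
            }
            int v = hex_value(stm->src[stm->pos++]);
            if (v >= 0)          /* this sketch skips non-hex characters */
                digits[have++] = v;
        }
        if (have > 0)
            buf[n++] = (unsigned char)((digits[0] << 4) | digits[1]);
    }
    return n;
}
```

The caller decides how much memory to use (a 10k buffer reused on every call, or a fresh one each time) and never needs to know how many encoded bytes were consumed to produce it. A filter such as flate-decode would fit behind the same interface, which is exactly the case where the output size is not known in advance.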
