This message "got a little longer". Here is a summary for the impatient; more details are inline below.
When you need something that supports seek(), make sure you operate on files. When you are a component within a pipe, just consume the input stream and generate an output stream. Don't assume random access to previously generated data when you cannot seek on your output (e.g. because it's a stream).

A _decoder_ method to communicate "end of input" may be desirable, and could provide some "flush" semantics, but still does not allow modification of previously generated stream content. (And does not resolve the "last sample" issue either.)

When there is no or no simple solution, we might be looking at the wrong problem perhaps? Use appropriate data formats if the currently used format breaks/prevents the specific use case.

Keep decoders simple in the sense that they consume an input stream and generate an output stream. Don't bother with "file formats" in a decoder, such a requirement might be a strong hint that this is the job for an output module.

On Thu, Oct 06, 2016 at 20:30 -0700, Chris Dreher wrote:
>
> > Date: Thu, 6 Oct 2016 09:23:16 +0200
> > From: gerhard.sit...@gmx.net
> > To: sigrok-devel@lists.sourceforge.net
> > Subject: Re: [sigrok-devel] Outputting files from Protocol Decoders
> >
> > On Wed, Oct 05, 2016 at 11:36 -0700, Chris Dreher wrote:
> > >
> > > In looking at how to output files from protocol decoders, I
> > > have the following questions:
> > >
> > > 1. Once put() is called for OUTPUT_BINARY, is there anyway to
> > > go back and change those bytes? This is especially useful for
> > > adjusting file headers based on sample inputs the come later in
> > > the stream. For example, the i2s PD can output a WAV file.
> [ ... ]
> >
> > This does not solve the general issue.
>
> Actually, it would solve the general issue. Going back and
> changing previously submitted bytes would provide similar
> functionality as seek() followed by a write() that most
> operating systems provide.

We don't disagree at all.
The part that you cite referred to the part that you dropped. You might have put this part of my reply into an unintended context, or might have stopped reading too soon. Try the positive interpretation first, before assuming that somebody wants to dismiss you. :) And I wrongly assumed that the specific example that you mentioned in detail would have been your actual motivation for asking in the first place. I'm sorry about that.

What I said was that generating WAV files from several chunks does not solve the general issue of manipulating arbitrary data _after_ it was handed to some other component for further processing. The approach of generating the output from several chunks might just help working around the specific issue that you mentioned in particular. (While there still might be issues left, but as you said WAV files were just an example, not the actual and most pressing issue for you.)

The most important part that you dropped is my questioning the choice of the WAV file format (that requires seeking in the specific case) in a spot that is not supposed to deal with "file formats" at all. A protocol decoder should not bother, and just assume "stream in, stream out". Regardless of whether the decoder's output happens to get written to a file at some other component in the software topology. An output module _might_ assume having access to an output file, which then allows to seek.

The most appropriate approach would be to generate an annotation of "audio samples" from the decoder. And to have output modules take those audio samples, and write them to files in whatever format they please. Add the AC97 decoder (which is on the project's wishlist) and PWM (which exists, and could provide audio signal amplitudes, too), and you see why decoders should not re-invent file format handling. Add an output module for formats other than WAV, and all audio sample sources will benefit in transparent ways.
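
As a rough sketch of that split, and explicitly not sigrok's actual output module API: the names write_wav and toy_samples below are made up for illustration, and mono 16-bit PCM is an assumption. The sample source only yields values. The code that knows about WAV insists on a seekable object, writes a placeholder header, and patches the size fields once the sample count is known.

```python
import io
import struct

# Canonical 44-byte PCM WAV header layout (little endian).
WAV_HEADER = "<4sI4s4sIHHIIHH4sI"

def write_wav(samples, out, sample_rate=48000):
    """Write mono 16-bit samples to a seekable binary file object.

    Hypothetical helper for illustration, not part of sigrok.
    """
    if not out.seekable():
        raise ValueError("WAV output needs a seekable file, not a pipe")
    header_size = struct.calcsize(WAV_HEADER)
    out.write(b"\0" * header_size)       # placeholder, sizes unknown yet
    count = 0
    for value in samples:                # stream in, one sample at a time
        out.write(struct.pack("<h", value))
        count += 1
    data_size = count * 2
    out.seek(0)                          # only possible on seekable objects
    out.write(struct.pack(
        WAV_HEADER,
        b"RIFF", 36 + data_size, b"WAVE",
        b"fmt ", 16, 1, 1, sample_rate, sample_rate * 2, 2, 16,
        b"data", data_size))
    out.seek(0, io.SEEK_END)
    return count

def toy_samples():
    """Stand-in for a decoder's "audio samples" output (made-up values)."""
    yield from (0, 1000, -1000, 0)

out = io.BytesIO()
count = write_wav(toy_samples(), out, sample_rate=8000)
```

The sample source never learns about RIFF headers, and a second writer for some other format could consume the very same generator.
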
Increase the number of components on either side of the interface (interpret delta-sigma as "some kind of PWM" maybe, decode I2C based audio codec communication like WM chips on FPGA boards or RPi hats, analog input readers, spectrum analyzer output channels, etc) and you see how the "m + n" complexity is preferable over "m * n" or lack of orthogonal support.

Apply this "audio samples" or "transparent data within an annotation of the output stream" approach to whatever kind of data you actually had in mind. Stick with the general idea of "decoders consume and generate streams", and "if file access is assumed, make sure you have a file" (which translates to "should be an output module"). This shall solve the issue you have.

Also note that seek(3) is not only a feature of "an OS", but more so of the object that you manipulate. While you can seek on files, you cannot seek on pipes and sockets.

See potentially related Bugzilla items: There is 292 for decoders, reporting a specific symptom. There is 749 which discusses the general issue of how decoders (need to) work. There is 236 for PulseView which states that even input modules may not know the sample count upfront. Capture devices may suffer from the same issue of not knowing how many samples they will provide, until the terminating condition is met after an arbitrary amount of samples was delivered.

So yes, there is a limitation in the current implementation that does not allow what you appear to try to achieve. Yet we can see that there are reasons for that limitation, and that resolving the issue may or may not be possible, but certainly is not trivial. It's worth checking whether you are looking at the "right problem" when there is no obvious or no straightforward solution, or when the solution comes with new disadvantages or limitations.

Did I miss this? Are output modules "just" other participants in the pipe architecture, and don't (necessarily) have access to an output file?
Though it might be acceptable for an output module to "throw up" when its output is not a seekable object (if that can get detected). And of course only in those cases where the output needs to get seeked and re-written. Existing code might already come with such a constraint.

> I deleted the remainder of the WAV-specific response since the
> question is about how to solve the issue generally. The WAV
> output by the i2s PD is just an example of existing code where
> an incorrect header is output because the total length of the
> data is known.

I mistook your mentioning the I2S decoder and WAV file example as a specific motivation in that moment, while there is some general issue behind it but in some further distance. Sorry for getting this wrong. I got it after you told me.

> > > 2. Is it acceptable to buffer most of the file data and then
> > > just output the entire file at the end? Again, this relates to
> > > file headers. Theoretically, a PD could buffer the file data
> > > until it reaches the end of the sample inputs, then calculate
> > > the size fields, and finally call put() to output the entire
> > > file at once. Are there memory limits in python, similar to
> > > Java VM memory limits, or is memory only limited by the OS's
> > > memory limits?
> >
> > Have seen decoders buffer data all the time, though they only
> > "defer" data until the completion of a frame or transaction.
> > Haven't seen deferral for all of the input data yet.
>
> Are you confirming that python does not have the memory limits
> that other languages (ex: Java) has? I've written code in C++
> that defers output but the question is whether this is ok in
> python and whether it is acceptable for sigrok's PDs.

Actually it's only Java where I ever saw such a "memory setup" feature (which always reminded me of XMS/EMS setup in the old days of DOS). No other interpreted language that I'm aware of has such a thing, so it's actually Java which I'd perceive as being the exception.
What applies however are the "regular" limits that are inherent to process management, see ulimit(1) and setrlimit(2) or whatever is the counterpart of your platform of choice. The sigrok(1) process happens to embed the python(3) library which is used to execute the PD(3py) scripts. So you are limited to whatever resources the machine (and its OS configuration) has to offer, and optionally what users/admins have configured to suit their taste. But that's obvious, and does not differ from any other process that you are running.

See the sigrok project's coding guidelines, which suggest that allocations of up to 1MB are assumed to always succeed (or fail in other acceptable/appropriate ways), and larger allocations should get checked.

Depending on your expected "workload" (typical data volume for a "transaction" in your decoder), you may just accumulate data until you encounter the transaction's ending condition (that's what decoders normally do, as far as I understand it). Or your decoder has the option of "chunking" its output (which could prevent excessive buffering, _if_ the output data format lends itself to chunking). Or you deal with data formats that stream in natural ways, and don't require buffering.

There already are reports where decoders fail in the presence of huge amounts of sample data (or specific use patterns in the input data, ISTR one report was on a long duration combined with very high resolution). You write the code for normal use, and cannot do much about excessive or insane amounts of input that exhaust available resources while you are busy operating on the logical content. Even if you may detect situations of resource exhaustion, you may not be able to handle them appropriately anyway. So what's the point of catching them then?

In theory you can overrun _any_ decoder which supports a protocol that has the notion of "frames" or "transactions" of arbitrary size. Consider the SPI protocol with its chip select signal.
Typical use (displays, sensors, eeproms) may never transfer more than a few hundred bytes most of the time. Still you can drain complete flash chips within a single transfer. Or even refuse to ever release CS at all. Thus you could accumulate megabytes and gigabytes and more before the transfer ends, if it ends at all.

As a decoder, you never know the logical content of the input stream upfront. Some are in the comfortable position to just generate an opaque output stream, with no buffering at all or just minimal buffering. Some might work on simple fixed size data items, and some might have the option of "chunking" their output. Always consider the specific environment that you work in. There is no one-size-fits-all and quick answer to that.

Do you have a specific other protocol in mind when it's not I2S? What's the itch you try to scratch? Can you use a "streaming data format"? If not then why not (within the decoder)? I'm not rebutting your position, just trying to better understand the problem.

> > > 3. Is there a way to know that the end of the sample input
> > > stream has been reached? This way, a PD would know that there
> > > is no more data and that decoding is done. This would prevent
> > > a PD from waiting any further for sample inputs that will never
> > > come.
> [ ... ]

See Bugs 292/749/236. The symptoms are known but it's yet to get determined what a solution is (and maybe what exactly the problem or the scope of the problem is).

I feel that regardless of the potentially added decoder method's name (I lean towards "end_of_input" maybe, or "decode_end"), the semantics can only be that of fflush(3). In that way decoders might push data to the output stream that was internally kept for potential performance reasons or generation of chunks where the size of chunks needs to be known upfront. But the flush decoder method will neither solve the "manipulate previously written data" issue, nor the "last sample" issue.
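
To make the fflush(3) comparison concrete, here is a toy sketch. It does not use the real decoder API: the class ChunkingDecoder, its sink callback, and the end_of_input() hook are all hypothetical, the hook being the kind of method discussed above, which does not exist at this point. It shows that such a hook can push out what is still held back, while chunks that were already handed downstream stay as they are.

```python
class ChunkingDecoder:
    """Toy stream decoder that emits fixed-size chunks (illustration only)."""

    def __init__(self, sink, chunk_size=4):
        self.sink = sink              # callable that receives each chunk
        self.chunk_size = chunk_size
        self.pending = bytearray()    # data kept back until a chunk is full

    def decode(self, data):
        """Consume input, emit every complete chunk right away."""
        self.pending.extend(data)
        while len(self.pending) >= self.chunk_size:
            self.sink(bytes(self.pending[:self.chunk_size]))
            del self.pending[:self.chunk_size]

    def end_of_input(self):
        """Hypothetical hook with flush semantics: emit the remainder.

        It can only append to the stream. Chunks passed to the sink
        earlier are out of reach and cannot be modified here.
        """
        if self.pending:
            self.sink(bytes(self.pending))
            self.pending.clear()

emitted = []
dec = ChunkingDecoder(emitted.append, chunk_size=4)
dec.decode(b"abcdef")
before_flush = list(emitted)   # only the complete chunk was emitted so far
dec.end_of_input()             # the short remainder follows as a new chunk
```

Without the end_of_input() call the trailing two bytes would stay in the decoder forever, which is the "waiting for input that never comes" symptom from question 3.
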
Still it's worth considering the flush feature as a future improvement to the decoder API.

virtually yours
Gerhard Sittig
-- 
If you don't understand or are scared by any of the above
ask your parents or an adult to help you.

_______________________________________________
sigrok-devel mailing list
sigrok-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sigrok-devel