Here is a brief summary of the original questions and their status here (TL;DR).


1.  Once put() is called for OUTPUT_BINARY, is there any way to go back and
change those bytes?


This question remains unanswered.  There is a lot of good discussion about how 
to redesign how sigrok's file generation code could/should work, though this 
would be better handled on a separate email thread.  What I'm looking for is 
whether bytes can be changed after put() has been called in the current 
implementation of sigrok's PD API.  At this point, I expect that bytes cannot
be changed after put() has been called.


2.  Is it acceptable to buffer most of the file data and then just output the 
entire file at the end?


The answer here is yes and no.  Yes, it is ok to buffer data for a considerable 
amount of time.  However, per bugs 292 and 749, the end of the sample stream is 
currently not known to PDs (this is especially problematic for live data 
streams).  Regarding memory limits, it sounds like there are no memory limits 
beyond those imposed by the OS (physical or via tools such as ulimit).  There 
are no unusual memory limits, such as those imposed by the Java VM.


3.  Is there a way to know that the end of the sample input stream has been 
reached?


No, per bugs 292 and 749 (thanks Gerhard).


4.  (new question) Can put()'s parameters of startsample and endsample be used 
to insert data earlier into the file output or are these parameters ignored 
because the order of calls to put() determine the order bytes are output?


Theoretically, calling put(0, 0, ...) could be used to insert a header at the 
beginning of file after the rest of the file data has already been output.  
However, I strongly suspect that the startsample and endsample parameters are 
ignored for file output.


To answer a question by Gerhard below: I am looking at the MIDI file format, 
specifically the type 0 format for MIDI files.
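
To illustrate why this format raises the header question: a type 0 Standard
MIDI File is an MThd header chunk followed by a single MTrk chunk, and the
MTrk chunk states its byte length before its data.  Below is a minimal sketch
in Python, assuming the track events were already collected in memory.  The
function name and the default division value are made up for illustration;
this is not sigrok code.

```python
import struct

def build_midi_type0(track_events: bytes, division: int = 480) -> bytes:
    """Assemble a type 0 Standard MIDI File from already-encoded track events.

    track_events must contain delta times and events, but not the
    End of Track meta event; that one is appended here.
    (Hypothetical helper, not part of any sigrok API.)
    """
    # Header chunk: 'MThd', length 6, format 0, one track, ticks per quarter note.
    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, division)
    # End of Track meta event: delta time 0, then FF 2F 00.
    track_data = track_events + b"\x00\xff\x2f\x00"
    # The length field precedes the track data, so the total size has to be
    # known before the first track byte is written.
    track = b"MTrk" + struct.pack(">I", len(track_data)) + track_data
    return header + track
```

The length field in front of the track data is the part that forces either
buffering the whole track or patching bytes that were already output.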


-Chris


________________________________
From: Gerhard Sittig <gerhard.sit...@gmx.net>
Sent: Sunday, October 9, 2016 10:02 AM
To: sigrok-devel@lists.sourceforge.net
Subject: Re: [sigrok-devel] Outputting files from Protocol Decoders

This message "got a little longer".  Here is a summary for the
impatient, more details are inline below.

When you need something that supports seek(), make sure you
operate on files.  When you are a component within a pipe, just
consume the input stream and generate an output stream.  Don't
assume random access to previously generated data when you cannot
seek on your output (e.g. because it's a stream).

A _decoder_ method to communicate "end of input" may be
desirable, and could provide some "flush" semantics, but still
does not allow modification of previously generated stream
content.  (And does not resolve the "last sample" issue either.)

When there is no or no simple solution, we might be looking at
the wrong problem perhaps?  Use appropriate data formats if the
currently used format breaks/prevents the specific use case.

Keep decoders simple in the sense that they consume an input
stream and generate an output stream.  Don't bother with "file
formats" in a decoder, such a requirement might be a strong hint
that this is the job for an output module.


On Thu, Oct 06, 2016 at 20:30 -0700, Chris Dreher wrote:
>
> > Date: Thu, 6 Oct 2016 09:23:16 +0200
> > From: gerhard.sit...@gmx.net
> > To: sigrok-devel@lists.sourceforge.net
> > Subject: Re: [sigrok-devel] Outputting files from Protocol Decoders
> >
> > On Wed, Oct 05, 2016 at 11:36 -0700, Chris Dreher wrote:
> > >
> > > In looking at how to output files from protocol decoders, I
> > > have the following questions:
> > >
> > > 1.  Once put() is called for OUTPUT_BINARY, is there any way to
> > > go back and change those bytes?  This is especially useful for
> > > adjusting file headers based on sample inputs that come later in
> > > the stream.  For example, the i2s PD can output a WAV file.
> [ ... ]
> >
> > This does not solve the general issue.
>
> Actually, it would solve the general issue.  Going back and
> changing previously submitted bytes would provide similar
> functionality as seek() followed by a write() that most
> operating systems provide.

We don't disagree at all.  The part that you cite referred to the
part that you dropped.  You might have put this part of my reply
into an unintended context, or might have stopped reading too
soon.  Try the positive interpretation first, before assuming
that somebody wants to dismiss you. :)  And I wrongly assumed
that the specific example that you mentioned in detail would have
been your actual motivation for asking in the first place.  I'm
sorry about that.

What I said was that generating WAV files from several chunks
does not solve the general issue of manipulating arbitrary data
_after_ it was handed to some other component for further
processing.  The approach of generating the output from several
chunks might just help working around the specific issue that you
mentioned in particular.  (While there still might be issues
left, but as you said WAV files were just an example, not the
actual and most pressing issue for you.)

The most important part that you dropped is my questioning the
choice of the WAV file format (that requires seeking in the
specific case) in a spot that is not supposed to deal with "file
formats" at all.  A protocol decoder should not bother, and just
assume "stream in, stream out".  Regardless of whether the
decoder's output happens to get written to a file at some other
component in the software topology.  An output module _might_
assume having access to an output file, which then allows it to
seek.

The most appropriate approach would be to generate an annotation
of "audio samples" from the decoder.  And to have output modules
take those audio samples, and write them to files in whatever
format they please.  Add the AC97 decoder (which is on the
project's wishlist) and PWM (which exists, and could provide
audio signal amplitudes, too), and you see why decoders should
not re-invent file format handling.  Add an output module for
formats other than WAV, and all audio sample sources will benefit
in transparent ways.  Increase the number of components on either
side of the interface (interpret delta-sigma as "some kind of
PWM" maybe, decode I2C based audio codec communication like WM
chips on FPGA boards or RPi hats, analog input readers, spectrum
analyzer output channels, etc) and you see how the "m + n"
complexity is preferable over "m * n" or lack of orthogonal
support.
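
A rough Python sketch of that separation, under loud assumptions: the sample
source and the writer function below are hypothetical names, and this is not
the libsigrok output module API.  It only shows that any source which yields
audio samples can feed any writer, and that the writer is the place where
the file format (and its seeking) lives.  Python's standard wave module is
used as the stand-in writer.

```python
import io
import struct
import wave
from typing import Iterable

def pwm_like_source() -> Iterable[int]:
    """Hypothetical sample source: yields signed 16-bit mono samples."""
    for i in range(8):
        yield (i - 4) * 1000

def write_wav(samples: Iterable[int], out, samplerate: int = 48000) -> int:
    """Hypothetical output stage: consume any sample source, write WAV to out.

    out must be a seekable binary file object, because the frame count in
    the header gets corrected when the writer is closed.
    """
    count = 0
    with wave.open(out, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)          # 16-bit samples
        w.setframerate(samplerate)
        for s in samples:
            # Raw write: the header is not updated per sample.
            w.writeframesraw(struct.pack("<h", s))
            count += 1
    return count

buf = io.BytesIO()                 # stands in for a real output file
frames = write_wav(pwm_like_source(), buf)
```

Adding another source (I2S, AC97, ...) or another writer then means one new
function, not one new function per combination.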

Apply this "audio samples" or "transparent data within an
annotation of the output stream" approach to whatever kind of
data you actually had in mind.  Stick with the general idea of
"decoders consume and generate streams", and "if file access is
assumed, make sure you have a file" (which translates to "should
be an output module").  This shall solve the issue you have.

Also note that seek(3) is not only a feature of "an OS", but
more so of the object that you manipulate.  While you can seek on
files, you cannot seek on pipes and sockets.
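
This can be checked from Python on a POSIX system.  The helper below is a
sketch with a made-up name; it patches four bytes only when the object
reports that it can seek, and otherwise leaves the stream alone.

```python
import io
import os

def patch_u32_le(stream, offset: int, value: int) -> bool:
    """Overwrite 4 bytes at offset if the stream supports seeking.

    Returns True when the bytes were patched, False when the stream
    cannot seek (pipe, socket, ...).  Hypothetical helper for illustration.
    """
    if not stream.seekable():
        return False
    end = stream.tell()            # remember where writing should continue
    stream.seek(offset)
    stream.write(value.to_bytes(4, "little"))
    stream.seek(end)
    return True

# An in-memory file behaves like a regular file here: seeking works.
buf = io.BytesIO()
buf.write(b"\x00" * 4 + b"payload")
patched_file = patch_u32_le(buf, 0, 7)

# The write end of a pipe is not seekable, so nothing gets patched.
r, w = os.pipe()
with os.fdopen(w, "wb") as pipe_out, os.fdopen(r, "rb"):
    patched_pipe = patch_u32_le(pipe_out, 0, 7)
```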


See potentially related Bugzilla items:  There is 292 for
decoders, reporting a specific symptom.  There is 749 which
discusses the general issue of how decoders (need to) work.
There is 236 for PulseView which states that even input modules
may not know the sample count upfront.  Capture devices may
suffer from the same issue of not knowing how many samples they
will provide, until the terminating condition is met after an
arbitrary amount of samples was delivered.

So yes, there is a limitation in the current implementation that
does not allow what you appear to try to achieve.  Yet we can see
that there are reasons for that limitation, and that resolving
the issue may or may not be possible, but certainly is not
trivial.  It's worth checking whether you are looking at the
"right problem" when there is no obvious or no straightforward
solution, or when the solution comes with new disadvantages or
limitations.


Did I miss this?  Are output modules "just" other participants in
the pipe architecture, and don't (necessarily) have access to an
output file?  Though it might be acceptable for an output module
to "throw up" when its output is not a seekable object (if that
can get detected).  And of course only in those cases where the
output needs to get seeked and re-written.  Existing code might
already come with such a constraint.


> I deleted the remainder of the WAV-specific response since the
> question is about how to solve the issue generally.  The WAV
> output by the i2s PD is just an example of existing code where
> an incorrect header is output because the total length of the
> data is not known.

I mistook your mentioning the I2S decoder and WAV file example as
a specific motivation in that moment, while there is some general
issue behind it but in some further distance.  Sorry for getting
this wrong.  I got it after you told me.


> > > 2.  Is it acceptable to buffer most of the file data and then
> > > just output the entire file at the end?  Again, this relates to
> > > file headers.  Theoretically, a PD could buffer the file data
> > > until it reaches the end of the sample inputs, then calculate
> > > the size fields, and finally call put() to output the entire
> > > file at once.  Are there memory limits in python, similar to
> > > Java VM memory limits, or is memory only limited by the OS's
> > > memory limits?
> >
> > Have seen decoders buffer data all the time, though they only
> > "defer" data until the completion of a frame or transaction.
> > Haven't seen deferral for all of the input data yet.
>
> Are you confirming that python does not have the memory limits
> that other languages (ex: Java) have?  I've written code in C++
> that defers output but the question is whether this is ok in
> python and whether it is acceptable for sigrok's PDs.

Actually it's only Java where I ever saw such a "memory setup"
feature (which always reminded me of XMS/EMS setup in the old
days of DOS).  No other interpreted language that I'm aware of
has such a thing, so it's actually Java which I'd perceive as
being the exception.

What applies however are the "regular" limits that are inherent
to process management, see ulimit(1) and setrlimit(2) or whatever
is the counterpart of your platform of choice.  The sigrok(1)
process happens to embed the python(3) library which is used to
execute the PD(3py) scripts.  So you are limited to whatever
resources the machine (and its OS configuration) has to offer,
and optionally what users/admins have configured to suit their
taste.  But that's obvious, and does not differ from any other
process that you are running.
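
On Unix-like systems those limits can be inspected from Python with the
standard resource module.  A small sketch (the function name is made up);
when run standalone it reports the limits of that Python process, and an
embedded interpreter would be expected to see the limits of its host
process.

```python
import resource

def address_space_limit():
    """Return the (soft, hard) address space limit of this process in bytes.

    resource.RLIM_INFINITY means that no limit is configured.
    """
    return resource.getrlimit(resource.RLIMIT_AS)

soft, hard = address_space_limit()
print("soft:", "unlimited" if soft == resource.RLIM_INFINITY else soft)
print("hard:", "unlimited" if hard == resource.RLIM_INFINITY else hard)
```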


See the sigrok project's coding guidelines, which suggest that
allocations of up to 1MB are assumed to always succeed (or fail
in other acceptable/appropriate ways), and larger allocations
should get checked.

Depending on your expected "workload" (typical data volume for a
"transaction" in your decoder), you may just accumulate data
until you encounter the transaction's ending condition (that's
what decoders normally do, as far as I understand it).  Or your
decoder has the option of "chunking" its output (which could
prevent excessive buffering, _if_ the output data format lends
itself to chunking).  Or you deal with data formats that stream
in natural ways, and don't require buffering.
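
The first two options can be sketched in a few lines of Python.  The class
below is hypothetical and independent of sigrok; the emit callback stands in
for whatever hands data to the next stage.  It accumulates bytes, passes
them on in bounded chunks, and emits the remainder once the transaction's
end is seen.

```python
from typing import Callable

class ChunkingBuffer:
    """Hypothetical helper: collect bytes, hand them on in bounded chunks."""

    def __init__(self, emit: Callable[[bytes], None], max_chunk: int = 4096):
        self.emit = emit
        self.max_chunk = max_chunk
        self.pending = bytearray()

    def add(self, data: bytes) -> None:
        """Buffer data; emit full chunks as soon as enough bytes exist."""
        self.pending += data
        while len(self.pending) >= self.max_chunk:
            self.emit(bytes(self.pending[:self.max_chunk]))
            del self.pending[:self.max_chunk]

    def end_of_transaction(self) -> None:
        """Emit whatever is left when the transaction's end is detected."""
        if self.pending:
            self.emit(bytes(self.pending))
            self.pending.clear()
```

With chunking, memory use is bounded by max_chunk no matter how long the
transaction gets, but this only helps when the output format does not need
the total size upfront.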


There already are reports where decoders fail in the presence of
huge amounts of sample data (or specific use patterns in the
input data, ISTR one report was on a long duration combined with
very high resolution).  You write the code for normal use, and
cannot do much about excessive or insane amounts of input that
exhaust available resources while you are busy operating on the
logical content.  Even if you may detect situations of resource
exhaustion, you may not be able to handle them appropriately
anyway.  So what's the point of catching them then?

In theory you can overrun _any_ decoder which supports a protocol
that has the notion of "frames" or "transactions" of arbitrary
size.  Consider the SPI protocol with its chip select signal.
Typical use (displays, sensors, eeproms) may never transfer more
than a few hundred bytes most of the time.  Still you can drain
complete flash chips within a single transfer.  Or even refuse to
ever release CS at all.  Thus you could accumulate megabytes and
gigabytes and more before the transfer ends, if it ends at all.

As a decoder, you never know the logical content of the input
stream upfront.  Some are in the comfortable position to just
generate an opaque output stream, with no buffering at all or
just minimal buffering.  Some might work on simple fixed size
data items, and some might have the option of "chunking" their
output.  Always consider the specific environment that you work
in.  There is no one-size-fits-all and quick answer to that.


Do you have a specific other protocol in mind when it's not I2S?
What's the itch you try to scratch?  Can you use a "streaming
data format"?  If not then why not (within the decoder)?  I'm not
rebutting your position, just trying to better understand the
problem.


> > > 3.  Is there a way to know that the end of the sample input
> > > stream has been reached?  This way, a PD would know that there
> > > is no more data and that decoding is done.  This would prevent
> > > a PD from waiting any further for sample inputs that will never
> > > come.
> [ ... ]

See Bugs 292/749/236.  The symptoms are known but it's yet to get
determined what a solution is (and maybe what exactly the problem
or the scope of the problem is).

I feel that regardless of the potentially added decoder method's
name (I lean towards "end_of_input" maybe, or "decode_end"), the
semantics can only be that of fflush(3).  In that way decoders
might push data to the output stream that was internally kept for
potential performance reasons or generation of chunks where the
size of chunks needs to be known upfront.

But the flush decoder method will neither solve the "manipulate
previously written data" issue, nor the "last sample" issue.
Still it's worth considering the flush feature as a future
improvement to the decoder API.
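
To make the intended semantics concrete, here is a sketch with a stand-in
class.  It is not derived from sigrokdecode.Decoder, and end_of_input() does
not exist in the current decoder API; both the method name and the chunk
size are assumptions for illustration.

```python
class SketchDecoder:
    """Stand-in for a decoder; not derived from sigrokdecode.Decoder."""

    def __init__(self, sink):
        self.sink = sink           # receives bytes in stream order
        self.held = bytearray()    # data kept back until a chunk is complete

    def decode_byte(self, value: int) -> None:
        self.held.append(value)
        if len(self.held) == 16:   # arbitrary chunk size for this sketch
            self._push()

    def end_of_input(self) -> None:
        """Hypothetical hook with fflush(3)-like semantics."""
        self._push()

    def _push(self) -> None:
        if self.held:
            self.sink(bytes(self.held))
            self.held.clear()

chunks = []
dec = SketchDecoder(chunks.append)
for b in range(20):
    dec.decode_byte(b)
dec.end_of_input()                 # pushes the 4 bytes that were still held
```

The flush only appends what was still held back.  The first chunk was handed
over earlier and stays as it was.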


virtually yours
Gerhard Sittig
--
     If you don't understand or are scared by any of the above
             ask your parents or an adult to help you.

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot


_______________________________________________
sigrok-devel mailing list
sigrok-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sigrok-devel

