The underlying cause is that Daffodil doesn't read all the data in at
once. Instead it only reads and parses data in small-ish chunks and
discards chunks once it is done with them. The benefit to this approach
is that it allows Daffodil to use less memory than it would
need if it read the entire file in all at once. In
fact, it even allows Daffodil to parse files that are larger than could
fit in JVM memory.
However, this means that when a parse ends, Daffodil might not have
read the entire file, and so it doesn't actually know how much is really
left. All it knows about is the data in the chunks it has already read
but not yet consumed, which is why the message says "at least".
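
As a rough illustration of why the reported number is only a lower bound, here is a small Python model. It is not Daffodil's actual code. It assumes that input is buffered in 8192-byte chunks and that only the chunk containing the stopping point has been read when the parse ends. The chunk size is an assumption, picked because it happens to reproduce both numbers in the error messages quoted below.

```python
# Toy model only, not Daffodil's implementation.
CHUNK_BYTES = 8192  # assumed chunk size, inferred from the reported numbers


def reported_remaining_bits(consumed_bits, chunk_bytes=CHUNK_BYTES):
    """Bits left in the chunks read so far when the parse stops."""
    chunk_bits = chunk_bytes * 8
    # Number of whole chunks that must have been read to cover consumed_bits
    # (ceiling division).
    chunks_read = -(-consumed_bits // chunk_bits)
    return chunks_read * chunk_bits - consumed_bits


print(reported_remaining_bits(109767424))  # 5376
print(reported_remaining_bits(191712176))  # 46160
```

Under this model the "remaining" count depends only on where the parse stopped relative to a chunk boundary, so it can go up even though more of the file was consumed.
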
If we wanted to fix this, once parsing completes we could try to consume
data until we hit EOF, which would give us an accurate number of
remaining bits. But if the input is coming from a stream, then EOF could
take a while or not actually happen, and Daffodil would appear to hang.
So instead we just bail and report as much as we know about.
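
The "consume until EOF" idea can be sketched in Python as below. This is only an illustration with a hypothetical helper name, not something Daffodil does, and it assumes a blocking binary stream. The `read` call is where a pipe or socket that never closes would make the program appear to hang.

```python
import io


def count_remaining_bits(stream, chunk_bytes=8192):
    """Read and discard data until EOF, returning how many bits were left."""
    remaining_bytes = 0
    while True:
        chunk = stream.read(chunk_bytes)  # may block forever on a pipe or socket
        if not chunk:  # an empty result signals EOF on a blocking stream
            return remaining_bytes * 8
        remaining_bytes += len(chunk)


# An in-memory stream standing in for the unparsed tail of the input.
leftover = io.BytesIO(b"\x00" * 672)
print(count_remaining_bits(leftover))  # 5376
```
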
Alternatively we could check if the input is a file vs a stream and then
do a simple file size calculation, but thus far it hasn't been a high
priority given the extra complication.
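
For what it's worth, the file-vs-stream check could look something like the following Python sketch. The function name is hypothetical and this is not how Daffodil is implemented. It only shows that for a regular file the exact leftover count is the file size in bits minus the bits consumed, while for anything else the size is not known up front.

```python
import os
import stat


def exact_remaining_bits(path, consumed_bits):
    """Return the exact leftover bit count for a regular file, else None."""
    st = os.stat(path)
    if not stat.S_ISREG(st.st_mode):
        return None  # pipe, socket, device, etc.: total size is unknown
    return st.st_size * 8 - consumed_bits
```

You can also do this by hand today: take the input file's size in bytes, multiply by 8, and subtract the "Consumed" number from the error message.
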
On 2023-04-13 12:10 PM, Roger L Costello wrote:
I am gradually adding to my DFDL schema. I expect there to be "Left over data".
But it would be nice if Daffodil accurately told me how much left over data there is. Or
at least, it would be nice if Daffodil didn't (apparently) make things up. Let me explain.
I ran my DFDL schema and got this message:
[error] Left over data. Consumed 109767424 bit(s) with at least 5376 bit(s)
remaining.
Okay, so I have about 5,000 bits remaining to be parsed.
I added more stuff into my DFDL schema. The schema now gobbles up more of the
input. I expect the number of bits consumed to increase and the number of
left-over bits to decrease. Here's what Daffodil gives:
[error] Left over data. Consumed 191712176 bit(s) with at least 46160 bit(s)
remaining.
Daffodil reports that more bits were consumed: 109,767,424 --> 191,712,176
Good. Makes sense.
Daffodil reports that there are more remaining bits: 5,376 --> 46,160
Huh? That's crazy.
Why can't Daffodil accurately tell the number of remaining bits?
/Roger