In Daffodil there's a driver loop that attempts to parse the elements of the array. Since your array has maxOccurs="unbounded", and this is speculative parsing, it keeps parsing more and more array elements until a parse error occurs. That parse error is what causes the array to terminate at the prior successful element; the error itself is discarded. This is the normal way speculative parsing resolves the end of a repetition.
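The loop described above can be pictured with a small Python sketch. This is not Daffodil's actual implementation; `ParseError`, `parse_array`, and `parse_digit` are hypothetical names used only to illustrate how a parse error ends an unbounded repetition and is then discarded.

```python
class ParseError(Exception):
    """Hypothetical stand-in for a parse error raised by an element parser."""


def parse_array(data, pos, parse_element):
    """Speculatively parse occurrences until one of them fails.

    parse_element(data, pos) returns (value, new_pos) or raises ParseError.
    Returns (values, pos), where pos is just past the last successful occurrence.
    """
    values = []
    while True:
        try:
            value, new_pos = parse_element(data, pos)
        except ParseError:
            # The failed attempt is discarded; the array ends at the
            # prior successful occurrence.
            break
        values.append(value)
        pos = new_pos
    return values, pos


def parse_digit(data, pos):
    """Toy element parser: exactly one ASCII digit, otherwise a parse error."""
    if pos < len(data) and data[pos].isdigit():
        return data[pos], pos + 1
    raise ParseError(f"expected a digit at offset {pos}")


values, end = parse_array("123abc", 0, parse_digit)
print(values, end)  # ['1', '2', '3'] 3
```

Note that this loop stops only on an error. An element parser that succeeds without consuming any data would keep it spinning forever, which is the situation the "No forward progress" diagnostic guards against.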
So you are depending on a parse error for your schema to work.

Now the + in your regex insists that, to match, there must be at least one of the non-null characters. The mistaken assumption is that failing to match causes a parse error. It does not. It is just a determination that the length is zero.

The length pattern is just a regex. There is no notion of success or failure of the match, only "how long was the match". Inability to match at all means the length of the match was zero. But the data type is string, and a zero-length string is a perfectly good value for a string.

Unless you put in that dfdl:assert. Then the parse error is raised by the assertion failure for the zero-length value. That parse error causes the array loop to terminate, which is the behavior your schema needs.

________________________________
From: Costello, Roger L. <[email protected]>
Sent: Monday, December 31, 2018 1:54:08 PM
To: [email protected]
Subject: RE: Why am I getting an "infinite loop" error message?

Hi Mike,

Thank you very much! Based on your excellent information I succeeded in getting the schema to work. See below for the working schema. The key was the addition of dfdl:assert:

    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:assert>{ fn:string-length(.) gt 0 }</dfdl:assert>
      </xs:appinfo>
    </xs:annotation>

Without it, I get the “infinite loop” error.

I don’t understand why the dfdl:assert should be necessary. After all, the plus sign ( + ) in the regex

    dfdl:lengthPattern="[\x20-\x7F]+?(?=\x00)"

specifies that the string must contain at least one character. Can you describe a bit more why the dfdl:assert is needed, please?

Happy New Year!
/Roger

<xs:element name="input">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="String" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="value" type="xs:string"
                        dfdl:lengthKind="pattern"
                        dfdl:lengthPattern="[\x20-\x7F]+?(?=\x00)"
                        dfdl:representation="text"
                        dfdl:encoding="ASCII">
              <xs:annotation>
                <xs:appinfo source="http://www.ogf.org/dfdl/">
                  <dfdl:assert>{ fn:string-length(.) gt 0 }</dfdl:assert>
                </xs:appinfo>
              </xs:annotation>
            </xs:element>
            <xs:sequence dfdl:hiddenGroupRef="hidden_nulls_Group" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>

<xs:group name="hidden_nulls_Group">
  <xs:sequence>
    <xs:element name="Hidden_nulls" type="xs:hexBinary"
                dfdl:lengthKind="pattern"
                dfdl:lengthUnits="bytes"
                dfdl:lengthPattern="[\x00]+?(?=([^\x00]|$))"
                dfdl:outputValueCalc='{ . }' />
  </xs:sequence>
</xs:group>

From: Beckerle, Mike <[email protected]>
Sent: Monday, December 31, 2018 12:50 PM
To: [email protected]
Subject: [EXT] Re: Why am I getting an "infinite loop" error message?

I have hit what I think is this problem a bunch of times. I have come to think of it as a flaw in DFDL.

The problem is lengthKind pattern, and what it means when there is no match. Intuitively we think no match should cause a failure, and backtrack, but what it means is the length is "however much is matched", so no match means length zero. I.e., no match is a successful parse, producing zero length.

Seriously, I think DFDL may need a new length kind of patternMatch where it must positively match, where failure to match is a true failure, aka parse error.

You can simulate this by adding a dfdl:assert to the string element insisting that its length is greater than 0. E.g.,

    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <dfdl:assert>{ fn:string-length(.) gt 0 }</dfdl:assert>
      </xs:appinfo>
    </xs:annotation>

This will force failure and therefore backtracking if the regex match length is actually zero, which it should never be in your case.

What I think is happening here is that at some point your match fails, which results in zero length for the element, and then your repeating thing has zero length. A zero-length repeating thing, when maxOccurs="unbounded", is an error, because it would result in an infinite loop.

As for what's causing your match to fail, I'm less sure. Just some ideas here.

Keep in mind that in a regex for lengthKind pattern, the \xHH escapes match character code points, not bytes. The correspondence of character code point to byte is only 1 to 1 if you specify iso-8859-1.

I think even though your hidden group is hexBinary, there may be some Daffodil bug there. I suggest you try making the hidden group element not hidden (for testing), and make the element a string with encoding iso-8859-1 rather than a hexBinary.

Your regex might be simplified. Really it's just [\x00]+ I think, i.e., match as many nulls as possible. I don't think you need the added complexity of telling it to match reluctantly up until a non-null or end of data. I'm not sure what that added stuff achieves.

I don't know that this is your error, but a common error is to forget that ASCII is 7 bits only. So for example \xFF will never be a valid ASCII char, and if the byte 0xFF is found in the data it will cause a replacement character, and that replacement character will NOT match your regex. So the encoding really matters. If you are using \xFF as a byte, you need iso-8859-1 encoding for sure.

I hope that all helps.

Happy New Year

Mike Beckerle
Tresys Technology

From: Costello, Roger L.
Sent: Monday, December 31, 11:30 AM
Subject: Why am I getting an "infinite loop" error message?
To: [email protected]

Hello DFDL community,

I have a binary input file containing:

    string null(s) string null(s) ….

Here is my input file:

[Image]

Notice that each string is followed by one or more null symbols.

One way to characterize the input is that there is a list of:

    string followed by one or more nulls

The schema below is my attempt to faithfully implement that characterization. However, when I execute the schema, I get this “infinite loop” error message:

    [error] Parse Error: Repeating or Optional Element – No forward progress at byte 47. Attempt to parse List_of_strings succeeded but consumed no data.
    Please re-examine your schema to correct this infinite loop.

I do not understand where the infinite loop is occurring. Would you explain, please? How to fix it?

/Roger

<xs:element name="input">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="List_of_strings" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="string" type="xs:string"
                        dfdl:lengthKind="pattern"
                        dfdl:lengthPattern="[\x01-\xFF]+?(?=\x00)"
                        dfdl:representation="text"
                        dfdl:encoding="ISO-8859-1"/>
            <xs:sequence dfdl:hiddenGroupRef="hidden_null_Group" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>

<xs:group name="hidden_null_Group">
  <xs:sequence>
    <xs:element name="Hidden_null" type="xs:hexBinary"
                dfdl:lengthKind="pattern"
                dfdl:lengthUnits="bytes"
                dfdl:lengthPattern="[\x00]+?(?=([^\x00]|$))"
                dfdl:outputValueCalc='{ . }' />
  </xs:sequence>
</xs:group>
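For readers who want to see the effect outside Daffodil, the following Python sketch imitates how this schema behaves, under the assumption (taken from the replies in this thread) that a lengthKind="pattern" regex that fails to match yields length zero and not a parse error. It is an illustration only, not Daffodil code: `ParseError`, `NoForwardProgress`, `pattern_length`, `parse_strings`, and the sample input are all hypothetical, and Python's `re` engine stands in for the regex engine Daffodil uses. The two patterns are copied from the schema above.

```python
import re

# Patterns copied from the schema in the message above.
STRING_PATTERN = re.compile(r"[\x01-\xFF]+?(?=\x00)")
NULLS_PATTERN = re.compile(r"[\x00]+?(?=([^\x00]|$))")


class ParseError(Exception):
    """Hypothetical stand-in for a DFDL parse error."""


class NoForwardProgress(Exception):
    """Hypothetical stand-in for the 'No forward progress' diagnostic."""


def pattern_length(pattern, data, pos):
    """Assumed lengthKind="pattern" semantics: no match means length 0, not an error."""
    m = pattern.match(data, pos)
    return m.end() - pos if m else 0


def parse_strings(data, require_nonempty):
    """Parse repeated (string, nulls) pairs; require_nonempty models the dfdl:assert."""
    values, pos = [], 0
    while True:
        start = pos
        try:
            n = pattern_length(STRING_PATTERN, data, pos)
            if require_nonempty and n == 0:
                raise ParseError("assertion failed: string-length gt 0")
            value = data[pos:pos + n]
            pos += n
            pos += pattern_length(NULLS_PATTERN, data, pos)
        except ParseError:
            pos = start  # backtrack; the array ends at the previous occurrence
            break
        if pos == start:
            raise NoForwardProgress(f"occurrence at offset {start} consumed no data")
        values.append(value)
    return values


data = "abc\x00\x00de\x00"  # made-up input: two strings, each followed by nulls

print(parse_strings(data, require_nonempty=True))  # ['abc', 'de']

try:
    parse_strings(data, require_nonempty=False)
except NoForwardProgress as err:
    print("error:", err)  # error: occurrence at offset 8 consumed no data
```

With the assertion modelled, the zero-length match at end of data becomes a parse error and the repetition ends cleanly. Without it, the final occurrence "succeeds" with an empty string and no nulls, so nothing is consumed and the loop can only be stopped by the forward-progress check.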
