I have hit what I think is this problem a bunch of times. I have come to think of it as a flaw in DFDL.
The problem is lengthKind "pattern", and what it means when there is no match. Intuitively we think no match should cause a failure and backtracking, but what it actually means is that the length is "however much is matched", so no match means length zero. I.e., no match is a successful parse that produces a zero-length element.

Seriously, I think DFDL may need a new length kind, say patternMatch, where the pattern must positively match and failure to match is a true failure, aka a parse error.

You can simulate this by adding a dfdl:assert to the string element insisting that its length is greater than 0. E.g.,

  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <dfdl:assert>{ fn:string-length(.) gt 0 }</dfdl:assert>
    </xs:appinfo>
  </xs:annotation>

This will force failure, and therefore backtracking, if the regex match length is actually zero, which it should never be in your case.

What I think is happening here is that at some point your match fails, which results in zero length for the element, and then your repeating thing has zero length. A zero-length occurrence of a repeating thing, when maxOccurs="unbounded", is an error, because it would result in an infinite loop.

As for what's causing your match to fail, I'm less sure. Just some ideas here.

Keep in mind that in a regex for lengthKind "pattern", those \xHH patterns match character code points, not bytes. The correspondence of character code point to byte is only 1 to 1 if you specify iso-8859-1. I think that even though your hidden group element is hexBinary, there may be some Daffodil bug there. I suggest you try making the hidden group element not hidden (for testing), and making the element a string with encoding iso-8859-1 rather than a hexBinary.

Your regex might be simplified. Really it's just [\x00]+ I think, i.e., match as many nulls as possible. I don't think you need the added complexity of telling it to match reluctantly up until a non-null or end of data. I'm not sure what that added stuff achieves.
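To make the "no match means length zero" point concrete, here is a rough Python sketch of how I understand pattern length to behave. It is only a model under assumptions, not Daffodil code: pattern_length and assert_nonempty are hypothetical helpers, Python's re module stands in for the DFDL regex engine (which I believe follows Java regex syntax; these simple patterns behave the same in both), and errors="replace" is my approximation of what happens to bytes the encoding cannot represent.

```python
import re

# Patterns copied from the schema in this thread.
STRING_PATTERN = r"[\x01-\xFF]+?(?=\x00)"
NULLS_PATTERN = r"[\x00]+?(?=([^\x00]|$))"
# The simplification suggested above.
SIMPLE_NULLS_PATTERN = r"[\x00]+"


def pattern_length(pattern: str, data: bytes, encoding: str = "iso-8859-1") -> int:
    """Hypothetical model of lengthKind="pattern" (not a Daffodil API).

    Decode the bytes to characters, try the regex at the start of the data,
    and return the matched length in characters. A failed match is not an
    error in this model: it simply yields length 0.
    """
    # Assumption: undecodable bytes become U+FFFD replacement characters.
    text = data.decode(encoding, errors="replace")
    match = re.match(pattern, text)
    return len(match.group(0)) if match else 0


def assert_nonempty(length: int) -> int:
    """Model of the dfdl:assert workaround: zero length becomes a failure."""
    if length == 0:
        raise ValueError("parse error: pattern matched nothing")
    return length


# A string followed by nulls: the pattern matches the 3 characters "abc".
print(pattern_length(STRING_PATTERN, b"abc\x00\x00def\x00"))  # 3

# No match (data starts with a null, or no null follows): length 0, no error.
print(pattern_length(STRING_PATTERN, b"\x00\x00def\x00"))  # 0
print(pattern_length(STRING_PATTERN, b"abc"))  # 0

# The reluctant-plus-lookahead pattern and plain [\x00]+ give the same lengths.
for sample in (b"\x00\x00\x00def", b"\x00\x00", b"def"):
    print(pattern_length(NULLS_PATTERN, sample),
          pattern_length(SIMPLE_NULLS_PATTERN, sample))

# Code points vs bytes: the same data matches under iso-8859-1 but not ASCII.
print(pattern_length(STRING_PATTERN, b"caf\xff\x00", "iso-8859-1"))  # 4
print(pattern_length(STRING_PATTERN, b"caf\xff\x00", "ascii"))  # 0
```

In this model, wrapping a call in assert_nonempty(...) plays the role of the dfdl:assert: the zero-length "success" turns into a failure that a real parser could backtrack from.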

I don't know that this is your error, but a common error is to forget that ASCII is 7 bits only. So, for example, \xFF will never be a valid ASCII character; if the byte 0xFF is found in the data, it will be decoded as a replacement character, and that replacement character will NOT match your regex. So the encoding really matters. If you are using \xFF as a byte, you need iso-8859-1 encoding for sure.

I hope that all helps.

Happy New Year

Mike Beckerle
Tresys Technology

From: Costello, Roger L.
Sent: Monday, December 31, 11:30 AM
Subject: Why am I getting an "infinite loop" error message?
To: [email protected]

Hello DFDL community,

I have a binary input file containing:

  string null(s) string null(s) ….

Here is my input file:

[Image]

Notice that each string is followed by one or more null symbols.

One way to characterize the input is that there is a list of:

  string followed by one or more nulls

The schema below is my attempt to faithfully implement that characterization. However, when I execute the schema, I get this “infinite loop” error message:

  [error] Parse Error: Repeating or Optional Element – No forward progress at byte 47. Attempt to parse List_of_strings succeeded but consumed no data. Please re-examine your schema to correct this infinite loop.

I do not understand where the infinite loop is occurring. Would you explain, please? How to fix it?
/Roger

<xs:element name="input">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="List_of_strings" maxOccurs="unbounded">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="string" type="xs:string"
                        dfdl:lengthKind="pattern"
                        dfdl:lengthPattern="[\x01-\xFF]+?(?=\x00)"
                        dfdl:representation="text"
                        dfdl:encoding="ISO-8859-1"/>
            <xs:sequence dfdl:hiddenGroupRef="hidden_null_Group" />
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>
</xs:element>

<xs:group name="hidden_null_Group">
  <xs:sequence>
    <xs:element name="Hidden_null" type="xs:hexBinary"
                dfdl:lengthKind="pattern"
                dfdl:lengthUnits="bytes"
                dfdl:lengthPattern="[\x00]+?(?=([^\x00]|$))"
                dfdl:outputValueCalc='{ . }' />
  </xs:sequence>
</xs:group>
