This is great, Mark. DFDL does not have a way to say that one element is terminated by the initiator of whatever comes next. It's a long-standing feature request.
So you need to use look-ahead to do what you want. That means you need to use dfdl:lengthKind="pattern" to gather the data up to, but not including, the expected terminator.

dfdl:lengthPattern="[^\[]{1,100}(?=\[)"

That will match 1 to 100 non open-bracket characters followed by, but not including, a "[" character.

The next element can just be a string that is lengthKind delimited with no terminator specified which means "to end of data".

The usual problem with dfdl:lengthKind="pattern" is that a failure to match the pattern at all does *not* cause a parse error. Rather it just causes the length to be zero. If the data is type xs:string then zero is a legal length. So often a string element with lengthKind 'pattern' carries a dfdl:assert that the length is not zero, so as to cause a non-match to be a parse-error. This is needed often enough that a named simpleType "nzString" for "non-zero-length string" turns out to be convenient to have around.

On Thu, Nov 14, 2024 at 10:16 AM Mark Kozak <mark.ko...@adeptus-cs.com> wrote:

> Apologies for being unclear. Hopefully this helps.
>
> From that example I would want:
>
> <f1>AAA</f1>
> <f2>["bbb"["ccc"]]</f2><!-- literally all characters after the AAA as just a string -->
>
> The number of nested brackets is unknown. A more complicated example could be:
>
> AAA["bbb”["ccc”][“ddd”]
>
> Producing:
>
> <f1>AAA</f1>
> <f2>["bbb"["ccc"][“ddd”]]</f2>
>
> So all I really need is two elements which are the string before the first separator (the left bracket) and literally everything else.
>
> I have a similar data stream that uses : as the separator. That example might look like:
>
> AAA:”bbb”:”ccc”
>
> Again, the number of : separated strings is unbounded. So getting the following would work:
>
> <f1>AAA</f1>
> <f2>"bbb":"ccc"</f2>
>
> I think both examples are really the same problem, with one using the [ as a separator and the second using :
> I can use different schema solutions if they are in fact not the same problem.
>
> *From:* Mike Beckerle <mbecke...@apache.org>
> *Sent:* Thursday, November 14, 2024 9:51 AM
> *To:* users@daffodil.apache.org
> *Subject:* Re: Parsing text without an end terminator?
>
> I'm going to need more to go on than this.
>
> Can you provide (several) richer examples? It's not clear from this little snippet what's even the terminator you were describing before.
> You started with ":" terminators, now we're looking at matched pairs of brackets. How does one relate to the other?
>
> When you say the second element is "the rest of the line", what exactly do you mean by that? Do you want:
>
> <f1>AAA</f1>
> <f2>["bbb"["ccc"]]</f2><!-- literally all characters after the AAA as just a string -->
>
> Or something where the fields inside f2 are also parsed based on the brackets?
>
> <f1>AAA</f1>
> <f2>
> <f3>bbb</f3>
> <f4>ccc</f4>
> </f2>
>
> On Thu, Nov 14, 2024 at 9:28 AM Mark Kozak <mark.ko...@adeptus-cs.com> wrote:
>
> Here is an example of the type of data I need to parse.
>
> AAA["bbb”["ccc”]]
>
> The file has exactly one line with no terminator. Ideally, I would like to get 2 elements. The first is the AAA, and the second is the rest of the line. I can work with or without the first left bracket.
>
> *From:* Mark Kozak
> *Sent:* Thursday, November 14, 2024 8:58 AM
> *To:* users@daffodil.apache.org
> *Subject:* RE: Parsing text without an end terminator?
>
> The final terminator is not allowed.
>
> *From:* Mike Beckerle <mbecke...@apache.org>
> *Sent:* Thursday, November 14, 2024 8:55 AM
> *To:* users@daffodil.apache.org
> *Subject:* Re: Parsing text without an end terminator?
>
> Did you try using dfdl:separator ?
>
> To clarify, in your format is this final terminator optional, or is it not allowed to be present?
>
> Alas, the dfdl:documentFinalTerminatorCanBeMissing property is not implemented by Daffodil. (See https://daffodil.apache.org/unsupported/)
> It is suitable only for final terminators that are optional, but which will be added when unparsing.
>
> On Wed, Nov 13, 2024 at 5:42 PM Mark Kozak <mark.ko...@adeptus-cs.com> wrote:
>
> Hello Community,
>
> I have a text file that is delimited with a character like :
> The challenge I am having is that there is no delimiter at the end of the file. I can get things to work if I add a new-line to the end and specify a terminator to be the NL. I thought the documentFinalTerminatorCanBeMissing property would be the solution, but setting that to yes did not appear to make a difference. Are there any recommended workarounds?
>
> Thank for the support,
>
> Mark Kozak
> Director of Engineering
> Adeptus Cyber Solutions
> Adeptus-CS.com
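
P.S. For anyone who wants to see how the look-ahead pattern at the top of this reply behaves on the sample data, here is a rough Python sketch. It is only an illustration of the regular expression, not of Daffodil itself: Python's re module is not the regex engine Daffodil uses, though this pattern relies only on a negated character class, a bounded quantifier, and a look-ahead, which behave the same way in both. The helper names split_at_first_bracket and parse_two_fields are hypothetical, and treating a non-match as a zero-length first field is my emulation of the lengthKind "pattern" behavior described above.

```python
import re

# The pattern from the reply: 1 to 100 characters that are not "[",
# followed by (but not consuming) a "[" character.
LENGTH_PATTERN = re.compile(r"[^\[]{1,100}(?=\[)")


def split_at_first_bracket(data):
    """Hypothetical helper: emulate f1 (pattern length) and f2 (rest of data)."""
    match = LENGTH_PATTERN.match(data)  # anchored at the start of the data
    # Assumption: a non-match is modeled as a zero-length f1, mirroring
    # what the reply says happens with dfdl:lengthKind="pattern".
    f1 = match.group(0) if match else ""
    f2 = data[len(f1):]  # everything else, i.e. "to end of data"
    return f1, f2


def parse_two_fields(data):
    """Hypothetical helper: plays the role of the non-zero-length assert."""
    f1, f2 = split_at_first_bracket(data)
    if len(f1) == 0:
        raise ValueError("f1 matched zero characters; treating as a parse error")
    return f1, f2


if __name__ == "__main__":
    print(parse_two_fields('AAA["bbb"["ccc"]]'))
```

Without the length check, input that has no "[" at all (or that starts with one) would quietly produce an empty first field and put everything in the second, which is the situation the assert is there to catch.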