RE: Parsing text without an end terminator?

Mark Kozak Thu, 14 Nov 2024 07:15:52 -0800

Apologies for being unclear. Hopefully this helps.


From that example I would want:

<f1>AAA</f1>

<f2>["bbb"["ccc"]]</f2><!-- literally all characters after the AAA as just a string -->

 

The number of nested brackets is unknown. A more complicated example could be:

AAA["bbb"["ccc"]["ddd"]]

Producing:

<f1>AAA</f1>

<f2>["bbb"["ccc"]["ddd"]]</f2>

 

So all I really need is two elements which are the string before the first 
separator (the left bracket) and literally everything else.

 

I have a similar data stream that uses : as the separator. That example might 
look like:

 

AAA:"bbb":"ccc"

 

Again, the number of :-separated strings is unbounded, so getting the 
following would work:

 

<f1>AAA</f1>

<f2>"bbb":"ccc"</f2>

 

I think both examples are really the same problem, with the first using [ as 
the separator and the second using :.

I can use different schema solutions if they are in fact not the same problem.
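If both streams really are the same problem, one DFDL schema should be able to cover them by changing a single property. The following is a minimal, untested sketch, assuming Daffodil's bundled DFDLGeneralFormat.dfdl.xsd defaults and the DFDL rule that an element's own terminator is only in scope while that element is being parsed. Here f1 stops at (and consumes) the first "[" via dfdl:terminator; f2 declares no terminator and has no enclosing separator or terminator in scope, so with lengthKind="delimited" it should run to the end of the data, meaning no final terminator is required. The root element name "record" is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">

  <!-- Daffodil's bundled set of default DFDL property values -->
  <xs:include schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd"/>

  <xs:annotation>
    <xs:appinfo source="http://www.ogf.org/dfdl/">
      <!-- UTF-8 assumed so curly quotes in the data decode cleanly -->
      <dfdl:format ref="GeneralFormat" representation="text"
                   encoding="UTF-8" lengthKind="delimited"/>
    </xs:appinfo>
  </xs:annotation>

  <!-- "record" is a hypothetical root element name -->
  <xs:element name="record">
    <xs:complexType>
      <xs:sequence>
        <!-- f1: text up to the first "[", which is consumed as f1's terminator.
             For the colon-separated stream, use dfdl:terminator=":" instead. -->
        <xs:element name="f1" type="xs:string" dfdl:terminator="["/>
        <!-- f2: no terminator of its own, and f1's terminator is no longer in
             scope, so the delimited scan should end only at end of data. -->
        <xs:element name="f2" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

With the input AAA["bbb"["ccc"]] this would be expected to yield f1 = AAA and f2 = "bbb"["ccc"]] (the first bracket is consumed as f1's terminator, which the original message says is acceptable). With dfdl:terminator=":" the input AAA:"bbb":"ccc" should give f1 = AAA and f2 = "bbb":"ccc".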

 


From: Mike Beckerle <mbecke...@apache.org> 
Sent: Thursday, November 14, 2024 9:51 AM
To: users@daffodil.apache.org
Subject: Re: Parsing text without an end terminator?

 

I'm going to need more to go on than this. 

 

Can you provide (several) richer examples? It's not clear from this little 
snippet what the terminator you were describing before even is.

You started with ":" terminators, now we're looking at matched pairs of 
brackets. How does one relate to the other?

 

When you say the second element is "the rest of the line", what exactly do you 
mean by that? Do you want:

 

<f1>AAA</f1>

<f2>["bbb"["ccc"]]</f2><!-- literally all characters after the AAA as just a string -->

 

Or something where the fields inside f2 are also parsed based on the brackets?

 

<f1>AAA</f1>

<f2>

  <f3>bbb</f3>

  <f4>ccc</f4>

</f2>

 

 

On Thu, Nov 14, 2024 at 9:28 AM Mark Kozak <mark.ko...@adeptus-cs.com> wrote:

Here is an example of the type of data I need to parse. 

 

AAA["bbb"["ccc"]]

 

The file has exactly one line with no terminator. Ideally, I would like to get 
2 elements. The first is the AAA, and the second is the rest of the line. I can 
work with or without the first left bracket.

 

 

From: Mark Kozak 
Sent: Thursday, November 14, 2024 8:58 AM
To: users@daffodil.apache.org
Subject: RE: Parsing text without an end terminator?

 

The final terminator is not allowed.

 

From: Mike Beckerle <mbecke...@apache.org>
Sent: Thursday, November 14, 2024 8:55 AM
To: users@daffodil.apache.org
Subject: Re: Parsing text without an end terminator?

 

Did you try using dfdl:separator?

 

To clarify, in your format is this final terminator optional, or is it not 
allowed to be present? 

 

Alas, the dfdl:documentFinalTerminatorCanBeMissing property is not implemented 
by Daffodil. (See https://daffodil.apache.org/unsupported/)

It is suitable only for final terminators that are optional, but which will be 
added when unparsing. 

 

 

On Wed, Nov 13, 2024 at 5:42 PM Mark Kozak <mark.ko...@adeptus-cs.com> wrote:

Hello Community,

 

I have a text file that is delimited with a character like ':'.

The challenge I am having is that there is no delimiter at the end of the file. 
I can get things to work if I add a newline to the end and specify the 
terminator to be NL. I thought the documentFinalTerminatorCanBeMissing 
property would be the solution, but setting it to yes did not appear to make 
a difference. Are there any recommended workarounds?

 

Thanks for the support,

 

Mark Kozak

Director of Engineering

Adeptus Cyber Solutions

Adeptus-CS.com
