Claude, I am afraid you will be disappointed: today Daffodil 2.3.0 cannot handle data files this large in one shot.
An unlimited-size stream of smaller data items can be handled, but not a single large file being parsed into a single JSON root. Lots of people have this requirement. We do have ambitions on the roadmap to provide a more incremental parse and unparse for large files when the DFDL schema allows it. On the parse side this would be more like the XML SAX or StAX APIs (you can still create JSON as the infoset; this is just the API style). The unparse side already has a streaming API, but the implementation doesn't provide the streaming behavior except in very, very simple schemas.

If you are interested in becoming a Daffodil developer to implement what you need, we are always looking for contributors to dig in, and we would provide lots of initial assistance.

-Mike Beckerle
Tresys

From: Claude Mamo
Sent: Thursday, May 23, 2:21 AM
Subject: Unparsing a 10 GB JSON infoset
To: [email protected]

Hello all,

I'm testing Daffodil's capability to handle large files. The parsing is done in chunks, but the unparsing happens in one go. For the latter, the following error occurs after about 100 MB is written out to disk: "OutOfMemoryError: GC Overhead Limit Exceeded". Should unparsing happen in chunks as well, or could this be a memory leak? The DFDL schema isn't particularly complex from my perspective, and the validation is very basic (mostly maxOccurs=1 for a few elements).

Claude
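
Mike mentions that the unparse side already has a streaming-style API, i.e. one that pulls infoset events from an InfosetInputter and writes data to a channel. As a rough sketch (not taken from this thread) of how such a call looks against the 2.3.0 Java API: the schema and file names below are placeholders, and the exact JsonInfosetInputter constructor argument (Reader vs. InputStream) should be verified against the 2.3.0 javadoc.

import java.io.File;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

import org.apache.daffodil.japi.Compiler;
import org.apache.daffodil.japi.Daffodil;
import org.apache.daffodil.japi.DataProcessor;
import org.apache.daffodil.japi.Diagnostic;
import org.apache.daffodil.japi.ProcessorFactory;
import org.apache.daffodil.japi.UnparseResult;
import org.apache.daffodil.japi.infoset.JsonInfosetInputter;

public class JsonUnparseSketch {
    public static void main(String[] args) throws Exception {
        // Compile the DFDL schema (file names here are placeholders).
        Compiler c = Daffodil.compiler();
        ProcessorFactory pf = c.compileFile(new File("mySchema.dfdl.xsd"));
        if (pf.isError()) {
            for (Diagnostic d : pf.getDiagnostics()) System.err.println(d.getMessage());
            return;
        }
        DataProcessor dp = pf.onPath("/");

        // The unparse call is event-shaped: it pulls infoset events from an
        // InfosetInputter and writes the resulting data to a channel. Whether
        // memory actually stays bounded for a 10 GB infoset depends on the
        // implementation and the schema, per the discussion above.
        try (FileReader json = new FileReader("infoset.json");           // assumption: Reader-based constructor in 2.3.0
             FileOutputStream out = new FileOutputStream("data.bin")) {
            JsonInfosetInputter inputter = new JsonInfosetInputter(json);
            WritableByteChannel channel = Channels.newChannel(out);
            UnparseResult res = dp.unparse(inputter, channel);
            if (res.isError()) {
                for (Diagnostic d : res.getDiagnostics()) System.err.println(d.getMessage());
            }
        }
    }
}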
