Awesome analysis/input, Thad, and great references. Reading through now. Flagging the library's lack of stream orientation is especially helpful.
Somewhat related: we've pondered adding an annotation that can be placed on processors, so that those able to operate on input and output streams without loading full objects into memory get a visual flag/indicator in the UI. The idea is that it would help dataflow managers at least realize when what they're doing can create memory congestion or scalability issues. What do you think of that idea?

On Fri, Apr 8, 2016 at 10:03 PM, Thad Guidry <[email protected]> wrote:

> Frank's work utilizes the Jolt spec(Apache 2 license), which is a great way
> to handle JsonToJson transforms in my opinion.
>
> Jolt is not a good fit for Process or Rules, (Use Groovy or Java, etc), but
> transforming Json in a great declarative way with Jolt beats the pants off
> of anything else out there. Although its not stream based, and can consume
> memory when your Json payload size is huge, like 300mb json files, etc, but
> fine for most Json payloads in the wild.
>
> "Two things to be aware of :
>
> Jolt is not "stream" based, so if you have a very large Json document to
> transform you need to have enough memory to hold it.
> The transform process will create and discard a lot of objects, so the
> garbage collector will have work to do.
> "
>
> A few more details about how it can be used are mentioned on its official
> page here:
> http://bazaarvoice.github.io/jolt/
>
> A demo of Jolt to see how you can transform Json to Json (click the
> Transform button):
> http://jolt-demo.appspot.com/#ritwickgupta
>
> Here's the rough performance of Jolt in 2013 where an 80k json file is
> shifted in about 5 secs. (authors notes on this slide are interesting), :
> https://docs.google.com/presentation/d/1sAiuiFC4Lzz4-064sg1p8EQt2ev0o442MfEbvrpD1ls/edit#slide=id.g9ac79e71_01
>
> Thad
> +ThadGuidry
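
To make the annotation idea at the top of this message a bit more concrete, here is a rough sketch in plain Java. Everything in it is hypothetical: the `StreamOriented` annotation, the two stand-in processor classes, and the `loadsFullContentInMemory` helper are names made up for illustration and are not an existing API. It only shows the general shape: a runtime-retained marker annotation that a UI layer could check by reflection to decide whether to show a memory warning.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class StreamingFlagSketch {

    /** Hypothetical marker: the processor reads and writes content as streams. */
    @Documented
    @Target(ElementType.TYPE)
    @Retention(RetentionPolicy.RUNTIME) // must survive to runtime so the UI layer can see it
    @interface StreamOriented {
    }

    /** Stand-in for a processor that copies input to output in small chunks. */
    @StreamOriented
    static class ChunkedCopyProcessor {
    }

    /** Stand-in for a processor that needs the whole document in memory (as a Jolt transform would). */
    static class WholeDocumentTransformProcessor {
    }

    /** What a UI layer might call to decide whether to show a memory warning (assumed helper). */
    static boolean loadsFullContentInMemory(Class<?> processorClass) {
        return !processorClass.isAnnotationPresent(StreamOriented.class);
    }

    public static void main(String[] args) {
        // Annotated processor: no warning needed.
        System.out.println(loadsFullContentInMemory(ChunkedCopyProcessor.class));            // false
        // Unannotated processor: treated as potentially memory-hungry.
        System.out.println(loadsFullContentInMemory(WholeDocumentTransformProcessor.class)); // true
    }
}
```

One design question this leaves open is the default. The sketch treats anything unannotated as potentially memory-hungry, which is the conservative choice but would flag every existing processor until it is reviewed and annotated.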
