Re: Data performance with FlowFile Repo's RocksDB

Matt Burgess Thu, 10 Sep 2020 08:00:02 -0700

You can use a JsonTreeReader set to Infer Schema and use that in 
JoltTransformRecord. But if your payload is one big JSON object (rather than a 
top-level array of JSON objects), then you only have one record and should 
stick to JoltTransformJSON. If you do have an array, JoltTransformJSON will 
still read the whole thing into memory, whereas JoltTransformRecord will 
process each element individually.
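
To make the one-record-versus-many distinction concrete, here is a minimal 
Python sketch. It is not NiFi code and does not model how JsonTreeReader 
works internally; the function name split_into_records and the sample 
payloads are made up for illustration. It only shows how the shape of the 
top-level JSON value determines how many records there are. Note that 
json.loads itself loads the whole payload into memory, so this illustrates 
the record count only, not streaming behavior.

```python
import json


def split_into_records(payload: str) -> list:
    """Conceptual sketch: which records would a record-oriented reader see?"""
    doc = json.loads(payload)
    # A top-level array yields one record per element;
    # any other top-level value is treated as a single record.
    return doc if isinstance(doc, list) else [doc]


# Hypothetical sample payloads.
single = '{"id": 1, "tags": ["a", "b"]}'
batch = '[{"id": 1}, {"id": 2}, {"id": 3}]'

print(len(split_into_records(single)))  # 1
print(len(split_into_records(batch)))   # 3
```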


You may be able to use a Jolt transform to do the flattening, but you’d need 
to know the structure of the JSON in order to match the various levels 
correctly.
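
For contrast, flattening without knowing the structure in advance usually 
means walking the document recursively. The following is a rough Python 
sketch of that idea, not a Jolt spec and not the custom FlattenJsonArray 
processor mentioned in this thread; the function name flatten, the "." 
separator, and the sample document are all assumptions.

```python
def flatten(value, prefix="", sep="."):
    """Flatten nested dicts and lists into one dict keyed by path.

    Empty dicts and lists contribute no keys in this sketch.
    """
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}{sep}{key}" if prefix else str(key)
            flat.update(flatten(child, path, sep))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            path = f"{prefix}{sep}{index}" if prefix else str(index)
            flat.update(flatten(child, path, sep))
    else:
        # Scalar leaf: store it under the path accumulated so far.
        flat[prefix] = value
    return flat


# Hypothetical nested document.
nested = {"user": {"name": "ryan", "langs": ["java", "python"]}, "ok": True}
print(flatten(nested))
# {'user.name': 'ryan', 'user.langs.0': 'java', 'user.langs.1': 'python', 'ok': True}
```

Because the walk discovers the nesting at run time, no level has to be named 
up front, which is the part a hand-written Jolt spec would need to know.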

Regards,
Matt

> On Sep 10, 2020, at 10:41 AM, Ryan Hendrickson 
> <ryan.andrew.hendrick...@gmail.com> wrote:
> 
> 
> Hey Joe,
>    Right now I'm using an InputPort -> JoltTransformJSON -> Custom 
> FlattenJsonArray -> DistributeLoad -> PutElasticHTTP on an 8-core box with 
> 64 GB of RAM.
> 
>    I did see there is a JoltTransformRecord, but my rudimentary understanding 
> of Record processing is that you need a predefined, well-known schema for 
> the records.  What happens if you don't know the whole schema?
> 
> Thanks,
> Ryan
> 
>> On Thu, Sep 10, 2020 at 10:33 AM Joe Witt <joe.w...@gmail.com> wrote:
>> Ryan
>> 
>> By far the largest performance-relevant activity is flow design itself.  As 
>> a last resort I'd look at repo changes.
>> 
>> Are you using the record processors?  Does your data arrive in batches?
>> 
>> Thanks
>> 
>>> On Thu, Sep 10, 2020 at 7:27 AM Ryan Hendrickson 
>>> <ryan.andrew.hendrick...@gmail.com> wrote:
>>> Hi all,
>>>    I've got a NiFi running with a lot of small JSON files and I'm trying to 
>>> squeeze the most performance out of it.
>>> 
>>>    I recently saw the new RocksDB FlowFile Repo 
>>> (https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#rocksdb-flowfile-repository)
>>>  and was wondering what kind, if any, performance gains we could expect out 
>>> of it.
>>> 
>>> Thanks,
>>> Ryan
