Isn't that what PartitionRecord + RouteOnAttribute already does?
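For reference, a minimal sketch of that combination (property names are
user-defined, so "STATE" and "florida" below are only illustrative; the
RecordPath /state[0] is the one from the thread):

    PartitionRecord (the dynamic property name becomes the attribute name):
        STATE = /state[0]

    RouteOnAttribute (each dynamic property name becomes a relationship):
        florida = ${STATE:equals('FL')}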
On Tue, May 1, 2018 at 4:58 PM, Otto Fowler <[email protected]> wrote:
> Maybe a group and dispatch processor would help,
>
> JsonPath -> key
> Group by key
> key -> route
>
>
> On May 1, 2018 at 16:30:28, Bryan Bende ([email protected]) wrote:
>
> I see, so the partition is helping if you want to route based on the
> partition and is also giving you the attribute.
>
> Right now it is the bottleneck because it is one record per flow file,
> but once you can stop splitting then presumably you could partition
> one big flow into a flow file per state and it would be much better,
> and you probably wouldn't need MergeRecord anymore.
>
> On Tue, May 1, 2018 at 4:21 PM, Juan Sequeiros <[email protected]> wrote:
>> Thanks Bryan.
>> I am partitioning off a key "/state[0]"
>> and add a route called "STATE" for "/state[0]",
>> then MergeRecord using correlation attribute: STATE
>>
>> PartitionRecord also adds an attribute with the value it partitioned on,
>> so I'll get an attribute STATE="FL" for example, and after I merge them
>> I can ingest into something like /data/$STATE/
>>
>> I think I have to look closer at my schema.
>>
>> On Tue, May 1, 2018 at 3:59 PM Bryan Bende <[email protected]> wrote:
>>>
>>> Unfortunately the current JSON record readers are not expecting a JSON
>>> document per line, because technically that is not a valid JSON
>>> document itself. Your file would have to be represented as an array of
>>> documents like [ doc1, doc2, doc3, ... ]
>>>
>>> There is a PR up to support the per-line JSON document though:
>>> https://github.com/apache/nifi/pull/2640
>>>
>>> In both of your examples, if you are splitting before partitioning,
>>> then what is the partitioning accomplishing?
>>>
>>> If you had the changes in the PR above then the goal would be to not
>>> use SplitRecord... you would just send GetFile -> PartitionRecord ->
>>> whatever else.
>>>
>>> On Tue, May 1, 2018 at 3:34 PM, Juan Sequeiros <[email protected]>
>>> wrote:
>>> > Hello all,
>>> >
>>> > I have one file on local disk with thousands of lines, each
>>> > representing a valid JSON object.
>>> > My flow is like this:
>>> >
>>> > GetFile > SplitText > PartitionRecord ( based on a key ) >
>>> > MergeRecord > PutElasticSearchRecord
>>> >
>>> > This works well; however, I seem to bottleneck at PartitionRecord.
>>> >
>>> > So I looked at using
>>> > GetFile > ConvertRecord > SplitRecord > PartitionRecord
>>> >
>>> > But it seems to only convert the first line of the content from my
>>> > GetFile.
>>> >
>>> > Am I missing something?
>>> >
>>> > I have a bottleneck that could very well be a system resource issue,
>>> > but still, what is the best way to take a file with lines of JSON
>>> > and convert them into records? I assume it's through the record
>>> > readers and writers, and then it's implied that it converts each
>>> > "object" based on the Avro schema (in my case)?
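To make the input-shape point above concrete, here is roughly what the two
forms look like (any field other than "state" is made up for illustration).
One JSON object per line, which is what the file on disk contains but is not
itself a single valid JSON document:

    {"state": "FL", "city": "Miami"}
    {"state": "VA", "city": "Reston"}

versus the array form the current JSON record readers expect:

    [
      {"state": "FL", "city": "Miami"},
      {"state": "VA", "city": "Reston"}
    ]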

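And if the reader is configured with an Avro schema, as in Juan's case, a
matching schema for those hypothetical two-field records would look
something like:

    {
      "type": "record",
      "name": "Event",
      "fields": [
        { "name": "state", "type": "string" },
        { "name": "city",  "type": "string" }
      ]
    }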