Isn't that what PartitionRecord + RouteOnAttribute already does?
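For reference, a minimal sketch of that combination (property names are
user-defined, so "STATE" and "florida" below are only illustrative; the
RecordPath /state[0] is the one from the thread):

    PartitionRecord (the dynamic property name becomes the attribute name):
        STATE = /state[0]

    RouteOnAttribute (each dynamic property name becomes a relationship):
        florida = ${STATE:equals('FL')}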
On Tue, May 1, 2018 at 4:58 PM, Otto Fowler <[email protected]> wrote:
> Maybe a group and dispatch processor would help,
>
> JsonPath -> key
> Group by key
> key -> route
>
>
> On May 1, 2018 at 16:30:28, Bryan Bende ([email protected]) wrote:
>
> I see, so the partition is helping if you want to route based on the
> partition and is also giving you the attribute.
>
> Right now it is the bottleneck because it is one record per flow file,
> but once you can stop splitting then presumably you could partition
> one big flow into a flow file per state and it would be much better,
> and you probably wouldn't need MergeRecord anymore.
>
> On Tue, May 1, 2018 at 4:21 PM, Juan Sequeiros <[email protected]> wrote:
>> Thanks Bryan.
>> I am partitioning off a key "/state[0]"
>> and add a route called "STATE" for "/state[0]",
>> then MergeRecord using correlation attribute: STATE
>>
>> PartitionRecord also adds an attribute with the value it partitioned on,
>> so I'll get an attribute STATE="FL" for example, and after I merge them
>> I can ingest into something like /data/$STATE/
>>
>> I think I have to look closer at my schema.
>>
>> On Tue, May 1, 2018 at 3:59 PM Bryan Bende <[email protected]> wrote:
>>>
>>> Unfortunately the current JSON record readers are not expecting a JSON
>>> document per line, because technically that is not a valid JSON
>>> document itself. Your file would have to be represented as an array of
>>> documents like [ doc1, doc2, doc3, ... ]
>>>
>>> There is a PR up to support the per-line JSON document though:
>>> https://github.com/apache/nifi/pull/2640
>>>
>>> In both of your examples, if you are splitting before partitioning,
>>> then what is the partitioning accomplishing?
>>>
>>> If you had the changes in the PR above then the goal would be to not
>>> use SplitRecord... you would just send GetFile -> PartitionRecord ->
>>> whatever else.
>>>
>>> On Tue, May 1, 2018 at 3:34 PM, Juan Sequeiros <[email protected]>
>>> wrote:
>>> > Hello all,
>>> >
>>> > I have one file on local disk with thousands of lines, each
>>> > representing a valid JSON object.
>>> > My flow is like this:
>>> >
>>> > GetFile > SplitText > PartitionRecord ( based on a key ) >
>>> > MergeRecord > PutElasticSearchRecord
>>> >
>>> > This works well; however, I seem to bottleneck at PartitionRecord.
>>> >
>>> > So I looked at using
>>> > GetFile > ConvertRecord > SplitRecord > PartitionRecord
>>> >
>>> > But it seems to only convert the first line of the content from my
>>> > GetFile.
>>> >
>>> > Am I missing something?
>>> >
>>> > I have a bottleneck that could very well be a system resource issue,
>>> > but still, what is the best way to take a file with lines of JSON
>>> > and convert them into records? I assume it's through the record
>>> > readers and writers, and then it's implied that it converts each
>>> > "object" based on the Avro schema (in my case)?
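To make the input-shape point above concrete, here is roughly what the two
forms look like (any field other than "state" is made up for illustration).
One JSON object per line, which is what the file on disk contains but is not
itself a single valid JSON document:

    {"state": "FL", "city": "Miami"}
    {"state": "VA", "city": "Reston"}

versus the array form the current JSON record readers expect:

    [
      {"state": "FL", "city": "Miami"},
      {"state": "VA", "city": "Reston"}
    ]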

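And if the reader is configured with an Avro schema, as in Juan's case, a
matching schema for those hypothetical two-field records would look
something like:

    {
      "type": "record",
      "name": "Event",
      "fields": [
        { "name": "state", "type": "string" },
        { "name": "city",  "type": "string" }
      ]
    }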