Perfect, thanks!
On Tuesday, November 22, 2016 8:23 AM, Joe Witt <[email protected]> wrote:
Hello
Collecting the data:
- Use Consume Kafka
Routing each event based on its content:
- There is no out-of-the-box processor to route Avro just yet. We
will hopefully tackle that soon. In the meantime you could convert
the Avro to JSON, extract the routing value with EvaluateJsonPath,
and route on it with RouteOnAttribute. If you want the data back in
Avro you can convert back. So it would be something like
- ConvertAvroToJSON
- EvaluateJsonPath
- ConvertJSONToAvro
At this point you'll have a flow file attribute which tells you about
the binning you want. This approach carries the performance cost of
the conversions, so avoiding the round-trip back to Avro will help.
Alternatively, with ExecuteScript or the related scripting processors
you can whip up some Groovy to deal with the Avro directly. You have
lots of options there.
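As a sketch of the extraction step above, here is the equivalent of what EvaluateJsonPath does, in plain Python. The event structure and the `region` field are hypothetical, chosen only to illustrate pulling a binning value out of a JSON record:

```python
import json

# One event as it might look after ConvertAvroToJSON; the field names
# here are hypothetical.
event = '{"region": "us-east", "user_id": 42, "amount": 9.99}'

# EvaluateJsonPath configured with a property like region = $.region
# would place this value into a flow file attribute; this is the
# equivalent extraction in plain Python.
record = json.loads(event)
bin_key = record["region"]
print(bin_key)  # us-east
```

The attribute produced this way is what the merging step below keys on.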
Merging events together to align with HDFS block sizes:
- Use MergeContent. It supports binning data together using a flow
file attribute such as the one you extracted in the previous step.
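The grouping MergeContent performs can be sketched as follows. The flow files, attribute name, and payloads are hypothetical; in NiFi the grouping key is set via MergeContent's Correlation Attribute Name property:

```python
from collections import defaultdict

# Hypothetical flow files as (attributes, payload) pairs. MergeContent
# with Correlation Attribute Name set to "region" groups incoming
# flow files the same way.
flow_files = [
    ({"region": "us-east"}, b"event-1"),
    ({"region": "eu-west"}, b"event-2"),
    ({"region": "us-east"}, b"event-3"),
]

# Bin payloads by the value of the correlation attribute.
bins = defaultdict(list)
for attrs, payload in flow_files:
    bins[attrs["region"]].append(payload)

# Each bin is concatenated into one larger flow file; in NiFi you would
# also set minimum group sizes so merged files approach the HDFS block size.
merged = {key: b"\n".join(payloads) for key, payloads in bins.items()}
print(merged["us-east"])  # b'event-1\nevent-3'
```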
Delivering to HDFS:
- Use PutHDFS
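To land each merged file in the right partition, the PutHDFS Directory property can reference the extracted attribute through the Expression Language. The attribute name and base path below are hypothetical:

```
Directory: /data/events/region=${region}
```

With a partition value in the path like this, each merged flow file is written into the HDFS directory matching its binning attribute.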
Thanks
Joe
On Tue, Nov 22, 2016 at 7:59 AM, E Worthy <[email protected]> wrote:
> Is this not possible with this product?
>
>
> On Sunday, November 20, 2016 8:50 PM, E Worthy <[email protected]>
> wrote:
>
>
> Hello,
>
> I have Avro data coming in from Kafka, and I need to place this data into
> the proper HDFS partition based on values in each row of data. Is this
> possible in NiFi? I'm looking at PutHDFS, ConvertAvroToORC, and
> expressions. Do I somehow set a variable using an expression for each row
> and use that in PutHDFS?
>
> Thanks for any help,
> Eric
>
>