Interesting question. A couple of discussion points: if we start doing a processor for each of these conversions, it may become unwieldy (P(x,2) = x(x-1) processors, where x is the number of data formats). A more general ConvertFormat processor may be appropriate, but then configuration and code complexity may suffer. If there is a canonical internal data form and a set (2x) of ConvertXToCanonical and ConvertCanonicalToX processors, the flow could get complex and the extra transform could be expensive.

On Mar 21, 2016 9:39 PM, "Dmitry Goldenberg" <[email protected]> wrote:
> Since NiFi has ConvertJsonToAvro and ConvertCsvToAvro processors, would it
> make sense to add a feature request for a ConvertJsonToParquet processor
> and a ConvertCsvToParquet processor?
>
> - Dmitry
>
> On Mon, Mar 21, 2016 at 9:23 PM, Matt Burgess <[email protected]> wrote:
>
>> Edmon,
>>
>> NIFI-1663 [1] was created to add ORC support to NiFi. If you have a
>> target dataset that has been created with Parquet format, I think you can
>> use ConvertCSVtoAvro then StoreInKiteDataset to get flow files in Parquet
>> format into Hive, HDFS, etc. Others in the community know a lot more about
>> the StoreInKiteDataset processor than I do.
>>
>> Regards,
>> Matt
>>
>> [1] https://issues.apache.org/jira/browse/NIFI-1663
>>
>> On Mon, Mar 21, 2016 at 8:25 PM, Edmon Begoli <[email protected]> wrote:
>>
>>> Is there a way to do straight CSV(PSV) to Parquet or ORC conversion via
>>> Nifi, or do I always need to push the data through some of the
>>> "data engines" - Drill, Spark, Hive, etc.?
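P.S. The scaling concern in my reply can be made concrete with a quick back-of-the-envelope sketch (the function names and the sample format counts below are purely illustrative, not actual NiFi APIs):

```python
# Rough count of processors each design would require for x data formats.

def pairwise_processors(x: int) -> int:
    # One ConvertAToB processor per ordered pair of formats:
    # P(x, 2) = x * (x - 1)
    return x * (x - 1)

def canonical_processors(x: int) -> int:
    # One ConvertXToCanonical plus one ConvertCanonicalToX per format: 2x
    return 2 * x

for x in (4, 8, 12):
    print(x, pairwise_processors(x), canonical_processors(x))
# → 4 12 8
# → 8 56 16
# → 12 132 24
```

The pairwise approach grows quadratically while the canonical-form approach stays linear, which is the trade-off against the cost of the extra intermediate transform.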
