Good point.

I just think that Parquet and ORC are important targets, just as
relational/JDBC stores are.

On Tuesday, March 22, 2016, Tony Kurc <[email protected]> wrote:

> Interesting question. A couple of discussion points: if we start writing a
> processor for each of these conversions, it may become unwieldy (P(x,2) =
> x*(x-1) processors, where x is the number of data formats). A more general
> ConvertFormat processor may be appropriate, but then configuration and
> code complexity may suffer. If instead there is a canonical internal data
> form and a bunch (2*x) of ConvertXToCanonical and ConvertCanonicalToX
> processors, the flow could get complex and the extra transform could be
> expensive.
> On Mar 21, 2016 9:39 PM, "Dmitry Goldenberg" <[email protected]> wrote:
>
>> Since NiFi has ConvertJsonToAvro and ConvertCsvToAvro processors, would
>> it make sense to add a feature request for a ConvertJsonToParquet processor
>> and a ConvertCsvToParquet processor?
>>
>> - Dmitry
>>
>> On Mon, Mar 21, 2016 at 9:23 PM, Matt Burgess <[email protected]> wrote:
>>
>>> Edmon,
>>>
>>> NIFI-1663 [1] was created to add ORC support to NiFi. If you have a
>>> target dataset that has been created with Parquet format, I think you can
>>> use ConvertCSVtoAvro then StoreInKiteDataset to get flow files in Parquet
>>> format into Hive, HDFS, etc. Others in the community know a lot more about
>>> the StoreInKiteDataset processor than I do.
>>>
>>> Regards,
>>> Matt
>>>
>>> [1] https://issues.apache.org/jira/browse/NIFI-1663
>>>
>>> On Mon, Mar 21, 2016 at 8:25 PM, Edmon Begoli <[email protected]> wrote:
>>>
>>>>
>>>> Is there a way to do straight CSV (or PSV) to Parquet or ORC conversion
>>>> via NiFi, or do I always need to push the data through one of the
>>>> "data engines" - Drill, Spark, Hive, etc.?
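Tony's counting argument in the thread above can be sketched numerically. This is an illustrative snippet only (not NiFi code), and the function names are invented for the example:

```python
# Compare the number of converter processors needed for x data formats
# under the two designs discussed in the thread.

def pairwise_converters(x: int) -> int:
    # One dedicated processor per ordered (source, target) pair:
    # P(x, 2) = x * (x - 1)
    return x * (x - 1)

def canonical_converters(x: int) -> int:
    # One ConvertXToCanonical plus one ConvertCanonicalToX per format: 2 * x
    return 2 * x

if __name__ == "__main__":
    for x in (3, 5, 10):
        # e.g. x = 5 formats -> 20 pairwise processors vs. 10 canonical ones
        print(x, pairwise_converters(x), canonical_converters(x))
```

The crossover is at x = 3: for any larger number of formats, the canonical-intermediate design needs strictly fewer processors, at the cost of the extra transform Tony mentions.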