Good point. I just think that Parquet and ORC are important targets, just as relational/JDBC stores are.
On Tuesday, March 22, 2016, Tony Kurc <[email protected]> wrote:

> Interesting question. A couple discussion points: If we start doing a
> processor for each of these conversions, it may become unwieldy (P(x,2)
> processors, where x is number of data formats?) I'd say maybe a more
> general ConvertFormat processor may be appropriate, but then configuration
> and code complexity may suffer. If there is a canonical internal data form
> and a bunch (2*x) of convertXtocanonical and convertcanonicaltoX
> processors, the flow could get complex and the extra transform could be
> expensive.
>
> On Mar 21, 2016 9:39 PM, "Dmitry Goldenberg" <[email protected]> wrote:
>
>> Since NiFi has ConvertJsonToAvro and ConvertCsvToAvro processors, would
>> it make sense to add a feature request for a ConvertJsonToParquet processor
>> and a ConvertCsvToParquet processor?
>>
>> - Dmitry
>>
>> On Mon, Mar 21, 2016 at 9:23 PM, Matt Burgess <[email protected]> wrote:
>>
>>> Edmon,
>>>
>>> NIFI-1663 [1] was created to add ORC support to NiFi. If you have a
>>> target dataset that has been created with Parquet format, I think you can
>>> use ConvertCSVtoAvro then StoreInKiteDataset to get flow files in Parquet
>>> format into Hive, HDFS, etc. Others in the community know a lot more about
>>> the StoreInKiteDataset processor than I do.
>>>
>>> Regards,
>>> Matt
>>>
>>> [1] https://issues.apache.org/jira/browse/NIFI-1663
>>>
>>> On Mon, Mar 21, 2016 at 8:25 PM, Edmon Begoli <[email protected]> wrote:
>>>
>>>> Is there a way to do straight CSV(PSV) to Parquet or ORC conversion
>>>> via NiFi, or do I always need to push the data through some of the
>>>> "data engines" - Drill, Spark, Hive, etc.?
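For what it's worth, the processor-count tradeoff Tony raises can be made concrete with a quick back-of-the-envelope sketch (plain Python for illustration only, not NiFi code — the function names here are made up for this example):

```python
# Ordered-pair converters: one processor per (source, target) pair,
# i.e. P(x, 2) = x * (x - 1) for x data formats.
def pairwise_converters(x):
    return x * (x - 1)

# Canonical-form approach: one converter to and one from the canonical
# internal format per external format, i.e. 2 * x.
def canonical_converters(x):
    return 2 * x

# The pairwise count grows quadratically while the canonical count
# stays linear, which is the "unwieldy" point in the thread.
for x in (4, 6, 10):
    print(x, pairwise_converters(x), canonical_converters(x))
# → 4 12 8
# → 6 30 12
# → 10 90 20
```

So with even ten formats in play, a one-processor-per-conversion approach needs 90 processors versus 20 for a canonical-format design, at the cost of the extra transform hop Tony mentions.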
