Interesting question. A couple of discussion points: if we start doing a processor for each of these conversions, it may become unwieldy (P(x,2) = x(x-1) processors, where x is the number of data formats). A more general ConvertFormat processor may be appropriate, but then configuration and code complexity may suffer. If there is a canonical internal data form and a set (2x) of ConvertXToCanonical and ConvertCanonicalToX processors, the flow could get complex and the extra transform could be expensive.

On Mar 21, 2016 9:39 PM, "Dmitry Goldenberg" <[email protected]> wrote:
> Since NiFi has ConvertJsonToAvro and ConvertCsvToAvro processors, would it
> make sense to add a feature request for a ConvertJsonToParquet processor
> and a ConvertCsvToParquet processor?
>
> - Dmitry
>
> On Mon, Mar 21, 2016 at 9:23 PM, Matt Burgess <[email protected]> wrote:
>
>> Edmon,
>>
>> NIFI-1663 [1] was created to add ORC support to NiFi. If you have a
>> target dataset that has been created with Parquet format, I think you can
>> use ConvertCSVtoAvro then StoreInKiteDataset to get flow files in Parquet
>> format into Hive, HDFS, etc. Others in the community know a lot more about
>> the StoreInKiteDataset processor than I do.
>>
>> Regards,
>> Matt
>>
>> [1] https://issues.apache.org/jira/browse/NIFI-1663
>>
>> On Mon, Mar 21, 2016 at 8:25 PM, Edmon Begoli <[email protected]> wrote:
>>
>>> Is there a way to do straight CSV(PSV) to Parquet or ORC conversion via
>>> Nifi, or do I always need to push the data through some of the
>>> "data engines" - Drill, Spark, Hive, etc.?
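P.S. The scaling concern in my reply can be made concrete with a quick back-of-the-envelope sketch (the function names and the sample format counts below are purely illustrative, not actual NiFi APIs):

```python
# Rough count of processors each design would require for x data formats.

def pairwise_processors(x: int) -> int:
    # One ConvertAToB processor per ordered pair of formats:
    # P(x, 2) = x * (x - 1)
    return x * (x - 1)

def canonical_processors(x: int) -> int:
    # One ConvertXToCanonical plus one ConvertCanonicalToX per format: 2x
    return 2 * x

for x in (4, 8, 12):
    print(x, pairwise_processors(x), canonical_processors(x))
# → 4 12 8
# → 8 56 16
# → 12 132 24
```

The pairwise approach grows quadratically while the canonical-form approach stays linear, which is the trade-off against the cost of the extra intermediate transform.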
