Carlos,

Welcome to NiFi! I believe the Kite dataset processor is currently the most direct, built-in solution for writing Parquet files from NiFi.
I'm not an expert on Parquet, but I understand that columnar formats like Parquet and ORC are not easily written in the incremental, streaming fashion that NiFi excels at (I hope writing this will prompt correction from an expert). The usual alternative is to have NiFi write to a more stream-friendly data store or format directly, then run periodic jobs to build Parquet data sets; Hive, Drill, and similar tools can do this.

You are certainly not alone in wanting better Parquet support; there is at least one JIRA ticket for it as well:

Add processors for Google Cloud Storage Fetch/Put/Delete
https://issues.apache.org/jira/browse/NIFI-2725

You might want to chime in with some details of your use case, or create a new ticket if that one is not a fit for you.

Thanks,
James

On Mon, Feb 13, 2017 at 3:13 PM, Carlos Paradis <[email protected]> wrote:

> Hi,
>
> Our group has recently started trying to prototype a setup of
> Hadoop+Spark+NiFi+Parquet, and I have been having trouble finding any
> documentation other than a scarce discussion of using Kite as a workaround
> to integrate NiFi and Parquet.
>
> Are there any future plans for this integration from NiFi, or would anyone
> be able to give me some insight into the scenarios where this workaround
> would (not) be worthwhile, and into alternatives?
>
> The most recent discussion
> <http://apache-nifi-developer-list.39713.n7.nabble.com/parquet-format-td10145.html>
> I found in this list dates from May 11, 2016. I also saw some interest in
> doing this on Stack Overflow here
> <http://stackoverflow.com/questions/37149331/apache-nifi-hdfs-parquet-format>
> and here
> <http://stackoverflow.com/questions/37165764/convert-incoming-message-to-parquet-format>.
>
> Thanks,
>
> --
> Carlos Paradis
> http://carlosparadis.com <http://carlosandrade.co>
