Carlos,

Welcome to NiFi!  I believe the Kite dataset processors (e.g.
StoreInKiteDataset) are currently the most direct, built-in solution for
writing Parquet files from NiFi.

I'm not an expert on Parquet, but I understand that columnar formats like
Parquet and ORC are not easily written in the incremental, streaming
fashion that NiFi excels at (I hope writing this will prompt expert
correction).  The usual alternative is to have NiFi write directly to more
stream-friendly data stores or formats, then run periodic jobs to build
Parquet data sets.  Hive, Drill, and similar tools can do this.
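To illustrate why batch jobs fit better than streaming writes, here is a
minimal, pure-Python sketch of the buffering pattern involved (the class
and function names are illustrative, not a real Parquet API): columnar
writers accumulate incoming rows and flush a whole batch, a "row group",
at once, rather than appending one record at a time.

```python
# Sketch: columnar formats write data in large batches ("row groups"),
# so records must be buffered rather than appended one at a time.
# RowGroupBuffer and flush_fn are illustrative names, not a real API.

class RowGroupBuffer:
    def __init__(self, batch_size, flush_fn):
        self.batch_size = batch_size
        self.flush_fn = flush_fn   # called with one full, column-oriented batch
        self.rows = []

    def append(self, row):
        self.rows.append(row)
        if len(self.rows) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.rows:
            # In a real job this step would pivot the buffered rows into
            # columns and write one Parquet row group to storage.
            columns = {k: [r[k] for r in self.rows] for k in self.rows[0]}
            self.flush_fn(columns)
            self.rows = []

flushed = []
buf = RowGroupBuffer(batch_size=3, flush_fn=flushed.append)
for i in range(7):
    buf.append({"id": i, "value": i * 10})
buf.flush()  # flush the final partial batch

# flushed now holds 3 column-oriented batches of sizes 3, 3, and 1
```

A record-at-a-time NiFi flow would leave this buffer mostly unflushed,
which is why staging the data first and converting on a schedule tends to
work better.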

You are certainly not alone in wanting better Parquet support; there is at
least one JIRA ticket for it as well:

Add processors for Google Cloud Storage Fetch/Put/Delete
https://issues.apache.org/jira/browse/NIFI-2725

You might want to chime in with some details of your use case, or create a
new ticket if that's not a fit for you.

Thanks,

James

On Mon, Feb 13, 2017 at 3:13 PM, Carlos Paradis <[email protected]> wrote:

> Hi,
>
> Our group has recently started trying to prototype a setup of
> Hadoop+Spark+NiFi+Parquet and I have been having trouble finding any
> documentation other than a scarce discussion on using Kite as a workaround
> to integrate NiFi and Parquet.
>
> Are there any future plans for this integration from NiFi or anyone would
> be able to give me some insight in which scenario this workaround would
> (not) be worthwhile and alternatives?
>
> The most recent discussion
> <http://apache-nifi-developer-list.39713.n7.nabble.com/parquet-format-td10145.html>
> I found in this list dates from May 11, 2016. I also saw some interest in
> doing this on Stackoverflow here
> <http://stackoverflow.com/questions/37149331/apache-nifi-hdfs-parquet-format>,
> and here
> <http://stackoverflow.com/questions/37165764/convert-incoming-message-to-parquet-format>
> .
>
> Thanks,
>
> --
> Carlos Paradis
> http://carlosparadis.com <http://carlosandrade.co>
>
