Thanks Bryan. I'll give the Hadoop client a try.
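In case it helps anyone who finds this thread later, here is roughly what I
plan to try. The "null\bin\winutils.exe" error suggests the Hadoop client
resolves winutils.exe under %HADOOP_HOME%\bin (or the hadoop.home.dir JVM
property), so the idea is to place the binary there and point NiFi's JVM at
it in conf/bootstrap.conf. The C:\hadoop path and the argument number below
are placeholders for my setup, not something NiFi ships with:

    # winutils.exe placed at C:\hadoop\bin\winutils.exe (path is my choice)
    # add to conf/bootstrap.conf, using an unused java.arg number:
    java.arg.20=-Dhadoop.home.dir=C:\hadoop

Bryan's core-site.xml should then get picked up by pointing the processor's
Hadoop Configuration Resources property at it, if I'm reading the processor
docs correctly.
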
On Wed, Aug 15, 2018 at 7:51 AM, Bryan Bende <[email protected]> wrote:
> I think there is a good chance that installing the Hadoop client would
> solve the issue, but I can't say for sure since I don't have a Windows
> machine to test.
>
> The processor depends on the Apache Parquet Java client library, which
> depends on the Apache Hadoop client [1], and the Hadoop client has a
> limitation on Windows where it requires something additional.
>
> [1] https://github.com/apache/parquet-mr/blob/master/parquet-avro/pom.xml#L62-L65
>
>
> On Wed, Aug 15, 2018 at 10:16 AM, scott <[email protected]> wrote:
> > If I install a Hadoop client on my NiFi host, would I be able to get past
> > this error?
> > I don't understand why this processor depends on Hadoop. Other projects
> > like Drill and Spark don't have such a dependency to be able to write
> > Parquet files.
> >
> > On Tue, Aug 14, 2018 at 2:58 PM, Juan Pablo Gardella
> > <[email protected]> wrote:
> >>
> >> It's a warning. You can ignore that.
> >>
> >> On Tue, 14 Aug 2018 at 18:53 Bryan Bende <[email protected]> wrote:
> >>>
> >>> Scott,
> >>>
> >>> Sorry, I did not realize the Hadoop client would be looking for this
> >>> winutils.exe when running on Windows.
> >>>
> >>> On Linux and macOS you don't need anything external installed outside
> >>> of NiFi, so I wasn't expecting this.
> >>>
> >>> Not sure if there is any other good option here regarding Parquet.
> >>>
> >>> Thanks,
> >>>
> >>> Bryan
> >>>
> >>>
> >>> On Tue, Aug 14, 2018 at 5:31 PM, scott <[email protected]> wrote:
> >>> > Hi Bryan,
> >>> > I'm fine if I have to trick the API, but don't I still need Hadoop
> >>> > installed somewhere? After creating the core-site.xml as you
> >>> > described, I get the following errors:
> >>> >
> >>> > Failed to locate the winutils binary in the hadoop binary path
> >>> > IOException: Could not locate executable null\bin\winutils.exe in
> >>> > the Hadoop binaries
> >>> > Unable to load native-hadoop library for your platform... using
> >>> > builtin-java classes where applicable
> >>> > Failed to write due to java.io.IOException: No FileSystem for scheme
> >>> >
> >>> > BTW, I'm using NiFi version 1.5
> >>> >
> >>> > Thanks,
> >>> > Scott
> >>> >
> >>> >
> >>> > On Tue, Aug 14, 2018 at 12:44 PM, Bryan Bende <[email protected]> wrote:
> >>> >>
> >>> >> Scott,
> >>> >>
> >>> >> Unfortunately, the Parquet API itself is tied to the Hadoop
> >>> >> FileSystem object, which is why NiFi can't read and write Parquet
> >>> >> directly to flow files (i.e. it doesn't provide a way to read/write
> >>> >> to/from Java input and output streams).
> >>> >>
> >>> >> The best you can do is trick the Hadoop API into using the local
> >>> >> file system by creating a core-site.xml with the following:
> >>> >>
> >>> >> <configuration>
> >>> >>     <property>
> >>> >>         <name>fs.defaultFS</name>
> >>> >>         <value>file:///</value>
> >>> >>     </property>
> >>> >> </configuration>
> >>> >>
> >>> >> That will make PutParquet or FetchParquet work with your local
> >>> >> file system.
> >>> >>
> >>> >> Thanks,
> >>> >>
> >>> >> Bryan
> >>> >>
> >>> >>
> >>> >> On Tue, Aug 14, 2018 at 3:22 PM, scott <[email protected]> wrote:
> >>> >> > Hello NiFi community,
> >>> >> > Is there a simple way to read CSV files and write them out as
> >>> >> > Parquet files without Hadoop? I run NiFi on Windows and don't have
> >>> >> > access to a Hadoop environment. I'm trying to write the output of
> >>> >> > my ETL in a compressed and still queryable format.
> >>> >> > Is there something I should be using instead of Parquet?
> >>> >> >
> >>> >> > Thanks for your time,
> >>> >> > Scott
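
P.S. For the archives, my rough understanding of why the Hadoop dependency
is unavoidable: the parquet-avro writer the processor builds on (see the pom
Bryan linked) takes a Hadoop Path rather than a plain Java OutputStream,
which is exactly the coupling Bryan describes. A minimal sketch of writing
one record to local disk; the schema, field values, and output path are
made up for illustration:

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.avro.AvroParquetWriter;
    import org.apache.parquet.hadoop.ParquetWriter;

    public class LocalParquetSketch {
        public static void main(String[] args) throws Exception {
            // Made-up two-field schema, just for illustration.
            Schema schema = new Schema.Parser().parse(
                    "{\"type\":\"record\",\"name\":\"Row\",\"fields\":["
                    + "{\"name\":\"id\",\"type\":\"long\"},"
                    + "{\"name\":\"name\",\"type\":\"string\"}]}");

            // The builder takes org.apache.hadoop.fs.Path, not an
            // OutputStream. With an explicit file:/// URI (or with
            // fs.defaultFS=file:///) this writes to the local disk.
            try (ParquetWriter<GenericRecord> writer = AvroParquetWriter
                    .<GenericRecord>builder(new Path("file:///C:/tmp/rows.parquet"))
                    .withSchema(schema)
                    .build()) {
                GenericRecord rec = new GenericData.Record(schema);
                rec.put("id", 1L);
                rec.put("name", "example");
                writer.write(rec);
            }
        }
    }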

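P.P.S. If that sketch runs from a plain java command with just the Parquet
and Hadoop client jars on the classpath (no cluster, no running services),
it would confirm Bryan's point that the dependency is only on the client
library, plus winutils.exe when on Windows.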