I think there is a good chance that installing the Hadoop client would solve the issue, but I can't say for sure since I don't have a Windows machine to test on.
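If a full install feels like overkill, it may be enough to give the Hadoop
client just the piece it is looking for: a winutils.exe build matching the
Hadoop version bundled in the Parquet NAR, placed under a directory whose bin
folder the JVM is pointed at. In NiFi that would mean adding a system property
to conf/bootstrap.conf, something like the following (the arg number and the
C:\hadoop path are placeholders, and again I haven't been able to verify this
on Windows myself):

    # conf/bootstrap.conf -- tell the Hadoop client where to find
    # bin\winutils.exe (use any unused java.arg number)
    java.arg.20=-Dhadoop.home.dir=C:\hadoop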
The processor depends on the Apache Parquet Java client library, which in turn
depends on the Apache Hadoop client [1], and the Hadoop client has a limitation
on Windows where it requires something additional (the winutils.exe helper
binary).

[1] https://github.com/apache/parquet-mr/blob/master/parquet-avro/pom.xml#L62-L65

On Wed, Aug 15, 2018 at 10:16 AM, scott <tcots8...@gmail.com> wrote:
> If I install a Hadoop client on my NiFi host, would I be able to get past
> this error?
> I don't understand why this processor depends on Hadoop. Other projects
> like Drill and Spark don't have such a dependency and can still write
> Parquet files.
>
> On Tue, Aug 14, 2018 at 2:58 PM, Juan Pablo Gardella
> <gardellajuanpa...@gmail.com> wrote:
>>
>> It's a warning. You can ignore it.
>>
>> On Tue, 14 Aug 2018 at 18:53 Bryan Bende <bbe...@gmail.com> wrote:
>>>
>>> Scott,
>>>
>>> Sorry, I did not realize the Hadoop client would be looking for this
>>> winutils.exe when running on Windows.
>>>
>>> On Linux and macOS you don't need anything external installed outside
>>> of NiFi, so I wasn't expecting this.
>>>
>>> I'm not sure there is any other good option here regarding Parquet.
>>>
>>> Thanks,
>>>
>>> Bryan
>>>
>>>
>>> On Tue, Aug 14, 2018 at 5:31 PM, scott <tcots8...@gmail.com> wrote:
>>> > Hi Bryan,
>>> > I'm fine if I have to trick the API, but don't I still need Hadoop
>>> > installed somewhere? After creating the core-site.xml as you
>>> > described, I get the following errors:
>>> >
>>> > Failed to locate the winutils binary in the hadoop binary path
>>> > IOException: Could not locate executable null\bin\winutils.exe in
>>> > the Hadoop binaries
>>> > Unable to load native-hadoop library for your platform... using
>>> > builtin-java classes where applicable
>>> > Failed to write due to java.io.IOException: No FileSystem for scheme
>>> >
>>> > BTW, I'm using NiFi version 1.5.
>>> >
>>> > Thanks,
>>> > Scott
>>> >
>>> >
>>> > On Tue, Aug 14, 2018 at 12:44 PM, Bryan Bende <bbe...@gmail.com> wrote:
>>> >>
>>> >> Scott,
>>> >>
>>> >> Unfortunately the Parquet API itself is tied to the Hadoop FileSystem
>>> >> object, which is why NiFi can't read and write Parquet directly
>>> >> to/from flow files (i.e. the API doesn't provide a way to read from
>>> >> or write to Java input and output streams).
>>> >>
>>> >> The best you can do is trick the Hadoop API into using the local
>>> >> file-system by creating a core-site.xml with the following:
>>> >>
>>> >> <configuration>
>>> >>   <property>
>>> >>     <name>fs.defaultFS</name>
>>> >>     <value>file:///</value>
>>> >>   </property>
>>> >> </configuration>
>>> >>
>>> >> That will make PutParquet or FetchParquet work with your local
>>> >> file-system.
>>> >>
>>> >> Thanks,
>>> >>
>>> >> Bryan
>>> >>
>>> >>
>>> >> On Tue, Aug 14, 2018 at 3:22 PM, scott <tcots8...@gmail.com> wrote:
>>> >> > Hello NiFi community,
>>> >> > Is there a simple way to read CSV files and write them out as
>>> >> > Parquet files without Hadoop? I run NiFi on Windows and don't have
>>> >> > access to a Hadoop environment. I'm trying to write the output of
>>> >> > my ETL in a compressed and still queryable format. Is there
>>> >> > something I should be using instead of Parquet?
>>> >> >
>>> >> > Thanks for your time,
>>> >> > Scott
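P.S. For anyone who finds this thread later and wonders what "tied to the
Hadoop FileSystem object" means in practice, here is a rough sketch of writing
a Parquet file from plain Java with parquet-avro (the schema, output path, and
class name are made up for illustration). The writer is built from an
org.apache.hadoop.fs.Path rather than a java.io stream, which is exactly why
the Hadoop client, and on Windows its winutils.exe lookup, comes along for the
ride:

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.avro.AvroParquetWriter;
    import org.apache.parquet.hadoop.ParquetWriter;

    public class LocalParquetSketch {
        public static void main(String[] args) throws Exception {
            // A made-up two-column schema standing in for a parsed CSV row.
            Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Row\",\"fields\":["
                + "{\"name\":\"id\",\"type\":\"int\"},"
                + "{\"name\":\"name\",\"type\":\"string\"}]}");

            // Same trick as the core-site.xml above: force the local file-system.
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "file:///");

            // The builder takes a Hadoop Path, not a File or OutputStream.
            try (ParquetWriter<GenericRecord> writer =
                     AvroParquetWriter.<GenericRecord>builder(
                             new Path("file:///tmp/rows.parquet"))
                         .withSchema(schema)
                         .withConf(conf)
                         .build()) {
                GenericRecord row = new GenericData.Record(schema);
                row.put("id", 1);
                row.put("name", "scott");
                writer.write(row);
            }
        }
    }

Even this purely local write goes through Hadoop's local FileSystem
implementation, which is where the winutils.exe lookup happens on Windows.
Bryan's core-site.xml is doing the same fs.defaultFS=file:/// override from
inside NiFi.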