I don't think you have to install Hadoop on Windows in order to get this to work, just winutils.exe, placed wherever the Hadoop client looks for it (the location should be configurable via an environment variable).
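If it helps, the Hadoop client resolves winutils.exe from the HADOOP_HOME environment variable (or the hadoop.home.dir Java system property) and expects it under a bin subdirectory, which is why the earlier error mentioned "null\bin\winutils.exe" (the "null" means HADOOP_HOME was never set). A minimal sketch, assuming you've downloaded winutils.exe to C:\hadoop\bin (the path itself is just an example):

rem Place winutils.exe at %HADOOP_HOME%\bin\winutils.exe, then point
rem HADOOP_HOME at the parent directory before launching NiFi:
set HADOOP_HOME=C:\hadoop

rem Alternatively, pass it as a JVM system property in conf/bootstrap.conf
rem (any unused java.arg number should work; 20 here is arbitrary):
rem java.arg.20=-Dhadoop.home.dir=C:\hadoop

Restart NiFi afterward so the processors pick up the new location.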
There are pre-built binaries [1] for various versions of Hadoop. Even though you'll be writing to a local file system, you'll want to match the version of winutils.exe to the version of Hadoop (usually 2.7.3 for slightly older NiFi versions, or 3.0.0 for the latest version(s), I think) for best results.

Regards,
Matt

[1] https://github.com/steveloughran/winutils

On Wed, Aug 15, 2018 at 3:23 PM scott <[email protected]> wrote:
>
> Just tested in my CentOS VM, worked like a charm without Hadoop. I'll open a
> Jira bug on PutParquet; it doesn't seem to run on Windows.
> Still not sure what I can do. Converting our production Windows NiFi install
> to Docker would be a major effort.
> Has anyone heard of a Parquet writer tool I can download and call from NiFi?
>
> On Wed, Aug 15, 2018 at 12:01 PM, Mike Thomsen <[email protected]> wrote:
>>
>> > Mike, that's a good tip. I'll test that, but unfortunately, I've already
>> > committed to Windows.
>>
>> You can run both Docker and the standard NiFi Docker image on Windows.
>>
>> On Wed, Aug 15, 2018 at 2:52 PM scott <[email protected]> wrote:
>>>
>>> Mike, that's a good tip. I'll test that, but unfortunately, I've already
>>> committed to Windows.
>>> What about a script? Is there some tool you know of that can just be
>>> called by NiFi to convert an input CSV file to a Parquet file?
>>>
>>> On Wed, Aug 15, 2018 at 8:32 AM, Mike Thomsen <[email protected]> wrote:
>>>>
>>>> Scott,
>>>>
>>>> You can also try Docker on Windows. Something like this should work:
>>>>
>>>> docker run -d --name nifi-test -v C:/nifi_temp:/opt/data_output -p 8080:8080 apache/nifi:latest
>>>>
>>>> I don't have Windows either, but Docker seems to work fine for my
>>>> colleagues who have to use it on Windows. That should bridge C:\nifi_temp
>>>> and /opt/data_output between host and container, and map localhost:8080
>>>> on the host to port 8080 in the container, so you don't have to mess with
>>>> a Hadoop client just to try out some Parquet stuff.
>>>>
>>>> Mike
>>>>
>>>> On Wed, Aug 15, 2018 at 11:20 AM scott <[email protected]> wrote:
>>>>>
>>>>> Thanks Bryan. I'll give the Hadoop client a try.
>>>>>
>>>>> On Wed, Aug 15, 2018 at 7:51 AM, Bryan Bende <[email protected]> wrote:
>>>>>>
>>>>>> I think there is a good chance that installing the Hadoop client would
>>>>>> solve the issue, but I can't say for sure since I don't have a Windows
>>>>>> machine to test.
>>>>>>
>>>>>> The processor depends on the Apache Parquet Java client library, which
>>>>>> depends on the Apache Hadoop client [1], and the Hadoop client has a
>>>>>> limitation on Windows where it requires something additional.
>>>>>>
>>>>>> [1] https://github.com/apache/parquet-mr/blob/master/parquet-avro/pom.xml#L62-L65
>>>>>>
>>>>>> On Wed, Aug 15, 2018 at 10:16 AM, scott <[email protected]> wrote:
>>>>>> > If I install a Hadoop client on my NiFi host, would I be able to get
>>>>>> > past this error?
>>>>>> > I don't understand why this processor depends on Hadoop. Other
>>>>>> > projects like Drill and Spark don't have such a dependency to be able
>>>>>> > to write Parquet files.
>>>>>> >
>>>>>> > On Tue, Aug 14, 2018 at 2:58 PM, Juan Pablo Gardella
>>>>>> > <[email protected]> wrote:
>>>>>> >>
>>>>>> >> It's a warning. You can ignore that.
>>>>>> >>
>>>>>> >> On Tue, 14 Aug 2018 at 18:53 Bryan Bende <[email protected]> wrote:
>>>>>> >>>
>>>>>> >>> Scott,
>>>>>> >>>
>>>>>> >>> Sorry, I did not realize the Hadoop client would be looking for this
>>>>>> >>> winutils.exe when running on Windows.
>>>>>> >>>
>>>>>> >>> On Linux and macOS you don't need anything external installed outside
>>>>>> >>> of NiFi, so I wasn't expecting this.
>>>>>> >>>
>>>>>> >>> Not sure if there is any other good option here regarding Parquet.
>>>>>> >>>
>>>>>> >>> Thanks,
>>>>>> >>>
>>>>>> >>> Bryan
>>>>>> >>>
>>>>>> >>> On Tue, Aug 14, 2018 at 5:31 PM, scott <[email protected]> wrote:
>>>>>> >>> > Hi Bryan,
>>>>>> >>> > I'm fine if I have to trick the API, but don't I still need Hadoop
>>>>>> >>> > installed somewhere? After creating the core-site.xml as you
>>>>>> >>> > described, I get the following errors:
>>>>>> >>> >
>>>>>> >>> > Failed to locate the winutils binary in the hadoop binary path
>>>>>> >>> > IOException: Could not locate executable null\bin\winutils.exe in
>>>>>> >>> > the Hadoop binaries
>>>>>> >>> > Unable to load native-hadoop library for your platform... using
>>>>>> >>> > builtin-java classes where applicable
>>>>>> >>> > Failed to write due to java.io.IOException: No FileSystem for scheme
>>>>>> >>> >
>>>>>> >>> > BTW, I'm using NiFi version 1.5
>>>>>> >>> >
>>>>>> >>> > Thanks,
>>>>>> >>> > Scott
>>>>>> >>> >
>>>>>> >>> > On Tue, Aug 14, 2018 at 12:44 PM, Bryan Bende <[email protected]> wrote:
>>>>>> >>> >>
>>>>>> >>> >> Scott,
>>>>>> >>> >>
>>>>>> >>> >> Unfortunately, the Parquet API itself is tied to the Hadoop
>>>>>> >>> >> FileSystem object, which is why NiFi can't read and write Parquet
>>>>>> >>> >> directly to flow files (i.e. they don't provide a way to
>>>>>> >>> >> read/write to/from Java input and output streams).
>>>>>> >>> >>
>>>>>> >>> >> The best you can do is trick the Hadoop API into using the local
>>>>>> >>> >> file system by creating a core-site.xml with the following:
>>>>>> >>> >>
>>>>>> >>> >> <configuration>
>>>>>> >>> >>     <property>
>>>>>> >>> >>         <name>fs.defaultFS</name>
>>>>>> >>> >>         <value>file:///</value>
>>>>>> >>> >>     </property>
>>>>>> >>> >> </configuration>
>>>>>> >>> >>
>>>>>> >>> >> That will make PutParquet or FetchParquet work with your local
>>>>>> >>> >> file system.
>>>>>> >>> >>
>>>>>> >>> >> Thanks,
>>>>>> >>> >>
>>>>>> >>> >> Bryan
>>>>>> >>> >>
>>>>>> >>> >> On Tue, Aug 14, 2018 at 3:22 PM, scott <[email protected]> wrote:
>>>>>> >>> >> > Hello NiFi community,
>>>>>> >>> >> > Is there a simple way to read CSV files and write them out as
>>>>>> >>> >> > Parquet files without Hadoop? I run NiFi on Windows and don't
>>>>>> >>> >> > have access to a Hadoop environment. I'm trying to write the
>>>>>> >>> >> > output of my ETL in a compressed and still queryable format.
>>>>>> >>> >> > Is there something I should be using instead of Parquet?
>>>>>> >>> >> >
>>>>>> >>> >> > Thanks for your time,
>>>>>> >>> >> > Scott
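
As an aside, to illustrate Bryan's point above about the Parquet API being
tied to Hadoop's FileSystem: even a purely local write with parquet-avro goes
through org.apache.hadoop.fs.Path, which is why the Hadoop client (and
winutils.exe on Windows) gets pulled in. A minimal sketch, untested on Windows
on my end, with a made-up schema and output path purely for illustration:

import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;

public class LocalParquetWrite {
    public static void main(String[] args) throws Exception {
        // Hypothetical two-column schema standing in for a parsed CSV row
        Schema schema = SchemaBuilder.record("CsvRow").fields()
                .requiredString("name")
                .requiredInt("value")
                .endRecord();

        // Note the Hadoop Path: even this local file:/// write runs through
        // the Hadoop FileSystem layer, the same layer PutParquet sits on.
        try (ParquetWriter<GenericRecord> writer = AvroParquetWriter
                .<GenericRecord>builder(new Path("file:///C:/nifi_temp/out.parquet"))
                .withSchema(schema)
                .build()) {
            GenericRecord row = new GenericData.Record(schema);
            row.put("name", "example");
            row.put("value", 42);
            writer.write(row);
        }
    }
}

There's no builder variant that takes a plain java.io.OutputStream, which is
the limitation Bryan described.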
