Scott,

You can also try Docker on Windows. Something like this should work:

docker run -d --name nifi-test -v C:/nifi_temp:/opt/data_output -p 8080:8080 apache/nifi:latest

I don't have Windows either, but Docker seems to work fine for my colleagues who have to use it on Windows. That should bind-mount C:\nifi_temp to /opt/data_output between host and container and publish the container's port 8080 on localhost:8080, so you don't have to mess with a Hadoop client just to try out some Parquet stuff.
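If you want a quick sanity check that the mapping works, create C:\nifi_temp before you start the container, then try something like this (a sketch I haven't tested myself, and it assumes Docker Desktop has been allowed to share the C: drive; nifi-test is just the container name from the command above):

echo hello > C:\nifi_temp\test.txt
docker exec nifi-test ls /opt/data_output
docker exec nifi-test cat /opt/data_output/test.txt

If test.txt shows up inside the container, anything NiFi writes under /opt/data_output should land in C:\nifi_temp on the host.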
Mike
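P.S. If you'd rather run NiFi natively on Windows, the usual workaround for that winutils error (again untested on my end) is to grab a winutils.exe built for your Hadoop client version, for example from the steveloughran/winutils repo on GitHub, drop it into a folder like C:\hadoop\bin, and set HADOOP_HOME to the parent folder before starting NiFi:

setx HADOOP_HOME C:\hadoop

The C:\hadoop path is just an example; any folder works as long as winutils.exe ends up in its bin subfolder. Also, the "No FileSystem for scheme" error further down the thread often just means the processor never loaded Bryan's core-site.xml, so double-check that the Hadoop Configuration Resources property on PutParquet points at the full path of that file.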
On Wed, Aug 15, 2018 at 11:20 AM scott <[email protected]> wrote:

> Thanks Bryan. I'll give the Hadoop client a try.
>
> On Wed, Aug 15, 2018 at 7:51 AM, Bryan Bende <[email protected]> wrote:
>
>> I think there is a good chance that installing the Hadoop client would solve the issue, but I can't say for sure since I don't have a Windows machine to test.
>>
>> The processor depends on the Apache Parquet Java client library, which depends on the Apache Hadoop client [1], and the Hadoop client has a limitation on Windows where it requires something additional.
>>
>> [1] https://github.com/apache/parquet-mr/blob/master/parquet-avro/pom.xml#L62-L65
>>
>> On Wed, Aug 15, 2018 at 10:16 AM, scott <[email protected]> wrote:
>>> If I install a Hadoop client on my NiFi host, would I be able to get past this error?
>>> I don't understand why this processor depends on Hadoop. Other projects like Drill and Spark don't have such a dependency to be able to write Parquet files.
>>>
>>> On Tue, Aug 14, 2018 at 2:58 PM, Juan Pablo Gardella <[email protected]> wrote:
>>>>
>>>> It's a warning. You can ignore that.
>>>>
>>>> On Tue, 14 Aug 2018 at 18:53 Bryan Bende <[email protected]> wrote:
>>>>>
>>>>> Scott,
>>>>>
>>>>> Sorry, I did not realize the Hadoop client would be looking for this winutils.exe when running on Windows.
>>>>>
>>>>> On Linux and macOS you don't need anything external installed outside of NiFi, so I wasn't expecting this.
>>>>>
>>>>> Not sure if there is any other good option here regarding Parquet.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Bryan
>>>>>
>>>>> On Tue, Aug 14, 2018 at 5:31 PM, scott <[email protected]> wrote:
>>>>>> Hi Bryan,
>>>>>> I'm fine if I have to trick the API, but don't I still need Hadoop installed somewhere? After creating the core-site.xml as you described, I get the following errors:
>>>>>>
>>>>>> Failed to locate the winutils binary in the hadoop binary path
>>>>>> IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries
>>>>>> Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>>>>> Failed to write due to java.io.IOException: No FileSystem for scheme
>>>>>>
>>>>>> BTW, I'm using NiFi version 1.5
>>>>>>
>>>>>> Thanks,
>>>>>> Scott
>>>>>>
>>>>>> On Tue, Aug 14, 2018 at 12:44 PM, Bryan Bende <[email protected]> wrote:
>>>>>>>
>>>>>>> Scott,
>>>>>>>
>>>>>>> Unfortunately the Parquet API itself is tied to the Hadoop Filesystem object, which is why NiFi can't read and write Parquet directly to flow files (i.e. they don't provide a way to read/write to/from Java input and output streams).
>>>>>>>
>>>>>>> The best you can do is trick the Hadoop API into using the local file system by creating a core-site.xml with the following:
>>>>>>>
>>>>>>> <configuration>
>>>>>>>   <property>
>>>>>>>     <name>fs.defaultFS</name>
>>>>>>>     <value>file:///</value>
>>>>>>>   </property>
>>>>>>> </configuration>
>>>>>>>
>>>>>>> That will make PutParquet or FetchParquet work with your local file system.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Bryan
>>>>>>>
>>>>>>> On Tue, Aug 14, 2018 at 3:22 PM, scott <[email protected]> wrote:
>>>>>>>> Hello NiFi community,
>>>>>>>> Is there a simple way to read CSV files and write them out as Parquet files without Hadoop? I run NiFi on Windows and don't have access to a Hadoop environment. I'm trying to write the output of my ETL in a compressed and still queryable format. Is there something I should be using instead of Parquet?
>>>>>>>>
>>>>>>>> Thanks for your time,
>>>>>>>> Scott
