Just tested in my CentOS VM, and it worked like a charm without Hadoop. I'll open a Jira bug on PutParquet, since it doesn't seem to run on Windows. I'm still not sure what I can do, though; converting our production Windows NiFi install to Docker would be a major effort. Has anyone heard of a Parquet writer tool I can download and call from NiFi?
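[One possible answer to that last question, sketched rather than tested: a tiny standalone converter built on the Python pyarrow library, which NiFi can call per flow file through ExecuteStreamCommand or ExecuteProcess. The script name and arguments below are made up for illustration, and it assumes pyarrow has been installed (pip install pyarrow) on the NiFi host.

    # csv_to_parquet.py -- minimal CSV-to-Parquet converter (hypothetical
    # helper script, not part of NiFi; assumes the pyarrow package is installed)
    import sys
    import pyarrow.csv as pv
    import pyarrow.parquet as pq

    # Read the whole CSV into an in-memory Arrow table, inferring column types.
    table = pv.read_csv(sys.argv[1])

    # Write the table back out as Parquet (Snappy-compressed by default).
    pq.write_table(table, sys.argv[2])

Invoked as, e.g., python csv_to_parquet.py input.csv output.parquet. Because it runs in a separate process, it sidesteps the Hadoop client entirely; the trade-off is that the whole file is held in memory during conversion.]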
On Wed, Aug 15, 2018 at 12:01 PM, Mike Thomsen <[email protected]> wrote:
>> Mike, that's a good tip. I'll test that, but unfortunately, I've already
>> committed to Windows.
>
> You can run both Docker and the standard NiFi docker image on Windows.
>
> On Wed, Aug 15, 2018 at 2:52 PM scott <[email protected]> wrote:
>> Mike, that's a good tip. I'll test that, but unfortunately, I've already
>> committed to Windows.
>> What about a script? Is there some tool you know of that can just be
>> called by NiFi to convert an input CSV file to a Parquet file?
>>
>> On Wed, Aug 15, 2018 at 8:32 AM, Mike Thomsen <[email protected]> wrote:
>>> Scott,
>>>
>>> You can also try Docker on Windows. Something like this should work:
>>>
>>> docker run -d --name nifi-test -v C:/nifi_temp:/opt/data_output -p 8080:8080 apache/nifi:latest
>>>
>>> I don't have Windows either, but Docker seems to work fine for my
>>> colleagues who have to use it on Windows. That should bridge C:\nifi_temp
>>> and /opt/data_output between host and container and map localhost:8080 to
>>> port 8080 in the container, so you don't have to mess with a Hadoop client
>>> just to try out some Parquet stuff.
>>>
>>> Mike
>>>
>>> On Wed, Aug 15, 2018 at 11:20 AM scott <[email protected]> wrote:
>>>> Thanks Bryan. I'll give the Hadoop client a try.
>>>>
>>>> On Wed, Aug 15, 2018 at 7:51 AM, Bryan Bende <[email protected]> wrote:
>>>>> I think there is a good chance that installing the Hadoop client
>>>>> would solve the issue, but I can't say for sure since I don't have
>>>>> a Windows machine to test.
>>>>>
>>>>> The processor depends on the Apache Parquet Java client library,
>>>>> which depends on the Apache Hadoop client [1], and the Hadoop client
>>>>> has a limitation on Windows where it requires something additional.
>>>>>
>>>>> [1] https://github.com/apache/parquet-mr/blob/master/parquet-avro/pom.xml#L62-L65
>>>>>
>>>>> On Wed, Aug 15, 2018 at 10:16 AM, scott <[email protected]> wrote:
>>>>>> If I install a Hadoop client on my NiFi host, would I be able to
>>>>>> get past this error?
>>>>>> I don't understand why this processor depends on Hadoop. Other
>>>>>> projects like Drill and Spark don't have such a dependency to be
>>>>>> able to write Parquet files.
>>>>>>
>>>>>> On Tue, Aug 14, 2018 at 2:58 PM, Juan Pablo Gardella <[email protected]> wrote:
>>>>>>> It's a warning. You can ignore that.
>>>>>>>
>>>>>>> On Tue, 14 Aug 2018 at 18:53 Bryan Bende <[email protected]> wrote:
>>>>>>>> Scott,
>>>>>>>>
>>>>>>>> Sorry, I did not realize the Hadoop client would be looking for
>>>>>>>> this winutils.exe when running on Windows.
>>>>>>>>
>>>>>>>> On Linux and macOS you don't need anything external installed
>>>>>>>> outside of NiFi, so I wasn't expecting this.
>>>>>>>>
>>>>>>>> Not sure if there is any other good option here regarding Parquet.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Bryan
>>>>>>>>
>>>>>>>> On Tue, Aug 14, 2018 at 5:31 PM, scott <[email protected]> wrote:
>>>>>>>>> Hi Bryan,
>>>>>>>>> I'm fine if I have to trick the API, but don't I still need
>>>>>>>>> Hadoop installed somewhere?
>>>>>>>>> After creating the core-site.xml as you described, I get the
>>>>>>>>> following errors:
>>>>>>>>>
>>>>>>>>> Failed to locate the winutils binary in the hadoop binary path
>>>>>>>>> IOException: Could not locate executable null\bin\winutils.exe
>>>>>>>>> in the Hadoop binaries
>>>>>>>>> Unable to load native-hadoop library for your platform... using
>>>>>>>>> builtin-java classes where applicable
>>>>>>>>> Failed to write due to java.io.IOException: No FileSystem for scheme
>>>>>>>>>
>>>>>>>>> BTW, I'm using NiFi version 1.5.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Scott
>>>>>>>>>
>>>>>>>>> On Tue, Aug 14, 2018 at 12:44 PM, Bryan Bende <[email protected]> wrote:
>>>>>>>>>> Scott,
>>>>>>>>>>
>>>>>>>>>> Unfortunately, the Parquet API itself is tied to the Hadoop
>>>>>>>>>> FileSystem object, which is why NiFi can't read and write
>>>>>>>>>> Parquet directly to flow files (i.e. the API doesn't provide a
>>>>>>>>>> way to read/write to/from Java input and output streams).
>>>>>>>>>>
>>>>>>>>>> The best you can do is trick the Hadoop API into using the local
>>>>>>>>>> file system by creating a core-site.xml with the following:
>>>>>>>>>>
>>>>>>>>>> <configuration>
>>>>>>>>>>     <property>
>>>>>>>>>>         <name>fs.defaultFS</name>
>>>>>>>>>>         <value>file:///</value>
>>>>>>>>>>     </property>
>>>>>>>>>> </configuration>
>>>>>>>>>>
>>>>>>>>>> That will make PutParquet or FetchParquet work with your local
>>>>>>>>>> file system.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Bryan
>>>>>>>>>>
>>>>>>>>>> On Tue, Aug 14, 2018 at 3:22 PM, scott <[email protected]> wrote:
>>>>>>>>>>> Hello NiFi community,
>>>>>>>>>>> Is there a simple way to read CSV files and write them out as
>>>>>>>>>>> Parquet files without Hadoop? I run NiFi on Windows and don't
>>>>>>>>>>> have access to a Hadoop environment. I'm trying to write the
>>>>>>>>>>> output of my ETL in a compressed and still queryable format.
>>>>>>>>>>> Is there something I should be using instead of Parquet?
>>>>>>>>>>>
>>>>>>>>>>> Thanks for your time,
>>>>>>>>>>> Scott
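[A note for anyone who hits the errors quoted above, based on general Hadoop-on-Windows behavior rather than anything verified in this thread: the "null\bin\winutils.exe" message usually means HADOOP_HOME is unset (hence the "null"), since the Windows Hadoop client looks for %HADOOP_HOME%\bin\winutils.exe; placing a winutils.exe build there and setting the variable may clear that particular failure. The "No FileSystem for scheme" error suggests the core-site.xml was never loaded; NiFi's Hadoop-based processors only read it if the Hadoop Configuration Resources property points at the file. The paths below are examples only:

    HADOOP_HOME=C:\hadoop        (so that C:\hadoop\bin\winutils.exe exists)
    PutParquet -> Hadoop Configuration Resources: C:\nifi\conf\core-site.xml]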
