Mike, that's a good tip. I'll test that, but unfortunately, I've already committed to Windows. What about a script? Is there some tool you know of that can just be called by NiFi to convert an input CSV file to a Parquet file?
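For example, I was picturing a small Python script that NiFi could call through an ExecuteStreamCommand processor, passing the input and output paths as arguments. Just a rough sketch of the idea, assuming pyarrow is available on the NiFi host (the script name and paths below are placeholders, not anything real):

    # csv_to_parquet.py -- hypothetical helper, assumes pyarrow is
    # installed (pip install pyarrow)
    # usage: python csv_to_parquet.py input.csv output.parquet
    import sys

    import pyarrow.csv as pv
    import pyarrow.parquet as pq

    # Read the whole CSV into an Arrow table, inferring column types.
    table = pv.read_csv(sys.argv[1])

    # Write it back out as Snappy-compressed Parquet.
    pq.write_table(table, sys.argv[2], compression="snappy")

Is that roughly the right approach, or is there an existing tool people use for this?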
On Wed, Aug 15, 2018 at 8:32 AM, Mike Thomsen <[email protected]> wrote:

> Scott,
>
> You can also try Docker on Windows. Something like this should work:
>
> docker run -d --name nifi-test -v C:/nifi_temp:/opt/data_output -p 8080:8080 apache/nifi:latest
>
> I don't have Windows either, but Docker seems to work fine for my colleagues who have to use it on Windows. That should bridge C:\nifi_temp and /opt/data_output between the host and the container, and map localhost:8080 on the host to port 8080 in the container, so you don't have to mess with a Hadoop client just to try out some Parquet stuff.
>
> Mike
>
> On Wed, Aug 15, 2018 at 11:20 AM scott <[email protected]> wrote:
>
>> Thanks Bryan. I'll give the Hadoop client a try.
>>
>> On Wed, Aug 15, 2018 at 7:51 AM, Bryan Bende <[email protected]> wrote:
>>
>>> I think there is a good chance that installing the Hadoop client would solve the issue, but I can't say for sure since I don't have a Windows machine to test.
>>>
>>> The processor depends on the Apache Parquet Java client library, which depends on the Apache Hadoop client [1], and the Hadoop client has a limitation on Windows where it requires something additional.
>>>
>>> [1] https://github.com/apache/parquet-mr/blob/master/parquet-avro/pom.xml#L62-L65
>>>
>>> On Wed, Aug 15, 2018 at 10:16 AM, scott <[email protected]> wrote:
>>> > If I install a Hadoop client on my NiFi host, would I be able to get past this error?
>>> > I don't understand why this processor depends on Hadoop. Other projects like Drill and Spark don't have such a dependency just to be able to write Parquet files.
>>> >
>>> > On Tue, Aug 14, 2018 at 2:58 PM, Juan Pablo Gardella <[email protected]> wrote:
>>> >>
>>> >> It's a warning. You can ignore that.
>>> >>
>>> >> On Tue, 14 Aug 2018 at 18:53 Bryan Bende <[email protected]> wrote:
>>> >>>
>>> >>> Scott,
>>> >>>
>>> >>> Sorry, I did not realize the Hadoop client would be looking for this winutils.exe when running on Windows.
>>> >>>
>>> >>> On Linux and macOS you don't need anything external installed outside of NiFi, so I wasn't expecting this.
>>> >>>
>>> >>> Not sure if there is any other good option here regarding Parquet.
>>> >>>
>>> >>> Thanks,
>>> >>>
>>> >>> Bryan
>>> >>>
>>> >>> On Tue, Aug 14, 2018 at 5:31 PM, scott <[email protected]> wrote:
>>> >>> > Hi Bryan,
>>> >>> > I'm fine if I have to trick the API, but don't I still need Hadoop installed somewhere? After creating the core-site.xml as you described, I get the following errors:
>>> >>> >
>>> >>> > Failed to locate the winutils binary in the hadoop binary path
>>> >>> > IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries
>>> >>> > Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> >>> > Failed to write due to java.io.IOException: No FileSystem for scheme
>>> >>> >
>>> >>> > BTW, I'm using NiFi version 1.5
>>> >>> >
>>> >>> > Thanks,
>>> >>> > Scott
>>> >>> >
>>> >>> > On Tue, Aug 14, 2018 at 12:44 PM, Bryan Bende <[email protected]> wrote:
>>> >>> >>
>>> >>> >> Scott,
>>> >>> >>
>>> >>> >> Unfortunately the Parquet API itself is tied to the Hadoop Filesystem object, which is why NiFi can't read and write Parquet directly to flow files (i.e. it doesn't provide a way to read/write to/from Java input and output streams).
>>> >>> >>
>>> >>> >> The best you can do is trick the Hadoop API into using the local file-system by creating a core-site.xml with the following:
>>> >>> >>
>>> >>> >> <configuration>
>>> >>> >>     <property>
>>> >>> >>         <name>fs.defaultFS</name>
>>> >>> >>         <value>file:///</value>
>>> >>> >>     </property>
>>> >>> >> </configuration>
>>> >>> >>
>>> >>> >> That will make PutParquet or FetchParquet work with your local file-system.
>>> >>> >>
>>> >>> >> Thanks,
>>> >>> >>
>>> >>> >> Bryan
>>> >>> >>
>>> >>> >> On Tue, Aug 14, 2018 at 3:22 PM, scott <[email protected]> wrote:
>>> >>> >> > Hello NiFi community,
>>> >>> >> > Is there a simple way to read CSV files and write them out as Parquet files without Hadoop? I run NiFi on Windows and don't have access to a Hadoop environment. I'm trying to write the output of my ETL in a compressed and still queryable format. Is there something I should be using instead of Parquet?
>>> >>> >> >
>>> >>> >> > Thanks for your time,
>>> >>> >> > Scott
