Re: Simple CSV to Parquet without Hadoop

2018-08-21 Thread scott
Matt, After installing winutils, setting my PATH and HADOOP_HOME appropriately, I get past that one error. However, I still have this error:
2018-08-21 10:55:50,838 ERROR [Timer-Driven Process Thread-1] o.a.nifi.processors.parquet.PutParquet PutParquet[id=3e674cc6-0165-1000-d4ac-8d4b225485a2]

Re: Simple CSV to Parquet without Hadoop

2018-08-15 Thread Matt Burgess
I don't think you have to install Hadoop on Windows to get it to work, just winutils.exe, put wherever it's looking for it (that might be configurable via an environment variable or something). There are pre-built binaries [1] for various versions of Hadoop, even though
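For reference, a minimal Windows setup along those lines (the C:\hadoop path is only an example) looks roughly like this, with NiFi restarted afterwards so the JVM picks up the new environment variables:

    C:\hadoop\bin\winutils.exe          <- pre-built winutils binary matching your Hadoop version
    HADOOP_HOME=C:\hadoop               <- environment variable; points at the folder above bin, not at bin itself
    PATH=%PATH%;%HADOOP_HOME%\bin       <- commonly added so native libraries (hadoop.dll) can be loaded

The Hadoop client looks for %HADOOP_HOME%\bin\winutils.exe, which is why HADOOP_HOME must not include the \bin part.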

Re: Simple CSV to Parquet without Hadoop

2018-08-15 Thread scott
Just tested in my CentOS VM, worked like a charm without Hadoop. I'll open a Jira bug on PutParquet; it doesn't seem to run on Windows. Still not sure what I can do. Converting our production Windows NiFi install to Docker would be a major effort. Has anyone heard of a Parquet writer tool I can

Re: Simple CSV to Parquet without Hadoop

2018-08-15 Thread Mike Thomsen
You can run both Docker and the standard NiFi docker image on Windows.

Re: Simple CSV to Parquet without Hadoop

2018-08-15 Thread scott
Mike, that's a good tip. I'll test that, but unfortunately, I've already committed to Windows. What about a script? Is there some tool you know of that can just be called by NiFi to convert an input CSV file to a Parquet file?
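Not something from this thread, but as a sketch of the kind of script that could be called from NiFi (e.g. via ExecuteStreamCommand): a few lines of Python with pyarrow will do the conversion. The csv_to_parquet.py name and file paths are just illustrations, and pyarrow would need to be installed on the NiFi host.

    # csv_to_parquet.py - convert one CSV file to a Parquet file with pyarrow
    import sys
    import pyarrow.csv as pv
    import pyarrow.parquet as pq

    def csv_to_parquet(csv_path, parquet_path):
        table = pv.read_csv(csv_path)                 # read the CSV into an Arrow table, inferring the schema
        pq.write_table(table, parquet_path,
                       compression="snappy")          # write a compressed, queryable Parquet file

    if __name__ == "__main__":
        csv_to_parquet(sys.argv[1], sys.argv[2])      # e.g. python csv_to_parquet.py input.csv output.parquet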

Re: Simple CSV to Parquet without Hadoop

2018-08-15 Thread Mike Thomsen
Scott, You can also try Docker on Windows. Something like this should work:
docker run -d --name nifi-test -v C:/nifi_temp:/opt/data_output -p 8080:8080 apache/nifi:latest
I don't have Windows either, but Docker seems to work fine for my colleagues who have to use it on Windows. That should
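For anyone trying this: the -v C:/nifi_temp:/opt/data_output part maps a folder on the Windows host into the container (adjust the host path to taste), so anything the flow writes under /opt/data_output inside the container shows up in C:\nifi_temp on the host, and -p 8080:8080 exposes the UI at http://localhost:8080/nifi, NiFi 1.x's default HTTP port.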

Re: Simple CSV to Parquet without Hadoop

2018-08-15 Thread scott
Thanks Bryan. I'll give the Hadoop client a try.

Re: Simple CSV to Parquet without Hadoop

2018-08-15 Thread Bryan Bende
I think there is a good chance that installing the Hadoop client would solve the issue, but I can't say for sure since I don't have a Windows machine to test. The processor depends on the Apache Parquet Java client library which depends on Apache Hadoop client [1], and the Hadoop client has a

Re: Simple CSV to Parquet without Hadoop

2018-08-15 Thread scott
If I install a Hadoop client on my NiFi host, would I be able to get past this error? I don't understand why this processor depends on Hadoop. Other projects like Drill and Spark don't have such a dependency to be able to write Parquet files.

Re: Simple CSV to Parquet without Hadoop

2018-08-14 Thread Juan Pablo Gardella
It's a warning. You can ignore that.

Re: Simple CSV to Parquet without Hadoop

2018-08-14 Thread Bryan Bende
Scott, Sorry I did not realize the Hadoop client would be looking for this winutils.exe when running on Windows. On Linux and macOS you don't need anything external installed outside of NiFi so I wasn't expecting this. Not sure if there is any other good option here regarding Parquet. Thanks,

Re: Simple CSV to Parquet without Hadoop

2018-08-14 Thread scott
Hi Bryan, I'm fine if I have to trick the API, but don't I still need Hadoop installed somewhere? After creating the core-site.xml as you described, I get the following errors:
Failed to locate the winutils binary in the hadoop binary path
IOException: Could not locate executable

Re: Simple CSV to Parquet without Hadoop

2018-08-14 Thread Bryan Bende
Scott, Unfortunately the Parquet API itself is tied to the Hadoop Filesystem object which is why NiFi can't read and write Parquet directly to flow files (i.e. they don't provide a way to read/write to/from Java input and output streams). The best you can do is trick the Hadoop API into using
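For reference, the usual form of that trick is a minimal core-site.xml that points fs.defaultFS at the local filesystem, referenced from the processor's Hadoop Configuration Resources property; where you keep the file is up to you, e.g. somewhere under the NiFi conf directory:

    <?xml version="1.0"?>
    <!-- minimal core-site.xml: make the Hadoop client use the local filesystem -->
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>file:///</value>
      </property>
    </configuration>

With that in place, PutParquet's Directory property should be treated as a plain local path.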

Simple CSV to Parquet without Hadoop

2018-08-14 Thread scott
Hello NiFi community, Is there a simple way to read CSV files and write them out as Parquet files without Hadoop? I run NiFi on Windows and don't have access to a Hadoop environment. I'm trying to write the output of my ETL in a compressed and still query-able format. Is there something I should