I think there is a good chance that installing the Hadoop client would solve the issue, but I can't say for sure since I don't have a Windows machine to test on.
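If a full install feels like overkill, it may be enough to give the Hadoop
client just the piece it is looking for: a winutils.exe build matching the
Hadoop version bundled in the Parquet NAR, placed under a directory whose bin
folder the JVM is pointed at. In NiFi that would mean adding a system property
to conf/bootstrap.conf, something like the following (the arg number and the
C:\hadoop path are placeholders, and again I haven't been able to verify this
on Windows myself):

    # conf/bootstrap.conf -- tell the Hadoop client where to find
    # bin\winutils.exe (use any unused java.arg number)
    java.arg.20=-Dhadoop.home.dir=C:\hadoop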
The processor depends on the Apache Parquet Java client library, which in turn
depends on the Apache Hadoop client [1], and the Hadoop client has a limitation
on Windows where it requires something additional (the winutils.exe helper
binary).

[1] https://github.com/apache/parquet-mr/blob/master/parquet-avro/pom.xml#L62-L65

On Wed, Aug 15, 2018 at 10:16 AM, scott <tcots8...@gmail.com> wrote:
> If I install a Hadoop client on my NiFi host, would I be able to get past
> this error?
> I don't understand why this processor depends on Hadoop. Other projects
> like Drill and Spark don't have such a dependency and can still write
> Parquet files.
>
> On Tue, Aug 14, 2018 at 2:58 PM, Juan Pablo Gardella
> <gardellajuanpa...@gmail.com> wrote:
>>
>> It's a warning. You can ignore it.
>>
>> On Tue, 14 Aug 2018 at 18:53 Bryan Bende <bbe...@gmail.com> wrote:
>>>
>>> Scott,
>>>
>>> Sorry, I did not realize the Hadoop client would be looking for this
>>> winutils.exe when running on Windows.
>>>
>>> On Linux and macOS you don't need anything external installed outside
>>> of NiFi, so I wasn't expecting this.
>>>
>>> I'm not sure there is any other good option here regarding Parquet.
>>>
>>> Thanks,
>>>
>>> Bryan
>>>
>>>
>>> On Tue, Aug 14, 2018 at 5:31 PM, scott <tcots8...@gmail.com> wrote:
>>> > Hi Bryan,
>>> > I'm fine if I have to trick the API, but don't I still need Hadoop
>>> > installed somewhere? After creating the core-site.xml as you
>>> > described, I get the following errors:
>>> >
>>> > Failed to locate the winutils binary in the hadoop binary path
>>> > IOException: Could not locate executable null\bin\winutils.exe in
>>> > the Hadoop binaries
>>> > Unable to load native-hadoop library for your platform... using
>>> > builtin-java classes where applicable
>>> > Failed to write due to java.io.IOException: No FileSystem for scheme
>>> >
>>> > BTW, I'm using NiFi version 1.5.
>>> >
>>> > Thanks,
>>> > Scott
>>> >
>>> >
>>> > On Tue, Aug 14, 2018 at 12:44 PM, Bryan Bende <bbe...@gmail.com> wrote:
>>> >>
>>> >> Scott,
>>> >>
>>> >> Unfortunately the Parquet API itself is tied to the Hadoop FileSystem
>>> >> object, which is why NiFi can't read and write Parquet directly
>>> >> to/from flow files (i.e. the API doesn't provide a way to read from
>>> >> or write to Java input and output streams).
>>> >>
>>> >> The best you can do is trick the Hadoop API into using the local
>>> >> file-system by creating a core-site.xml with the following:
>>> >>
>>> >> <configuration>
>>> >>   <property>
>>> >>     <name>fs.defaultFS</name>
>>> >>     <value>file:///</value>
>>> >>   </property>
>>> >> </configuration>
>>> >>
>>> >> That will make PutParquet or FetchParquet work with your local
>>> >> file-system.
>>> >>
>>> >> Thanks,
>>> >>
>>> >> Bryan
>>> >>
>>> >>
>>> >> On Tue, Aug 14, 2018 at 3:22 PM, scott <tcots8...@gmail.com> wrote:
>>> >> > Hello NiFi community,
>>> >> > Is there a simple way to read CSV files and write them out as
>>> >> > Parquet files without Hadoop? I run NiFi on Windows and don't have
>>> >> > access to a Hadoop environment. I'm trying to write the output of
>>> >> > my ETL in a compressed and still queryable format. Is there
>>> >> > something I should be using instead of Parquet?
>>> >> >
>>> >> > Thanks for your time,
>>> >> > Scott
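P.S. For anyone who finds this thread later and wonders what "tied to the
Hadoop FileSystem object" means in practice, here is a rough sketch of writing
a Parquet file from plain Java with parquet-avro (the schema, output path, and
class name are made up for illustration). The writer is built from an
org.apache.hadoop.fs.Path rather than a java.io stream, which is exactly why
the Hadoop client, and on Windows its winutils.exe lookup, comes along for the
ride:

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.avro.AvroParquetWriter;
    import org.apache.parquet.hadoop.ParquetWriter;

    public class LocalParquetSketch {
        public static void main(String[] args) throws Exception {
            // A made-up two-column schema standing in for a parsed CSV row.
            Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Row\",\"fields\":["
                + "{\"name\":\"id\",\"type\":\"int\"},"
                + "{\"name\":\"name\",\"type\":\"string\"}]}");

            // Same trick as the core-site.xml above: force the local file-system.
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "file:///");

            // The builder takes a Hadoop Path, not a File or OutputStream.
            try (ParquetWriter<GenericRecord> writer =
                     AvroParquetWriter.<GenericRecord>builder(
                             new Path("file:///tmp/rows.parquet"))
                         .withSchema(schema)
                         .withConf(conf)
                         .build()) {
                GenericRecord row = new GenericData.Record(schema);
                row.put("id", 1);
                row.put("name", "scott");
                writer.write(row);
            }
        }
    }

Even this purely local write goes through Hadoop's local FileSystem
implementation, which is where the winutils.exe lookup happens on Windows.
Bryan's core-site.xml is doing the same fs.defaultFS=file:/// override from
inside NiFi.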