Re: Simple CSV to Parquet without Hadoop

2018-08-15 Thread Matt Burgess
I don't think you have to install Hadoop on Windows in order to get it to work, just the winutils.exe and I guess put it wherever it's looking for it (that might be configurable via an environment variable or something). There are pre-built binaries [1] for various versions of Hadoop, even though

Re: Simple CSV to Parquet without Hadoop

2018-08-15 Thread scott
Just tested in my CentOS VM; worked like a charm without Hadoop. I'll open a Jira bug on PutParquet; it doesn't seem to run on Windows. Still not sure what I can do. Converting our production Windows NiFi install to Docker would be a major effort. Has anyone heard of a Parquet writer tool I can

Re: Simple CSV to Parquet without Hadoop

2018-08-15 Thread Mike Thomsen
You can run both Docker and the standard NiFi Docker image on Windows. On Wed, Aug 15, 2018 at 2:52 PM scott wrote: > Mike, that's a good tip. I'll test that, but unfortunately, I've already >

Re: Simple CSV to Parquet without Hadoop

2018-08-15 Thread scott
Mike, that's a good tip. I'll test that, but unfortunately, I've already committed to Windows. What about a script? Is there some tool you know of that can just be called by NiFi to convert an input CSV file to a Parquet file? On Wed, Aug 15, 2018 at 8:32 AM, Mike Thomsen wrote: > Scott, > >

Re: Detect a pattern in incoming json content

2018-08-15 Thread Mark Payne
Jim, I'd recommend RouteText. ScanContent would also be an alternative. Thanks -Mark > On Aug 15, 2018, at 2:02 PM, James McMahon wrote: > > Good afternoon. I have a requirement to search for and detect a pattern > "request":"false" is anywhere in the content of a flowfile. The content is

Detect a pattern in incoming json content

2018-08-15 Thread James McMahon
Good afternoon. I have a requirement to detect whether the pattern "request":"false" appears anywhere in the content of a flowfile. The content is JSON that spans multiple lines. My request key and value would be on its own line, embedded within a tag like this "options":"{ "abc":""12345"
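For illustration, the match Jim describes (a key/value on its own line inside multi-line content) can be sketched in plain Python with a regular expression, roughly what a line-by-line RouteText rule would evaluate; the sample content below is invented:

```python
import re

# The pattern to detect anywhere in the flow file content.
# \s* allows optional whitespace around the colon, just in case.
PATTERN = re.compile(r'"request"\s*:\s*"false"')


def contains_request_false(content: str) -> bool:
    """Return True if any line of the content matches the pattern."""
    return any(PATTERN.search(line) for line in content.splitlines())


sample = '{\n  "abc": "12345",\n  "request":"false",\n  "other": "x"\n}'
```

In NiFi terms, a RouteText processor with a "Matches Regular Expression" strategy and this pattern would route flow files containing the line to a dedicated relationship.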

Re: Simple CSV to Parquet without Hadoop

2018-08-15 Thread Mike Thomsen
Scott, You can also try Docker on Windows. Something like this should work: docker run -d --name nifi-test -v C:/nifi_temp:/opt/data_output -p 8080:8080 apache/nifi:latest I don't have Windows either, but Docker seems to work fine for my colleagues that have to use it on Windows. That should

Re: Simple CSV to Parquet without Hadoop

2018-08-15 Thread scott
Thanks Bryan. I'll give the Hadoop client a try. On Wed, Aug 15, 2018 at 7:51 AM, Bryan Bende wrote: > I think there is a good chance that installing the Hadoop client would > solve the issue, but I can't say for sure since I don't have a Windows > machine to test. > > The processor depends on

Re: Strange behavior with PutSQL

2018-08-15 Thread Lone Pine Account
That was it! The fragmented transactions settings were not correct. Thanks a million! On Wed, Aug 15, 2018 at 11:12 AM, Matt Burgess wrote: > Do you have a Split processor upstream? If so, is the setting of > Support Fragmented Transactions in PutSQL set to true? That > combination will have

Re: Strange behavior with PutSQL

2018-08-15 Thread Matt Burgess
Do you have a Split processor upstream? If so, is the setting of Support Fragmented Transactions in PutSQL set to true? That combination will have PutSQL try to find all the flow files with the same "fragment.id" attribute, and will only proceed if it gets "fragment.count" of them (all the flow
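The hold-until-complete behaviour Matt describes can be illustrated with a toy sketch (this is not NiFi code; attribute names follow the "fragment.id" / "fragment.count" convention mentioned above):

```python
# Toy model of "Support Fragmented Transactions": flow files sharing a
# fragment.id are held until fragment.count of them have arrived, then
# released together as one batch. Flow files are modeled as plain dicts.
from collections import defaultdict


def collect_fragments(flowfiles):
    """Group flow files by fragment.id; release a group only when complete."""
    pending = defaultdict(list)
    released = []
    for ff in flowfiles:
        frag_id = ff["fragment.id"]
        pending[frag_id].append(ff)
        if len(pending[frag_id]) == int(ff["fragment.count"]):
            released.append(pending.pop(frag_id))  # full transaction ready
    return released, dict(pending)                 # incomplete groups wait
```

If the upstream Split processor's fragments never all arrive (or the setting is wrong), the pending groups sit there, which matches the "stuck" symptom in this thread.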

ExecuteSQL to support multiple statements

2018-08-15 Thread Boris Tyukin
Hi guys, I need to issue a query like the one below on Impala. It works fine from impala-shell, but NiFi doesn't seem to like multiple statements like that. set max_row_size=7mb; create table blabla as select blabla from blablabla; I thought this was addressed in 1.7, but I got it confused with Hive
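A common workaround when a processor only accepts one statement per execution is to split the script on semicolons and submit each statement separately. A minimal sketch (the naive split assumes no semicolons inside string literals, which holds for Boris's example):

```python
# Split a multi-statement SQL script into individual statements so each
# can be submitted on its own. Naive: does not handle ';' inside quoted
# string literals or comments.
def split_statements(sql: str):
    return [s.strip() for s in sql.split(";") if s.strip()]


stmts = split_statements("""
set max_row_size=7mb;
create table blabla as select blabla from blablabla;
""")
```

Each element of `stmts` could then be executed in sequence over the same session, which matters here because `set max_row_size` must apply to the session that runs the `create table`.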

Re: Strange behavior with PutSQL

2018-08-15 Thread Lone Pine Account
Where would I look to see if that was the case? Is that not logged as an error somewhere? On Wed, Aug 15, 2018 at 10:49 AM, Juan Pablo Gardella < gardellajuanpa...@gmail.com> wrote: > Probably connection pool is exhausted. > > On Wed, 15 Aug 2018 at 11:44 Lone Pine Account > wrote: > >> I have

Re: Simple CSV to Parquet without Hadoop

2018-08-15 Thread Bryan Bende
I think there is a good chance that installing the Hadoop client would solve the issue, but I can't say for sure since I don't have a Windows machine to test. The processor depends on the Apache Parquet Java client library which depends on Apache Hadoop client [1], and the Hadoop client has a

Re: Strange behavior with PutSQL

2018-08-15 Thread Juan Pablo Gardella
Probably the connection pool is exhausted. On Wed, 15 Aug 2018 at 11:44 Lone Pine Account wrote: > I have a simple flow that takes the output of a ReplaceText processor and > sends it to PutSQL. > > This has been working in the past with a "toy" configuration. Now that > I'm testing it on a larger

RE: [EXT] Re: Get all Processors

2018-08-15 Thread Karthik Kothareddy (karthikk) [CONT - Type 2]
Daniel, Thanks for the detailed explanation; however, I have built a Python client which we use internally to automate a few things as well. Coming back to getting a list of all processors, I use "/flow/process-groups/root/status?recursive=true" to get all components in one call and then
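Once the recursive status payload is fetched, pulling out every processor is a small recursive walk. A sketch over a simplified payload shape (the key names below are illustrative only; verify them against the JSON your NiFi version actually returns from the status endpoint):

```python
# Toy recursive walk over a simplified, hypothetical status payload like
# the one /flow/process-groups/root/status?recursive=true returns.
# Key names are assumptions for illustration, not the exact NiFi schema.
def collect_processors(group):
    """Yield processor status entries from a group and all nested groups."""
    for proc in group.get("processorStatusSnapshots", []):
        yield proc
    for child in group.get("processGroupStatusSnapshots", []):
        yield from collect_processors(child)


sample = {
    "processorStatusSnapshots": [{"name": "GetFile"}],
    "processGroupStatusSnapshots": [
        {
            "processorStatusSnapshots": [{"name": "PutSQL"}],
            "processGroupStatusSnapshots": [],
        },
    ],
}
```

The same shape of traversal is what NiPyApi's recurse_flow does against the flow endpoint, so either approach yields a flat list from one API call.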

Re: Simple CSV to Parquet without Hadoop

2018-08-15 Thread scott
If I install a Hadoop client on my NiFi host, would I be able to get past this error? I don't understand why this processor depends on Hadoop. Other projects like Drill and Spark don't have such a dependency to be able to write Parquet files. On Tue, Aug 14, 2018 at 2:58 PM, Juan Pablo Gardella <

Re: [EXT] Re: Get all Processors

2018-08-15 Thread Daniel Chaffelson
Hi Karthik, I have already implemented this in NiPyApi, assuming a Python automation client is useful to you. The nipyapi.canvas.recurse_flow command ( https://github.com/Chaffelson/nipyapi/blob/28d7f74478e5e71253ce2de53fd22f56f455c338/nipyapi/canvas.py#L36 ) provides the base functionality to step