Re: MergeContent/SplitText - Performance against large CSV files

2015-11-23 Thread Mark Petronic
> processor hasn't been officially released yet, this
> is the easiest time to affect large changes :)
>
> On Nov 13, 2015, at 6:46 PM, Mark Petronic <markpetro...@gmail.com> wrote:
>
> Got to say, Mark... Loving the RouteText processor!!! It definitely solved
> multiple tasks

Re: [DISCUSS] Feature proposal: Streamline visual flow design

2015-11-13 Thread Mark Petronic
+1 for double-click and open config dialog on processors. Seems most intuitive to a user.

On Fri, Nov 13, 2015 at 12:11 PM, Andrew Grande wrote:
> I just had the same idea today. Would like to have double-click open the
> Properties pane of a processor, this is the

Re: MergeContent/SplitText - Performance against large CSV files

2015-11-13 Thread Mark Petronic
> s quite a
> pain. We should have a RouteCSV processor as well.
> Though it won't provide any features that RouteText can't provide, it will
> make configuration far easier. I created a ticket
> for this here [2]. I'm not sure that it will make it into the 0.4.0
> release, though.

Re: [DISCUSS] Feature proposal: Streamline visual flow design

2015-11-13 Thread Mark Petronic
> s across the
> types of configurable components.
>
> On Fri, Nov 13, 2015 at 12:57 PM, Mark Petronic <markpetro...@gmail.com>
> wrote:
>
>> Right now, when you move your mouse over the processor, that connection
>> handle icon appears. If you double click it,

Re: MergeContent/SplitText - Performance against large CSV files

2015-11-13 Thread Mark Petronic
> Thanks
> -Mark
>
> Sent from my iPhone
>
> On Nov 13, 2015, at 12:12 PM, Mark Petronic <markpetro...@gmail.com>
> wrote:
>
> Thank you, Mark, for the quick reply. My comments on your comments...
>
> "That's a great question! 200 million per day eq

Re: Why does PutFile create directories for you but PutHDFS does not?

2015-11-12 Thread Mark Petronic
>> > isn't an option through the
>> > properties.
>> >
>> > -Bryan
>> >
>> > On Thu, Nov 12, 2015 at 8:19 AM, Mark Payne <marka...@hotmail.com>
>> > wrote:
>> >>
>> >> Mark,
>> >>

Re: Managing flows

2015-11-11 Thread Mark Petronic
Regarding NiFi always running: yes, it stays running. It is effectively a service with a REST API and a web UI. Closing the web UI has no effect on the running processors, only on your visibility into them.

On Wed, Nov 11, 2015 at 9:54 AM, Mark Petronic <markpetro...@gmail.com> wrote:

Re: Managing flows

2015-11-11 Thread Mark Petronic
Look in your NiFi conf directory. The active flow is there as an aptly named .gz file. My guess is that you could rename that file and restart NiFi, which would create a blank new one. Build up another flow, then you could repeat the same "copy to new file name" and restore some other one to continue on

Re: Managing flows

2015-11-11 Thread Mark Petronic
>>> (reasonably) have to
>>> run more instances of nifi with appropriate
>>> configuration to not conflict. Is that right?
>>>
>>> Darren
>>>
>>>
>>> On 11/11/2015 09:54 AM, Mark Petronic wrote:
>>>>
>>>> Look in your

Why does PutFile create directories for you but PutHDFS does not?

2015-11-11 Thread Mark Petronic
Just wondering about the history behind why one has the logic to create them but the other does not?

Using InvokeHTTP to send GET but without content

2015-11-07 Thread Mark Petronic
I thought this seemed like a simple plan... I wanted to send an audit message to a REST server every time I process a file in my flow. The sub flow in question is:

UnpackContent --> MergeContent --merged--> PutFile

Suggestion on how to parse field out of filename

2015-11-03 Thread Mark Petronic
Looking for some help on the best way to extract a field from a filename. I need to parse out the date from the core filename attribute set by the UnpackContent processor. I am unzipping files that contain many CSV files and these CSV file names vary in format but each has a timestamp included in the
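
As a rough illustration of the kind of extraction being asked about, here is a minimal Python sketch that runs outside NiFi. The snippet above does not show the real filename formats, so the `YYYYMMDD` pattern, the sample filename, and the function name `date_from_filename` are all assumptions made for the example.

```python
import re
from datetime import datetime

# Assumed pattern: an 8-digit YYYYMMDD run that is not part of a longer number.
DATE_RE = re.compile(r"(?<!\d)(\d{8})(?!\d)")


def date_from_filename(filename):
    """Return the first valid YYYYMMDD date found in filename, or None."""
    for match in DATE_RE.finditer(filename):
        try:
            return datetime.strptime(match.group(1), "%Y%m%d").date()
        except ValueError:
            continue  # eight digits that do not form a real calendar date
    return None


if __name__ == "__main__":
    # Hypothetical filename, not taken from the thread.
    print(date_from_filename("LMTD_20151103_part1.csv"))
```

Validating the match with `strptime` rather than trusting the regex alone keeps unrelated eight-digit numbers from being mistaken for dates.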

Re: ExecuteStreamCommand processor for "tail -n +2" not working as expected

2015-10-25 Thread Mark Petronic
> e zipfile prior to extraction. I agree that
> would be a useful feature. Maybe one of the NiFi devs will comment on the
> possibility of including it as a feature in the future.
>
> Cheers,
> Adam
>
>
> [1]
> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-

Re: ExecuteStreamCommand processor for "tail -n +2" not working as expected

2015-10-25 Thread Mark Petronic
>> the
>> possibility of including it as a feature in the future.
>>
>> Cheers,
>> Adam
>>
>>
>> [1]
>> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/

ExecuteStreamCommand processor for "tail -n +2" not working as expected

2015-10-24 Thread Mark Petronic
Just starting to use NiFi and built a flow that implements the following:

unzip -p my.zip *LMTD* | tail -n +2 | gzip --fast | hdfs dfs -put - /some/hdfs/file

I used the following processor flow:

ExecuteProcess(unzip -p) -> ExecuteStreamCommand(tail -n +2) -> CompressContent(gzip) -> PutHDFS
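
To make the middle of that pipeline concrete, here is a minimal Python sketch of what `tail -n +2 | gzip --fast` does to a stream: drop the first line, then gzip the rest at the fastest compression level. It is an illustration outside NiFi, not how the processors are implemented; the function name `drop_header_and_gzip` and the sample data are assumptions.

```python
import gzip
import io
import shutil


def drop_header_and_gzip(src, dst, header_lines=1):
    """Copy src to dst as gzip, skipping the first `header_lines` lines.

    With header_lines=1 this mirrors `tail -n +2 | gzip --fast`.
    Both arguments are binary file-like objects.
    """
    for _ in range(header_lines):
        src.readline()  # discard one header line; a no-op at end of input
    # compresslevel=1 is the fastest setting, comparable to `gzip --fast`
    with gzip.GzipFile(fileobj=dst, mode="wb", compresslevel=1) as gz:
        shutil.copyfileobj(src, gz)


if __name__ == "__main__":
    # Made-up CSV content standing in for the unzipped file.
    raw = io.BytesIO(b"id,name\n1,alpha\n2,beta\n")
    out = io.BytesIO()
    drop_header_and_gzip(raw, out)
    print(gzip.decompress(out.getvalue()).decode())
```

Streaming with `shutil.copyfileobj` avoids loading the whole file into memory, which matters for large CSV inputs.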

Re: Suggestions for good approach to ETL strategy

2015-10-24 Thread Mark Petronic
Reading some other posts, I stumbled on this JIRA [1], which seems to relate directly to my question in this post.

[1] https://issues.apache.org/jira/browse/NIFI-631

On Sat, Oct 24, 2015 at 11:44 PM, Mark Petronic <markpetro...@gmail.com> wrote:
> So, I stumbled onto NiFi at a Laurel,

Suggestions for good approach to ETL strategy

2015-10-24 Thread Mark Petronic
So, I stumbled onto NiFi at a Laurel, MD Spark meetup and was pretty excited about using it. I'm running HDP and need to construct an ETL-like flow and, as a new user to NiFi, would like to start with a "best practice" approach. Wondering if some of you more seasoned users might provide