Re: NiFi processor for Redis

2016-01-12 Thread Joe Witt
Sudeep, Hello. At this time there are no Apache NiFi Redis processors to push/pull data with Redis that I am aware of. Is this something you might be interested in contributing, or contributing to? Thanks Joe On Wed, Jan 13, 2016 at 12:13 AM, sudeep mishra wrote: > Hi, > > Do we have any processor to

Re: Data Ingestion for Large Source Files and Masking

2016-01-12 Thread Joe Witt
Hello. So the performance went from what sounded pretty good to what sounds pretty problematic. The rate now sounds like it is around 5 MB/s, which is indeed quite poor. Building on what Bryan said, there do appear to be some good opportunities to improve the performance. The link he provided jus

NiFi processor for Redis

2016-01-12 Thread sudeep mishra
Hi, Do we have any processor to push and retrieve data from Redis? Thanks & Regards, Sudeep Shekhar Mishra

Re: PutDistributedMapCache

2016-01-12 Thread sudeep mishra
Thanks Joe. I do not have a specific configuration yet, as I am still exploring NiFi. Though I think it would be helpful to let users store and retrieve cache values in different formats (JSON, Avro, etc.). Thanks & Regards, Sudeep On Tue, Jan 12, 2016 at 9:15 PM, Joe Percivall wrote: > H

Re: Data Ingestion for Large Source Files and Masking

2016-01-12 Thread Juan Sequeiros
Obaid, Since you mention that you will have dedicated ETL servers, and I assume they will also have a decent amount of RAM, I would not shy away from increasing your threads. Also, in your staging directory, if you do not need to keep the originals, you might consider GetFile and on that one

Re: "Processor requires an upstream connection" for FetchS3Object?

2016-01-12 Thread Russell Whitaker
On Tue, Jan 12, 2016 at 7:20 PM, Adam Lamar wrote: > Russell, > > Sorry - I meant that I use the FetchS3Object processor, not ListS3. Ah, that makes more sense, thanks. > You can > set the Object Key property with the name of the key you want to download. > This property supports the expression

Re: "Processor requires an upstream connection" for FetchS3Object?

2016-01-12 Thread Adam Lamar
Russell, Sorry - I meant that I use the FetchS3Object processor, not ListS3. You can set the Object Key property with the name of the key you want to download. This property supports the expression language, so the object key can be sourced from each flowfile. I'm unsure if there is a good t
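Because the Object Key property accepts the expression language, one FetchS3Object configuration can fetch a different key for every incoming flowfile. The toy resolver below only illustrates that per-flowfile substitution for bare `${attribute}` references; it is not NiFi's expression-language engine (which also supports functions), and the `incoming/` prefix and attribute values are made up for the example. Substituting an empty string for a missing attribute is a choice of this sketch.

```python
import re

# Matches a bare attribute reference such as ${filename}.
_ATTR_REF = re.compile(r"\$\{([^}]+)\}")


def resolve(template: str, attributes: dict[str, str]) -> str:
    """Replace each ${name} with the flowfile attribute of that name.

    Toy model only: no functions, no escaping; a missing attribute
    becomes an empty string in this sketch.
    """
    return _ATTR_REF.sub(lambda m: attributes.get(m.group(1), ""), template)


# Hypothetical Object Key value evaluated against one flowfile's attributes.
print(resolve("incoming/${filename}", {"filename": "data.csv"}))
```

With an upstream processor setting `filename` on each flowfile, every flowfile resolves to its own key.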

Re: Data Ingestion for Large Source Files and Masking

2016-01-12 Thread Bryan Bende
Obaid, I can't say for sure how much this would improve performance, but you might want to wrap the OutputStream with BufferedOutputStream or BufferedWriter. I would be curious to hear if that helps. A similar scenario from the standard processors is ReplaceText; here is one example where it uses t
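The reason buffering can help is that many small writes get coalesced into a few large ones before they reach the underlying stream. NiFi processors are Java, so there the change would be wrapping the OutputStream handed to the write callback in a BufferedOutputStream; the Python sketch below only demonstrates the effect by counting how many writes reach a stand-in sink with and without a buffer. The digit-masking rule and the `CountingSink` class are hypothetical illustrations, not part of Obaid's processor.

```python
import io


class CountingSink(io.RawIOBase):
    """Stand-in for the flowfile OutputStream: records every write that reaches it."""

    def __init__(self) -> None:
        super().__init__()
        self.calls = 0
        self.data = bytearray()

    def writable(self) -> bool:
        return True

    def write(self, b) -> int:
        self.calls += 1
        self.data += b
        return len(b)


def mask_line(line: str) -> str:
    # Hypothetical masking rule: replace every digit with '*'.
    return "".join("*" if ch.isdigit() else ch for ch in line)


def mask_to_sink(lines, sink: CountingSink, buffered: bool) -> None:
    if buffered:
        # Analogous to wrapping the stream in a BufferedOutputStream: small
        # writes accumulate in memory and reach the sink in large chunks.
        writer = io.BufferedWriter(sink, buffer_size=64 * 1024)
        for line in lines:
            writer.write(mask_line(line).encode("utf-8"))
        writer.flush()   # push any remaining buffered bytes to the sink
        writer.detach()  # release the sink without closing it
    else:
        # One sink write per line, like writing straight to the raw stream.
        for line in lines:
            sink.write(mask_line(line).encode("utf-8"))


lines = ["acct 1234\n"] * 1000
plain, buffered = CountingSink(), CountingSink()
mask_to_sink(lines, plain, buffered=False)
mask_to_sink(lines, buffered, buffered=True)
print(plain.calls, buffered.calls)
```

The output bytes are identical either way; only the number of calls into the sink changes, which is where the per-write overhead of the real stream would be saved.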

Re: "Processor requires an upstream connection" for FetchS3Object?

2016-01-12 Thread Russell Whitaker
On Tue, Jan 12, 2016 at 7:12 PM, Adam Lamar wrote: > > On 1/12/16 8:04 PM, Bryan Bende wrote: >> >> In the case of FetchS3, there is definitely intent to have a ListS3, but >> there are still ways to use it without that... > > > To add to Bryan's comment, I use ListS3 in combination with GetSQS t

Re: "Processor requires an upstream connection" for FetchS3Object?

2016-01-12 Thread Adam Lamar
On 1/12/16 8:04 PM, Bryan Bende wrote: In the case of FetchS3, there is definitely intent to have a ListS3, but there are still ways to use it without that... To add to Bryan's comment, I use ListS3 in combination with GetSQS to fetch objects as they are newly placed into an S3 bucket. For m
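In the GetSQS approach (with FetchS3Object, per Adam's follow-up correction), each SQS message body carries an S3 event notification, and the bucket and object key have to be pulled out of it so they can drive the fetch. The sketch below assumes the bucket publishes notifications directly to the queue (no SNS envelope) using AWS's `Records[].s3.bucket.name` / `Records[].s3.object.key` layout, where keys arrive URL-encoded. How the values would be moved into flowfile attributes inside NiFi is not shown and the function name is hypothetical.

```python
import json
from urllib.parse import unquote_plus


def object_keys_from_sqs_body(body: str) -> list[tuple[str, str]]:
    """Return (bucket, key) pairs for newly created objects in one S3 event message."""
    event = json.loads(body)
    found = []
    # S3's initial "s3:TestEvent" message has no "Records", so it yields nothing.
    for record in event.get("Records", []):
        # Only react to object creation; ignore deletes and other events.
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        s3 = record["s3"]
        # Keys are URL-encoded in notifications (a space arrives as '+').
        found.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return found


sample = json.dumps({"Records": [{
    "eventName": "ObjectCreated:Put",
    "s3": {"bucket": {"name": "my-bucket"},
           "object": {"key": "incoming/report+2016.csv"}},
}]})
print(object_keys_from_sqs_body(sample))
```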

Re: "Processor requires an upstream connection" for FetchS3Object?

2016-01-12 Thread Russell Whitaker
On Tue, Jan 12, 2016 at 6:39 PM, Corey Flowers wrote: > Hello Russell, > >Sorry if that seemed short, I was running in to pick my son up from > my practice. Oh, I didn't mean to give the impression I'd taken it that way! > What I meant to say was that you are correct. Although I > haven'

Re: "Processor requires an upstream connection" for FetchS3Object?

2016-01-12 Thread Bryan Bende
Russell/Corey, In 0.4.0 there is a new way for processors to indicate what they expect as far as input: it can be required, allowed, or forbidden. This prevents scenarios like ExecuteSQL, which at one point required an input FlowFile, but the processor could be running and started without an incom
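In NiFi's Java API a processor declares this with the `@InputRequirement` annotation and one of `Requirement.INPUT_REQUIRED`, `INPUT_ALLOWED`, or `INPUT_FORBIDDEN`. The Python below is only a toy model of the resulting start-time check, which explains why FetchS3Object with no incoming connection reports the error from this thread's subject. The wording of the "forbidden" message is a hypothetical placeholder.

```python
from enum import Enum
from typing import Optional


class Requirement(Enum):
    INPUT_REQUIRED = "required"
    INPUT_ALLOWED = "allowed"
    INPUT_FORBIDDEN = "forbidden"


def validate_start(requirement: Requirement, incoming_connections: int) -> Optional[str]:
    """Return an error message if the processor may not start, else None (toy model)."""
    if requirement is Requirement.INPUT_REQUIRED and incoming_connections == 0:
        return "Processor requires an upstream connection"
    if requirement is Requirement.INPUT_FORBIDDEN and incoming_connections > 0:
        # Hypothetical wording; the real message may differ.
        return "Processor does not allow upstream connections"
    return None


# A fetch-style processor placed on the canvas with nothing feeding it:
print(validate_start(Requirement.INPUT_REQUIRED, incoming_connections=0))
```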

Re: "Processor requires an upstream connection" for FetchS3Object?

2016-01-12 Thread Corey Flowers
Hello Russell, Sorry if that seemed short, I was running in to pick my son up from my practice. What I meant to say was that you are correct. Although I haven't worked on those processors, I do believe it is expecting the ListS3 processor to function and that is why you are getting that err

Re: "Processor requires an upstream connection" for FetchS3Object?

2016-01-12 Thread Russell Whitaker
On Tue, Jan 12, 2016 at 6:11 PM, Corey Flowers wrote: > Ha ha! Well that would do it! :) > I don't know what that would "do" other than confirm that the FetchS3Object processor shipped with v0.4.1 needs its doc to reflect the fact it's not yet usable until a ListS3* processor is implemented and i

Re: "Processor requires an upstream connection" for FetchS3Object?

2016-01-12 Thread Corey Flowers
Ha ha! Well that would do it! :) Sent from my iPhone > On Jan 12, 2016, at 9:10 PM, Russell Whitaker > wrote: > >> On Tue, Jan 12, 2016 at 6:02 PM, Corey Flowers >> wrote: >> I haven't worked with this processor but I believe it is looking for >> the S3 list processor to generate the list of

Re: "Processor requires an upstream connection" for FetchS3Object?

2016-01-12 Thread Russell Whitaker
On Tue, Jan 12, 2016 at 6:02 PM, Corey Flowers wrote: > I haven't worked with this processor but I believe it is looking for > the S3 list processor to generate the list of objects to fetch. Did > you try that yet? > I mentioned this: "There's no "ListS3Object" processor type which might hypothet

Re: "Processor requires an upstream connection" for FetchS3Object?

2016-01-12 Thread Corey Flowers
I haven't worked with this processor but I believe it is looking for the S3 list processor to generate the list of objects to fetch. Did you try that yet? Sent from my iPhone > On Jan 12, 2016, at 8:38 PM, Russell Whitaker > wrote: > > I'm running v0.4.1 Nifi, and seeing this (taken from nifi-a

Re: Data Ingestion for Large Source Files and Masking

2016-01-12 Thread obaidul karim
Hi Joe, Yes, I took into consideration the existing RAID and HW settings. We have 10G NICs for all Hadoop intra-connectivity, and the server in question is an edge node of our Hadoop cluster. In the production scenario we will use dedicated ETL servers with high-performance (>500 MB/s) local disks. Sharing a

"Processor requires an upstream connection" for FetchS3Object?

2016-01-12 Thread Russell Whitaker
I'm running NiFi v0.4.1, and seeing this (taken from nifi-app.log, also seeing on mouseover of the "!" icon on the processor on the canvas): 2016-01-12 17:08:50,357 ERROR [NiFi Web Server-18] o.a.nifi.groups.StandardProcessGroup Unable to start FetchS3Object[id=f4253204-a2e2-4ce6-ba09-9415e8024dca

Re: PutDistributedMapCache

2016-01-12 Thread Joe Percivall
Hello Sudeep, We are currently lacking a "GetDistributedMapCache" processor that corresponds to the "PutDistributedMapCache". I created a ticket[1] and will be working on it today. If you have any comments, configuration suggestions, etc. please let me know or comment on the ticket. [1] https://

Re: PutDistributedMapCache

2016-01-12 Thread sudeep mishra
Thanks Matt. In my data flow I am expected to perform certain validations on data. I am loading some SQL Server data into HDFS using Sqoop (not part of the NiFi flow). For each record in the HDFS file I have to query another database and then save the validated record again in HDFS, which will be processed b
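One way to keep a per-record lookup from hitting the second database for every record is a cache-aside pattern: ask the cache first and only query the database on a miss. The sketch below is a general illustration, not a NiFi feature; `lookup` is a hypothetical callable standing in for the database query, the `id`/`ref` field names are invented, and a local dict plays the role that a shared cache service would in a real flow.

```python
from typing import Callable, Optional


def validate_records(records: list[dict],
                     lookup: Callable[[object], Optional[str]],
                     cache: Optional[dict] = None) -> list[dict]:
    """Keep records whose 'id' is known to the reference database.

    Each distinct id is queried at most once; misses (None) are cached too,
    so an unknown id is not re-queried either.
    """
    if cache is None:
        cache = {}
    valid = []
    for rec in records:
        key = rec["id"]
        if key not in cache:
            cache[key] = lookup(key)  # only reached on a cache miss
        if cache[key] is not None:
            valid.append({**rec, "ref": cache[key]})
    return valid


reference = {1: "A", 2: "B"}  # pretend contents of the other database
queries = []


def fake_lookup(key):
    queries.append(key)
    return reference.get(key)


result = validate_records([{"id": 1}, {"id": 2}, {"id": 1}, {"id": 3}], fake_lookup)
print(result, queries)
```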

Re: PutDistributedMapCache

2016-01-12 Thread Matthew Clarke
Sudeep, I was a little off on my second scenario. The DetectDuplicate processor uses the distributed cache service all on its own. FlowFiles that are routed through it are loaded into the cache if they do not already exist in the cache; if they do already exist, they are routed to duplicate. Th
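The routing Matt describes boils down to "check the cache, insert if absent": the first flowfile with a given identifier is stored and passes through, and any later one with the same identifier goes to duplicate. This is a minimal sketch of that decision only; a plain dict stands in for the cache service, and the identifier strings are made up.

```python
def detect_duplicate(cache: dict, identifier: str) -> str:
    """Route one flowfile by its cache identifier (toy model of the logic described)."""
    if identifier in cache:
        return "duplicate"
    cache[identifier] = True  # first sighting: remember it for later flowfiles
    return "non-duplicate"


cache: dict = {}
routes = [detect_duplicate(cache, key) for key in ["order-17", "order-18", "order-17"]]
print(routes)
```

Against a real shared cache, the check and the insert would need to happen as one atomic put-if-absent so two nodes cannot both see a key as new.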

Re: PutDistributedMapCache

2016-01-12 Thread Matthew Clarke
Sudeep, The DistributedMapCache is typically used to prevent the consumption of duplicate data by some of the ingest-type processors (GetHBase, ListHDFS, and ListSFTP). NiFi uses the service to keep a listing of what has been consumed so the same files are not consumed multiple times. The Serv
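For the listing processors, the same idea applies to a whole directory listing on each poll: remember what has already been emitted and only pass along entries not seen before. This sketch follows Matt's description of "a listing of what has been consumed" using an in-memory set; the real processors may persist different state, and the paths are hypothetical.

```python
def list_unconsumed(consumed: set[str], listing: list[str]) -> list[str]:
    """Return paths not seen on earlier polls, in listing order, and mark them consumed."""
    # dict.fromkeys drops repeats within one listing while keeping order.
    fresh = [path for path in dict.fromkeys(listing) if path not in consumed]
    consumed.update(fresh)
    return fresh


state: set[str] = set()
first = list_unconsumed(state, ["/in/a.csv", "/in/b.csv"])
second = list_unconsumed(state, ["/in/a.csv", "/in/b.csv", "/in/c.csv"])
print(first, second)
```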

PutDistributedMapCache

2016-01-12 Thread sudeep mishra
Hi, I can cache some data to be used in a NiFi flow. I can see the processor PutDistributedMapCache in the documentation, which saves key-value pairs in the DistributedMapCache for NiFi, but I do not see any processor to read this data. How can I read data from the DistributedMapCache in my data flow? Thanks
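The missing piece is the "get" half of a put/get pair, and the serialization question raised earlier in the thread fits in the same place: the cache stores bytes, and the caller decides how values become bytes and back. NiFi's Java cache client loosely follows this by taking serializer/deserializer arguments; the Python class below is a hypothetical in-memory stand-in, not that client's API. JSON is shown because it is in the standard library; Avro would need an external library.

```python
import json
from typing import Any, Callable, Optional


class MapCache:
    """Hypothetical in-memory key/value cache that stores raw bytes."""

    def __init__(self) -> None:
        self._store: dict[bytes, bytes] = {}

    def put(self, key: str, value: Any, serialize: Callable[[Any], bytes]) -> None:
        self._store[key.encode("utf-8")] = serialize(value)

    def get(self, key: str, deserialize: Callable[[bytes], Any]) -> Optional[Any]:
        raw = self._store.get(key.encode("utf-8"))
        return None if raw is None else deserialize(raw)


def to_json(value: Any) -> bytes:
    return json.dumps(value).encode("utf-8")


def from_json(raw: bytes) -> Any:
    return json.loads(raw.decode("utf-8"))


cache = MapCache()
cache.put("customer-42", {"name": "Ada", "tier": "gold"}, to_json)  # the "put" step
print(cache.get("customer-42", from_json))                          # the "get" step
```

Keeping serialization outside the cache lets the same store hold JSON, raw flowfile bytes, or any other format the two sides agree on.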