Thanks Joe. I do not have a specific configuration in mind yet, as I am still exploring NiFi. I do think it would be helpful to let users store and retrieve cache values in different formats (JSON, Avro, etc.).
Thanks & Regards,
Sudeep

On Tue, Jan 12, 2016 at 9:15 PM, Joe Percivall <[email protected]> wrote:

> Hello Sudeep,
>
> We are currently lacking a "GetDistributedMapCache" processor that
> corresponds to the "PutDistributedMapCache". I created a ticket[1] and will
> be working on it today. If you have any comments, configuration
> suggestions, etc. please let me know or comment on the ticket.
>
> [1] https://issues.apache.org/jira/browse/NIFI-1382
>
> Joe
> - - - - - -
> *Joseph Percivall*
> linkedin.com/in/Percivall
> e: [email protected]
>
>
> On Tuesday, January 12, 2016 9:46 AM, sudeep mishra <
> [email protected]> wrote:
>
> Thanks Matt.
>
> In my data flow I am expected to perform certain validations on data. I am
> loading some SQLServer data into HDFS using Sqoop (not part of the NiFi
> flow). For each record in the HDFS file I have to query another database
> and then save the validated record again in HDFS, where it will be
> processed by some Spark jobs.
>
> Since I have to query for each record, I was planning to cache the
> database records against which I have to validate the HDFS data. That is
> why I was evaluating the DistributedCacheServer, but it looks like its
> purpose is different. Alternatively, can we integrate Redis or another
> distributed cache with NiFi? I do not see any processor for it.
>
> Appreciate your help.
>
> Thanks & Regards,
>
> Sudeep
>
>
> On Tue, Jan 12, 2016 at 6:59 PM, Matthew Clarke <[email protected]>
> wrote:
>
> Sudeep,
> I was a little off on my second scenario. The DetectDuplicate
> processor uses the distributed cache service all on its own. Files that
> are routed through it are loaded into the cache if they do not already
> exist in the cache. If they do already exist, they are routed to
> duplicate. The PutDistributedMapCache processor was a community
> contribution, and there is currently no processor that makes use of the
> info that it caches.
>
> We should probably build a processor that would make use of the
> data that can be loaded by the PutDistributedMapCache processor. Is there
> a particular use case you are trying to solve where this would be
> applicable?
>
> Thanks,
> Matt
>
> On Tue, Jan 12, 2016 at 8:11 AM, Matthew Clarke <[email protected]>
> wrote:
>
> Sudeep,
> The DistributedMapCache is typically used to prevent the consumption
> of duplicate data by some of the ingest type processors (GetHBase,
> ListHDFS, and ListSFTP). NiFi uses the service to keep a listing of what
> has been consumed so the same files are not consumed multiple times. The
> service can also be used to detect whether duplicate data already exists
> within a NiFi instance or cluster. This would be the scenario where some
> source is pushing data to your NiFi and perhaps pushes the same data more
> than once. You want to catch these duplicates so you can perhaps kick
> them out of your flow. For this you would use the PutDistributedMapCache
> processor to cache all incoming data and then use the DetectDuplicate
> processor to find those duplicates.
>
> Was there a different use case you were looking to solve using the
> distributed cache service?
>
> Thanks,
> Matt
>
> On Tue, Jan 12, 2016 at 4:36 AM, sudeep mishra <[email protected]>
> wrote:
>
> Hi,
>
> I can cache some data to be used in a NiFi flow. I can see the
> processor PutDistributedMapCache in the documentation, which saves
> key-value pairs in the DistributedMapCache for NiFi, but I do not see any
> processor to read this data. How can I read data from the
> DistributedMapCache in my data flow?
>
>
> Thanks & Regards,
>
> Sudeep Shekhar Mishra
>
>
> --
> Thanks & Regards,
>
> Sudeep Shekhar Mishra
>
> +91-9167519029
> [email protected]


--
Thanks & Regards,
Sudeep Shekhar Mishra
+91-9167519029
[email protected]
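
For readers following the thread, the cache behaviour discussed above can be sketched roughly as follows. This is a plain in-memory Python illustration only, not NiFi code: the `MapCache` class, its `put_if_absent` and `get` methods, and the `route` helper are hypothetical names chosen for the sketch. It assumes the duplicate check works as Matt describes (a key is stored if absent, otherwise the item is routed to duplicate) and that a read-side processor would simply look a key up.

```python
class MapCache:
    """Toy in-memory key-value cache (hypothetical stand-in, not the NiFi API)."""

    def __init__(self):
        self._entries = {}

    def put_if_absent(self, key, value):
        """Store value only if key is new. Return True if it was stored."""
        if key in self._entries:
            return False
        self._entries[key] = value
        return True

    def get(self, key):
        """Return the cached value for key, or None if nothing is cached."""
        return self._entries.get(key)


def route(cache, key, value):
    """Mimic the duplicate check: new keys are cached, known keys are duplicates."""
    return "non-duplicate" if cache.put_if_absent(key, value) else "duplicate"


cache = MapCache()
first = route(cache, "record-1", b"payload")   # key not seen before, gets cached
second = route(cache, "record-1", b"payload")  # same key again
looked_up = cache.get("record-1")              # the read path the thread asks for
```

The lookup in the last line is the piece the thread identifies as missing: data can be written to the cache, but no processor reads it back for use elsewhere in the flow.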
