Hello Sudeep,

I put up a patch on the GetDistributedMapCache ticket[1]. Let me know what you think.
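A GetDistributedMapCache processor gives the flow a read side for what PutDistributedMapCache writes. One part of the flow caches reference records by key, and a later step looks them up by key. That is the per-record validation lookup Sudeep describes further down the thread. The Python sketch below only illustrates that put-then-get pattern under stated assumptions. `InMemoryMapCache` is a hypothetical in-memory stand-in, not the NiFi API. It stores raw bytes, so the reader has to know the encoding (UTF-8 JSON in this sketch) to decode a value.

```python
from __future__ import annotations

import json


class InMemoryMapCache:
    """Hypothetical in-memory stand-in for a distributed map cache (not the NiFi API).

    It stores raw bytes and knows nothing about the format of what it holds.
    """

    def __init__(self) -> None:
        self._entries: dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        # Overwrite any existing entry for this key.
        self._entries[key] = value

    def get(self, key: bytes) -> bytes | None:
        # Return None when the key was never cached.
        return self._entries.get(key)


# Writer side: cache a reference record as UTF-8 encoded JSON.
cache = InMemoryMapCache()
record = {"id": "42", "status": "active"}
cache.put(b"42", json.dumps(record).encode("utf-8"))

# Reader side: it must already know the value is UTF-8 JSON in order to decode it.
raw = cache.get(b"42")
restored = json.loads(raw.decode("utf-8")) if raw is not None else None
```

Because the cache only sees bytes, the choice of JSON here is arbitrary. Avro or any other encoding would work the same way, as long as the writer and reader agree on it.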
The PutDistributedMapCache and GetDistributedMapCache processors both work with the data as a byte[], so they should be format-agnostic. That being said, it is up to you to know what is stored there in order to use it later.

[1] https://issues.apache.org/jira/browse/NIFI-1382

Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: [email protected]

On Tuesday, January 12, 2016 11:34 PM, sudeep mishra <[email protected]> wrote:

Thanks Joe.

I do not have a specific configuration as of now, as I am still exploring NiFi. However, I think it would be helpful to let users store and retrieve cache values in different formats, such as JSON or Avro.

Thanks & Regards,
Sudeep

On Tue, Jan 12, 2016 at 9:15 PM, Joe Percivall <[email protected]> wrote:

> Hello Sudeep,
>
> We are currently lacking a "GetDistributedMapCache" processor that corresponds
> to the "PutDistributedMapCache". I created a ticket[1] and will be working on
> it today. If you have any comments, configuration suggestions, etc., please let
> me know or comment on the ticket.
>
> [1] https://issues.apache.org/jira/browse/NIFI-1382
>
> Joe
>
> On Tuesday, January 12, 2016 9:46 AM, sudeep mishra <[email protected]> wrote:
>
> Thanks Matt.
>
> In my data flow I am expected to perform certain validations on the data. I am
> loading some SQL Server data into HDFS using Sqoop (not part of the NiFi flow).
> For each record in the HDFS file I have to query another database and then save
> the validated record back to HDFS, where it will be processed by some Spark jobs.
>
> Since I have to query the database for each record, I was planning to cache the
> database records against which I have to validate the HDFS data. That is why I was
> evaluating the DistributedMapCacheServer, but it looks like its purpose is
> different. Alternatively, can we integrate Redis or another distributed cache
> with NiFi? I do not see any processor for it.
>
> Appreciate your help.
>
> Thanks & Regards,
>
> Sudeep
>
> On Tue, Jan 12, 2016 at 6:59 PM, Matthew Clarke <[email protected]> wrote:
>
>> Sudeep,
>> I was a little off on my second scenario. The DetectDuplicate
>> processor uses the distributed cache service all on its own. Files that are
>> routed through it are loaded into the cache if they do not already exist in
>> the cache; if they do already exist, they are routed to duplicate. The
>> PutDistributedMapCache processor was a community contribution, and there
>> are no processors that make use of the info that it caches.
>>
>> We should probably build a processor that makes use of the data
>> that can be loaded by the PutDistributedMapCache processor. Is there a
>> particular use case you are trying to solve where this would be applicable?
>>
>> Thanks,
>> Matt
>>
>> On Tue, Jan 12, 2016 at 8:11 AM, Matthew Clarke <[email protected]> wrote:
>>
>>> Sudeep,
>>> The DistributedMapCache is typically used to prevent the consumption of
>>> duplicate data by some of the ingest-type processors (GetHBase, ListHDFS,
>>> and ListSFTP). NiFi uses the service to keep a listing of what has been
>>> consumed so the same files are not consumed multiple times. The service can
>>> also be used to detect whether duplicate data already exists within a NiFi
>>> instance or cluster. This would be the scenario where some source is
>>> pushing data to your NiFi and perhaps pushes the same data more than
>>> once. You want to catch these duplicates so you can kick them out
>>> of your flow. For this you would use the PutDistributedMapCache processor to
>>> cache all incoming data and then use the DetectDuplicate processor to find
>>> those duplicates.
>>>
>>> Was there a different use case you were looking to solve using the
>>> distributed cache service?
>>>
>>> Thanks,
>>> Matt
>>>
>>> On Tue, Jan 12, 2016 at 4:36 AM, sudeep mishra <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I can cache some data to be used in a NiFi flow.
>>>> I can see the PutDistributedMapCache processor in the documentation, which saves
>>>> key-value pairs in the DistributedMapCache for NiFi, but I do not see any processor
>>>> to read this data. How can I read data from the DistributedMapCache in my data flow?
>>>>
>>>> Thanks & Regards,
>>>>
>>>> Sudeep Shekhar Mishra
>
> --
>
> Thanks & Regards,
>
> Sudeep Shekhar Mishra
>
> +91-9167519029
> [email protected]
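Matt's description of DetectDuplicate boils down to a put-if-absent check: store the entry if its key is new, otherwise route the data to duplicate. Below is a minimal Python sketch of that check. `InMemoryMapCache`, `put_if_absent`, `route`, the routing labels, and the use of a SHA-256 content hash as the cache key are all assumptions for illustration. None of them is the processor's actual implementation or API.

```python
from __future__ import annotations

import hashlib


class InMemoryMapCache:
    """Hypothetical in-memory stand-in for a distributed map cache (not the NiFi API)."""

    def __init__(self) -> None:
        self._entries: dict[bytes, bytes] = {}

    def put_if_absent(self, key: bytes, value: bytes) -> bool:
        """Store value under key only if key is new; return True if it was stored."""
        if key in self._entries:
            return False
        self._entries[key] = value
        return True


def route(cache: InMemoryMapCache, content: bytes) -> str:
    """Return an illustrative routing label for one piece of incoming data."""
    # Assumption for this sketch: the cache key is a SHA-256 hash of the content.
    key = hashlib.sha256(content).hexdigest().encode("ascii")
    # First sighting: the key gets cached and the data passes through as unique.
    # Any later sighting of the same content is flagged as a duplicate.
    return "unique" if cache.put_if_absent(key, b"") else "duplicate"


cache = InMemoryMapCache()
first = route(cache, b"order-1001")
second = route(cache, b"order-1001")
other = route(cache, b"order-1002")
```

Checking and inserting in a single operation is the point of the design. A separate get followed by a put would let two concurrent writers both see the key as missing. This dict-based stand-in is not thread-safe, so a shared cache would need to perform that check atomically on its side.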
