Thank you very much, Joe. Can you please let me know how I can use the .patch file? I am running NiFi from the binary distribution. Do I need to set up the source code and build it with the patch applied?
Thanks & Regards,

Sudeep

On Wed, Jan 13, 2016 at 9:02 PM, Joe Percivall <[email protected]> wrote:

> Hello Sudeep,
>
> I put up a patch on the GetDistributedMapCache ticket[1]. Let me know what
> you think.
>
> The PutDistributedMapCache processor and GetDistributedMapCache work with
> the data as a byte[] so it should be format agnostic. That being said it
> will be up to you to know what is in there in order to use it later.
>
> [1] https://issues.apache.org/jira/browse/NIFI-1382
>
> Joe
> - - - - - -
> Joseph Percivall
> linkedin.com/in/Percivall
> e: [email protected]
>
> On Tuesday, January 12, 2016 11:34 PM, sudeep mishra
> <[email protected]> wrote:
>
> Thanks Joe.
>
> I do not have a specific configuration in mind as of now, as I am still
> exploring NiFi. Though I think it would be helpful to let users store and
> retrieve the cache values in different formats (JSON, Avro, etc.).
>
> Thanks & Regards,
>
> Sudeep
>
> On Tue, Jan 12, 2016 at 9:15 PM, Joe Percivall <[email protected]>
> wrote:
>
> > Hello Sudeep,
> >
> > We are currently lacking a "GetDistributedMapCache" processor that
> > corresponds to the "PutDistributedMapCache". I created a ticket[1] and
> > will be working on it today. If you have any comments, configuration
> > suggestions, etc. please let me know or comment on the ticket.
> >
> > [1] https://issues.apache.org/jira/browse/NIFI-1382
> >
> > Joe
> > - - - - - -
> > Joseph Percivall
> > linkedin.com/in/Percivall
> > e: [email protected]
> >
> > On Tuesday, January 12, 2016 9:46 AM, sudeep mishra
> > <[email protected]> wrote:
> >
> > Thanks Matt.
> >
> > In my data flow I am expected to perform certain validations on data. I
> > am loading some SQLServer data into HDFS using Sqoop (not part of the
> > NiFi flow). For each record in the HDFS file I have to query another
> > database and then save the validated record again in HDFS, where it will
> > be processed by some Spark jobs.
> >
> > Since I have to run a query for each record, I was planning to cache the
> > database records against which I have to validate the HDFS records. That
> > is why I was evaluating the DistributedCacheServer, but it looks like its
> > purpose is different. Alternatively, can we integrate Redis or another
> > distributed cache with NiFi? I do not see any processor for it.
> >
> > Appreciate your help.
> >
> > Thanks & Regards,
> >
> > Sudeep
> >
> > On Tue, Jan 12, 2016 at 6:59 PM, Matthew Clarke
> > <[email protected]> wrote:
> >
> > Sudeep,
> >> I was a little off on my second scenario. The DetectDuplicate
> >> processor uses the distributed cache service all on its own. Files that
> >> are routed through it are loaded into the cache if they do not already
> >> exist in the cache. If they do already exist, they are routed to
> >> duplicate. The PutDistributedMapCache processor was a community
> >> contribution, and there is no processor that makes use of the info
> >> that it caches.
> >>
> >> We should probably build a processor that would make use of the data
> >> that can be loaded by the PutDistributedMapCache processor. Is there a
> >> particular use case you are trying to solve where this would be
> >> applicable?
> >>
> >> Thanks,
> >> Matt
> >>
> >> On Tue, Jan 12, 2016 at 8:11 AM, Matthew Clarke
> >> <[email protected]> wrote:
> >>
> >> Sudeep,
> >>> The DistributedMapCache is typically used to prevent the consumption
> >>> of duplicate data by some of the ingest-type processors (GetHBase,
> >>> ListHDFS, and ListSFTP). NiFi uses the service to keep a listing of
> >>> what has been consumed so the same files are not consumed multiple
> >>> times. The service can also be used to detect if duplicate data
> >>> already exists within a NiFi instance or cluster. This would be the
> >>> scenario where some source is pushing data to your NiFi and perhaps
> >>> they push the same data more than once. You want to catch these
> >>> duplicates so you can perhaps kick them out of your flow. For this you
> >>> would use the PutDistributedMapCache processor to cache all incoming
> >>> data and then use the DetectDuplicate processor to find those
> >>> duplicates.
> >>>
> >>> Was there a different use case you were looking to solve using the
> >>> distributed cache service?
> >>>
> >>> Thanks,
> >>> Matt
> >>>
> >>> On Tue, Jan 12, 2016 at 4:36 AM, sudeep mishra
> >>> <[email protected]> wrote:
> >>>
> >>> Hi,
> >>>>
> >>>> I can cache some data to be used in a NiFi flow. I can see the
> >>>> processor PutDistributedMapCache in the documentation, which saves
> >>>> key-value pairs in DistributedMapCache for NiFi, but I do not see any
> >>>> processor to read this data. How can I read data from
> >>>> DistributedMapCache in my data flow?
> >>>>
> >>>> Thanks & Regards,
> >>>>
> >>>> Sudeep Shekhar Mishra

--
Thanks & Regards,

Sudeep Shekhar Mishra

+91-9167519029
[email protected]
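
Joe's note earlier in the thread, that the cache processors handle values as a byte[] and are therefore format agnostic, can be illustrated with a minimal sketch. This is not the NiFi API: the `cache` dict and the `put`/`get` helpers are hypothetical stand-ins for a byte-oriented key-value cache, and JSON is only one assumed choice of format. The point is that the cache never interprets the bytes, so whoever reads a value has to know how it was encoded.

```python
import json
from typing import Optional

# Hypothetical stand-in for a byte-oriented distributed map cache.
# It stores and returns raw bytes and knows nothing about their format.
cache = {}

def put(key: str, value: bytes) -> None:
    """Store raw bytes under a key (illustrative helper, not a NiFi call)."""
    cache[key] = value

def get(key: str) -> Optional[bytes]:
    """Return the raw bytes for a key, or None if the key is absent."""
    return cache.get(key)

# The writer picks a format (JSON here, by assumption) and encodes to bytes.
record = {"id": 42, "status": "validated"}
put("record-42", json.dumps(record).encode("utf-8"))

# The reader must already know the value is UTF-8 JSON in order to decode it.
raw = get("record-42")
decoded = json.loads(raw.decode("utf-8"))
```

The same two helpers would accept Avro-encoded or any other bytes unchanged; only the encode and decode steps on either side would differ.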
