Re: PutDistributedMapCache

Joe Percivall Tue, 12 Jan 2016 07:46:48 -0800

Hello Sudeep,
We are currently lacking a "GetDistributedMapCache" processor that corresponds 
to the "PutDistributedMapCache". I created a ticket[1] and will be working on 
it today. If you have any comments, configuration suggestions, etc. please let 
me know or comment on the ticket.
[1] https://issues.apache.org/jira/browse/NIFI-1382

Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: [email protected]


    On Tuesday, January 12, 2016 9:46 AM, sudeep mishra 
<[email protected]> wrote:
 

 Thanks Matt.
In my data flow I am expected to perform certain validations on data. I am 
loading some SQL Server data into HDFS using Sqoop (not part of the NiFi flow). 
For each record in the HDFS file I have to query another database and then save 
the validated record back to HDFS, where it will be processed by some Spark jobs.
Since I have to query for each record, I was planning to cache the database 
records against which I have to validate the HDFS data. That is why I was 
evaluating the DistributedCacheServer, but it looks like its purpose is 
different. Alternatively, can we integrate Redis or another distributed cache 
with NiFi? I do not see any processor for it.
Appreciate your help.
Thanks & Regards,
Sudeep
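
The approach Sudeep describes (load the reference records once, then validate each HDFS record by key lookup instead of issuing one query per record) can be sketched as follows. This is only an illustration under stated assumptions: a plain Python dict stands in for the distributed cache, and the function names and the "id"/"status" fields are hypothetical, not part of NiFi or of Sudeep's actual data.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, str]


def build_lookup_cache(reference_rows: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Load (key, expected_value) reference rows into an in-memory map once.

    Assumption: the reference data fits in memory; a real flow would put
    these entries into a shared cache service instead of a local dict.
    """
    return {key: value for key, value in reference_rows}


def validate_records(
    records: Iterable[Record], cache: Dict[str, str]
) -> Tuple[List[Record], List[Record]]:
    """Split records into (valid, invalid) using cache lookups only."""
    valid: List[Record] = []
    invalid: List[Record] = []
    for record in records:
        # Hypothetical rule: a record is valid when its id is known and
        # its status matches the cached reference value.
        expected = cache.get(record["id"])
        if expected is not None and expected == record["status"]:
            valid.append(record)
        else:
            invalid.append(record)
    return valid, invalid
```

The point of the sketch is that the second database is read once to fill the cache, and every per-record check after that is a key lookup.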

On Tue, Jan 12, 2016 at 6:59 PM, Matthew Clarke <[email protected]> 
wrote:

Sudeep,
I was a little off on my second scenario. The DetectDuplicate processor uses 
the distributed cache service all on its own. Files that are routed through it 
are loaded into the cache if they do not already exist in the cache. If they do 
already exist, they are routed to duplicate. The PutDistributedMapCache 
processor was a community contribution, and there is currently no processor 
that makes use of the info that it caches.
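
The behavior described above (cache the entry if it is absent, otherwise route to duplicate) amounts to a put-if-absent check. A minimal sketch, assuming a local dict in place of the cache service; the class, method, and relationship names here are hypothetical and are not NiFi's API:

```python
class InMemoryMapCache:
    """Stand-in for a shared map cache (assumption: single process only)."""

    def __init__(self) -> None:
        self._entries = {}

    def put_if_absent(self, key: str, value: str) -> bool:
        """Store the entry and return True only if the key was not cached yet."""
        if key in self._entries:
            return False
        self._entries[key] = value
        return True


def route(cache: InMemoryMapCache, identifier: str) -> str:
    """Return the name of the relationship a file would be routed to."""
    if cache.put_if_absent(identifier, "seen"):
        return "non-duplicate"
    return "duplicate"
```

The first file with a given identifier is cached and passes through; any later file with the same identifier is routed to duplicate.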

We should probably build a processor that would make use of the data that can 
be loaded by the PutDistributedMapCache processor. Is there a particular use 
case you are trying to solve where this would be applicable?
Thanks,
Matt

On Tue, Jan 12, 2016 at 8:11 AM, Matthew Clarke <[email protected]> 
wrote:

Sudeep,
The DistributedMapCache is typically used to prevent the consumption of 
duplicate data by some of the ingest-type processors (GetHBase, ListHDFS, and 
ListSFTP). NiFi uses the service to keep a listing of what has been consumed so 
the same files are not consumed multiple times. The service can also be used to 
detect whether duplicate data already exists within a NiFi instance or cluster. 
This would be the scenario where some source is pushing data to your NiFi and 
perhaps pushes the same data more than once. You want to catch these duplicates 
so you can kick them out of your flow. For this you would use the 
PutDistributedMapCache processor to cache all incoming data and then use the 
DetectDuplicate processor to find those duplicates.

Was there a different use case you were looking to solve using the distributed 
cache service?
Thanks,
Matt

On Tue, Jan 12, 2016 at 4:36 AM, sudeep mishra <[email protected]> wrote:

Hi,
I can cache some data to be used in a NiFi flow. I can see the processor 
PutDistributedMapCache in the documentation, which saves key-value pairs in the 
DistributedMapCache for NiFi, but I do not see any processor to read this data. 
How can I read data from the DistributedMapCache in my data flow?


Thanks & Regards,
Sudeep Shekhar Mishra








-- 
Thanks & Regards,
Sudeep Shekhar Mishra
[email protected]
