Hello Sudeep,
We are currently lacking a "GetDistributedMapCache" processor that corresponds
to the "PutDistributedMapCache". I created a ticket[1] and will be working on
it today. If you have any comments, configuration suggestions, etc. please let
me know or comment on the ticket.
[1] https://issues.apache.org/jira/browse/NIFI-1382
Joe
- - - - - -
Joseph Percivall
linkedin.com/in/Percivall
e: [email protected]
On Tuesday, January 12, 2016 9:46 AM, sudeep mishra
<[email protected]> wrote:
Thanks Matt.
In my data flow I am expected to perform certain validations on data. I am
loading some SQL Server data into HDFS using Sqoop (not part of the NiFi flow).
For each record in the HDFS file I have to query another database and then save
the validated record back to HDFS, where it will be processed by some Spark jobs.
Since I have to query for each record, I was planning to cache the database
records against which I have to validate the HDFS records. That is why I was
evaluating the DistributedCacheServer, but it looks like its purpose is
different. Alternatively, can we integrate Redis or another distributed cache
with NiFi? I do not see any processor for it.
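To make the caching idea above concrete, here is a minimal sketch in Python. It assumes the reference records fit in memory and are keyed by an `id` field; the function names, field names, and sample rows are all hypothetical and are not taken from NiFi or from the actual data.

```python
def build_reference_cache(reference_rows):
    """Index reference rows by id so each check is a dict lookup, not a query."""
    return {row["id"]: row for row in reference_rows}


def validate(records, cache):
    """Split records into those whose id is in the cache and those whose id is not."""
    valid, invalid = [], []
    for record in records:
        (valid if record["id"] in cache else invalid).append(record)
    return valid, invalid


# Hypothetical sample data standing in for the database and the HDFS file.
reference_rows = [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]
records = [{"id": 1, "amount": 10}, {"id": 3, "amount": 7}]

cache = build_reference_cache(reference_rows)  # loaded once, up front
valid, invalid = validate(records, cache)
print(valid, invalid)
```

The point of the sketch is only that the database is read once to fill the cache, after which every record is checked locally.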
Appreciate your help.
Thanks & Regards,
Sudeep
On Tue, Jan 12, 2016 at 6:59 PM, Matthew Clarke <[email protected]>
wrote:
Sudeep, I was a little off on my second scenario. The DetectDuplicate
processor uses the distributed cache service all on its own. Files that are
routed through it are loaded into the cache if they do not already exist in the
cache. If they do already exist, they are routed to duplicate. The
PutDistributedMapCache processor was a community contribution, and there is
currently no processor that makes use of the info that it caches.
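The duplicate check described above amounts to a put-if-absent operation. A minimal sketch in Python follows, with an in-memory dict standing in for the distributed cache; the class, method, and function names are hypothetical and are not NiFi APIs, and the route labels are only illustrative.

```python
class InMemoryMapCache:
    """Hypothetical stand-in for a distributed map cache; not a NiFi API."""

    def __init__(self):
        self._entries = {}

    def put_if_absent(self, key, value):
        """Store value under key only if key is new; return True if it was stored."""
        if key in self._entries:
            return False
        self._entries[key] = value
        return True


def route(cache, identifier):
    """Return the route a file with this identifier would take."""
    return "non-duplicate" if cache.put_if_absent(identifier, True) else "duplicate"


cache = InMemoryMapCache()
routes = [route(cache, name) for name in ["a.csv", "b.csv", "a.csv"]]
print(routes)
```

The first time an identifier is seen it is added to the cache and passed along; any later file with the same identifier is routed to duplicate.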
We should probably build a processor that would make use of the data
that can be loaded by the PutDistributedMapCache processor. Is there a
particular use case you are trying to solve where this would be applicable?
Thanks,
Matt
On Tue, Jan 12, 2016 at 8:11 AM, Matthew Clarke <[email protected]>
wrote:
Sudeep, the DistributedMapCache is typically used to prevent the consumption
of duplicate data by some of the ingest-type processors (GetHBase, ListHDFS,
and ListSFTP). NiFi uses the service to keep a listing of what has been
consumed so the same files are not consumed multiple times. The service can
also be used to detect if duplicate data already exists within a NiFi instance
or cluster. This would be the scenario where some source is pushing data to
your NiFi and perhaps they push the same data more than once. You want to catch
these duplicates so you can perhaps kick them out of your flow. For this you
would use the PutDistributedMapCache processor to cache all incoming data and
then use the DetectDuplicate processor to find those duplicates.
Was there a different use case you were looking to solve using the
Distributed cache service?
Thanks,
Matt
On Tue, Jan 12, 2016 at 4:36 AM, sudeep mishra <[email protected]> wrote:
Hi,
I can cache some data to be used in a NiFi flow. I can see the processor
PutDistributedMapCache in the documentation, which saves key-value pairs in
DistributedMapCache for NiFi, but I do not see any processor to read this data.
How can I read data from DistributedMapCache in my data flow?
Thanks & Regards,
Sudeep Shekhar Mishra