Re: PutDistributedMapCache

sudeep mishra Tue, 12 Jan 2016 20:35:26 -0800

Thanks Joe.

I do not have a specific configuration in mind yet, as I am still exploring
NiFi. However, I think it would be helpful to let users store and retrieve
cache values in different formats, such as JSON, Avro, etc.


Thanks & Regards,

Sudeep



On Tue, Jan 12, 2016 at 9:15 PM, Joe Percivall <[email protected]>
wrote:

> Hello Sudeep,
>
> We are currently lacking a "GetDistributedMapCache" processor that
> corresponds to the "PutDistributedMapCache". I created a ticket [1] and will
> be working on it today. If you have any comments, configuration
> suggestions, etc., please let me know or comment on the ticket.
>
> [1] https://issues.apache.org/jira/browse/NIFI-1382
>
> Joe
> - - - - - -
> *Joseph Percivall*
> linkedin.com/in/Percivall
> e: [email protected]
>
>
>
> On Tuesday, January 12, 2016 9:46 AM, sudeep mishra <
> [email protected]> wrote:
>
>
> Thanks Matt.
>
> In my data flow I need to perform certain validations on the data. I am
> loading some SQL Server data into HDFS using Sqoop (not part of the NiFi
> flow). For each record in the HDFS file I have to query another database and
> then save the validated record back to HDFS, where it will be processed by
> some Spark jobs.
>
> Since I have to query the database for each record, I was planning to cache
> the database records against which I validate the HDFS data, which is why I
> was evaluating the DistributedMapCacheServer. But it looks like its purpose
> is different. Alternatively, can we integrate Redis or another distributed
> cache with NiFi? I do not see any processor for it.
>
> Appreciate your help.
>
> Thanks & Regards,
>
> Sudeep
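The caching plan described in the message above (load the reference records once, then validate each HDFS record with a local lookup instead of one database query per record) can be sketched in plain Java. This is a minimal illustration only, assuming the reference data fits in memory as a `Map` keyed by the field being validated; the class and method names are hypothetical, and the sketch uses no NiFi or Redis API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: validate records against reference data cached in memory,
// instead of issuing one database query per record.
class ReferenceCacheSketch {
    // Reference rows, keyed by the field each record is validated against (assumption).
    private final Map<String, String> referenceByKey;

    ReferenceCacheSketch(Map<String, String> referenceRows) {
        // Defensive copy: the cache is loaded once and afterwards only read.
        this.referenceByKey = new HashMap<>(referenceRows);
    }

    // A record counts as valid if its key is present in the cached reference data.
    boolean isValid(String recordKey) {
        return referenceByKey.containsKey(recordKey);
    }

    // Keep only the valid records, preserving their original order.
    List<String> validRecords(List<String> recordKeys) {
        return recordKeys.stream()
                .filter(this::isValid)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ReferenceCacheSketch cache =
                new ReferenceCacheSketch(Map.of("A1", "active", "B2", "active"));
        System.out.println(cache.validRecords(List.of("A1", "C3", "B2"))); // [A1, B2]
    }
}
```

Loading the reference data once trades memory for far fewer database round trips; if the data does not fit on a single node, an external cache such as the Redis option raised above would be one way to extend the idea.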
>
>
> On Tue, Jan 12, 2016 at 6:59 PM, Matthew Clarke <[email protected]>
> wrote:
>
> Sudeep,
>        I was a little off on my second scenario. The DetectDuplicate
> processor uses the DistributedMapCache service entirely on its own. Files
> that are routed through it are loaded into the cache if they do not already
> exist in the cache; if they do already exist, they are routed to duplicate.
> The PutDistributedMapCache processor was a community contribution, and no
> processors currently make use of the info that it caches.
>
>        We should probably build a processor that would make use of the
> data that can be loaded by the PutDistributedMapCache processor. Is there a
> particular use case you are trying to solve where this would be applicable?
>
> Thanks,
> Matt
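The DetectDuplicate behaviour described in the message above (cache an entry if it is absent, otherwise route the file to duplicate) can be sketched as follows. This is a hypothetical plain-Java illustration, assuming an in-memory `ConcurrentHashMap` stands in for the DistributedMapCache service; it is not NiFi's cache client API, and the `Route` names are illustrative only.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch of check-then-cache duplicate routing.
class DuplicateRouterSketch {
    // Illustrative routing outcomes, not NiFi relationship objects.
    enum Route { NON_DUPLICATE, DUPLICATE }

    // Stand-in for the distributed cache (assumption): an in-memory concurrent map.
    private final ConcurrentMap<String, Boolean> seen = new ConcurrentHashMap<>();

    // putIfAbsent returns null only when the key was not already present,
    // so the existence check and the insert happen in one atomic call.
    Route route(String cacheKey) {
        return seen.putIfAbsent(cacheKey, Boolean.TRUE) == null
                ? Route.NON_DUPLICATE
                : Route.DUPLICATE;
    }

    public static void main(String[] args) {
        DuplicateRouterSketch router = new DuplicateRouterSketch();
        System.out.println(router.route("file-a")); // NON_DUPLICATE
        System.out.println(router.route("file-b")); // NON_DUPLICATE
        System.out.println(router.route("file-a")); // DUPLICATE
    }
}
```

Using `putIfAbsent` rather than a separate `containsKey` followed by `put` means two concurrent lookups of the same key cannot both be routed as non-duplicates.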
>
> On Tue, Jan 12, 2016 at 8:11 AM, Matthew Clarke <[email protected]>
> wrote:
>
> Sudeep,
>     The DistributedMapCache is typically used to prevent the consumption
> of duplicate data by some of the ingest-type processors (GetHBase,
> ListHDFS, and ListSFTP). NiFi uses the service to keep a listing of what
> has been consumed so the same files are not consumed multiple times. The
> service can also be used to detect whether duplicate data already exists
> within a NiFi instance or cluster. This would be the scenario where some
> source is pushing data to your NiFi and perhaps pushes the same data more
> than once. You want to catch these duplicates so you can kick them out of
> your flow. For this you would use the PutDistributedMapCache processor to
> cache all incoming data and then use the DetectDuplicate processor to find
> those duplicates.
>
>     Was there a different use case you were looking to solve using the
> DistributedMapCache service?
>
> Thanks,
> Matt
>
> On Tue, Jan 12, 2016 at 4:36 AM, sudeep mishra <[email protected]>
> wrote:
>
> Hi,
>
> I want to cache some data to be used in my NiFi flow. I can see the
> PutDistributedMapCache processor in the documentation, which saves key-value
> pairs in the DistributedMapCache for NiFi, but I do not see any processor to
> read this data. How can I read data from the DistributedMapCache in my data flow?
>
>
> Thanks & Regards,
>
> Sudeep Shekhar Mishra
>
>
>
>
>
>
> --
> Thanks & Regards,
>
> Sudeep Shekhar Mishra
>
> +91-9167519029
> [email protected]
>
>
>


-- 
Thanks & Regards,

Sudeep Shekhar Mishra

+91-9167519029
[email protected]
