Re: PutDistributedMapCache

sudeep mishra Wed, 13 Jan 2016 07:57:50 -0800

Thank you very much Joe.

Can you please let me know how I can use the .patch file? I am using NiFi
from the binary distribution. Do I need to set up the source code and build
it with the patch applied?


Thanks & Regards,

Sudeep

On Wed, Jan 13, 2016 at 9:02 PM, Joe Percivall <[email protected]>
wrote:

> Hello Sudeep,
>
> I put up a patch on the GetDistributedMapCache ticket[1]. Let me know what
> you think.
>
> The PutDistributedMapCache processor and GetDistributedMapCache work with
> the data as a byte[] so it should be format agnostic. That being said it
> will be up to you to know what is in there in order to use it later.
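
To illustrate the point about byte[] values, here is a minimal sketch in
Python (not NiFi code). A plain dict stands in for the cache, and the helper
names put_json and get_json are hypothetical. The cache only ever sees bytes,
so the writer and the reader have to agree on the encoding, which is UTF-8
JSON in this example.

```python
import json

# A plain dict stands in for the distributed map cache (an assumption for
# illustration only; the real cache is a NiFi controller service).
cache = {}


def put_json(key, record):
    """Serialize the record to UTF-8 JSON bytes before caching it."""
    cache[key.encode("utf-8")] = json.dumps(record).encode("utf-8")


def get_json(key):
    """Return the decoded record, or None if the key is absent."""
    raw = cache.get(key.encode("utf-8"))
    if raw is None:
        return None
    # The reader has to know the value was written as UTF-8 JSON.
    return json.loads(raw.decode("utf-8"))


put_json("customer-42", {"name": "Ada", "active": True})
record = get_json("customer-42")
```

Swapping JSON for Avro or any other format would only change the two
encode/decode steps; the cached value would still be opaque bytes.
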
>
> [1] https://issues.apache.org/jira/browse/NIFI-1382
>
> Joe
> - - - - - -
> Joseph Percivall
> linkedin.com/in/Percivall
> e: [email protected]
>
>
>
> On Tuesday, January 12, 2016 11:34 PM, sudeep mishra <
> [email protected]> wrote:
>
>
>
> Thanks Joe.
>
> I do not have a specific configuration in mind yet, as I am still exploring
> NiFi. I do think it would be helpful to let users store and retrieve the
> cache values in different formats such as JSON, Avro, etc.
>
> Thanks & Regards,
>
> Sudeep
>
>
>
>
>
> On Tue, Jan 12, 2016 at 9:15 PM, Joe Percivall <[email protected]>
> wrote:
>
> Hello Sudeep,
> >
> >
> >We are currently lacking a "GetDistributedMapCache" processor that
> corresponds to the "PutDistributedMapCache". I created a ticket[1] and will
> be working on it today. If you have any comments, configuration
> suggestions, etc. please let me know or comment on the ticket.
> >
> >
> >[1] https://issues.apache.org/jira/browse/NIFI-1382
> >
> >Joe
> >- - - - - -
> >Joseph Percivall
> >linkedin.com/in/Percivall
> >e: [email protected]
> >
> >
> >
> >
> >
> >On Tuesday, January 12, 2016 9:46 AM, sudeep mishra <
> [email protected]> wrote:
> >
> >
> >
> >Thanks Matt.
> >
> >
> >In my data flow I am expected to perform certain validations on data. I
> am loading some SQLServer data into HDFS using Sqoop (not part of the NiFi
> flow). For each record in the HDFS file I have to query another database and
> then save the validated record again in HDFS, where it will be processed by
> some Spark jobs.
> >
> >
> >Since I have to run a query for each record, I was planning to cache the
> database records against which I have to validate the HDFS data. That is why
> I was evaluating the DistributedCacheServer, but it looks like its purpose is
> different. Alternatively, can we integrate Redis or another distributed
> cache with NiFi? I do not see any processor for it.
> >
> >
> >Appreciate your help.
> >
> >
> >Thanks & Regards,
> >
> >
> >Sudeep
> >
> >
> >
> >
> >On Tue, Jan 12, 2016 at 6:59 PM, Matthew Clarke <
> [email protected]> wrote:
> >
> >Sudeep,
> >>       I was a little off on my second scenario.  The DetectDuplicate
> processor uses the distributed cache service all on its own. Files that are
> routed through it are loaded into the cache if they do not already exist in
> the cache.  If they do already exist, they are routed to duplicate.  The
> PutDistributedMapCache processor was a community contribution, and there
> are currently no processors that make use of the info that it caches.
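
The check-then-cache behaviour described for DetectDuplicate can be sketched
in a few lines of Python. This is only an illustration of the idea, not the
processor's implementation: a dict stands in for the cache service, and the
function name route and the returned labels are assumptions.

```python
# A dict stands in for the cache service (an assumption for illustration).
seen = {}


def route(identifier):
    """Label an identifier 'duplicate' if it is already cached; otherwise
    cache it and label it 'non-duplicate'."""
    if identifier in seen:
        return "duplicate"
    seen[identifier] = True
    return "non-duplicate"


# The third item repeats the first, so only it is flagged as a duplicate.
results = [route(name) for name in ["a.csv", "b.csv", "a.csv"]]
```

Because the same step both checks and populates the cache, no separate "put"
is needed for this use case.
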
> >>
> >>       We should probably build a processor that would make use of the
> data that can be loaded by the PutDistributedMapCache processor.  Is there a
> particular use case you are trying to solve where this would be applicable?
> >>
> >>
> >>Thanks,
> >>Matt
> >>
> >>
> >>On Tue, Jan 12, 2016 at 8:11 AM, Matthew Clarke <
> [email protected]> wrote:
> >>
> >>Sudeep,
> >>>    The DistributedMapCache is typically used to prevent the
> consumption of duplicate data by some of the ingest-type processors
> (GetHBase, ListHDFS, and ListSFTP).  NiFi uses the service to keep a
> listing of what has been consumed so the same files are not consumed
> multiple times. The service can also be used to detect if duplicate data
> already exists within a NiFi instance or cluster. This would be the
> scenario where some source is pushing data to your NiFi and perhaps they
> push the same data more than once. You want to catch these duplicates so
> you can perhaps kick them out of your flow. For this you would use the
> PutDistributedMapCache processor to cache all incoming data and then use the
> DetectDuplicate processor to find those duplicates.
> >>>
> >>>    Was there a different use case you were looking to solve using the
> Distributed cache service?
> >>>
> >>>
> >>>Thanks,
> >>>Matt
> >>>
> >>>
> >>>On Tue, Jan 12, 2016 at 4:36 AM, sudeep mishra <
> [email protected]> wrote:
> >>>
> >>>Hi,
> >>>>
> >>>>
> >>>>I can cache some data to be used in a NiFi flow. I can see the processor
> PutDistributedMapCache in the documentation, which saves key-value pairs in
> the DistributedMapCache for NiFi, but I do not see any processor to read this
> data. How can I read data from the DistributedMapCache in my data flow?
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>Thanks & Regards,
> >>>>
> >>>>
> >>>>Sudeep Shekhar Mishra
> >>>>
> >>>>
> >>>
> >>
> >
> >
> >
> >--
> >
> >Thanks & Regards,
> >
> >
> >Sudeep Shekhar Mishra
> >
> >
> >+91-9167519029
> >[email protected]
> >
> >
>
>
> --
>
> Thanks & Regards,
>
> Sudeep Shekhar Mishra
>
> +91-9167519029
> [email protected]
>



-- 
Thanks & Regards,

Sudeep Shekhar Mishra

+91-9167519029
[email protected]
