Hello Sudeep, 

I put up a patch on the GetDistributedMapCache ticket[1]. Let me know what you 
think.

The PutDistributedMapCache and GetDistributedMapCache processors work with the 
data as a byte[], so they should be format agnostic. That said, it will be up 
to you to know what is in there in order to use it later.
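
As a rough illustration of what "format agnostic" means here, the sketch below 
uses a plain in-memory dictionary as a stand-in for the cache. It is not the 
NiFi API: the `ByteCache` class, its `put` and `get` methods, and the key 
"record-42" are hypothetical names chosen for the example. The cache only ever 
sees bytes; the writer picks the format (JSON here) and the reader has to know 
that in order to decode the value.

```python
import json


class ByteCache:
    """Hypothetical in-memory stand-in for a byte-oriented cache (not NiFi code)."""

    def __init__(self):
        self._entries = {}

    def put(self, key, value):
        # The cache stores opaque bytes and never inspects the format.
        if not isinstance(value, bytes):
            raise TypeError("cache values must be bytes")
        self._entries[key] = value

    def get(self, key):
        # Returns the raw bytes, or None when the key is absent.
        return self._entries.get(key)


cache = ByteCache()

# Writer side: chooses JSON and serializes to bytes before caching.
record = {"id": 42, "status": "valid"}
cache.put("record-42", json.dumps(record).encode("utf-8"))

# Reader side: must already know the value is UTF-8 encoded JSON.
raw = cache.get("record-42")
decoded = json.loads(raw.decode("utf-8"))
```

The same cache could hold Avro or any other encoding without changes, since 
only the writer and the reader agree on the format.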

[1] https://issues.apache.org/jira/browse/NIFI-1382
 
Joe
- - - - - - 
Joseph Percivall
linkedin.com/in/Percivall
e: [email protected]



On Tuesday, January 12, 2016 11:34 PM, sudeep mishra <[email protected]> 
wrote:



Thanks Joe.

I do not have a specific configuration as of now, as I am still exploring NiFi. 
However, I think it would be helpful to let users store and retrieve the cache 
values in different formats (JSON, Avro, etc.).

Thanks & Regards,

Sudeep





On Tue, Jan 12, 2016 at 9:15 PM, Joe Percivall <[email protected]> wrote:

Hello Sudeep,
>
>
>We are currently lacking a "GetDistributedMapCache" processor that corresponds 
>to the "PutDistributedMapCache". I created a ticket[1] and will be working on 
>it today. If you have any comments, configuration suggestions, etc. please let 
>me know or comment on the ticket.
>
>
>[1] https://issues.apache.org/jira/browse/NIFI-1382
> 
>Joe
>- - - - - - 
>Joseph Percivall
>linkedin.com/in/Percivall
>e: [email protected]
>
>
>
>
>
>On Tuesday, January 12, 2016 9:46 AM, sudeep mishra <[email protected]> 
>wrote:
>
>
>
>Thanks Matt.
>
>
>In my data flow I am expected to perform certain validations on data. I am 
>loading some SQL Server data into HDFS using Sqoop (not part of the NiFi flow). 
>For each record in the HDFS file I have to query another database and then save 
>the validated record again in HDFS, where it will be processed by some Spark jobs.
>
>
>Since I have to query for each record, I was planning to cache the 
>database records against which I have to validate the HDFS records. That is 
>why I was evaluating the DistributedCacheServer, but it looks like its purpose 
>is different. Alternatively, can we integrate Redis or another distributed 
>cache with NiFi? I do not see any processor for it.
>
>
>Appreciate your help.
>
>
>Thanks & Regards,
>
>
>Sudeep
>
>
>
>
>On Tue, Jan 12, 2016 at 6:59 PM, Matthew Clarke <[email protected]> 
>wrote:
>
>Sudeep,
>>       I was a little off on my second scenario.  The DetectDuplicate 
>> processor uses the distributed cache service all on its own. Files that are 
>> routed through it are loaded into the cache if they do not already exist in 
>> the cache.  If they do already exist, they are routed to duplicate.  The 
>> PutDistributedMapCache processor was a community contribution, and there 
>> is currently no processor that makes use of the info that it caches.
>>
>>       We should probably build a processor that would make use of the data 
>> that can be loaded by the PutDistributedMapCache processor.  Is there a 
>> particular use case you are trying to solve where this would be applicable?
>>
>>
>>Thanks,
>>Matt
>>
>>
>>On Tue, Jan 12, 2016 at 8:11 AM, Matthew Clarke <[email protected]> 
>>wrote:
>>
>>Sudeep,
>>>    The DistributedMapCache is typically used to prevent the consumption of 
>>> duplicate data by some of the ingest-type processors (GetHBase, ListHDFS, 
>>> and ListSFTP).  NiFi uses the service to keep a listing of what has been 
>>> consumed so the same files are not consumed multiple times. The service can 
>>> also be used to detect whether duplicate data already exists within a NiFi 
>>> instance or cluster. This would be the scenario where some source is 
>>> pushing data to your NiFi and perhaps pushes the same data more than 
>>> once. You want to catch these duplicates so you can perhaps kick them out 
>>> of your flow. For this you would use the PutDistributedMapCache processor to 
>>> cache all incoming data and then use the DetectDuplicate processor to find 
>>> those duplicates.
>>>
>>>    Was there a different use case you were looking to solve using the 
>>> Distributed cache service?
>>>
>>>
>>>Thanks,
>>>Matt
>>>
>>>
>>>On Tue, Jan 12, 2016 at 4:36 AM, sudeep mishra <[email protected]> 
>>>wrote:
>>>
>>>Hi,
>>>>
>>>>
>>>>I can cache some data to be used in a NiFi flow. I can see the processor 
>>>>PutDistributedMapCache in the documentation, which saves key-value pairs in 
>>>>DistributedMapCache for NiFi, but I do not see any processor to read this 
>>>>data. How can I read data from DistributedMapCache in my data flow?
>>>>
>>>>
>>>>
>>>>
>>>>Thanks & Regards,
>>>>
>>>>
>>>>Sudeep Shekhar Mishra
>>>>
>>>>
>>>
>>
>
>
>
>-- 
>
>Thanks & Regards,
>
>
>Sudeep Shekhar Mishra
>
>
>+91-9167519029
>[email protected]
>
>


-- 

Thanks & Regards,

Sudeep Shekhar Mishra

+91-9167519029
[email protected]
