Re: How to proper use DistributedCacheServer ?

2021-02-15 Thread Steven Matison
Mike Thomsen did some work on a Cassandra-backed implementation of
DistributedMapCache:

https://issues.apache.org/jira/browse/NIFI-7821

I am excited to see this coming in 1.13, and I will be using it for a few use
cases here with both Cassandra and Astra (Cassandra as a service).



Steven Matison | Data Architect | datastax.com




Re: How to proper use DistributedCacheServer ?

2021-02-15 Thread Pierre Villard
A Hazelcast implementation (embedded in NiFi) will be released with NiFi
1.13. It comes with some limitations that you need to be aware of (for
example, the data is not persisted across a cluster restart), but I believe
all of this is documented in the "additional details" of the new controller
services.

Thanks,
Pierre



Re: How to proper use DistributedCacheServer ?

2021-02-12 Thread Jorge Machado
Thanks to all!
I was thinking about a Hazelcast hashmap.





Re: How to proper use DistributedCacheServer ?

2021-02-12 Thread Chris Sampson
I'm pretty sure they don't sync the data; you need to use an external
implementation, such as Redis.

The DistributedMapCacheServer is a reference implementation, but there are
alternatives: see the list of available implementations in the
"Distributed Cache Service" property of the PutDistributedMapCache
processor docs [1].

You can also implement your own by extending the DistributedCacheServer
class.


[1]:
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.12.1/org.apache.nifi.processors.standard.PutDistributedMapCache/index.html

---
*Chris Sampson*
IT Consultant
chris.samp...@naimuri.com





Re: How to proper use DistributedCacheServer ?

2021-02-12 Thread Bryan Bende
If you use the provided DMC client and server, then you have to point the
client at one of the servers (not localhost), and it is a single point of
failure.

There are other client implementations that use Redis, HBase, Couchbase,
and maybe others. These can give you HA.
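
To make the localhost point concrete, here is a toy Python model. It is not
NiFi code: ToyCacheServer, ToyCacheClient, build_clients and the node names
are made up for illustration, and a plain dict stands in for the server's
storage. It only shows that when each node's client talks to its own local
server, a value written on one node is invisible on the others, while
pointing every client at the same host makes the value visible everywhere
(with that host then being the single point of failure).

```python
class ToyCacheServer:
    """Stand-in for one cache server instance running on one node."""

    def __init__(self, host):
        self.host = host
        self.entries = {}


class ToyCacheClient:
    """Stand-in for a cache client configured with a server hostname."""

    def __init__(self, servers, server_host):
        # Resolve the configured hostname to the server it would connect to.
        self.server = servers[server_host]

    def put(self, key, value):
        self.server.entries[key] = value

    def get(self, key):
        return self.server.entries.get(key)


def build_clients(nodes, servers, shared_host=None):
    """Create one client per node.

    With shared_host=None each client uses "localhost", modelled here as
    the server running on the client's own node.
    """
    return {n: ToyCacheClient(servers, shared_host or n) for n in nodes}


nodes = ["node1", "node2", "node3"]

# Case 1: every client points at its own node (the "localhost" setup).
servers = {n: ToyCacheServer(n) for n in nodes}
local_clients = build_clients(nodes, servers)
local_clients["node1"].put("max_row", 100)
local_view = {n: c.get("max_row") for n, c in local_clients.items()}

# Case 2: every client points at the same host.
servers = {n: ToyCacheServer(n) for n in nodes}
shared_clients = build_clients(nodes, servers, shared_host="node1")
shared_clients["node2"].put("max_row", 100)
shared_view = {n: c.get("max_row") for n, c in shared_clients.items()}

print(local_view)   # only node1 sees the value
print(shared_view)  # every node sees the value
```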



How to proper use DistributedCacheServer ?

2021-02-12 Thread Jorge Machado
Hey everyone,

Is there any documentation on how to use DistributedCacheServer? From what I
see, it is currently a single point of failure. Or does it really sync the
data between nodes?

I want to have something similar to ZooKeeper state, but not in ZooKeeper,
because it needs to be available between processors. My rough problem:


Flow:

*  Start the flow and store a state (similar to QueryTable, which uses
ZooKeeper to store the max row)
*  do some processing
*  update the state
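
In code, the three steps above could look roughly like this Python sketch.
It is only an illustration: the dict stands in for whatever shared cache ends
up being used, and fetch_rows_after, run_once and the "max_row" key are
hypothetical names, not NiFi API.

```python
def fetch_rows_after(table, last_id):
    """Hypothetical data source: return rows whose id is above last_id."""
    return [row for row in table if row["id"] > last_id]


def run_once(cache, table, state_key="max_row"):
    # 1. Start the flow: read the stored state (0 if nothing is stored yet).
    last_id = cache.get(state_key, 0)
    # 2. Do some processing on the rows that have not been seen before.
    new_rows = fetch_rows_after(table, last_id)
    processed = [row["value"].upper() for row in new_rows]
    # 3. Update the state so the next run starts after the last row seen.
    if new_rows:
        cache[state_key] = max(row["id"] for row in new_rows)
    return processed


cache = {}  # stand-in for a shared cache reachable from every node
table = [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}]

first = run_once(cache, table)
table.append({"id": 3, "value": "c"})
second = run_once(cache, table)
print(first, second, cache)
```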


This needs to be reachable between servers, of course. From what I tested, the
DistributedMapCacheClientService needs a server to connect to, which I pointed
at localhost. But if the entry that I need is on another server, how do they
sync the data?

Thx