Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-02-05 Thread Bela Ban


On 2/5/12 5:06 PM, Dan Berindei wrote:
> On Sun, Feb 5, 2012 at 5:56 PM, Bela Ban  wrote:
>>
>>
>> On 2/5/12 4:40 PM, Dan Berindei wrote:
>>
>>
>>>> Remember that a message might be retransmitted, so it is placed into a
>>>> retransmit buffer. If M1 has destination A and M2 has destination B, and
>>>> we send M1 first (to A), then change M1's destination to B, and send it,
>>>> everything is fine. However, if we later get a retransmit request from
>>>> B, we'd resend the message to A instead ! This is just 1 example,
>>>> modification of headers is another one.
>>>>
>>>> Note that the copy does *not* copy the buffer (payload) itself, but only
>>>> references it, so this is fast. Of course, nobody is supposed to modify
>>>> the contents of the buffer itself...
>>>>
>>>
>>> I wasn't clear enough, but I didn't mean we should reuse the entire
>>> Message object. I meant we should copy the Message but not the buffer
>>> or the headers. I see now that protocols may be adding new headers, so
>>> it wouldn't be safe to reuse the headers collection.
>>>
>>> I think this line in
>>> RequestCorrelator.sendRequest(RequestCorrelator.java:152) means that
>>> the contents of the buffer are copied in the new message, not just the
>>> buffer reference:
>>>
>>>   Message copy=msg.copy(true);
>>
>>
>> No, this does *not* copy the buffer, but simply references the same buffer.
>>
>
> Aha, I thought copy_buffer == true meant "copy the contents" and
> copy_buffer == false meant "share the contents". I see copy_buffer ==
> true actually means "copy the reference, share the contents" and
> copy_buffer == false means "don't copy anything".


Correct.


> I will modify our CommandAwareRpcDispatcher to use GroupRequest and
> see how they compare, then we can continue this discussion with the results.

Excellent, I'm curious to see whether this makes any difference !
Cheers,

-- 
Bela Ban
Lead JGroups (http://www.jgroups.org)
JBoss / Red Hat
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-02-05 Thread Dan Berindei
On Sun, Feb 5, 2012 at 5:56 PM, Bela Ban  wrote:
>
>
> On 2/5/12 4:40 PM, Dan Berindei wrote:
>
>
>>> Remember that a message might be retransmitted, so it is placed into a
>>> retransmit buffer. If M1 has destination A and M2 has destination B, and
>>> we send M1 first (to A), then change M1's destination to B, and send it,
>>> everything is fine. However, if we later get a retransmit request from
>>> B, we'd resend the message to A instead ! This is just 1 example,
>>> modification of headers is another one.
>>>
>>> Note that the copy does *not* copy the buffer (payload) itself, but only
>>> references it, so this is fast. Of course, nobody is supposed to modify
>>> the contents of the buffer itself...
>>>
>>
>> I wasn't clear enough, but I didn't mean we should reuse the entire
>> Message object. I meant we should copy the Message but not the buffer
>> or the headers. I see now that protocols may be adding new headers, so
>> it wouldn't be safe to reuse the headers collection.
>>
>> I think this line in
>> RequestCorrelator.sendRequest(RequestCorrelator.java:152) means that
>> the contents of the buffer are copied in the new message, not just the
>> buffer reference:
>>
>>                  Message copy=msg.copy(true);
>
>
> No, this does *not* copy the buffer, but simply references the same buffer.
>

Aha, I thought copy_buffer == true meant "copy the contents" and
copy_buffer == false meant "share the contents". I see copy_buffer ==
true actually means "copy the reference, share the contents" and
copy_buffer == false means "don't copy anything".
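To spell that out (a minimal sketch, assuming the JGroups 3.x Message API;
'dest' and 'payload' are placeholders):

    Message msg = new Message(dest);
    msg.setBuffer(payload);                              // payload is a byte[]
    Message shared = msg.copy(true);                     // copies the reference
    assert shared.getRawBuffer() == msg.getRawBuffer();  // same byte[] contents
    Message bare = msg.copy(false);                      // copies no buffer at all
    assert bare.getRawBuffer() == null;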

I will modify our CommandAwareRpcDispatcher to use GroupRequest and
see how they compare, then we can continue this discussion with the
results.

Cheers
Dan

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-02-05 Thread Bela Ban


On 2/5/12 4:40 PM, Dan Berindei wrote:


>> Remember that a message might be retransmitted, so it is placed into a
>> retransmit buffer. If M1 has destination A and M2 has destination B, and
>> we send M1 first (to A), then change M1's destination to B, and send it,
>> everything is fine. However, if we later get a retransmit request from
>> B, we'd resend the message to A instead ! This is just 1 example,
>> modification of headers is another one.
>>
>> Note that the copy does *not* copy the buffer (payload) itself, but only
>> references it, so this is fast. Of course, nobody is supposed to modify
>> the contents of the buffer itself...
>>
>
> I wasn't clear enough, but I didn't mean we should reuse the entire
> Message object. I meant we should copy the Message but not the buffer
> or the headers. I see now that protocols may be adding new headers, so
> it wouldn't be safe to reuse the headers collection.
>
> I think this line in
> RequestCorrelator.sendRequest(RequestCorrelator.java:152) means that
> the contents of the buffer are copied in the new message, not just the
> buffer reference:
>
>  Message copy=msg.copy(true);


No, this does *not* copy the buffer, but simply references the same buffer.

-- 
Bela Ban
Lead JGroups (http://www.jgroups.org)
JBoss / Red Hat
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-02-05 Thread Dan Berindei
On Sun, Feb 5, 2012 at 12:24 PM, Bela Ban  wrote:
>
>
> On 2/5/12 9:44 AM, Dan Berindei wrote:
>> On Sat, Feb 4, 2012 at 5:23 PM, Bela Ban  wrote:
>>> No, Socket.send() is not a hotspot in the Java code.
>>>
>>> On the networking level, with UDP datagrams, a packet is placed into a
>>> buffer and send() returns after the packet has been placed into the
>>> buffer successfully. Some network stacks simply discard the packet if
>>> the buffer is full, I assume.
>>>
>>> With TCP, send() blocks until the packet has been placed into the buffer
>>> and an ack has been received. If the receiver can't keep up with the
>>> sending rate, it'll reduce the window to 0, so send() blocks until the
>>> receiver catches up.
>>>
>>> So if you send 2 requests in parallel over TCP, both threads would
>>> block on the send.
>>>
>>> So I don't think parallelization would make sense here, unless of course
>>> you're doing something else, such as serialization, lock acquisition etc...
>>>
>>
>> You're probably right...
>>
>> When I sent the email I had only seen that DatagramSocket.send takes
>> 0.4% of the get worker's total time (which is huge considering that
>> 99.3% is spent waiting for the remote node's response). I didn't
>> notice that it's synchronized and the actual sending only takes half
>> of that - sending the messages in parallel would create more lock
>> contention on the socket *and* in JGroups.
>>
>> On the other hand, we're sending the messages to two different nodes,
>> on two different sockets, so sending in parallel *may* improve the
>> response time in scenarios with few worker threads. Certainly not
>> worth making our heavy-load performance worse, but I thought it was
>> worth mentioning.
>
>
> If you think this makes a difference, why don't you make a temporary
> code change and measure its effect on performance ?
>

I will give it a try, it's just that writing emails is a bit easier ;-)

>
>> Anyway, my primary motivation for this question was that I believe we
>> could use GroupRequest instead of our FutureCollator to send our
>> commands in parallel. They both do essentially the same thing, except
>> FutureCollator has an extra indirection layer because it uses
>> UnicastRequest.
>
>
> Ah, ok. Yes, I think that 1 GroupRequest of 2 might be more efficient
> than 2 UnicastRequests for sending an 'anycast' message to 2 members.
> Perhaps the marshalling is done only once (I'd have to confirm that
> though) and we're only creating 1 data structure and adding it to a hashmap
> (request/response correlation)... Certainly worth giving a try...
>

We are only serializing once with the FutureCollator approach, and we
don't copy the buffer either.

>
>>  If there is any performance discrepancy at the moment
>> between GroupRequest and FutureCollator+UnicastRequest, I don't think
>> there is any fundamental reason why we can't "fix" GroupRequest to be
>> just as efficient as FutureCollator+UnicastRequest.
>
>
> You're assuming that FutureCollator+UnicastRequest is faster than
> GroupRequest; what are you basing your assumption on? As I outlined
> above, I'd rather assume the opposite, although I don't know FutureCollator.
>

I did say *if*...

> Maybe we should have a chat next week to go through the code that sends
> an anycast to 2 members, wdyt ?
>

Sounds good, we should also talk about how best to implement staggered
get requests.

>
>> I think I just saw one such fix: in RequestCorrelator.sendRequest, if
>> anycasting is enabled then it's making a copy of the buffer for each
>> target member. I don't think that is necessary at all, in fact I think
>> it should reuse both the buffer and the headers and only change the
>> destination address.
>
> No, this *is* necessary, I don't just make copies because I think this
> is fun !! :-)
>
> Remember that a message might be retransmitted, so it is placed into a
> retransmit buffer. If M1 has destination A and M2 has destination B, and
> we send M1 first (to A), then change M1's destination to B, and send it,
> everything is fine. However, if we later get a retransmit request from
> B, we'd resend the message to A instead ! This is just 1 example,
> modification of headers is another one.
>
> Note that the copy does *not* copy the buffer (payload) itself, but only
> references it, so this is fast. Of course, nobody is supposed to modify
> the contents of the buffer itself...
>

I wasn't clear enough, but I didn't mean we should reuse the entire
Message object. I meant we should copy the Message but not the buffer
or the headers. I see now that protocols may be adding new headers, so
it wouldn't be safe to reuse the headers collection.

I think this line in
RequestCorrelator.sendRequest(RequestCorrelator.java:152) means that
the contents of the buffer are copied in the new message, not just the
buffer reference:

Message copy=msg.copy(true);


Cheers
Dan

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-02-05 Thread Bela Ban


On 2/5/12 9:44 AM, Dan Berindei wrote:
> On Sat, Feb 4, 2012 at 5:23 PM, Bela Ban  wrote:
>> No, Socket.send() is not a hotspot in the Java code.
>>
>> On the networking level, with UDP datagrams, a packet is placed into a
>> buffer and send() returns after the packet has been placed into the
>> buffer successfully. Some network stacks simply discard the packet if
>> the buffer is full, I assume.
>>
>> With TCP, send() blocks until the packet has been placed into the buffer
>> and an ack has been received. If the receiver can't keep up with the
>> sending rate, it'll reduce the window to 0, so send() blocks until the
>> receiver catches up.
>>
>> So if you send 2 requests in parallel over TCP, both threads would
>> block on the send.
>>
>> So I don't think parallelization would make sense here, unless of course
>> you're doing something else, such as serialization, lock acquisition etc...
>>
>
> You're probably right...
>
> When I sent the email I had only seen that DatagramSocket.send takes
> 0.4% of the get worker's total time (which is huge considering that
> 99.3% is spent waiting for the remote node's response). I didn't
> notice that it's synchronized and the actual sending only takes half
> of that - sending the messages in parallel would create more lock
> contention on the socket *and* in JGroups.
>
> On the other hand, we're sending the messages to two different nodes,
> on two different sockets, so sending in parallel *may* improve the
> response time in scenarios with few worker threads. Certainly not
> worth making our heavy-load performance worse, but I thought it was
> worth mentioning.


If you think this makes a difference, why don't you make a temporary 
code change and measure its effect on performance ?


> Anyway, my primary motivation for this question was that I believe we
> could use GroupRequest instead of our FutureCollator to send our
> commands in parallel. They both do essentially the same thing, except
> FutureCollator has an extra indirection layer because it uses
> UnicastRequest.


Ah, ok. Yes, I think that 1 GroupRequest of 2 might be more efficient 
than 2 UnicastRequests for sending an 'anycast' message to 2 members. 
Perhaps the marshalling is done only once (I'd have to confirm that 
though) and we're only creating 1 data structure and adding it to a hashmap
(request/response correlation)... Certainly worth giving a try...


>  If there is any performance discrepancy at the moment
> between GroupRequest and FutureCollator+UnicastRequest, I don't think
> there is any fundamental reason why we can't "fix" GroupRequest to be
> just as efficient as FutureCollator+UnicastRequest.


You're assuming that FutureCollator+UnicastRequest is faster than
GroupRequest; what are you basing your assumption on? As I outlined
above, I'd rather assume the opposite, although I don't know FutureCollator.

Maybe we should have a chat next week to go through the code that sends 
an anycast to 2 members, wdyt ?


> I think I just saw one such fix: in RequestCorrelator.sendRequest, if
> anycasting is enabled then it's making a copy of the buffer for each
> target member. I don't think that is necessary at all, in fact I think
> it should reuse both the buffer and the headers and only change the
> destination address.

No, this *is* necessary, I don't just make copies because I think this 
is fun !! :-)

Remember that a message might be retransmitted, so it is placed into a 
retransmit buffer. If M1 has destination A and M2 has destination B, and 
we send M1 first (to A), then change M1's destination to B, and send it, 
everything is fine. However, if we later get a retransmit request from 
B, we'd resend the message to A instead ! This is just 1 example, 
modification of headers is another one.

Note that the copy does *not* copy the buffer (payload) itself, but only 
references it, so this is fast. Of course, nobody is supposed to modify 
the contents of the buffer itself...
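To illustrate with a sketch (the retransmit 'window', 'send()' and the
seqno are hypothetical stand-ins for the real retransmission machinery):

    // the retransmit window keeps a *reference* to the queued message
    Message m1 = new Message(addrA);
    m1.setBuffer(payload);
    window.add(seqno, m1);        // queued in case A asks for a retransmission
    send(m1);

    m1.setDest(addrB);            // DON'T: the queued instance is mutated too,
    send(m1);                     // so a retransmission meant for A goes to B

    Message m2 = m1.copy(true);   // instead: fresh Message and headers, but the
    m2.setDest(addrB);            // payload byte[] is shared, so this is cheap
    send(m2);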

-- 
Bela Ban
Lead JGroups (http://www.jgroups.org)
JBoss / Red Hat
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-02-05 Thread Dan Berindei
On Sat, Feb 4, 2012 at 5:23 PM, Bela Ban  wrote:
> No, Socket.send() is not a hotspot in the Java code.
>
> On the networking level, with UDP datagrams, a packet is placed into a
> buffer and send() returns after the packet has been placed into the
> buffer successfully. Some network stacks simply discard the packet if
> the buffer is full, I assume.
>
> With TCP, send() blocks until the packet has been placed into the buffer
> and an ack has been received. If the receiver can't keep up with the
> sending rate, it'll reduce the window to 0, so send() blocks until the
> receiver catches up.
>
> So if you send 2 requests in parallel over TCP, both threads would
> block on the send.
>
> So I don't think parallelization would make sense here, unless of course
> you're doing something else, such as serialization, lock acquisition etc...
>

You're probably right...

When I sent the email I had only seen that DatagramSocket.send takes
0.4% of the get worker's total time (which is huge considering that
99.3% is spent waiting for the remote node's response). I didn't
notice that it's synchronized and the actual sending only takes half
of that - sending the messages in parallel would create more lock
contention on the socket *and* in JGroups.

On the other hand, we're sending the messages to two different nodes,
on two different sockets, so sending in parallel *may* improve the
response time in scenarios with few worker threads. Certainly not
worth making our heavy-load performance worse, but I thought it was
worth mentioning.


Anyway, my primary motivation for this question was that I believe we
could use GroupRequest instead of our FutureCollator to send our
commands in parallel. They both do essentially the same thing, except
FutureCollator has an extra indirection layer because it uses
UnicastRequest. If there is any performance discrepancy at the moment
between GroupRequest and FutureCollator+UnicastRequest, I don't think
there is any fundamental reason why we can't "fix" GroupRequest to be
just as efficient as FutureCollator+UnicastRequest.
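For reference, the anycast shape I have in mind, sketched against my reading
of the JGroups 3.x API (treat the exact signatures as approximate):

    RequestOptions opts = new RequestOptions(ResponseMode.GET_ALL, timeout);
    opts.setAnycasting(true);            // N unicasts under a single request
    NotifyingFuture<RspList<Object>> rsps =
          dispatcher.castMessageWithFuture(targets, msg, opts);
    // one GroupRequest correlates all N responses, vs. our current
    // one-UnicastRequest-per-target collated by FutureCollator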

I think I just saw one such fix: in RequestCorrelator.sendRequest, if
anycasting is enabled then it's making a copy of the buffer for each
target member. I don't think that is necessary at all, in fact I think
it should reuse both the buffer and the headers and only change the
destination address.

Cheers
Dan


>
> On 2/4/12 3:43 PM, Manik Surtani wrote:
>> Is that a micro-optimisation?  Do we know that socket.send() really is a 
>> hotspot?
>>
>> On 1 Feb 2012, at 00:11, Dan Berindei wrote:
>>
>>> It's true, but then JGroups' GroupRequest does exactly the same thing...
>>>
>>> socket.send() takes some time too, I thought sending the request in
>>> parallel would mean calling socket.send() on a separate thread for
>>> each recipient.
>>>
>>> Cheers
>>> Dan
>>>
>>>
>>> On Fri, Jan 27, 2012 at 6:41 PM, Manik Surtani  wrote:
 Doesn't setBlockForResults(false) mean that we're not waiting on a 
 response, and can proceed to the next message to the next recipient?

 On 27 Jan 2012, at 16:34, Dan Berindei wrote:

> Manik, Bela, I think we send the requests sequentially as well. In
> ReplicationTask.call:
>
>                for (Address a : targets) {
>                   NotifyingFuture  f =
> sendMessageWithFuture(constructMessage(buf, a), opts);
>                   futureCollator.watchFuture(f, a);
>                }
>
>
> In MessageDispatcher.sendMessageWithFuture:
>
>         UnicastRequest  req=new UnicastRequest(msg, corr, dest, 
> options);
>         req.setBlockForResults(false);
>         req.execute();
>
>
> Did we use to send each request on a separate thread?
>
>
> Cheers
> Dan
>
>
> On Fri, Jan 27, 2012 at 1:21 PM, Bela Ban  wrote:
>> yes.
>>
>> On 1/27/12 12:13 PM, Manik Surtani wrote:
>>>
>>> On 25 Jan 2012, at 09:42, Bela Ban wrote:
>>>
 No, parallel unicasts will be faster, as an anycast to A,B,C sends the
 unicasts sequentially
>>>
>>> Is this still the case in JG 3.x?
>>
>>
>> --
>> Bela Ban
>> Lead JGroups (http://www.jgroups.org)
>> JBoss / Red Hat
>> ___
>> infinispan-dev mailing list
>> infinispan-dev@lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

 --
 Manik Surtani
 ma...@jboss.org
 twitter.com/maniksurtani

 Lead, Infinispan
 http://www.infinispan.org




 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>
>>> 

Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-02-04 Thread Bela Ban
No, Socket.send() is not a hotspot in the Java code.

On the networking level, with UDP datagrams, a packet is placed into a 
buffer and send() returns after the packet has been placed into the 
buffer successfully. Some network stacks simply discard the packet if 
the buffer is full, I assume.

With TCP, send() blocks until the packet has been placed into the buffer 
and an ack has been received. If the receiver can't keep up with the 
sending rate, it'll reduce the window to 0, so send() blocks until the 
receiver catches up.

So if you send 2 requests in parallel over TCP, both threads would
block on the send.

So I don't think parallelization would make sense here, unless of course 
you're doing something else, such as serialization, lock acquisition etc...
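To make the scenario concrete, the pattern under discussion reduced to plain
JDK calls (a sketch; 'socket', 'executor', 'targets' and the 'request' byte[]
are assumed to be in scope):

    // As noted above, the UDP send path is synchronized, so these runnables
    // serialize on the same socket; with TCP, each connection's send() would
    // block independently on its own window instead.
    for (final SocketAddress target : targets) {
       executor.execute(new Runnable() {
          public void run() {
             try {
                socket.send(new DatagramPacket(request, request.length, target));
             }
             catch (IOException e) {
                // UDP gives no delivery guarantee anyway; log and move on
             }
          }
       });
    }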


On 2/4/12 3:43 PM, Manik Surtani wrote:
> Is that a micro-optimisation?  Do we know that socket.send() really is a 
> hotspot?
>
> On 1 Feb 2012, at 00:11, Dan Berindei wrote:
>
>> It's true, but then JGroups' GroupRequest does exactly the same thing...
>>
>> socket.send() takes some time too, I thought sending the request in
>> parallel would mean calling socket.send() on a separate thread for
>> each recipient.
>>
>> Cheers
>> Dan
>>
>>
>> On Fri, Jan 27, 2012 at 6:41 PM, Manik Surtani  wrote:
>>> Doesn't setBlockForResults(false) mean that we're not waiting on a 
>>> response, and can proceed to the next message to the next recipient?
>>>
>>> On 27 Jan 2012, at 16:34, Dan Berindei wrote:
>>>
 Manik, Bela, I think we send the requests sequentially as well. In
 ReplicationTask.call:

for (Address a : targets) {
   NotifyingFuture  f =
 sendMessageWithFuture(constructMessage(buf, a), opts);
   futureCollator.watchFuture(f, a);
}


 In MessageDispatcher.sendMessageWithFuture:

 UnicastRequest  req=new UnicastRequest(msg, corr, dest, 
 options);
 req.setBlockForResults(false);
 req.execute();


 Did we use to send each request on a separate thread?


 Cheers
 Dan


 On Fri, Jan 27, 2012 at 1:21 PM, Bela Ban  wrote:
> yes.
>
> On 1/27/12 12:13 PM, Manik Surtani wrote:
>>
>> On 25 Jan 2012, at 09:42, Bela Ban wrote:
>>
>>> No, parallel unicasts will be faster, as an anycast to A,B,C sends the
>>> unicasts sequentially
>>
>> Is this still the case in JG 3.x?
>
>
> --
> Bela Ban
> Lead JGroups (http://www.jgroups.org)
> JBoss / Red Hat
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>
>>> --
>>> Manik Surtani
>>> ma...@jboss.org
>>> twitter.com/maniksurtani
>>>
>>> Lead, Infinispan
>>> http://www.infinispan.org
>>>
>>>
>>>
>>>
>>> ___
>>> infinispan-dev mailing list
>>> infinispan-dev@lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>
>> ___
>> infinispan-dev mailing list
>> infinispan-dev@lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
> --
> Manik Surtani
> ma...@jboss.org
> twitter.com/maniksurtani
>
> Lead, Infinispan
> http://www.infinispan.org
>
>
>
>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

-- 
Bela Ban
Lead JGroups (http://www.jgroups.org)
JBoss / Red Hat
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-02-04 Thread Manik Surtani
Is that a micro-optimisation?  Do we know that socket.send() really is a 
hotspot?

On 1 Feb 2012, at 00:11, Dan Berindei wrote:

> It's true, but then JGroups' GroupRequest does exactly the same thing...
> 
> socket.send() takes some time too, I thought sending the request in
> parallel would mean calling socket.send() on a separate thread for
> each recipient.
> 
> Cheers
> Dan
> 
> 
> On Fri, Jan 27, 2012 at 6:41 PM, Manik Surtani  wrote:
>> Doesn't setBlockForResults(false) mean that we're not waiting on a response, 
>> and can proceed to the next message to the next recipient?
>> 
>> On 27 Jan 2012, at 16:34, Dan Berindei wrote:
>> 
>>> Manik, Bela, I think we send the requests sequentially as well. In
>>> ReplicationTask.call:
>>> 
>>>   for (Address a : targets) {
>>>  NotifyingFuture f =
>>> sendMessageWithFuture(constructMessage(buf, a), opts);
>>>  futureCollator.watchFuture(f, a);
>>>   }
>>> 
>>> 
>>> In MessageDispatcher.sendMessageWithFuture:
>>> 
>>>UnicastRequest req=new UnicastRequest(msg, corr, dest, 
>>> options);
>>>req.setBlockForResults(false);
>>>req.execute();
>>> 
>>> 
>>> Did we use to send each request on a separate thread?
>>> 
>>> 
>>> Cheers
>>> Dan
>>> 
>>> 
>>> On Fri, Jan 27, 2012 at 1:21 PM, Bela Ban  wrote:
 yes.
 
 On 1/27/12 12:13 PM, Manik Surtani wrote:
> 
> On 25 Jan 2012, at 09:42, Bela Ban wrote:
> 
>> No, parallel unicasts will be faster, as an anycast to A,B,C sends the
>> unicasts sequentially
> 
> Is this still the case in JG 3.x?
 
 
 --
 Bela Ban
 Lead JGroups (http://www.jgroups.org)
 JBoss / Red Hat
 ___
 infinispan-dev mailing list
 infinispan-dev@lists.jboss.org
 https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>> ___
>>> infinispan-dev mailing list
>>> infinispan-dev@lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>> 
>> --
>> Manik Surtani
>> ma...@jboss.org
>> twitter.com/maniksurtani
>> 
>> Lead, Infinispan
>> http://www.infinispan.org
>> 
>> 
>> 
>> 
>> ___
>> infinispan-dev mailing list
>> infinispan-dev@lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> 
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

--
Manik Surtani
ma...@jboss.org
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org




___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-01-31 Thread Dan Berindei
It's true, but then JGroups' GroupRequest does exactly the same thing...

socket.send() takes some time too, I thought sending the request in
parallel would mean calling socket.send() on a separate thread for
each recipient.

Cheers
Dan


On Fri, Jan 27, 2012 at 6:41 PM, Manik Surtani  wrote:
> Doesn't setBlockForResults(false) mean that we're not waiting on a response, 
> and can proceed to the next message to the next recipient?
>
> On 27 Jan 2012, at 16:34, Dan Berindei wrote:
>
>> Manik, Bela, I think we send the requests sequentially as well. In
>> ReplicationTask.call:
>>
>>               for (Address a : targets) {
>>                  NotifyingFuture f =
>> sendMessageWithFuture(constructMessage(buf, a), opts);
>>                  futureCollator.watchFuture(f, a);
>>               }
>>
>>
>> In MessageDispatcher.sendMessageWithFuture:
>>
>>        UnicastRequest req=new UnicastRequest(msg, corr, dest, options);
>>        req.setBlockForResults(false);
>>        req.execute();
>>
>>
>> Did we use to send each request on a separate thread?
>>
>>
>> Cheers
>> Dan
>>
>>
>> On Fri, Jan 27, 2012 at 1:21 PM, Bela Ban  wrote:
>>> yes.
>>>
>>> On 1/27/12 12:13 PM, Manik Surtani wrote:

 On 25 Jan 2012, at 09:42, Bela Ban wrote:

> No, parallel unicasts will be faster, as an anycast to A,B,C sends the
> unicasts sequentially

 Is this still the case in JG 3.x?
>>>
>>>
>>> --
>>> Bela Ban
>>> Lead JGroups (http://www.jgroups.org)
>>> JBoss / Red Hat
>>> ___
>>> infinispan-dev mailing list
>>> infinispan-dev@lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>> ___
>> infinispan-dev mailing list
>> infinispan-dev@lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
> --
> Manik Surtani
> ma...@jboss.org
> twitter.com/maniksurtani
>
> Lead, Infinispan
> http://www.infinispan.org
>
>
>
>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-01-27 Thread Manik Surtani
Doesn't setBlockForResults(false) mean that we're not waiting on a response, 
and can proceed to the next message to the next recipient?

On 27 Jan 2012, at 16:34, Dan Berindei wrote:

> Manik, Bela, I think we send the requests sequentially as well. In
> ReplicationTask.call:
> 
>   for (Address a : targets) {
>  NotifyingFuture f =
> sendMessageWithFuture(constructMessage(buf, a), opts);
>  futureCollator.watchFuture(f, a);
>   }
> 
> 
> In MessageDispatcher.sendMessageWithFuture:
> 
>UnicastRequest req=new UnicastRequest(msg, corr, dest, options);
>req.setBlockForResults(false);
>req.execute();
> 
> 
> Did we use to send each request on a separate thread?
> 
> 
> Cheers
> Dan
> 
> 
> On Fri, Jan 27, 2012 at 1:21 PM, Bela Ban  wrote:
>> yes.
>> 
>> On 1/27/12 12:13 PM, Manik Surtani wrote:
>>> 
>>> On 25 Jan 2012, at 09:42, Bela Ban wrote:
>>> 
 No, parallel unicasts will be faster, as an anycast to A,B,C sends the
 unicasts sequentially
>>> 
>>> Is this still the case in JG 3.x?
>> 
>> 
>> --
>> Bela Ban
>> Lead JGroups (http://www.jgroups.org)
>> JBoss / Red Hat
>> ___
>> infinispan-dev mailing list
>> infinispan-dev@lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

--
Manik Surtani
ma...@jboss.org
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org




___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-01-27 Thread Dan Berindei
Manik, Bela, I think we send the requests sequentially as well. In
ReplicationTask.call:

   for (Address a : targets) {
  NotifyingFuture f =
sendMessageWithFuture(constructMessage(buf, a), opts);
  futureCollator.watchFuture(f, a);
   }


In MessageDispatcher.sendMessageWithFuture:

UnicastRequest req=new UnicastRequest(msg, corr, dest, options);
req.setBlockForResults(false);
req.execute();
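        // note: with setBlockForResults(false), execute() just fires the unicast
        // and returns, so the sends for all targets happen one after the other on
        // the calling thread, and FutureCollator collates the replies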


Did we use to send each request on a separate thread?


Cheers
Dan


On Fri, Jan 27, 2012 at 1:21 PM, Bela Ban  wrote:
> yes.
>
> On 1/27/12 12:13 PM, Manik Surtani wrote:
>>
>> On 25 Jan 2012, at 09:42, Bela Ban wrote:
>>
>>> No, parallel unicasts will be faster, as an anycast to A,B,C sends the
>>> unicasts sequentially
>>
>> Is this still the case in JG 3.x?
>
>
> --
> Bela Ban
> Lead JGroups (http://www.jgroups.org)
> JBoss / Red Hat
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-01-27 Thread Manik Surtani

On 25 Jan 2012, at 14:22, Mircea Markus wrote:

> Agreed that having a configurable remote get policy makes sense. 
> We already have a JIRA for this[1], I'll start working on it as the 
> performance results are haunting me.
> I'd like to have Dan's input on this as well first, as he has worked with 
> remote gets and I still don't know why null results are not considered valid 
> :)
> 
> [1] https://issues.jboss.org/browse/ISPN-825

IMO we should work on a configurable scheme such as:

numInitialRemoteGets (default 1)
remoteGetTimeout (default 500ms?  What's our average remote GET time anyway?)

So with this scheme, we'd:

* Randomly select 'numInitialRemoteGets' from dataOwners
* Send them the remote get
* After remoteGetTimeout, send a remote get to the next data owner, 
until we run out of data owners
* Return the first valid response.
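In code, the policy could look something like this sketch (plain
java.util.concurrent; 'executor' and 'remoteGetTask(owner)' -- a Callable
wrapping the unicast get -- are hypothetical, not existing Infinispan APIs;
Address is JGroups'):

    Object staggeredRemoteGet(List<Address> dataOwners, int numInitialRemoteGets,
                              long remoteGetTimeout) throws Exception {
       List<Address> owners = new ArrayList<Address>(dataOwners);
       Collections.shuffle(owners);                    // random initial selection
       CompletionService<Object> done =
             new ExecutorCompletionService<Object>(executor);
       int sent = 0, outstanding = 0;
       while (sent < numInitialRemoteGets && sent < owners.size()) {
          done.submit(remoteGetTask(owners.get(sent++))); // fire the initial gets
          outstanding++;
       }
       while (outstanding > 0) {
          Future<Object> f = done.poll(remoteGetTimeout, TimeUnit.MILLISECONDS);
          if (f != null) {
             outstanding--;
             Object value = f.get();
             if (value != null)
                return value;                          // first valid response wins
          }
          if (sent < owners.size()) {                  // timed out or got a null:
             done.submit(remoteGetTask(owners.get(sent++))); // ask the next owner
             outstanding++;
          }
       }
       return null;    // every owner answered null (overall timeout omitted here)
    }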

--
Manik Surtani
ma...@jboss.org
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org



___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-01-27 Thread Manik Surtani

On 25 Jan 2012, at 17:09, Dan Berindei wrote:

> I don't think the OOB thread is that costly, it doesn't block on
> anything (not even on state transfer!) so the most expensive part is
> reading the key and writing the value. BTW Sanne, we may want to run
> Transactional with a smaller payload size ;)

It does though - as Bela mentioned OOB threads compete with regular incoming 
threads for NAKACK windows, etc.  But in any case, this is an unnecessary cost.

--
Manik Surtani
ma...@jboss.org
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org



___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-01-27 Thread Bela Ban
yes.

On 1/27/12 12:13 PM, Manik Surtani wrote:
>
> On 25 Jan 2012, at 09:42, Bela Ban wrote:
>
>> No, parallel unicasts will be faster, as an anycast to A,B,C sends the
>> unicasts sequentially
>
> Is this still the case in JG 3.x?


-- 
Bela Ban
Lead JGroups (http://www.jgroups.org)
JBoss / Red Hat
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-01-27 Thread Manik Surtani

On 25 Jan 2012, at 17:09, Dan Berindei wrote:

> I think we already have a JIRA to make PutKeyValueCommands return the
> previous value, that would eliminate lots of GetKeyValueCommands and
> it would actually improve the performance of puts - we should probably
> make this a priority.

Yes, this is definitely important.

https://issues.jboss.org/browse/ISPN-317



--
Manik Surtani
ma...@jboss.org
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org



___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-01-27 Thread Manik Surtani

On 25 Jan 2012, at 17:09, Dan Berindei wrote:

> 
> Keep in mind that we also want to introduce eventual consistency - I
> think that's going to eliminate our optimization opportunity here
> because we'll need to get the values from a majority of owners (if not
> all the owners).


Also keep in mind that an eventually consistent mode will be a (non-default) 
option.  I still see most people using Infinispan in a strongly consistent mode.
--
Manik Surtani
ma...@jboss.org
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org



___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-01-27 Thread Manik Surtani

On 25 Jan 2012, at 09:42, Bela Ban wrote:

> No, parallel unicasts will be faster, as an anycast to A,B,C sends the 
> unicasts sequentially

Is this still the case in JG 3.x?

--
Manik Surtani
ma...@jboss.org
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org



___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-01-27 Thread Manik Surtani

On 25 Jan 2012, at 08:51, Dan Berindei wrote:

> Slightly related, I wonder if Manik's comment is still true:
> 
>    (if at all possible, try not to use JGroups' ANYCAST for now.
> Multiple (parallel) UNICASTs are much faster.)
> 
> Intuitively it shouldn't be true, unicasts+FutureCollator do basically
> the same thing as anycast+GroupRequest.

Yes, this is outdated and may not be an issue with JGroups 3.x anymore.  The 
problem, IIRC, was that ANYCAST would end up doing UNICASTs in sequence and not 
in parallel.

--
Manik Surtani
ma...@jboss.org
twitter.com/maniksurtani

Lead, Infinispan
http://www.infinispan.org



___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-01-26 Thread Mircea Markus

On 26 Jan 2012, at 14:52, Sanne Grinovero wrote:

> On 26 January 2012 14:37, Mircea Markus  wrote:
 I'd like to have Dan's input on this as well first, as he has worked with
 remote gets and I still don't know why null results are not considered 
 valid
 :)
>>> 
>>> Pre-5.0 during state transfer an owner could return null to mean "I'm
>>> not sure", so the caller would ignore it unless every target returned
>>> null.
>>> That's no longer necessary, but it wasn't broken so I didn't fix it...
>> 
>> Thinking some more about this, it actually has a performance impact: if the
>> entry does not exist then there's no point in waiting for all replies, but
>> just for the first one.
> 
> Of course, and there are more performance-affecting details: maybe you
> know the key doesn't exist, but this information is not cacheable in L1.
> With passivation, you might need to look into the cache store to
> make sure, etc...
> 
> It might be worth considering alternative strategies to store
> null, like using some marker. This however introduces the problem of
> garbage-collecting entries pointing to this marker: but that
> might not be a problem for some applications, or we can rely on
> LIRS/LRU eviction, as it's just an optimization: we can afford losing the
> specific information.
+1, having null markers in L1 might increase performance for certain scenarios.
> 
> As I've mentioned to Manik in London, we will likely need markers
> anyway to resolve conflicts with MVCC: if a thread replaces K1,
> another removes it, and then another adds it back.. the version
> information was lost during the delete operation.
that's actually applicable for Optimistic locking as well :-)


___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-01-26 Thread Sanne Grinovero
On 26 January 2012 14:37, Mircea Markus  wrote:
>>> I'd like to have Dan's input on this as well first, as he has worked with
>>> remote gets and I still don't know why null results are not considered valid
>>> :)
>>
>> Pre-5.0 during state transfer an owner could return null to mean "I'm
>> not sure", so the caller would ignore it unless every target returned
>> null.
>> That's no longer necessary, but it wasn't broken so I didn't fix it...
>
> Thinking some more about this, it actually has a performance impact: if the
> entry does not exist then there's no point in waiting for all replies, but
> just for the first one.

Of course, and there are more performance-affecting details: maybe you
know the key doesn't exist, but this information is not cacheable in L1.
With passivation, you might need to look into the cache store to
make sure, etc...

It might be worth considering alternative strategies to store
null, like using some marker. This however introduces the problem of
garbage-collecting entries pointing to this marker: but that
might not be a problem for some applications, or we can rely on
LIRS/LRU eviction, as it's just an optimization: we can afford losing the
specific information.
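For instance (a hypothetical sketch; 'l1Cache' and 'remoteGet' are made-up
names):

    private static final Object NULL_MARKER = new Object(); // "known to not exist"

    Object cachedGet(Object key) throws Exception {
       Object local = l1Cache.get(key);
       if (local == NULL_MARKER)
          return null;                // cached non-existence: skip the remote get
       if (local != null)
          return local;
       Object remote = remoteGet(key);
       // the marker is evictable like any entry, so LIRS/LRU bounds its cost
       l1Cache.put(key, remote != null ? remote : NULL_MARKER);
       return remote;
    }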

As I've mentioned to Manik in London, we will likely need markers
anyway to resolve conflicts with MVCC: if a thread replaces K1,
another removes it, and then another adds it back.. the version
information was lost during the delete operation.
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-01-26 Thread Mircea Markus
>> I'd like to have Dan's input on this as well first, as he has worked with
>> remote gets and I still don't know why null results are not considered valid
>> :)
> 
> Pre-5.0 during state transfer an owner could return null to mean "I'm
> not sure", so the caller would ignore it unless every target returned
> null.
> That's no longer necessary, but it wasn't broken so I didn't fix it...

Thinking some more about this, it actually has a performance impact: if the
entry does not exist then there's no point in waiting for all replies, but just
for the first one.
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-01-25 Thread Sanne Grinovero
On 25 January 2012 17:09, Dan Berindei  wrote:
> On Wed, Jan 25, 2012 at 4:22 PM, Mircea Markus  wrote:
>>
>>> One node might be busy doing GC and stay unresponsive for a whole
>>> second or longer, another one might be actually crashed and you didn't
>>> know that yet, these are unlikely but possible.
>>
>> All these are possible but I would rather consider them as exceptional
>> situations, possibly handled by a retry logic. We should *not* optimise for
>> these situations IMO.
>>
>
> As Sanne pointed out, an exceptional situation on a node becomes
> ordinary with 100s or 1000s of nodes.
> So the default policy should scale the initial number of requests with
> numOwners.
>
>>
>>> More likely, a rehash is in progress, you could then be asking a node
>>> which doesn't yet (or anymore) have the value.
>>
>> this is a consistency issue and I think we can find a way to handle it some
>> other way.
>>
>
> With the current state transfer we always send ClusteredGetCommands to
> the old owners (and only the old owners). If a node didn't receive the
> entire state, it means that state transfer hasn't finished yet and the
> CH will not return it as an owner. But the CH could also return owners
> that are no longer members of the cluster, so we have to check for
> that before picking one owner to send the command to.
>
> In Sanne's non-blocking state transfer proposal I think a new owner
> may have to ask the old owner for the key value, so it would still
> never return null. But it might be less expensive to ask the old owner
> directly (assuming it's safe from a consistency POV).
>
>>
>>> All good reasons for which imho it makes sense to send out "a couple"
>>> of requests in parallel, but I'd unlikely want to send more than 2,
>>> and I agree often 1 might be enough.
>>> Maybe it should even optimize for the most common case: send out just
>>> one, have a more aggressive timeout and in case of trouble ask for the
>>> next node.
>>
>> +1
>>
>
> -1 for aggressive timeouts... you're going to do the same work as you
> do now, except you're going to wait a bit between sending requests. If
> you're really unlucky the first target will return first but you'll
> ignore its response because the timeout already expired.


Agreed, what I meant with "more aggressive timeouts" is not the
overall timeout that fails the get, but a second, more aggressive one
which triggers sending the next GET when the first one is "starting
to not look good"; so we would have a timeout for the whole
operation, and one which decides how long to wait for a single GET
RPC before asking another node.
So even if the global timeout is something high like "10 seconds", if
after 40 ms I still didn't get a reply from the first node I think we
can start sending the next one.. but still wait to eventually get an
answer on the first.

>
>>
>>> In addition, sending a single request might spare us some Future,
>>> await+notify messing in terms of CPU cost of sending the request.
>>
>> it's the remote OOB thread that's the most costly resource imo.
>>
>
> I don't think the OOB thread is that costly, it doesn't block on
> anything (not even on state transfer!) so the most expensive part is
> reading the key and writing the value. BTW Sanne, we may want to run
> Transactional with a smaller payload size ;)
>
> We could implement our own GroupRequest that sends the requests in
> parallel instead of implementing FutureCollator on top of UnicastRequest
> and save some of that overhead on the caller.
>
> I think we already have a JIRA to make PutKeyValueCommands return the
> previous value, that would eliminate lots of GetKeyValueCommands and
> it would actually improve the performance of puts - we should probably
> make this a priority.

+1 !!

>
>>
>>> I think I agree on all points, it makes more sense.
>>> Just that in a large cluster, let's say
>>> 1000 nodes, maybe I want 20 owners as a sweet spot for read/write
>>> performance tradeoff, and with such high numbers I guess doing 2-3
>>> gets in parallel might make sense as those "unlikely" events suddenly
>>> become almost certain.. especially the rehash in progress.
>>>
>>> So I'd propose a separate configuration option for # parallel get
>>> events, and one to define a "try next node" policy. Or this policy
>>> should be the whole strategy, and the #gets one of the options for the
>>> default implementation.
>>
>> Agreed that having a configurable remote get policy makes sense.
>> We already have a JIRA for this[1], I'll start working on it as the
>> performance results are haunting me.
>
> I'd rather focus on implementing one remote get policy that works
> instead of making it configurable - even if we make it configurable
> we'll have to focus our optimizations on the default policy.
>
> Keep in mind that we also want to introduce eventual consistency - I
> think that's going to eliminate our optimization opportunity here
> because we'll need to get the values from a majority of owners (if not
> all the owners).

Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-01-25 Thread Mircea Markus

On 25 Jan 2012, at 17:09, Dan Berindei wrote:

> On Wed, Jan 25, 2012 at 4:22 PM, Mircea Markus  wrote:
>>
>>> One node might be busy doing GC and stay unresponsive for a whole
>>> second or longer, another one might be actually crashed and you didn't
>>> know that yet, these are unlikely but possible.
>>
>> All these are possible but I would rather consider them as exceptional
>> situations, possibly handled by a retry logic. We should *not* optimise for
>> these situations IMO.
>>
> 
> As Sanne pointed out, an exceptional situation on a node becomes
> ordinary with 100s or 1000s of nodes.

possible, but still that's not our everyday use case yet.
For the *default* value I'd rather consider a 4-20 node cluster. The idea is to
have numGets configurable, or even dynamic.
> So the default policy should scale the initial number of requests with
> numOwners.
Not sure what you mean by that. 
As you mention, there might be a correlation between the number of nodes to 
which to send the remote get and the cluster size.

> 
>>
>>> More likely, a rehash is in progress, you could then be asking a node
>>> which doesn't yet (or anymore) have the value.
>>
>> this is a consistency issue and I think we can find a way to handle it some
>> other way.
>>
> 
> With the current state transfer we always send ClusteredGetCommands to
> the old owners (and only the old owners). If a node didn't receive the
> entire state, it means that state transfer hasn't finished yet and the
> CH will not return it as an owner. But the CH could also return owners
> that are no longer members of the cluster, so we have to check for
> that before picking one owner to send the command to.
> 
> In Sanne's non-blocking state transfer proposal I think a new owner
> may have to ask the old owner for the key value, so it would still
> never return null. But it might be less expensive to ask the old owner
> directly (assuming it's safe from a consistency POV).
> 
>>
>>> All good reasons for which imho it makes sense to send out "a couple"
>>> of requests in parallel, but I'd unlikely want to send more than 2,
>>> and I agree often 1 might be enough.
>>> Maybe it should even optimize for the most common case: send out just
>>> one, have a more aggressive timeout and in case of trouble ask for the
>>> next node.
>>
>> +1
>>
> 
> -1 for aggressive timeouts... you're going to do the same work as you
> do now, except you're going to wait a bit between sending requests. If
> you're really unlucky the first target will return first but you'll
> ignore its response because the timeout already expired.
> 
>>
>>> In addition, sending a single request might spare us some Future,
>>> await+notify messing in terms of CPU cost of sending the request.
>>
>> it's the remote OOB thread that's the most costly resource imo.
>>
> 
> I don't think the OOB thread is that costly, it doesn't block on
> anything (not even on state transfer!) so the most expensive part is
> reading the key and writing the value. BTW Sanne, we may want to run
> Transactional with a smaller payload size ;)
Yes, besides using the OOB pool unnecessarily, other resources are also
consumed. Not sure I agree that OOB thread usage is not costly: this pool is
also used for releasing locks and exhausting it might result in a chained 
performance degradation.
> 
> We could implement our own GroupRequest that sends the requests in
> parallel instead of implementing FutureCollator on top of UnicastRequest
> and save some of that overhead on the caller.
> 
> I think we already have a JIRA to make PutKeyValueCommands return the
> previous value, that would eliminate lots of GetKeyValueCommands and
> it would actually improve the performance of puts - we should probably
> make this a priority.
Not saying that sending requests in parallel doesn't make sense: just 
questioning whether it makes sense to *always* send them in parallel.
> 
>>
>>> I think I agree on all points, it makes more sense.
>>> Just that in a large cluster, let's say
>>> 1000 nodes, maybe I want 20 owners as a sweet spot for read/write
>>> performance tradeoff, and with such high numbers I guess doing 2-3
>>> gets in parallel might make sense as those "unlikely" events suddenly
>>> become almost certain.. especially the rehash in progress.
>>>
>>> So I'd propose a separate configuration option for # parallel get
>>> events, and one to define a "try next node" policy. Or this policy
>>> should be the whole strategy, and the #gets one of the options for the
>>> default implementation.
>>
>> Agreed that having a configurable remote get policy makes sense.
>> We already have a JIRA for this[1], I'll start working on it as the
>> performance results are haunting me.
> 
> I'd rather focus on implementing one remote get policy that works
> instead of making it configurable - even if we make it configurable
> we'll have to focus our optimizations on the default policy.

This *might* make a significant difference.

Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-01-25 Thread Dan Berindei
On Wed, Jan 25, 2012 at 4:22 PM, Mircea Markus  wrote:
>
>> One node might be busy doing GC and stay unresponsive for a whole
>> second or longer, another one might be actually crashed and you didn't
>> know that yet, these are unlikely but possible.
>
> All these are possible but I would rather consider them as exceptional
> situations, possibly handled by a retry logic. We should *not* optimise for
> these situations IMO.
>

As Sanne pointed out, an exceptional situation on a node becomes
ordinary with 100s or 1000s of nodes.
So the default policy should scale the initial number of requests with
numOwners.

>
>> More likely, a rehash is in progress, you could then be asking a node
>> which doesn't yet (or anymore) have the value.
>
> this is a consistency issue and I think we can find a way to handle it some
> other way.
>

With the current state transfer we always send ClusteredGetCommands to
the old owners (and only the old owners). If a node didn't receive the
entire state, it means that state transfer hasn't finished yet and the
CH will not return it as an owner. But the CH could also return owners
that are no longer members of the cluster, so we have to check for
that before picking one owner to send the command to.

In Sanne's non-blocking state transfer proposal I think a new owner
may have to ask the old owner for the key value, so it would still
never return null. But it might be less expensive to ask the old owner
directly (assuming it's safe from a consistency POV).

>
>> All good reasons for which imho it makes sense to send out "a couple"
>> of requests in parallel, but I'd unlikely want to send more than 2,
>> and I agree often 1 might be enough.
>> Maybe it should even optimize for the most common case: send out just
>> one, have a more aggressive timeout and in case of trouble ask for the
>> next node.
>
> +1
>

-1 for aggressive timeouts... you're going to do the same work as you
do now, except you're going to wait a bit between sending requests. If
you're really unlucky the first target will return first but you'll
ignore its response because the timeout already expired.

>
>> In addition, sending a single request might spare us some Future,
>> await+notify messing in terms of CPU cost of sending the request.
>
> it's the remote OOB thread that's the most costly resource imo.
>

I don't think the OOB thread is that costly, it doesn't block on
anything (not even on state transfer!) so the most expensive part is
reading the key and writing the value. BTW Sanne, we may want to run
Transactional with a smaller payload size ;)

We could implement our own GroupRequest that sends the requests in
parallel instead of implementing FutureCollator on top of UnicastRequest
and save some of that overhead on the caller.

I think we already have a JIRA to make PutKeyValueCommands return the
previous value, that would eliminate lots of GetKeyValueCommands and
it would actually improve the performance of puts - we should probably
make this a priority.
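Sketched from the caller's side (assuming the remote put response carries the
old value back, which is what the JIRA asks for):

    Object prev = cache.put(key, value);  // previous value rides back with the
                                          // put response -- no extra remote get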

>
>> I think I agree on all points, it makes more sense.
>> Just that in a large cluster, let's say
>> 1000 nodes, maybe I want 20 owners as a sweet spot for read/write
>> performance tradeoff, and with such high numbers I guess doing 2-3
>> gets in parallel might make sense as those "unlikely" events suddenly
>> become almost certain.. especially the rehash in progress.
>>
>> So I'd propose a separate configuration option for # parallel get
>> events, and one to define a "try next node" policy. Or this policy
>> should be the whole strategy, and the #gets one of the options for the
>> default implementation.
>
> Agreed that having a configurable remote get policy makes sense.
> We already have a JIRA for this[1], I'll start working on it as the
> performance results are haunting me.

I'd rather focus on implementing one remote get policy that works
instead of making it configurable - even if we make it configurable
we'll have to focus our optimizations on the default policy.

Keep in mind that we also want to introduce eventual consistency - I
think that's going to eliminate our optimization opportunity here
because we'll need to get the values from a majority of owners (if not
all the owners).

> I'd like to have Dan's input on this as well first, as he has worked with
> remote gets and I still don't know why null results are not considered valid
> :)

Pre-5.0 during state transfer an owner could return null to mean "I'm
not sure", so the caller would ignore it unless every target returned
null.
That's no longer necessary, but it wasn't broken so I didn't fix it...

Cheers
Dan

>
> [1] https://issues.jboss.org/browse/ISPN-825
>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-01-25 Thread Mircea Markus
>>> 
>>> One node might be busy doing GC and stay unresponsive for a whole
>>> second or longer, another one might be actually crashed and you didn't
>>> know that yet, these are unlikely but possible.
>> All these are possible but I would rather consider them as exceptional 
>> situations, possibly handled by a retry logic. We should *not* optimise for 
>> these situations IMO.
>> Thinking about our last performance results, we have avg 26k gets per
>> second. Now with numOwners = 2, this means that each node handles 26k
>> *redundant* gets every second: I'm not concerned about the network load, as 
>> Bela mentioned in a previous mail the network link should not be the 
>> bottleneck, but there's a huge unnecessary activity in OOB threads which 
>> should rather be used for releasing locks or whatever needed. On top of 
>> that, this consuming activity highly encourages GC pauses, as the effort for 
>> a get is practically numOwners higher than it should be.
>> 
>>> More likely, a rehash is in progress, you could then be asking a node
>>> which doesn't yet (or anymore) have the value.
>> 
>> this is a consistency issue and I think we can find a way to handle it some 
>> other way.
>>> 
>>> All good reasons for which imho it makes sense to send out "a couple"
>>> of requests in parallel, but I'd unlikely want to send more than 2,
>>> and I agree often 1 might be enough.
>>> Maybe it should even optimize for the most common case: send out just
>>> one, have a more aggressive timeout and in case of trouble ask for the
>>> next node.
>> +1
>>> 
>>> In addition, sending a single request might spare us some Future,
>>> await+notify messing in terms of CPU cost of sending the request.
>> it's the remote OOB thread that's the most costly resource imo.
> 
> I think I agree on all points, it makes more sense.
> Just that in a large cluster, let's say
> 1000 nodes, maybe I want 20 owners as a sweet spot for read/write
> performance tradeoff, and with such high numbers I guess doing 2-3
> gets in parallel might make sense, as those "unlikely" events suddenly
> become almost certain, especially a rehash in progress.
> So I'd propose a separate configuration option for # parallel get
> events, and one to define a "try next node" policy. Or this policy
> should be the whole strategy, and the #gets one of the options for the
> default implementation.

Agreed that having a configurable remote get policy makes sense. 
We already have a JIRA for this[1], I'll start working on it as the performance 
results are haunting me.
I'd like to have Dan's input on this as well first, as he has worked with 
remote gets and I still don't know why null results are not considered valid :)

[1] https://issues.jboss.org/browse/ISPN-825
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-01-25 Thread Sanne Grinovero
On 25 January 2012 13:41, Mircea Markus  wrote:
>
> On 25 Jan 2012, at 13:25, Sanne Grinovero wrote:
>
>> [cut]
 I agree, we should not ask all replicas for the same information.
 Asking only one is the opposite though: I think this should be a
 configuration option to ask for any value between 1 and numOwners.
 That's because I understand it might be beneficial to ask more than
 one node immediately,
>>> why is it more beneficial to ask multiple members than a single one? I 
>>> guess it doesn't have to do with consistency, as in that case it would be 
>>> required (vs beneficial).
>>> Is it because one of the nodes might reply faster? I'm not sure that 
>>> compensates for the burden of numOwners-1 additional RPCs, but a benchmark will 
>>> tell us just that.
>>
>> One node might be busy doing GC and stay unresponsive for a whole
>> second or longer; another one might have actually crashed and you didn't
>> know it yet. These are unlikely but possible.
> All these are possible but I would rather consider them as exceptional 
> situations, possibly handled by retry logic. We should *not* optimise for 
> these situations IMO.
> Thinking about our last performance results, we have avg 26k gets per 
> second. Now with numOwners = 2, this means that each node handles 26k 
> *redundant* gets every second: I'm not concerned about the network load, as 
> Bela mentioned in a previous mail the network link should not be the 
> bottleneck, but there's a huge unnecessary activity in OOB threads which 
> should rather be used for releasing locks or whatever needed. On top of that, 
> this consuming activity highly encourages GC pauses, as the effort for a get 
> is practically numOwners times higher than it should be.
>
>> More likely, a rehash is in progress: you could then be asking a node
>> which doesn't yet (or no longer) have the value.
>
> this is a consistency issue and I think we can find a way to handle it some 
> other way.
>>
>> All good reasons for which imho it makes sense to send out "a couple"
>> of requests in parallel, but I'd be unlikely to want to send more than 2,
>> and I agree often 1 might be enough.
>> Maybe it should even optimize for the most common case: send out just
>> one, have a more aggressive timeout and in case of trouble ask for the
>> next node.
> +1
>>
>> In addition, sending a single request might spare us some Future,
>> await+notify messing in terms of CPU cost of sending the request.
> it's the remote OOB thread that's the most costly resource imo.

I think I agree on all points, it makes more sense.
Just that in a large cluster, let's say
1000 nodes, maybe I want 20 owners as a sweet spot for read/write
performance tradeoff, and with such high numbers I guess doing 2-3
gets in parallel might make sense, as those "unlikely" events suddenly
become almost certain, especially a rehash in progress.

So I'd propose a separate configuration option for # parallel get
events, and one to define a "try next node" policy. Or this policy
should be the whole strategy, and the #gets one of the options for the
default implementation.
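
To make the proposal concrete, here is a hypothetical shape for such a
strategy; none of these names exist in Infinispan today, this is only an
illustration:

    import java.util.List;
    import org.jgroups.Address;

    // Pluggable remote get policy: decides which owners to ask, in what
    // order, and how many requests may be in flight at once.
    public interface RemoteGetPolicy {
       // Owners to contact in the first round, e.g. just the primary owner.
       List<Address> initialTargets(Object key, List<Address> owners);

       // Next owner to try after a per-node timeout or failure; null = give up.
       Address nextTarget(Object key, List<Address> owners, List<Address> asked);

       // Upper bound on concurrent gets - the "# parallel gets" option.
       int maxParallelGets();
    }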

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-01-25 Thread Mircea Markus

On 25 Jan 2012, at 13:25, Sanne Grinovero wrote:

> [cut]
>>> I agree, we should not ask all replicas for the same information.
>>> Asking only one is the opposite though: I think this should be a
>>> configuration option to ask for any value between 1 and numOwners.
>>> That's because I understand it might be beneficial to ask more than
>>> one node immediately,
>> why is it more beneficial to ask multiple members than a single one? I guess 
>> it doesn't have to do with consistency, as in that case it would be required 
>> (vs beneficial).
>> Is it because one of the nodes might reply faster? I'm not sure that 
>> compensates for the burden of numOwners-1 additional RPCs, but a benchmark will 
>> tell us just that.
> 
> One node might be busy doing GC and stay unresponsive for a whole
> second or longer; another one might have actually crashed and you didn't
> know it yet. These are unlikely but possible.
All these are possible but I would rather consider them as exceptional 
situations, possibly handled by retry logic. We should *not* optimise for 
these situations IMO. 
Thinking about our last performance results, we have avg 26k gets per 
second. Now with numOwners = 2, this means that each node handles 26k 
*redundant* gets every second: I'm not concerned about the network load, as 
Bela mentioned in a previous mail the network link should not be the 
bottleneck, but there's a huge unnecessary activity in OOB threads which should 
rather be used for releasing locks or whatever needed. On top of that, this 
consuming activity highly encourages GC pauses, as the effort for a get is 
practically numOwners times higher than it should be.

> More likely, a rehash is in progress: you could then be asking a node
> which doesn't yet (or no longer) have the value.

this is a consistency issue and I think we can find a way to handle it some 
other way. 
> 
> All good reasons for which imho it makes sense to send out "a couple"
> of requests in parallel, but I'd be unlikely to want to send more than 2,
> and I agree often 1 might be enough.
> Maybe it should even optimize for the most common case: send out just
> one, have a more aggressive timeout and in case of trouble ask for the
> next node.
+1
> 
> In addition, sending a single request might spare us some Future,
> await+notify messing in terms of CPU cost of sending the request.
it's the remote OOB thread that's the most costly resource imo.


___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-01-25 Thread Sanne Grinovero
[cut]
>> I agree, we should not ask all replicas for the same information.
>> Asking only one is the opposite though: I think this should be a
>> configuration option to ask for any value between 1 and numOwners.
>> That's because I understand it might be beneficial to ask more than
>> one node immediately,
> why is it more beneficial to ask multiple members than a single one? I guess 
> it doesn't have to do with consistency, as in that case it would be required 
> (vs beneficial).
> Is it because one of the nodes might reply faster? I'm not sure that 
> compensates for the burden of numOwners-1 additional RPCs, but a benchmark will 
> tell us just that.

One node might be busy doing GC and stay unresponsive for a whole
second or longer; another one might have actually crashed and you didn't
know it yet. These are unlikely but possible.
More likely, a rehash is in progress: you could then be asking a node
which doesn't yet (or no longer) have the value.

All good reasons for which imho it makes sense to send out "a couple"
of requests in parallel, but I'd be unlikely to want to send more than 2,
and I agree often 1 might be enough.
Maybe it should even optimize for the most common case: send out just
one, have a more aggressive timeout and in case of trouble ask for the
next node.

In addition, sending a single request might spare us some Future,
await+notify messing in terms of CPU cost of sending the request.
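
A rough sketch of that staggered approach, assuming a hypothetical
OwnerInvoker that stands in for the real unicast get RPC:

    import java.util.List;
    import java.util.concurrent.*;
    import org.jgroups.Address;

    public class StaggeredGet {
       public interface OwnerInvoker {
          Object invoke(Address owner) throws Exception; // the unicast get
       }

       public static Object get(List<Address> owners, final OwnerInvoker rpc,
                                ExecutorService pool, long perNodeTimeoutMs)
             throws Exception {
          for (final Address owner : owners) {
             Future<Object> f = pool.submit(new Callable<Object>() {
                public Object call() throws Exception { return rpc.invoke(owner); }
             });
             try {
                // aggressive per-node timeout instead of one global timeout
                return f.get(perNodeTimeoutMs, TimeUnit.MILLISECONDS);
             } catch (TimeoutException e) {
                f.cancel(true);  // owner in a GC pause? move on to the next
             } catch (ExecutionException e) {
                // owner crashed or a rehash moved the key: try the next owner
             }
          }
          throw new TimeoutException("no owner replied within the per-node timeout");
       }
    }
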
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-01-25 Thread Mircea Markus

On 25 Jan 2012, at 12:06, Sanne Grinovero wrote:

> On 25 January 2012 11:48, Mircea Markus  wrote:
>> 
>> On 25 Jan 2012, at 08:51, Dan Berindei wrote:
>> 
>>> Hi Sanne
>>> 
>>> On Wed, Jan 25, 2012 at 1:22 AM, Sanne Grinovero  
>>> wrote:
 Hello,
 in the method:
 org.infinispan.distribution.DistributionManagerImpl.retrieveFromRemoteSource(Object,
 InvocationContext, boolean)
 
 we have:
 
  List<Address> targets = locate(key);
  // if any of the recipients has left the cluster since the
 command was issued, just don't wait for its response
  targets.retainAll(rpcManager.getTransport().getMembers());
 
 But then we use ResponseMode.WAIT_FOR_VALID_RESPONSE, which means
 we're not going to wait for all responses anyway, and I think we might
 assume to get a reply by a node which actually is in the cluster.
 
 So the retainAll call is unneeded and can be removed? I'm wondering
 because it's not safe anyway: it seems very unlikely that the view
 changes just between a locate(key) and the retainAll, so that's not
 something we should be relying on anyway.
 I'd rather assume that such a get method might be checked and
 eventually dropped by the receiver.
 
>>> 
>>> The locate method will return a list of owners based on the
>>> "committed" cache view, so there is a non-zero probability that one of
>>> the owners has already left.
>>> 
>>> If I remember correctly, I added the retainAll call because otherwise
>>> ClusteredGetResponseFilter.needMoreResponses() would keep returning
>>> true if one of the targets left the cluster. Coupled with the fact
>>> that null responses are not considered valid (unless *all* responses
>>> are null), this meant that a remote get for a key that doesn't exist
>>> would throw a TimeoutException after 15 seconds instead of returning
>>> null immediately.
>>> 
>>> We could revisit the decision to make null responses invalid, and then
>>> as long as there is still one of the old owners left in the cluster
>>> you'll get the null result immediately.
>> can't we just go directly to a single node for getting a remote value? The main 
>> data owner perhaps. If the node is down we can retry on another node. Going 
>> to multiple nodes seems like a waste of resources - network in this case.
> 
> I agree, we should not ask all replicas for the same information.
> Asking only one is the opposite though: I think this should be a
> configuration option to ask for any value between 1 and numOwners.
> That's because I understand it might be beneficial to ask more than
> one node immediately,
why is it more beneficial to ask multiple members than a single one? I guess it 
doesn't have to do with consistency, as in that case it would be required (vs 
beneficial).
Is it because one of the nodes might reply faster? I'm not sure that 
compensates for the burden of numOwners-1 additional RPCs, but a benchmark will tell 
us just that. 
> but assuming we have many owners it would be
> nice to pick only a subset.
> 
> A second configuration option should control the strategy by which the
> subset is selected. In case we ask only one node, I'm not sure if the
> first node would be the best option.
The main data owner would be a good fit if the distribution is even (we can 
assume that's the case) and all the keys are accessed with the same frequency.
The latter is out of our control, though if L1 is enabled (the default) then 
we can assume it as well. Or we can pick an owner at random.
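
In code, the two options amount to something like this (hypothetical helper,
not Infinispan API; it assumes locate(key) returns the main data owner first):

    import java.util.List;
    import java.util.Random;
    import org.jgroups.Address;

    public class TargetSelection {
       private static final Random RND = new Random();

       // Even key distribution => even load across the primary owners.
       static Address primaryOwner(List<Address> owners) {
          return owners.get(0);
       }

       // Random owner: spreads the requests for hot keys across all owners.
       static Address randomOwner(List<Address> owners) {
          return owners.get(RND.nextInt(owners.size()));
       }
    }
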
> 
> Dan, thank you for your explanation, this makes much more sense now.
> Indeed I agree as well that we should revisit the interpretation of
> "return null", that's not an unlikely case since every put will run a
> get first.
I don't see why null responses are not considered valid unless all the 
responses are null. Dan, can you perhaps comment on this?
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-01-25 Thread Mircea Markus

On 25 Jan 2012, at 12:06, Bela Ban wrote:

> 
> 
> On 1/25/12 12:58 PM, Mircea Markus wrote:
>> 
>> On 25 Jan 2012, at 09:42, Bela Ban wrote:
>> 
>>> 
>>> 
>>> On 1/25/12 9:51 AM, Dan Berindei wrote:
>>> 
 Slightly related, I wonder if Manik's comment is still true:
 
 (if at all possible, try not to use JGroups' ANYCAST for now.
 Multiple (parallel) UNICASTs are much faster.)
 
 Intuitively it shouldn't be true, unicasts+FutureCollator do basically
 the same thing as anycast+GroupRequest.
>>> 
>>> 
>>> No, parallel unicasts will be faster, as an anycast to A,B,C sends the
>>> unicasts sequentially
>> Thanks, very good to know that.
>> 
>> I'm a bit confused by the JGroups terminology though :-)
>> My understanding of the term ANYCAST is that the message is sent to *one* of 
>> the A,B,C. But from what I read here it is sent to A, B and C - that's what 
>> I know as MULTICAST.
> 
> 
> No, here's the definition:
> * anycast: message sent to a subset S of members N. The message is sent 
> to all members in S as sequential unicasts. S <= N
> * multicast: cluster-wide message, sent to all members N of a cluster. 
> This can be done via UDP (IP multicast) or TCP
> * IP multicast: the network level datagram packet with a class D address 
> as destination
> * broadcast: IP packet sent to all hosts in a given range (same host, 
> subnet or higher)
Thanks for the clarification, Bela. I've been using Wikipedia[1] as a reference 
and the terms have a slightly different meaning there.

[1] http://en.wikipedia.org/wiki/Anycast#Addressing_methodologies
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-01-25 Thread Sanne Grinovero
On 25 January 2012 11:48, Mircea Markus  wrote:
>
> On 25 Jan 2012, at 08:51, Dan Berindei wrote:
>
>> Hi Sanne
>>
>> On Wed, Jan 25, 2012 at 1:22 AM, Sanne Grinovero  
>> wrote:
>>> Hello,
>>> in the method:
>>> org.infinispan.distribution.DistributionManagerImpl.retrieveFromRemoteSource(Object,
>>> InvocationContext, boolean)
>>>
>>> we have:
>>>
>>>      List<Address> targets = locate(key);
>>>      // if any of the recipients has left the cluster since the
>>> command was issued, just don't wait for its response
>>>      targets.retainAll(rpcManager.getTransport().getMembers());
>>>
>>> But then we use ResponseMode.WAIT_FOR_VALID_RESPONSE, which means
>>> we're not going to wait for all responses anyway, and I think we might
>>> assume to get a reply by a node which actually is in the cluster.
>>>
>>> So the retainAll call is unneeded and can be removed? I'm wondering
>>> because it's not safe anyway: it seems very unlikely that the view
>>> changes just between a locate(key) and the retainAll, so that's not
>>> something we should be relying on anyway.
>>> I'd rather assume that such a get method might be checked and
>>> eventually dropped by the receiver.
>>>
>>
>> The locate method will return a list of owners based on the
>> "committed" cache view, so there is a non-zero probability that one of
>> the owners has already left.
>>
>> If I remember correctly, I added the retainAll call because otherwise
>> ClusteredGetResponseFilter.needMoreResponses() would keep returning
>> true if one of the targets left the cluster. Coupled with the fact
>> that null responses are not considered valid (unless *all* responses
>> are null), this meant that a remote get for a key that doesn't exist
>> would throw a TimeoutException after 15 seconds instead of returning
>> null immediately.
>>
>> We could revisit the decision to make null responses invalid, and then
>> as long as there is still one of the old owners left in the cluster
>> you'll get the null result immediately.
> can't we just go directly to a single node for getting a remote value? The main 
> data owner perhaps. If the node is down we can retry on another node. Going 
> to multiple nodes seems like a waste of resources - network in this case.

I agree, we should not ask all replicas for the same information.
Asking only one is the opposite though: I think this should be a
configuration option to ask for any value between 1 and numOwners.
That's because I understand it might be beneficial to ask more than
one node immediately, but assuming we have many owners it would be
nice to pick only a subset.

A second configuration option should control the strategy by which the
subset is selected. In case we ask only one node, I'm not sure if the
first node would be the best option.

Dan, thank you for your explanation, this makes much more sense now.
Indeed I agree as well that we should revisit the interpretation of
"return null", that's not an unlikely case since every put will run a
get first.

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-01-25 Thread Bela Ban


On 1/25/12 12:58 PM, Mircea Markus wrote:
>
> On 25 Jan 2012, at 09:42, Bela Ban wrote:
>
>>
>>
>> On 1/25/12 9:51 AM, Dan Berindei wrote:
>>
>>> Slightly related, I wonder if Manik's comment is still true:
>>>
>>>  (if at all possible, try not to use JGroups' ANYCAST for now.
>>> Multiple (parallel) UNICASTs are much faster.)
>>>
>>> Intuitively it shouldn't be true, unicasts+FutureCollator do basically
>>> the same thing as anycast+GroupRequest.
>>
>>
>> No, parallel unicasts will be faster, as an anycast to A,B,C sends the
>> unicasts sequentially
> Thanks, very good to know that.
>
> I'm a bit confused by the JGroups terminology though :-)
> My understanding of the term ANYCAST is that the message is sent to *one* of 
> the A,B,C. But from what I read here it is sent to A, B and C - that's what I 
> know as MULTICAST.


No, here's the definition:
* anycast: message sent to a subset S of members N. The message is sent 
to all members in S as sequential unicasts. S <= N
* multicast: cluster-wide message, sent to all members N of a cluster. 
This can be done via UDP (IP multicast) or TCP
* IP multicast: the network level datagram packet with a class D address 
as destination
* broadcast: IP packet sent to all hosts in a given range (same host, 
subnet or higher)
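
To illustrate the first two against the JGroups 3.x MessageDispatcher API
(a sketch only; treat the exact signatures as approximate):

    import java.util.*;
    import org.jgroups.Address;
    import org.jgroups.Message;
    import org.jgroups.blocks.*;
    import org.jgroups.util.*;

    public class AnycastVsUnicasts {
       // Anycast: one call, the transport sends sequential unicasts to S.
       static RspList<Object> anycast(MessageDispatcher disp, List<Address> subset,
                                      Object payload) throws Exception {
          RequestOptions opts = RequestOptions.SYNC().setAnycasting(true);
          return disp.castMessage(subset, new Message(null, payload), opts);
       }

       // "Parallel" unicasts: we issue the sends ourselves, back to back.
       static List<NotifyingFuture<Object>> unicasts(MessageDispatcher disp,
             List<Address> subset, Object payload) throws Exception {
          List<NotifyingFuture<Object>> futures =
                new ArrayList<NotifyingFuture<Object>>();
          for (Address a : subset)
             futures.add(disp.<Object>sendMessageWithFuture(
                   new Message(a, payload), RequestOptions.SYNC()));
          return futures;
       }
    }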


-- 
Bela Ban
Lead JGroups (http://www.jgroups.org)
JBoss / Red Hat
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-01-25 Thread Mircea Markus

On 25 Jan 2012, at 09:42, Bela Ban wrote:

> 
> 
> On 1/25/12 9:51 AM, Dan Berindei wrote:
> 
>> Slightly related, I wonder if Manik's comment is still true:
>> 
>> (if at all possible, try not to use JGroups' ANYCAST for now.
>> Multiple (parallel) UNICASTs are much faster.)
>> 
>> Intuitively it shouldn't be true, unicasts+FutureCollator do basically
>> the same thing as anycast+GroupRequest.
> 
> 
> No, parallel unicasts will be faster, as an anycast to A,B,C sends the 
> unicasts sequentially
Thanks, very good to know that.

I'm a bit confused by the JGroups terminology though :-)
My understanding of the term ANYCAST is that the message is sent to *one* of 
the A,B,C. But from what I read here it is sent to A, B and C - that's what I 
know as MULTICAST.
Moreover, in the discussion we had on IRC, JGroups' MULTICAST seemed to mean 
BROADCAST...
I hope I don't sound pedantic here, I just want to understand things 
correctly :-) 
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-01-25 Thread Mircea Markus

On 25 Jan 2012, at 08:51, Dan Berindei wrote:

> Hi Sanne
> 
> On Wed, Jan 25, 2012 at 1:22 AM, Sanne Grinovero  wrote:
>> Hello,
>> in the method:
>> org.infinispan.distribution.DistributionManagerImpl.retrieveFromRemoteSource(Object,
>> InvocationContext, boolean)
>> 
>> we have:
>> 
>>  List<Address> targets = locate(key);
>>  // if any of the recipients has left the cluster since the
>> command was issued, just don't wait for its response
>>  targets.retainAll(rpcManager.getTransport().getMembers());
>> 
>> But then we use ResponseMode.WAIT_FOR_VALID_RESPONSE, which means
>> we're not going to wait for all responses anyway, and I think we might
>> assume to get a reply by a node which actually is in the cluster.
>> 
>> So the retainAll call is unneeded and can be removed? I'm wondering
>> because it's not safe anyway: it seems very unlikely that the view
>> changes just between a locate(key) and the retainAll, so that's not
>> something we should be relying on anyway.
>> I'd rather assume that such a get method might be checked and
>> eventually dropped by the receiver.
>> 
> 
> The locate method will return a list of owners based on the
> "committed" cache view, so there is a non-zero probability that one of
> the owners has already left.
> 
> If I remember correctly, I added the retainAll call because otherwise
> ClusteredGetResponseFilter.needMoreResponses() would keep returning
> true if one of the targets left the cluster. Coupled with the fact
> that null responses are not considered valid (unless *all* responses
> are null), this meant that a remote get for a key that doesn't exist
> would throw a TimeoutException after 15 seconds instead of returning
> null immediately.
> 
> We could revisit the decision to make null responses invalid, and then
> as long as there is still one of the old owners left in the cluster
> you'll get the null result immediately. 
can't we just go directly to a single node for getting a remote value? The main 
data owner perhaps. If the node is down we can retry on another node. Going to 
multiple nodes seems like a waste of resources - network in this case.
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-01-25 Thread Bela Ban


On 1/25/12 9:51 AM, Dan Berindei wrote:

> Slightly related, I wonder if Manik's comment is still true:
>
>  (if at all possible, try not to use JGroups' ANYCAST for now.
> Multiple (parallel) UNICASTs are much faster.)
>
> Intuitively it shouldn't be true, unicasts+FutureCollator do basically
> the same thing as anycast+GroupRequest.


No, parallel unicasts will be faster, as an anycast to A,B,C sends the 
unicasts sequentially

-- 
Bela Ban
Lead JGroups (http://www.jgroups.org)
JBoss / Red Hat
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev


Re: [infinispan-dev] DIST.retrieveFromRemoteSource

2012-01-25 Thread Dan Berindei
Hi Sanne

On Wed, Jan 25, 2012 at 1:22 AM, Sanne Grinovero  wrote:
> Hello,
> in the method:
> org.infinispan.distribution.DistributionManagerImpl.retrieveFromRemoteSource(Object,
> InvocationContext, boolean)
>
> we have:
>
>      List<Address> targets = locate(key);
>      // if any of the recipients has left the cluster since the
> command was issued, just don't wait for its response
>      targets.retainAll(rpcManager.getTransport().getMembers());
>
> But then we use ResponseMode.WAIT_FOR_VALID_RESPONSE, which means
> we're not going to wait for all responses anyway, and I think we might
> assume to get a reply by a node which actually is in the cluster.
>
> So the retainAll call is unneeded and can be removed? I'm wondering
> because it's not safe anyway: it seems very unlikely that the view
> changes just between a locate(key) and the retainAll, so that's not
> something we should be relying on anyway.
> I'd rather assume that such a get method might be checked and
> eventually dropped by the receiver.
>

The locate method will return a list of owners based on the
"committed" cache view, so there is a non-zero probability that one of
the owners has already left.

If I remember correctly, I added the retainAll call because otherwise
ClusteredGetResponseFilter.needMoreResponses() would keep returning
true if one of the targets left the cluster. Coupled with the fact
that null responses are not considered valid (unless *all* responses
are null), this meant that a remote get for a key that doesn't exist
would throw a TimeoutException after 15 seconds instead of returning
null immediately.
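
For reference, here is a sketch (not the actual Infinispan source) of a
JGroups RspFilter with the semantics described above:

    import java.util.List;
    import org.jgroups.Address;
    import org.jgroups.blocks.RspFilter;

    public class FirstNonNullResponseFilter implements RspFilter {
       private final List<Address> targets; // the owners the get was sent to
       private int received;
       private boolean done;

       public FirstNonNullResponseFilter(List<Address> targets) {
          this.targets = targets;
       }

       public synchronized boolean isAcceptable(Object response, Address sender) {
          received++;
          if (response != null) {
             done = true;                 // first non-null reply wins
             return true;
          }
          // null only becomes acceptable once every target replied null
          return received == targets.size();
       }

       public synchronized boolean needMoreResponses() {
          // if a target left the cluster and never replies, this stays true
          // until the timeout - hence the retainAll() above
          return !done && received < targets.size();
       }
    }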

We could revisit the decision to make null responses invalid, and then
as long as there is still one of the old owners left in the cluster
you'll get the null result immediately. You may still get an exception
if all the old owners left the cluster, but I'm not sure. I wish I had
added a test for this...

We may also be able to add a workaround in FutureCollator as well -
just remember that we use the same FutureCollator for writes in REPL
mode so it needs to work with GET_ALL as well as with GET_FIRST.

Slightly related, I wonder if Manik's comment is still true:

(if at all possible, try not to use JGroups' ANYCAST for now.
Multiple (parallel) UNICASTs are much faster.)

Intuitively it shouldn't be true, unicasts+FutureCollator do basically
the same thing as anycast+GroupRequest.

Cheers
Dan


> Cheers,
> Sanne
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev

___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev