Re: [infinispan-dev] Translating our events to JSR 107 events

2013-01-28 Thread Dan Berindei
On 25 Jan 2013 23:26, "Vladimir Blagojevic"  wrote:
>
> Hey,
>
> I figured out why cache listeners notifications were not fired. We have
> to add listener *after* cache.start() has been called. If listener is
> added before start it will not be registered.
>

This doesn't sound right... DefaultCacheManager calls cache.start()
automatically, so you should never have to call it explicitly.

Looking at the code, the listener should be registered even if the cache
wasn't started yet (e.g. ListenerRegistrationTest never calls start()). The
only odd thing is that if you stop a cache, all the listeners are lost.
Maybe that's what happened in your case?

> That aside I found some problems mapping our events to jsr 107 events.
> The problem is specifically with JSR107 CacheEntryCreatedListener and
> CacheEntryExpiredListener.
>
> The first one is not easy to implement because we need both key/value
> pair for jsr listener and our CacheEntryCreatedEvent does not provide
> value. I found some references where people used CacheEntryModified
> instead with pre being null and post being value to detect new entry in
> cache. In that case listener translator class would have to keep state
> and track pre(true/false) for CacheEntryModified, right? Any other way
> to do it?
>

You could add an interceptor to trigger your events, without using the
cache's notifications at all.

If you're ok with changing the core, you could add a getValue() method to
CacheEntryCreatedEvent, and an isCreated() method to
CacheEntryModifiedEvent (as I suppose you don't want to call the updates
listener when an entry is created). Both changes should be
backwards-compatible.
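
For the CacheEntryModified workaround, the translator would have to look
roughly like this (just a sketch - the class and the fireCreated/fireUpdated
stubs are made up, and I'm going from memory on the listener API):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

import org.infinispan.notifications.Listener;
import org.infinispan.notifications.cachelistener.annotation.CacheEntryModified;
import org.infinispan.notifications.cachelistener.event.CacheEntryModifiedEvent;

// Rough sketch of the stateful translator idea: remember whether the pre-event
// value was null, so the post-event can be reported as either a JSR-107
// "created" or "updated" event. fireCreated/fireUpdated stand in for the calls
// into the javax.cache listener machinery.
@Listener
public class Jsr107EventTranslator {

   // keys whose pre-notification carried a null value, i.e. entries being created
   private final ConcurrentMap<Object, Boolean> newKeys = new ConcurrentHashMap<Object, Boolean>();

   @CacheEntryModified
   public void onModified(CacheEntryModifiedEvent event) {
      if (event.isPre()) {
         newKeys.put(event.getKey(), event.getValue() == null);
      } else {
         boolean created = Boolean.TRUE.equals(newKeys.remove(event.getKey()));
         if (created) {
            fireCreated(event.getKey(), event.getValue());
         } else {
            fireUpdated(event.getKey(), event.getValue());
         }
      }
   }

   private void fireCreated(Object key, Object value) { /* notify CacheEntryCreatedListeners */ }

   private void fireUpdated(Object key, Object value) { /* notify CacheEntryUpdatedListeners */ }
}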


> The second one I have no idea how to implement, as we do not have a
> CacheEntryExpired event. True, the spec does not require such an event
> to be fired immediately after an entry has expired, only eventually
> (which might be on access). Either way, I am all ears for suggestions on
> how to implement this one.
>

I guess @CacheEntryEvicted/@CacheEntriesEvicted would be the closest thing
we have in Infinispan. But you can't check in the listener if the entry was
evicted because it expired or because there wasn't enough space in the data
container (yet).

Come to think of it, if an entry is evicted because of size constraints,
there isn't any way to keep track of its expiration time and invoke the
expiration listener on access either. Is the expiration event mandatory in
the JSR?

Cheers

Dan

Re: [infinispan-dev] StoreByValueTest tck test

2013-01-28 Thread Dan Berindei
On Mon, Jan 28, 2013 at 1:56 PM, Manik Surtani  wrote:

> Let me clarify a few things on this thread.  There seems to be a bit of
> confusion here.  :)
>
> storeAsBinary in Infinispan was designed with the following purposes in
> mind, in order of importance:
>
> 1) Performance.  Prevent serialising/deserializing an entry multiple times
> (e.g., to write through to disk, to replicate over the network, concurrent
> threads needing to read the object representation).
>
>
TBH I don't think storeAsBinary as it works now is that good for
performance, because MarshalledValueInterceptor compacts keys/values after
every operation (see MarshalledValueInterceptor.java:320 and its callers).
Once a key/value is deserialized, its serialized form is deleted, and it
has to be serialized again if a remote node asks for it.

So it would save at most one serialization compared to storing the entries
as references (and only if the entry also needs to be written to a cache
store). Instead it adds a bit of overhead on each operation to keep track
of the marshalled value status.



> 2) Classloader isolation (as Galder mentioned).  This became a secondary
> purpose of this feature (originally observed as a side-effect).  Enhanced
> by allowing storeKeyAsBinary and storeValueAsBinary options for more
> fine-grained control of this behaviour.
>
>
I'd say this has become the reason most people use this feature now; even
though a regular cache should work fine in AS7, there are still other
environments where it is needed.



> Now lets consider what JSR 107 needs.  Similarly named, the feature in JSR
> 107 serves a completely different purpose, and this is referential
> integrity.  Think database-style isolation (repeatable read, etc) where
> concurrent threads holding object references to the same value, and
> mutating the same value, are not visible until a commit.
>
> I originally thought that Infinispan's storeAsBinary can be used for this,
> but apparently not without some additional changes/tweaks.  Maybe we need:
>
> 1) A new config option for this behaviour.  <storeAsBinary defensive="true" /> ?
>
2) If enabled, maybe use a subclass of MarshalledValue
> (DefensiveMarshalledValue?) that *always* stores a byte[] and never caches
> the object representation?
>
>
I think we'd still need to cache the object instance while the command is
executing, otherwise we'll have too many deserializations. But perhaps the
new setting could control whether MarshalledValueInterceptor calls
MarshalledValue.compact with preferSerializedRepresentation == true instead
of false, as it does now.
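
Roughly this kind of thing, i.e. (hypothetical illustration only, not the real
MarshalledValue internals):

// Hypothetical illustration of the compact() idea only - not the real
// MarshalledValue implementation. The wrapper holds both representations and
// compact() decides which one to drop after each operation; a "defensive" mode
// would prefer the serialized form, so every read deserializes a fresh copy.
class MarshalledValueSketch {
   private byte[] raw;        // serialized representation
   private Object instance;   // deserialized representation, cached during a command

   void compact(boolean preferSerializedRepresentation) {
      if (preferSerializedRepresentation) {
         if (raw != null) {
            instance = null;  // defensive: callers always get a freshly deserialized copy
         }
      } else {
         if (instance != null) {
            raw = null;       // current behaviour: keep the live object, re-serialize on demand
         }
      }
   }
}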



> What do you think?
>
> Cheers
> Manik
>
> On 28 Jan 2013, at 10:00, Sanne Grinovero  wrote:
>
> > I remember Manik and me pair-programming on that class to simplify it
> > a bit - especially as there are some performance complexities - but we
> > ended up not touching it as any change would have violated some
> > expectations of one feature or another.
> >
> > Let's put this on the list of cleanups to be performed for 6.0?
> >
> > On 28 January 2013 09:14, Galder Zamarreño  wrote:
> >>
> >> On Jan 25, 2013, at 11:37 AM, Sanne Grinovero 
> wrote:
> >>
> >>> On 25 January 2013 11:11, Galder Zamarreño  wrote:
> 
>  On Jan 24, 2013, at 4:26 PM, Sanne Grinovero 
> wrote:
> 
> > It's important to note that Infinispan's implementation of storing as
> > binary isn't guaranteeing different instances of objects are returned
> > to different get() invocations (especially when they happen in
> > parallel).
> 
>  ^ Do you have a test for this?
> >>>
> >>> No, it's self-evident by reading the code. I'd venture saying it's a
> >>> design choice: the option was not designed to provide isolation,
> >>> people should not abuse of it for a different purpose.
> >>>
>  Could this be related to the fact that a get(), unless it had
> received that entry from another node, will held as reference?
> 
>  It'd be interesting if that test works if after a put() you call
> compact()...
> 
> > This is the reason for example that Hibernate OGM can't use this flag
> > to have safe and independent instances, but needs to make defensive
> > copies if returned values. As I read in your first post, you want to
> > use this for defensive copies: that doesn't work, especially if the
> > TCK is performing concurrent requests.
> 
>  ^ As I said, the storeAsBinary feature is heavily optimised for
> performance, hence why it initially keeps instances as references, so that
> if another thread requests the entry soon later, a reference is sent back
> (no need to serialize/deserialize the entry just put)
> >>>
> >>> As you say "the reference is sent back", even if it's the same
> >>> instance as a previous request. I have no doubt that's for performance
> >>> reasons: I patched that code myself and have carefully kept that
> >>> "feature" of instance reuse available.
> >>> I'm not sure it can provide much of a benefit generally spe

Re: [infinispan-dev] StoreByValueTest tck test

2013-01-28 Thread Dan Berindei
On Mon, Jan 28, 2013 at 2:43 PM, Manik Surtani  wrote:

>
> On 28 Jan 2013, at 12:35, Dan Berindei  wrote:
>
>
>
> On Mon, Jan 28, 2013 at 1:56 PM, Manik Surtani wrote:
>
>> Let me clarify a few things on this thread.  There seems to be a bit of
>> confusion here.  :)
>>
>> storeAsBinary in Infinispan was designed with the following purposes in
>> mind, in order of importance:
>>
>> 1) Performance.  Prevent serialising/deserializing an entry multiple
>> times (e.g., to write through to disk, to replicate over the network,
>> concurrent threads needing to read the object representation).
>>
>>
> TBH I don't think storeAsBinary as it works now is that good for
> performance, because MarshalledValueInterceptor compacts keys/values after
> every operation (see MarshalledValueInterceptor.java:320 and its callers).
> Once a key/value is deserialized, its serialized form is deleted, and it
> has to be serialized again if a remote node asks for it.
>
>
> That is only correct on the node where you're running the operation.  The
> remote node has different characteristics.  The byte array is never
> deserialized when reading off the wire, always kept as a byte array, and
> when asked for again (via a remote GET) it just needs to do a buffer copy.
> Now this breaks the moment a thread local to that remote node looks up
> an entry,  but if you have some form of key affinity then you really see
> this benefit.
>
>
Correct, except if you ever do a local GET on the "remote" node, it will
deserialize the object and from that moment on it will have to serialize it
again for each remote GET.

Most of the time nodes are treated as interchangeable, and the probability
of a key being accessed from the "remote" node is the same as from any
other node.


> So it would save at most one serialization compared to storing the entries
> as references (and only if the entry also needs to be written to a cache
> store). Instead it adds a bit of overhead on each operation to keep track
> of the marshalled value status.
>
>
> Well, if you have > 1 cache store enabled, etc etc.
>
>
Or you could have no cache store...


>  2) Classloader isolation (as Galder mentioned).  This became a secondary
>> purpose of this feature (originally observed as a side-effect).  Enhanced
>> by allowing storeKeyAsBinary and storeValueAsBinary options for more
>> fine-grained control of this behaviour.
>>
>>
> I'd say this has become the reason most people use this feature now; even
> though a regular cache should work fine in AS7, there are still other
> environments where it is needed.
>
>
> Yes, they are both as important today; I was just stating what the
> original intentions were.  :)
>
>
>
>
>> Now lets consider what JSR 107 needs.  Similarly named, the feature in
>> JSR 107 serves a completely different purpose, and this is referential
>> integrity.  Think database-style isolation (repeatable read, etc) where
>> concurrent threads holding object references to the same value, and
>> mutating the same value, are not visible until a commit.
>>
>> I originally thought that Infinispan's storeAsBinary can be used for
>> this, but apparently not without some additional changes/tweaks.  Maybe we
>> need:
>>
>> 1) A new config option for this behaviour.  <storeAsBinary defensive="true" /> ?
>>
> 2) If enabled, maybe use a subclass of MarshalledValue
>> (DefensiveMarshalledValue?) that *always* stores a byte[] and never caches
>> the object representation?
>>
>>
> I think we'd still need to cache the object instance while the command is
> executing, otherwise we'll have too many deserializations. But perhaps the
> new setting could control whether MarshalledValueInterceptor calls
> MarshalledValue.compact with preferSerializedRepresentation == true instead
> of false, as it does now.
>
>
> Well, you will want eager serialisation too, even in local mode.  So that
> would have to be built in.  So maybe rather than a MarshalledValue
> subclass, we really need a MarshalledValueInterceptor subclass.  Even
> easier/better encapsulated.  :)
>
>
That sounds good to me.


>
>
>
>> What do you think?
>>
>> Cheers
>> Manik
>>
>> On 28 Jan 2013, at 10:00, Sanne Grinovero  wrote:
>>
>> > I remember Manik and me pair-programming on that class to simplify it
>> > a bit - especially as there are some performance complexities - but we
>> > ended up not touching it as any change would have violated some
>> > expectations of one feature or another.
>> >

Re: [infinispan-dev] Translating our events to JSR 107 events

2013-01-29 Thread Dan Berindei
On Mon, Jan 28, 2013 at 5:41 PM, Manik Surtani  wrote:

>
> On 28 Jan 2013, at 15:22, Vladimir Blagojevic  wrote:
>
>  On 13-01-28 7:31 AM, Manik Surtani wrote:
>
>  If you're ok with changing the core, you could add a getValue() method
> to CacheEntryCreatedEvent, and an isCreated() method to
> CacheEntryModifiedEvent (as I suppose you don't want to call the updates
> listener when an entry is created). Both changes should be
> backwards-compatible.
>
> That could work.
>
>
> The second one, Manik? As I was researching how to do this, I found you were
> against the first option: https://issues.jboss.org/browse/ISPN-881
>
>
> Yes, isCreated().
>
>
>
> > The second one I have no idea how to implement, as we do not have a
> > CacheEntryExpired event. True, the spec does not require such an event
> > to be fired immediately after an entry has expired, only eventually
> > (which might be on access). Either way, I am all ears for suggestions on
> > how to implement this one.
> >
>
> I guess @CacheEntryEvicted/@CacheEntriesEvicted would be the closest thing
> we have in Infinispan. But you can't check in the listener if the entry was
> evicted because it expired or because there wasn't enough space in the data
> container (yet).
>
> There could definitely be something clever we could do here.  Adding the
> (expired or evicted) entry to a queue for later notification.  But that
> would definitely need to be something we explicitly enable rather than have
> running all the time, since it kinda defeats the purpose of evicting
> something to save memory only to have it put in a different queue elsewhere
> until an event is fired.
>
>
> Exactly! One thing we could do is what the RI does: check for expired entries
> on access from the JSR module and during the normal expiration cleanup cycle.
>
>
> Be mindful of performance considerations here - this could get very
> expensive.
>
>
Exactly. I don't think you can use a queue for later notification, because
you also have to cancel the notification if the user updates the same key
again. I mean, if an entry was supposed to expire at 12:00 and the user did
a put at 11:59 with expiration time 12:30, then he shouldn't get an expired
notification at 12:00 - even if the entry was evicted in the meantime.

I think you'd need something like a cache store that only keeps the keys
and their expiration time, and with passivation enabled. I don't think you
can reuse the cache store code, though.
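
Something along these lines, maybe (hypothetical sketch, nothing like this
exists today; the JSR-107 adapter would call onWrite from its write path and
checkExpired on reads and from the cleanup cycle):

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: track only keys and their expiration times, cancel the
// record on any later write, and fire the JSR-107 expired event lazily - either
// on access or from a periodic cleanup task.
class ExpirationTracker {

   private final ConcurrentMap<Object, Long> expirationTimes = new ConcurrentHashMap<Object, Long>();

   // called on every put/replace from the JSR-107 adapter
   void onWrite(Object key, long lifespanMillis) {
      if (lifespanMillis < 0) {
         expirationTimes.remove(key);   // immortal entry: nothing to fire later
      } else {
         expirationTimes.put(key, System.currentTimeMillis() + lifespanMillis);
      }
   }

   // called on access (and from the cleanup cycle); returns true if an
   // expired event should be fired for this key
   boolean checkExpired(Object key) {
      Long expiry = expirationTimes.get(key);
      if (expiry != null && expiry <= System.currentTimeMillis()) {
         return expirationTimes.remove(key, expiry);   // fire at most once
      }
      return false;
   }
}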

Re: [infinispan-dev] Lucene 4 / Infinispan performance

2013-01-29 Thread Dan Berindei
Very nice, looking forward to the Lucene bench results.

I hope you'll run it with a distributed cache as well!



On Tue, Jan 29, 2013 at 3:30 PM, Sanne Grinovero wrote:

> No I'm not comparing with Lucene 3.6 now with this configuration. It
> is well known that Lucene 4 is significantly faster than Lucene 3, so
> that would be unfair.
>
> What is interesting is that when comparing our implementations vs. the
> Apache stock ones while using Lucene 3 we were "very close", often a
> bit faster but not too exciting.
> Now comparing with the stock ones using Lucene 4 it seems we're
> getting into a better position... and I didn't even profile it, this
> is the first run after finishing coding the functional requirements.
>
> Still these figures are produced by a stress test whose primary
> purpose is to verify consistency and no-corruptions under stress.. I
> happened to add some metrics for fun, but to provide realistic figures
> one should run Lucene's own bench suite.. I'll do that next week.
>
> @Manik yes I'm not using a CacheStore, I would presume the same. This
> is why I've created the Lucene-specific CacheLoader, maybe I should
> complete the job and make it a CacheStore.
>
> Sanne
>
> On 29 January 2013 13:14, Tristan Tarrant  wrote:
> > Have you got numbers comparing against Lucene 3.6 ?
> >
> > Tristan
> >
> > On 01/29/2013 12:53 AM, Sanne Grinovero wrote:
> >> These are preliminary results of our stressor; looks quite promising
> >> as I haven't yet looked into profiling / tuning:
> >>
> >> Stock Lucene RAMDirectory
> >> Searches: 14.799.852
> >> Writes: 195.935
> >>
> >> Stock Lucene FSDirectory (Memory mapping on SSD)
> >> Searches: 9.628.593
> >> Writes: 105.930
> >>
> >> Our custom Infinispan Directory (LOCAL)
> >> Searches: 17.815.874
> >> Writes: 184.140
> >>
> >> Figures represent operations performed in 15 minutes on a relatively
> >> small index.
> >>
> >> Cheers,
> >> Sanne

Re: [infinispan-dev] ClockService

2013-01-30 Thread Dan Berindei
Manik, I think that JDK bug is pretty out-of-date, at least on Fedora.

I ran the micro-benchmark in the bug (with some modifications:
https://github.com/danberindei/infinispan/blob/t_time_sources_test/core/src/test/java/org/infinispan/TimeSourcesTest.java)
when we had the last round of discussions on this:

nanoTime: 4209836189827226918, time/call: 24ns
currentTimeMillis: 4209836189827226918, time/call: 31ns

The bug initially reported 7ns/call with an optimization that cached the
last currentTimeMillis() value, so I'm not sure how much better we could
get with our own ClockService implementation. I'm pretty sure a 3% overall
improvement is out of reach, though.
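
For reference, the kind of cached clock that optimization (and a ClockService)
implies is roughly the following - purely illustrative, not an existing
Infinispan/JGroups class. Reads become a volatile load, but timestamps are only
as fresh as the refresh interval:

// Illustrative only: a daemon thread refreshes a volatile timestamp, and
// readers pay a volatile read instead of a System.currentTimeMillis() call.
public class CachedClock {
   private volatile long cachedTimeMillis = System.currentTimeMillis();

   public CachedClock(final long resolutionMillis) {
      Thread updater = new Thread(new Runnable() {
         public void run() {
            while (!Thread.currentThread().isInterrupted()) {
               cachedTimeMillis = System.currentTimeMillis();
               try {
                  Thread.sleep(resolutionMillis);
               } catch (InterruptedException e) {
                  Thread.currentThread().interrupt();
               }
            }
         }
      }, "cached-clock");
      updater.setDaemon(true);
      updater.start();
   }

   public long currentTimeMillis() {
      return cachedTimeMillis;   // accurate only to ~resolutionMillis
   }
}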



On Wed, Jan 30, 2013 at 10:53 AM, Manik Surtani  wrote:

>
> On 30 Jan 2013, at 08:41, Bela Ban  wrote:
>
> >
> > On 1/29/13 6:45 PM, Manik Surtani wrote:
> >> On 29 Jan 2013, at 17:17, Bela Ban  wrote:
> >>
> >>> On 1/29/13 5:25 PM, Sanne Grinovero wrote:
>  Glad you started work on that :)
> 
>  Any currentTimeMillis() even today will blow away your cache line and
>  probably trigger a context switch.
> >>> I understand the context switch (in general, it's not recommended
> anyway
> >>> to invoke a system call in synchronized code), but I fail to see why
> >>> this would blow the cache line. Are you referring to the cached Date
> >>> value here ?
> >> No, if you have a separate maint thread that updates a reusable
> currentTimeMillis value.
> >>
> >> Do you use nanoTime() a lot then?  Because that too is inefficient (as
> per the Oracle blog) ...
> >
> > Define inefficient !
>
> There was once a misconception that nanoTime() was faster (by an order of
> magnitude) than currentTimeMillis().  And a similar misconception going the
> other way.  The reality, it would seem, is that they're both *fairly
> inefficient*, depending on OS architecture.
>
> http://bugs.sun.com/view_bug.do?bug_id=6876279
>
> > I'm sure we're talking about nanosec / microsec
> > ranges here, so 3% faster won't cut it for me. If you contrast that to
> > my current work, where I try to deliver a batch of N messages and
> > therefore can skip N-1 lock acquisitions/releases for M protocols, then
> > the latter wins…
>
> Right, I'm not entirely sure it is a hotspot for optimisation though.  I'm
> going by some research that Sanne did and I'm doing a bit more homework
> around that.
>
> > I still think a clock service is interesting, but for different reasons.
> > As Sanne mentioned in Palma, it would be interesting to 'control' time,
> > e.g. deliver 2 messages at the same time, or even go backwards in time.
> > In the case of JGroups, we could use a clock service to screw up message
> > reception (e.g. in testing) and therefore to test the correctness of
> > some protocols.
>
> Right, but for me that would be an additional benefit and I would
> de-prioritise if that was all I was getting from it.  If it is even a
> moderate performance boost though, say over 3% overall for such a
> small/simple change, then I'd do it.
>
> - M
>
> >
> > --
> > Bela Ban, JGroups lead (http://www.jgroups.org)
> >
>
> --
> Manik Surtani
> ma...@jboss.org
> twitter.com/maniksurtani
>
> Platform Architect, JBoss Data Grid
> http://red.ht/data-grid
>
>

[infinispan-dev] Branch completion in zsh for remove_topic_branch

2013-01-31 Thread Dan Berindei
Hi guys

I hated the fact that I didn't have branch name completions for
remove_topic_branch in zsh, so I wrote an auto-completion script for it:
https://github.com/danberindei/scripts/blob/master/zsh/_remove_topic_branch

Just save it to any directory and then add the directory to fpath in your
.zshrc/.zshenv, like this:

fpath=( $fpath /path/to/dir/with/the/script )

or like this:

FPATH=$FPATH:/path/to/dir/with/the/script


Hope this helps.

Cheers
Dan

Re: [infinispan-dev] scala code in Infinispan

2013-01-31 Thread Dan Berindei
On Thu, Jan 31, 2013 at 2:42 PM, Mircea Markus  wrote:

>
> On 31 Jan 2013, at 12:37, Manik Surtani wrote:
>
>  I don't think that encouraging scala code is good purely for maintenance
> reasons. If there's a choice, it should be java. Not saying that learning a
> new language is not cool - but in practice people are a bit put off by
> maintaining Scala code. It's not only about what the writer of the code
> prefers as a language: it's more important what the maintainers of the code
> will have to work with.
>
>
> Would such maintainers also be put off by new language features (lambdas)
> in Java 8 when we (eventually) baseline to it?  :-)
>
> It's really NOT the same thing: any decent java programmer keeps up with
> all the enhancements in Java.
> What I might not want to - as an ISPN programmer - is to keep up with the
> language enhancements in Scala. And I might need to do that because of
> Scala language enhancements used in ISPN.
>
>
I guess my main problem with Scala is that it's evolving at a furious pace,
and as a result it's accumulating a lot of different ways to do the same
thing. So different parts of the code evolve to do the same thing in many
different ways, depending on who wrote that bit and when.

As to debugging Java-Scala interop, I never had any problems. There is one
general annoyance with debugging Scala code in that almost everything is a
function call, so you almost never use step-over in the debugger.

Profiling Scala code may be a bit more complicated than profiling Java
code, but I haven't had to do that yet (I was profiling the HotRod server,
but most of the time was spent in the core module).

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-01 Thread Dan Berindei
Radim, do these problems happen with the HotRod server, or only with
memcached?

HotRod requests handled by non-owners should be very rare, instead the vast
majority should be handled by the primary owner directly. So if this
happens with HotRod, we should focus on fixing the HotRod routing instead
of focusing on how to handle a large number of requests from non-owners.

That being said, even if a HotRod put request is handled by the primary
owner, it "generates" (numOwners - 1) extra OOB requests. So if you have
160 HotRod worker threads per node, you can expect 4 * 160 OOB messages per
node. Multiply that by 2, because responses are OOB as well, and you can
get 1280 OOB messages before you even start reusing any HotRod worker
thread. Have you tried decreasing the number of HotRod workers?

The thing is, our OOB thread pool can't use queueing because we'd get a
queue full of commit commands while all the OOB threads are waiting on keys
that those commit commands would unlock. As the OOB thread pool is full, we
discard messages, which I suspect slows things down quite a bit (especially
if it's a credit request/response message). So it may well be that a lower
number of HotRod working threads would perform better.

On the other hand, why is increasing the number of OOB threads a solution?
With -Xss 512k, you can get 2000 threads with only 1 GB of virtual memory
(the actual used memory is probably even less, unless you're using huge
pages). AFAIK the Linux kernel doesn't break a sweat with 10 threads
running, so having 2000 threads just hanging around, waiting for a
response, shouldn't be such a problem.

I did chat with Bela (or was it a break-out session?) about moving
Infinispan's request processing to another thread pool during the team
meeting in Palma. That would leave the OOB thread pool free to receive
response messages, FD heartbeats, credit requests/responses etc. The
downside, I guess, is that each request would have to be passed to another
thread, and the context switch may slow things down a bit. But since the
new thread pool would be in Infinispan, we could even do tricks like
executing a commit/rollback directly on the OOB thread.
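
The hand-off itself could be pretty small, something like this (completely
hypothetical, none of these types are the real Infinispan/JGroups API - it's
only meant to show where the split would sit):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: commands that never wait on other commands (remote gets,
// commit/rollback) stay on the OOB thread, the rest move to an Infinispan-owned
// pool so the OOB thread goes back to JGroups immediately.
public class RemoteCommandDispatcher {

   interface Command { boolean canBlock(); Object perform(); }
   interface Reply { void send(Object result); }

   private final ExecutorService infinispanPool = Executors.newFixedThreadPool(200);

   public void dispatch(final Command command, final Reply reply) {
      if (!command.canBlock()) {
         reply.send(command.perform());          // run directly on the OOB thread
      } else {
         infinispanPool.submit(new Runnable() {
            public void run() {
               reply.send(command.perform());    // OOB thread was released already
            }
         });
      }
   }
}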

In the end, I just didn't feel that working on this was justified,
considering the number of critical bugs we had. But maybe now's the time to
start experimenting...



On Fri, Feb 1, 2013 at 10:04 AM, Radim Vansa  wrote:

> Hi guys,
>
> after dealing with the large cluster for a while I find the way how we use
> OOB threads in synchronous configuration non-robust.
> Imagine a situation where a node which is not an owner of the key calls PUT.
> Then an RPC is called to the primary owner of that key, which reroutes
> the request to all other owners and after these reply, it replies back.
> There are two problems:
> 1) If we do X simultaneous requests from non-owners to the primary owner
> where X is OOB TP size, all the OOB threads are waiting for the responses
> and there is no thread to process the OOB response and release the thread.
> 2) Node A is primary owner of keyA, non-primary owner of keyB and B is
> primary of keyB and non-primary of keyA. We got many requests for both keyA
> and keyB from other nodes, therefore, all OOB threads from both nodes call
> RPC to the non-primary owner but there's no one who could process the
> request.
>
> While we wait for the requests to timeout, the nodes with depleted OOB
> threadpools start suspecting all other nodes because they can't receive
> heartbeats etc...
>
> You can say "increase your OOB tp size", but that's not always an option,
> I have currently set it to 1000 threads and it's not enough. In the end, I
> will always be limited by RAM, and something tells me that even nodes with a
> few gigs of RAM should be able to form a huge cluster. We use 160 HotRod
> worker threads in JDG, that means that 160 * clusterSize = 10240 (64 nodes
> in my cluster) parallel requests can be executed, and if 10% targets the
> same node with 1000 OOB threads, it gets stuck. It's about scaling and
> robustness.
>
> Not that I'd have any good solution, but I'd really like to start a
> discussion.
> Thinking about it a bit, the problem is that a blocking call (calling an RPC
> on the primary owner from the message handler) can block non-blocking calls
> (such as an RPC response or a command that never sends any more messages).
> Therefore, having a flag on the message "this won't send another message"
> could let the message be executed in a different thread pool, which will
> never be deadlocked. In fact, the pools could share the threads but the
> non-blocking one would always have a few threads spare.
> It's a bad solution as maintaining which message could block in the other
> node is really, really hard (we can be sure only in case of RPC responses),
> especially when some locks come. I will welcome anything better.
>
> Radim
>
>
> ---
> Radim Vansa
> Quality Assurance Engineer
> JBoss Datagrid
> tel. +420532

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-01 Thread Dan Berindei
Yeah, I wouldn't call this a "simple" solution...

The distribution/replication interceptors are quite high in the interceptor
stack, so we'd have to save the state of the interceptor stack (basically
the thread's stack) somehow and resume processing it on the thread
receiving the responses. In a language that supports continuations that
would be a piece of cake, but since we're in Java we'd have to completely
change the way the interceptor stack works.

Actually we do hold the lock on modified keys while the command is
replicated to the other owners. But I think locking wouldn't be a problem: we
already allow locks to be owned by transactions instead of threads, so it
would just be a matter of creating a "lite transaction" for
non-transactional caches. Obviously the TransactionSynchronizerInterceptor
would have to go, but I see that as a positive thing ;)

So yeah, it could work, but it would take a huge amount of effort and it's
going to obfuscate the code. Plus, I'm not at all convinced that it's going
to improve performance that much compared to a new thread pool.

Cheers
Dan


On Fri, Feb 1, 2013 at 10:59 AM, Radim Vansa  wrote:

> Yeah, that would work if it is possible to break the execution path into the
> FutureListener from the middle of the interceptor stack - I am really not sure
> about that, but as in the current design no locks should be held when an RPC
> is called, it may be possible.
>
> Let's see what someone more informed (Dan?) would think about that.
>
> Thanks, Bela
>
> Radim
>
> - Original Message -
> | From: "Bela Ban" 
> | To: infinispan-dev@lists.jboss.org
> | Sent: Friday, February 1, 2013 9:39:43 AM
> | Subject: Re: [infinispan-dev] Threadpools in a large cluster
> |
> | It looks like the core problem is an incoming RPC-1 which triggers
> | another blocking RPC-2: the thread delivering RPC-1 is blocked
> | waiting
> | for the response from RPC-2, and can therefore not be used to serve
> | other requests for the duration of RPC-2. If RPC-2 takes a while,
> | e.g.
> | waiting to acquire a lock in the remote node, then it is clear that
> | the
> | thread pool will quickly exceed its max size.
> |
> | A simple solution would be to prevent invoking blocking RPCs *from
> | within* a received RPC. Let's take a look at an example:
> | - A invokes a blocking PUT-1 on B
> | - B forwards the request as blocking PUT-2 to C and D
> | - When PUT-2 returns and B gets the responses from C and D (or the
> | first
> | one to respond, don't know exactly how this is implemented), it sends
> | the response back to A (PUT-1 terminates now at A)
> |
> | We could change this to the following:
> | - A invokes a blocking PUT-1 on B
> | - B receives PUT-1. Instead of invoking a blocking PUT-2 on C and D,
> | it
> | does the following:
> |   - B invokes PUT-2 and gets a future
> |   - B adds itself as a FutureListener, and it also stores the
> | address of the original sender (A)
> |   - When the FutureListener is invoked, B sends back the result
> |   as a
> | response to A
> | - Whenever a member leaves the cluster, the corresponding futures are
> | cancelled and removed from the hashmaps
> |
> | This could probably be done differently (e.g. by sending asynchronous
> | messages and implementing a finite state machine), but the core of
> | the
> | solution is the same; namely to avoid having an incoming thread block
> | on
> | a sync RPC.
> |
> | Thoughts ?
> |
> |
> |
> |
> | On 2/1/13 9:04 AM, Radim Vansa wrote:
> | > Hi guys,
> | >
> | > after dealing with the large cluster for a while I find the way how
> | > we use OOB threads in synchronous configuration non-robust.
> | > Imagine a situation where node which is not an owner of the key
> | > calls PUT. Then the a RPC is called to the primary owner of that
> | > key, which reroutes the request to all other owners and after
> | > these reply, it replies back.
> | > There are two problems:
> | > 1) If we do simultanously X requests from non-owners to the primary
> | > owner where X is OOB TP size, all the OOB threads are waiting for
> | > the responses and there is no thread to process the OOB response
> | > and release the thread.
> | > 2) Node A is primary owner of keyA, non-primary owner of keyB and B
> | > is primary of keyB and non-primary of keyA. We got many requests
> | > for both keyA and keyB from other nodes, therefore, all OOB
> | > threads from both nodes call RPC to the non-primary owner but
> | > there's noone who could process the request.
> | >
> | > While we wait for the requests to timeout, the nodes with depleted
> | > OOB threadpools start suspecting all other nodes because they
> | > can't receive heartbeats etc...
> | >
> | > You can say "increase your OOB tp size", but that's not always an
> | > option, I have currently set it to 1000 threads and it's not
> | > enough. In the end, I will be always limited by RAM and something
> | > tells me that even nodes with few gigs of RAM should be able to
> | > form a huge cluster.

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-01 Thread Dan Berindei
On Fri, Feb 1, 2013 at 12:40 PM, Radim Vansa  wrote:

> |
> | Radim, do these problems happen with the HotRod server, or only with
> | memcached?
>
> I didn't test memcached, only HotRod. The thing I was seing were many OOB
> threads stuck when sending messages from handleRemoteWrite.
>
> |
> | HotRod requests handled by non-owners should be very rare, instead
> | the vast majority should be handled by the primary owner directly.
> | So if this happens with HotRod, we should focus on fixing the HotRod
> | routing instead of focusing on how to handle a large number of
> | requests from non-owners.
> |
> |
> |
> | That being said, even if a HotRod put request is handled by the
> | primary owner, it "generates" (numOwners - 1) extra OOB requests. So
> | if you have 160 HotRod worker threads per node, you can expect 4 *
> | 160 OOB messages per node. Multiply that by 2, because responses are
> | OOB as well, and you can get 1280 OOB messages before you even start
> | reusing any HotRod worker thread. Have you tried decreasing the
> | number of HotRod workers?
>
> Decreasing the number of workers would be an obvious way to scale it
> down. To be honest, I haven't tried that because it would certainly lower
> the overall throughput, and it is not a systematic solution IMO.
>
>
You never know, a lower number of worker threads may mean lower contention
and fewer context switches, so I think it's worth experimenting.


> |
> | The thing is, our OOB thread pool can't use queueing because we'd get
> | a queue full of commit commands while all the OOB threads are
> | waiting on keys that those commit commands would unlock. As the OOB
> | thread pool is full, we discard messages, which I suspect slows
> | things down quite a bit (especially if it's a credit
> | request/response message). So it may well be that a lower number of
> | HotRod working threads would perform better.
>
> We have already had a similar talk where you convinced me that having a
> queue for the thread pool wouldn't help much.
>
>
Having a queue for the OOB thread pool definitely won't help for
transactional caches, as you'll get deadlocks between prepare commands
waiting for cache keys and commit commands waiting in the OOB queue. I'm
guessing that with primary owners in non-transactional caches now forwarding
commands to the backup owners, you can get deadlocks there as well if you
enable queueing. So yeah, enabling queueing still isn't an option...


> |
> | On the other hand, why is increasing the number of OOB threads a
> | solution? With -Xss 512k, you can get 2000 threads with only 1 GB of
> | virtual memory (the actual used memory is probably even less, unless
> | you're using huge pages). AFAIK the Linux kernel doesn't break a
> | sweat with 10 threads running, so having 2000 threads just
> | hanging around, waiting for a response, shouldn't be such a problem.
> |
> I don't say that it won't work (you're right that it's just virtual
> memory), but I have thought that Infinispan should scale and be robust even
> for the edge cases.
>
>
Define edge cases :)

The users usually require a certain number of simultaneous clients, a
certain throughput, etc. I don't think anyone will say "Yeah, we'll use
Infinispan, but only if it uses less than 1000 threads".


> |
> | I did chat with Bela (or was it a break-out session?) about moving
> | Infinispan's request processing to another thread pool during the
> | team meeting in Palma. That would leave the OOB thread pool free to
> | receive response messages, FD heartbeats, credit requests/responses
> | etc. The downside, I guess, is that each request would have to be
> | passed to another thread, and the context switch may slow things
> | down a bit. But since the new thread pool would be in Infinispan, we
> | could even do tricks like executing a commit/rollback directly on
> | the OOB thread.
>
> Hmm, for some messages (nonblocking) the context switch could be spared.
> It depends on how complicated it is to determine whether the message will
> block before entering the interceptor chain.
>
>
You don't have to be very specific, optimizing for
ClusteredGet/Commit/Rollback commands should be enough.


>  |
> |
> | In the end, I just didn't feel that working on this was justified,
> | considering the number of critical bugs we had. But maybe now's the
> | time to start experimenting...
> |
>
> I agree and I'm happy that ISPN is mostly working now.
>
> I have tried to rerun the scenario with upper OOB limit 2000 and it did
> not help (originally I was using 200 and increased to 1000); a node stops
> responding at some point... So maybe OOB is not the only villain. I'll keep
> investigating.
>
>
I'm wondering if you could collect some statistics about the JGroups thread
pools, how many threads are busy at each point during the test. How many
HotRod workers are busy in the entire cluster when the OOB thread pool gets
full should be interesting as well...
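
Even a dumb sampler around whatever executors you can get a handle on would
help (plain JDK sketch; getting at JGroups' internal OOB pool itself would need
JMX or a custom ThreadFactory):

import java.util.concurrent.ThreadPoolExecutor;

// Simple sampler: given a handle to a ThreadPoolExecutor (e.g. the HotRod
// worker pool or an Infinispan-owned pool), log how many threads are busy
// once a second.
public class PoolSampler implements Runnable {
   private final String name;
   private final ThreadPoolExecutor pool;

   public PoolSampler(String name, ThreadPoolExecutor pool) {
      this.name = name;
      this.pool = pool;
   }

   public void run() {
      while (!Thread.currentThread().isInterrupted()) {
         System.out.println(name + ": active=" + pool.getActiveCount()
               + " queued=" + pool.getQueue().size()
               + " completed=" + pool.getCompletedTaskCount());
         try {
            Thread.sleep(1000);
         } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
         }
      }
   }
}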



> Radim
>
>
> |
> |
> |
> |
> |
> | On Fri, Feb 1, 2013 at 

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-01 Thread Dan Berindei
On Fri, Feb 1, 2013 at 12:13 PM, Manik Surtani  wrote:

>
> On 1 Feb 2013, at 09:39, Dan Berindei  wrote:
>
> > Radim, do these problems happen with the HotRod server, or only with
> memcached?
> >
> > HotRod requests handled by non-owners should be very rare, instead the
> vast majority should be handled by the primary owner directly. So if this
> happens with HotRod, we should focus on fixing the HotRod routing instead
> of focusing on how to handle a large number of requests from non-owners.
>
> Well, even Hot Rod only optionally uses smart routing.  Some client
> libraries don't have this capability.
>
>
True, and I meant to say that with memcached it should be much worse, but
at least in Radim's tests I hope smart routing is enabled.



> >
> > That being said, even if a HotRod put request is handled by the primary
> owner, it "generates" (numOwners - 1) extra OOB requests. So if you have
> 160 HotRod worker threads per node, you can expect 4 * 160 OOB messages per
> node. Multiply that by 2, because responses are OOB as well, and you can
> get 1280 OOB messages before you even start reusing any HotRod worker
> thread. Have you tried decreasing the number of HotRod workers?
> >
> > The thing is, our OOB thread pool can't use queueing because we'd get a
> queue full of commit commands while all the OOB threads are waiting on keys
> that those commit commands would unlock. As the OOB thread pool is full, we
> discard messages, which I suspect slows things down quite a bit (especially
> if it's a credit request/response message). So it may well be that a lower
> number of HotRod working threads would perform better.
> >
> > On the other hand, why is increasing the number of OOB threads a
> solution? With -Xss 512k, you can get 2000 threads with only 1 GB of
> virtual memory (the actual used memory is probably even less, unless you're
> using huge pages). AFAIK the Linux kernel doesn't break a sweat with 10
> threads running, so having 2000 threads just hanging around, waiting for a
> response, should be such a problem.
> >
> > I did chat with Bela (or was it a break-out session?) about moving
> Infinispan's request processing to another thread pool during the team
> meeting in Palma. That would leave the OOB thread pool free to receive
> response messages, FD heartbeats, credit requests/responses etc. The
> downside, I guess, is that each request would have to be passed to another
> thread, and the context switch may slow things down a bit. But since the
> new thread pool would be in Infinispan, we could even do tricks like
> executing a commit/rollback directly on the OOB thread.
>
> Right.  I always got the impression we were abusing the OOB pool.  But in
> the end, I think it makes sense (in JGroups) to separate a service thread
> pool (for heartbeats, credits, etc) and an application thread pool (what
> we'd use instead of OOB).  This way you could even tune your service thread
> pool to just have, say, 2 threads, and the application thread pool to 1000
> or whatever.
>
>
A separate service pool would be good, but I think we could go further and
treat ClusteredGet/Commit/Rollback commands the same way, because they
can't block waiting for other commands to be processed.



> > In the end, I just didn't feel that working on this was justified,
> considering the number of critical bugs we had. But maybe now's the time to
> start experimenting…
> >
> >
> >
> > On Fri, Feb 1, 2013 at 10:04 AM, Radim Vansa  wrote:
> > Hi guys,
> >
> > after dealing with the large cluster for a while I find the way how we
> use OOB threads in synchronous configuration non-robust.
> > Imagine a situation where node which is not an owner of the key calls
> PUT. Then the a RPC is called to the primary owner of that key, which
> reroutes the request to all other owners and after these reply, it replies
> back.
> > There are two problems:
> > 1) If we do simultanously X requests from non-owners to the primary
> owner where X is OOB TP size, all the OOB threads are waiting for the
> responses and there is no thread to process the OOB response and release
> the thread.
> > 2) Node A is primary owner of keyA, non-primary owner of keyB and B is
> primary of keyB and non-primary of keyA. We got many requests for both keyA
> and keyB from other nodes, therefore, all OOB threads from both nodes call
> RPC to the non-primary owner but there's noone who could process the
> request.
> >
> > While we wait for the requests to timeout, the nodes with depleted OOB
> threadpools start suspecting all other nodes because they can't receive

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-03 Thread Dan Berindei
On Sun, Feb 3, 2013 at 1:23 PM, Bela Ban  wrote:

> A new thread pool owned by Infinispan is certainly something desirable,
> as discussed in Palma, but I think it wouldn't solve the issue Radim ran
> into, namely threads being used despite the fact that they only wait for
> another blocking RPC to finish.
>
>
IMO the fact that threads are being blocked waiting for an RPC to return is
not a big deal. The real problem is when all the OOB threads are used up,
causing a deadlock: existing OOB threads are blocked waiting for RPC
responses, and the RPC responses are blocked until an OOB thread is freed.



> If we made the JGroups thread return immediately by transferring control
> to an Infinispan thread, then we'd simply move the issue from the former
> to the latter pool. Eventually, the Infinispan pool would run out of
> threads.
>
>
Yeah, but JGroups would still be able to process RPC responses, and by
doing that it will free some of the OOB threads.

For transactional caches there's an additional benefit: if commit/rollback
commands were handled directly on the OOB pool then neither thread pool
would have dependencies between tasks, so we could enable queueing for the
OOB pool and the Infinispan pool without causing deadlocks.



> Coming back to the specific problem Radim ran into: the forwarding of a
> PUT doesn't hold any locks, so your argument below wouldn't hold.
> However, of course this is only one specific scenario, and you're
> probably right that we'd have to consider the more general case of a
> thread holding locks...
>
>
Actually, NonTransactionalLockingInterceptor acquires a lock on the key
before the execution of the RPC (from
NonTxConcurrentDistributionInterceptor), and keeps that lock for the
entire duration of the RPC.

We make other RPCs while holding the key lock as well, particularly to
invalidate the L1 entries.



> All said, I believe it would still be worthwhile looking into a more
> non-blocking way of invoking RPCs, that doesn't occupy threads which
> essentially only wait on IO (network traffic)... A simple state machine
> approach could be the solution to this...
>
>
Switching to a state machine approach would require rethinking and
rewriting all our interceptors, and I'm pretty sure the code would get more
complex and harder to debug (to say nothing about interpreting the logs).
Are you sure it's going to have that many benefits to make it worthwhile?



> On 2/1/13 10:54 AM, Dan Berindei wrote:
> > Yeah, I wouldn't call this a "simple" solution...
> >
> > The distribution/replication interceptors are quite high in the
> > interceptor stack, so we'd have to save the state of the interceptor
> > stack (basically the thread's stack) somehow and resume processing it
> > on the thread receiving the responses. In a language that supports
> > continuations that would be a piece of cake, but since we're in Java
> > we'd have to completely change the way the interceptor stack works.
> >
> > Actually we do hold the lock on modified keys while the command is
> > replicated to the other owners. But I think locking wouldn't be a
> > problem: we already allow locks to be owned by transactions instead of
> > threads, so it would just be a matter of creating a "lite transaction"
> > for non-transactional caches. Obviously the
> > TransactionSynchronizerInterceptor would have to go, but I see that as
> > a positive thing ;)
> >
> > So yeah, it could work, but it would take a huge amount of effort and
> > it's going to obfuscate the code. Plus, I'm not at all convinced that
> > it's going to improve performance that much compared to a new thread
> pool.
> >
> > Cheers
> > Dan
> >
> >
> > On Fri, Feb 1, 2013 at 10:59 AM, Radim Vansa wrote:
> >
> > Yeah, that would work if it is possible to break execution path
> > into the FutureListener from the middle of interceptor stack - I
> > am really not sure about that but as in current design no locks
> > should be held when a RPC is called, it may be possible.
> >
> > Let's see what someone more informed (Dan?) would think about that.
> >
> > Thanks, Bela
> >
> > Radim
> >
> > - Original Message -
> > | From: "Bela Ban"
> > | To: infinispan-dev@lists.jboss.org
> > | Sent: Friday, February 1, 2013 9:39:43 AM
> > | Subject: Re: [infinispan-dev] Threadpools in a large cluster
> > |

Re: [infinispan-dev] Upstream 5.2.x branch does not exist?

2013-02-04 Thread Dan Berindei
Yes, it's most likely because 5.2.x is identical to master at this point.

Cheers
Dan


On Mon, Feb 4, 2013 at 12:17 PM, Galder Zamarreño  wrote:

> Something's wrong here because whereas 5.1.x and previous branches appear
> as "unmerged", 5.2.x appears as "merged branch"
>
> Maybe the issue is due to the fact that no different commits have been
> pushed to 5.2.x yet…
>
> Cheers,
>
> On Feb 4, 2013, at 11:11 AM, Galder Zamarreño  wrote:
>
> > Hey Mircea,
> >
> > Seems like there's no 5.2.x branch in Infinispan upstream:
> > https://github.com/infinispan/infinispan/branches
> >
> > Did you forget to push when you branched? Or is there any other issue?
> >
> > Cheers,
> > --
> > Galder Zamarreño
> > gal...@redhat.com
> > twitter.com/galderz
> >
> > Project Lead, Escalante
> > http://escalante.io
> >
> > Engineer, Infinispan
> > http://infinispan.org
> >
> >
>
>
> --
> Galder Zamarreño
> gal...@redhat.com
> twitter.com/galderz
>
> Project Lead, Escalante
> http://escalante.io
>
> Engineer, Infinispan
> http://infinispan.org
>
>

Re: [infinispan-dev] scala code in Infinispan

2013-02-07 Thread Dan Berindei
On 6 Feb 2013 17:51, "Manik Surtani"  wrote:

>
> On 6 Feb 2013, at 14:58, Mircea Markus  wrote:
>
> On 6 Feb 2013, at 15:37, Galder Zamarreño wrote:
>
> I don't think that encouraging scala code is good purely for maintenance
> reasons. If there's a choice, it should be java. Not saying that learning a
> new language is not cool - but in practice people are a bit put off by
> maintaining Scala code. It's not only about what the writer of the code
> prefers as a language: it's more important what the maintainers of the code
>
>
> will have to work with.
>
>
> Would such maintainers also be put off by new language features (lambdas)
> in Java 8 when we (eventually) baseline to it?  :-)
>
>  It's really NOT the same thing: any decent java programmer keeps up with
> all the enhancements in Java.
>
> What I might not want to - as an ISPN programmer - is to keep up with the
> language enhancements in Scala. And I might need to do that because of
> Scala language enhancements used in ISPN.
>
>
> ^ I wonder whether C programmers thought the same way 20 years ago.
>
> Personally I don't believe Scala is the next big thing as it doesn't have
> a "killer" feature, e.g. OOP from C -> C++ or GC from C++ -> Java.
>
>
> That's 20/20 hindsight.  Lots of C developers said OOP was bullish*t when
> C++ came about, and even today some C++ folks argue than GC is for losers.
>  :)
>
>
Not sure about C developers, but there are plenty of developers in the
functional camp who still say OOP is bullsh*t :)

And many of the GC arguments were only invalidated 10 years after Java came
out, as multi-core became the norm and the GC could use a "free" core.



>  As Alan said, I for one look forward to writing all my code in JavaScript
> but until that day there is a lot of innovation we ought to embrace.
>  Java's shown itself to be slow to grow and evolve.  Oracle's acquisition
> of Sun has sped things up a lot, but it still is behind the curve.  There's
> a good reason why Ruby, Python, Erlang and Scala are gaining popularity.
>  If you've ever spent any time writing extensive code in any of these
> platforms you'd understand why.
>
>
Seriously, what do JavaScript, Ruby, Python, Erlang and Scala have in
common? The only thing I can think of is "they're not Java" :)

I think Python is just as slow to evolve as Java, maybe even slower. And
it's not just the language itself, but the community as well: Python 3.0
came out in 2008, yet not everyone is on board just yet (
https://news.ycombinator.com/item?id=5009484).

Scala seems to be on the other end of the spectrum, adding a truck-load of
features every couple of years. My feeling is the Scala guys haven't
learned that every new feature starts at -100 points yet:
http://www.scala-lang.org/node/43




> - M
>
>   --
> Manik Surtani
> ma...@jboss.org
> twitter.com/maniksurtani
>
> Platform Architect, JBoss Data Grid
> http://red.ht/data-grid
>
>

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-07 Thread Dan Berindei
On Thu, Feb 7, 2013 at 6:53 AM, Bela Ban  wrote:

> Hi Pedro,
>
> this is almost exactly what I wanted to implement !
>
> Question:
> - In RequestCorrelator.handleRequest():
>
> protected void handleRequest(Message req, Header hdr) {
>     Object retval;
>     boolean threwException = false;
>     MessageRequest messageRequest = new MessageRequestImpl(req, hdr);
>     try {
>         retval = request_handler.handle(messageRequest);
>     } catch(Throwable t) {
>         retval = t;
>         threwException = true;
>     }
>     messageRequest.sendReply(retval, threwException); // <-- should be moved up,
>                                                       //     or called only if threwException == true
> }
>
>
> , you create a MessageRequestImpl and pass it to the RequestHandler. The
> request handler then dispatches the request (possibly) to a thread pool
> and calls MessageRequestImpl.sendReply() when done.
>
> However, you also call MessageRequest.sendReply() before returning from
> handleRequest(). I think this is an error, and
> MessageRequest.sendReply() should be moved up inside the catch clause,
> or be called only if threwException is true, so that we send a reply on
> behalf of the RequestHandler if and only if it threw an exception (e.g.
> before it dispatches the request to a thread pool). Otherwise, we'd send
> a reply *twice* !
>
> A few changes I have in mind (need to think about it more):
>
> - I want to leave the existing RequestHandler interface in place, so
> current implementation continue to work
> - There will be a new AsyncRequestHandler interface (possibly extending
> RequestHandler, so an implementation can decide to implement both). The
> RequestCorrelator needs to have either request_handler or
> async_request_handler set. If the former is set, the logic is unchanged.
> If the latter is set I'll invoke the async dispatching code
>
> - AsyncRequestHandler will look similar to the following:
> void handle(Message request, Handback hb, boolean requires_response)
> throws Throwable;
> - Handback is an interface, and its impl contains header information
> (e.g. request ID)
> - Handback has a sendReply(Object reply, boolean is_exception) method
> which sends a response (or exception) back to the caller
>

+1 for a new interface. TBH I hadn't read the RequestCorrelator code, so I
had assumed it was already asynchronous, and only RpcDispatcher was
synchronous.

I'm not so sure about the Handback name, how about calling it Response
instead?



> - When requires_response is false, the AsyncRequestHandler doesn't need
> to invoke sendReply()
>
>
I think this should be the other way around: when requires_response is
true, the AsyncRequestHandler *can* invoke sendReply(), but is not required
to (the call will just time out on the caller node); when requires_response
is false, invoking sendReply() should throw an exception.
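
So, as I read the proposal, the contract would be something like this (sketch
only, using the names from your mail with Handback renamed to Response -
obviously not actual JGroups API yet):

import org.jgroups.Message;

// Sketch of the proposed contract, not an existing JGroups interface.
interface Response {
   // sends the result (or an exception) back to the original caller; should
   // throw if the request was sent without expecting a response
   void sendReply(Object reply, boolean isException);
}

interface AsyncRequestHandler {
   // may hand the request to another thread pool and call response.sendReply()
   // later; when requiresResponse is false the handler must not reply at all
   void handle(Message request, Response response, boolean requiresResponse) throws Throwable;
}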


> - Message batching
> - The above interfaces need to take message batching into account, e.g.
> the ability to handle multiple requests concurrently (if they don't need
> to be executed sequentially)
>
>
You mean handle() is still going to be called once for each request, but a
second handle() call won't necessarily wait for the first message's
sendReply() call?

Is this going to apply only to OOB messages, or to regular messages as
well? I think I'd prefer it if it only applied to OOB messages, otherwise
we'd have to implement our own ordering for regular/async commands.



>
> Thoughts ?
>
>
>

> On 2/6/13 8:29 PM, Pedro Ruivo wrote:
> > Hi all,
> >
> > Recently I came up with a solution that can help with the thread pool
> > problem motivated by the following:
> >
> > In one of the first implementation of Total Order based commit
> > protocol (TO), I had the requirement to move the PrepareCommand to
> > another thread pool. In summary, the TO protocol delivers the
> > PrepareCommand in a deterministic order in all the nodes, by a single
> > deliver thread. To ensure consistency, if it delivers two conflicting
> > transactions, the second transaction must wait until the first
> > transaction finishes. However, blocking the single deliver thread is not a
> > good solution, because no more transactions can be validated, even if
> > they don't conflict, while the thread is blocked.
> >
> > So, after creating a dependency graph (i.e. the second transaction
> > knows that it must wait for the first transaction to finish) I move
> > the PrepareCommand to another thread pool. Initially, I implemented a
> > new command, called PrepareResponseCommand, that sends back the reply
> > of the PrepareCommand. This solution has one disadvantage: I had to
> > implement an ack collector in ISPN, while JGroups already offers me
> > that with a synchronous communication.
> >
> > Recently (2 or 3 months ago) I implemented a simple modification in
> > JGroups. In a more generic approach, it allows other threads to reply
> > to a RPC request (such as the PrepareCommand). In the previous
> > scenario, I replaced the PrepareResponseCommand and the ack collector
> > implementation with a synchronous RPC invocation. I

Re: [infinispan-dev] UNICAST / UNICAST2 connection reaping

2013-02-07 Thread Dan Berindei
I've created https://issues.jboss.org/browse/ISPN-2805


On Wed, Feb 6, 2013 at 7:26 PM, Bela Ban  wrote:

> Connection reaping may lead to message loss in UNICAST{2}. Until I've
> fixed [1], could you disable connection reaping ? Instructions are in [1].
>
>
> [1] https://issues.jboss.org/browse/JGRP-1586
>
> --
> Bela Ban, JGroups lead (http://www.jgroups.org)
>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-07 Thread Dan Berindei
On Thu, Feb 7, 2013 at 12:43 PM, Bela Ban  wrote:

>
> On 2/7/13 11:09 AM, Dan Berindei wrote:
> >
> >
> > A few changes I have in mind (need to think about it more):
> >
> > - I want to leave the existing RequestHandler interface in place, so
> > current implementation continue to work
> > - There will be a new AsyncRequestHandler interface (possibly
> > extending
> > RequestHandler, so an implementation can decide to implement
> > both). The
> > RequestCorrelator needs to have either request_handler or
> > async_request_handler set. If the former is set, the logic is
> > unchanged.
> > If the latter is set I'll invoke the async dispatching code
> >
> > - AsyncRequestHandler will look similar to the following:
> > void handle(Message request, Handback hb, boolean requires_response)
> > throws Throwable;
> > - Handback is an interface, and its impl contains header information
> > (e.g. request ID)
> > - Handback has a sendReply(Object reply, boolean is_exception) method
> > which sends a response (or exception) back to the caller
> >
> >
> > +1 for a new interface. TBH I hadn't read the RequestCorrelator code,
> > so I had assumed it was already asynchronous, and only RpcDispatcher
> > was synchronous.
>
>
> Nope, unfortunately not.
>
>
>

> >
> > I'm not so sure about the Handback name, how about calling it Response
> > instead?
>
>
> It *is* actually called Response (can you read my mind?) :-)
>
>
Nice :)


> >
> > - When requires_response is false, the AsyncRequestHandler doesn't
> > need to invoke sendReply()
> >
> >
> > I think this should be the other way around: when requires_response is
> > true, the AsyncRequestHandler *can* invoke sendReply(), but is not
> > required to (the call will just time out on the caller node); when
> > requires_response is false, invoking sendReply() should throw an
> > exception.
>
>
> The way I actually implemented it this morning is to omit the boolean
> parameter altogether:
> void handle(Message request, Response response) throws Exception;
>
> Response is null for async requests.
>
>
Sounds good.
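
For illustration, an implementation could then look something like this. It is
only a sketch: I'm assuming a sendReply-style method on Response, the actual
method name may end up different.

import org.jgroups.Message;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Assumed shape of the new interfaces, per this thread (not necessarily the final API):
interface Response {
   void sendReply(Object reply, boolean isException);
}

interface AsyncRequestHandler {
   void handle(Message request, Response response) throws Exception;
}

public class PoolDispatchingHandler implements AsyncRequestHandler {
   private final ExecutorService pool = Executors.newFixedThreadPool(8);

   @Override
   public void handle(Message request, Response response) throws Exception {
      if (response == null) {
         process(request);               // async request: nobody waits for a reply
         return;
      }
      pool.execute(() -> {               // reply later, from a pool thread
         try {
            response.sendReply(process(request), false);
         } catch (Throwable t) {
            response.sendReply(t, true);
         }
      });
   }

   private Object process(Message request) {
      return request.getObject();        // placeholder for the real request handling
   }
}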


>
>
>
>
> >
> >
> > - Message batching
> > - The above interfaces need to take message batching into account,
> > e.g.
> > the ability to handle multiple requests concurrently (if they
> > don't need
> > to be executed sequentially)
> >
> >
> > You mean handle() is still going to be called once for each request,
> > but second handle() call won't necessarily wait for the first
> > message's sendReply() call?
>
> Yes. I was thinking of adding a second method to the interface, which
> has a message batch as parameter. However, we'd also have to pass in an
> array of Response objects and it looked a bit clumsy.
>
>
Agree, it would look quite clumsy.


> >
> > Is this going to apply only to OOB messages, or to regular messages as
> > well? I think I'd prefer it if it only applied to OOB messages,
> > otherwise we'd have to implement our own ordering for regular/async
> > commands.
>
> No, I think it'll apply to all messages. A simple implementation could
> dispatch OOB messages to the thread pool, as they don't need to be
> ordered. Regular messages could be added to a queue where they are
> processed sequentially by a *single* thread. Pedro does implement
> ordering based on transactions (see his prev email), and I think there
> are some other good use cases for regular messages. I think one thing
> that could be done for regular messages is to implement something like
> SCOPE (remember ?) for async RPCs: updates to different web sessions
> could be processed concurrently, only updates to the *same* session
> would have to be ordered.
>
>
Yeah, I agree implementing the regular message ordering ourselves would
give us a little more room for optimizations. But it would make our part
more complicated, too. Well, not for Total Ordering, because Pedro already
implemented it, but for our regular async scenarios we'd need to add a
thread pool (we want to allow 2 threads from different sources to access
different keys at the same time).
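
To make the per-key/per-session ordering idea concrete, it could be done with
something along these lines. This is a hypothetical helper, not anything that
exists in JGroups or Infinispan today:

import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Tasks for the same key run in submission order; tasks for different keys
// run concurrently on the shared pool. (Cleanup of idle keys is omitted.)
public class PerKeyOrderedExecutor {
   private final ExecutorService pool = Executors.newFixedThreadPool(16);
   private final ConcurrentMap<Object, CompletableFuture<Void>> tails = new ConcurrentHashMap<>();

   public void execute(Object key, Runnable task) {
      tails.compute(key, (k, tail) -> {
         CompletableFuture<Void> prev =
               tail != null ? tail : CompletableFuture.<Void>completedFuture(null);
         return prev.thenRunAsync(() -> {
            try {
               task.run();
            } catch (Throwable t) {
               // swallow so later tasks for this key still run; real code would log
            }
         }, pool);
      });
   }
}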



> This API is not in stone, we can always change it. Once I'm done with
> this and have batching II implemented, plus some other JIRAs, I'll ping
> you guys and we should have a meeting discussing
> - Async invocation API
> - Message batching (also in conjunction with the above)
> - Message bundling and OOB / DONT_BUNDLE; bundling of OOB messages
>
> --
> Bela Ban, JGroups lead (http://www.jgroups.org)
>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] UserTransactionLookup for JSR-107?

2013-02-07 Thread Dan Berindei
Do we really need to expose the TransactionManager's UserTransaction
implementation?

Looking at the interface, it seems like a subset of TransactionManager, so
couldn't we return a custom UserTransaction that just delegates to the
TransactionManager?
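
Something like this should be enough (a sketch only, not actual adapter code):
every UserTransaction method has a direct TransactionManager counterpart, so a
plain delegate would do.

import javax.transaction.HeuristicMixedException;
import javax.transaction.HeuristicRollbackException;
import javax.transaction.NotSupportedException;
import javax.transaction.RollbackException;
import javax.transaction.SystemException;
import javax.transaction.TransactionManager;
import javax.transaction.UserTransaction;

public class DelegatingUserTransaction implements UserTransaction {
   private final TransactionManager tm;

   public DelegatingUserTransaction(TransactionManager tm) {
      this.tm = tm;
   }

   public void begin() throws NotSupportedException, SystemException {
      tm.begin();
   }

   public void commit() throws RollbackException, HeuristicMixedException,
         HeuristicRollbackException, SecurityException, SystemException {
      tm.commit();
   }

   public void rollback() throws SystemException {
      tm.rollback();
   }

   public void setRollbackOnly() throws SystemException {
      tm.setRollbackOnly();
   }

   public int getStatus() throws SystemException {
      return tm.getStatus();
   }

   public void setTransactionTimeout(int seconds) throws SystemException {
      tm.setTransactionTimeout(seconds);
   }
}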


On Thu, Feb 7, 2013 at 2:06 PM, Manik Surtani  wrote:

> Ok.  Then a separate Lookup is what we'd need, I guess.  Not pretty, but
> oh well.
>
> On 7 Feb 2013, at 11:41, Galder Zamarreño  wrote:
>
> >
> > On Feb 7, 2013, at 12:31 PM, Manik Surtani  wrote:
> >
> >>
> >> On 7 Feb 2013, at 11:23, Galder Zamarreño  wrote:
> >>
> >>> Hi all,
> >>>
> >>> I'm back with a more food for thought wrt JSR-107 impl. Our
> CacheManager adapter needs to implement:
> >>>
> >>> UserTransaction getUserTransaction();
> >>>
> >>> The problem there is that there's no standard way of getting a
> UserTransaction given a JTA TransactionManager.
> >>>
> >>> It really is down to each TransactionManager provider to give a
> UserTransaction instance (whether JNDI, static…etc).
> >>>
> >>> So, we need a way to lookup a UserTransaction.
> >>>
> >>> One option is to add a getUserTransaction to TransactionManagerLookup,
> but that will break existing clients.
> >>
> >> You mean it would break existing TML implementations?
> >
> > ^ Yeah, potentially yeah.
> >
> >> Do we know of any custom TML implementations though?
> >
> > Yes:
> > -
> https://github.com/hibernate/hibernate-orm/blob/master/hibernate-infinispan/src/main/java/org/hibernate/cache/infinispan/tm/HibernateTransactionManagerLookup.java
> > -
> https://github.com/jbossas/jboss-as/blob/master/clustering/infinispan/src/main/java/org/jboss/as/clustering/infinispan/TransactionManagerProvider.java
> >
> >>
> >>>
> >>> Alternatively, define a
> org.infinispan.transaction.lookup.UserTransactionLookup interface which is
> configurable. We'd then need to implement for existing TML classes.
> >>>
> >>> If anyone has any other ideas, let us know.
> >>>
> >>> Cheers,
> >>> --
> >>> Galder Zamarreño
> >>> gal...@redhat.com
> >>> twitter.com/galderz
> >>>
> >>> Project Lead, Escalante
> >>> http://escalante.io
> >>>
> >>> Engineer, Infinispan
> >>> http://infinispan.org
> >>>
> >>>
> >>> ___
> >>> infinispan-dev mailing list
> >>> infinispan-dev@lists.jboss.org
> >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> >>
> >> --
> >> Manik Surtani
> >> ma...@jboss.org
> >> twitter.com/maniksurtani
> >>
> >> Platform Architect, JBoss Data Grid
> >> http://red.ht/data-grid
> >>
> >>
> >> ___
> >> infinispan-dev mailing list
> >> infinispan-dev@lists.jboss.org
> >> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> >
> >
> > --
> > Galder Zamarreño
> > gal...@redhat.com
> > twitter.com/galderz
> >
> > Project Lead, Escalante
> > http://escalante.io
> >
> > Engineer, Infinispan
> > http://infinispan.org
> >
> >
> > ___
> > infinispan-dev mailing list
> > infinispan-dev@lists.jboss.org
> > https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
> --
> Manik Surtani
> ma...@jboss.org
> twitter.com/maniksurtani
>
> Platform Architect, JBoss Data Grid
> http://red.ht/data-grid
>
>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Codename for Infinispan 5.3.0

2013-02-07 Thread Dan Berindei
Now that I've seen the commercial, it's obvious that Boddington's is the
right choice :)

http://www.youtube.com/watch?v=XEEU1nQeGNA


On Wed, Feb 6, 2013 at 8:02 PM, Bela Ban  wrote:

> - Corona (with lemon)
> - Boddington's
> - Bug Light :-)
>
> On 2/6/13 5:17 PM, Mircea Markus wrote:
> > Following the tradition, please bring your suggestion of beer-code
> > name for the new Infinispan release. Then we'll vote.
> > Mine is  Guinness :-)
> >
> > Cheers,
> > --
> > Mircea Markus
> > Infinispan lead (www.infinispan.org )
> >
> >
> >
> >
> >
> >
> > ___
> > infinispan-dev mailing list
> > infinispan-dev@lists.jboss.org
> > https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
> --
> Bela Ban, JGroups lead (http://www.jgroups.org)
>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Adding JSR-107 support for invokeEntryProcessor

2013-02-07 Thread Dan Berindei
On Wed, Feb 6, 2013 at 7:48 PM, Galder Zamarreño  wrote:

> Hi all,
>
> We're meant to implement this method in JSR-107:
>
> https://github.com/jsr107/jsr107spec/blob/master/src/main/java/javax/cache/Cache.java#L510
>
> The interesting bit comes in the javadoc of EntryProcessor:
> https://github.com/jsr107/jsr107spec/blob/master/src/main/java/javax/cache/Cache.java#L510
>
>
The EntryProcessor javadoc link is wrong, it should be
https://github.com/jsr107/jsr107spec/blob/master/src/main/java/javax/cache/Cache.java#L618 :)

To be more precise:
>
> " * Allows execution of code which may mutate a cache entry with
> exclusive
>  * access (including reads) to that entry.
>  * 
>  * Any mutations will not take effect till after the processor has
> completed; if an exception
>  * thrown inside the processor, the exception will be returned wrapped
> in an
>  * ExecutionException.  No changes will be made to the cache.
>  * 
>  * This enables a way to perform compound operations without
> transactions
>  * involving a cache entry atomically. Such operations may include
> mutations."
>
> Having quickly glanced, there's several things that need addressing from
> Infinispan internals perspective:
>
> 1. Implies that we need to be able to lock a key without a transaction,
> something we don't currently support.
>
>
Actually we don't support it with optimistic transactions either (see
OptimisticLockingInterceptor#visitLockControlCommand()).



> 2. We need an unlock()
>
>
Even if we do implement it, I wouldn't allow user code to call lock/unlock
in non-transactional caches.



> 3. Requires exclusive access, even for read operations. Our lock()
> implementation still allows read operations.
>
>
What happens on other nodes? Do we have to block threads on other nodes
that want to read the entry from their own L1 cache?

I think the intention of this requirement is not really to block readers
from executing, but from seeing incomplete values. So we should be
complying with the spirit (if not the letter) of the specification if we
made a copy of the entry before handing it over to the EntryProcessor.



> These are fairly substantial changes (I'm planning to add them as subtasks
> to https://issues.jboss.org/browse/ISPN-2639) particularly 1) and 3), and
> so wanted to share some thoughts:
>
> For 1 and 2, the easiest way I can think of doing this is by having a new
> LockingInterceptor that is similar to NonTransactionalLockingInterceptor,
> but unlocks only when unlock is called (as opposed to after each operation
> finishes).
>
>
Shouldn't this work with any cache configuration? If yes, then every
LockingInterceptor implementation should handle it.



> For 3, we'd either need to add a new lock() method that supports locking
> read+write, or change lock() behaivour to also lock reads. The latter could
> break old clients, so I'd go for a new lock method, i.e. lockExclusively().
> Again, to support this, a new different NonTransactionalLockingInterceptor
> is needed so that locks are acquired on read operations as well.
>
>
Again, I think this should be a new command (or a new flag on
LockControlCommand) and every LockingInterceptor implementation should
handle it.



> Finally, any new configurations could be avoided at this stage by simply
> having the JSR-107 adapter inject the right locking interceptor. IOW, if
> you use JSR-107, we'll swap NonTransactionalLockingInterceptor for
> JSR107FriendlyNonTransactionalLockingInterceptor.
>
>
Except it won't always be NonTransactionalLockingInterceptor...


> Before I get started with this, I wanted to get the thoughts/opinions of
> the list.
>
> Cheers,
> --
> Galder Zamarreño
> gal...@redhat.com
> twitter.com/galderz
>
> Project Lead, Escalante
> http://escalante.io
>
> Engineer, Infinispan
> http://infinispan.org
>
>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-07 Thread Dan Berindei
On Thu, Feb 7, 2013 at 3:55 PM, Pedro Ruivo  wrote:

> Hi, see inline
>
> Cheers,
> Pedro
>
> On 2/7/13 11:42 AM, Bela Ban wrote:
> > On 2/7/13 12:29 PM, Pedro Ruivo wrote:
> >> Hi Bela
> >>
> >> On 2/7/13 4:53 AM, Bela Ban wrote:
> >>> Hi Pedro,
> >>>
> >>> this is almost exactly what I wanted to implement !
> >>>
> >>> Question:
> >>> - In RequestCorrelator.handleRequest():
> >>>
> >>> protected void handleRequest(Message req, Header hdr) {
> >>> Object retval;
> >>> boolean threwException = false;
> >>> MessageRequest messageRequest = new MessageRequestImpl(req, hdr);
> >>> try {
> >>> retval=request_handler.handle(messageRequest);
> >>> } catch(Throwable t) {
> >>> retval=t;
> >>> threwException = true;
> >>> }
> >>> messageRequest.sendReply(retval, threwException);//<-- should be moved
> >>> up, or called only if threwException == true
> >>> }
> >>>
> >>>
> >>> , you create a MessageRequestImpl and pass it to the RequestHandler.
> The
> >>> request handler then dispatches the request (possibly) to a thread pool
> >>> and calls MessageRequestImpl.sendReply() when done.
> >>>
> >>> However, you also call MessageRequest.sendReply() before returning from
> >>> handleRequest(). I think this is an error, and
> >>> MessageRequest.sendReply() should be moved up inside the catch clause,
> >>> or be called only if threwException is true, so that we send a reply on
> >>> behalf of the RequestHandler if and only if it threw an exception (e.g.
> >>> before it dispatches the request to a thread pool). Otherwise, we'd
> send
> >>> a reply *twice* !
> >> In my defense, I was assuming if the application uses the sendReply()
> >> method, it must return a special return value: DO_NOT_REPLY (in
> >> RequestHandler interface).
> >> This return value is automatically ignored:
> >>
> >> public final void sendReply(Object reply, boolean exceptionThrown) {
> >> if(!header.rsp_expected || reply == RequestHandler.DO_NOT_REPLY)
> >> return;
> > OK
> >
> >
> >>> A few changes I have in mind (need to think about it more):
> >>>
> >>> - I want to leave the existing RequestHandler interface in place, so
> >>> current implementation continue to work
> >>> - There will be a new AsyncRequestHandler interface (possibly extending
> >>> RequestHandler, so an implementation can decide to implement both). The
> >>> RequestCorrelator needs to have either request_handler or
> >>> async_request_handler set. If the former is set, the logic is
> unchanged.
> >>> If the latter is set I'll invoke the async dispatching code
> >> I'm not sure if it is a good idea to have the AsyncRequestHandler
> >> extending the RequestHandler interface. If the application implements
> >> both methods (Object handle(Message) and void handle(Message, ...)) how
> >> do you know which method should be invoked?
> >
> > The default would be to invoke the old handle(Message) method. The
> > dispatching mechanism could be changed to use the async method by
> > setting an attribute in MessageDispatcher (which in turn sets it in
> > RequestCorrelator).
> >
> > How would you do this ? Remember, we cannot change or remove
> > handle(Message) as subclasses of RpcDispatcher or MessageDispatcher, or
> > impls of RequestHandler are out there and any change to handle(Message)
> > would break them.
> >
> > Would you simply provide a separate AsyncRequestHandler interface, not
> > extending RequestHandler ? This would require RequestCorrelator and
> > MessageDispatcher to have 2 refs instead of 1. With the current approach
> > I can do an instanceof on the RequestHandler.
> >
> > I eventually like to merge RequestHandler and AsyncRequestHandler into 1
> > class, but this can be done in 4.0 at the earliest time.
> >
> In my opinion, I would use a separate interface and 2 references (while
>

I'd go for 2 references as well, I think the null check is easier for the
JIT/CPU to optimize.

I also think the two interfaces represent two different use cases, so it
should be clear to a reader that a class that implements
AsyncRequestHandler really is using async delivery.


RequestHandler is still active). However, I will change the
> AsyncRequestHandler interface to:
>
> Object handle(Message message, Response response) throws Exception
>
> and the Response interface would be as follow:
>
> void sendResponse(Object reply, boolean isException) //or sendReply(...);
>
> void setAsyncResponse(boolean value); //or setAsyncReply(...);
>
> boolean isAsyncResponse(); //or isAsyncReply();
>
> And with this, it can support both behaviors, with a minor addition:
>
> 1) To work in async mode:
>  the application invokes setAsyncResponse(true) and eventually it
> will invoke the sendResponse(...)
>
> 2) To work in sync mode:
>  the application invokes setAsyncResponse(false). I think this
> should be the default value
>
> in the RequestCorrelator, it checks if isAsyncReponse() value. If true,
> it does not send the response (it only sends it if it is an exception
> caught). If the value is false,

Re: [infinispan-dev] Threadpools in a large cluster

2013-02-07 Thread Dan Berindei
On Thu, Feb 7, 2013 at 8:05 PM, Mircea Markus  wrote:

>
> On 1 Feb 2013, at 09:54, Dan Berindei wrote:
>
> Yeah, I wouldn't call this a "simple" solution...
>
> The distribution/replication interceptors are quite high in the
> interceptor stack, so we'd have to save the state of the interceptor stack
> (basically the thread's stack) somehow and resume processing it on the
> thread receiving the responses. In a language that supports continuations
> that would be a piece of cake, but since we're in Java we'd have to
> completely change the way the interceptor stack works.
>
> Actually we do hold the lock on modified keys while the command is
> replicated to the other owners. But think locking wouldn't be a problem: we
> already allow locks to be owned by transactions instead of threads, so it
> would just be a matter of creating a "lite transaction" for
> non-transactional caches. Obviously the TransactionSynchronizerInterceptor
> would have to go, but I see that as a positive thing ;)
>
> The TransactionSynchronizerInterceptor protected the CacheTransaction
> objects from multiple writes, we'd still need that because of the NBST
> forwarding.
>

We wouldn't need it if access to the Collection members in CacheTransaction
was properly synchronized. Perhaps hack is too strong a word, let's just
say I'm seeing TransactionSynchronizerInterceptor as a temporary solution :)
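
For example, something as simple as guarding the per-transaction collections
themselves could be enough (purely illustrative, not the actual
CacheTransaction code):

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.infinispan.commands.write.WriteCommand;

// Illustration only: synchronize access to the collection members instead of
// serializing whole commands with TransactionSynchronizerInterceptor.
public class CacheTransactionSketch {
   private final List<WriteCommand> modifications =
         Collections.synchronizedList(new ArrayList<WriteCommand>());

   public void addModification(WriteCommand command) {
      modifications.add(command);
   }
}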


> So yeah, it could work, but it would take a huge amount of effort and it's
> going to obfuscate the code. Plus, I'm not at all convinced that it's going
> to improve performance that much compared to a new thread pool.
>
> +1
>
>
> Cheers
> Dan
>
>
> On Fri, Feb 1, 2013 at 10:59 AM, Radim Vansa  wrote:
>
>> Yeah, that would work if it is possible to break execution path into the
>> FutureListener from the middle of interceptor stack - I am really not sure
>> about that but as in current design no locks should be held when a RPC is
>> called, it may be possible.
>>
>> Let's see what someone more informed (Dan?) would think about that.
>>
>> Thanks, Bela
>>
>> Radim
>>
>> - Original Message -
>> | From: "Bela Ban" 
>> | To: infinispan-dev@lists.jboss.org
>> | Sent: Friday, February 1, 2013 9:39:43 AM
>> | Subject: Re: [infinispan-dev] Threadpools in a large cluster
>> |
>> | It looks like the core problem is an incoming RPC-1 which triggers
>> | another blocking RPC-2: the thread delivering RPC-1 is blocked
>> | waiting
>> | for the response from RPC-2, and can therefore not be used to serve
>> | other requests for the duration of RPC-2. If RPC-2 takes a while,
>> | e.g.
>> | waiting to acquire a lock in the remote node, then it is clear that
>> | the
>> | thread pool will quickly exceed its max size.
>> |
>> | A simple solution would be to prevent invoking blocking RPCs *from
>> | within* a received RPC. Let's take a look at an example:
>> | - A invokes a blocking PUT-1 on B
>> | - B forwards the request as blocking PUT-2 to C and D
>> | - When PUT-2 returns and B gets the responses from C and D (or the
>> | first
>> | one to respond, don't know exactly how this is implemented), it sends
>> | the response back to A (PUT-1 terminates now at A)
>> |
>> | We could change this to the following:
>> | - A invokes a blocking PUT-1 on B
>> | - B receives PUT-1. Instead of invoking a blocking PUT-2 on C and D,
>> | it
>> | does the following:
>> |   - B invokes PUT-2 and gets a future
>> |   - B adds itself as a FutureListener, and it also stores the
>> | address of the original sender (A)
>> |   - When the FutureListener is invoked, B sends back the result
>> |   as a
>> | response to A
>> | - Whenever a member leaves the cluster, the corresponding futures are
>> | cancelled and removed from the hashmaps
>> |
>> | This could probably be done differently (e.g. by sending asynchronous
>> | messages and implementing a finite state machine), but the core of
>> | the
>> | solution is the same; namely to avoid having an incoming thread block
>> | on
>> | a sync RPC.
>> |
>> | Thoughts ?
>> |
>> |
>> |
>> |
>> | On 2/1/13 9:04 AM, Radim Vansa wrote:
>> | > Hi guys,
>> | >
>> | > after dealing with the large cluster for a while I find the way how
>> | > we use OOB threads in synchronous configuration non-robust.
>> | > Imagine a situation where node which is not an owner of the key
>> | &

Re: [infinispan-dev] Protecting ourselves against naive JSR-107 usages in app server environments

2013-02-08 Thread Dan Berindei
On Fri, Feb 8, 2013 at 3:41 PM, Galder Zamarreño  wrote:

> Hi all,
>
> We've got a small class loading puzzle to solve in our JSR-107
> implementation.
>
> JSR-107 has a class called Caching which keeps a singleton enum reference
> (AFAIK, has same semantics as static) to the systemt's CacheManagerFactory,
> which in our case it would be InfinispanCacheManagerFactory:
>
> https://github.com/jsr107/jsr107spec/blob/master/src/main/java/javax/cache/Caching.java
>
> A naive user of JSR-107 could decide to use this Caching class in an app
> server environment and get a reference to the CMF through it, which could
> cause major classloading issues if we don't protect ourselves.
>
> Within out CMF implementation, we need to keep some kind of mapping which
> given a name *and* a classloader, which can find the CacheManager instance
> associated to it.
>
> This poses a potential risk of a static strong reference being held
> indirectly on the classloader associated with the Infinispan Cache Manager
> (amongst other sensible components...).
>
> One way to break this strong reference is for CMF implementation to hold a
> weak reference on the CM as done here:
>
> https://github.com/galderz/infinispan/blob/t_2639/jsr107/src/main/java/org/infinispan/jsr107/cache/InfinispanCacheManagerFactory.java#L56
>
> This poses a problem though in that the Infinispan Cache Manager can be
> evicted from memory without it's stop/shutdown method being called, leading
> to resources being left open (i.e. jgroups, jmx…etc).
>
> The only safe way to deal with this that I've thought so far is to have a
> finalyze() method in InfinispanCacheManager (JSR-107 impl of CacheManager)
> that makes sure this cache manager is shut down. I'm fully aware this is an
> expensive operation, but so far is the only way I can see in which we can
> avoid leaking stuff, while not affecting the actual Infinispan core module.
>
> I've found a good example of this in
> https://github.com/jbossas/jboss-as/blob/master/controller-client/src/main/java/org/jboss/as/controller/client/impl/RemotingModelControllerClient.java-
>  It even tracks creation time so that if all references to
> InfinispanCacheManager are lost but the ICM instance is not closed, it will
> print a warm message.
>
> If anyone has any other thoughts, it'd be interesting to hear about them.
>
>

The Caching javadoc seems to prohibit stopping the CacheManagers without
user intervention (
https://github.com/jsr107/jsr107spec/blob/master/src/main/java/javax/cache/Caching.java#L35
):

 * Also keeps track of all CacheManagers created by the factory.
Subsequent calls
 * to {@link #getCacheManager()} return the same CacheManager.


And in the javadoc of Caching.close() (
https://github.com/jsr107/jsr107spec/blob/master/src/main/java/javax/cache/Caching.java#L153
):

 * All cache managers obtained from the factory are shutdown.
 * 
 * Subsequent requests from this factory will return different
cache managers than would have been obtained before
 * shutdown. So for example

 * 
 *  CacheManager cacheManager = CacheFactory.getCacheManager();
 *  assertSame(cacheManager, CacheFactory.getCacheManager());
 *  CacheFactory.close();
 *  assertNotSame(cacheManager, CacheFactory.getCacheManager());
 * 

We can't guarantee that getCacheManager() will return the same instance
unless we keep a hard reference to it in our CacheManagerFactory. So I
think the only option is to add a finalize() method to CacheManagerFactory
that will stop all the CacheManagers if the user didn't explicitly call
Caching.close().
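
In code, the idea would be roughly this (hypothetical names, not the actual
implementation):

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: hard references keep getCacheManager() returning the
// same instance, and finalize() is only a safety net for callers that never
// invoke Caching.close().
public class CacheManagerFactorySketch {

   // minimal stand-in for the real JSR-107 CacheManager adapter
   static class ManagedCacheManager {
      void shutdown() { /* release JGroups channel, JMX registrations, etc. */ }
   }

   private final Map<String, ManagedCacheManager> managers = new HashMap<String, ManagedCacheManager>();

   public synchronized ManagedCacheManager getCacheManager(String name) {
      ManagedCacheManager cm = managers.get(name);
      if (cm == null) {
         cm = new ManagedCacheManager();
         managers.put(name, cm);
      }
      return cm;
   }

   public synchronized void close() {
      for (ManagedCacheManager cm : managers.values()) {
         cm.shutdown();
      }
      managers.clear();
   }

   @Override
   protected void finalize() throws Throwable {
      try {
         close();
      } finally {
         super.finalize();
      }
   }
}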

Cheers
Dan



> Cheers,
> --
> Galder Zamarreño
> gal...@redhat.com
> twitter.com/galderz
>
> Project Lead, Escalante
> http://escalante.io
>
> Engineer, Infinispan
> http://infinispan.org
>
>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] UserTransactionLookup for JSR-107?

2013-02-08 Thread Dan Berindei
Galder, the CacheManager.getUserTransaction() javadoc says the method
should return "a" UserTransaction. It doesn't mandate any connection with
any active TM, in fact based on this issue I think Ehcache will always
return their own UserTransaction object:
https://github.com/jsr107/jsr107spec/issues/28

The Javadoc of the methods in UserTransaction and in TransactionManager in
the standard are identical. So if they don't behave the same, that's a bug
in the TM.

Case in point, I just looked at JBossTS version 4.16.3 and
UserTransactionImple doesn't do anything except extend BaseTransaction.
TransactionManagerImple also extends BaseTransaction and adds a few extra
methods. BaseTransaction doesn't have any instance state (it's all statics
and thread locals), so UserTransactionImple and TransactionManagerImple are
identical except for the extra methods in TransactionManagerImple.

Cheers
Dan



On Fri, Feb 8, 2013 at 11:18 AM, Galder Zamarreño  wrote:

> I'm no transactions expert, but I did consider that and I highly doubt
> it's that simple.
>
> Even if it might probably just work, you'll never be able to guarantee
> that such UserTransaction behaves just like You-Fav-JTATM-UserTransaction
> without throrough testing.
>
> Go to your IDE (dunno where JBoss TS source code is online…) and open up:
> - com.arjuna.ats.internal.jta.transaction.arjunacore.UserTransactionImple
> - com.arjuna.ats.internal.jta.transaction.arjunacore.BaseTransaction
>
> Cheers,
>
> On Feb 7, 2013, at 2:17 PM, Dan Berindei  wrote:
>
> > Do we really need to expose the TransactionManager's UserTransaction
> implementation?
> >
> > Looking at the interface, it seems like a subset of TransactionManager,
> so couldn't we return a custom UserTransaction that just delegates to the
> TransactionManager?
> >
> >
> > On Thu, Feb 7, 2013 at 2:06 PM, Manik Surtani 
> wrote:
> > Ok.  Then a separate Lookup is what we'd need, I guess.  Not pretty, but
> oh well.
> >
> > On 7 Feb 2013, at 11:41, Galder Zamarreño  wrote:
> >
> > >
> > > On Feb 7, 2013, at 12:31 PM, Manik Surtani 
> wrote:
> > >
> > >>
> > >> On 7 Feb 2013, at 11:23, Galder Zamarreño  wrote:
> > >>
> > >>> Hi all,
> > >>>
> > >>> I'm back with a more food for thought wrt JSR-107 impl. Our
> CacheManager adapter needs to implement:
> > >>>
> > >>> UserTransaction getUserTransaction();
> > >>>
> > >>> The problem there is that there's no standard way of getting a
> UserTransaction given a JTA TransactionManager.
> > >>>
> > >>> It really is down to each TransactionManager provider to give a
> UserTransaction instance (whether JNDI, static…etc).
> > >>>
> > >>> So, we need a way to lookup a UserTransaction.
> > >>>
> > >>> One option is to add a getUserTransaction to
> TransactionManagerLookup, but that will break existing clients.
> > >>
> > >> You mean it would break existing TML implementations?
> > >
> > > ^ Yeah, potentially yeah.
> > >
> > >> Do we know of any custom TML implementations though?
> > >
> > > Yes:
> > > -
> https://github.com/hibernate/hibernate-orm/blob/master/hibernate-infinispan/src/main/java/org/hibernate/cache/infinispan/tm/HibernateTransactionManagerLookup.java
> > > -
> https://github.com/jbossas/jboss-as/blob/master/clustering/infinispan/src/main/java/org/jboss/as/clustering/infinispan/TransactionManagerProvider.java
> > >
> > >>
> > >>>
> > >>> Alternatively, define a
> org.infinispan.transaction.lookup.UserTransactionLookup interface which is
> configurable. We'd then need to implement for existing TML classes.
> > >>>
> > >>> If anyone has any other ideas, let us know.
> > >>>
> > >>> Cheers,
> > >>> --
> > >>> Galder Zamarreño
> > >>> gal...@redhat.com
> > >>> twitter.com/galderz
> > >>>
> > >>> Project Lead, Escalante
> > >>> http://escalante.io
> > >>>
> > >>> Engineer, Infinispan
> > >>> http://infinispan.org
> > >>>
> > >>>
> > >>> ___
> > >>> infinispan-dev mailing list
> > >>> infinispan-dev@lists.jboss.org
> > >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>

Re: [infinispan-dev] Trigger NBST for Auto-Placer

2013-02-12 Thread Dan Berindei
Hi Pedro

When I split off the RebalancePolicy I was thinking that when a
RebalancePolicy needs to collaborate with a ConsistentHashFactory, they
should do so via another cache manager-scoped component. But that doesn't
really work (yet?), because ConsistentHashFactory can't access any
components.

I think it would be better to extend
ClusterTopologyManager.triggerRebalance (and
ConsistentHashFactory.rebalance) to accept an arbitrary Object parameter.
Then RebalancePolicy could use this parameter to pass extra information to
the CHF, like your Mappings object, and then when
ClusterTopologyManagerImpl asks for a balanced CH, the CHF will include the
Mappings in the result CH. What do you think?
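
To make that concrete, the extension could look something like this (the
signatures are approximate and the parameter names are illustrative, not the
current API):

import org.infinispan.distribution.ch.ConsistentHash;

interface ClusterTopologyManager {
   // the opaque hint (e.g. the auto-placer Mappings) is passed straight through
   void triggerRebalance(String cacheName, Object rebalanceHint) throws Exception;
}

interface ConsistentHashFactory {
   // the factory bakes the hint into the balanced CH it returns
   ConsistentHash rebalance(ConsistentHash baseCH, Object rebalanceHint);
}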

In order to trigger the rebalance you have to call startRebalance, and the
new ("balanced") consistent hash must not be equal to the existing
consistent hash. See
https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/topology/ClusterTopologyManagerImpl.java#L389

Cheers
Dan




On Thu, Feb 7, 2013 at 10:05 PM, Pedro Ruivo  wrote:

> Hi,
>
> I'm working in a way to rebase auto-placer on top of NBST and I have one
> question...
> If you have already forgot, auto-placer analyzes the workload and tries
> to move the most remote accessed keys to the corresponding requester.
>
> After calculating the new mappings, I want to trigger the NBST with this
> mapping. I'm thinking to add a new method in the ClusterTopologyManager,
> something like:
>
> triggerAutoPlacer(String cacheName, Mappings newMappings);
>
> and this method it will be a duplicate of triggerRebalance but instead
> of doing chFactory.rebalance(CH) (in the startRebalance() method) I'm
> thinking to do chFactory.autoPlacer(CH, Mappings). The last method will
> override the defautl CH location.
>
> Question: will this solution trigger the NBST or do I have to create the
> triggerAutoPlacer() method in another class?
>
> ps. forget the methods names... I will think in better names later
>
> Thanks!!
>
> Cheers,
> Pedro
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Trigger NBST for Auto-Placer

2013-02-12 Thread Dan Berindei
Sorry, I didn't read your code so I just assumed you're writing your own
RebalancePolicy.

I think you need to implement your own RebalancePolicy, because
ClusterTopologyManagerImpl by itself doesn't remember that a rebalance was
triggered. So if you call startRebalance, but there is already a rebalance
in progress, it is just ignored. When the in-progress rebalance finishes,
it calls RebalancePolicy.updateCacheStatus, and it's the RebalancePolicy
implementation's job to start a new rebalance if needed.


On Tue, Feb 12, 2013 at 5:28 PM, Pedro Ruivo  wrote:

> **
> Hi Dan,
>
>
> On 2/12/13 3:12 PM, Dan Berindei wrote:
>
>  Hi Pedro
>
>  When I split off the RebalancePolicy I was thinking that when a
> RebalancePolicy needs to collaborate with a ConsistentHashFactory, they
> should do so via another cache manager-scoped component. But that doesn't
> really work (yet?), because ConsistentHashFactory can't access any
> components.
>
> I didn't understand the previous sentence... Do I need to invoke anything
> in the RebalancePolicy?
>
> So far, I'm invoking directly in the ClusterTopologyManager:
> https://github.com/pruivo/infinispan/blob/cloudtm_v2/core/src/main/java/org/infinispan/dataplacement/DataPlacementManager.java#L246
>
> Thanks!
>
> Cheers,
> Pedro
>
>
>  I think it would be better to extend
> ClusterTopologyManager.triggerRebalance (and
> ConsistentHashFactory.rebalance) to accept an arbitrary Object parameter.
> Then RebalancePolicy could use this parameter to pass extra information to
> the CHF, like your Mappings object, and then when
> ClusterTopologyManagerImpl asks for a balanced CH, the CHF will include the
> Mappings in the result CH. What do you think?
>
>  In order to trigger the rebalance you have to call startRebalance, and
> the new ("balanced") consistent hash must not be equal to the existing
> consistent hash. See
> https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/topology/ClusterTopologyManagerImpl.java#L389
>
>  Cheers
>  Dan
>
>
>
>
> On Thu, Feb 7, 2013 at 10:05 PM, Pedro Ruivo wrote:
>
>> Hi,
>>
>> I'm working in a way to rebase auto-placer on top of NBST and I have one
>> question...
>> If you have already forgot, auto-placer analyzes the workload and tries
>> to move the most remote accessed keys to the corresponding requester.
>>
>> After calculating the new mappings, I want to trigger the NBST with this
>> mapping. I'm thinking to add a new method in the ClusterTopologyManager,
>> something like:
>>
>> triggerAutoPlacer(String cacheName, Mappings newMappings);
>>
>> and this method it will be a duplicate of triggerRebalance but instead
>> of doing chFactory.rebalance(CH) (in the startRebalance() method) I'm
>> thinking to do chFactory.autoPlacer(CH, Mappings). The last method will
>> override the defautl CH location.
>>
>> Question: will this solution trigger the NBST or do I have to create the
>> triggerAutoPlacer() method in another class?
>>
>> ps. forget the methods names... I will think in better names later
>>
>> Thanks!!
>>
>> Cheers,
>> Pedro
>> ___
>> infinispan-dev mailing list
>> infinispan-dev@lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>
>
>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Trigger NBST for Auto-Placer

2013-02-12 Thread Dan Berindei
Actually I'd like regular users to be able to define their own
RebalancePolicies and ConsistentHashFactories, without modifying
ClusterTopologyManagerImpl/ClusterCacheStatus, so I see this as a good
opportunity to modify our implementation to allow it.


On Tue, Feb 12, 2013 at 6:04 PM, Pedro Ruivo  wrote:

> **
> Can I modify the existing one?
>
> I'm thinking in the following:
>
> ClusterTopologyManagerImpl.handleNewMappings(...) { //new method
>   ClusterCacheStatus status = //get status for cache name
>   status.setNewMappings(...) //synchronized of course
>   rebalancePolicy.updateCacheStatus(...);
> }
>
>
Like I said, I'd like to keep ClusterTopologyManager as generic as possible
wrt rebalance strategies, so I think your DataPlacementManager should call
your custom RebalancePolicy directly.

The rebalance policy could keep the new mappings in a map on its own,
although maybe it would be a nice touch to allow storing custom state in
ClusterCacheStatus.



>  DefaultRebalancePolicy.updateCacheStatus(...) { //modified
>   ...
>   if (!status.hasJoiners() && isBalanced(...) && !status.hasNewMappings())
> { //added last condition
> return;
>   }
>   ...
> }
>

I guess you'd also need to clean up the old "new mappings" here after the
rebalance is done.



>
> ClusterTopologyManagerImpl.startRebalance(...) { //modifed
>   ...
>   chFactory.rebalance(ch);
>   chFactory.applyMappings(ch, status.getNewMappings()); //added.
>   ... //if it is the same ch, no state transfer is triggered
> }
>
>
This would require ClusterTopologyManagerImpl to know about your custom
ConsistentHashFactory, and it wouldn't work with another
ConsistentHashFactory that requires different custom data. So I'd rather we
add a generic parameter to ConsistentHashFactory.rebalance.


What do you think?
>
> Thanks,
> Pedro
>
>
> On 2/12/13 3:39 PM, Dan Berindei wrote:
>
>  Sorry, I didn't read your code so I just assumed you're writing your own
> RebalancePolicy.
>
>  I think you need to implement your own RebalancePolicy, because
> ClusterTopologyManagerImpl by itself doesn't remember that a rebalance was
> triggered. So if you call startRebalance, but there is already a rebalance
> in progress, it is just ignored. When the in-progress rebalance finishes,
> it calls RebalancePolicy.updateCacheStatus, and it's the RebalancePolicy
> implementation's job to start a new rebalance if needed.
>
>
> On Tue, Feb 12, 2013 at 5:28 PM, Pedro Ruivo wrote:
>
>>  Hi Dan,
>>
>>
>> On 2/12/13 3:12 PM, Dan Berindei wrote:
>>
>>  Hi Pedro
>>
>>  When I split off the RebalancePolicy I was thinking that when a
>> RebalancePolicy needs to collaborate with a ConsistentHashFactory, they
>> should do so via another cache manager-scoped component. But that doesn't
>> really work (yet?), because ConsistentHashFactory can't access any
>> components.
>>
>>  I didn't understand the previous sentence... Do I need to invoke
>> anything in the RebalancePolicy?
>>
>> So far, I'm invoking directly in the ClusterTopologyManager:
>> https://github.com/pruivo/infinispan/blob/cloudtm_v2/core/src/main/java/org/infinispan/dataplacement/DataPlacementManager.java#L246
>>
>> Thanks!
>>
>> Cheers,
>> Pedro
>>
>>
>>  I think it would be better to extend
>> ClusterTopologyManager.triggerRebalance (and
>> ConsistentHashFactory.rebalance) to accept an arbitrary Object parameter.
>> Then RebalancePolicy could use this parameter to pass extra information to
>> the CHF, like your Mappings object, and then when
>> ClusterTopologyManagerImpl asks for a balanced CH, the CHF will include the
>> Mappings in the result CH. What do you think?
>>
>>  In order to trigger the rebalance you have to call startRebalance, and
>> the new ("balanced") consistent hash must not be equal to the existing
>> consistent hash. See
>> https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/topology/ClusterTopologyManagerImpl.java#L389
>>
>>  Cheers
>>  Dan
>>
>>
>>
>>
>> On Thu, Feb 7, 2013 at 10:05 PM, Pedro Ruivo wrote:
>>
>>> Hi,
>>>
>>> I'm working in a way to rebase auto-placer on top of NBST and I have one
>>> question...
>>> If you have already forgot, auto-placer analyzes the workload and tries
>>> to move the most remote accessed keys to the corresponding requester.
>>>
>>> After calculating the new mappings, I want to trigger the NBST with this
>>> ma

Re: [infinispan-dev] Protecting ourselves against naive JSR-107 usages in app server environments

2013-02-14 Thread Dan Berindei
On Thu, Feb 14, 2013 at 4:43 PM, Galder Zamarreño  wrote:

>
>
> On Feb 8, 2013, at 3:35 PM, Dan Berindei  wrote:
>
> >
> >
> >
> > On Fri, Feb 8, 2013 at 3:41 PM, Galder Zamarreño 
> wrote:
> > Hi all,
> >
> > We've got a small class loading puzzle to solve in our JSR-107
> implementation.
> >
> > JSR-107 has a class called Caching which keeps a singleton enum
> reference (AFAIK, has same semantics as static) to the systemt's
> CacheManagerFactory, which in our case it would be
> InfinispanCacheManagerFactory:
> >
> https://github.com/jsr107/jsr107spec/blob/master/src/main/java/javax/cache/Caching.java
> >
> > A naive user of JSR-107 could decide to use this Caching class in an app
> server environment and get a reference to the CMF through it, which could
> cause major classloading issues if we don't protect ourselves.
> >
> > Within out CMF implementation, we need to keep some kind of mapping
> which given a name *and* a classloader, which can find the CacheManager
> instance associated to it.
> >
> > This poses a potential risk of a static strong reference being held
> indirectly on the classloader associated with the Infinispan Cache Manager
> (amongst other sensible components...).
> >
> > One way to break this strong reference is for CMF implementation to hold
> a weak reference on the CM as done here:
> >
> https://github.com/galderz/infinispan/blob/t_2639/jsr107/src/main/java/org/infinispan/jsr107/cache/InfinispanCacheManagerFactory.java#L56
> >
> > This poses a problem though in that the Infinispan Cache Manager can be
> evicted from memory without it's stop/shutdown method being called, leading
> to resources being left open (i.e. jgroups, jmx…etc).
> >
> > The only safe way to deal with this that I've thought so far is to have
> a finalyze() method in InfinispanCacheManager (JSR-107 impl of
> CacheManager) that makes sure this cache manager is shut down. I'm fully
> aware this is an expensive operation, but so far is the only way I can see
> in which we can avoid leaking stuff, while not affecting the actual
> Infinispan core module.
> >
> > I've found a good example of this in
> https://github.com/jbossas/jboss-as/blob/master/controller-client/src/main/java/org/jboss/as/controller/client/impl/RemotingModelControllerClient.java-
>  It even tracks creation time so that if all references to
> InfinispanCacheManager are lost but the ICM instance is not closed, it will
> print a warm message.
> >
> > If anyone has any other thoughts, it'd be interesting to hear about them.
> >
> >
> >
> > The Caching javadoc seems to prohibit stopping the CacheManagers without
> user intervention (
> https://github.com/jsr107/jsr107spec/blob/master/src/main/java/javax/cache/Caching.java#L35
> ):
> >
> >  * Also keeps track of all CacheManagers created by the factory.
> Subsequent calls
> >  * to {@link #getCacheManager()} return the same CacheManager.
> >
> >
> >
> >
> > And in the javadoc of Caching.close() (
> https://github.com/jsr107/jsr107spec/blob/master/src/main/java/javax/cache/Caching.java#L153
> ):
> >  * All cache managers obtained from the factory are shutdown.
> >  * 
> >  * Subsequent requests from this factory will return different cache
> managers than would have been obtained before
> >
> >
> >
> >  * shutdown. So for example
> >  * 
> >  *  CacheManager cacheManager = CacheFactory.getCacheManager();
> >
> >
> >
> >  *  assertSame(cacheManager, CacheFactory.getCacheManager());
> >  *  CacheFactory.close();
> >
> >
> >
> >  *  assertNotSame(cacheManager, CacheFactory.getCacheManager());
> >  * 
> >
> > We can't guarantee that getCacheManager() will return the same instance
> unless we keep a hard reference to it in our CacheManagerFactory. So I
> think the only option is to add a finalize() method to CacheManagerFactory
> that will stop all the CacheManagers if the user didn't explicitly call
> Caching.close().
>
> A finalize() in CacheManagerFactory does not solve the problem since
> there's still a hard reference to the CacheManagerFactory impl from
> Caching, and as long as that's not cleared, finalize() won't be executed,
> so you're still exposed to a potential leak.
>
>
Yeah, that's true.

But note that the opposite is also possible: the user can call
Caching.close() from one web app and it will close all the cache managers
opened from any other web app. I doubt we can protect ourselves a

Re: [infinispan-dev] NBST test failures in Query

2013-02-15 Thread Dan Berindei
Sanne, what are the failing tests?

ISPN-2628 only mentions
MultiNodeReplicatedTest.testIndexingWorkDistribution, is there any other?


On Fri, Feb 15, 2013 at 1:44 PM, Sanne Grinovero wrote:

> Thanks Anna, good to see it's logged.
>
> All, this is not just a simple test having inaccurate timing related
> code, it's a critical regression in functionality. Another example of
> why failing tests need the highest attention.
>
> Cheers,
> Sanne
>
> On 15 February 2013 08:36, Anna Manukyan  wrote:
> > Hi Sanne,
> >
> > FYI, there is a bug reported for the mentioned issue:
> > https://issues.jboss.org/browse/ISPN-2648
> >
> > Regards,
> > Anna
> >
> > - Original Message -
> > From: "Sanne Grinovero" 
> > To: "infinispan -Dev List" 
> > Sent: Thursday, February 14, 2013 7:58:08 PM
> > Subject: [infinispan-dev] NBST test failures in Query
> >
> > I'm having a couple of tests in the Query module sporadically (but
> > often) failing because of exceptions like:
> >
> >  - Received invalid rebalance confirmation from NodeC-62354 for cache
> > ___defaultcache, we don't have a rebalance in progress
> >  - Suspected members
> >  - WARN  [InboundTransferTask] (transport-thread-1,NodeD) ISPN000210:
> > Failed to request segments [0, 2, 4, 36, 6, 42, 8, 43, 40, 10, 41, 11,
> > 12, 13, 14, 50, ...
> >  - ISPN71: Caught exception when handling command
> > CacheTopologyControlCommand ...
> >
> > Looks very much state-transfer related to me. Could anyone from the
> > NBST experts have a look?
> >
> > There are some interesting tests there:
> > org.infinispan.query.distributed.MultiNodeDistributedTest
> >
> > and its extensions explicitly attempt to start additional nodes,
> > scaling up and down while performing other operations. To be fair, I
> > wish they would really do things in parallel, but they are politely
> > blocking and waiting for rehashing to complete at each step..
> >
> > I would expect at this point to see some tests doing operations and -
> > in parallel - have nodes added and removed. Are there no such tests in
> > core?
> >
> > Sanne
> > ___
> > infinispan-dev mailing list
> > infinispan-dev@lists.jboss.org
> > https://lists.jboss.org/mailman/listinfo/infinispan-dev
> > ___
> > infinispan-dev mailing list
> > infinispan-dev@lists.jboss.org
> > https://lists.jboss.org/mailman/listinfo/infinispan-dev
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] More verbose logging

2013-02-20 Thread Dan Berindei
On Wed, Feb 20, 2013 at 11:49 AM, Mircea Markus  wrote:

> I always liked this idea of categories but never saw it in use. Are there
> any projects that use this logging approach?
>
> On 20 Feb 2013, at 09:57, Sanne Grinovero wrote:
>
> +1 for using categories
>
> We could even experiment combining multiple categories, for example in
> this case you could have a "RPCDispatcher" category and also have a
> "RPCDispatcher.includeCacheEntries" which will make descriptions
> more/less verbose.
>
>
> That's not what I understand by a category - "logical process" as defined
> by David. I consider "Remoting" or "Rehashing" a category, but
> RPCDispatcher is just an entity (too fine grained)
> and RPCDispatcher.includeCacheEntries even more so.
>
> Also that wouldn't necessarily solve the problem Manik raised: in this
> particular case the toString of StateResponseCommand is huge. Adrian/Dan is
> this needed for debugging state transfer issues? If so +1 for managing it
> with the verbose flag.
>
>
Yeah, we couldn't introduce a RpcDispatcher.includeCacheEntries category
anyway because the cache entries are included in the command's toString() -
the logger can't do anything to filter them out.

I think we could eliminate the cache entries from
StateResponseCommand.toString() (and the segment owners from
ConsistentHash.toString()), and only log them separately, under a different
category/class name.
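
For example, the bulky payload could go to trace logging under its own
category that can be enabled independently (a sketch using a JBoss Logging
logger; the category name and helper are made up):

import org.jboss.logging.Logger;

public class StateResponseVerboseLogging {
   private static final Logger ENTRIES_LOG =
         Logger.getLogger("org.infinispan.statetransfer.StateResponseCommand.entries");

   static void logChunks(Object command, Object chunks) {
      if (ENTRIES_LOG.isTraceEnabled()) {
         ENTRIES_LOG.tracef("Entries for %s: %s", command, chunks);
      }
   }
}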
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Staggering remote GET calls

2013-02-20 Thread Dan Berindei
Radim, just to be sure, you are testing embedded mode with RadarGun, right?
With HotRod most of the get operations should be initiated from the main
owner, so Manik's changes shouldn't make a big difference in the number of
active threads.

How about throughput, has it also improved compared to 5.2.0.CR3, or is it
the same?



On Wed, Feb 20, 2013 at 2:15 PM, Radim Vansa  wrote:

> Hi Manik,
>
> so I have tried to compile this branch and issued a 20 minute stress test
> (preceded by 10 minute warmup) on 128 nodes, where each node has 10
> stressor threads.
> While in 5.2.0.CR3 the maximum OOB threadpool size was 553 with this
> configuration, with t_825 it was 219. This looks good, but it's actually
> better :). When I looked on the per-node maximum, in t_825 there was only
> one node with the 219 threads (as the max), others were usually around 25,
> few around 40. On the contrary, in 5.2.0.CR3 all the nodes had maximum
> around 500!
>
> Glad to bring good news :)
>
> Radim
>
> - Original Message -
> | From: "Manik Surtani" 
> | To: "infinispan -Dev List" , "Radim
> Vansa" 
> | Sent: Tuesday, February 19, 2013 6:33:04 PM
> | Subject: Staggering remote GET calls
> |
> | Guys,
> |
> | I have a topic branch with a fix for ISPN-825, to stagger remote GET
> | calls.  (See the JIRA for details on this patch).
> |
> | This should have an interesting effect on greatly reducing the
> | pressure on the OOB thread pool.  This isn't a *real* fix for the
> | problem that Radim reported (Pedro is working on that with Bela),
> | but reducing pressure on the OOB thread pool is a side effect of
> | this fix.
> |
> | It should generally make things faster too, with less traffic on the
> | network.  I'd be curious for you to give this branch a try, Radim -
> | see how it impacts your tests.
> |
> | https://github.com/maniksurtani/infinispan/tree/t_825
> |
> | Cheers
> | Manik
> | --
> | Manik Surtani
> | ma...@jboss.org
> | twitter.com/maniksurtani
> |
> | Platform Architect, JBoss Data Grid
> | http://red.ht/data-grid
> |
> |
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Blocking issue in TO State Transfer

2013-02-26 Thread Dan Berindei
On Tue, Feb 26, 2013 at 12:57 PM, Pedro Ruivo  wrote:

> hi,
>
> I found the blocking problem with the state transfer this morning. It
> happens because of the reordering of a regular and OOB message.
>
> Below, is a simplification of what is happening for two nodes
>
> A: total order broadcasts rebalance_start
>
> B: (incoming thread) delivers rebalance_start
> B: has no segments to request so the rebalance is done
> B: sends async request with rebalance_confirm (unicast #x)
> B: sends the rebalance_start response (unicast #x+1) (the response is a
> regular message)
>
> A: receives rebalance_start response (unicast #x+1)
> A: in UNICAST2, it detects the message is out-of-order and blocks the
> response in the sender window (i.e. the message #x is missing)
> A: receives the rebalance_confirm (unicast #x)
> A: delivers rebalance_confirm. Infinispan blocks this command until all
> the rebalance_start responses are received ==> this originates a deadlock!
> (because the response is blocked in unicast layer)
>
> Question: can the request's response message be sent always as OOB? (I
> think the answer should be no...)
>
>
We could, if Bela adds the send(Message) method to the Response
interface... and personally I think it would be better to make all
responses OOB (as in JGroups 3.2.x). I don't have any data to back this up,
though...



> My suggestion: when I deliver a rebalance_confirm command (that is sent
> async), can I move it to a thread in async_thread_pool_executor?
>
>
I have WIP fix for https://issues.jboss.org/browse/ISPN-2825, which should
stop blocking the REBALANCE_CONFIRM commands on the coordinator:
https://github.com/danberindei/infinispan/tree/t_2825_m

I haven't issued a PR yet because I'm still getting a failure in
ClusterTopologyManagerTest, I think because of a JGroups issue (RSVP not
receiving an ACK from itself). I'll let you know when I find out...



> Weird thing: last night I tried more than 5 times in a row with UNICAST3
> and it never blocked. Can this mean there is a problem with UNICAST3, or was
> I just lucky?
>
>
Even though the REBALANCE_CONFIRM command is sent async, the message is
still OOB. I think UNICAST/2/3 should not block any regular message waiting
for the processing of an OOB message, as long as that message was received,
so maybe the problem is in UNICAST2?



> Any other suggestion?
>
> Cheers,
> Pedro
>
>
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Blocking issue in TO State Transfer

2013-02-27 Thread Dan Berindei
On Wed, Feb 27, 2013 at 11:13 AM, Bela Ban  wrote:

> OK, here's what happens:
>
> - A's receiver table for B is at #6, this means that the next message from B
> must be #7
> - A receives B#8 (regular message from B)
> - A adds B#8 to B's receiver table, but doesn't deliver it (not OOB, and
> not #7)
> - A receives OOB message B#7 from B
> - The OOB thread delivers B#7 immediately
> - Infinispan blocks on B#7
> - Unless another message from B is received, B#8 will *not* get
> delivered: as you can see in the code below, the OOB thread would check
> *after* delivering B#7 if there are more messages to be delivered, but
> because it is blocked by Infinispan, it cannot deliver B#8.
>
> This is one of the rare cases where an OOB thread gets to deliver
> regular messages.
>
> The root cause is that Infinispan blocks on an OOB message; but OOB
> messages should never block! This is another reason why an Infinispan
> application thread pool makes a lot of sense !
>
>
I wonder who first added sync mode and locking in JBossCache ;)


>
>  // An OOB message is passed up immediately. Later, when remove() is
> called, we discard it. This affects ordering !
>  // http://jira.jboss.com/jira/browse/JGRP-377
>  if(msg.isFlagSet(Message.OOB) && added) {
>  try {
>  up_prot.up(evt);
>  }
>  catch(Throwable t) {
>  log.error("couldn't deliver OOB message " + msg, t);
>  }
>  }
>
>  //The OOB thread never gets here as it is blocked in
> up_prot.up() by Infinispan.
>
>  final AtomicBoolean processing=win.getProcessing();
>  if(!processing.compareAndSet(false, true))
>  return true;
>
>
>
> On 2/26/13 7:35 PM, Pedro Ruivo wrote:
> > On 02/26/2013 04:31 PM, Bela Ban wrote:
> >> On 2/26/13 5:14 PM, Pedro Ruivo wrote:
> >>> So, in this case, the regular message will block until the OOB
> >>> message is delivered.
> >>
> >> No, the regular message should get delivered as soon as the OOB message
> >> has been *received* (not *delivered*). Unless there are previous regular
> >> messages from the same sender which are delivered in the same thread,
> >> and one of them is blocked in application code...
> > In attachment is part of the log. I only know that the response is
> > disappearing between UNICAST2 and the ISPN unmarshaller.
> >
> > could you please take a look?
> >
> > the response is being sent and received and I don't understand why
> > ISPN is not receiving it
> >
> > Thanks
> > Pedro
> >>
> >>
> >>> however, the OOB message is being blocked in the application
> >>> until the regular message is delivered. And there is no way to pick the
> >>> regular message from the window list while the OOB is blocked, right?
> >>> (assuming no more incoming messages)
> >> This actually should happen, as they're delivered by different threads !
> >>
> >>
> >>> so, if everybody agrees, if I move the OOB message to another thread,
> >>> everything should work fine...
> >>>
> >>> On 02/26/2013 03:50 PM, Bela Ban wrote:
> >>>> On 2/26/13 4:15 PM, Dan Berindei wrote:
> >>>>> On Tue, Feb 26, 2013 at 12:57 PM, Pedro Ruivo <pe...@infinispan.org> wrote:
> >>>>>
> >>>>>   hi,
> >>>>>
> >>>>>   I found the blocking problem with the state transfer this
> >>>>> morning.
> >>>>>   It happens because of the reordering of a regular and OOB
> >>>>> message.
> >>>>>
> >>>>>   Below, is a simplification of what is happening for two nodes
> >>>>>
> >>>>>   A: total order broadcasts rebalance_start
> >>>>>
> >>>>>   B: (incoming thread) delivers rebalance_start
> >>>>>   B: has no segments to request so the rebalance is done
> >>>>>   B: sends async request with rebalance_confirm (unicast #x)
> >>>>>   B: sends the rebalance_start response (unicast #x+1) (the
> >>>>> response
> >>>>>   is a regular message)
> >>>>>
> >>>>>   A: receives rebalance_start response (unicast #x+1)
> >>>>>   A: in UNICAST2, it detects the message is out-of-order and
> >>>>> bloc

Re: [infinispan-dev] Message batching in JGroups

2013-02-27 Thread Dan Berindei
I'm ok with not having a receive(MessageBatch) method, I don't see any
benefit for Infinispan either.
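Just to spell out what Bela describes below (each message in a batch delivered in turn), here's a minimal sketch of what the default batch delivery loop amounts to; the names assume the JGroups 3.3 MessageBatch/Receiver API and this is not the actual JGroups code:

    void deliverBatch(MessageBatch batch, Receiver receiver) {
        for (Message msg : batch)     // MessageBatch is iterable over its messages
            receiver.receive(msg);    // same per-message callback, no batch-level hook
    }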


On Wed, Feb 27, 2013 at 6:44 PM, Bela Ban  wrote:

> I'm not sure adding receive(MessageBatch) to Receiver and to UpHandler
> is a benefit to applications. The current implementation simply calls
> receive(Message) or up(new Event(Event.MSG, msg)), so each message in a
> batch is delivered in turn.
>
> I thought - if it turns out we need this - I can always add it later,
> .e.g. in 4.0 where API breakage is allowed.
>
> Thoughts ? I know this is pretty new, so folks have probably not yet
> played with this feature...
>
> [1] https://issues.jboss.org/browse/JGRP-1581
>
> --
> Bela Ban, JGroups lead (http://www.jgroups.org)
>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] ISPN-2808 - thread pool for incoming message [feedback]

2013-02-28 Thread Dan Berindei
On Thu, Feb 28, 2013 at 2:18 PM, Mircea Markus  wrote:

>
> On 27 Feb 2013, at 19:06, Pedro Ruivo wrote:
>
> Hi all,
>
> I'm working on ISPN-2808 and I want some feedback about it (code is here
> [1])
>
> I'm starting to implement this feature but I know that Asynchronous
> Invocation API is not totally finished in JGroups.
>
> My idea is to use an executor service in CommandAwareRpcDispatcher (CARD)
> and when a request (command) is received, it checks if it is useful to move
> the command execution to another thread (in this line [2])
>
> For now, I'm thinking to move all the write commands, lock control
> command, prepare command and commit command to the executor service (Note:
> commit command is only moved when in DIST mode and L1 is enabled).
>
>
> you might want to move Commit there when we have a tx cache and cache
> store - it's during the commit where the data is written to the cache store
> and that might take time.
>
>
RollbackCommand can block as well, if it needs to be forwarded to other
nodes.


>
> first question: do you think it is fine to move the commands to the
> executor service in CARD or should I move this functionality to the
> InboundInvocationHandler?
>
> +1 for the InboundInvocationHandler: with ISPN-2849 we'll build the tx
> dependency right before invoking the interceptor chain (potentially in a
> new interceptor), so i think the closer you move it to the interceptor
> chain the better.
>
> second question: do you have in mind other commands that may block the
> OOB/Regular thread and should be moved to a thread in the executor service?
>
>
> Generally all the commands that are long-processing (lock acquisition or
> interaction with a cache store) would be better executed in this pool in
> order to avoid deadlocking the OOB/regular thread pool.
> Looking at the command hierarchy for long processing commands:
> - StateResponseCommand  seems to be a good candidate as it might acquire
> locks
> - IndexUpdateCommand/ClusterQueryCommand - I'll let Sanne comment on these
> two, which might require an update on query module as well
> - MapCombineCommand if a cache loader is present (it iterates over the
> entries in the loader)
> Dan/Adrian care to comment on the CacheTopologyControlCommand?
>
>
Yes, CacheTopologyControlCommand can definitely block.
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] ISPN-2808 - thread pool for incoming message [feedback]

2013-02-28 Thread Dan Berindei
Actually some of the commands you mentioned don't go through the
interceptor chain (CacheTopologyControlCommand, StateRequestCommand,
StateResponseCommand etc.) so you can't use an interceptor to move them to a
separate thread pool.


On Thu, Feb 28, 2013 at 5:47 PM, Mircea Markus  wrote:

>
> On 28 Feb 2013, at 15:31, Pedro Ruivo wrote:
>
> >>
> >> On 27 Feb 2013, at 19:06, Pedro Ruivo wrote:
> >>
> >>> Hi all,
> >>>
> >>> I'm working on ISPN-2808 and I want some feedback about it (code is
> >>> here [1])
> >>>
> >>> I'm starting to implement this feature but I know that Asynchronous
> >>> Invocation API is not totally finished in JGroups.
> >>>
> >>> My idea in to use an executor service in CommandAwareRpcDispatcher
> >>> (CARD) and when a request (command) is received, it checks if it is
> >>> useful to move the command execution to another thread (in this line
> [2])
> >>>
> >>> For now, I'm thinking to move all the write commands, lock control
> >>> command, prepare command and commit command to the executor service
> >>> (Note: commit command is only moved when in DIST mode and L1 is
> enabled).
> >>
> >> you might want to move Commit there when we have a tx cache and cache
> >> store - it's during the commit where the data is written to the cache
> >> store and that might take time.
> >>
> >>> first question: do you think it is fine to move the commands to the
> >>> executor service in CARD or should I move this functionally to the
> >>> InvoundHandler?
> >> +1 for the InboundInvocationHandler: with ISPN-2849 we'll build the tx
> >> dependency right before invoking the interceptor chain (potentially in a
> >> new interceptor), so i think the closer you move it to the interceptor
> >> chain the better.
> > So do you think that it is better to create a new interceptor to dispatch
> > the commands to the thread pool? (at least for the VisitableCommands).
> > And put this new interceptor after the InvocationContextInterceptor?
> we shouldn't create an interceptor yet, perhaps we'll do that with
> ISPN-2849.
> >
> > My opinion is to dispatch the command to a new thread before invoking
> > command.perform(), in order to avoid having to move ThreadLocal variables
> > set by the perform() method.
> +1
>
> Cheers,
> --
> Mircea Markus
> Infinispan lead (www.infinispan.org)
>
>
>
>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] [infinispan-internal] Unstable Cluster

2013-03-04 Thread Dan Berindei
On Mon, Mar 4, 2013 at 10:28 AM, Bela Ban  wrote:

> Another note: in general, would it make sense to use shorter names ?
> E.g. instead of
>
> ** New view: [jdg-perf-01-60164|9] [jdg-perf-01-60164,
> | jdg-perf-01-24167, jdg-perf-01-53841, jdg-perf-01-39558,
> | jdg-perf-01-8977, jdg-perf-01-49115, jdg-perf-01-24774,
> | jdg-perf-01-5758, jdg-perf-01-37137, jdg-perf-01-45330,
> | jdg-perf-01-24793, jdg-perf-01-35602, jdg-perf-02-7751,
> | jdg-perf-02-37056, jdg-perf-02-50381, jdg-perf-02-53449,
> | jdg-perf-02-64954, jdg-perf-02-34066, jdg-perf-02-61515,
> | jdg-perf-02-65045 ...]
>
>
> we could have
> ** New view: [1|9] [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
> 16, 17, 18, 19, 20, ...]
>
> This makes reading logs *much* easier than having those long names.
>
>
Yes and no... I sometimes find it useful to have a somewhat longer name, as
searching/filtering for a node in the log is pretty much impossible with a
name like "1".

I also think we need the random number at the end so that we can debug
problems with node restarts. In JDG/AS7 they don't add a random number, and
it was very difficult to see what was happening when a node name appeared
twice in the consistent hash. But we could make the random number shorter.



> If we wanted the host name to be part of a cluster name, we could use
> the alphabet, e.g. A=jdg-perf-01, B=jdg-perf-02:
>
> ** New view: [A1|9][A1, A2, A3, B4, B6, C2, C3, ...]
>
> This is of course tied to a given host naming scheme. But oftentimes,
> host names include numbers, so perhaps we could use a regexp to extract
> that number and use it as a prefix to the name, e.g.
> cluster-01 first instance: 1-1
> cluster-02 2nd instance: 1-2
> etc.
>
> Thoughts ?
>
>
Are you thinking of an automatic way of assigning a letter+digit
combination to a node on startup? We also use the node name for some other
stuff (e.g. thread names), so I'm not sure if it's feasible to wait until
we have connected to the JGroups cluster to set the node name dynamically.

For RadarGun we could use a static system where we configure a node name
for each slave and then RadarGun passes the node name to Infinispan via a
system property. No Infinispan changes required (except perhaps making the
random number in the node name optional.)
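A minimal sketch of that static scheme, assuming a hypothetical "radargun.node.name" system property set by the RadarGun slave (transport().nodeName() is the setter Infinispan already exposes):

    GlobalConfiguration globalCfg = new GlobalConfigurationBuilder()
          .transport().nodeName(System.getProperty("radargun.node.name"))
          .build();
    EmbeddedCacheManager cacheManager = new DefaultCacheManager(globalCfg);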


Cheers
Dan



>
> On 3/4/13 8:43 AM, Radim Vansa wrote:
> > Just a small sidenote: if you want to print full view (not just first 20
> nodes and ellipsis after that), use -Dmax.list.print_size=cluster_size
> >
> > Radim
> >
> > - Original Message -
> > | From: "Shane Johnson" 
> > | To: infinispan-inter...@redhat.com
> > | Sent: Friday, March 1, 2013 5:52:17 PM
> > | Subject: Re: [infinispan-internal] Unstable Cluster
> > |
> > | The JGroups cluster appeared stable, at first. However, I did notice
> > | that the logs looked a little bit different on one machine /
> > | instance. I'm not sure if that means anything or not.
> > |
> > | Machine 1 / Instance 1-12
> > |
> > | ** New view: [jdg-perf-01-60164|9] [jdg-perf-01-60164,
> > | jdg-perf-01-24167, jdg-perf-01-53841, jdg-perf-01-39558,
> > | jdg-perf-01-8977, jdg-perf-01-49115, jdg-perf-01-24774,
> > | jdg-perf-01-5758, jdg-perf-01-37137, jdg-perf-01-45330,
> > | jdg-perf-01-24793, jdg-perf-01-35602, jdg-perf-02-7751,
> > | jdg-perf-02-37056, jdg-perf-02-50381, jdg-perf-02-53449,
> > | jdg-perf-02-64954, jdg-perf-02-34066, jdg-perf-02-61515,
> > | jdg-perf-02-65045 ...]
>
>
> --
> Bela Ban, JGroups lead (http://www.jgroups.org)
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Some missing tests during testsuite run

2013-03-06 Thread Dan Berindei
I did a search a long time ago for wrong test names:
https://issues.jboss.org/browse/ISPN-2534

It seems I missed JdbcMixedCacheStoreTest2 / JdbcMixedCacheStoreVamTest2 in
the patch for some reason, although I did mention them in the JIRA
description :(

Searching again, I see some more tests that need renaming:

./core/src/test/java/org/infinispan/profiling/ProfileTestSlave.java
./core/src/test/java/org/infinispan/xsite/TestBackupForNotSpecified.java
./query/src/test/java/org/infinispan/query/config/LegacyConfigurationAdaptorTests.java
./tools/src/test/java/org/infinispan/test/fwk/TestNameVerifier.java

Could you create a JIRA + PR for them as well?

BTW, it may be a good idea to remove the auto-correcting functionality of
TestNameVerifier, so we can enable it all the time, and instead add the
ability to check the file names against the Surefire pattern.
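For illustration, a rough sketch (not the actual TestNameVerifier code) of checking class file names against the default Surefire includes mentioned below (Test*.java, *Test.java, *TestCase.java):

    // needs java.util.regex.Pattern
    private static final Pattern SUREFIRE_DEFAULTS =
          Pattern.compile("(Test.*|.*Test|.*TestCase)\\.java");

    static boolean matchesSurefireDefaults(String fileName) {
       // e.g. JdbcMixedCacheStoreTest2.java does not match and would be skipped
       return SUREFIRE_DEFAULTS.matcher(fileName).matches();
    }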

Cheers
Dan



On Tue, Mar 5, 2013 at 6:34 PM, Anna Manukyan  wrote:

> Hi all,
>
> during ER12 testing I've found out that there are some tests which were
> not included in the ISPN testsuite run.
>
> This issue appeared on our JDG-related jobs, and I've also checked the
> Cloudbees runs for the community version: the same situation
> is there.
>
> If the testsute run Maven command includes the -Dtest=org/infinispan/**...
> parameter (for corresponding module), then these tests are included in the
> run.
>
> The thing was that there were some failing tests, which we didn't see
> during our previous test runs.
>
> I've found out that the issue is that some of the test classes don't
> follow the naming convention for Maven (Test*.java || *Test.java ||
> *TestCase.java). Example tests are: JdbcMixedCacheStoreTest2 &
> JdbcMixedCacheStoreVamTest2 classes.
>
> So I've renamed and fixed the tests mentioned above, but I will need to
> find all tests which are under the mentioned category and rename them so
> that all existing tests run properly (there are not that many).
>
> Best regards,
> Anna.
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] [infinispan-internal] Unstable Cluster

2013-03-06 Thread Dan Berindei
On Tue, Mar 5, 2013 at 6:04 PM, Bela Ban  wrote:

>
>
> On 3/5/13 3:30 PM, Erik Salter wrote:
> > Hi guys,
> >
> > Keep in mind that some of your customers may have built queries and
> indexes
> > on cluster names on top of very expensive analytics engines.
>
>
> Well, if they use their own naming (JChannel.setName()), no problem.
>
>
Bela, you mean GlobalConfiguration.transport().nodeName(), right? :)



> But I've said (many times) that relying on node names is a *bad thing* !
> Node names are syntactic sugar, and there may not be a name associated
> with a node, and then it has to be fetched dynamically, using an ARP
> like protocol.
>
>
So if a thread logs the name of a node, it may trigger an "RPC" and
block until it gets a response?
That doesn't sound right...


> If someone wanted to add information to an address, then the way to do
> it would be to use an AddressGenerator and return subclasses of UUID,
> e.g. PayloadUUID.
>
>
I think Erik just wants the logs to contain the actual host names, so that
they match with the logs from other sources. Using a PayloadUUID and
sending the host name with every message would be overkill for that.


> > If this discussion is limited in scope to internal applications, that's
> > fine.  If not, having done debugging of issues on live customer sites, I
> > think it's fine the way it is.
> >
>

Erik, the discussion is by no means limited to internal applications.
However...

1. Replacing the random number with a counter would require the user to
configure a persistent location for the counter, so even if we implement it
we have to make it opt-in.

2. The discussion about the host name part only applies if the user doesn't
specify a node name in the Infinispan configuration.



>  > Erik
> >
> > -Original Message-
> > From: infinispan-dev-boun...@lists.jboss.org
> > [mailto:infinispan-dev-boun...@lists.jboss.org] On Behalf Of Bela Ban
> > Sent: Tuesday, March 05, 2013 1:50 AM
> > To: infinispan-dev@lists.jboss.org
> > Subject: Re: [infinispan-dev] [infinispan-internal] Unstable Cluster
> >
> >
> >
> > On 3/4/13 6:35 PM, Dan Berindei wrote:
> >>
> >> On Mon, Mar 4, 2013 at 10:28 AM, Bela Ban <b...@redhat.com> wrote:
> >>
> >>  Another node: in general, would it make sense to use shorter names
> ?
> >>  E.g. instead of
> >>
> >>  ** New view: [jdg-perf-01-60164|9] [jdg-perf-01-60164,
> >>  | jdg-perf-01-24167, jdg-perf-01-53841, jdg-perf-01-39558,
> >>  | jdg-perf-01-8977, jdg-perf-01-49115, jdg-perf-01-24774,
> >>  | jdg-perf-01-5758, jdg-perf-01-37137, jdg-perf-01-45330,
> >>  | jdg-perf-01-24793, jdg-perf-01-35602, jdg-perf-02-7751,
> >>  | jdg-perf-02-37056, jdg-perf-02-50381, jdg-perf-02-53449,
> >>  | jdg-perf-02-64954, jdg-perf-02-34066, jdg-perf-02-61515,
> >>  | jdg-perf-02-65045 ...]
> >>
> >>
> >>  we could have
> >>  ** New view: [1|9] [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
> 15,
> >>  16, 17, 18, 19, 20, ...]
> >>
> >>  This makes reading logs *much* easier than having those long names.
> >>
> >>
> >> Yes and no... I sometimes find it useful to have a somehow longer
> >> name, as searching/filtering for a node in the log is pretty much
> >> impossible with a name like "1".
> >
> >
> > Yes, however if you have context it's not an issue, e.g. in JGroups I
> > oftentimes prefix my messages with name:. So, for example, if you're
> looking
> > for unicast traffic received by 5, this would work "5: <--" as grep
> > argument.
> >
> > I agree this isn't useful when you want to follow the address of a member
> > throughout the log, and all protocols.
> >
> >> I also think we need the random number at the end so that we can debug
> >> problems with node restarts. In JDG/AS7 they don't add a random
> >> number, and it was very difficult to see what was happening when a
> >> node name appeared twice in the consistent hash. But we could make the
> >> random number shorter.
> >
> >
> > Would maintaining a base name (A) and then incrementing a short help ?
> > E.g. A1, when restarted A2 ? The problem is that we'd have to store the
> > number on disk...
> >
> >
> >>  If we wanted the host name to be part of a cluster name, we could
> use
> >>  the alphabet, e.g. A=jdk-perf-01, B=jdg-p

Re: [infinispan-dev] old workaround in CacheImpl

2013-03-12 Thread Dan Berindei
I think the field was needed because InboundInvocationHandlerImpl was using
ComponentRegistry.getComponent(ResponseGenerator.class), and there wasn't
anyone actually creating the ResponseGenerator component.

Since https://issues.jboss.org/browse/ISPN-1793, ComponentRegistry creates
the ResponseGenerator component explicitly, so the field in CacheImpl is no
longer needed.



On Tue, Mar 12, 2013 at 1:53 PM, Mircea Markus  wrote:

> git annotate points to Mr. Surtani :-)
>
> On 12 Mar 2013, at 11:43, Adrian Nistor wrote:
>
> > And this is how it looked in 5.1.x
> >
> https://github.com/anistor/infinispan/blob/5.1.x/core/src/main/java/org/infinispan/CacheImpl.java#L139
> >
> > On 03/12/2013 01:40 PM, Adrian Nistor wrote:
> >> Hi,
> >>
> >> does anyone know what issue is the unused (but injected)
> >> CacheImpl.responseGenerator field supposed to cure? See here:
> >>
> https://github.com/anistor/infinispan/blob/master/core/src/main/java/org/infinispan/CacheImpl.java#L139
> >>
> >>
> >> The accompanying comment does not seem to be valid anymore. There is no
> >> jira for it and the tests run fine without it. Can't we just remove it?
> >>
> >> Cheers
> >> ___
> >> infinispan-dev mailing list
> >> infinispan-dev@lists.jboss.org
> >> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> >
> > ___
> > infinispan-dev mailing list
> > infinispan-dev@lists.jboss.org
> > https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
> Cheers,
> --
> Mircea Markus
> Infinispan lead (www.infinispan.org)
>
>
>
>
>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Conditional cache operations useless with Optimistic locking?

2013-03-12 Thread Dan Berindei
On Tue, Mar 12, 2013 at 1:32 PM, Galder Zamarreño  wrote:

>
> On Mar 8, 2013, at 1:50 PM, Sanne Grinovero  wrote:
>
> > Hi Galder,
> > I think using conditional operations is very useful even with
> > optimistic locking: the single conditional operation might not make
> > sense, but a transaction might include more operations and some of
> > these operations might depend on the result of the conditional
> > operation.
> >
> > I'd expect the conditional operation to only return the value based on
> > current state (a prediction), and the transaction would fail if this
> > value is no longer valid at commit time. So no locks need to be taken
> > during the evaluation.
>
> ^ That's indeed what's happening, but as you can see, it confuses users
> (Jim, are you there?)…
>
> And I can see why they get confused. The conditional operations, such as
> replace, are rooted in enabling CAS-like operation, and an attempt to
> replace a value in a collection without having to synchronize or use locks…
>
> You can do what you say above with put/get operations. It will create more
> boiler plate code for sure, but the expectations of what the code does
> might be easier to understand for users.
>
> So, on one side, conditional operations can help reduce the code-size (by
> doing multiple operations in one go), but users expect them to behave in a
> way which does not really happen when you use transactions + OL.
>
> I'm in two minds on this…
>
>
I remember having a discussion about conditional operations with optimistic
locking a long time ago (Edinburgh?) and concluding that they should be
allowed to "lie" - pretend that they did the put/replace/remove and check
again on prepare if the condition still holds (with write skew check
enabled).

I think the other option on the table was treat conditional operations as
"non-transactional" - if putIfAbsent(k, v) returned true, then v would be
stored in the cache even if the current transaction was rolled back.

I don't remember if we considered acquiring locks for conditional
operations even with optimistic locking at the time or not. It would be a
bit of a hack, but it would make a lot more sense than the current
behaviour with the default configuration (optimistic locking with write
skew check disabled, which makes conditional operations pretty much
useless).

Off-topic: is it just me, or is it a little odd that the locking mode is
configured via TransactionConfigurationBuilder and the isolation
level/write skew check are configured via LockingConfigurationBuilder?
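For reference, a sketch of that split with the 5.x fluent builder (treat the exact method names as my recollection, not gospel): lockingMode() hangs off transaction(), while isolationLevel() and writeSkewCheck() hang off locking():

    Configuration cfg = new ConfigurationBuilder()
          .transaction()
             .transactionMode(TransactionMode.TRANSACTIONAL)
             .lockingMode(LockingMode.OPTIMISTIC)
          .locking()
             .isolationLevel(IsolationLevel.REPEATABLE_READ)
             .writeSkewCheck(true)   // also needs versioning enabled to actually do anything
          .build();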

Cheers
Dan


Cheers,
>
> >
> > Sanne
> >
> > On 6 March 2013 14:45, Galder Zamarreño  wrote:
> >> Hi,
> >>
> >> Re: https://issues.jboss.org/browse/ISPN-2891
> >>
> >> Not sure what previous Infinispan version the AS instance Immutant guys
> used (5.1?), but seems like they're having more issues with the clojure
> test linked in the JIRA.
> >>
> >> Again, we're talking about issues with replace(), but in a single node
> environment. They seem to have issues with PL and OL, but let's focus on OL
> for which there are logs attached.
> >>
> >> Do conditional operations make any sense at all with OL? For example,
> can the return of replace() be taken a truthful in a transaction with OL?
> >>
> >> As shown in the bad.log, this is not possible because lock acquisition
> and write skew checks are only done when the transaction is committed. So,
> what's the point of the conditional operations with OL? Their returns
> provide no guarantees whatsoever.
> >>
> >> If this is known thing, I'm not aware of any docu on the topic.
> >>
> >> Thoughts?
> >>
> >> Cheers,
> >> --
> >> Galder Zamarreño
> >> gal...@redhat.com
> >> twitter.com/galderz
> >>
> >> Project Lead, Escalante
> >> http://escalante.io
> >>
> >> Engineer, Infinispan
> >> http://infinispan.org
> >>
> >>
> >> ___
> >> infinispan-dev mailing list
> >> infinispan-dev@lists.jboss.org
> >> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> >
> > ___
> > infinispan-dev mailing list
> > infinispan-dev@lists.jboss.org
> > https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
>
> --
> Galder Zamarreño
> gal...@redhat.com
> twitter.com/galderz
>
> Project Lead, Escalante
> http://escalante.io
>
> Engineer, Infinispan
> http://infinispan.org
>
>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Infinispan Server repository

2013-03-12 Thread Dan Berindei
Cool! When are we moving the server modules to the infinispan-server
project? ;)


On Tue, Mar 12, 2013 at 10:10 AM, Tristan Tarrant wrote:

> Hi devs,
>
> as we agreed a while ago, I have renamed the infinispan/jdg repository
> on GitHub to infinispan/infinispan-server.
> I will also issue pull requests soon to remove any references to jdg
> from the code/docs/examples.
>
> Please update your local remotes to point to:
>
> g...@github.com:infinispan/infinispan-server.git
>
> Tristan
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] DefaultExecutorFactory and rejection policy

2013-03-14 Thread Dan Berindei
On Thu, Mar 14, 2013 at 9:32 AM, Radim Vansa  wrote:

> Blocking OOB threads is the thing we want to avoid, remember?
>

Well, you have to block somewhere...

I like Adrian's solution, because it's a lot better than CallerRunsPolicy:
it's blocking the OOB thread until any other command finishes executing,
not until one particular command finishes executing.
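Something along these lines, I mean (a sketch, not Hibernate Search's actual BlockPolicy): a RejectedExecutionHandler that parks the caller on the pool's queue instead of aborting or running the task itself (plain java.util.concurrent, imports omitted):

    public class BlockingRejectionPolicy implements RejectedExecutionHandler {
       @Override
       public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
          try {
             executor.getQueue().put(r);   // blocks the caller until a queue slot frees up
          } catch (InterruptedException e) {
             Thread.currentThread().interrupt();
             throw new RejectedExecutionException("Interrupted while waiting to queue task", e);
          }
       }
    }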


I am used to synchronous replication, where we could hold a semaphore on
> requests to each node. Going down before each RPC and up after it would
> be pretty simple, and having this mechanism in the RPC layer would block only
> requests, not response messages.


I don't think you can throttle on the sender, because you don't know how
many threads the recipient should allocate per sender in the Infinispan
thread pool.

E.g. in non-tx concurrent mode, for each user-initiated request, the
primary owner sends out numOwners requests to the other nodes. If the cache
is also replicated, you have the coordinator sending clusterSize requests
to the other nodes for each user-initiated request. (So for each request
that a node receives from the coordinator, it will receive exactly 0
requests from the other nodes.)

If you start considering that each cache should have its own semaphore,
because they need different numbers of threads, then finding an appropriate
initial value is pretty much impossible.


Are there any shoot & forget messages sent from Infinispan which could not
> be applied in this way?
> Regarding the async replication, I am not sure whether the request is just
> processed in a non-caller thread or if there is anything more.
>
>
Async messages require ordering most of the time, so they're handled on the
Incoming thread pool directly and they're not passed to the Infinispan
thread pool (at least for now).



> Radim
>
> - Original Message -
> | From: "Adrian Nistor" 
> | To: "infinispan -Dev List" 
> | Sent: Wednesday, March 13, 2013 3:37:23 PM
> | Subject: Re: [infinispan-dev] DefaultExecutorFactory and rejection policy
> |
> | What about creating a RejectedExecutionHandler that blocks the
> | calling
> | thread if the thread pool has reached the max (and all threads are
> | busy)
> | and the associated queue is also full? See
> | org.hibernate.search.batchindexing.impl.Executors.BlockPolicy
> |
> | Some time ago I was thinking we need this for infinispan's thread
> | pools
> | too because of some very insidious test suite failures caused by
> | dropped
> | tasks and created ISPN-2438 but never got to experiment with this.
> | The
> | solution used for the moment was to increase the pool a bit for unit
> | tests...
> |
> | Cheers
> |
> | On 03/13/2013 04:18 PM, Pedro Ruivo wrote:
> | > By default, the thread pools are bounded and we have nothing to
> | > prevent
> | > the it to be full.
> | >
> | > currently, the regular messages are not processed by the thread
> | > pool.
> | > this can be solved when a smarter dispatcher/ordering mechanism is
> | > implemented.
> | >
> | > On 03/13/2013 02:07 PM, Bela Ban wrote:
> | >> You mean an unbounded queue ? This would be bad if the insertion
> | >> rate of
> | >> messages is higher than the processing rate.
> | >>
> | >> The problem with the Infinispan pool is that we cannot simply
> | >> discard
> | >> messages; they won't get retransmitted, as they're been delivered
> | >> by
> | >> JGroups.
> | >>
> | >> If we end up with our own ordering of messages in the ISPN pool,
> | >> then a
> | >> rejection policy of "run" would also cause harm, as it would
> | >> destroy the
> | >> ordering (except for OOB messages)...
> | >>
> | >> On 3/13/13 2:52 PM, Erik Salter wrote:
> | >>> Any reason the messages can't be queued?   In my mind, the
> | >>> benefits would
> | >>> outweigh the unbounded nature of it.
> | >>>
> | >>> Erik
> | >>>
> | >>> -Original Message-
> | >>> From: infinispan-dev-boun...@lists.jboss.org
> | >>> [mailto:infinispan-dev-boun...@lists.jboss.org] On Behalf Of
> | >>> Pedro Ruivo
> | >>> Sent: Wednesday, March 13, 2013 9:29 AM
> | >>> To: ispn-dev
> | >>> Subject: [infinispan-dev] DefaultExecutorFactory and rejection
> | >>> policy
> | >>>
> | >>> Hi
> | >>>
> | >>> I'm working on ISPN-2808
> | >>> (https://issues.jboss.org/browse/ISPN-2808) and I
> | >>> noticed that the DefaultExecutorFactory is creating the executor
> | >>> service
> | >>> with an AbortPolicy.
> | >>>
> | >>> Is there any particular reason for that?
> | >>>
> | >>> In the new thread pool needed by ISPN-2808, I cannot have the
> | >>> messages (i.e.
> | >>> the runnables) discarded, because it can cause some inconsistent
> | >>> state and
> | >>> even block all the cluster.
> | >>>
> | >>> I have set in my branch a CallerRunsPolicy. If you see any issue
> | >>> with this
> | >>> let me know.
> | >>>
> | >>> Cheers,
> | >>> Pedro
> | >>
> | > ___
> | > infinispan-dev mailing list
> | > infinispan-dev@lists.jboss.org
> | > https://lists.jboss.org/mailman/listinfo/infini

Re: [infinispan-dev] CacheLoaders, Distribution mode and Interceptors

2013-03-19 Thread Dan Berindei
Hi Sanne

On Tue, Mar 19, 2013 at 4:12 PM, Sanne Grinovero wrote:

> Mircea,
> what I was most looking forward to was your comment on the interceptor
> order generated for DIST+cachestores
>  - we don't think the ClusteredCacheLoader should be needed at all
>

Agree, ClusteredCacheLoader should not be necessary.

James, if you're still seeing problems with numOwners=1, could you create
an issue in JIRA?



>  - each DIST node is loading from the CacheLoader (any) rather than
> loading from its peer nodes for non-owned entries (!!)
>
>
Sometimes loading stuff from a local disk is faster than going remote, e.g.
if you have numOwners=2 and both owners have to load the same entry from
disk and send it to the originator twice.

Still, most of the time the entry is going to be in memory on the owner
nodes, so the local load is slower (especially with a shared cache store,
where loading is over the network as well).



> This has come up on several threads now and I think it's critically
> wrong, as I commented previously this also introduces many
> inconsistencies - as far as I understand it.
>
>
Is there a JIRA for this already?

Yes, loading a stale entry from the local cache store is definitely not a
good thing, but we actually delete the non-owned entries after the initial
state transfer. There may be some consistency issues if one uses a
DIST_SYNC cache with a shared async cache store, but fully sync
configurations should be fine.

OTOH, if the cache store is not shared, the chances of finding the entry in
the local store on a non-owner are slim to none, so it doesn't make sense
to do the lookup.

Implementation-wise, just changing the interceptor order is probably not
enough. If the key doesn't exist in the cache, the CacheLoaderInterceptor
will still try to load it from the cache store after the remote lookup, so
we'll need a marker  in the invocation context to avoid the extra cache
store load. Actually, since this is just a performance issue, it could wait
until we implement tombstones everywhere.



> BTW your gist wouldn't work, the metadata cache needs to load certain
> elements too. But nice you spotted the need to potentially filter what
> "preload" means in the scope of each cache, as the metadata one should
> only preload metadata, while in the original configuration this data
> would indeed be duplicated.
> Opened: https://issues.jboss.org/browse/ISPN-2938
>
> Sanne
>
> On 19 March 2013 11:51, Mircea Markus  wrote:
> >
> > On 16 Mar 2013, at 01:19, Sanne Grinovero wrote:
> >
> >> Hi Adrian,
> >> let's forget about Lucene details and focus on DIST.
> >> With numOwners=1 and having two nodes the entries should be stored
> >> roughly 50% on each node, I see nothing wrong with that
> >> considering you don't need data failover in a read-only use case
> >> having all the index available in the shared CacheLoader.
> >>
> >> In such a scenario, and having both nodes preloaded all data, in case
> >> of a get() operation I would expect
> >> either:
> >> A) to be the owner, hence retrieve the value from local in-JVM reference
> >> B) to not be the owner, so to forward the request to the other node
> >> having roughly 50% chance per key to be in case A or B.
> >>
> >> But when hitting case B) it seems that instead of loading from the
> >> other node, it hits the CacheLoader to fetch the value.
> >>
> >> I already had asked James to verify with 4 nodes and numOwners=2, the
> >> result is the same so I suggested him to ask here;
> >> BTW I think numOwners=1 is perfectly valid and should work as with
> >> numOwners=1, the only reason I asked him to repeat
> >> the test is that we don't have much tests on the numOwners=1 case and
> >> I was assuming there might be some (wrong) assumptions
> >> affecting this.
> >>
> >> Note that this is not "just" a critical performance problem but I'm
> >> also suspecting it could provide inconsistent reads, in two classes of
> >> problems:
> >>
> >> # non-shared CacheStore with stale entries
> >> If for non-owned keys it will hit the local CacheStore first, where
> >> you might expect to not find anything, so to forward the request to
> >> the right node. What if this node has been the owner in the past? It
> >> might have an old entry locally stored, which would be returned
> >> instead of the correct value which is owned on a different node.
> >>
> >> # shared CacheStore using write-behind
> >> When using an async CacheStore by definition the content of the
> >> CacheStore is not trustworthy if you don't check on the owner first
> >> for entries in memory.
> >>
> >> Both seem critical to me, but the performance impact is really bad too.
> >>
> >> I hoped to make some more tests myself but couldn't look at this yet,
> >> any help from the core team would be appreciated.
> > I think you have a fair point and reads/writes to the data should be
> coordinated through its owners both for performance and (more importantly)
> correctness.
> > Mind creating a JIRA for this?
> >
> >>

Re: [infinispan-dev] [infinispan-internal] async processing documentation (+ nice inconsistency scenario example)

2013-03-19 Thread Dan Berindei
On Tue, Mar 19, 2013 at 5:17 PM, Manik Surtani  wrote:

>
> On 19 Mar 2013, at 15:07, Sanne Grinovero  wrote:
>
> >
> >
> > - Original Message -
> >>
> >> On 19 Mar 2013, at 12:21, Mircea Markus  wrote:
> >>
> >>> On 19 Mar 2013, at 11:05, Sanne Grinovero wrote:
>  Does Marshalling really need to be performed in a separate thread
>  pool?
>  I think we have too many pools, too much context switching, and
>  situations like this one which should be avoided.
> 
>  We could document it  but all these details are making it very
>  hard to feel comfortable with, and for this specific use case I
>  wonder if there
>  is a strong benefit: plain serial operations seem so much cleaner
>  to me.
> >>> +1 for dropping it in 6.0. It isn't enabled by default and AFAIK it
> >>> created more confusion through the users than benefits.
> >>
> >> Why?  I don't agree.  If network transfer is the most expensive part
> >> of performing a write, then marshalling is the second-most
> >> expensive.  If you don't take the marshalling offline as well,
> >> you're only realising a part of the performance gains of using
> >> async.
> >
> > Of course. I didn't mean to put it on the thread of the invoker, I would
> expect
> > this to happen "behind the scenes" when using async, but in the same
> thread which
> > is managing the lower IO so to reduce both context switching and these
> weird
> > race conditions.. so removing the option only.
>
> Well, when using the same lower IO pool, while common sense, isn't as easy
> since it is a JGroups pool.  If we pass the marshaller itself into JGroups,
> the marshalling still happens online, and just the IO happening in a
> separate thread.  Also, JGroups allows you to register one marshaller and
> unmarshaller per channel - which doesn't work when you have a transport
> shared by multiple cache instances potentially on different class loaders.
>
> So yes, this can be done much better, but that means a fair few changes in
> JGroups such that:
>
> * Marshalling happens in the async thread (the same one that puts the
> message on the wire) rather than in the caller's thread
> * sendMessage() should accept a marshaller and unmarshaller per invocation
>
> Then we can drop this additional thread pool.
>
>
The upper-most protocol in the default stack is FRAG2, and it already needs
the serialized payload - it can't split an Object in 2 messages. Most other
protocols need at least the message size. So there's no way our payload is
going to get serialized only in the TP thread that actually puts the bytes
on the wire.

In fact, I would go the other way around. Because we have multiple
marshallers, I think it would be cleaner if we used MessageDispatcher
directly and did the request/response serialization in Infinispan.

I wouldn't recommend async marshalling anyway. The user must be very
careful not to modify the value object at any time after calling
cache.put(key, value), so to me using async marshalling is just asking for
trouble.

There are a couple places where I think we could save an async transport
thread, but I don't think either would make a perceptible change in
performance:
* Waiting for a response from the recipients is much slower than sending
the message. If RpcManagerImpl.invokeRemotelyInFuture just sent the message
and returned the JGroups Request object, I don't think we'd need the thread
pool there.
* We could also detect if the user invoked cache.putAsync and avoid using
an extra async transport thread when the cache is async.



> >
> >>
> >>> On top of that the number of pools is growing (5.3 adds another
> >>> pool in the scope of ISPN-2808).
> >>
> >> You can configure to use a single thread pool for all these tasks, if
> >> hanging on to multiple thread pools is too complex.
> >
> > I don't believe you can always do that, if you don't keep tasks isolated
> > in different pools deadlocks could happen. So unless you can come up with
> > a nice diagram and explain which ones are safe to share, it is very
> > complex to handle.
> >
>

If queueing is disabled and the caller runs tasks when the thread pool is
full, dependencies are not a problem. If queueing is enabled... yes,
dependencies are a big problem.

But I'm pretty sure you can have dependency cycles with 2 thread pools as
well, if both have queueing enabled.



> > Would be nice to have these discussions on the public mailing list.
>
> +1.  Adding infinispan-dev in cc.
>
> >
> > Sanne
> >
> >>
> >> - M
> >>
> >> --
> >> Manik Surtani
> >> ma...@jboss.org
> >> twitter.com/maniksurtani
> >>
> >> Platform Architect, JBoss Data Grid
> >> http://red.ht/data-grid
> >>
> >>
> >>
> >
>
> --
> Manik Surtani
> ma...@jboss.org
> twitter.com/maniksurtani
>
> Platform Architect, JBoss Data Grid
> http://red.ht/data-grid
>
>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-de

Re: [infinispan-dev] CacheLoaders, Distribution mode and Interceptors

2013-03-19 Thread Dan Berindei
>

> > Implementation-wise, just changing the interceptor order is probably not
> enough. If the key doesn't exist in the cache, the CacheLoaderInterceptor
> will still try to load it from the cache store after the remote lookup, so
> we'll need a marker  in the invocation context to avoid the extra cache
> store load.
> if the key doesn't map to the local node it should trigger a remote get to
> owners (or allow the dist interceptor to do just that)
> > Actually, since this is just a performance issue, it could wait until we
> implement tombstones everywhere.
> Hmm, not sure I see the correlation between this and tombstones?
>
>
If the key doesn't exist in the cache at all, on any node, then the remote
lookup will return null and the CacheLoaderInterceptor will try to load it
from the local cache store again (assuming we move CacheLoaderInterceptor
after DistributionInterceptor). If DistributionInterceptor put a tombstone
in the invocation context for that key, CacheLoaderInterceptor could avoid
that extra cache store lookup.
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Remote command smarter dispatcher (merge ISPN-2808 and ISPN-2849)

2013-03-19 Thread Dan Berindei
On Mon, Mar 18, 2013 at 6:09 PM, Pedro Ruivo  wrote:

> Hi all,
>
> To solve ISPN-2808 (avoid blocking JGroups threads in order to allow to
> deliver the request responses), I've created another thread pool to move
> the possible blocking commands (i.e. the commands that may block until
> some state is achieved).
>
> Problem description:
>
> With this solution, the new thread pool should be large in order to be
> able to handle the remote commands without deadlocks. The problem is
> that all the threads can be blocked, leaving none free to process the command
> that may unblock other commands.
>
> Example: a bunch of commands are blocked waiting for a new topology ID
> and the command that will increment the topology ID is in the thread
> pool queue.
>
> Solution:
>
> Use a smart command dispatcher, i.e., keep the command in the queue
> until we are sure that it will not wait for other commands. I've already
> implemented some kind of executor service (ConditionalExecutorService,
> in ISPN-2635 and ISPN-2636 branches, Total Order stuff) that only puts
> the Runnable (more precisely a new interface called ConditionalRunnable)
> in the thread pool when it is ready to be processed. Creative guys, it
> may need a better name :)
>
> The ConditionalRunnable has a new method (boolean isReady()) that should
> return true when the runnable should not block.
>
> Example how to apply this to ISPN-2808:
>
> Most of the commands wait for a particular topology ID and/or for lock
> acquisition.


Well, the original problem description was about the
DistributionInterceptor forwarding the command from the primary owner to
the backup owners and waiting for a response from them :)
The forwarding done by StateTransferInterceptor is also synchronous and can
block.

It's true that you can't check how long a remote call will take
beforehand...


> In this way, the isReady() implementation can be something
> like:
>
> isReady()
>   return commandTopologyId <= currentTopologyId && (for all keys; do if
> !lock(key).tryLock(); return false; done)
>
>
Shouldn't you release the locks you managed to lock already if one of the
lock acquisitions failed?
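(For concreteness, the pattern I mean is the usual "acquire all or back out" loop, sketched here with plain j.u.c. locks; the real thing would go through Infinispan's LockManager, and the per-key Lock map is just an assumption for the example:)

    boolean tryLockAll(Map<Object, Lock> locks, Collection<Object> keys) {
       List<Lock> acquired = new ArrayList<Lock>();
       for (Object key : keys) {
          Lock lock = locks.get(key);
          if (lock == null || !lock.tryLock()) {
             for (Lock held : acquired)   // back out: release everything acquired so far
                held.unlock();
             return false;
          }
          acquired.add(lock);
       }
       return true;
    }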

Actually you may want to release the locks even if lock acquisition did
succeed... In non-tx mode, the locks are owned by the current thread, so
you can't lock a key on one thread and unlock it on another (though you
could skip the check completely in non-tx mode). And in transactional mode,
you could have a deadlock because 2 txs lock the same keys in a different
order.

Which leads me to a different point, how would you handle deadlocks? With
pessimistic mode, if tx1 holds lock k1 and wants to acquire k2, but tx2
holds k2 and wants to acquire k1, will the LockCommands tx1:k2 and tx2:k1
ever be scheduled? In general, can we make the time that a
Lock/PrepareCommand spends in the ConditionalExecutorService queue count
against lockAcquisitionTimeout?

With this, I believe we can keep the number of thread low and avoid the
> thread deadlocks.
>
> Now, I have two possible implementations:
>
> 1) put a reference for StateTransferManager and/or LockManager in the
> commands, and invoke the methods directly (a little dirty)
>
> 2) added new method in the CommandInterceptor like: boolean
> preProcess(Command, InvocationContext). each interceptor will
> check if the command will block on it (returning false) or not (invoke
> the next interceptor). For example, the StateTransferInterceptor returns
> immediately false if the commandToplogyId is higher than the
> currentTopologyId and the *LockingIntercerptor will return false if it
> cannot acquire some lock.
>
> Any other suggestions? If I was not clear let me know.
>

TBH I would only check for the topology id before scheduling the commands
to the Infinispan thread pool, because checking the locks is very
complicated with all the configuration options that we have.

This is how I was imagining solving the lock problem: The OOB threads would
execute the commands that do not acquire locks (CommitCommand,
TxCompletionNotificationCommand) directly and submit the others to the ISPN
thread pool; if the command's topology id was higher than the current
topology id, the OOB thread would just stick it in a queue. A separate
thread would read only the commands with the current topology id from the
queue, and process them just like an OOB thread.

However... the CommitCommands may very well have a *lower* topology id by
the time they are executed, so the OOB/executor thread may well block while
waiting for the results of the forwarding RPC. Maybe we could work around
it by having StateTransferInterceptor submit a task with the command
forwarding only to the thread pool, but again it gets quite complicated. So
for the first phase I recommend handling only the topology id problem.
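To make that topology-id-only variant concrete, here's a rough sketch (names assumed, this is not Pedro's branch) of a ConditionalRunnable that gates scheduling purely on the command's topology id:

    interface ConditionalRunnable extends Runnable {
       boolean isReady();   // true when running the task should not block
    }

    class TopologyGatedTask implements ConditionalRunnable {
       private final int commandTopologyId;
       private final StateTransferManager stateTransferManager;   // assumed source of the current topology
       private final Runnable commandExecution;

       TopologyGatedTask(int commandTopologyId, StateTransferManager stm, Runnable exec) {
          this.commandTopologyId = commandTopologyId;
          this.stateTransferManager = stm;
          this.commandExecution = exec;
       }

       public boolean isReady() {
          // only schedule once this node has installed the command's topology
          return commandTopologyId <= stateTransferManager.getCacheTopology().getTopologyId();
       }

       public void run() {
          commandExecution.run();
       }
    }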
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] [infinispan-internal] async processing documentation (+ nice inconsistency scenario example)

2013-03-19 Thread Dan Berindei
> > * Marshalling happens in the async thread (the same one that puts the
> message on the wire) rather than in the caller's thread
> my understanding is that there's no such additional thread, but caller's
> thread goes to the network stack even for async calls. I think Bela can put
> some light on this.
>

If you use bundling, the bundler thread writes the bytes to the socket, not
the caller's thread. TCP also uses a special sender thread for each
connection, if use_sender_queues=true (the default).



> > * sendMessage() should accept a marshaller and unmarshaller per
> invocation
>
> There is a org.jgroups.Buffer that we pass to the org.jgroups.Message we
> send across, another, less intrusive way would be to write a lazy wrapper
> around it.
>
>
Still, FRAG2 needs the actual bytes of the message, and that's a long time
before the message gets passed to the bundler thread.

Cheers
Dan
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Remote command smarter dispatcher (merge ISPN-2808 and ISPN-2849)

2013-03-20 Thread Dan Berindei
On Tue, Mar 19, 2013 at 11:55 PM, Pedro Ruivo  wrote:

>
>
> On 03/19/2013 08:41 PM, Dan Berindei wrote:
> >
> > On Mon, Mar 18, 2013 at 6:09 PM, Pedro Ruivo <pe...@infinispan.org> wrote:
> >
> > Hi all,
> >
> > To solve ISPN-2808 (avoid blocking JGroups threads in order to allow
> to
> > deliver the request responses), I've created another thread pool to
> move
> > the possible blocking commands (i.e. the commands that may block
> until
> > some state is achieved).
> >
> > Problem description:
> >
> > With this solution, the new thread pool should be large in order to
> be
> > able to handle the remote commands without deadlocks. The problem is
> > that all the threads can be block to process the command that may
> > unblock other commands.
> >
> > Example: a bunch of commands are blocked waiting for a new topology
> ID
> > and the command that will increment the topology ID is in the thread
> > pool queue.
> >
> > Solution:
> >
> > Use a smart command dispatcher, i.e., keep the command in the queue
> > until we are sure that it will not wait for other commands. I've
> already
> > implemented some kind of executor service
> (ConditionalExecutorService,
> > in ISPN-2635 and ISPN-2636 branches, Total Order stuff) that only put
> > the Runnable (more precisely a new interface called
> ConditionalRunnable)
> > in the thread pool when it is ready to be processed. Creative guys,
> it
> > may need a better name :)
> >
> > The ConditionalRunnable has a new method (boolean isReady()) that
> should
> > return true when the runnable should not block.
> >
> > Example how to apply this to ISPN-2808:
> >
> > Most of the commands awaits for a particular topology ID and/or for
> lock
> > acquisition.
> >
> >
> > Well, the original problem description was about the
> > DistributionInterceptor forwarding the command from the primary owner to
> > the backup owners and waiting for a response from them :)
> > The forwarding done by StateTransferInterceptor is also synchronous and
> > can block.
> >
> > It's true that you can't check how long a remote call will take
> > beforehand...
> >
> > In this way, the isReady() implementation can be something
> > like:
> >
> > isReady()
> >return commandTopologyId <= currentTopologyId && (for all keys;
> do if
> > !lock(key).tryLock(); return false; done)
> >
> >
> > Shouldn't you release the locks you managed to lock already if one of
> > the lock acquisitions failed?
>
> you are right. I have to release the locks for the pessimist mode. In
> optimistic mode, the locks are only released with the rollback command.
> >
> > Actually you may want to release the locks even if lock acquisition did
> > succeed... In non-tx mode, the locks are owned by the current thread, so
> > you can't lock a key on one thread and unlock it on another (though you
> > could skip the check completely in non-tx mode). And in transactional
> > mode, you could have a deadlock because 2 txs lock the same keys in a
> > different order.
> Non-tx caches cannot use this optimization. I've seen that problem earlier
> today when I started debugging it.
> >
> > Which leads me to a different point, how would you handle deadlocks?
> > With pessimistic mode, if tx1 holds lock k1 and wants to acquire k2, but
> > tx2 holds k2 and wants to acquire k1, will the LockCommands tx1:k2 and
> > tx2:k1 ever be scheduled? In general, can we make the time that a
> > Lock/PrepareCommand spends in the ConditionalExecutorService queue count
> > against lockAcquisitionTimeout?
>
> If I got a DeadLockException, I will send it back immediately and
> release all the locks.
>
>
Hmmm, the default LockManager doesn't throw DeadlockDetectedExceptions, you
have to enable deadlockDetection explicitly in the configuration. I'm
guessing that's for performance reasons...

It's true that you would need a new LockManager.tryLock method anyway, so
you could implement deadlock detection in LockManagerImpl.tryLock. But
probably the same performance considerations would apply.



> The lockAcquisitionTimeout is a problem that I haven't solved yet.
>
> >
> > With this, I believe we can keep the number of thread low and avoid
> the
> > thread deadlocks.
> >
> > Now, I have two possible imp

Re: [infinispan-dev] DefaultExecutorFactory and rejection policy

2013-03-20 Thread Dan Berindei
On Thu, Mar 14, 2013 at 2:31 PM, Radim Vansa  wrote:

>
>
> - Original Message -
> | From: "Dan Berindei" 
> | To: "infinispan -Dev List" 
> | Sent: Thursday, March 14, 2013 10:03:10 AM
> | Subject: Re: [infinispan-dev] DefaultExecutorFactory and rejection policy
> |
> | On Thu, Mar 14, 2013 at 9:32 AM, Radim Vansa < rva...@redhat.com >
> | wrote:
> |
> |
> | | Blocking OOB threads is the thing we want to avoid, remember?
> |
> | Well, you have to block somewhere...
> |
> | I like Adrian's solution, because it's a lot better than
> | CallerRunsPolicy: it's blocking the OOB thread until any other
> | command finishes executing, not until one particular command
> | finishes executing.
>
> I don't like caller-runs policy either. OOB threads shouldn't be waiting
> for anything and executing a command within OOB thread could cause that. In
> our problem, it would only increase the OOB threadpool size by the ispn
> thread pool size and cause some overhead to it. We should always have some
> OOB threads able to process the responses.
>
>
I agree that the OOB threads should not block, as much as possible. The
problem is that the internal thread pool needs to have some limits, too,
and when we get to those limits we'll have to block. We can't discard the
message and wait for the sender to retransmit it, like JGroups does.

We can increase those limits a lot by using a queue instead of extra
threads, but queueing needs to be smart (e.g. can't queue a CommitCommand
when there is a PrepareCommand waiting for a lock).


 |
> | | ...
> |
> | I don't think you can throttle on the sender, because you don't know
> | how many threads the recipient should allocate per sender in the
> | Infinispan thread pool.
>
> You don't need to know how many threads exactly are executing on the
> receiver, because you would have an unbounded queue there which will,
> sooner or later, process the messages. The semaphore is there because of
> throttling so that you never overwhelm the recipient with requests.
>
>
You still need a number to initialize the semaphore...



> I don't insist on semaphore-like synchronization, but the receiver should
> provide some feedback that it's not able to process more messages and the
> sender should be responsible for abiding it. AND, the feedback should be
> provided in a way that the node is still able to process other messages.


Well, sticking all the commands in the ISPN thread pool queue and delaying
their processing would be giving feedback to the sender: while the command
is waiting in the queue, the thread that sent it can't send another command
:)



> If it is a "jammed" signal message broadcast after the queue length grows
> beyond some limit, after which all messages on senders should be postponed (or
> divine intervention), it's just a performance issue. But if there is a
> situation where the node is not able to process any more messages (on the
> JGroups level), the OOB issue won't be solved, because there may be a reply
> that another message is waiting for which is never processed.
>
>
Even if we do implement a jammed signal, the other nodes will not respond
to the signal instantaneously. So we still have the possibility of not
having enough threads in the ISPN thread pool to process all the incoming
messages, or even enough room in the queue.


>
> |
> | E.g. in non-tx concurrent mode, for each user-initiated request, the
> | primary owner sends out numOwners requests to the other nodes. If
> | the cache is also replicated, you have the coordinator sending
> | clusterSize requests to the other nodes for each user-initiated
> | request. (So for each request that a node receives from the
> | coordinator, it will receive exactly 0 requests from the other
> | nodes.)
> |
> |
> | If you start considering that each cache should have its own
> | semaphore, because they need different numbers of threads, then
> | finding an appropriate initial value is pretty much impossible.
> |
>
> Sorry, I miss the point here. It's not necessary to be exactly able to
> tell how many messages may go from one node to another, you can just use a
> common sense to limit the maximum amount of request that should be
> processed concurrently between the two nodes until we say "Hey, give the
> poor node a break, it's not responding because it's busy and another
> message would not help right now".
>
>
My point is that there is no "common sense" limit for the maximum number of
requests that should be processed concurrently between two nodes, because
some nodes send a lot more requests than others, and limiting them
artificially would just slow things down.

Re: [infinispan-dev] How to run the testsuite?

2013-03-20 Thread Dan Berindei
The problem is that we still leak threads in almost every module, and that
means we keep a copy of the core classes (and all their dependencies) for
every module. Of course, some modules' dependencies are already oversized,
so keeping only one copy is already too much...

I admit I don't run the whole test suite too often either, but I recently
changed the Cloudbees settings to get rid of the OOM there. It uses about
550MB of permgen by the end of the test suite, without
-XX:+UseCompressedOops. These are the settings I used:

-server -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+UseParNewGC
-XX:+CMSClassUnloadingEnabled   -XX:NewRatio=4 -Xss500k -Xms100m -Xmx900m
-XX:MaxPermSize=700M


Cheers
Dan



On Wed, Mar 20, 2013 at 2:59 PM, Tristan Tarrant wrote:

> Sanne, turn on CompressedOops ? Still those requirements are indeed
> ridiculous.
>
> Tristan
>
> On 03/20/2013 01:27 PM, Sanne Grinovero wrote:
> > I'm testing master, at da5c3f0
> >
> > Just killed a run which was using
> >
> > java version "1.7.0_17"
> > Java(TM) SE Runtime Environment (build 1.7.0_17-b02)
> > Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)
> >
> > this time again an OOM (while I have 2GB !), last sign of life came
> > from the "Rolling Upgrade Tooling"
> >
> > I'm not going to merge/review any pull request until this works.
> >
> > Sanne
> >
> > On 20 March 2013 12:09, Mircea Markus  wrote:
> >> I've just run it on master and didn't get OOM. well I'm using osx. Are
> you running it on master or a particular branch? Which module crashes?
> >> e.g. pedro's ISPN-2808 adds quite some threads to the party - that's
> the reason it hasn't been integrated yet.
> >>
> >> On 20 Mar 2013, at 11:40, Sanne Grinovero wrote:
> >>
> >>> Hi all,
> >>> after reviewing some pull requests, I'm since a couple of days unable
> >>> to run the testsuite; since Anna's fixes affect many modules I'm
> >>> trying to run the testsuite of the whole project, as we should always
> >>> do but I admit I haven't done it in a while because of the core module
> >>> failures.
> >>>
> >>> So I run:
> >>> $ mvn -fn clean install
> >>>
> >>> using -fn to have it continue after the core failures.
> >>>
> >>> First attempt gave me an OOM, was running with 1G heap.. I'm pretty
> >>> sure this was good enough some months back.
> >>>
> >>> Second attempt slowed down like crazy, and I found a warning about
> >>> having filled the code cache size, so doubled it to 200M.
> >>>
> >>> Third attempt: OutOfMemoryError: PermGen space! But I'm running with
> >>> -XX:MaxPermSize=380M which should be plenty?
> >>>
> >>> This is :
> >>> java version "1.6.0_43"
> >>> Java(TM) SE Runtime Environment (build 1.6.0_43-b01)
> >>> Java HotSpot(TM) 64-Bit Server VM (build 20.14-b01, mixed mode)
> >>>
> >>> MAVEN_OPTS=-Xmx2G -XX:MaxPermSize=380M -XX:+TieredCompilation
> >>> -Djava.net.preferIPv4Stack=true -Djgroups.bind_addr=127.0.0.1
> >>> -XX:ReservedCodeCacheSize=200M
> >>> -Dlog4j.configuration=file:/opt/infinispan-log4j.xml
> >>>
> >>> My custom log configuration just disables trace & debug.
> >>>
> >>> Going to try now with larger PermGen and different JVMs but it looks
> >>> quite bad.. any other suggestion?
> >>> (I do have the security limits setup properly)
> >>>
> >>> Sanne
> >>> ___
> >>> infinispan-dev mailing list
> >>> infinispan-dev@lists.jboss.org
> >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> >> Cheers,
> >> --
> >> Mircea Markus
> >> Infinispan lead (www.infinispan.org)
> >>
> >>
> >>
> >>
> >>
> >> ___
> >> infinispan-dev mailing list
> >> infinispan-dev@lists.jboss.org
> >> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> > ___
> > infinispan-dev mailing list
> > infinispan-dev@lists.jboss.org
> > https://lists.jboss.org/mailman/listinfo/infinispan-dev
> >
>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] How to run the testsuite?

2013-03-20 Thread Dan Berindei
On Wed, Mar 20, 2013 at 5:33 PM, Mircea Markus  wrote:

>
> On 20 Mar 2013, at 15:12, Dan Berindei wrote:
>
> > The problem is that we still leak threads in almost every module, and
> that means we keep a copy of the core classes (and all their dependencies)
> for every module. Of course, some modules' dependencies are already
> oversized, so keeping only one copy is already too much...
> do we have a JIRA for this?
>
>
There is https://issues.jboss.org/browse/ISPN-2477, but we probably need a
new issue to cover the test suite specifically.

>
> > I admit I don't run the whole test suite too often either, but I
> recently changed the Cloudbees settings to get rid of the OOM there. It
> uses about 550MB of permgen by the end of the test suite, without
> -XX:+UseCompressedOops. These are the settings I used:
> >
> > -server -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+UseParNewGC
> -XX:+CMSClassUnloadingEnabled   -XX:NewRatio=4 -Xss500k -Xms100m -Xmx900m
> -XX:MaxPermSize=700M
>
> Thanks for sharing this Dan!
>
> >
> >
> > Cheers
> > Dan
> >
> >
> >
> > On Wed, Mar 20, 2013 at 2:59 PM, Tristan Tarrant 
> wrote:
> > Sanne, turn on CompressedOops ? Still those requirements are indeed
> > ridiculous.
> >
> > Tristan
> >
> > On 03/20/2013 01:27 PM, Sanne Grinovero wrote:
> > > I'm testing master, at da5c3f0
> > >
> > > Just killed a run which was using
> > >
> > > java version "1.7.0_17"
> > > Java(TM) SE Runtime Environment (build 1.7.0_17-b02)
> > > Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)
> > >
> > > this time again an OOM (while I have 2GB !), last sign of life came
> > > from the "Rolling Upgrade Tooling"
> > >
> > > I'm not going to merge/review any pull request until this works.
> > >
> > > Sanne
> > >
> > > On 20 March 2013 12:09, Mircea Markus  wrote:
> > >> I've just run it on master and didn't get OOM. well I'm using osx.
> Are you running it on master or a particular branch? Which module crashes?
> > >> e.g. pedro's ISPN-2808 adds quite some threads to the party - that's
> the reason it hasn't been integrated yet.
> > >>
> > >> On 20 Mar 2013, at 11:40, Sanne Grinovero wrote:
> > >>
> > >>> Hi all,
> > >>> after reviewing some pull requests, I'm since a couple of days unable
> > >>> to run the testsuite; since Anna's fixes affect many modules I'm
> > >>> trying to run the testsuite of the whole project, as we should always
> > >>> do but I admit I haven't done it in a while because of the core
> module
> > >>> failures.
> > >>>
> > >>> So I run:
> > >>> $ mvn -fn clean install
> > >>>
> > >>> using -fn to have it continue after the core failures.
> > >>>
> > >>> First attempt gave me an OOM, was running with 1G heap.. I'm pretty
> > >>> sure this was good enough some months back.
> > >>>
> > >>> Second attempt slowed down like crazy, and I found a warning about
> > >>> having filled the code cache size, so doubled it to 200M.
> > >>>
> > >>> Third attempt: OutOfMemoryError: PermGen space! But I'm running with
> > >>> -XX:MaxPermSize=380M which should be plenty?
> > >>>
> > >>> This is :
> > >>> java version "1.6.0_43"
> > >>> Java(TM) SE Runtime Environment (build 1.6.0_43-b01)
> > >>> Java HotSpot(TM) 64-Bit Server VM (build 20.14-b01, mixed mode)
> > >>>
> > >>> MAVEN_OPTS=-Xmx2G -XX:MaxPermSize=380M -XX:+TieredCompilation
> > >>> -Djava.net.preferIPv4Stack=true -Djgroups.bind_addr=127.0.0.1
> > >>> -XX:ReservedCodeCacheSize=200M
> > >>> -Dlog4j.configuration=file:/opt/infinispan-log4j.xml
> > >>>
> > >>> My custom log configuration just disables trace & debug.
> > >>>
> > >>> Going to try now with larger PermGen and different JVMs but it looks
> > >>> quite bad.. any other suggestion?
> > >>> (I do have the security limits setup properly)
> > >>>
> > >>> Sanne
> > >>> ___
> > >>> infinispan-dev mailing list
> > >>> infinispan-dev@list

Re: [infinispan-dev] Remote command smarter dispatcher (merge ISPN-2808 and ISPN-2849)

2013-03-20 Thread Dan Berindei
On Wed, Mar 20, 2013 at 12:59 PM, Pedro Ruivo  wrote:

>
>
> On 03/20/2013 07:53 AM, Dan Berindei wrote:
> >
> > On Tue, Mar 19, 2013 at 11:55 PM, Pedro Ruivo  > <mailto:pe...@infinispan.org>> wrote:
> >
> >
> >
> > On 03/19/2013 08:41 PM, Dan Berindei wrote:
> >  >
> >  > On Mon, Mar 18, 2013 at 6:09 PM, Pedro Ruivo
> > mailto:pe...@infinispan.org>
> >  > <mailto:pe...@infinispan.org <mailto:pe...@infinispan.org>>>
> wrote:
> >  >
> >  > Hi all,
> >  >
> >  > To solve ISPN-2808 (avoid blocking JGroups threads in order
> > to allow to
> >  > deliver the request responses), I've created another thread
> > pool to move
> >  > the possible blocking commands (i.e. the commands that may
> > block until
> >  > some state is achieved).
> >  >
> >  > Problem description:
> >  >
> >  > With this solution, the new thread pool should be large in
> > order to be
> >  > able to handle the remote commands without deadlocks. The
> > problem is
> >  > that all the threads can be block to process the command that
> may
> >  > unblock other commands.
> >  >
> >  > Example: a bunch of commands are blocked waiting for a new
> > topology ID
> >  > and the command that will increment the topology ID is in the
> > thread
> >  > pool queue.
> >  >
> >  > Solution:
> >  >
> >  > Use a smart command dispatcher, i.e., keep the command in the
> > queue
> >  > until we are sure that it will not wait for other commands.
> > I've already
> >  > implemented some kind of executor service
> > (ConditionalExecutorService,
> >  > in ISPN-2635 and ISPN-2636 branches, Total Order stuff) that
> > only put
> >  > the Runnable (more precisely a new interface called
> > ConditionalRunnable)
> >  > in the thread pool when it is ready to be processed. Creative
> > guys, it
> >  > may need a better name :)
> >  >
> >  > The ConditionalRunnable has a new method (boolean isReady())
> > that should
> >  > return true when the runnable should not block.
> >  >
> >  > Example how to apply this to ISPN-2808:
> >  >
> >  > Most of the commands awaits for a particular topology ID
> > and/or for lock
> >  > acquisition.
> >  >
> >  >
> >  > Well, the original problem description was about the
> >  > DistributionInterceptor forwarding the command from the primary
> > owner to
> >  > the backup owners and waiting for a response from them :)
> >  > The forwarding done by StateTransferInterceptor is also
> > synchronous and
> >  > can block.
> >  >
> >  > It's true that you can't check how long a remote call will take
> >  > beforehand...
> >  >
> >  > In this way, the isReady() implementation can be something
> >  > like:
> >  >
> >  > isReady()
> >  >return commandTopologyId <= currentTopologyId && (for all
> > keys; do if
> >  > !lock(key).tryLock(); return false; done)
> >  >
> >  >
> >  > Shouldn't you release the locks you managed to lock already if
> one of
> >  > the lock acquisitions failed?
> >
> > you are right. I have to release the locks for the pessimist mode. In
> > optimistic mode, the locks are only released with the rollback
> command.
> >  >
> >  > Actually you may want to release the locks even if lock
> > acquisition did
> >  > succeed... In non-tx mode, the locks are owned by the current
> > thread, so
> >  > you can't lock a key on one thread and unlock it on another
> > (though you
> >  > could skip the check completely in non-tx mode). And in
> transactional
> >  > mode, you could have a deadlock because 2 txs lock the same keys
> in a
> >  > different order.
> > Non-tx caches cannot use this optimization. I've seen that problem
> early
> > today wh

Re: [infinispan-dev] How to run the testsuite?

2013-03-21 Thread Dan Berindei
Sanne, your Xmx setting seems a bit excessive :)

Did you see it fail on your machine with 1GB regular heap and 700MB permgen?


On Thu, Mar 21, 2013 at 1:55 AM, Sanne Grinovero wrote:

> Thanks Dan,
> with the following options it completed the build:
>
> MAVEN_OPTS=-server -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
> -XX:+UseParNewGC -XX:+CMSClassUnloadingEnabled -XX:NewRatio=4 -Xss500k
> -Xmx16G -Xms1G -XX:MaxPermSize=700M -XX:HeapDumpPath=/tmp/java_heap
> -Djava.net.preferIPv4Stack=true -Djgroups.bind_addr=127.0.0.1
> -XX:ReservedCodeCacheSize=200M
> -Dlog4j.configuration=file:/opt/infinispan-log4j.xml
>
> Sanne
>
> On 20 March 2013 17:36, Manik Surtani  wrote:
> >
> > On 20 Mar 2013, at 15:29, Adrian Nistor  wrote:
> >
> > I've also tried changing the fork mode of surefire from 'none' to 'once'
> and
> > the entire suite runs fine now on jvm 1.6 with 500mb MaxPermSize.
> > Previously I did not complete, 500mb was not enough.
> > Anyone knows why surefire was not allowed to fork?
> >
> > Haven't tried to analyze closely the heap yet but first thing I noticed
> is
> > 15% of it is occupied by 19 ComponentMetadataRepo instances, which
> > probably is not the root cause of this issue, but is odd anyway :).
> >
> >
> > Yes, very odd.  Do you also see 19 instances of a
> > GlobalComponentRegistry?
> >
> >
> > On 03/20/2013 05:12 PM, Dan Berindei wrote:
> >
> > The problem is that we still leak threads in almost every module, and
> that
> > means we keep a copy of the core classes (and all their dependencies) for
> > every module. Of course, some modules' dependencies are already
> oversized,
> > so keeping only one copy is already too much...
> >
> > I admit I don't run the whole test suite too often either, but I recently
> > changed the Cloudbees settings to get rid of the OOM there. It uses about
> > 550MB of permgen by the end of the test suite, without
> > -XX:+UseCompressedOops. These are the settings I used:
> >
> > -server -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+UseParNewGC
> > -XX:+CMSClassUnloadingEnabled   -XX:NewRatio=4 -Xss500k -Xms100m -Xmx900m
> > -XX:MaxPermSize=700M
> >
> >
> > Cheers
> > Dan
> >
> >
> >
> > On Wed, Mar 20, 2013 at 2:59 PM, Tristan Tarrant 
> > wrote:
> >>
> >> Sanne, turn on CompressedOops ? Still those requirements are indeed
> >> ridiculous.
> >>
> >> Tristan
> >>
> >> On 03/20/2013 01:27 PM, Sanne Grinovero wrote:
> >> > I'm testing master, at da5c3f0
> >> >
> >> > Just killed a run which was using
> >> >
> >> > java version "1.7.0_17"
> >> > Java(TM) SE Runtime Environment (build 1.7.0_17-b02)
> >> > Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)
> >> >
> >> > this time again an OOM (while I have 2GB !), last sign of life came
> >> > from the "Rolling Upgrade Tooling"
> >> >
> >> > I'm not going to merge/review any pull request until this works.
> >> >
> >> > Sanne
> >> >
> >> > On 20 March 2013 12:09, Mircea Markus  wrote:
> >> >> I've just run it on master and didn't get OOM. well I'm using osx.
> Are
> >> >> you running it on master or a particular branch? Which module
> crashes?
> >> >> e.g. pedro's ISPN-2808 adds quite some threads to the party - that's
> >> >> the reason it hasn't been integrated yet.
> >> >>
> >> >> On 20 Mar 2013, at 11:40, Sanne Grinovero wrote:
> >> >>
> >> >>> Hi all,
> >> >>> after reviewing some pull requests, I'm since a couple of days
> unable
> >> >>> to run the testsuite; since Anna's fixes affect many modules I'm
> >> >>> trying to run the testsuite of the whole project, as we should
> always
> >> >>> do but I admit I haven't done it in a while because of the core
> module
> >> >>> failures.
> >> >>>
> >> >>> So I run:
> >> >>> $ mvn -fn clean install
> >> >>>
> >> >>> using -fn to have it continue after the core failures.
> >> >>>
> >> >>> First attempt gave me an OOM, was running with 1G heap.. I'm pretty
> >> >>> sure this was good en

Re: [infinispan-dev] Bye bye wrappers, ComparingConcurrentHashMapv8 is here (ISPN-2281)

2013-03-21 Thread Dan Berindei
On Thu, Mar 21, 2013 at 1:17 PM, Galder Zamarreño  wrote:

>
> On Mar 20, 2013, at 11:52 AM, Manik Surtani  wrote:
>
> >
> > On 18 Mar 2013, at 12:21, Galder Zamarreño  wrote:
> >
> >> This is why, I've created a new CHM, based on the CHMv8, called
> ComparingConcurrentHashMapv8 (thx Tristan for the name!). The work for this
> can be seen in:
> https://github.com/galderz/infinispan/commit/351e29d327d163ca8e941edf873f6d46b43cfae1
> >
> > Sounds good, but why not extend
> org.infinispan.util.concurrent.jdk8backported.ConcurrentHashMapV8?
>
> To be honest, I'm considering keeping only one ConcurrentHashMapV8 around,
> which had the Comparing functions pluggable…, and I might end up doing
> that. IOW, a ComparingConcurrentHashMapV8 instance created with
> ComparingObject function for both keys and values is functionally
> equivalent to ConcurrentHashMapV8, with little to no impact.
>
> I originally wanted to keep two versions so that I could more easily port
> over changes in JSR-166 to
> org.infinispan.util.concurrent.jdk8backported.ConcurrentHashMapV8, and then
> pass this on to ComparingCHMv8… but I don't think it's worth it.
>
>
Interesting, the extra166y package had a CustomConcurrentHashMap [1] that
did exactly what you want, but Doug Lea didn't port it over to
ConcurrentHashMapV8. Guava too has a CustomConcurrentHashMap class, but
it's also based on the old ConcurrentHashMap code.

I think maintaining one ConcurrentHashMapV8 class is hard enough, and I
doubt there would be any performance difference between the two versions.
So I vote to keep only the Comparing version.

[1]
http://gee.cs.oswego.edu/dl/jsr166/dist/extra166ydocs/extra166y/CustomConcurrentHashMap.html
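To make the "comparing functions" idea concrete, it is essentially a pluggable
replacement for equals()/hashCode(), roughly along these lines (the names below are
made up for illustration, not the actual API in Galder's branch):

   // Illustration only - not the actual ComparingConcurrentHashMapv8 API.
   interface Equivalence<T> {
      boolean isEqual(T instance, Object other);
      int hashCode(T instance);
   }

   // e.g. a byte[]-aware equivalence, so byte[] keys can be compared by content
   // rather than by reference identity, without wrapper objects:
   final class ByteArrayEquivalence implements Equivalence<byte[]> {
      @Override
      public boolean isEqual(byte[] instance, Object other) {
         return other instanceof byte[]
               && java.util.Arrays.equals(instance, (byte[]) other);
      }

      @Override
      public int hashCode(byte[] instance) {
         return java.util.Arrays.hashCode(instance);
      }
   }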
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] [infinispan] ISPN-2962 Fix thread leaks in the core test suite (#1736)

2013-04-05 Thread Dan Berindei
I still think it won't make any difference for the test suite...


On Fri, Apr 5, 2013 at 2:47 PM, Galder Zamarreño  wrote:

> Adrian/Dan,
>
> Can you add a JIRA to add these calls and then verify that no leftover
> threads are left for Arjuna?
>
> Thanks all! :)
>
> On Mar 27, 2013, at 5:52 PM, Jonathan Halliday <
> jonathan.halli...@redhat.com> wrote:
>
> >
> > I don't do tx any more. You need Tom or the jbossts@ list.
> >
> > RecoveryManager.manager().stop();
> > TransactionReaper.terminate(false);
> >
> > Jonathan.
> >
> > On 03/27/2013 04:42 PM, Galder Zamarreño wrote:
> >> Hey Jonathan,
> >>
> >> How's it going?
> >>
> >> We're seeing the following Arjuna threads still running when our
> >> Infinispan testsuite and we wondered whether:
> >> a) There's a way to disable them.
> >> b) Is there a way to shut them down when Infinispan caches stop.
> >>
> >> Cheers,
> >>
> >> Begin forwarded message:
> >>
> >>> *From: *Adrian Nistor  >>> >
> >>> *Subject: **Re: [infinispan] ISPN-2962 Fix thread leaks in the core
> >>> test suite (#1736)*
> >>> *Date: *March 27, 2013 2:09:06 PM GMT
> >>> *To: *infinispan/infinispan  >>> >
> >>> *Reply-To: *infinispan/infinispan
> >>> <
> reply+i-12400113-692fb20cc01b01d67beffc2275beeaf015f0361a-50...@reply.github.com
> >>>  reply+i-12400113-692fb20cc01b01d67beffc2275beeaf015f0361a-50...@reply.github.com
> >>
> >>>
> >>> I noticed 3 strange threads that still run after the suite:
> >>> com.arjuna.ats.internal.arjuna.recovery.Listener,
> >>> com.arjuna.ats.internal.arjuna.coordinator.ReaperThread and
> >>> com.arjuna.ats.internal.arjuna.coordinator.ReaperWorkerThread.
> >>>
> >>> Not sure what we can do about them or if they matter to us.
> >>>
> >>> —
> >>> Reply to this email directly or view it on GitHub
> >>> <
> https://github.com/infinispan/infinispan/pull/1736#issuecomment-15525193>.
> >>>
> >>
> >>
> >> --
> >> Galder Zamarreño
> >> gal...@redhat.com 
> >> twitter.com/galderz 
> >>
> >> Project Lead, Escalante
> >> http://escalante.io
> >>
> >> Engineer, Infinispan
> >> http://infinispan.org
> >>
> >
> > --
> > Registered in England and Wales under Company Registration No. 03798903
> Directors: Michael Cunningham (USA), Mark Hegarty (Ireland), Matt Parson
> > (USA), Charlie Peters (USA)
>
>
> --
> Galder Zamarreño
> gal...@redhat.com
> twitter.com/galderz
>
> Project Lead, Escalante
> http://escalante.io
>
> Engineer, Infinispan
> http://infinispan.org
>
>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] AdvancedCache.put with Metadata parameter

2013-04-08 Thread Dan Berindei
On Mon, Apr 8, 2013 at 1:44 PM, Galder Zamarreño  wrote:

>
> On Apr 8, 2013, at 12:35 PM, Galder Zamarreño  wrote:
>
> >
> > On Apr 8, 2013, at 11:17 AM, Manik Surtani  wrote:
> >
> >> All sounds very good. One important thing to consider is that the
> reference to Metadata passed in by the client app will be tied to the ICE
> for the entire lifespan of the ICE.  You'll need to think about a defensive
> copy or some other form of making the Metadata immutable (by the user
> application, at least) the moment it is passed in.
> >
> > ^ Excellent point, it could be a nightmare if users could change the
> metadata reference by the ICE at will. I'll have a think on how to best
> achieve this.
>
> ^ The metadata is gonna have to be marshalled somehow to ship to other
> nodes, so that could be a way to achieve it, by enforcing this somehow.
> When the cache receives it, it can marshall/unmarshall it to make a copy
>
>
If Metadata is just an interface, nothing is stopping the user from
implementing maxIdle() to return a new random value on every call. Besides, local caches
need to support Metadata as well, and we shouldn't force
serialization/deserialization for local caches.

So I think we'd be better off documenting that Metadata objects should not
change after they are inserted in the cache, just like keys and values.



> One way would be to make Metadata extend Serializable, but not keen on
> that. Another would be to somehow force the interface to define the
> Externalizer to use (i.e. an interface method like getExternalizer()), but
> that's akward when it comes to unmarshalling… what about forcing the
> Metadata object to be provided with a @SerializeWith annotation?
>
> Any other ideas?
>
>
Why force anything? I think Metadata instances should be treated just like
keys and values, so they should be able to use Externalizers (via
@SerializeWith), Serializable, or Externalizable, depending on the user's
requirements.
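For example, a user-supplied Metadata implementation marshalled via @SerializeWith
could look roughly like the sketch below. This is only a sketch: the Metadata
interface itself is still being designed in this thread, and the SerializeWith /
Externalizer package locations are from memory, so treat them as assumptions.

   import java.io.IOException;
   import java.io.ObjectInput;
   import java.io.ObjectOutput;
   import java.util.concurrent.TimeUnit;

   import org.infinispan.marshall.Externalizer;
   import org.infinispan.marshall.SerializeWith;

   // Immutable, so "don't mutate it after it's been stored" is trivially satisfied.
   @SerializeWith(ExpirationMetadata.Ext.class)
   public final class ExpirationMetadata implements Metadata { // Metadata = the WIP interface above
      private final long lifespanMillis;
      private final long maxIdleMillis;

      public ExpirationMetadata(long lifespan, long maxIdle, TimeUnit unit) {
         this.lifespanMillis = unit.toMillis(lifespan);
         this.maxIdleMillis = unit.toMillis(maxIdle);
      }

      public long lifespan() { return lifespanMillis; }
      public long maxIdle() { return maxIdleMillis; }

      public static final class Ext implements Externalizer<ExpirationMetadata> {
         @Override
         public void writeObject(ObjectOutput output, ExpirationMetadata m) throws IOException {
            output.writeLong(m.lifespanMillis);
            output.writeLong(m.maxIdleMillis);
         }

         @Override
         public ExpirationMetadata readObject(ObjectInput input)
               throws IOException, ClassNotFoundException {
            return new ExpirationMetadata(input.readLong(), input.readLong(),
                                          TimeUnit.MILLISECONDS);
         }
      }
   }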



> >
> > Cheers,
> >
> >>
> >> On 8 Apr 2013, at 09:24, Galder Zamarreño  wrote:
> >>
> >>> Hi all,
> >>>
> >>> As mentioned in
> http://lists.jboss.org/pipermail/infinispan-dev/2013-March/012348.html,
> in paralell to the switch to Equivalent* collections, I was also working on
> being able to pass metadata into Infinispan caches. This is done to better
> support the ability to store custom metadata in Infinispan without the need
> of extra wrappers. So, the idea is that InternalCacheEntry instances will
> have a a reference to this Metadata.
> >>>
> >>> One of that metadata is version, which I've been using as test bed to
> see if clients could pass succesfully version information via metadata. As
> you already know, Hot Rod requires to store version information. Before,
> this was stored in a class called CacheValue alongside the value itself,
> but the work I've done in [1], this is passed via the new API I've added in
> [2].
> >>>
> >>> So, I'd like to get some thoughts on this new API. I hope that with
> these new put/replace versions, we can get rid of the nightmare which is
> all the other put/replace calls taking lifespan and/or maxIdle information.
> In the end, I think there should be two basic puts:
> >>>
> >>> - put(K, V)
> >>> - put(K, V, Metadata)
> >>>
> >>> And their equivalents.
> >>>
> >>> IMPORTANT NOTE 1: The implementation details are bound to change,
> because the entire Metadata needs to be stored in InternalCacheEntry, not
> just version, lifespan..etc. I'll further develop the implementation once I
> get into adding more metadata, i.e. when working on interoperability with
> REST. So, don't pay too much attention to the implementation itself, focus
> on the AdvancedCache API itself and let's refine that.
> >>>
> >>> IMPORTANT NOTE 2: The interoperability work in commit in [1] is WIP,
> so please let's avoid discussing it in this email thread. Once I have a
> more final version I'll send an email about it.
> >>>
> >>> Apart from working on enhancements to the API, I'm now carry on
> tackling the interoperability work with aim to have an initial version of
> the Embedded <-> Hot Rod interoperability as first step. Once that's in, it
> can be released to get early feedback while the rest of interoperability
> modes are developed.
> >>>
> >>> Cheers,
> >>>
> >>> [1]
> https://github.com/galderz/infinispan/commit/a35956fe291d2b2dc3b7fa7bf44d8965ffb1a54d
> >>> [2]
> https://github.com/galderz/infinispan/commit/a35956fe291d2b2dc3b7fa7bf44d8965ffb1a54d#L10R313
> >>> --
> >>> Galder Zamarreño
> >>> gal...@redhat.com
> >>> twitter.com/galderz
> >>>
> >>> Project Lead, Escalante
> >>> http://escalante.io
> >>>
> >>> Engineer, Infinispan
> >>> http://infinispan.org
> >>>
> >>>
> >>> ___
> >>> infinispan-dev mailing list
> >>> infinispan-dev@lists.jboss.org
> >>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> >>
> >> --
> >> Manik Surtani
> >> ma...@jboss.org
> >> twitter.com/maniksurtani
> >>
> >> Platform Architect, JBoss Data Grid
> >>

Re: [infinispan-dev] AdvancedCache.put with Metadata parameter

2013-04-08 Thread Dan Berindei
On Mon, Apr 8, 2013 at 2:36 PM, Galder Zamarreño  wrote:

>
> On Apr 8, 2013, at 1:11 PM, Dan Berindei  wrote:
>
> >
> >
> >
> > On Mon, Apr 8, 2013 at 1:44 PM, Galder Zamarreño 
> wrote:
> >
> > On Apr 8, 2013, at 12:35 PM, Galder Zamarreño  wrote:
> >
> > >
> > > On Apr 8, 2013, at 11:17 AM, Manik Surtani 
> wrote:
> > >
> > >> All sounds very good. One important thing to consider is that the
> reference to Metadata passed in by the client app will be tied to the ICE
> for the entire lifespan of the ICE.  You'll need to think about a defensive
> copy or some other form of making the Metadata immutable (by the user
> application, at least) the moment it is passed in.
> > >
> > > ^ Excellent point, it could be a nightmare if users could change the
> metadata reference by the ICE at will. I'll have a think on how to best
> achieve this.
> >
> > ^ The metadata is gonna have to be marshalled somehow to ship to other
> nodes, so that could be a way to achieve it, by enforcing this somehow.
> When the cache receives it, it can marshaller/unmarshall it to make a copy
> >
> >
> > If Metadata is just an interface, nothing is stopping the user from
> implementing maxIdle() to return Random.maxLong(). Besides, local caches
> need to support Metadata as well, and we shouldn't force
> serialization/deserialization for local caches.
> >
> > So I think we'd be better off documenting that Metadata objects should
> not change after they are inserted in the cache, just like keys and values.
> >
> >
> > One way would be to make Metadata extend Serializable, but not keen on
> that. Another would be to somehow force the interface to define the
> Externalizer to use (i.e. an interface method like getExternalizer()), but
> that's akward when it comes to unmarshalling… what about forcing the
> Metadata object to be provided with a @SerializeWith annotation?
> >
> > Any other ideas?
> >
> >
> > Why force anything? I think Metadata instances should be treated just
> like keys and values, so they should be able to use Externalizers (via
> @SerializeWith), Serializable, or Externalizable, depending on the user's
> requirements.
>
> ^ I agree.
>
> What do you think of my suggestion in the other email to separate both
> concerns and somehow enforce a copy of the object to be provided instead?
>
>
I wrote my reply before I saw your other email :)

Having said that, I still think enforcing a copy doesn't make sense (see my
other comment).




> >
> >
> > >
> > > Cheers,
> > >
> > >>
> > >> On 8 Apr 2013, at 09:24, Galder Zamarreño  wrote:
> > >>
> > >>> Hi all,
> > >>>
> > >>> As mentioned in
> http://lists.jboss.org/pipermail/infinispan-dev/2013-March/012348.html,
> in paralell to the switch to Equivalent* collections, I was also working on
> being able to pass metadata into Infinispan caches. This is done to better
> support the ability to store custom metadata in Infinispan without the need
> of extra wrappers. So, the idea is that InternalCacheEntry instances will
> have a a reference to this Metadata.
> > >>>
> > >>> One of that metadata is version, which I've been using as test bed
> to see if clients could pass succesfully version information via metadata.
> As you already know, Hot Rod requires to store version information. Before,
> this was stored in a class called CacheValue alongside the value itself,
> but the work I've done in [1], this is passed via the new API I've added in
> [2].
> > >>>
> > >>> So, I'd like to get some thoughts on this new API. I hope that with
> these new put/replace versions, we can get rid of the nightmare which is
> all the other put/replace calls taking lifespan and/or maxIdle information.
> In the end, I think there should be two basic puts:
> > >>>
> > >>> - put(K, V)
> > >>> - put(K, V, Metadata)
> > >>>
> > >>> And their equivalents.
> > >>>
> > >>> IMPORTANT NOTE 1: The implementation details are bound to change,
> because the entire Metadata needs to be stored in InternalCacheEntry, not
> just version, lifespan..etc. I'll further develop the implementation once I
> get into adding more metadata, i.e. when working on interoperability with
> REST. So, don't pay too much attention to the implementation itself, focus
> on the AdvancedCache API itself and let's refine that.
> > >>>
> > >

Re: [infinispan-dev] query repl timeout

2013-04-09 Thread Dan Berindei
Hi Ales

I managed to start the app with 3 nodes on my laptop, and it inserted a
flight in about 26.7 seconds with TRACE enabled for org.infinispan.
However, when I counted the number of cache commands being executed, I got
55000 (8700 of which went remote), which seems like way too much for a
single insert. (The log file grew by more than 100 MB.)

I think there may be a cycle where each operation on a cache generates a
log message, which then triggers a change in the Lucene caches, which
writes another log message, and so on. How does CapeDwarf capture the logs?
I haven't seen any appender in the standard-capedwarf.xml configuration.
How can I enable TRACE logging for org.infinispan without the logs being
indexed by CapeDwarf?



On Mon, Apr 8, 2013 at 4:12 PM, Ales Justin  wrote:

> Steps to re-produce:
>
> (1) checkout JBossAS 7.2.0.Final tag --> JBOSS_HOME
>
> (2) build CapeDwarf Shared
>
> https://github.com/capedwarf/capedwarf-shared
>
> (3) build CapeDwarf Blue
>
> https://github.com/capedwarf/capedwarf-blue
>
> (4) build CapeDwarf AS
>
> https://github.com/capedwarf/capedwarf-jboss-as
>
> mvn clean install -Djboss.dir= -Pupdate-as
>
> This will install CapeDwarf Subsystem into previous AS 7.2.0.Final
>
> (5) grab GAE 1.7.6 SDK
>
> http://googleappengine.googlecode.com/files/appengine-java-sdk-1.7.6.zip
>
> (6) Build GAE demos/helloorm2
>
> ant
>
> cd war/
>
> zip -r ROOT.war .
>
> This will zip the demo app as ROOT.war,
> which you then deploy to AS.
>
> (7) start CapeDwarf
>
> JBOSS_HOME/bin
>
> ./standalone.sh -c standalone-capedwarf.xml -b  -Djboss.node.name
> =some_name
>
> (8) deploy the app / ROOT.war
>
> ---
>
> Deploy this on a few nodes, goto browser: http://,
> add a few flights and see how it works.
>
> It now runs a bit better, where we changed mstruk's laptop with luksa's.
> But we still get replication locks ...
>
> Also, the problem is that query on indexing slave takes waaay tooo long.
>
> Anyway, you'll see. ;-)
>
> Ping me for any issues.
>
> -Ales
>
>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] data interoperability and remote querying

2013-04-11 Thread Dan Berindei
On Thu, Apr 11, 2013 at 1:30 PM, Manik Surtani  wrote:

>
> On 10 Apr 2013, at 20:57, Sanne Grinovero  wrote:
>
> > Right, let's keep this to collecting requirements:
>
> +1.  Ok, so it seems we're all pretty much in agreement that metadata
> extraction and indexing should happen on the server side and not on the
> client.  As I said before, this is good. Simple clients, support for
> re-indexing, support for changes in indexing characteristics, and the
> ability to save the world from AIDS.
>
> This puts a requirement on an efficient and portable serialisation format.
>  Again, +1 to starting with defining what we need.  Good start below, Sanne.
>
>
Besides the serialization format, how do we want to define the indexes on
the server?

Relying on Java classes with Lucene annotations on them doesn't sound like
it would support indexing changes very well, because each node would index
whatever annotations it had loaded at the moment. So I guess we need a
separate indexing configuration, modifiable at runtime, and with
annotations as a backup.



> > - being able to upgrade the server without losing data
> > - being able to change the (soft) schema on the server
> > - read/write fields from different languages
>
> - deal with multi-version control of values (i.e. being able to read
> > an older value through an evoluted schema, doing comparisons of same
> > value even if it was stored using different schema generations)
>
> I'd add:
>
> * Support for fast and easy translation to/from object model in high level
> language of choice (i.e., not manual parsing!  Maybe some form of tooling,
> like a Maven plugin, to generate "IDL"-esque format)
> * Serialisation efficiency (size and speed) should be considered
>
> And in addition, I'd also list out existing technologies that fulfil some
> or all of these requirements that we can consider, look at extending, etc.
>
>
I'd add support for random access for reads. If the user only needs to
index a Person's date of birth, it would be nice if we could read only the
dateOfBirth field and index that.



> - Manik
>
> --
> Manik Surtani
> ma...@jboss.org
> twitter.com/maniksurtani
>
> Platform Architect, JBoss Data Grid
> http://red.ht/data-grid
>
>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Unexpected value returned from the cache

2013-04-11 Thread Dan Berindei
Have you tried with different JDK versions on Windows as well?
Did you have AggresiveOpts enabled?


On Thu, Apr 11, 2013 at 8:59 PM, Sanne Grinovero wrote:

> I could never replicate this, and now I suspect I know why: I just
> figured that all failing reports where coming from Windows machines.
>
> Does it ring any new bell?
>
> Even on windows, it doesn't happen under low load.
>
> Sanne
>
> On 9 April 2013 21:43, Mircea Markus  wrote:
> >
> > On 9 Apr 2013, at 17:28, Sanne Grinovero wrote:
> >
> >> Hi,
> >> I'm frequently experiencing problems with a stack as this one:
> >>
> >> Caused by: java.lang.ClassCastException: java.lang.Class cannot be cast
> to [B
> >> at
> org.infinispan.lucene.SingleChunkIndexInput.<init>(SingleChunkIndexInput.java:49)
> >>
> >>
> >> The puzzling aspect is that the cast operation is applied on a type
> >> I'm retreiving from a Cache, which can not possibly be of a different
> >> type. I also happen to store some byte[] instances, but I never store
> >> a Class instance so I have no clue where this is coming from.
> >>
> >> I see these possible explanations:
> >> 1- Infinispan "forgets" to deserialize the object I'm requesting
> > it wouldn't return a Class object then.
> >> 2- It's picking the wrong Externalizer
> >> 3- the key is returning me a diffent object
> >> 4- I'm fooling myself with crap code
> >>
> >> Any known issue in the first 3 categories?
> > nothing I'm aware of.
> >>
> >> Please don't ask me for trace logs, when I enable those the problem
> >> doesn't happen.. I could try again with specific categories.
> > Enabling trace on
> org.infinispan.interceptors.InvocationContextInterceptor would tell what
> you add to the cache so we can validate/invalidate 4.
> >>
> >> Sanne
> >> ___
> >> infinispan-dev mailing list
> >> infinispan-dev@lists.jboss.org
> >> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> >
> > Cheers,
> > --
> > Mircea Markus
> > Infinispan lead (www.infinispan.org)
> >
> >
> >
> >
> >
> > ___
> > infinispan-dev mailing list
> > infinispan-dev@lists.jboss.org
> > https://lists.jboss.org/mailman/listinfo/infinispan-dev
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] [hibernate-dev] HSEARCH-1296

2013-04-15 Thread Dan Berindei
On Sat, Apr 13, 2013 at 3:02 AM, Sanne Grinovero wrote:

> that's right, as suggested by Emmanuel I plan to separate the JGroups
> Sync/Async options from the worker.execution property so you can play
> with the two independently.
> I think the JGroups option's default could depend on the backend - if
> not otherwise specified, and if we all agree it doesn't make it too
> confusing.
>
> @All, the performance problem seemed to be caused by a problem in
> JGroups, which I've logged here:
> https://issues.jboss.org/browse/JGRP-1617
>
> For the record, the first operation was indeed triggering some lazy
> initialization of indexes, which in turn would trigger a Lucene
> Directory being started, triggering 3 Cache starts which in turn would
> trigger 6 state transfer processes: so indeed the first operation
> would not be exactly "cheap" performance wise, still this would
> complete in about 120 milliseconds.
> The same cost is paid again when the second node is hit the first
> time, after that index write operations block the writer for <1ms (not
> investigated further on potential throughput).
>
> Not being sure about the options of depending to a newer JGroups
> release or the complexity of a fix, I'll implement a workaround in
> HSearch in the scope of HSEARCH-1296.
>
> As a lesson learned, I think we need to polish some of our TRACE level
> messages to include the cache name:



Sanne, we already push the cache name in the NDC if trace is enabled for
the "entry point" of the thread. So for your purpose, I think enabling
trace for org.infinispan.interceptors.InvocationContextInterceptor and
including %x in your pattern layout should work.



> to resolve this we had not just
> many threads and components but also 4 of them where using JGroups
> (interleaving messages of all sorts) and 9 different caches where
> involved for each simple write operation in CD: made it interesting to
> figure what was going on! Also I'm wondering how hard it would be to
> have a log parser which converts my 10GB of text log from today in a
> graphical sequence diagram.
> Big thanks to Mircea who helped me figuring this out.
>
> Sanne
>
> On 12 April 2013 21:10, Ales Justin  wrote:
> > I think we need more fine-grained config for this new JGroups sync
> feature.
> >
> > I added this to our cache config
> >
> > <property name="hibernate.search.default.worker.execution">async</property>
> >
> > and it broke our tests.
> >
> > Where previous (old / non JGroups sync) behavior worked.
> >
> > It of course also works  without this async config,
> > but in this case we don't need sync / ACK JGroups message.
> > (we didn't have one before and it worked ;-)
> >
> > -Ales
> >
> > On Apr 11, 2013, at 11:51 PM, Sanne Grinovero 
> wrote:
> >
> >> There is a "blackhole" indexing backend, which pipes all indexing
> >> requests > /dev/null
> >>
> >> Set this as an Infinispan Query configuration property:
> >>
> >>default.worker.backend = blackhole
> >>
> >> Of course that means that the index will not be updated: you might
> >> need to adapt your test to tolerate that, but the point is not
> >> functional testing but to verify how much the SYNC option on the
> >> JGroups backend is actually slowing you down. I suspect the
> >> performance penalty is not in the network but in the fact you're now
> >> waiting for the index operations, while in async you were not waiting
> >> for them to be flushed.
> >>
> >> If you can identify which part is slow, then we can help you with
> >> better configuration options.
> >>
> >>
> >> On 11 April 2013 20:47, Ales Justin  wrote:
> >>> What do you mean?
> >>>
> >>> On Apr 11, 2013, at 21:41, Sanne Grinovero 
> wrote:
> >>>
> >>> You could try the new sync version but setting the blackhole backend
> on the
> >>> master node to remove the indexing overhead from the picture.
> >>>
> >>> On Apr 11, 2013 8:39 PM, "Sanne Grinovero" 
> wrote:
> 
>  Are you sure that the async version actually had applied all writes
> to the
>  index in the measured interval?
> 
>  On Apr 11, 2013 8:13 PM, "Ales Justin"  wrote:
> >
> > Although this change fixes query lookup,
> > it adds horrible performance:
> >
> > Running CapeDwarf cluster QueryTest:
> >
> > with HSEARCH-1296
> >
> > 21:00:27,188 INFO
> > [org.hibernate.search.indexes.impl.DirectoryBasedIndexManager]
> > (http-/192.168.1.102:8080-1) HSEARCH000168: Serialization service
> Avro
> > SerializationProvider v1.0 being used for index
> > 'default_capedwarf-test__com.google.appengine.api.datastore.Entity'
> > 21:01:17,911 INFO  [org.jboss.web] (ServerService Thread Pool -- 49)
> > JBAS018224: Unregister web context: /capedwarf-tests
> >
> > 50sec
> >
> > old 4.2.0.Final HS
> >
> > 21:08:19,988 INFO
> > [org.hibernate.search.indexes.impl.DirectoryBasedIndexManager]
> > (http-/192.168.1.102:8080-2) HSEARCH000168: Serialization service
> Avro
> > Serial

Re: [infinispan-dev] [hibernate-dev] HSEARCH-1296

2013-04-15 Thread Dan Berindei
On Sat, Apr 13, 2013 at 2:42 PM, Sanne Grinovero wrote:

> On 13 April 2013 11:20, Bela Ban  wrote:
> >
> >
> > On 4/13/13 2:02 AM, Sanne Grinovero wrote:
> >
> >> @All, the performance problem seemed to be caused by a problem in
> >> JGroups, which I've logged here:
> >> https://issues.jboss.org/browse/JGRP-1617
> >
> >
> > Almost no information attached to the case :-( If it wasn't you, Sanne,
> > I'd outright reject the case ...
>
> I wouldn't blame you, and am sorry for the lack of details: as I said
> it was very late, still I preferred to share the observations we made
> so far.
>
> From all the experiments we made - and some good logs I'll cleanup for
> sharing - it's clear that the thread is not woken up while the ACK was
> already received.
> And of course I wouldn't expect this to fail in a simple test as it
> wouldn't have escaped you ;-) or at least you would have had earlier
> reports.
>
> There are lots of complex moving parts in this scenario: from a Muxed
> JGroups Channel, and the Application Server responsible for
> initializing the stack with some added magic from CapeDwarf itself:
> it's not clear to me what configuration is exactly being used, for
> one.
>
>
Does CD also change the JGroups configuration? I thought it only tweaks the
Infinispan cache configuration on deployment, and the JGroups channel is
already started by the time the CD application is deployed.



> Without a testcase we might not be 100% sure but it seems likely to be
> an unexpected behaviour in JGroups, at least under some very specific
> setup.
>
>
> I'm glad to help tracking down more details of what could trigger
> this, but I'm not too eager to write a full unit test for this as it
> involves a lot of other components, and by mocking my own components
> out I could still reproduce it: it's not Hibernate Search, so I'll
> need the help from the field experts.
>
> Also I suspect a test would need to depend on many more components: is
> JGroups having an easy way to manage dependencies nowadays?
>
> some more inline:
>
> >
> > The MessageDispatcher will *not* wait until the timeout kicks in, it'll
> > return as soon as it has acks from all members of the target set. This
> > works and is covered with a bunch of unit tests, so a regression would
> > have been caught immediately.
>
> I don't doubt the "vanilla scenario", but this is what happens in the
> more complex case of the CapeDwarf setup.
>
>
My first guess would be that the MuxRpcDispatcher on the second node hasn't
started yet by the time you call castMessage on the first node. It could be
that your workaround just delayed the message a little bit, until the
MuxRpcDispatcher on the other node actually started (because the JChannel
is already started on both nodes, but as long as the MuxRpcDispatcher isn't
started on the 2nd node it won't send any responses back).



> >
> > I attached a test program to JGRP-1617 which shows that this feature
> > works correctly.
> >
> > Of course, if you lose an ack (e.g. due to a maxed out incoming / OOB
> > thread pool), the unicast protocol will have to retransmit the ack until
> > it has been received. Depending on the unicast protocol you use, this
> > will be immediate (UNICAST, UNICAST3), or based on a stability interval
> > (UNICAST2).
>
> Right it's totally possible this is a stack configuration problem in the
> AS.
> I wouldn't be the best to ask that though, I don't even understand the
> configuration format.
>
>
You can get the actual JGroups configuration with
channel.getProtocolStack().printProtocolSpecAsXml(), but I wouldn't expect
you to find any surprises there: they should use pretty much the JGroups
defaults.

By default STABLE.desired_avg_gossip is 20s and STABLE.stability_delay is
6s, so even if the message was lost it should take < 30s for the message to
be resent.



>
> >> For the record, the first operation was indeed triggering some lazy
> >> initialization of indexes, which in turn would trigger a Lucene
> >> Directory being started, triggering 3 Cache starts which in turn would
> >> trigger 6 state transfer processes: so indeed the first operation
> >> would not be exactly "cheap" performance wise, still this would
> >> complete in about 120 milliseconds.
> >
> > This sounds very low for the work you describe above. I don't think 6
> > state transfers can be completed in 120ms, unless they're async (but
> > then that means they're not done when you return). Also, cache starts
> > (wrt JGroups) will definitely take more than a few seconds if you're the
> > first cluster node...
>
> It's a unit test: the caches are initially empty and networking is
> loopback,
> on the second round some ~6 elements are in the cache, no larger than
> ~10 character strings.
> Should be reasonable?
>
>
Yes, I think it's reasonable, if the JChannel was already started before
the CD application was deployed. Starting the first JChannel would take at
least 3s, which is the default PING.timeout.



> >> Not being sure abo

Re: [infinispan-dev] [hibernate-dev] HSEARCH-1296

2013-04-15 Thread Dan Berindei
On Mon, Apr 15, 2013 at 1:30 PM, Sanne Grinovero wrote:

> I've attached the logs the the JIRA.
>
> Some replies inline:
>
> On 15 April 2013 11:04, Dan Berindei  wrote:
> >
> >
> >
> > On Sat, Apr 13, 2013 at 2:42 PM, Sanne Grinovero 
> > wrote:
> >>
> >> On 13 April 2013 11:20, Bela Ban  wrote:
> >> >
> >> >
> >> > On 4/13/13 2:02 AM, Sanne Grinovero wrote:
> >> >
> >> >> @All, the performance problem seemed to be caused by a problem in
> >> >> JGroups, which I've logged here:
> >> >> https://issues.jboss.org/browse/JGRP-1617
> >> >
> >> >
> >> > Almost no information attached to the case :-( If it wasn't you,
> Sanne,
> >> > I'd outright reject the case ...
> >>
> >> I wouldn't blame you, and am sorry for the lack of details: as I said
> >> it was very late, still I preferred to share the observations we made
> >> so far.
> >>
> >> From all the experiments we made - and some good logs I'll cleanup for
> >> sharing - it's clear that the thread is not woken up while the ACK was
> >> already received.
> >> And of course I wouldn't expect this to fail in a simple test as it
> >> wouldn't have escaped you ;-) or at least you would have had earlier
> >> reports.
> >>
> >> There are lots of complex moving parts in this scenario: from a Muxed
> >> JGroups Channel, and the Application Server responsible for
> >> initializing the stack with some added magic from CapeDwarf itself:
> >> it's not clear to me what configuration is exactly being used, for
> >> one.
> >>
> >
> > Does CD also change the JGroups configuration? I thought it only tweaks
> the
> > Infinispan cache configuration on deployment, and the JGroups channel is
> > already started by the time the CD application is deployed.
>
> CD uses a custom AS build and a custom AS configuration, so anything
> could be different.
> On top of that, some things are reconfigured programmatically by it.
>
>
Ales already cleared this out, CD doesn't change the JGroups config at all.


> >> Without a testcase we might not be 100% sure but it seems likely to be
> >> an unexpected behaviour in JGroups, at least under some very specific
> >> setup.
> >>
> >>
> >> I'm glad to help tracking down more details of what could trigger
> >> this, but I'm not too eager to write a full unit test for this as it
> >> involves a lot of other components, and by mocking my own components
> >> out I could still reproduce it: it's not Hibernate Search, so I'll
> >> need the help from the field experts.
> >>
> >> Also I suspect a test would need to depend on many more components: is
> >> JGroups having an easy way to manage dependencies nowadays?
> >>
> >> some more inline:
> >>
> >> >
> >> > The MessageDispatcher will *not* wait until the timeout kicks in,
> it'll
> >> > return as soon as it has acks from all members of the target set. This
> >> > works and is covered with a bunch of unit tests, so a regression would
> >> > have been caught immediately.
> >>
> >> I don't doubt the "vanilla scenario", but this is what happens in the
> >> more complex case of the CapeDwarf setup.
> >>
> >
> > My first guess would be that the MuxRpcDispatcher on the second node
> hasn't
> > started yet by the time you call castMessage on the first node. It could
> be
> > that your workaround just delayed the message a little bit, until the
> > MuxRpcDispatcher on the other node actually started (because the
> JChannel is
> > already started on both nodes, but as long as the MuxRpcDispatcher isn't
> > started on the 2nd node it won't send any responses back).
>
> Before the point in which Search uses the dispatcher, many more
> operations happened succesfully and with a reasonable timing:
> especially some transactions on Infinispan stored entries quickly and
> without trouble.
>
> Besides if such a race condition would be possible, I would consider
> it a critical bug.
>
>
I looked at the muxer code and I think they actually take care of this
already: MuxUpHandler returns a NoMuxHandler response when it can't find an
appropriate MuxedRpcDispatcher, and MessageDispatcher counts that response
against the number of expected responses. So it must be something else...

Re: [infinispan-dev] [hibernate-dev] HSEARCH-1296

2013-04-15 Thread Dan Berindei
Sorry for missing your message, Ales!

Anyway, good news, we found out why the test was taking so long: the
Message instance passed to dispatcher.cast() already had a destination
address set, and JGroups only sent the message to that address, even though
the dispatcher was waiting for a reply from the local node as well.

Sanne has verified that setting the destination in the message to null
fixes things, and I have been able to verify this by modifying Bela's test.

Sanne, a few notes:
1) The cast() call doesn't throw a TimeoutException if one of the targets
didn't reply - if you want to throw an exception, you need to check
wasReceived() on each element of the responses list (see the short sketch
after these notes).
2) For ChannelMessageSender as well, channel.send() may throw a
TimeoutException or not - depending on the value of
RSVP.throw_exception_on_timeout. Because of this and the potential conflict
with Infinispan on RSVP.ack_on_delivery, I would strongly recommend using
the DispatcherMessageSender all the time.
3) Because Infinispan calls channel.setDiscardOwnMessages(false) in
5.3.0.Alpha1, and channel.setDiscardOwnMessages(true) in all previous
versions, whether the local node receives a broadcast message depends on
the Infinispan version running on the same channel. If you don't actually
need the local node to process the message, you should use
options.setExclusionList(dispatcher.getChannel().getAddress()) to make it
obvious. If you do need the local node to process the message, you may need
to process the message yourself when channel.getDiscardOwnMessages()
returns true.
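For point 1 (and the dest == null fix above), the check would look roughly like this.
This is the JGroups 3.x API as I remember it, and the exception type/message are just
an example, not necessarily what the HSearch backend should throw:

   import java.util.Map;
   import java.util.concurrent.TimeoutException;
   import org.jgroups.Address;
   import org.jgroups.Message;
   import org.jgroups.blocks.MessageDispatcher;
   import org.jgroups.blocks.RequestOptions;
   import org.jgroups.util.Rsp;
   import org.jgroups.util.RspList;

   public class CastWithTimeoutCheck {
      public static void castAndCheck(MessageDispatcher dispatcher, Message message)
            throws Exception {
         // Don't leave a unicast destination set on a message meant to be broadcast.
         message.setDest(null);
         RspList<Object> responses = dispatcher.castMessage(null, message, RequestOptions.SYNC());
         for (Map.Entry<Address, Rsp<Object>> entry : responses.entrySet()) {
            if (!entry.getValue().wasReceived()) {
               throw new TimeoutException("No reply from " + entry.getKey() + " within the timeout");
            }
         }
      }
   }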




On Mon, Apr 15, 2013 at 3:44 PM, Ales Justin  wrote:

> Looking at your workaround, I think you actually set the response mode to
> GET_NONE (because that's the default value in RequestOptions), so you're
> back to sending an asynchronous request.
>
>
> That was my question as well:
>
> Shouldn't this "synchronous" flag still be used?
>
>
> https://github.com/Sanne/hibernate-search/blob/077f29c245d2d6e960cd6ab59ff58752320d5658/hibernate-search-engine/src/main/java/org/hibernate/search/backend/impl/jgroups/DispatcherMessageSender.java#L57
>
> e.g.
> RequestOptions options;
> if (synchronous) {
> int size = dispatcher.getChannel().getView().getMembers().size();
> options = RequestOptions.SYNC();
> options.setRspFilter( new WaitAllFilter( size ) );
> } else {
> options = RequestOptions.ASYNC();
> }
>
>
>
>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] ISPN-263 and handling partitions

2013-04-17 Thread Dan Berindei
On Wed, Apr 17, 2013 at 1:28 PM, Bela Ban  wrote:

> Well, first of all, we won't *have* any conflicting topology IDs, as the
> minority partitions don't change them after becoming minority.
>
>
We don't have the notion of "conflicting topology IDs" with the current
algorithm, either. After a merge, it doesn't matter which partition had the
highest topology id before, we just pick a topology id that we know wasn't
used in any of the partitions.
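
In code, the rule Adrian describes below boils down to something like this
(hypothetical sketch, not the actual ClusterTopologyManager code):

import java.util.Collection;

import org.infinispan.topology.CacheTopology;

public class MergeTopologyIdExample {
   // Pick an id strictly higher than anything any partition could have used,
   // including a pending rebalance - hence the +2 rather than +1.
   static int computeMergeTopologyId(Collection<CacheTopology> partitionTopologies) {
      int max = 0;
      for (CacheTopology topology : partitionTopologies) {
         max = Math.max(max, topology.getTopologyId());
      }
      return max + 2;
   }
}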

Then we assume that each node has latest data in the segments that it owned
in its pre-merge consistent hash. Obviously, if any value changed while the
partitions were separated, we would have lost consistency - hence the chaos
that Adrian mentioned.



> Secondly, we can end up with the coordinator of a minority partition
> becoming the coordinator of the new merged partition, so we shouldn't
> rely on that (but I don't think we do so anyway?).
>
>
No, we don't care what partition the merge coordinator was in, we treat all
partitions the same (with some extra work for overlapping partitions).



> On a merge, everyone knows whether it came from a minority or majority
> partition, and the algorithm for state transfer should always clear the
> state in members in the minority partition and overwrite it from members
> of the primary partition.
>

Actually, the merge coordinator is the only one that has to know which node
is from a minority or a majority partition.

I like the idea of always clearing the state in members of the minority
partition(s), but one problem with that is that there may be some keys that
only had owners in the minority partition(s). If we wiped the state of the
minority partition members, those keys would be lost.

Of course, you could argue that the cluster already lost those keys when we
allowed the majority partition to continue working without having those
keys... We could also rely on the topology information, and say that we
only support partitioning when numOwners >= numSites (or numRacks, if there
is only one site, or numMachines, if there is a single rack).

One other option is to perform a more complicated post-merge state
transfer, in which each partition sends all the data it has to all the
other partitions, and on the receiving end each node has a "conflict
resolution" component that can merge two values. That is definitely more
complicated than just going with a primary partition, though.
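
To be clear about what I mean by a "conflict resolution" component, think of
a completely hypothetical interface like this (nothing of the sort exists
today):

public interface ConflictResolver<K, V> {
   // Called during post-merge state transfer when the same key has diverging
   // values in two partitions; returns the value that should survive.
   V merge(K key, V localValue, V remoteValue);
}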

One final point... when a node comes back online and it has a local cache
store, it is very much as if we had a merge view. The current approach is
to join as if the node didn't have any data, then delete everything from
the cache store that is not mapped to the node in the consistent hash.
Obviously that can lead to consistency problems, just like our current
merge algorithm. It would be nice if we could handle both these cases the
same way.



> On 4/17/13 10:58 AM, Radim Vansa wrote:
> > And the nice behaviour is that if we have partitions P1 and P2 with
> latest common topology 20, when P2 increased its topology to, say 40,
> while P1 only to 30, when a new coordinator from P1 will be elected it will
> try to compare these topology ids directly (assuming which one is newer or
> older) which won't end up well.
> >
> > Radim
> >
> > - Original Message -
> > | From: "Adrian Nistor" 
> > | To: "infinispan -Dev List" 
> > | Cc: "Manik Surtani" 
> > | Sent: Wednesday, April 17, 2013 10:31:39 AM
> > | Subject: Re: [infinispan-dev] ISPN-263 and handling partitions
> > |
> > | In case of MergeView the cluster topology manager running on (the new)
> > | coordinator will request the current cache topology from all members
> and
> > | will compute a new topology as the union of all. The new topology id is
> > | computed as the max + 2 of the existing topology ids. Any currently
> > | pending rebalance in any subpartition is ended now and a new rebalance
> > | is triggered for the new cluster. No data version conflict resolution
> is
> > | performed => chaos :)
> > |
> > | On 04/16/2013 10:05 PM, Manik Surtani wrote:
> > | > Guys - I've started documenting this here [1] and will put together a
> > | > prototype this week.
> > | >
> > | > One question though, perhaps one for Dan/Adrian - is there any
> special
> > | > handling for state transfer if a MergeView is detected?
> > | >
> > | > - M
> > | >
> > | > [1]
> https://community.jboss.org/wiki/DesignDealingWithNetworkPartitions
> > | >
> > | > On 6 Apr 2013, at 04:26, Bela Ban  wrote:
> > | >
> > | >>
> > | >> On 4/5/13 3:53 PM, Manik Surtani wrote:
> > | >>> Guys,
> > | >>>
> > | >>> So this is what I have in mind for this, looking for opinions.
> > | >>>
> > | >>> 1.  We write a SplitBrainListener which is registered when the
> > | >>> channel connects.  The aim of this listener is to identify when we
> > | >>> have a partition.  This can be identified when a view change is
> > | >>> detected, and the new view is significantly smaller than the old
> > | >>> view.  Easier to detect for large

Re: [infinispan-dev] ISPN-263 and handling partitions

2013-04-18 Thread Dan Berindei
On Wed, Apr 17, 2013 at 5:53 PM, Manik Surtani  wrote:

>
> On 17 Apr 2013, at 08:23, Dan Berindei  wrote:
>
> I like the idea of always clearing the state in members of the minority
> partition(s), but one problem with that is that there may be some keys that
> only had owners in the minority partition(s). If we wiped the state of the
> minority partition members, those keys would be lost.
>
>
> Right, this is my concern with such a wipe as well.
>
> Of course, you could argue that the cluster already lost those keys when
> we allowed the majority partition to continue working without having those
> keys... We could also rely on the topology information, and say that we
> only support partitioning when numOwners >= numSites (or numRacks, if there
> is only one site, or numMachines, if there is a single rack).
>
>
> This is only true for an embedded app.  For an app communicating with the
> cluster over Hot Rod, this isn't the case as it could directly read from
> the minority partition.
>
>
For that to happen, the client would have to be able to keep two (or more)
active consistent hashes at the same time. I think, at least in the first
phase, the servers in a minority partition should send a "look somewhere
else" response to any request from the client, so that it installs the
topology update of the majority partition and not the topology of one of
the minority partitions.

One other option is to perform a more complicated post-merge state
> transfer, in which each partition sends all the data it has to all the
> other partitions, and on the receiving end each node has a "conflict
> resolution" component that can merge two values. That is definitely more
> complicated than just going with a primary partition, though.
>
>
> That sounds massively expensive.  I think the right solution at this point
> is entry versioning using vector clocks and the vector clocks are exchanged
> and compared during a merge.  Not the entire dataset.
>
>
True, it would be very expensive. I think for many applications just
selecting a winner should be fine, though, so it might be worth
implementing this algorithm with the versioning support we already have as
a POC.

One final point... when a node comes back online and it has a local cache
> store, it is very much as if we had a merge view. The current approach is
> to join as if the node didn't have any data, then delete everything from
> the cache store that is not mapped to the node in the consistent hash.
> Obviously that can lead to consistency problems, just like our current
> merge algorithm. It would be nice if we could handle both these cases the
> same way.
>
>
> +1
>
>

I've been thinking about this some more... the problem with the local cache
store is that nodes don't necessarily start in the same order in which they
were shut down. So you might have enough nodes for a cluster to consider
itself "available", but only a slight overlap with the cluster as it looked
the last time it was "available" - so you would have stale data.

We might be able to save the topology on shutdown and block startup until
the same nodes that were in the last "available" partition are all up, but
it all sounds a bit fragile.

Cheers
Dan
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] CHM or CHMv8?

2013-04-19 Thread Dan Berindei
+1 to make CHMv8 the default on JDK6 and JDK7

But I'm not convinced we should make it the default for JDK8 - even though
we don't know exactly what we're getting with the JDK's implementation.


On Fri, Apr 19, 2013 at 5:39 AM, David M. Lloyd wrote:

> On 04/18/2013 09:35 PM, Manik Surtani wrote:
> > Guys,
> >
> > Based on some recent micro benchmarks I've been doing, I've seen:
> >
> > MapStressTest configuration: capacity 10, test running time 60
> seconds
> > Testing mixed read/write performance with capacity 100,000, keys
> 300,000, concurrency level 32, threads 12, read:write ratio 0:1
> > Container CHM   Ops/s 21,165,771.67  Gets/s   0.00  Puts/s
> 21,165,771.67  HitRatio 100.00  Size262,682  stdDev 77,540.73
> > Container CHMV8 Ops/s 33,513,807.09  Gets/s   0.00  Puts/s
> 33,513,807.09  HitRatio 100.00  Size262,682  stdDev 77,540.73
> >
> > So under high concurrency (12 threads, on my workstation with 12
> hardware threads - so all threads are always working), we see that
> Infinispan's CHMv8 implementation is 50% faster than JDK6's CHM
> implementation when doing puts.
> >
> > We use a fair number of CHMs all over Infinispan's codebase.  By
> default, these are all JDK-provided CHMs.  But we have the option to switch
> to our CHMv8 implementation by passing in
> -Dinfinispan.unsafe.allow_jdk8_chm=true.
> >
> > The question is, should this be the default?  Thoughts, opinions?
>
> The JDK's concurrency code - especially CHM - changes all the time.
> You'd be very well-served, in my opinion, to go with something like
> CHMv8 just because you could be so much more sure that you'll have more
> consistent (and possibly better, but definitely more consistent)
> performance across all JVMs, instead of being at the mercy of whatever
> particular implementation happens to run on whatever JVM.
>
>
> --
> - DML
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] CHM or CHMv8?

2013-04-19 Thread Dan Berindei
Testing mixed read/write performance with capacity 10, keys 30,
concurrency level 32, threads 12, read:write ratio 99:1
Container CHM   Ops/s 5178894.77  Gets/s 5127105.82  Puts/s
51788.95  HitRatio  86.23  Size 177848  stdDev   60896.42
Container CHMV8 Ops/s 5768824.37  Gets/s 5711136.13  Puts/s
57688.24  HitRatio  84.72  Size 171964  stdDev   60249.99

The test is probably limited by the 1% writes, but I think it does show
that reads in CHMV8 are not slower than reads in OpenJDK7's CHM.
I haven't measured it, but the memory footprint should also be better,
because it doesn't use segments any more.

AFAIK CHMV8 also uses copy-on-write at the bucket level, but we
could definitely do a pure read test with a HashMap to see how big the
performance difference is.
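
A trivial single-threaded loop like the one below would already give a first
idea; for real numbers we'd extend MapStressTest instead. (I'm assuming the
backported class is
org.infinispan.util.concurrent.jdk8backported.ConcurrentHashMapV8 - adjust
the import if it lives elsewhere.)

import java.util.HashMap;
import java.util.Map;
import java.util.Random;

import org.infinispan.util.concurrent.jdk8backported.ConcurrentHashMapV8;

public class PureReadCheck {
   private static final int KEYS = 300000;
   private static final int READS = 10000000;

   public static void main(String[] args) {
      measure("HashMap", fill(new HashMap<Integer, Integer>()));
      measure("CHMV8", fill(new ConcurrentHashMapV8<Integer, Integer>()));
   }

   private static Map<Integer, Integer> fill(Map<Integer, Integer> map) {
      for (int i = 0; i < KEYS; i++) {
         map.put(i, i);
      }
      return map;
   }

   private static void measure(String name, Map<Integer, Integer> map) {
      Random random = new Random(17);
      long checksum = 0;
      long start = System.nanoTime();
      for (int i = 0; i < READS; i++) {
         checksum += map.get(random.nextInt(KEYS));
      }
      long durationNanos = System.nanoTime() - start;
      System.out.printf("%s: %.1f ns/get (checksum %d)%n",
            name, (double) durationNanos / READS, checksum);
   }
}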




On Fri, Apr 19, 2013 at 11:07 AM, Sanne Grinovero wrote:

> Why not. Only doubt I'd have is that other usages of the CHM are - I guess
> - services registry and similar configuration tools, for which write
> performance is irrelevant: your test measured puts, are there drawbacks on
> gets or memory usage?
>
> Recently you changed all (most?) CHM creations to use a consistent
> factory, maybe we could improve on that by actually using a couple of
> factories which differentiate on the intended usage of the CHM: for example
> some maps who change very infrequently - mostly during boot or
> reconfiguration, maybe even topology change - could be better served by a
> non concurrent structure using copy-on-write.
>
> Sanne
> On 19 Apr 2013 08:48, "Dan Berindei"  wrote:
>
>> +1 to make CHMv8 the default on JDK6 and JDK7
>>
>> But I'm not convinced we should make it the default for JDK8 - even
>> though we don't know exactly what we're getting with the JDK's
>> implementation.
>>
>>
>> On Fri, Apr 19, 2013 at 5:39 AM, David M. Lloyd 
>> wrote:
>>
>>> On 04/18/2013 09:35 PM, Manik Surtani wrote:
>>> > Guys,
>>> >
>>> > Based on some recent micro benchmarks I've been doing, I've seen:
>>> >
>>> > MapStressTest configuration: capacity 10, test running time 60
>>> seconds
>>> > Testing mixed read/write performance with capacity 100,000, keys
>>> 300,000, concurrency level 32, threads 12, read:write ratio 0:1
>>> > Container CHM   Ops/s 21,165,771.67  Gets/s   0.00  Puts/s
>>> 21,165,771.67  HitRatio 100.00  Size262,682  stdDev 77,540.73
>>> > Container CHMV8 Ops/s 33,513,807.09  Gets/s   0.00  Puts/s
>>> 33,513,807.09  HitRatio 100.00  Size262,682  stdDev 77,540.73
>>> >
>>> > So under high concurrency (12 threads, on my workstation with 12
>>> hardware threads - so all threads are always working), we see that
>>> Infinispan's CHMv8 implementation is 50% faster than JDK6's CHM
>>> implementation when doing puts.
>>> >
>>> > We use a fair number of CHMs all over Infinispan's codebase.  By
>>> default, these are all JDK-provided CHMs.  But we have the option to switch
>>> to our CHMv8 implementation by passing in
>>> -Dinfinispan.unsafe.allow_jdk8_chm=true.
>>> >
>>> > The question is, should this be the default?  Thoughts, opinions?
>>>
>>> The JDK's concurrency code - especially CHM - changes all the time.
>>> You'd be very well-served, in my opinion, to go with something like
>>> CHMv8 just because you could be so much more sure that you'll have more
>>> consistent (and possibly better, but definitely more consistent)
>>> performance across all JVMs, instead of being at the mercy of whatever
>>> particular implementation happens to run on whatever JVM.
>>>
>>>
>>> --
>>> - DML
>>> ___
>>> infinispan-dev mailing list
>>> infinispan-dev@lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>
>>
>>
>> ___
>> infinispan-dev mailing list
>> infinispan-dev@lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>
>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] CHM or CHMv8?

2013-04-19 Thread Dan Berindei
On Fri, Apr 19, 2013 at 12:58 PM, Sanne Grinovero wrote:

> On 19 April 2013 10:37, Dan Berindei  wrote:
> > Testing mixed read/write performance with capacity 10, keys 30,
> > concurrency level 32, threads 12, read:write ratio 99:1
> > Container CHM   Ops/s 5178894.77  Gets/s 5127105.82  Puts/s
> > 51788.95  HitRatio  86.23  Size 177848  stdDev   60896.42
> > Container CHMV8 Ops/s 5768824.37  Gets/s 5711136.13  Puts/s
> > 57688.24  HitRatio  84.72  Size 171964  stdDev   60249.99
>
> Nice, thanks.
> >
> > The test is probably limited by the 1% writes, but I think it does show
> that
> > reads in CHMV8 are not slower than reads in OpenJDK7's CHM.
> > I haven't measured it, but the memory footprint should also be better,
> > because it doesn't use segments any more.
> >
> > AFAIK the memoryCHMV8 also uses copy-on-write at the bucket level, but we
> > could definitely do a pure read test with a HashMap to see how big the
> > performance difference is.
>
> By copy-on-write I didn't mean on the single elements, but on the
> whole map instance:
>
> private volatile HashMap configuration;
>
> synchronized addConfigurationProperty(String, String) {
>  HashMap newcopy = new HashMap( configuration ):
>  newcopy.put(..);
>  configuration = newcopy;
> }
>
> Of course that is never going to scale for writes, but if writes stop
> at runtime after all services are started I would expect that the
> simplicity of the non-threadsafe HashMap should have some benefit over
> CHM{whatever}, or it would have been removed already?
>
>
Right, we should be able to tell whether that's worth doing with a pure
read test with a CHMV8 and a HashMap :)

But I don't think that's going to yield any difference, because all the
copy-on-write in CHMV8 adds is a few volatile reads -  and volatile reads
are more or less free on x86.
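
For reference, this is how I read the pattern you're describing, cleaned up a
bit (hypothetical names):

import java.util.HashMap;
import java.util.Map;

public class CopyOnWriteConfiguration {
   // reads are a single volatile read followed by a plain HashMap lookup
   private volatile Map<String, String> properties = new HashMap<String, String>();

   public String getProperty(String name) {
      return properties.get(name);
   }

   // writes are rare: copy the whole map under a lock and publish the copy
   public synchronized void addProperty(String name, String value) {
      Map<String, String> copy = new HashMap<String, String>(properties);
      copy.put(name, value);
      properties = copy;
   }
}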


> >
> >
> >
> >
> > On Fri, Apr 19, 2013 at 11:07 AM, Sanne Grinovero 
> > wrote:
> >>
> >> Why not. Only doubt I'd have is that other usages of the CHM are - I
> guess
> >> - services registry and similar configuration tools, for which write
> >> performance is irrelevant: your test measured puts, are there drawbacks
> on
> >> gets or memory usage?
> >>
> >> Recently you changed all (most?) CHM creations to use a consistent
> >> factory, maybe we could improve on that by actually using a couple of
> >> factories which differentiate on the intended usage of the CHM: for
> example
> >> some maps who change very infrequently - mostly during boot or
> >> reconfiguration, maybe even topology change - could be better served by
> a
> >> non concurrent structure using copy-on-wrtite.
> >>
> >> Sanne
> >>
> >> On 19 Apr 2013 08:48, "Dan Berindei"  wrote:
> >>>
> >>> +1 to make CHMv8 the default on JDK6 and JDK7
> >>>
> >>> But I'm not convinced we should make it the default for JDK8 - even
> >>> though we don't know exactly what we're getting with the JDK's
> >>> implementation.
> >>>
> >>>
> >>> On Fri, Apr 19, 2013 at 5:39 AM, David M. Lloyd <
> david.ll...@redhat.com>
> >>> wrote:
> >>>>
> >>>> On 04/18/2013 09:35 PM, Manik Surtani wrote:
> >>>> > Guys,
> >>>> >
> >>>> > Based on some recent micro benchmarks I've been doing, I've seen:
> >>>> >
> >>>> > MapStressTest configuration: capacity 10, test running time 60
> >>>> > seconds
> >>>> > Testing mixed read/write performance with capacity 100,000, keys
> >>>> > 300,000, concurrency level 32, threads 12, read:write ratio 0:1
> >>>> > Container CHM   Ops/s 21,165,771.67  Gets/s   0.00
>  Puts/s
> >>>> > 21,165,771.67  HitRatio 100.00  Size262,682  stdDev
> 77,540.73
> >>>> > Container CHMV8 Ops/s 33,513,807.09  Gets/s   0.00
>  Puts/s
> >>>> > 33,513,807.09  HitRatio 100.00  Size262,682  stdDev
> 77,540.73
> >>>> >
> >>>> > So under high concurrency (12 threads, on my workstation with 12
> >>>> > hardware threads - so all threads are always working), we see that
> >>>> > Infinispan's CHMv8 implementation is 50% faster than JDK6's CHM
> >>>> > implementation when doing puts.

Re: [infinispan-dev] Classloading issue with multiple modules in AS7

2013-04-22 Thread Dan Berindei
Do you really need to set the classloader in all the cache configurations?
I thought it was enough to set it in the global configuration.
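
Something like this is what I had in mind (builder method names from memory,
so double-check them against the 5.x configuration API):

import org.infinispan.configuration.cache.ConfigurationBuilder;
import org.infinispan.configuration.global.GlobalConfigurationBuilder;
import org.infinispan.manager.DefaultCacheManager;
import org.infinispan.manager.EmbeddedCacheManager;

public class ModularStartupExample {
   public EmbeddedCacheManager start() {
      GlobalConfigurationBuilder global = new GlobalConfigurationBuilder();
      // use the deployment's module classloader for classes and resources
      global.classLoader(ModularStartupExample.class.getClassLoader());
      return new DefaultCacheManager(global.build(), new ConfigurationBuilder().build());
   }
}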


On Fri, Apr 19, 2013 at 7:02 PM, Sanne Grinovero wrote:

> It turns out this resource loading issue is biting also community users;
>
> I had worked aroud the problem for deploymens in the AS7 modular
> classloader by wrapping the configuration parser with the "right"
> classloader:
>
> https://github.com/hibernate/hibernate-search/blob/master/hibernate-search-infinispan/src/main/java/org/hibernate/search/infinispan/impl/InfinispanConfigurationParser.java
>
> But today on IRC I had to point to this class as an example to another
> user looking to run Infinispan in an isolated classloader.
>
> Maybe we should have an (optional) Parser API which takes explicit
> classloaders ?
>
> Sanne
>
> On 5 April 2013 12:52, Galder Zamarreño  wrote:
> > I'm not an expert on this (Paul, Rado, Richard should help more…), but
> to do what you're trying to do, I suspect there might be a need to export
> the .dat files somehow? I know there's a way to export the metadata in
> META-INF/services.
> >
> > Alternatively, you could let the Infinispan subsystem create the cache
> managers directly, by configuring them in advance in the standalone.xml or
> similar. That's what happens with Hibernate Core, which does not start its
> own cache manager, but simply looks it up based on what's been configured
> by default in the standalone.xml (there's a named cache manager for it).
> AFAIK.
> >
> > Take this with a pinch of salt (some details are blurry). The experts on
> these are really AS7 guys…
> >
> > Cheers,
> >
> > On Mar 27, 2013, at 9:22 PM, Sanne Grinovero 
> wrote:
> >
> >> When starting an EmbeddedCacheManager from a different module deployed
> >> in the AS, I get this stacktrace:
> >>
> >> Caused by: org.hibernate.search.SearchException: Unable to initialize
> >> directory provider:
> >> org.hibernate.search.test.integration.jbossas7.model.Member
> >>   at
> org.hibernate.search.store.impl.DirectoryProviderFactory.createDirectoryProvider(DirectoryProviderFactory.java:87)
> >>   at
> org.hibernate.search.indexes.impl.DirectoryBasedIndexManager.createDirectoryProvider(DirectoryBasedIndexManager.java:232)
> >>   at
> org.hibernate.search.indexes.impl.DirectoryBasedIndexManager.initialize(DirectoryBasedIndexManager.java:100)
> >>   at
> org.hibernate.search.indexes.impl.IndexManagerHolder.createIndexManager(IndexManagerHolder.java:227)
> >>   ... 19 more
> >> Caused by: org.infinispan.config.ConfigurationException:
> >> org.infinispan.CacheException: Unable to load component metadata!
> >>   at
> org.infinispan.manager.DefaultCacheManager.(DefaultCacheManager.java:386)
> >>   at
> org.infinispan.manager.DefaultCacheManager.(DefaultCacheManager.java:341)
> >>   at
> org.infinispan.manager.DefaultCacheManager.(DefaultCacheManager.java:328)
> >>   at
> org.hibernate.search.infinispan.CacheManagerServiceProvider.start(CacheManagerServiceProvider.java:93)
> >>   at
> org.hibernate.search.engine.impl.StandardServiceManager$ServiceProviderWrapper.startVirtual(StandardServiceManager.java:178)
> >>   at
> org.hibernate.search.engine.impl.StandardServiceManager.requestService(StandardServiceManager.java:124)
> >>   at
> org.hibernate.search.infinispan.impl.InfinispanDirectoryProvider.initialize(InfinispanDirectoryProvider.java:86)
> >>   at
> org.hibernate.search.store.impl.DirectoryProviderFactory.createDirectoryProvider(DirectoryProviderFactory.java:84)
> >>   ... 22 more
> >> Caused by: org.infinispan.CacheException: Unable to load component
> metadata!
> >>   at
> org.infinispan.factories.components.ComponentMetadataRepo.initialize(ComponentMetadataRepo.java:131)
> >>   at
> org.infinispan.factories.GlobalComponentRegistry.(GlobalComponentRegistry.java:103)
> >>   at
> org.infinispan.manager.DefaultCacheManager.(DefaultCacheManager.java:381)
> >>   ... 29 more
> >> Caused by: java.lang.NullPointerException
> >>   at
> org.infinispan.factories.components.ComponentMetadataRepo.readMetadata(ComponentMetadataRepo.java:53)
> >>   at
> org.infinispan.factories.components.ComponentMetadataRepo.initialize(ComponentMetadataRepo.java:129)
> >>   ... 31 more
> >>
> >>
> >> The ComponentMetadataRepo is unable to load
> >> "infinispan-core-component-metadata.dat", which contains the
> >> critically-important information for wiring together the internal
> >> components of Infinispan core.
> >>
> >> Now I think this is quite silly as locating this resource is trivial:
> >> it's in the same jar as all the infinispan core classes:
> >> infinispan-core-[version].jar so patching this looks like trivial:
> >> it's using the ClassLoader configured as defaultClassLoader in
> >> org.infinispan.factories.AbstractComponentRegistry, but really it
> >> should just use something like
> >> AbstractComponentRegistry.class.getClassLoader() ?
>

Re: [infinispan-dev] CHM or CHMv8?

2013-04-22 Thread Dan Berindei
Right. If we have anywhere a map that's initialized from a single thread
and then accessed only for reading from many threads, it probably makes
sense to use a HashMap and wrap it in an UnmodifiableMap. But if it can be
written from multiple threads as well, I think we should use a CHMV8.
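
In other words, roughly:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class MapChoiceExample {
   // initialized from a single thread and read-only afterwards: a plain
   // HashMap is enough, wrapped so accidental writes fail fast
   static Map<String, String> readOnlyCopy(Map<String, String> source) {
      return Collections.unmodifiableMap(new HashMap<String, String>(source));
   }
   // anything that can be written concurrently at runtime stays on the CHMV8
   // backport (via whatever factory we standardize on)
}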

BTW, the HashMap implementation in OpenJDK 1.7 seems to have some
anti-collision features (a VM-dependent hash code generator for Strings),
but our version of CHMV8 doesn't. Perhaps we need to upgrade to the latest
CHMV8 version?



On Fri, Apr 19, 2013 at 4:32 PM, David M. Lloyd wrote:

> On 04/19/2013 08:22 AM, Sanne Grinovero wrote:
> > On 19 April 2013 13:52, David M. Lloyd  wrote:
> >> On 04/19/2013 05:17 AM, Sanne Grinovero wrote:
> >>> On 19 April 2013 11:10, Dan Berindei  wrote:
> >>>>
> >>>>
> >>>>
> >>>> On Fri, Apr 19, 2013 at 12:58 PM, Sanne Grinovero <
> sa...@infinispan.org>
> >>>> wrote:
> >>>>>
> >>>>> On 19 April 2013 10:37, Dan Berindei  wrote:
> >>>>>> Testing mixed read/write performance with capacity 10, keys
> 30,
> >>>>>> concurrency level 32, threads 12, read:write ratio 99:1
> >>>>>> Container CHM   Ops/s 5178894.77  Gets/s 5127105.82  Puts/s
> >>>>>> 51788.95  HitRatio  86.23  Size 177848  stdDev   60896.42
> >>>>>> Container CHMV8 Ops/s 5768824.37  Gets/s 5711136.13  Puts/s
> >>>>>> 57688.24  HitRatio  84.72  Size 171964  stdDev   60249.99
> >>>>>
> >>>>> Nice, thanks.
> >>>>>>
> >>>>>> The test is probably limited by the 1% writes, but I think it does
> show
> >>>>>> that
> >>>>>> reads in CHMV8 are not slower than reads in OpenJDK7's CHM.
> >>>>>> I haven't measured it, but the memory footprint should also be
> better,
> >>>>>> because it doesn't use segments any more.
> >>>>>>
> >>>>>> AFAIK the memoryCHMV8 also uses copy-on-write at the bucket level,
> but
> >>>>>> we
> >>>>>> could definitely do a pure read test with a HashMap to see how big
> the
> >>>>>> performance difference is.
> >>>>>
> >>>>> By copy-on-write I didn't mean on the single elements, but on the
> >>>>> whole map instance:
> >>>>>
> >>>>> private volatile HashMap configuration;
> >>>>>
> >>>>> synchronized addConfigurationProperty(String, String) {
> >>>>>HashMap newcopy = new HashMap( configuration ):
> >>>>>newcopy.put(..);
> >>>>>configuration = newcopy;
> >>>>> }
> >>>>>
> >>>>> Of course that is never going to scale for writes, but if writes stop
> >>>>> at runtime after all services are started I would expect that the
> >>>>> simplicity of the non-threadsafe HashMap should have some benefit
> over
> >>>>> CHM{whatever}, or it would have been removed already?
> >>>>>
> >>>>
> >>>> Right, we should be able to tell whether that's worth doing with a
> pure read
> >>>> test with a CHMV8 and a HashMap :)
> >>>
> >>> IFF you find out CHMV8 is as good as HashMap for read only, you have
> >>> two options:
> >>>- ask the JDK team to drop the HashMap code as it's no longer needed
> >>>- fix your benchmark :-P
> >>>
> >>> In other words, I'd consider it highly surprising and suspicious
> >>> (still interesting though!)
> >>
> >> It's not as surprising as you think.  On x86, volatile reads are the
> >> same as regular reads (not counting some possible reordering magic).  So
> >> if a CHM read is a hash, an array access, and a list traversal, and so
> >> is HM (and I believe this is true though I'd have to review the code
> >> again to be sure), I'd expect very similar execution performance on
> >> read.  I think some of the anti-collision features in V8 might come into
> >> play under some circumstances though which might affect performance in a
> >> negative way (wrt the constant big-O component) but overall in a
> >> positive way (by turning the linear big-O component into a logarithmic
> one).
> >

Re: [infinispan-dev] CHM or CHMv8?

2013-04-22 Thread Dan Berindei
On Mon, Apr 22, 2013 at 2:37 PM, Sanne Grinovero wrote:

> We also have been toying with the idea to hash each key only once,
> instead of both with the consistent hash (to assign the node owner)
> and once in the CHM backing the datacontainer.
> I doubt we need the datacontainer to implement Map at all, but at
> least if we go this way we don't want the hash to be affected by the
> VM instance or different nodes won't agree on the expected owner ;-)
>
>
For consistent hashing it would probably be better to cache the hash after
applying MurmurHash to it anyway. So we could in theory hack our CHMV8 to
use a cached hash code computed with MurmurHash and a cluster-specific salt.



> Also there were reports of it having a very bad impact on
> performance, I'm not sure if they were resolved yet, or are going to
> be resolved at all as it was important for security reasons.
>
>
I suspect most of the performance impact came from no longer using the
cached hash code in the String class. And since String.hashCode() isn't
allowed to change, that isn't going to change any time soon.



>
> On 22 April 2013 12:19, Dan Berindei  wrote:
> > Right. If we have anywhere a map that's initialized from a single thread
> and
> > then accessed only for reading from many threads, it probably makes
> sense to
> > use a HashMap and wrap it in an UnmodifiableMap. But if it can be written
> > from multiple threads as well, I think we should use a CHMV8.
> >
> > BTW, the HashMap implementation in OpenJDK 1.7 seems to have some
> > anti-collision features (a VM-dependent hash code generator for Strings),
> > but our version of CHMV8 doesn't. Perhaps we need to upgrade to the
> latest
> > CHMV8 version?
> >
> >
> >
> > On Fri, Apr 19, 2013 at 4:32 PM, David M. Lloyd 
> > wrote:
> >>
> >> On 04/19/2013 08:22 AM, Sanne Grinovero wrote:
> >> > On 19 April 2013 13:52, David M. Lloyd 
> wrote:
> >> >> On 04/19/2013 05:17 AM, Sanne Grinovero wrote:
> >> >>> On 19 April 2013 11:10, Dan Berindei 
> wrote:
> >> >>>>
> >> >>>>
> >> >>>>
> >> >>>> On Fri, Apr 19, 2013 at 12:58 PM, Sanne Grinovero
> >> >>>> 
> >> >>>> wrote:
> >> >>>>>
> >> >>>>> On 19 April 2013 10:37, Dan Berindei 
> wrote:
> >> >>>>>> Testing mixed read/write performance with capacity 10, keys
> >> >>>>>> 30,
> >> >>>>>> concurrency level 32, threads 12, read:write ratio 99:1
> >> >>>>>> Container CHM   Ops/s 5178894.77  Gets/s 5127105.82
>  Puts/s
> >> >>>>>> 51788.95  HitRatio  86.23  Size 177848  stdDev   60896.42
> >> >>>>>> Container CHMV8 Ops/s 5768824.37  Gets/s 5711136.13
>  Puts/s
> >> >>>>>> 57688.24  HitRatio  84.72  Size 171964  stdDev   60249.99
> >> >>>>>
> >> >>>>> Nice, thanks.
> >> >>>>>>
> >> >>>>>> The test is probably limited by the 1% writes, but I think it
> does
> >> >>>>>> show
> >> >>>>>> that
> >> >>>>>> reads in CHMV8 are not slower than reads in OpenJDK7's CHM.
> >> >>>>>> I haven't measured it, but the memory footprint should also be
> >> >>>>>> better,
> >> >>>>>> because it doesn't use segments any more.
> >> >>>>>>
> >> >>>>>> AFAIK the memoryCHMV8 also uses copy-on-write at the bucket
> level,
> >> >>>>>> but
> >> >>>>>> we
> >> >>>>>> could definitely do a pure read test with a HashMap to see how
> big
> >> >>>>>> the
> >> >>>>>> performance difference is.
> >> >>>>>
> >> >>>>> By copy-on-write I didn't mean on the single elements, but on the
> >> >>>>> whole map instance:
> >> >>>>>
> >> >>>>> private volatile HashMap configuration;
> >> >>>>>
> >> >>>>> synchronized addConfigurationProperty(String, String) {
> >> >>>>>HashMap newcopy = new HashMap( configuration ):
> >> >>>>>newcopy.put(..);
>

Re: [infinispan-dev] CHM or CHMv8?

2013-04-25 Thread Dan Berindei
Yeah, I don't think you could extract the hash from the "cache of hashes"
without computing the hash in the first place...

My idea was to wrap the key in a KeyWithHash object and cache the hash
there (since we're looking up the same key in a lot of maps during a single
invocation).
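
Roughly this shape (hypothetical, not actual Infinispan code):

public final class KeyWithHash {
   private final Object key;
   private final int hash;

   // the caller computes the hash once (e.g. MurmurHash of the key) and then
   // reuses the wrapper for every map lookup during the invocation
   public KeyWithHash(Object key, int hash) {
      this.key = key;
      this.hash = hash;
   }

   public Object getKey() {
      return key;
   }

   @Override
   public int hashCode() {
      return hash;
   }

   @Override
   public boolean equals(Object o) {
      if (this == o) return true;
      if (!(o instanceof KeyWithHash)) return false;
      return key.equals(((KeyWithHash) o).key);
   }
}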

Sorry Galder, I forgot about your patch, but I still think we'd need a
wrapper for keys in order to cache the hash code even if we could define an
Equivalence function.




On Thu, Apr 25, 2013 at 2:27 PM, Sanne Grinovero wrote:

> On 25 April 2013 11:51, Galder Zamarreño  wrote:
> >
> > On Apr 22, 2013, at 3:09 PM, Dan Berindei 
> wrote:
> >
> >>
> >>
> >>
> >> On Mon, Apr 22, 2013 at 2:37 PM, Sanne Grinovero 
> wrote:
> >> We also have been toying with the idea to hash each key only once,
> >> instead of both with the consistent hash (to assign the node owner)
> >> and once in the CHM backing the datacontainer.
> >> I doubt we need the datacontainer to implement Map at all, but at
> >> least if we go this way we don't want the hash to be affected by the
> >> VM instance or different nodes won't agree on the expected owner ;-)
> >>
> >>
> >> For consistent hashing it would probably be better to cache the hash
> after applying MurmurHash to it anyway. So we could in theory hack our
> CHMV8 to use a cached hash code computed with MurmurHash and a
> cluster-specific salt.
> >
> > ^ Rather than hacking CHMv8, better to provide an Equivalence function
> (which CHMv8 will have an instance variable of) for the keys which keeps
> the cache of hashes or something… once that work is committed, we can
> discuss further :)
>
> +1 to use your cool implementation. Don't like too much the sound of
> "cache the hashes or something", I didn't actually look at the code,
> but from gut feeling I would hope the Equivalence function to be
> stateless? We might be able to pass the to-be-reused hash as a
> parameter in primitive form.
>
> Sanne
>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] [Discussion] TimeService implementation

2013-05-04 Thread Dan Berindei
On Fri, May 3, 2013 at 7:00 PM, Mircea Markus  wrote:

>
> On 3 May 2013, at 16:54, Pedro Ruivo wrote:
>
> > On 05/03/2013 04:49 PM, Manik Surtani wrote:
> >>
> >> On 2 May 2013, at 19:01, Pedro Ruivo  wrote:
> >>
> >>>
> >>> preciseTime() {return (cached = System.nanoTime());}
> >>> impreciseTime() {return cached;}
> >>
> >> How would you invalidate the cached time?
> >
> > My idea is to have a schedule thread updating the cached time. the
> > preciseTime() is just an optimization to keep the cached value more
> > up-to-date since we are calculating the nanoTime() (and assuming that
> > nanoTime() is more expensive than write in the volatile variable).
>
> Sounds like a good idea but please don't implement that for now. That's a
> performance optimisation and would require benchmarking to prove it's worth
> doing - more of a nice to have ATM.
>
>
Actually, *not* writing the cached time in preciseTime() would be the
performance optimization.
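
Just so we're all talking about the same thing, here is Pedro's cached-clock
idea as I understand it (a rough sketch with hypothetical names, not what's
in the ISPN-3069 branch):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class CachedTimeService {
   private volatile long cachedNanos = System.nanoTime();
   private final ScheduledExecutorService scheduler =
         Executors.newSingleThreadScheduledExecutor();

   public void start(long refreshMillis) {
      // a background task keeps the cached value reasonably fresh
      scheduler.scheduleAtFixedRate(new Runnable() {
         public void run() {
            cachedNanos = System.nanoTime();
         }
      }, refreshMillis, refreshMillis, TimeUnit.MILLISECONDS);
   }

   public long preciseTime() {
      // the volatile write below is the "optimization" being debated
      return cachedNanos = System.nanoTime();
   }

   public long impreciseTime() {
      return cachedNanos;
   }

   public void stop() {
      scheduler.shutdownNow();
   }
}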
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] [Discussion] TimeService implementation

2013-05-04 Thread Dan Berindei
Because of the component registry, every component is a (non-public)
extension point by itself. So I don't see the need to do anything special,
if we're only going to use it in the test suite.


On Fri, May 3, 2013 at 2:58 PM, Sanne Grinovero wrote:

> For testing purposes it would be useful to inject a custom
> implementation. Doesn't have to be a public API, but some kind of
> extension point would be needed.
>
> On 3 May 2013 12:51, Mircea Markus  wrote:
> >
> > On 3 May 2013, at 11:46, Galder Zamarreño wrote:
> >
> >> On May 2, 2013, at 7:01 PM, Pedro Ruivo  wrote:
> >>
> > When recovery is enabled, the recovery manager creates a second
> cache.
> > Someone may want to replace the Clock/TimeService for the "normal"
> cache
> > and left the default implementation in the "recovery" cache.
> 
>  ^ Why would an end-user want to replace the Clock/TimeService?
> 
>  Remember what I said in my previous email: I can see someone changing
> the service implementation for testing reasons, and in that case, a global
> clock/timer service that's swapable via system property would work just
> fine IMO.
> >>>
> >>> I don't know. I'm being pessimist and assuming that someone in the
> world
> >>> as a dark use case and needs to replace the service.
> >>
> >> ^ We already have quite a big configuration… we should think twice
> about adding more stuff… :)
> > +1. on top of that we can always add it to config if needed, harder to
> remove it.
> >
> > Cheers,
> > --
> > Mircea Markus
> > Infinispan lead (www.infinispan.org)
> >
> >
> >
> >
> >
> > ___
> > infinispan-dev mailing list
> > infinispan-dev@lists.jboss.org
> > https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] ISPN-2281 effect on Infinispan Server

2013-05-07 Thread Dan Berindei
On Fri, May 3, 2013 at 1:49 PM, Galder Zamarreño  wrote:

> Here's what I replied in a separate email last. Since then the issue has
> been sorted:
>
> > The reason I designed a byte[] specific Equivalence class is to avoid
> doing instanceof on the type passed. This would slow things in a
> critical path, hence, I designed a purely byte[] Equivalence class,
> and why there's no instanceof in AnyEquivalence either, to be as
> performant as possible.
>
> So yeah, as you suggest, the workaround would be for AnyEquivalence to
> check if the parameter is a byte[], in which case, delegate to
> ByteArrayEquivalence, but to reiterate, this is only a workaround and
> not the optimal solution.
>
>
I also checked if the Hot Rod server could add this itself to the
> caches, but this is complex stuff because it's given a cache manager
> already built, so it'd need to go and change the default configuration
> to apply this change programmatically, which is not easy because
> you're given a Configuration object and not the buillder, and making
> Configuration mutable just for that, where you're just trying to
> override what it's been configured in the cache manager is a hack.
>
> Since we controlled the way the servers are started via Infinispan
> Server, I assumed we controlled its configuration, hence I expected
> configuring BAEquivalence to be a safe assumption. We've made a bad
> job of waiting to integrate this and test Infinispan Server until now,
> with 7 days since the pull req has been up. Maybe the pull req test
> execution needs to also execute the Infinispan Servers testsuite
> automatically to avoid future issues
>

-1 to add more stuff to the pull request build, it already takes half a day
for all the pull requests to be revalidated after a push to master. (10 PRs
* 30 mins/PR = 5h)

Besides, if this change broke Infinispan Server, isn't there a risk that it
broke 3rd party applications relying on the HotRod server as well?



>
> On May 2, 2013, at 5:05 PM, Tristan Tarrant  wrote:
>
> > Hi all (Galder in particular),
> >
> > the integration of ISPN-2281 has caused breakage of Infinispan Server
> > because the caches created by the server have key/value equivalence set
> > to AnyEquivalence instead of ByteArrayEquivalence (like the testsuite
> does).
> > I believe the fix rests in making AnyEquivalence true to its name and
> > handle array equivalence too.
> >
> > HotRod on Infinispan Server is essentially broken until this is fixed.
> >
> > Tristan
> > ___
> > infinispan-dev mailing list
> > infinispan-dev@lists.jboss.org
> > https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
>
> --
> Galder Zamarreño
> gal...@redhat.com
> twitter.com/galderz
>
> Project Lead, Escalante
> http://escalante.io
>
> Engineer, Infinispan
> http://infinispan.org
>
>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] ISPN-2281 effect on Infinispan Server

2013-05-07 Thread Dan Berindei
That won't help, the moment Tristan sees there are less than 10 PRs open
he'll come up with a couple more :)


On Tue, May 7, 2013 at 7:57 PM, Sanne Grinovero wrote:

> You can avoid half a day of trouble by merging the trivial pulls I've sent
> a week ago ;-)
>  On 7 May 2013 17:53, "Dan Berindei"  wrote:
>
>>
>>
>>
>> On Fri, May 3, 2013 at 1:49 PM, Galder Zamarreño wrote:
>>
>>> Here's what I replied in a separate email last. Since then the issue has
>>> been sorted:
>>>
>>> > The reason I designed a byte[] specific Equivalence class is to avoid
>>> doing instanceof on the type passed. This would slow things in a
>>> critical path, hence, I designed a purely byte[] Equivalence class,
>>> and why there's no instanceof in AnyEquivalence either, to be as
>>> performant as possible.
>>>
>>> So yeah, as you suggest, the workaround would be for AnyEquivalence to
>>> check if the parameter is a byte[], in which case, delegate to
>>> ByteArrayEquivalence, but to reiterate, this is only a workaround and
>>> not the optimal solution.
>>>
>>>
>> I also checked if the Hot Rod server could add this itself to the
>>> caches, but this is complex stuff because it's given a cache manager
>>> already built, so it'd need to go and change the default configuration
>>> to apply this change programmatically, which is not easy because
>>> you're given a Configuration object and not the buillder, and making
>>> Configuration mutable just for that, where you're just trying to
>>> override what it's been configured in the cache manager is a hack.
>>>
>>> Since we controlled the way the servers are started via Infinispan
>>> Server, I assumed we controlled its configuration, hence I expected
>>> configuring BAEquivalence to be a safe assumption. We've made a bad
>>> job of waiting to integrate this and test Infinispan Server until now,
>>> with 7 days since the pull req has been up. Maybe the pull req test
>>> execution needs to also execute the Infinispan Servers testsuite
>>> automatically to avoid future issues
>>>
>>
>> -1 to add more stuff to the pull request build, it already takes half a
>> day for all the pull requests to be revalidated after a push to master. (10
>> PRs * 30 mins/PR = 5h)
>>
>> Besides, if this change broke Infinispan Server, isn't there a risk that
>> it broke 3rd party applications relying on the HotRod server as well?
>>
>>
>>
>>>
>>> On May 2, 2013, at 5:05 PM, Tristan Tarrant  wrote:
>>>
>>> > Hi all (Galder in particular),
>>> >
>>> > the integration of ISPN-2281 has caused breakage of Infinispan Server
>>> > because the caches created by the server have key/value equivalence set
>>> > to AnyEquivalence instead of ByteArrayEquivalence (like the testsuite
>>> does).
>>> > I believe the fix rests in making AnyEquivalence true to its name and
>>> > handle array equivalence too.
>>> >
>>> > HotRod on Infinispan Server is essentially broken until this is fixed.
>>> >
>>> > Tristan
>>> > ___
>>> > infinispan-dev mailing list
>>> > infinispan-dev@lists.jboss.org
>>> > https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>
>>>
>>> --
>>> Galder Zamarreño
>>> gal...@redhat.com
>>> twitter.com/galderz
>>>
>>> Project Lead, Escalante
>>> http://escalante.io
>>>
>>> Engineer, Infinispan
>>> http://infinispan.org
>>>
>>>
>>> ___
>>> infinispan-dev mailing list
>>> infinispan-dev@lists.jboss.org
>>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>>
>>
>>
>> ___
>> infinispan-dev mailing list
>> infinispan-dev@lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>>
>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] usage of alwaysRun=true on @BeforeMethod and alike

2013-05-09 Thread Dan Berindei
Why would TestNG run an @AfterMethod method if the test didn't run? What
method would it run after?


On Thu, May 9, 2013 at 12:29 AM, Mircea Markus  wrote:

>
> On 8 May 2013, at 20:46, Adrian Nistor wrote:
>
> > @BeforeMethod + alwaysRun=true is indeed pretty wrong and pointless in
> relation to groups but @AfterMethod + alwaysRun=true seems to make sense
> given this bit of javadoc: "If set to true, this configuration method will
> be run even if one or more methods invoked previously failed or was
> skipped". Does this imply our precious teardown method will be skipped
> because the test failed unless we add alwaysRun=true?
> right, that's what I understand as well. Given that it might make sense to
> add alwaysRun to @AfterMethod as long as you expect it to be invoked in
> certain situations without the corresponding @BeforeMethod. The situation
> I'm talking about is when @BeforeMethod doesn't have "alwaysRun" and the
> owner class is in another test group, e.g.:
>
>
> @Test(groups="functional")
> class NeverFailIntermittentlyTest {
>
> EmbeddedCacheManager cm;
>
> @BeforeMethod
> void setUp() {
>    cm = new DefaultCacheManager();
> }
>
>
> @AfterMethod (alwaysRun=true)
> void tearDown() {
>cm.stop();
> }
>
> }
>
> If we run all the tests in the "xsite" profile, tearDown will throw NPE as
> setUp is not invoked.
>
> Writing tearDown like this should solve the problem:
> @AfterMethod (alwaysRun=true)
> void tearDown() {
>if (cm != null) cm.stop();
> }
>
>
>
> Cheers,
> --
> Mircea Markus
> Infinispan lead (www.infinispan.org)
>
>
>
>
>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] usage of alwaysRun=true on @BeforeMethod and alike

2013-05-09 Thread Dan Berindei
On Thu, May 9, 2013 at 2:23 PM, Mircea Markus  wrote:

>
> On 9 May 2013, at 08:02, Dan Berindei wrote:
>
> > Why would TestNG run an @AfterMethod method if the test didn't run?
> if you set alwaysRun=true on that method it will run it disregarding if
> the test was run or not.
> > What method would it run after?
> not sure i get the question.
>
>
A test method is "skipped" only if a dependency of the test failed (e.g. a
@BeforeMethod method). A test that's in a different group may be considered
"ignored" or "disabled", but it's not "skipped". Since TestNG is not even
trying to run the method, it shouldn't run any @BeforeMethod/@AfterMethod
methods for it either.

I wrote a test class with different groups for the test method and each
configuration method, and the only configuration methods that ran even with
no test method in the default group were @BeforeTest/@AfterTest.
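
For anyone who wants to reproduce it, a class along these lines shows the
behaviour (illustrative only, not the exact class I ran):

import org.testng.annotations.AfterMethod;
import org.testng.annotations.AfterTest;
import org.testng.annotations.BeforeMethod;
import org.testng.annotations.BeforeTest;
import org.testng.annotations.Test;

public class ConfigurationMethodGroupsTest {
   @BeforeTest(groups = "bt")
   public void beforeTest() { System.out.println("beforeTest"); }

   @BeforeMethod(groups = "bm")
   public void beforeMethod() { System.out.println("beforeMethod"); }

   // run with a group filter that excludes "functional" and watch which of
   // the configuration methods are still invoked
   @Test(groups = "functional")
   public void testSomething() { System.out.println("test"); }

   @AfterMethod(groups = "am")
   public void afterMethod() { System.out.println("afterMethod"); }

   @AfterTest(groups = "at")
   public void afterTest() { System.out.println("afterTest"); }
}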
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] TimeService (ISPN-3069): CacheLoader API break

2013-05-09 Thread Dan Berindei
On Wed, May 8, 2013 at 2:06 PM, Mircea Markus  wrote:

>
> On 8 May 2013, at 10:40, Pedro Ruivo wrote:
>
> > On 05/08/2013 10:36 AM, Manik Surtani wrote:
> >>
> >> On 8 May 2013, at 10:34, Pedro Ruivo  wrote:
> >>
> >>> Hi guys,
> >>>
> >>> In order to use the TimeService inside the cache loaders/stores I had
> to
> >>> change the method init() to include a new parameter the TimeService.
> >>
> >> Won't this break custom/3rd party impls?
> >
> > probably/definitely yes.
> -1
> >
> > but I don't want to create a cache in all the cache loader/store tests
> > that will be used to pick the TimeService.
> >
> > I tried to mock the ComponentRegistry but it is not possible for final
> > classes. I don't want to remove the final :(
> >
> > Another alternative that come to my mind was to add a new method in
> > AdvancedCache that returns the TimeService (and this I can mock it in
> > the test suite)
>
> +1
>
>
Couldn't you change CacheLoaderManager to call
ComponentRegistry.wireDependencies(cacheStore)?

That way, each cache store could have a separate @Inject method, and it
could depend on any cache-scoped or global-scoped component. It may require
an infinispan-module.properties file in each cache store module, but it
then it could be used for any other component.
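
E.g. a store could then simply declare something like this (hypothetical
sketch - the TimeService package and method names may well end up different):

import org.infinispan.factories.annotations.Inject;
import org.infinispan.loaders.AbstractCacheStore;
import org.infinispan.util.TimeService;

public abstract class MyCustomCacheStore extends AbstractCacheStore {
   private TimeService timeService;

   @Inject
   void injectTimeService(TimeService timeService) {
      this.timeService = timeService;
   }

   protected long now() {
      // assumed method name - whatever the final TimeService API exposes
      return timeService.wallClockTime();
   }
}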
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] TimeService (ISPN-3069): CacheLoader API break

2013-05-09 Thread Dan Berindei
On Thu, May 9, 2013 at 11:10 PM, Mircea Markus  wrote:

>
> On 9 May 2013, at 20:56, Dan Berindei wrote:
>
> > > Another alternative that come to my mind was to add a new method in
> > > AdvancedCache that returns the TimeService (and this I can mock it in
> > > the test suite)
> >
> > +1
> >
> >
> > Couldn't you change CacheLoaderManager to call
> ComponentRegistry.wireDependencies(cacheStore)?
> >
> > That way, each cache store could have a separate @Inject method, and it
> could depend on any cache-scoped or global-scoped component.
> > It may require an infinispan-module.properties file in each cache store
> module, but it then it could be used for any other component.
> if you do ComponentRegistry.wireDependencies(cacheStore) any annotated
> method would get invoked, just curious why would it require an
> module.properties...
>

Well, the component registry looks at the component metadata in the jar, so
it needs a way to load the metadata for all the modules. But it looks like
it doesn't use module.properties, it needs a file called
META-INF/services/org.infinispan.factories.components.ModuleMetadataFileFinder
instead.
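
For a hypothetical "mystore" module that means shipping a finder along these
lines and listing its fully-qualified name in
META-INF/services/org.infinispan.factories.components.ModuleMetadataFileFinder:

package org.example.mystore;

import org.infinispan.factories.components.ModuleMetadataFileFinder;

public class MyStoreMetadataFileFinder implements ModuleMetadataFileFinder {
   public String getMetadataFilename() {
      // the component metadata file generated for this module at build time
      return "mystore-component-metadata.dat";
   }
}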
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Splitting the Lucene Directory in its own top-level project

2013-05-10 Thread Dan Berindei
+1

Would that mean that you could change the lucene-v3 and lucene-v4 modules
to be different branches in the new repository instead? Or would you even
want to do that?


On Fri, May 10, 2013 at 11:55 AM, Sanne Grinovero wrote:

> Following on the idea about CacheStores, I'd like to propose having
> the Lucene Directory code to live in its own repository with an
> independent release cycle.
>
> Tests-wise it should follow the same policy of the CacheStore: adding
> enough tests to the core so that it won't break easily, and in worst
> case have the core team jump in to help.
>
> But in this case there is a strong benefit: having the Lucene
> Directory to release independently would make it easier to roll out
> updates needed by consuming projects which are pinned to the
> Infinispan (core) version included in the application server. Most
> notably this would break the "circular release dependency" between
> Search and Infinispan, and allow quicker innovation in Hibernate
> Search while staying compatible with the application server.
>
> Sanne
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] moving (some) cache stores in a different github repository

2013-05-10 Thread Dan Berindei
On Fri, May 10, 2013 at 12:54 PM, Mircea Markus  wrote:

>
> On 10 May 2013, at 10:06, Manik Surtani wrote:
>
> > There seems to be a bit of confusion on this thread.  The things I hope
> to achieve here are:
> >
> > 1.  De-coupled release cycle.
> > Most of our releases include new versions of XYZCacheStore, even though
> there are no changes to it.  This creates noise, IMO.  Cache Stores should
> only be released when there are changes made to it.  Now this wasn't so
> much of a problem when we just had a small handful of cache stores, but as
> this increases, this becomes even more noisy/confusing to end-users.
> >
> > 2.  Smaller download.
> > Not everyone uses all cache stores; not including everything in a zip
> ball will reduce download size.  But as pointed out before, this can be
> achieved via other techniques.
> >
> > 3.  Scalability.
> > Moving cache stores to separate repos will allow us to add more cache
> stores, accept more contribs for experimental cache stores, build out a
> richer ecosystem.  Right now, we restrict the number of cache store impls
> to prevent bloat of the core distribution.
>
> well summarised.
>
> >
> > This does *not* impact the developer at all, IMO.  CI (and test) runs on
> core will still involve testing all *non-experimental* cache stores.  I
> think this should happen every time and not just daily.
> I don't think we need to do it on every build. But this is more of an
> configuration option and we can do it if needed.
>

I think configuring the cache stores build to run on every snapshot build
should be enough - just like Hibernate Search.
But if we find that we break the cache stores too often, we could change
the pull request build to include the cache stores as well.



> >
> > In terms of a compatibility matrix (which cache stores work with which
> versions of core), we'd need to devise a scheme.  For example, match on
> major.minor, like CacheStore 5.3.x will work with any version of core 5.3.x.
> +1
>

Wouldn't that require us to release a JDBM cache store version 5.3.0 even
if the JDBM cache store didn't change at all between 5.2.0 and 5.3.0?
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] TimeService (ISPN-3069): CacheLoader API break

2013-05-10 Thread Dan Berindei
On Fri, May 10, 2013 at 11:53 AM, Manik Surtani  wrote:

>
> On 9 May 2013, at 20:56, Dan Berindei  wrote:
>
> Couldn't you change CacheLoaderManager to call
> ComponentRegistry.wireDependencies(cacheStore)?
>
> That way, each cache store could have a separate @Inject method, and it
> could depend on any cache-scoped or global-scoped component. It may require
> an infinispan-module.properties file in each cache store module, but it
> then it could be used for any other component.
>
>
> -1.  That would expose the injection fwk to custom cache store impls.
>  Unless you're assuming that custom impls would't use the TimeService
> (since it isn't public API), and just call System.nanoTime() directly?
>
>
Well, that was my point: to allow custom cache stores to use the injection
framework.

The custom cache stores can use the component registry right now, because
they have access to the cache. And they can also use injection for their
own custom components, by writing a
org.infinispan.factories.components.ModuleMetadataFileFinder. Not allowing
them to use injection in the cache store itself seems like an arbitrary
limitation.
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] TimeService (ISPN-3069): CacheLoader API break

2013-05-10 Thread Dan Berindei
On Fri, May 10, 2013 at 1:31 PM, Manik Surtani  wrote:

>
> On 10 May 2013, at 11:14, Dan Berindei  wrote:
>
>
>
>
> On Fri, May 10, 2013 at 11:53 AM, Manik Surtani wrote:
>
>>
>> On 9 May 2013, at 20:56, Dan Berindei  wrote:
>>
>> Couldn't you change CacheLoaderManager to call
>> ComponentRegistry.wireDependencies(cacheStore)?
>>
>> That way, each cache store could have a separate @Inject method, and it
>> could depend on any cache-scoped or global-scoped component. It may require
>> an infinispan-module.properties file in each cache store module, but it
>> then it could be used for any other component.
>>
>>
>> -1.  That would expose the injection fwk to custom cache store impls.
>>  Unless you're assuming that custom impls would't use the TimeService
>> (since it isn't public API), and just call System.nanoTime() directly?
>>
>>
> Well, that was my point: to allow custom cache stores to use the injection
> framework.
>
> The custom cache stores can use the component registry right now, because
> they have access to the cache. And they can also use injection for their
> own custom components, by writing a
> org.infinispan.factories.components.ModuleMetadataFileFinder. Not allowing
> them to use injection in the cache store itself seems like an arbitrary
> limitation.
>
>
> It's not arbitrary at all.  It makes such internals an SPI with rules
> around compatibility.  I'd rather keep these internal and reserve the right
> to change/modify them without impact to extension points.
>
>
Isn't it already an SPI if the component registry can be accessed via
AdvancedCache and we allow any external module to inject its own components?
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] TimeService (ISPN-3069): CacheLoader API break

2013-05-13 Thread Dan Berindei
100% agree, most users will have to interact with AdvancedCache at some
point - if only because of lock() and withFlags().

That doesn't mean everyone uses AdvancedCache.getComponentRegistry(), but
it does mean that a cache store implementation can use any component it
wants to.
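
As a reminder of the two calls in question, a tiny sketch (the key and the
flag choice are just placeholders):

    import org.infinispan.AdvancedCache;
    import org.infinispan.Cache;
    import org.infinispan.context.Flag;

    public class AdvancedCacheUsage {

        static void update(Cache<String, String> cache) {
            AdvancedCache<String, String> advanced = cache.getAdvancedCache();

            // explicit locking inside a transaction (pessimistic caches)
            advanced.lock("accountId");

            // skip shipping the previous value back to the caller
            advanced.withFlags(Flag.IGNORE_RETURN_VALUES).put("accountId", "updated");
        }
    }
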


On Mon, May 13, 2013 at 1:16 PM, Sanne Grinovero wrote:

> AdvancedCache is the API we use the most. I'd rather say I don't care for
> Cache: all I use it for us to get an AdvancedCache.
>
> On 13 May 2013 10:12, "Manik Surtani"  wrote:
> >
> >
> > On 10 May 2013, at 12:32, Dan Berindei  wrote:
> >
> >>
> >> On Fri, May 10, 2013 at 1:31 PM, Manik Surtani 
> wrote:
> >>>
> >>>
> >>> On 10 May 2013, at 11:14, Dan Berindei  wrote:
> >>>
> >>>>
> >>>>
> >>>>
> >>>> On Fri, May 10, 2013 at 11:53 AM, Manik Surtani 
> wrote:
> >>>>>
> >>>>>
> >>>>> On 9 May 2013, at 20:56, Dan Berindei 
> wrote:
> >>>>>
> >>>>>> Couldn't you change CacheLoaderManager to call
> ComponentRegistry.wireDependencies(cacheStore)?
> >>>>>>
> >>>>>> That way, each cache store could have a separate @Inject method,
> and it could depend on any cache-scoped or global-scoped component. It may
> require an infinispan-module.properties file in each cache store module,
> but then it could be used for any other component.
> >>>>>
> >>>>>
> >>>>> -1.  That would expose the injection fwk to custom cache store
> impls.  Unless you're assuming that custom impls wouldn't use the
> TimeService (since it isn't public API), and just call System.nanoTime()
> directly?
> >>>>>
> >>>>
> >>>> Well, that was my point: to allow custom cache stores to use the
> injection framework.
> >>>>
> >>>> The custom cache stores can use the component registry right now,
> because they have access to the cache. And they can also use injection for
> their own custom components, by writing a
> org.infinispan.factories.components.ModuleMetadataFileFinder. Not allowing
> them to use injection in the cache store itself seems like an arbitrary
> limitation.
> >>>
> >>>
> >>> It's not arbitrary at all.  It makes such internals an SPI with rules
> around compatibility.  I'd rather keep these internal and reserve the right
> to change/modify them without impact to extension points.
> >>>
> >>
> >> Isn't it already an SPI if the component registry can be accessed via
> AdvancedCache and we allow any external module to inject its own components?
> >
> >
> > It is.  But at least AdvancedCache isn't a critical, core interface that
> every Infinispan user will at some point interact with.
> >
> > --
> > Manik Surtani
> > ma...@jboss.org
> > twitter.com/maniksurtani
> >
> > Platform Architect, JBoss Data Grid
> > http://red.ht/data-grid
> >
> >
> > ___
> > infinispan-dev mailing list
> > infinispan-dev@lists.jboss.org
> > https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] TimeService (ISPN-3069): CacheLoader API break

2013-05-14 Thread Dan Berindei
On Mon, May 13, 2013 at 8:44 PM, Sanne Grinovero wrote:

> On 13 May 2013 18:32, Manik Surtani  wrote:
> >
> > On 13 May 2013, at 16:25, Mircea Markus  wrote:
> >
> >>
> >> On 13 May 2013, at 15:05, Manik Surtani wrote:
> >>
>  100% agree, most users will have to interact with AdvancedCache at
> some point - if only because of lock() and withFlags().
> >>>
> >>> I've seen quite a bit of end-user code that doesn't touch
> AdvancedCache.
> >> I'm on Dan's side here, I think it's pretty popular with the users
> and should be considered as public API. A note on the same lines, we also
> recommend all our users to use Flag.IGNORE_RETURN_VALUE, which again goes
> through AdvancedCache.
> >
> > So you're saying getTimeService() should be in EmbeddedCacheManager?
>  That's Dan's argument... I really don't think this should be accessible by
> end-user applications.
>
> +1 to keep it hidden, but SPI kind of API wouldn't be too bad.
>
>
If we want to keep it hidden, then I think it would be best to leave the
getTimeService() method only in ComponentRegistry/GlobalComponentRegistry
and remove it from the AdvancedCache interface.

We might want to remove it from the configuration, too.
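
For reference, a sketch of how internal code or extensions would still get
hold of it through the registry (getComponent() is the generic lookup that
exists today; getTimeService() is the accessor under discussion, and the
TimeService package location is an assumption):

    import org.infinispan.AdvancedCache;
    import org.infinispan.factories.ComponentRegistry;
    import org.infinispan.util.TimeService;

    public class TimeServiceLookup {

        static TimeService lookup(AdvancedCache<?, ?> cache) {
            ComponentRegistry registry = cache.getComponentRegistry();
            return registry.getComponent(TimeService.class);
        }
    }
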



> More importantly, I'd design it in such a way that different Caches
> could be using a different one. Doesn't have to be supported in the
> current code implementation, I just mean API-wise this should not be
> on a "global" component but on a Cache-specific one.
>
>
Any particular usage in mind for having a different time service in each
cache?

We definitely need a global component, because JGroupsTransport uses it, so
we'd have to support both.

Cheers
Dan
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] TimeService (ISPN-3069): CacheLoader API break

2013-05-14 Thread Dan Berindei
On Tue, May 14, 2013 at 10:52 AM, Sanne Grinovero wrote:

> On 14 May 2013 08:33, Dan Berindei  wrote:
> >
> >
> >
> > On Mon, May 13, 2013 at 8:44 PM, Sanne Grinovero 
> > wrote:
> >>
> >> On 13 May 2013 18:32, Manik Surtani  wrote:
> >> >
> >> > On 13 May 2013, at 16:25, Mircea Markus  wrote:
> >> >
> >> >>
> >> >> On 13 May 2013, at 15:05, Manik Surtani wrote:
> >> >>
> >> >>>> 100% agree, most users will have to interact with AdvancedCache at
> >> >>>> some point - if only because of lock() and withFlags().
> >> >>>
> >> >>> I've seen quite a bit of end-user code that doesn't touch
> >> >>> AdvancedCache.
> >> >> I'm on Dan's side here, I think it's pretty popular with the users
> >> >> and should be considered as public API. A note on the same lines, we
> also
> >> >> recommend all our users to use Flag.IGNORE_RETURN_VALUE, which again
> goes
> >> >> through AdvancedCache.
> >> >
> >> > So you're saying getTimeService() should be in EmbeddedCacheManager?
> >> > That's Dan's argument... I really don't think this should be
> accessible by
> >> > end-user applications.
> >>
> >> +1 to keep it hidden, but SPI kind of API wouldn't be too bad.
> >>
> >
> > If we want to keep it hidden, then I think it would be best to leave the
> > getTimeService() method only in ComponentRegistry/GlobalComponentRegistry
> > and remove it from the AdvancedCache interface.
> >
> > We might want to remove it from the configuration, too.
> >
> >
> >>
> >> More importantly, I'd design it in such a way that different Caches
> >> could be using a different one. Doesn't have to be supported in the
> >> current code implementation, I just mean API-wise this should not be
> >> on a "global" component but on a Cache-specific one.
> >>
> >
> > Any particular usage in mind for having a different time service in each
> > cache?
>
> Different precision requirements for eviction / expiry & co.
>
>
Playing a bit of devil's advocate here... Why would you want different
precision requirements for expiration in different caches?
Is there any other cache that supports this?



> I would expect each TimeService to be differently configured, but all
> of them could share the same "clockwork";
> I'd rather avoid discussing implementation details of such services at
> this stage, but to make a practical example let's assume
> our main clockwork uses a Timer thread to periodically update a
> volatile long; in such case the update frequency needs
> to accommodate for the requirements of each different Cache, and
> having different services makes configuration easier.
> However if you have some Cache instance which requires nanosecond
> precision, making a Timer thread an unsuitable
> implementation choice, you might still want to use the Timer thread
> for the other caches.
> It seems quite clear that a clever implementation would need different
> strategies depending on the Cache, and on the
>
> requirements of the different invocation context: we don't need to
>

So you're saying having a TimeService per cache wouldn't be enough, we
might need different TimeServices for each command?



> implement the smartest TimeService today but the
> chosen API should be flexible enough to encourage such experiments.
>

The most important experiment would be to implement a TimeService with
caching and see if it really is better than what we have now. That's
certainly doable with a global TimeService.

Once we determine that the performance is indeed better, we can investigate
what would stop regular users from using it and improve those particular
scenarios. But I don't think we should start with an all-encompassing API
only to give up on it when we realize the performance benefits are minimal
and the code complexity is much higher.

https://blogs.oracle.com/ksrini/entry/we_take_java_performance_very

[...] In order to confirm the performance improvement, a constant date
value was assigned to the date field and it was noted that a 3%
improvement may be achievable. [...]
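
To make the experiment concrete, here is a rough sketch of such a cached
clock: a scheduled thread updating a volatile timestamp. The class name and
the update period are made up; this is not an existing Infinispan API.

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class CachedClock {

        private volatile long cachedMillis = System.currentTimeMillis();
        private final ScheduledExecutorService ticker =
                Executors.newSingleThreadScheduledExecutor();

        public CachedClock(long periodMillis) {
            // the precision/overhead trade-off lives here: a longer period means
            // cheaper reads but staler timestamps for expiration checks
            ticker.scheduleAtFixedRate(new Runnable() {
                @Override
                public void run() {
                    cachedMillis = System.currentTimeMillis();
                }
            }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        }

        public long wallClockTime() {
            return cachedMillis; // no System.currentTimeMillis() on the hot path
        }

        public void stop() {
            ticker.shutdownNow();
        }
    }
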
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] TimeService (ISPN-3069): CacheLoader API break

2013-05-14 Thread Dan Berindei
On Tue, May 14, 2013 at 11:51 AM, Pedro Ruivo  wrote:

>
>
> On 05/14/2013 09:44 AM, Dan Berindei wrote:
> >
> >
> >
> > On Tue, May 14, 2013 at 11:37 AM, Manik Surtani wrote:
> >
> >
> > On 14 May 2013, at 08:33, Dan Berindei wrote:
> >
> >> If we want to keep it hidden, then I think it would be best to
> >> leave the getTimeService() method only in
> >> ComponentRegistry/GlobalComponentRegistry and remove it from the
> >> AdvancedCache interface.
> >
> > +1.
>
> first I have two situations here:
>
> in production: it makes no difference to me whether it's in the AdvancedCache
> or in the ComponentRegistry; in the cache loader/store and in the Extended
> Stats I have access to the cache and I can pick it up from anywhere.
>
> in the test suite: I first tried to have it only in the ComponentRegistry
> and I was not able to mock it because the class is final (I believe
> it is final for some reason). In addition, all the cache store tests
> mock the Cache interface. That's why I put it in the AdvancedCache.
>
>
Yeah, being able to inject the TimeService directly in the cache store
would have been nice :)
I think we can remove the final modifier from the ComponentRegistry class,
though.



> >
> >> We might want to remove it from the configuration, too.
> >
> > It is definitely *not* configurable and *not* a part of the
> > configuration.  See an earlier thread on this subject:
> > http://bit.ly/102aQ9R
>
> This is another issue I have. I need to have the TimeService in the
> Extended Stats and this is a CustomInterceptor. My first try was to
> replace and rewire the GlobalComponent but this does not work because I
> don't have any @Inject method in the CustomInterceptor where I can
> substitute the TimeService implementation in the test suite. That's why it
> is in the GlobalConfiguration.
>
>
cache.getAdvancedCache().getComponentRegistry().getTimeService() should
work in a custom interceptor.
You just have to mock the cache and the component registry...
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] TimeService (ISPN-3069): CacheLoader API break

2013-05-14 Thread Dan Berindei
On Tue, May 14, 2013 at 12:38 PM, Pedro Ruivo  wrote:

>
>
> On 05/14/2013 10:31 AM, Dan Berindei wrote:
> >
> >
> >
> > On Tue, May 14, 2013 at 11:51 AM, Pedro Ruivo  > <mailto:pe...@infinispan.org>> wrote:
> >
> >
> >
> > On 05/14/2013 09:44 AM, Dan Berindei wrote:
> >  >
> >  >
> >  >
> >  > On Tue, May 14, 2013 at 11:37 AM, Manik Surtani wrote:
> >  >
> >  >
> >  > On 14 May 2013, at 08:33, Dan Berindei wrote:
> >  >
> >  >> If we want to keep it hidden, then I think it would be best
> to
> >  >> leave the getTimeService() method only in
> >  >> ComponentRegistry/GlobalComponentRegistry and remove it from
> the
> >  >> AdvancedCache interface.
> >  >
> >  > +1.
> >
> > first I have two situations here:
> >
> > in production: it's indifferent for me have it in the AdvancedCache
> or
> > in the ComponentRegistry, in the cache loader/store and in the
> Extended
> > Stats I have access to the cache and I can pick from everywhere.
> >
> > in the test suite: I first try to have it only in the
> ComponentRegistry
> > and I was not be able to mock it because the class is final (I
> believe
> > that is final for some reason). In addition, all the cache store
> tests
> > mocks the Cache interface. That's why I put it in the AdvancedCache.
> >
> >
> > Yeah, being able to inject the TimeService directly in the cache store
> > would have been nice :)
> > I think we can remove the final modifier from the ComponentRegistry
> > class, though.
> >
> >  >
> >  >> We might want to remove it from the configuration, too.
> >  >
> >  > It is definitely *not* configurable and *not* a part of the
> >  > configuration.  See an earlier thread on this subject:
> >  > http://bit.ly/102aQ9R
> >
> > This is another issue I have. I need to have the TimeService in the
> > Extended Stats and this are a CustomInterceptor. My first try was to
> > replace and rewire the GlobalComponent but this does not work
> because I
> > don't have any @Inject method in the CustomInterceptor where I can
> > replace for TimeService implementation in the test suite. That's why
> it
> > is in the GlobalConfiguration.
> >
> >
> > cache.getAdvancedCache().getComponentRegistry().getTimeService() should
> > work in a custom interceptor.
> > You just have to mock the cache and the component registry...
>
> I cannot mock the cache in the test suite, otherwise I'm not able to test
> the statistics. Even if I register and rewire the ComponentRegistry, it
> does not work because I'm setting the TimeService when the start() is
> invoked.
>
>
Ok, I think I understand the problem now, and the simplest solution would
be to use TestingUtil.replaceField to replace the TimeService in your
interceptor with a mock.
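
Roughly something like the sketch below; the field name, the mocked method
and the Mockito usage are assumptions (as is the TimeService package), and
TestingUtil.replaceField does essentially the same reflection dance, so check
its exact signature:

    import java.lang.reflect.Field;

    import org.infinispan.util.TimeService;

    import static org.mockito.Mockito.mock;
    import static org.mockito.Mockito.when;

    public class TimeServiceMocking {

        static void replaceTimeService(Object interceptor) throws Exception {
            TimeService mockTime = mock(TimeService.class);
            when(mockTime.wallClockTime()).thenReturn(42L); // fixed time for the test

            // what TestingUtil.replaceField does under the hood
            Field field = interceptor.getClass().getDeclaredField("timeService");
            field.setAccessible(true);
            field.set(interceptor, mockTime);
        }
    }
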
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] commit not failing but transaction reported as in-doubt

2013-05-16 Thread Dan Berindei
Mircea, I think I'm missing something here. Why would the originator send a
TxCompletionNotificationCommand at all if the commit command was
asynchronous?

I don't think recovery should require the originator to send a
TxCompletionNotificationCommand. Our commit commands can't fail anyway, so
the only way for a transaction to become in-doubt would be if the cache
crashed before sending the command. (Or maybe if another resource's commit
phase failed.)

Cheers
Dan


On Wed, May 15, 2013 at 5:06 PM, Mircea Markus  wrote:

> Thanks again for nice explanation Jonathan!
> @Pedro - seems like you're doing the right thing by encouraging people to
> be properly paranoid :-)
> Otherwise we'd leak tx logs (in infinispan parlance the PrepareCommands in
> the recovery cache) which would not be nice.
>
> On 15 May 2013, at 13:32, Jonathan Halliday 
> wrote:
>
> >
> > No, it's out of scope for the TM, at least as far as the JTA/XA specs
> > are concerned. The TM would not retain any txlog information to allow it
> > to perform useful recovery anyhow.   Usually you just log it in the hope
> > a human notices and sorts out the mess.  Of course properly paranoid
> > humans don't use async commit in the first place.
> >
> > There has been various talk around making JTA TM.commit() support an
> > async callback, such that the business logic thread can continue as soon
> > as the prepare phase is successful, whilst still receiving a callback
> > handler invocation if the commit phase subsequently fails. Extending
> > that to the XA protocol would be nice, but won't happen as there is no
> > upward (RM->TM) communication in XA - it's all driven top down.  So as
> > you point out, adding the failed tx to the in-doubt list is the only way
> > of signalling a problem. That's bad, since you'd also need a positive
> > 'it worked' callback in the proto to allow GC of the txlog, otherwise
> > you have to throw away the log eagerly and can't then do anything useful
> > with the subsequent error callback anyhow.
> >
> > Associated with that discussion is the expectation around the semantics
> > of afterCompletion, which may mean 'after successful prepare' or 'after
> > successful commit' in such case, the latter effectively removing the
> > need for a new JTA callback api in the first place.
> >
> > If you don't need a callback at all, then there is already an async
> > commit option in the TM config, it's just non-standard and marginally
> > dangerous. It simply logs commit phase failures and hopes a human
> notices.
> >
> > Jonathan.
> >
> > On 05/15/2013 01:13 PM, Mircea Markus wrote:
> >> Hi Jonathan,
> >>
> >> In the scope of ISPN-3063 [1] we came to a problem we need some advice
> on :-)
> >>
> >> Would a transaction manager expect/handle this situation: for a
> transaction the commit is successful but at a further point the same
> transaction would be reported as "in-doubt" to the recovery process. In our
> case this can happen when we send the commit async and this might only fail
> after the commit is acknowledged to the TM.
> >>
> >> [1] https://issues.jboss.org/browse/ISPN-3063
> >>
> >> Cheers,
> >>
> >
> > --
> > Registered in England and Wales under Company Registration No. 03798903
> > Directors: Michael Cunningham (USA), Mark Hegarty (Ireland), Matt Parson
> > (USA), Charlie Peters (USA)
> > ___
> > infinispan-dev mailing list
> > infinispan-dev@lists.jboss.org
> > https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
> Cheers,
> --
> Mircea Markus
> Infinispan lead (www.infinispan.org)
>
>
>
>
>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

[infinispan-dev] AtomicHashMap concurrent modifications in pessimistic mode

2013-05-16 Thread Dan Berindei
Hi guys

I'm working on an intermittent failure in NodeMoveAPIPessimisticTest and I
think I've come across what I think is underspecified behaviour in
AtomicHashMap.

Say we have two transactions, tx1 and tx2, and they both work with the same
atomic map in a pessimistic cache:

1. tx1: am1 = AtomicMapLookup.get(cache, key)
2. tx2: am2 = AtomicMapLookup.get(cache, key)
3. tx1: am1.put(subkey1, value1) // locks the map
4. tx2: am2.get(subkey1) // returns null
5. tx1: commit // the map is now {subkey1=value1}
6. tx2: am2.put(subkey2, value2) // locks the map
7. tx2: commit // the map is now {subkey2=value2}

It's not clear to me from the AtomicMap/AtomicHashMap javadoc if this is ok
or if it's a bug...
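
For reference, the interleaving above written as a single-threaded test using
suspend/resume (the method names follow the AtomicMapLookup API, the key names
are placeholders):

    import javax.transaction.Transaction;
    import javax.transaction.TransactionManager;

    import org.infinispan.Cache;
    import org.infinispan.atomic.AtomicMap;
    import org.infinispan.atomic.AtomicMapLookup;

    public class AtomicMapInterleaving {

        static void run(Cache<String, Object> cache, TransactionManager tm) throws Exception {
            tm.begin();
            AtomicMap<String, String> am1 = AtomicMapLookup.getAtomicMap(cache, "key");
            Transaction tx1 = tm.suspend();

            tm.begin();
            AtomicMap<String, String> am2 = AtomicMapLookup.getAtomicMap(cache, "key");
            Transaction tx2 = tm.suspend();

            tm.resume(tx1);
            am1.put("subkey1", "value1");   // tx1 locks the map
            tm.commit();                    // the map is now {subkey1=value1}

            tm.resume(tx2);
            am2.get("subkey1");             // returns null
            am2.put("subkey2", "value2");   // tx2 locks the map
            tm.commit();                    // the map is now {subkey2=value2}
        }
    }
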

Note that today the map is overwritten by tx2 even without step 4 ("tx2:
am2.get(subkey1)"). I'm pretty sure that's a bug and I fixed it locally by
using the FORCE_WRITE_LOCK in AtomicHashMapProxy.getDeltaMapForWrite.

However, when the Tree API moves a node it first checks for the existence
of the destination node, which means NodeMoveAPIPessimisticTest is still
failing. I'm not sure if I should fix that by forcing a write lock for all
AtomicHashMap reads, for all TreeCache reads, or only in TreeCache.move().

Cheers
Dan
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] commit not failing but transaction reported as in-doubt

2013-05-16 Thread Dan Berindei
Yeah, I got a bit confused because you didn't say anything about nodes
crashing or how the question actually related to Infinispan. So I assumed
it had something to do with https://issues.jboss.org/browse/ISPN-3063 :)


On Thu, May 16, 2013 at 6:50 PM, Mircea Markus  wrote:

> I think we're discussing about two different things here.
>
> My question was a general one: what if an XAResource acks the commit to the
> TransactionManager and then the Recovery Process (TM-side process
> independent of Infinispan) determines that the given transaction is in
> doubt. As per Jonathan's email the TM doesn't handle this well.
>
> In our case with async commit the scenario above is possible when the node
> crashes after acking the commit to the TM and before broadcasting the
> CommitCommand. So we simply shouldn't support async commit when the users
> want recovery.
>
> HTH
>
> On 16 May 2013, at 10:32, Dan Berindei  wrote:
>
> > Mircea, I think I'm missing something here. Why would the originator
> send a TxCompletionNotificationCommand at all if the commit command was
> asynchronous?
> >
> > I don't think recovery should require the originator to send a
> TxCompletionNotificationCommand.
> > Our commit commands can't fail anyway, so the only way for a transaction
> to become in-doubt would be if the cache crashed before sending the
> command. (Or maybe if another resource's commit phase failed.)
>
>
> Cheers,
> --
> Mircea Markus
> Infinispan lead (www.infinispan.org)
>
>
>
>
>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] AtomicHashMap concurrent modifications in pessimistic mode

2013-05-16 Thread Dan Berindei
On Thu, May 16, 2013 at 8:27 PM, Mircea Markus  wrote:

>
> On 16 May 2013, at 15:04, Dan Berindei  wrote:
>
> > Hi guys
> >
> > I'm working on an intermittent failure in NodeMoveAPIPessimisticTest and
> I think I've come across what I think is underspecified behaviour in
> AtomicHashMap.
> >
> > Say we have two transactions, tx1 and tx2, and they both work with the
> same atomic map in a pessimistic cache:
> >
> > 1. tx1: am1 = AtomicMapLookup.get(cache, key)
> > 2. tx2: am2 = AtomicMapLookup.get(cache, key)
> > 3. tx1: am1.put(subkey1, value1) // locks the map
> > 4. tx2: am2.get(subkey1) // returns null
> > 5. tx1: commit // the map is now {subkey1=value1}
> > 6. tx2: am2.put(subkey2, value2) // locks the map
> > 7. tx2: commit // the map is now {subkey2=value2}
> >
> > It's not clear to me from the AtomicMap/AtomicHashMap javadoc if this is
> ok or if it's a bug...
> as a user I find that a bit confusing so I think tx2 should merge stuff in
> the AtomicMap.
> I'd be curious to hear Manik's (author) and Sanne's (user) opinion on this.
>
>
Merging should work with pessimistic locking, but I don't think we could do
it with optimistic locking and write skew check enabled: we only do the
write skew check for the whole map. Would it be worth making this change if
it meant making the behaviour of AtomicHashMap more complex?

On the other hand, I believe FineGrainedAtomicHashMap doesn't do separate
write skew checks for each key in the map either, so users probably have to
deal with this difference between pessimistic and optimistic locking
already.


>
>  >
> > Note that today the map is overwritten by tx2 even without step 4 ("tx2:
> am2.get(subkey1)"). I'm pretty sure that's a bug and I fixed it locally by
> using the FORCE_WRITE_LOCK in AtomicHashMapProxy.getDeltaMapForWrite.
> >
> > However, when the Tree API moves a node it first checks for the
> existence of the destination node, which means NodeMoveAPIPessimisticTest
> is still failing. I'm not sure if I should fix that by forcing a write lock
> for all AtomicHashMap reads, for all TreeCache reads, or only in
> TreeCache.move().
> >
>

I tried using the FORCE_WRITE_LOCK flag for all TreeCache reads. This
seems to work fine, and move() doesn't throw any exceptions in pessimistic
mode any more. In optimistic mode, it doesn't change anything, and
concurrent moves still fail with WriteSkewException. The only downside is
the performance, having extra locks will certainly slow things down.
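
At the user level, the equivalent of that extra locking is reading the map
through the flag, roughly like this (the actual fix lives inside
AtomicHashMapProxy/TreeCache, so this is only an illustration; the key is a
placeholder):

    import org.infinispan.AdvancedCache;
    import org.infinispan.atomic.AtomicMap;
    import org.infinispan.atomic.AtomicMapLookup;
    import org.infinispan.context.Flag;

    public class LockedRead {

        static AtomicMap<String, String> readForUpdate(AdvancedCache<String, Object> cache) {
            // take the write lock on the map up front, so a concurrent transaction
            // blocks instead of silently overwriting the whole map later
            return AtomicMapLookup.getAtomicMap(cache.withFlags(Flag.FORCE_WRITE_LOCK), "key");
        }
    }
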
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] AtomicHashMap concurrent modifications in pessimistic mode

2013-05-17 Thread Dan Berindei
On Fri, May 17, 2013 at 1:59 PM, Mircea Markus  wrote:

>
> On 17 May 2013, at 07:35, Dan Berindei  wrote:
>
> >
> >
> >
> > On Thu, May 16, 2013 at 8:27 PM, Mircea Markus 
> wrote:
> >
> > On 16 May 2013, at 15:04, Dan Berindei  wrote:
> >
> > > Hi guys
> > >
> > > I'm working on an intermittent failure in NodeMoveAPIPessimisticTest
> and I think I've come across what I think is underspecified behaviour in
> AtomicHashMap.
> > >
> > > Say we have two transactions, tx1 and tx2, and they both work with the
> same atomic map in a pessimistic cache:
> > >
> > > 1. tx1: am1 = AtomicMapLookup.get(cache, key)
> > > 2. tx2: am2 = AtomicMapLookup.get(cache, key)
> > > 3. tx1: am1.put(subkey1, value1) // locks the map
> > > 4. tx2: am2.get(subkey1) // returns null
> > > 5. tx1: commit // the map is now {subkey1=value1}
> > > 6. tx2: am2.put(subkey2, value2) // locks the map
> > > 7. tx2: commit // the map is now {subkey2=value2}
> > >
> > > It's not clear to me from the AtomicMap/AtomicHashMap javadoc if this
> is ok or if it's a bug...
> > as a user I find that a bit confusing so I think tx2 should merge stuff
> in the AtomicMap.
> > I'd be curious to hear Manik's (author) and Sanne's (user) opinion on this.
> >
> >
> > Merging should work with pessimistic locking, but I don't think we could
> do it with optimistic locking and write skew check enabled: we only do the
> write skew check for the whole map.
> if the WSC is enabled, then the 2nd transaction should fail: tx2 reads the
> version at 2. and at 7. The WSC should forbid it to commit, so we
> shouldn't have this problem at all.
>

Right, the 2nd transaction must fail with WSC enabled, so we can't
implement merging.



> > Would it be worth making this change if it meant making the behaviour of
> AtomicHashMap more complex?
> how more complex? If it's not a quick fix (2h) I'd say no as this is more
> of a nice to have/no user requires this functionality ATM.
>

The behaviour of AtomicMap will be more complex because we're adding a bit
of functionality that only works with pessimistic locking. Or maybe with
optimistic locking as well, only not when write skew check is enabled.

This is definitely not a 2h fix. As you can see, it's taking more than 2h
just to figure out what needs to change :)
What other options do we have? Leave it as it is and document the
limitation?



> >
> > On the other hand, I believe FineGrainedAtomicHashMap doesn't do
> separate write skew checks for each key in the map either, so users
> probably have to deal with this difference between pessimistic and
> optimistic locking already.
> For FGAM I think the WSC should be performed on a per-key basis,
> and not for the whole map.
>

I agree, but I think implementing fine-grained WSC will be tricky. I'll
create a feature request in JIRA.


>
> >
> > >
> > > Note that today the map is overwritten by tx2 even without step 4
> ("tx2: am2.get(subkey1)"). I'm pretty sure that's a bug and I fixed it
> locally by using the FORCE_WRITE_LOCK in
> AtomicHashMapProxy.getDeltaMapForWrite.
> > >
> > > However, when the Tree API moves a node it first checks for the
> existence of the destination node, which means NodeMoveAPIPessimisticTest
> is still failing. I'm not sure if I should fix that by forcing a write lock
> for all AtomicHashMap reads, for all TreeCache reads, or only in
> TreeCache.move().
> > >
> >
> > I tried using the FORCE_WRITE_LOCK flag for all TreeCache reads. This
> seems to work fine, and move() doesn't throw any exceptions in pessimistic
> mode any more. In optimistic mode, it doesn't change anything, and
> concurrent moves still fail with WriteSkewException. The only downside is
> the performance, having extra locks will certainly slow things down.
> >
> > ___
> > infinispan-dev mailing list
> > infinispan-dev@lists.jboss.org
> > https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
> Cheers,
> --
> Mircea Markus
> Infinispan lead (www.infinispan.org)
>
>
>
>
>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] AtomicHashMap concurrent modifications in pessimistic mode

2013-05-20 Thread Dan Berindei
On Mon, May 20, 2013 at 1:57 PM, Manik Surtani  wrote:

>
> On 16 May 2013, at 15:04, Dan Berindei  wrote:
>
> Hi guys
>
> I'm working on an intermittent failure in NodeMoveAPIPessimisticTest and I
> think I've come across what I think is underspecified behaviour in
> AtomicHashMap.
>
> Say we have two transactions, tx1 and tx2, and they both work with the
> same atomic map in a pessimistic cache:
>
> 1. tx1: am1 = AtomicMapLookup.get(cache, key)
> 2. tx2: am2 = AtomicMapLookup.get(cache, key)
> 3. tx1: am1.put(subkey1, value1) // locks the map
> 4. tx2: am2.get(subkey1) // returns null
> 5. tx1: commit // the map is now {subkey1=value1}
> 6. tx2: am2.put(subkey2, value2) // locks the map
> 7. tx2: commit // the map is now {subkey2=value2}
>
> It's not clear to me from the AtomicMap/AtomicHashMap javadoc if this is
> ok or if it's a bug...
>
>
> If optimistic, step 7 should fail with a write skew check.  If
> pessimistic, step 2 would *usually* block assuming that another thread is
> updating the map, but since neither tx1 or tx2 has started updating the map
> yet, neither has a write lock on the map.  So that succeeds.  I'm not sure
> if this is any different from not using an atomic map:
>
> 1.  tx1: cache.get(k, v); // reads into tx context
> 2.  tx2: cache.get(k, v);
> 3.  tx1: cache.put(k, v + 1 );
> 4.  tx1: commit
> 5.  tx2: cache.put(k, v + 1 );
> 6.  tx2: commit
>
> here as well, if using optimistic, step 6 will fail with a WSC but if
> pessimistic this will work (since tx2 only requested a write lock after tx1
> committed/released its write lock).
>
>
The difference is that in your scenario, you see in the code that tx2
writes to key k, so it's not surprising to find that tx2 overwrote the
value written by tx1. But it would be surprising if tx2 also overwrote an
unrelated key k2.

With an atomic map, you only see in the code "map.put(subkey2, value)". Tx2
doesn't touch subkey1, so it's not that obvious that it should remove it.
It is clear to me why it behaves the way it does now, after reading the
implementation, but I don't think it's what most users would expect. (The
proof, I guess, is in the current implementation of TreeCache.move()).

With a FineGrainedAtomicMap in optimistic mode, it's not obvious why tx1
writing to subkey1 should cause tx2's write to subkey2 fail, either (see
https://issues.jboss.org/browse/ISPN-3123).


> Note that today the map is overwritten by tx2 even without step 4 ("tx2:
> am2.get(subkey1)"). I'm pretty sure that's a bug and I fixed it locally by
> using the FORCE_WRITE_LOCK in AtomicHashMapProxy.getDeltaMapForWrite.
>
> However, when the Tree API moves a node it first checks for the existence
> of the destination node, which means NodeMoveAPIPessimisticTest is still
> failing. I'm not sure if I should fix that by forcing a write lock for all
> AtomicHashMap reads, for all TreeCache reads, or only in TreeCache.move().
>
>
> I think only in TreeCache.move()
>
>
I tend to disagree. I think it's way too easy to introduce a read of a
node's structure in a transaction and start losing data without knowing it.
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] configuring fetchInMemoryState for topology caches

2013-05-21 Thread Dan Berindei
I wouldn't want to deprecate CCL, I think it definitely has a purpose - at
least in invalidation mode.

Even in replication mode, having a lazy alternative to state transfer may
be useful. Maybe not for the topology cache, but it might make sense for
large caches.
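
As a sketch of the two options for a replicated cache, eager state transfer
vs. pulling entries lazily through a ClusterCacheLoader (the builder method
names are from memory, so treat them as assumptions):

    import org.infinispan.configuration.cache.CacheMode;
    import org.infinispan.configuration.cache.Configuration;
    import org.infinispan.configuration.cache.ConfigurationBuilder;

    public class TopologyCacheConfig {

        static Configuration eagerReplication() {
            // eager: every joiner receives the full cache content via state transfer
            return new ConfigurationBuilder()
                    .clustering().cacheMode(CacheMode.REPL_SYNC)
                    .stateTransfer().fetchInMemoryState(true)
                    .build();
        }

        // the lazy alternative would disable fetchInMemoryState and configure a
        // ClusterCacheLoader in the loaders() section, so entries are pulled from
        // the other nodes on demand
    }
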


On Tue, May 21, 2013 at 4:36 PM, Mircea Markus  wrote:

>
> On 21 May 2013, at 08:30, Tristan Tarrant  wrote:
>
> > On 05/21/2013 08:58 AM, Galder Zamarreño wrote:
> >> Shouldn't it be enabled by default/enforced?
> >> ^ Either that, or the cluster cache loader is used, both of which
> serve the same purpose.
> >>
> > I think what Mircea is getting at, is that there is an intention to
> > deprecate / remove the CCL. I think that we can do that in 6.0 (with the
> > CacheStore redesign) and remove all potential users of CCL (including
> > the lazy topology transfer).
> Mind reader :-)
>
> Cheers,
> --
> Mircea Markus
> Infinispan lead (www.infinispan.org)
>
>
>
>
>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev

Re: [infinispan-dev] Supporting notifications for entries expired while in the cache store - ISPN-694

2013-05-21 Thread Dan Berindei
On Tue, May 21, 2013 at 6:07 PM, Galder Zamarreño  wrote:

>
> On May 6, 2013, at 2:20 PM, Mircea Markus  wrote:
>
> >
> > On 3 May 2013, at 20:15, Paul Ferraro wrote:
> >
> >> Is it essential?  No - but it would simplify things on my end.
> >> If Infinispan can't implement expiration notifications, then I am forced
> >> to use immortal cache entries and perform expiration myself.  To do
> >> this, I have to store meta information about the cache entry along with
> >> my actual cache values, which normally I would get for free via mortal
> >> cache entries.
> >
> > In the scope of 5.2, what galder suggested was to fully support
> notifications for the entries in memory. In order to fully support your use
> case you'd need to add some code to trigger notifications in the cache
> store as well - I think that shouldn't be too difficult. What cache store
> implementation are you using any way?
>
> ^ Personally, I'd do in-memory entry expiration notifications for 5.2, and
> I'd leave cache store based entry expiration for 6.0, when we'll revisit
> cache store API, and we can address cache store based entry expiration
> notification properly.
>
> Agree everyone?
>
>
Agree, if you meant for 5.3 :)



>  >
> >>
> >> So, it would be nice to have.  If I have to wait for 6.0 for this,
> >> that's ok.
> >>
> >> On Thu, 2013-05-02 at 17:03 +0200, Galder Zamarreño wrote:
> >>> Hi,
> >>>
> >>> Re: https://issues.jboss.org/browse/ISPN-694
> >>>
> >>> We've got a little problem here. Paul requires that, for entries that
> >>> might have expired while in the cache store, we send expiration
> >>> notifications for them when they are loaded.
> >>>
> >>> The problem is that expiration checking is currently done in the
> >>> actual cache store implementations, which makes supporting this (even
> >>> outside the purgeExpired business) specific to each cache store. Not
> >>> ideal.
> >>>
> >>> The alternative would be for CacheLoaderInterceptor to load, do the
> >>> checks and then remove the entries accordingly. The big problem here
> >>> is that you're imposing a way to deal with expiration handling for all
> >>> cache store implementations, and some might be able to do these checks
> >>> and removals in a more efficient way if they were left to do it
> >>> themselves. For example, having to load all entries and then decide
> >>> which are to expire might require a lot of work, instead of
> >>> potentially communicating directly with the cache store (imagine a
> >>> remote cache store…) and asking it to return all the entries filtered
> >>> by those whose expiry has not expired.
> >>>
> >>> However, even if a cache store can do that, it would lead to loading
> >>> only those entries not expired, but then how do you send the
> >>> notifications if those expired entries have been filtered out? You
> >>> probably need multiple load methods here...
> >>>
> >>> @Paul, do you really need this for your use case?
> >>>
> >>> The simplest thing to do might be to go for option 1, and let each
> >>> cache store send notifications for expired entries for the moment, and
> >>> then in 6.0 revise not only the API for purgeExpired, but also the API
> >>> for load/loadAll() to find a way that, if any expiry listeners are in
> >>> place, a different method can be called on the cache store that
> >>> signals it to return all entries: both expired and non-expired, and
> >>> then let the CacheLoaderInterceptor send notifications from a central
> >>> location.
> >>>
> >>> Thoughts?
> >>>
> >>> Cheers,
> >>> --
> >>> Galder Zamarreño
> >>> gal...@redhat.com
> >>> twitter.com/galderz
> >>>
> >>> Project Lead, Escalante
> >>> http://escalante.io
> >>>
> >>> Engineer, Infinispan
> >>> http://infinispan.org
> >>>
> >>
> >>
> >> ___
> >> infinispan-dev mailing list
> >> infinispan-dev@lists.jboss.org
> >> https://lists.jboss.org/mailman/listinfo/infinispan-dev
> >
> > Cheers,
> > --
> > Mircea Markus
> > Infinispan lead (www.infinispan.org)
> >
> >
> >
> >
>
>
> --
> Galder Zamarreño
> gal...@redhat.com
> twitter.com/galderz
>
> Project Lead, Escalante
> http://escalante.io
>
> Engineer, Infinispan
> http://infinispan.org
>
>
> ___
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev
>
___
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev
