Re: [infinispan-dev] Dynamic Externalizers sorted for ISPN-986 - looking for some feedback
+1 for Externalizer/AdvancedExternalizer. Dan On Tue, Apr 12, 2011 at 10:17 AM, Galder Zamarreño gal...@redhat.com wrote: On Apr 11, 2011, at 12:06 PM, Galder Zamarreño wrote: Guys, any thoughts on this? I want this in for BETA2... On Apr 1, 2011, at 5:54 PM, Galder Zamarreño wrote: Hi, Re: https://issues.jboss.org/browse/ISPN-986 As indicated in my comments, there's room for two types of serialization mechanisms: one for end users and the other for SPIs. I've got a solution for this in https://github.com/galderz/infinispan/commit/09096f7998c0d0a5aae76d55bf59c72fe1cb510e and wanted to give a heads up to everyone on what it involves:

- Two separate externalizer interfaces: Externalizer (which currently, to disrupt as little code as possible, is named EndUserExternalizer) and ExternalizerSpi or ServiceProviderExternalizer (currently named Externalizer). The first API has just basic read/write methods and the second adds a couple more methods for more specialised behaviour. Do people like these names? Or can someone come up with better names? More detailed info on the use cases is in the JIRA. I'm currently leaning towards: Externalizer (only readObject/writeObject methods) and ExternalizerSpi (which would contain the current https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/marshall/Externalizer.java). Actually, I don't like the ExternalizerSpi name, it makes configuration getters awkward: getExternalizerSpis... Instead I'm leaning towards: Externalizer and AdvancedExternalizer, which maps very nicely to Cache and AdvancedCache and makes getters more readable: getAdvancedExternalizers is much clearer :)

- A related factor here would be to find a better name for the XML/programmatic configuration, i.e. getServiceProviderExternalizers() and serviceProviderExternalizer, or getExternalizerSpis() and externalizerSpi? This is one thing, and the other is that I'd want this XML and programmatic configuration to be a bit hidden away cos it's specialised or for edge cases. The obvious route for the average Infinispan user should be to annotate their class and implement Infinispan's Externalizer interface. However, I don't think there's anything special that can be done in the current architecture of Infinispan without rethinking end user and SPI configuration. Since the XML is only relevant for the SPI version, the programmatic API would go along the lines of getExternalizerSpis() and externalizerSpi - naming methods like this gives a direct link to the interface name while not being too verbose.

- To hide JBoss Marshaller details away and to simplify some of the API it provides, I've created a new @MarshallableBy annotation that maps directly to what JBMAR's @Externalizer does. To get an idea of the differences for the end users, see https://github.com/galderz/infinispan/commit/09096f7998c0d0a5aae76d55bf59c72fe1cb510e#diff-10 as opposed to https://github.com/galderz/infinispan/commit/09096f7998c0d0a5aae76d55bf59c72fe1cb510e#diff-9. Are people happy with the annotation name? The cool thing is that if someone really wants to use JBoss Marshaller Externalizers, they can, but I think the majority will be happy with just a read/write object method.

And that's about it! Afterwards it just needs proper documentation in the wiki and javadocs, but right now I'm mostly focused on getting the naming right. Thoughts? Ideally I'd like to get this into BETA1 (release date next Tuesday, 5th April), but I'll probably hold till BETA2 to get the naming right. Cheers, -- Galder Zamarreño Sr.
Software Engineer Infinispan, JBoss Cache
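To make the split discussed above concrete, here is a minimal Java sketch of the two interfaces. The end-user names follow the proposal in this thread; the two extra SPI methods shown are illustrative of the "couple more methods" mentioned, modelled on what the final AdvancedExternalizer ended up exposing, and may differ from the released API:

    import java.io.IOException;
    import java.io.ObjectInput;
    import java.io.ObjectOutput;
    import java.util.Set;

    // End-user contract: just the read/write methods.
    interface Externalizer<T> {
        void writeObject(ObjectOutput output, T object) throws IOException;
        T readObject(ObjectInput input) throws IOException, ClassNotFoundException;
    }

    // SPI contract: adds the metadata needed to register the externalizer
    // with the marshalling framework.
    interface AdvancedExternalizer<T> extends Externalizer<T> {
        Set<Class<? extends T>> getTypeClasses(); // types this externalizer handles
        Integer getId();                          // unique id used on the wire
    }

The naming parallel with Cache/AdvancedCache is visible here: a user only ever touches the two-method interface, while integrators implement the extended one.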
Re: [infinispan-dev] [Pull Request] Modular Classloading Compatibility
On Wed, May 4, 2011 at 10:14 AM, Galder Zamarreño gal...@redhat.com wrote: On May 3, 2011, at 2:33 PM, Sanne Grinovero wrote: 2011/5/3 이희승 (Trustin Lee) trus...@gmail.com: On 05/03/2011 05:08 AM, Sanne Grinovero wrote: 2011/5/2 Manik Surtani ma...@jboss.org: On 1 May 2011, at 13:38, Pete Muir wrote: As in, user API? That's a little intrusive... e.g., put(K, V, cl)? Not for put, since you have the class, just get, and I was thinking something more like: Foo foo = getUsing(key, Foo.class) This would be a pretty useful addition to the API anyway to avoid user casts. Maybe as an advanced API, so as not to pollute the basic API? A bit like: Foo f = cache.getAdvancedCache().asClass(Foo.class).get(key); doesn't look much better than a cast, but is more cryptic :) Getting back to the classloader issue, what about: Cache c = cacheManager.getCache( cacheName, classLoader ); or Cache c = cacheManager.getCache( cacheName ).usingClassLoader( classLoader ); BTW if that's an issue on the API, maybe you should propose it to JSR-107 as well? We have a configurable Marshaller, right? Then why don't we just use the class loader that the current Marshaller uses? +1 I like the clean approach, but I'm not sure how you configure the current Marshaller to use the correct CL? Likely hard to do via a configuration file :) Well, the marshaller is a global component, so it lives at the cache manager level. You can't make any assumptions about its classloader, particularly when lazy deserialization is configured and you want to make sure that the data of the cache is deserialized with the correct classloader when the user reads the data from the cache. This is gonna become even more important when we, for example, move to having a single cache for all 2LC entities or all EJB3 SFSBs, where we'll definitely need multiple classloaders to access a single cache. The current unmarshaller uses the TCCL, which is a great idea for non-modular environments and will still work in AS7 for application classes (so it's still a good default). It probably won't work if Hibernate wants to store its own classes in the cache, because Hibernate's internal classes may not be reachable from the application's classloader. It gets even trickier if Hibernate wants to store a PrivateHibernateCollection class in the cache containing instances of application classes inside. Then I don't think there will be any single classloader that could reach both the Hibernate classes and the application classes so it can properly unmarshal both. Perhaps that's just something for the Hibernate folks to worry about? Or maybe we should allow the users to register more than one classloader with a cache? Dan
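The getUsing(key, Foo.class) idea quoted in this thread amounts to a typed read that spares the caller a cast. A hypothetical helper, not part of the Cache API under discussion, could look like this:

    import java.util.concurrent.ConcurrentMap;

    final class TypedReads {
        // Class.cast() performs the same runtime check an explicit cast
        // would, but the caller's code stays cast-free.
        static <T> T getUsing(ConcurrentMap<Object, Object> cache, Object key, Class<T> type) {
            return type.cast(cache.get(key));
        }
    }

    // Usage: Foo foo = TypedReads.getUsing(cache, key, Foo.class);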
Re: [infinispan-dev] [Pull Request] Modular Classloading Compatibility
On Wed, May 4, 2011 at 7:18 AM, 이희승 (Trustin Lee) trus...@gmail.com wrote: On 05/03/2011 09:33 PM, Sanne Grinovero wrote: 2011/5/3 이희승 (Trustin Lee) trus...@gmail.com: On 05/03/2011 05:08 AM, Sanne Grinovero wrote: 2011/5/2 Manik Surtani ma...@jboss.org: On 1 May 2011, at 13:38, Pete Muir wrote: As in, user API? That's a little intrusive... e.g., put(K, V, cl)? Not for put, since you have the class, just get, and I was thinking something more like: Foo foo = getUsing(key, Foo.class) This would be a pretty useful addition to the API anyway to avoid user casts. Maybe as an advanced API, so as not to pollute the basic API? A bit like: Foo f = cache.getAdvancedCache().asClass(Foo.class).get(key); doesn't look much better than a cast, but is more cryptic :) Getting back to the classloader issue, what about: Cache c = cacheManager.getCache( cacheName, classLoader ); or Cache c = cacheManager.getCache( cacheName ).usingClassLoader( classLoader ); BTW if that's an issue on the API, maybe you should propose it to JSR-107 as well? We have a configurable Marshaller, right? Then why don't we just use the class loader that the current Marshaller uses? +1 I like the clean approach, but I'm not sure how you configure the current Marshaller to use the correct CL? Likely hard to do via a configuration file :) By default, we could use the class loader that the current Unmarshaller uses, and let the user choose a class loader for a certain get() call. So, we need to deal with the two issues here: 1) Make sure that the user can configure the default class loader at runtime, and that calling get() (and consequently the related unmarshaller) will use the default class loader: cache.setDefaultClassLoader(UserClass.class.getClassLoader()) 2) Provide an additional API that allows a user to specify a class loader for the current transaction or context: cache.get(K key, Class<V> clazz) // +1 to Pete's suggestion If V is Set<MyObject> then Set<MyObject>.class.getClassLoader() will give you the system classloader, which in most cases won't be able to load the MyObject class. It may not even be a Set<MyObject> of course, it could be just a Set<?>. We could have a shortcut so the users can pass in a class instead of a classloader to avoid writing .getClassLoader(), but we shouldn't require any relation between the class passed in and the class of the return value. Dan
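A quick check of Dan's point: for JDK types like Set or ArrayList, deriving a classloader from the value's class yields the bootstrap loader (reported as null by getClassLoader()), which cannot see user classes at all:

    import java.util.ArrayList;

    public class LoaderCheck {
        public static void main(String[] args) {
            System.out.println(ArrayList.class.getClassLoader()); // null: bootstrap loader
            System.out.println(LoaderCheck.class.getClassLoader()); // the application loader
            // So a collection stored in the cache gives us no loader that can
            // see its element classes; the caller has to supply one explicitly.
        }
    }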
Re: [infinispan-dev] [Pull Request] Modular Classloading Compatibility
On Wed, May 4, 2011 at 5:09 PM, Pete Muir pm...@redhat.com wrote: On 4 May 2011, at 05:34, Dan Berindei wrote: On Wed, May 4, 2011 at 10:14 AM, Galder Zamarreño gal...@redhat.com wrote: On May 3, 2011, at 2:33 PM, Sanne Grinovero wrote: 2011/5/3 이희승 (Trustin Lee) trus...@gmail.com: On 05/03/2011 05:08 AM, Sanne Grinovero wrote: 2011/5/2 Manik Surtani ma...@jboss.org: On 1 May 2011, at 13:38, Pete Muir wrote: As in, user API? That's a little intrusive... e.g., put(K, V, cl)? Not for put, since you have the class, just get, and I was thinking something more like: Foo foo = getUsing(key, Foo.class) This would be a pretty useful addition to the API anyway to avoid user casts. Maybe as an advanced API, so as not to pollute the basic API? A bit like: Foo f = cache.getAdvancedCache().asClass(Foo.class).get(key); doesn't look much better than a cast, but is more cryptic :) Getting back to the classloader issue, what about: Cache c = cacheManager.getCache( cacheName, classLoader ); or Cache c = cacheManager.getCache( cacheName ).usingClassLoader( classLoader ); BTW if that's an issue on the API, maybe you should propose it to JSR-107 as well? We have a configurable Marshaller, right? Then why don't we just use the class loader that the current Marshaller uses? +1 I like the clean approach, but I'm not sure how you configure the current Marshaller to use the correct CL? Likely hard to do via a configuration file :) Well, the marshaller is a global component, so it lives at the cache manager level. You can't make any assumptions about its classloader, particularly when lazy deserialization is configured and you want to make sure that the data of the cache is deserialized with the correct classloader when the user reads the data from the cache. This is gonna become even more important when we, for example, move to having a single cache for all 2LC entities or all EJB3 SFSBs, where we'll definitely need multiple classloaders to access a single cache. The current unmarshaller uses the TCCL, which is a great idea for non-modular environments and will still work in AS7 for application classes (so it's still a good default). It probably won't work if Hibernate wants to store its own classes in the cache, because Hibernate's internal classes may not be reachable from the application's classloader. It gets even trickier if Hibernate wants to store a PrivateHibernateCollection class in the cache containing instances of application classes inside. Then I don't think there will be any single classloader that could reach both the Hibernate classes and the application classes so it can properly unmarshal both. Perhaps that's just something for the Hibernate folks to worry about? Or maybe we should allow the users to register more than one classloader with a cache? You need to use a bridge classloader in this case. You're right of course, Hibernate must have received/guessed the application's classloader and so it is able to create a bridge classloader that includes both. And if the application classes include references to classes from another module(s), the application has to provide a bridge classloader to Hibernate anyway.
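A minimal sketch of the bridge classloader Pete mentions: it tries a list of delegate loaders in order, so classes reachable from either the framework's or the application's loader can be resolved. The class name is illustrative, and a production version would also have to handle resources:

    import java.util.List;

    final class BridgeClassLoader extends ClassLoader {
        private final List<ClassLoader> delegates;

        BridgeClassLoader(List<ClassLoader> delegates) {
            super(null); // no parent; the delegates do all the work
            this.delegates = delegates;
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            for (ClassLoader cl : delegates) {
                try {
                    return cl.loadClass(name);
                } catch (ClassNotFoundException ignored) {
                    // not visible from this delegate; try the next one
                }
            }
            throw new ClassNotFoundException(name);
        }
    }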
Re: [infinispan-dev] [Pull Request] Modular Classloading Compatibility
2011/5/4 Pete Muir pm...@redhat.com: On 4 May 2011, at 09:55, Dan Berindei wrote: On Wed, May 4, 2011 at 7:18 AM, 이희승 (Trustin Lee) trus...@gmail.com wrote: On 05/03/2011 09:33 PM, Sanne Grinovero wrote: 2011/5/3 이희승 (Trustin Lee) trus...@gmail.com: On 05/03/2011 05:08 AM, Sanne Grinovero wrote: 2011/5/2 Manik Surtani ma...@jboss.org: On 1 May 2011, at 13:38, Pete Muir wrote: As in, user API? That's a little intrusive... e.g., put(K, V, cl)? Not for put, since you have the class, just get, and I was thinking something more like: Foo foo = getUsing(key, Foo.class) This would be a pretty useful addition to the API anyway to avoid user casts. Maybe as an advanced API, so as not to pollute the basic API? A bit like: Foo f = cache.getAdvancedCache().asClass(Foo.class).get(key); doesn't look much better than a cast, but is more cryptic :) Getting back to the classloader issue, what about: Cache c = cacheManager.getCache( cacheName, classLoader ); or Cache c = cacheManager.getCache( cacheName ).usingClassLoader( classLoader ); BTW if that's an issue on the API, maybe you should propose it to JSR-107 as well? We have a configurable Marshaller, right? Then why don't we just use the class loader that the current Marshaller uses? +1 I like the clean approach, but I'm not sure how you configure the current Marshaller to use the correct CL? Likely hard to do via a configuration file :) By default, we could use the class loader that the current Unmarshaller uses, and let the user choose a class loader for a certain get() call. So, we need to deal with the two issues here: 1) Make sure that the user can configure the default class loader at runtime, and that calling get() (and consequently the related unmarshaller) will use the default class loader: cache.setDefaultClassLoader(UserClass.class.getClassLoader()) 2) Provide an additional API that allows a user to specify a class loader for the current transaction or context: cache.get(K key, Class<V> clazz) // +1 to Pete's suggestion If V is Set<MyObject> then Set<MyObject>.class.getClassLoader() will give you the system classloader, which in most cases won't be able to load the MyObject class. It may not even be a Set<MyObject> of course, it could be just a Set<?>. We could have a shortcut so the users can pass in a class instead of a classloader to avoid writing .getClassLoader(), but we shouldn't require any relation between the class passed in and the class of the return value. There are two forces at play here. Firstly the desire to introduce greater type safety into the Cache API, by returning a typed value for the user rather than just an Object, and secondly the need to specify the CL. We could have an API like:

<V, CL> V get(Object key, Class<CL> classLoaderType);

and an overloaded form

<V> V get(Object key, ClassLoader cl);

This does involve Infinispan having to do a runtime cast to V without the class type passed in as a parameter, but this may be a necessary evil in the interest of improving the user API. Cache<K,V> extends Map<K,V>, so we already have runtime casts to V for V get(Object key). It seems natural to extend it to also return objects of the cache's V type parameter instead of returning an arbitrary type:

<CL> V get(Object key, Class<CL> classLoaderType);
V get(Object key, ClassLoader cl);

Besides, allowing get() to return an arbitrary type will only make the user code look more type safe, without changing its behaviour at all.
If the user wants type safety he can already define different caches with different V type parameters, and in that case we will actually enforce that the put argument is assignable to V at compile time. An alternative which is somewhat more verbose, somewhat safer, and somewhat more complex to explain (but ultimately more logical):

// Use the classloader of the expectedType, this would be the normal case
<V> V get(Object key, Class<V> expectedType);
// Specify the classloader to use as a class
<V, CL> V get(Object key, Class<V> expectedType, Class<CL> classLoader);
// Specify the classloader to use as a classloader
<V> V get(Object key, Class<V> expectedType, ClassLoader classLoader);

We could also include:

// Use the TCCL
V get(Object key);

Any thoughts? I'd allow the user to set another default classloader on the cache instead of the TCCL, like in Sanne's proposal: Cache c = cacheManager.getCache( cacheName, defaultClassLoader ); or Cache c = cacheManager.getCache( cacheName ).usingDefaultClassLoader( classLoader ); Dan
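Collected as a compilable sketch, the overloads above could look like this. This combines Pete's alternative with the constraint Dan raises, that returned values stay within the cache's V type parameter; none of it existed in the Cache API at the time, it is purely the proposal under discussion:

    interface TypedCache<K, V> {
        // Use the TCCL
        V get(Object key);

        // Use the classloader of the expectedType (the normal case)
        <T extends V> T get(Object key, Class<T> expectedType);

        // Specify the classloader to use, as a class
        <T extends V, CL> T get(Object key, Class<T> expectedType, Class<CL> classLoader);

        // Specify the classloader to use, as a ClassLoader instance
        <T extends V> T get(Object key, Class<T> expectedType, ClassLoader classLoader);
    }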
Re: [infinispan-dev] [Pull Request] Modular Classloading Compatibility
On Thu, May 5, 2011 at 10:05 PM, Emmanuel Bernard emman...@hibernate.org wrote: Quick question. In case of 2.a (i.e. setting the TCCL), this is a requirement for frameworks and libraries using Infinispan, right? Not / never for a user application using Infinispan (in this case it seems that the application class loader will do the right thing and is set as the TCCL by the AS environment)? My understanding is that we only need to worry about the TCCL if a library is creating threads and is sharing those threads between applications - when you create a new thread it inherits the TCCL from its parent thread. I'm not sure if we should expect the library to set the TCCL around user callbacks in this case though, because the TCCL is only necessary and guaranteed to work in a JEE environment: in a standalone application you have basically one classloader, and in an OSGi application the TCCL is not guaranteed to be set properly anyway. It's also possible that an application will want to store in the cache an object whose class is not accessible through the application's classloader. E.g. if another module has an interface A and a factory for A objects in an exported package that's accessible from the application's classloader, but the actual implementation class is not exported and so it's not accessible. For option 2.b we already have the InvocationContext, we only need to add the classloader to it. Even if we went the 2.a route, for async calls I think we can unmarshal the object on a different thread (which I also think could be shared between apps), so we may need to save the TCCL at the time of the get() call in the InvocationContext and use that for unmarshalling. So I'd choose 2.b with the TCCL as the default classloader. The only catch, as Galder said, is that we have to be very careful not to leave invocation contexts lying around or we will leak classloaders. Dan On 5 mai 2011, at 10:55, Pete Muir wrote: I talked about this with Emmanuel last night, and we were a) concerned about the pollution of the API that this implies b) not sure why we need to do this So I also spoke to Jason to understand his initial motivation, and from this chat I think it's clear that things have gone off course here. Jason raised two issues with modularity/classloading in Infinispan: 1) Using the TCCL as the classloader to load Infinispan classes is a bad idea. Instead we should use org.infinispan.foo.Bar.class.getClassLoader(). 2) When we need to load application classes we need a different approach to that used today. Most of the time the TCCL is perfect for this. However *sometimes* Infinispan may be invoked outside of the application, when the TCCL may not be set in AS7. Jason and I discussed three options: a) require (through a platform integration documentation contract) that the TCCL must always be set when Infinispan is invoked. b) Have some sort of InvocationContext which knows what the correct classloader to use is (aka the Hibernate/Seam/Weld design where there is a per-application construct based on a ThreadLocal). Given this hasn't been designed into the core, it seems like a large retrofit. c) Make users specify the CL (directly or indirectly) via the API (as we discussed). Personally I think that (a) is the best approach right now, and is not that onerous a requirement. We might want to make the TCCL usage pluggable for OSGi envs. Cc'd David to get his feedback.
On 4 May 2011, at 10:46, Dan Berindei wrote: On Wed, May 4, 2011 at 5:09 PM, Pete Muir pm...@redhat.com wrote: On 4 May 2011, at 05:34, Dan Berindei wrote: On Wed, May 4, 2011 at 10:14 AM, Galder Zamarreño gal...@redhat.com wrote: On May 3, 2011, at 2:33 PM, Sanne Grinovero wrote: 2011/5/3 이희승 (Trustin Lee) trus...@gmail.com: On 05/03/2011 05:08 AM, Sanne Grinovero wrote: 2011/5/2 Manik Surtani ma...@jboss.org: On 1 May 2011, at 13:38, Pete Muir wrote: As in, user API? That's a little intrusive... e.g., put(K, V, cl)? Not for put, since you have the class, just get, and I was thinking something more like: Foo foo = getUsing(key, Foo.class) This would be a pretty useful addition to the API anyway to avoid user casts. Maybe as an advanced API, so as not to pollute the basic API? A bit like: Foo f = cache.getAdvancedCache().asClass(Foo.class).get(key); doesn't look much better than a cast, but is more cryptic :) Getting back to the classloader issue, what about: Cache c = cacheManager.getCache( cacheName, classLoader ); or Cache c = cacheManager.getCache( cacheName ).usingClassLoader( classLoader ); BTW if that's an issue on the API, maybe you should propose it to JSR-107 as well? We have a configurable Marshaller, right? Then why don't we just use the class loader that the current Marshaller uses? +1 I like the clean approach, but I'm not sure how you configure the current Marshaller to use the correct CL? Likely hard to do via a configuration file
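Dan's point about thread creation is easy to verify: a new thread captures its parent's context classloader at construction time, which is why a library that creates shared threads can end up holding the first caller's TCCL. A small demonstration:

    public class TcclInheritanceDemo {
        public static void main(String[] args) throws InterruptedException {
            final ClassLoader marker = new ClassLoader() {};
            Thread.currentThread().setContextClassLoader(marker);
            Thread child = new Thread(new Runnable() {
                public void run() {
                    // Prints "true": the child inherited the parent's TCCL.
                    System.out.println(Thread.currentThread().getContextClassLoader() == marker);
                }
            });
            child.start();
            child.join();
        }
    }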
Re: [infinispan-dev] JGroupsDistSync and ISPN-83
On Wed, May 11, 2011 at 11:25 AM, Vladimir Blagojevic vblag...@redhat.com wrote: The more I research ReentrantReadWriteLock the more shocked I am. Not only does a thread wanting to acquire the write lock first have to release the read lock, but we can block forever even if we release the read lock, if we have acquired that read lock reentrantly. Each call to unlock just reduces the hold count, and the lock is only actually released when the hold count hits zero. Surreal! If ReentrantReadWriteLock allowed upgrades then you would get a deadlock when two threads both hold the read lock and try to upgrade to a write lock at the same time. There's always a trade-off... I'm not familiar with the code, but are you sure the read lock is being held by the same thread that is trying to acquire the write lock? Dan People have already debated this issue: http://stackoverflow.com/questions/464784/java-reentrantreadwritelocks-how-to-safely-acquire-write-lock In conclusion we have to seriously revisit all our uses of ReentrantReadWriteLock! Vladimir On 11-05-10 11:31 AM, Vladimir Blagojevic wrote: Hi, Would you please confirm my understanding and reading of the javadoc for ReentrantReadWriteLock under the section on lock downgrading. It says: "Reentrancy also allows downgrading from the write lock to a read lock, by acquiring the write lock, then the read lock and then releasing the write lock. However, upgrading from a read lock to the write lock is not possible." The ReentrantReadWriteLock javadoc code example with class CacheData also shows how a thread owning a read lock first has to release it in order to obtain a write lock! So a thread owning a read lock first has to release the read lock in order to obtain a write lock? This is very symptomatic in the logs of ISPN-83, such as this one: https://issues.jboss.org/secure/attachment/12341409/server1.log Recall how FLUSH stops all invocations on the cluster and therefore all read lock acquisitions in JGroupsDistSync, ultimately enabling smooth write lock acquisitions for state transfer and whatnot. In conclusion this leads me to wonder if the culprit of all this FLUSH mess is rooted in the read/write lock semantics above? Regards, Vladimir
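The behaviour both posters describe can be shown in a few lines. A thread holding the read lock cannot upgrade to the write lock; tryLock() fails (a plain lock() would block forever), and releasing a reentrantly-held read lock only decrements the hold count:

    import java.util.concurrent.locks.ReentrantReadWriteLock;

    public class UpgradeDemo {
        public static void main(String[] args) {
            ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
            rwl.readLock().lock();
            rwl.readLock().lock(); // reentrant: read hold count is now 2
            System.out.println(rwl.writeLock().tryLock()); // false: upgrade refused
            rwl.readLock().unlock(); // only decrements the hold count...
            System.out.println(rwl.writeLock().tryLock()); // still false
            rwl.readLock().unlock(); // ...count hits 0, read lock actually released
            System.out.println(rwl.writeLock().tryLock()); // true
        }
    }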
Re: [infinispan-dev] https://issues.jboss.org/browse/ISPN-977
On Wed, May 11, 2011 at 5:51 PM, Mircea Markus mircea.mar...@jboss.com wrote: On 11 May 2011, at 13:57, Dan Berindei wrote: With the new push-based rebalancing code, we install the new CH before doing any rebalancing work, so the inconsistency window should be much smaller. In theory we also block new transactions, maybe you could use that to stop generating new keys while rehashing. That sounds interesting, I'll take a look! On the other hand, in 5.0 you should be able to implement KeyAffinityService on top of Pete's grouping work for https://issues.jboss.org/browse/ISPN-312 and then you won't need any listeners. Not sure how that could work? KAS generates pseudo-random keys that map to a certain node. It's up to the user to place them in the cache at some point, if ever. I was thinking of wrapping the keys generated by the user's KeyGenerator together with a node address in our own MagicKey class and setting the @Group annotation on the address field. We already have a MagicKey class for testing, the only problem with it is that it only uses the address for hashing, so you'd get lots of collisions. With Pete's changes MagicKey becomes something we can actually expose to the users (I was thinking through the KeyAffinityService API). I didn't think this through however, because MagicKey<K> would have to extend K in order to satisfy the KeyAffinityService API (and even to be able to store the key in the Cache<K,V> later). Your question does raise an interesting point, though: a new node can join between the user call to KeyAffinityService.getKeyForAddress(address) and Cache.put(generatedKey), making generatedKey map to the new node. So there is no way for us to completely eliminate the issue of stale keys. Dan On Wed, May 11, 2011 at 3:29 PM, Sanne Grinovero sanne.grinov...@gmail.com wrote: First thing I thought when reading your email was "OMG do we support on-the-fly hash implementation changes? crazy!" That's obviously not the case, but if you name it @ConsistentHashChangeListener that's what I would think. Wouldn't it be better to change the exact timing of the view change event? I don't see why Infinispan users might be more interested in knowing about the topology details according to the transport than about the actual Infinispan hashing topology - I would expect that when I receive such a notification the new view is already installed and ready to go; actually I thought that was the case since ever. What would be the use cases to get the notification *before* the new hash is installed? Cheers, Sanne 2011/5/11 Mircea Markus mircea.mar...@jboss.com: Hi, The basic problem behind this is that I need to be notified when a new consistent hash is installed. ATM there isn't any support (of which I know) for a @ConsistentHashChangeListener. I'm thinking to add such notifications either: a) internally: Observer pattern on DistributionManager or even on DistributionManagerImpl b) more generically, as a fully fledged listener. I favor a) and then if more people ask for it we will expose it as a fully fledged listener. Suggestions?
Cheers, Mircea
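A sketch of the MagicKey idea Dan describes, built on the ISPN-312 grouping work: the @Group value drives hashing, so every key carrying the same address string lands on the same node. The class shape is illustrative, and it assumes the grouping annotation lands as @Group on a String-returning method:

    import java.io.Serializable;
    import org.infinispan.distribution.group.Group;

    public class MagicKey implements Serializable {
        private final String address;    // string form of the target node's address
        private final Object wrappedKey; // the key produced by the user's KeyGenerator

        public MagicKey(String address, Object wrappedKey) {
            this.address = address;
            this.wrappedKey = wrappedKey;
        }

        @Group
        public String getGroup() {
            return address; // only this value is fed to the consistent hash
        }

        // equals()/hashCode() over both fields omitted for brevity
    }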
Re: [infinispan-dev] IDEA:building non-OSGi libraries...
I disabled the Osmorc plugin and that seemed to stop it. Dan On Wed, May 11, 2011 at 8:50 PM, Mircea Markus mircea.mar...@jboss.com wrote: Hi, Whenever I'm switching branches from 4.2.x to master the build in IDEA takes ages (even more, about 5 mins!!!). Most of the time is spent in "Building non-OSGi libraries for module..." Any idea on what can be done to speed things up? Cheers, Mircea
Re: [infinispan-dev] [Pull Request] Modular Classloading Compatibility
On Wed, May 11, 2011 at 11:18 PM, David Bosschaert da...@redhat.com wrote: On 11/05/2011 17:54, Dan Berindei wrote: On Wed, May 11, 2011 at 7:08 PM, Pete Muir pm...@redhat.com wrote: Were we developing for OSGi I would certainly agree with you. However in many environments today we can reasonably expect the TCCL to be set and to be able to load the classes we need. So whilst making it part of the API is the safest option, it's also complicating an API for the sake of the few at the cost of the many. Further, this also seems kinda nasty to me. We know the class (and hence bundle/module) when we put the object into Infinispan, therefore why do we require people to respecify this again? David, can we not actually do something here akin to what we are discussing for Weld? Whereby we can serialize out the bundle id and then find the correct CL based on that when we deserialize. What if the object is a java.util.ArrayList? Each element in the list could belong to a different bundle, so you'd have to write a bundle id for every element in the list. Yes, if you know the Bundle-SymbolicName and Version (or the Bundle ID) you can find its classloader. On the other question, if you're passing in a class object then you can obtain its classloader and hence the bundle where it came from. But, and I think this is what Dan alluded to above, is it always true that the class you're passing in comes from the bundle that you need to have, or could it also come from one of its parent class loaders? Exactly David, sorry if my message was a little cryptic. I think in order to handle every case properly you would have to go through the entire object graph being stored in the cache in order to find all the classloaders/bundle ids that you will need on get(). That seems like a lot of overhead to me, and forcing the user to provide the classloader doesn't seem that bad in comparison. Perhaps we should use something other than a thread-local for this though, so that users can hold onto the result of cacheManager.getCache(A).usingClassLoader(A.class) and never have to provide the classloader again. In fact I think this is a good idea for the invocation flags we already have, too. It would involve creating lots of overloads in CacheDelegate with a PreInvocationContext parameter and a new CacheDelegateWithContext class to invoke those methods, but the public API would remain the same. Cheers Dan
Re: [infinispan-dev] [Pull Request] Modular Classloading Compatibility
On Thu, May 12, 2011 at 1:22 PM, Emmanuel Bernard emman...@hibernate.org wrote: On 12 mai 2011, at 11:18, Dan Berindei wrote: That seems like a lot of overhead to me, and forcing the user to provide the classloader doesn't seem that bad in comparison. Perhaps we should use something other than a thread-local for this though, so that users can hold onto the result of cacheManager.getCache(A).usingClassLoader(A.class) and never have to provide the classloader again. Note that I don't think it works in the case you mentioned earlier, i.e. an ArrayList with classes from many different bundles. It works, only now it's the user's job to provide the right classloader. I suspect most people will have the same situation as Augusto, the guy who was asking about classloading rules on the forum (http://community.jboss.org/message/602759): they already have a bundle that has access to the classes that will ever be in the cache (in Augusto's case via DynamicImport-Package: *), their only problem is convincing Infinispan to use that bundle's classloader for unmarshalling. If they don't like the idea of having a catch-all bundle, they can instead write a bridge classloader themselves that holds a list of classloaders and loads classes from any of them. They can even write the bundle ids in the cache if the list of bundles can only be determined at runtime. Of course, Augusto's bundles don't access each other's objects in the cache, so his problem could also be solved by giving each bundle its own CacheDelegateWithContext (or CacheDelegateSession, if you will) instance configured with that bundle's classloader, removing the need for the DynamicImport-Package directive in his central bundle. Cheers Dan
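For the "write the bundle ids in the cache" variant, resolving a classloader back from a stored bundle id could look like the sketch below. It uses the OSGi 4.3 BundleWiring API and assumes the bundle is still installed and resolved when the value is read back:

    import org.osgi.framework.Bundle;
    import org.osgi.framework.BundleContext;
    import org.osgi.framework.wiring.BundleWiring;

    final class BundleLoaders {
        static ClassLoader loaderForBundle(BundleContext ctx, long bundleId) {
            Bundle bundle = ctx.getBundle(bundleId);
            if (bundle == null) {
                // The bundle was uninstalled between put() and get().
                throw new IllegalStateException("Bundle " + bundleId + " is gone");
            }
            return bundle.adapt(BundleWiring.class).getClassLoader();
        }
    }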
Re: [infinispan-dev] https://issues.jboss.org/browse/ISPN-977
On Wed, May 11, 2011 at 8:47 PM, Erik Salter esal...@bnivideo.com wrote: Wouldn't any rehash affect the locality of these generated keys, or am I missing something? Indeed, the moment a new node joins the cluster you have no guarantees that the keys that used to be on the same node are still stored together. Cheers Dan -Original Message- From: infinispan-dev-boun...@lists.jboss.org [mailto:infinispan-dev-boun...@lists.jboss.org] On Behalf Of Dan Berindei Sent: Wednesday, May 11, 2011 12:47 PM To: infinispan -Dev List Subject: Re: [infinispan-dev] https://issues.jboss.org/browse/ISPN-977 On Wed, May 11, 2011 at 5:51 PM, Mircea Markus mircea.mar...@jboss.com wrote: On 11 May 2011, at 13:57, Dan Berindei wrote: With the new push-based rebalancing code, we install the new CH before doing any rebalancing work, so the inconsistency window should be much smaller. In theory we also block new transactions, maybe you could use that to stop generating new keys while rehashing. That sounds interesting, I'll take a look! On the other hand, in 5.0 you should be able to implement KeyAffinityService on top of Pete's grouping work for https://issues.jboss.org/browse/ISPN-312 and then you won't need any listeners. Not sure how that could work? KAS generates pseudo-random keys that map to a certain node. It's up to the user to place them in the cache at some point, if ever. I was thinking of wrapping the keys generated by the user's KeyGenerator together with a node address in our own MagicKey class and setting the @Group annotation on the address field. We already have a MagicKey class for testing, the only problem with it is that it only uses the address for hashing, so you'd get lots of collisions. With Pete's changes MagicKey becomes something we can actually expose to the users (I was thinking through the KeyAffinityService API). I didn't think this through however, because MagicKey<K> would have to extend K in order to satisfy the KeyAffinityService API (and even to be able to store the key in the Cache<K,V> later). Your question does raise an interesting point, though: a new node can join between the user call to KeyAffinityService.getKeyForAddress(address) and Cache.put(generatedKey), making generatedKey map to the new node. So there is no way for us to completely eliminate the issue of stale keys. Dan On Wed, May 11, 2011 at 3:29 PM, Sanne Grinovero sanne.grinov...@gmail.com wrote: First thing I thought when reading your email was "OMG do we support on-the-fly hash implementation changes? crazy!" That's obviously not the case, but if you name it @ConsistentHashChangeListener that's what I would think. Wouldn't it be better to change the exact timing of the view change event? I don't see why Infinispan users might be more interested in knowing about the topology details according to the transport than about the actual Infinispan hashing topology - I would expect that when I receive such a notification the new view is already installed and ready to go; actually I thought that was the case since ever. What would be the use cases to get the notification *before* the new hash is installed? Cheers, Sanne 2011/5/11 Mircea Markus mircea.mar...@jboss.com: Hi, The basic problem behind this is that I need to be notified when a new consistent hash is installed. ATM there isn't any support (of which I know) for a @ConsistentHashChangeListener.
I'm thinking to add such notifications either: a) internally: Observer pattern on DistributionManager or even on DistributionManagerImpl b) more generically, as a fully fledged listener. I favor a) and then if more people ask for it we will expose it as a fully fledged listener. Suggestions? Cheers, Mircea
Re: [infinispan-dev] Generated keys affected by rehash Was: https://issues.jboss.org/browse/ISPN-977
On Fri, May 13, 2011 at 6:38 PM, Erik Salter esal...@bnivideo.com wrote: Yes, collocation of all keys is a large concern of my application(s). Currently, I can handle keys I'm in control of (like database-generated keys), where I can play around with the hash code. What I would love to do is collocate that data with keys I can't control (like UUIDs) so that all cache operations can be done in the same transaction on the data owner's node. By playing around with the hash code do you mean you set the hashcode for all the keys you want on a certain server to the same value? I imagine that would degrade performance quite a bit, because we have HashMaps everywhere and your keys will always end up in the same hash bucket. ISPN-312 seems to me like a much better fit for your use case than the KeyAffinityService. Even if you added a listener to change your keys when the topology changes, that would mean on a rehash the keys could get moved to the new server and then back to the old server, whereas with ISPN-312 they will either all stay on the old node or they will all move to the new node. Cheers Dan Erik -Original Message- From: infinispan-dev-boun...@lists.jboss.org [mailto:infinispan-dev-boun...@lists.jboss.org] On Behalf Of Manik Surtani Sent: Friday, May 13, 2011 5:25 AM To: infinispan -Dev List Subject: [infinispan-dev] Generated keys affected by rehash Was: https://issues.jboss.org/browse/ISPN-977 On 11 May 2011, at 18:47, Erik Salter wrote: Wouldn't any rehash affect the locality of these generated keys, or am I missing something? It would. And hence ISPN-977, to address that. Or is your concern keys already generated before the rehash? The latter would certainly be a problem. Perhaps, if this was important to the application, on detecting a change in topology, re-generate keys and move data around? For other apps, move the session to the appropriate node? Cheers Manik -- Manik Surtani ma...@jboss.org twitter.com/maniksurtani Lead, Infinispan http://www.infinispan.org
Re: [infinispan-dev] [Pull Request] Modular Classloading Compatibility
On Fri, May 13, 2011 at 5:48 PM, Jason T. Greene jason.gre...@redhat.com wrote: On 5/12/11 4:18 AM, Dan Berindei wrote: On Wed, May 11, 2011 at 11:18 PM, David Bosschaert da...@redhat.com wrote: On 11/05/2011 17:54, Dan Berindei wrote: On Wed, May 11, 2011 at 7:08 PM, Pete Muir pm...@redhat.com wrote: Were we developing for OSGi I would certainly agree with you. However in many environments today we can reasonably expect the TCCL to be set and to be able to load the classes we need. So whilst making it part of the API is the safest option, it's also complicating an API for the sake of the few at the cost of the many. Further, this also seems kinda nasty to me. We know the class (and hence bundle/module) when we put the object into Infinispan, therefore why do we require people to respecify this again? David, can we not actually do something here akin to what we are discussing for Weld? Whereby we can serialize out the bundle id and then find the correct CL based on that when we deserialize. What if the object is a java.util.ArrayList? Each element in the list could belong to a different bundle, so you'd have to write a bundle id for every element in the list. Yes, if you know the Bundle-SymbolicName and Version (or the Bundle ID) you can find its classloader. On the other question, if you're passing in a class object then you can obtain its classloader and hence the bundle where it came from. But, and I think this is what Dan alluded to above, is it always true that the class you're passing in comes from the bundle that you need to have, or could it also come from one of its parent class loaders? Exactly David, sorry if my message was a little cryptic. I think in order to handle every case properly you would have to go through the entire object graph being stored in the cache in order to find all the classloaders/bundle ids that you will need on get(). This approach just doesn't scale, and it would not work in all environments (there is no guarantee you can get the original classloader if the environment doesn't allow you to look them up by some kind of unique id). That seems like a lot of overhead to me, and forcing the user to provide the classloader doesn't seem that bad in comparison. Perhaps we should use something other than a thread-local for this though Yes, thread locals are brittle, and tend to leak if they aren't managed correctly. Using a context specific instance is a much better approach. so that users can hold onto the result of cacheManager.getCache(A).usingClassLoader(A.class) and never have to provide the classloader again. That's similar to what I was proposing with the CacheSession notion. The basic notion is that you move thread/context specific state to a separate object that the caller uses (optionally), and it mirrors the same Cache API. The way I understood your proposal was that the session would be something visible to the user, so he would have to call cacheManager.getCache(A).getSession() in order to access the context functionality. My version is essentially the same thing, except the session would be created transparently when the user calls usingClassLoader() or withFlags(). In fact I think this is a good idea for the invocation flags we already have, too. It would involve creating lots of overloads in CacheDelegate with a PreInvocationContext parameter and a new CacheDelegateWithContext class to invoke those methods, but the public API would remain the same. You don't need to reuse the same impl.
Just create a new delegate which passes an extra invocation state parameter with every call. The overhead of that is tiny. Some of the methods in CacheDelegate do have lots of logic in them, so we have to reuse them. The way I see it, the existing methods will change to receive the context parameter explicitly instead of implicitly, and the methods with the old signature will call them with a default context. usingClassLoader()/withFlags() will create an instance of CacheDelegateSession, and the methods in CacheDelegateSession will modify/use the context stored in that instance. I agree the overhead is tiny, and the best thing is users can optimize it away by moving the usingClassLoader()/withFlags() calls outside of loops. Cheers Dan
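A sketch of the delegate-with-context idea being discussed. The names (PreInvocationContext, CacheDelegate, CacheDelegateSession) are the ones used in this thread; the shapes below are illustrative stubs, not the real Infinispan classes:

    // Immutable per-caller state: the classloader (flags would live here too).
    final class PreInvocationContext {
        private final ClassLoader classLoader;

        PreInvocationContext(ClassLoader classLoader) {
            this.classLoader = classLoader;
        }

        ClassLoader getClassLoader() {
            return classLoader;
        }

        PreInvocationContext withClassLoader(ClassLoader cl) {
            return new PreInvocationContext(cl);
        }
    }

    // Stand-in for the shared cache implementation: every operation now takes
    // the caller's context explicitly instead of reading a thread-local.
    interface CacheDelegate<K, V> {
        V get(Object key, PreInvocationContext ctx);
    }

    // The per-caller session: immutable, cheap to create, safe to hoist out
    // of loops, and it never touches a thread-local.
    final class CacheDelegateSession<K, V> {
        private final CacheDelegate<K, V> delegate;
        private final PreInvocationContext context;

        CacheDelegateSession(CacheDelegate<K, V> delegate, PreInvocationContext context) {
            this.delegate = delegate;
            this.context = context;
        }

        CacheDelegateSession<K, V> usingClassLoader(ClassLoader cl) {
            return new CacheDelegateSession<K, V>(delegate, context.withClassLoader(cl));
        }

        V get(Object key) {
            return delegate.get(key, context); // same logic, explicit context
        }
    }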
[infinispan-dev] Compilation errors in the infinispan-spring module
Hi Olaf, Did you see any problems with RHQ + Spring interaction that led you to exclude the rhq-pluginAnnotations dependency in the spring module?

    <dependency>
      <groupId>${project.groupId}</groupId>
      <artifactId>infinispan-core</artifactId>
      <version>${project.version}</version>
      <scope>compile</scope>
      <exclusions>
        <exclusion>
          <groupId>org.rhq.helpers</groupId>
          <artifactId>rhq-pluginAnnotations</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

I've been getting some weird errors while building the infinispan-spring module, both with OpenJDK 1.6.0_20 and with Sun JDK 1.6.0_24 and 1.6.0_25, and they seem to appear because the compiler doesn't have access to the RHQ annotations:

    /tmp/privatebuild/home/dan/Work/infinispan/master/core/classes/org/infinispan/manager/DefaultCacheManager.class: warning: Cannot find annotation method 'displayName()' in type 'org.rhq.helpers.pluginAnnotations.agent.Metric': class file for org.rhq.helpers.pluginAnnotations.agent.Metric not found
    /tmp/privatebuild/home/dan/Work/infinispan/master/core/classes/org/infinispan/manager/DefaultCacheManager.class: warning: Cannot find annotation method 'dataType()' in type 'org.rhq.helpers.pluginAnnotations.agent.Metric'
    An exception has occurred in the compiler (1.6.0_24). Please file a bug at the Java Developer Connection (http://java.sun.com/webapps/bugreport) after checking the Bug Parade for duplicates. Include your program and the following diagnostic in your report. Thank you.
    com.sun.tools.javac.code.Symbol$CompletionFailure: class file for org.rhq.helpers.pluginAnnotations.agent.DataType not found

Galder has seen it too with Sun JDK 1.6.0_24, but strangely enough everyone else is able to build without any errors. I'm thinking of removing the rhq-pluginAnnotations exclusion from the infinispan-spring pom.xml; the question is whether this would break something on the Spring side. Do you know of any potential problems, or did you do this just to reduce the number of dependencies brought in by infinispan-spring into an application? Cheers Dan
Re: [infinispan-dev] [Pull Request] Modular Classloading Compatibility
Not to be the guy that breaks the circle, but how about the threads that handle incoming RPCs from other nodes? With storeAsBinary=false those threads have to unmarshal the objects in order to store them on the local node, so we'd need a way to register a classloader with the cache, not just with the cache session. If we use a single cache for all Hibernate entities, regardless of application, this means the cache could receive application objects from the other nodes without the application even being deployed. Does this mean we'll require storeAsBinary=true in all Hibernate 2LC configurations? Dan On Mon, May 16, 2011 at 9:54 PM, David M. Lloyd david.ll...@redhat.com wrote: I know, we can just attach the class loader to the cache! Okay, just kidding, but Galder is right, this conversation is going in circles. We already discussed that in this thread and a number of points were raised for and against. On 05/16/2011 01:20 PM, Adrian Cole wrote: What about a helper that just returns a cache with a specific classloader from a cache? cache.withLoader(cl).get(K key) -a On Mon, May 16, 2011 at 11:14 AM, Galder Zamarreño gal...@redhat.com wrote: On May 16, 2011, at 7:57 PM, Sanne Grinovero wrote: I don't like the TCCL either, so I'll repeat my suggestion from two weeks ago to just have: Cache c = cacheManager.getCache( cacheName, classLoader ); sounds reasonable to me to have the application declare its intentions once? BTW I don't like cache.get(K key, Class<V> clazz) as we're not speaking only about the get(K) method, but about many methods, and this will explode the number of methods on Cache; on the other hand I think it's acceptable to have a single Cache instance used by a single application/classloader. You can still have multiple applications share the same underlying cache and use different classloaders: Guys, we're going around in circles. As I said the other week, you can't assume 1 cache = 1 classloader cos for example in the Hibernate 2LC all entities will be stored in a single cache, as opposed to today where we have a cache per entity. And if all entities are stored in the same cache, we potentially have a cache that contains data belonging to multiple classloaders. And the reason for all this is cos we don't support asymmetric clusters. Could someone start a design wiki to grab all the requirements? getCache( cacheName, classLoader ) would return a delegate to the original cache, having a specific marshaller in the invocation context as Trustin was suggesting. Cheers, Sanne 2011/5/16 Pete Muir pm...@redhat.com: On 16 May 2011, at 18:20, Galder Zamarreño wrote: On May 12, 2011, at 11:18 AM, Dan Berindei wrote: On Wed, May 11, 2011 at 11:18 PM, David Bosschaert da...@redhat.com wrote: On 11/05/2011 17:54, Dan Berindei wrote: On Wed, May 11, 2011 at 7:08 PM, Pete Muir pm...@redhat.com wrote: Were we developing for OSGi I would certainly agree with you. However in many environments today we can reasonably expect the TCCL to be set and to be able to load the classes we need. So whilst making it part of the API is the safest option, it's also complicating an API for the sake of the few at the cost of the many. Further, this also seems kinda nasty to me. We know the class (and hence bundle/module) when we put the object into Infinispan, therefore why do we require people to respecify this again? David, can we not actually do something here akin to what we are discussing for Weld?
Whereby we can serialize out the bundle id and then find the correct CL based on that when we deserialize. What if the object is a java.util.ArrayList? Each element in the list could belong to a different bundle, so you'd have to write a bundle id for every element in the list. Yes, if you know the Bundle-SymbolicName and Version (or the Bundle ID) you can find its classloader. On the other question, if you're passing in a class object then you can obtain its classloader and hence the bundle where it came from. But, and I think this is what Dan alluded to above, is it always true that the class you're passing in comes from the bundle that you need to have, or could it also come from one of its parent class loaders? Exactly David, sorry if my message was a little cryptic. I think in order to handle every case properly you would have to go through the entire object graph being
Re: [infinispan-dev] [Pull Request] Modular Classloading Compatibility
On Wed, May 18, 2011 at 7:06 PM, Manik Surtani ma...@jboss.org wrote: Hi guys Sorry I've been absent from this thread for a while now (it's been growing faster than I've been able to deal with email backlog!) Anyway, this is a very interesting discussion. To summarise - as Pete did at some point - there are 2 goals here: 1. Safe and intuitive use of an appropriate classloader 2. Safe type system for return values. I think the far more pressing concern is (1) so I'd like to focus on that. If we think (2) is a pressing enough concern, we should spawn a separate thread and discuss there. So, on to the issue of safe classloading. 1) Class loader per session/cache. I like Jason/Sanne/Trustin's suggestions of a session-like contract, and specifically I think this is best achieved as a delegate to a cache, again as suggested elsewhere by Pete, etc. E.g., Cache<?, ?> myCache = cacheManager.getCache("myCache", myClassLoader); and what is returned is something that delegates to the actual cache, making sure the TCCL is set and re-set appropriately. The handle to the cache is effectively your session, and each webapp, etc. in an EE environment will have its own handle. I propose using the TCCL as an internal implementation detail within this delegate; it helps with making sure it is carefully managed and cleaned up while not re-engineering loads of internals. I like the API but I would not recommend using the TCCL for this. I was able to get a nice perf jump in the HotRod client by skipping 2 Thread.setContextClassLoader() calls on each cache operation (1 to set the TCCL we wanted and 1 to restore the original TCCL). Setting the TCCL is a privileged operation, so it has to go through a SecurityManager and that is very slow. I think EmbeddedCacheManager.getCache(String name, ClassLoader cl) is enough ... it is clear enough, and I don't see the need for an overloaded getCache(name, classOfWhichClassLoaderIWishToUse). I agree, a Cache.usingClassloader(classOfWhichClassLoaderIWishToUse) overload would have made sense because the method name already communicates the intention, but a getCache(name, clazz) overload is too obscure. 2) Class loader per invocation. I've been less than happy with this, not just because it pollutes the API but because it adds a layer of confusion. If all use cases discussed can be solved with (1) above, then I'd prefer to just do that. The way I see it, most user apps directly using Infinispan would only be exposed to a single class loader per cache reference (even if multiple references talk to the same cache). Frameworks, OTOH, are a bit tougher, Hibernate being a good example on this thread. So this is a question for Galder - is it feasible to maintain several references to a cache, one for each app/persistence unit? 3) Can all OSGi requirements be handled by (1)? I would guess so, from what I have read here, since the class loader is explicitly passed in when getting a handle on a cache. Yes, the only difference is that OSGi doesn't make any guarantees about the TCCL, so passing the classloader explicitly will work in all environments. However: 4) What about storeAsBinary=false? Threads processing requests from other nodes are not associated with any CacheManager.getCache(name, classloader) call, and they also have to unmarshal values with this setting. Since Hibernate already mandates storeAsBinary=true for its 2LC, we can probably get away with only supporting one classloader per cache in the storeAsBinary=false case.
Still, we can't rely on the TCCL of the background threads because those threads are shared between all the caches in a CacheManager. In fact we should probably set the TCCL to null for all background threads created by Infinispan, or we risk keeping the classes of the first application/bundle that called us alive as long as those threads are still running. Dan Cheers Manik On 27 Apr 2011, at 17:39, Jason T. Greene wrote: Available here: https://github.com/infinispan/infinispan/pull/278 The problem is basically that Infinispan currently is using the TCCL for all class loading and resource loading. This has a lot of problems in modular containers (OSGi, AS7, etc), where you don't have framework implementation classes on the same classloader as user classes (that is how they achieve true isolation). You can read about this in more detail here: http://community.jboss.org/wiki/ModuleCompatibleClassloadingGuide The patch in the pull request is a first step, and should fix many issues, but it does not address all of them (there is still a lot of TCCL usage spread out among cacheloaders and so on), and ultimately it's just a workaround. It should, however, be compatible in any other non-modular environment. Really the ultimate solution is to set up a proper demarcation between what the user is supposed to provide, and what is expected to be bundled with Infinispan. Whenever there
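The set/restore dance Dan refers to looks like the sketch below: every operation that pins the TCCL pays two setContextClassLoader() calls, and each one is security-checked when a SecurityManager is installed, which is what made skipping them a measurable win. The helper name is illustrative:

    import java.util.concurrent.Callable;

    final class TcclScope {
        static <T> T callWith(ClassLoader cl, Callable<T> work) throws Exception {
            Thread current = Thread.currentThread();
            ClassLoader old = current.getContextClassLoader();
            current.setContextClassLoader(cl);      // 1st checked call
            try {
                return work.call();
            } finally {
                current.setContextClassLoader(old); // 2nd checked call
            }
        }
    }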
Re: [infinispan-dev] [Pull Request] Modular Classloading Compatibility
On Thu, May 19, 2011 at 1:31 PM, Galder Zamarreño gal...@redhat.com wrote: On May 19, 2011, at 12:11 PM, Manik Surtani wrote: Guys - what are we talking about? Specifying ClassLoaders is only meaningful if storeAsBinary is set to true. Ok, that was not clear to me throughout the discussion. That wasn't clear for me either. And we have at least one user trying storeAsBinary==true because it has much better performance (http://community.jboss.org/message/604831). Maybe the performance difference isn't so great any more after the latest commits, but I'm sure there is still a difference. I'm not suggesting that we absolutely need to make storeAsBinary==true work with multiple classloaders per cache or even per cache manager, but we should support an alternative for OSGi users who want to use storeAsBinary==true. That means at least having a clear way to specify one classloader per cache manager. Note that this will also help us in the storeAsBinary==false case, because then we will be able to set the bootstrap classloader as the cache manager classloader, and so will be able to avoid the cache manager threads hanging on to an undeployed application's classes. In general, in any situation where you have code booted off different ClassLoaders running in the same JVM and sharing the same cache (or cache manager), you would *need* to set storeAsBinary to true to get around deserialization issues on remote nodes. StoreAsBinary = false only really works for trivial cases where caches/cache managers run in environments where only one classloader is in effect. I.e., *not* JBoss/Java EE/Hibernate/OSGi/etc. This is one of the reasons why we considered setting storeAsBinary to true by default (and we see similar techniques in competing data grids). We already have an example of a guy who managed to make it work, we just need to specify the behaviour of Infinispan so that he can rely on it working. Cheers Dan This is clear now, thanks. Cheers Manik On 19 May 2011, at 10:55, Galder Zamarreño wrote: would be different cache instances. The problem then is that if an RPC comes for the entities cache and entity P1, which of the entities caches do I go for? You'd need to know which classloader P1 is living in on the remote node, and you'd have to know that at the Infinispan level to be able to store it in a non-binary format. -- Manik Surtani ma...@jboss.org twitter.com/maniksurtani Lead, Infinispan http://www.infinispan.org -- Galder Zamarreño Sr. Software Engineer Infinispan, JBoss Cache
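As a usage sketch of the one-classloader-per-handle idea (the variable names are illustrative), two deployments sharing one underlying cache would each obtain their own handle:

    // Each handle unmarshals values with its own classloader; only meaningful
    // with storeAsBinary=true, as discussed above.
    Cache<Object, Object> entitiesForAppA = cacheManager.getCache("entities", appAClassLoader);
    Cache<Object, Object> entitiesForAppB = cacheManager.getCache("entities", appBClassLoader);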
Re: [infinispan-dev] chunking ability on the JDBC cacheloader
On Mon, May 23, 2011 at 7:04 AM, 이희승 (Trustin Lee) trus...@gmail.com wrote: On 05/20/2011 03:54 PM, Manik Surtani wrote: Is spanning rows the only real solution? As you say it would mandate using transactions to keep multiple rows coherent, and I'm not sure if everyone would want to enable transactions for this. There are more hidden overheads. To update a value, the cache store must determine how many chunks already exist in the cache store and selectively delete and update them. To simplify aggressively, we could delete all chunks and insert new chunks. Both at the cost of great overhead. Even MySQL supports a BLOB up to 4GiB, so I think it's better to update the schema? +1 BLOBs are only stored in external storage if the actual data can't fit in a normal table row, so the only penalty in using a LONGBLOB compared to a VARBINARY(255) is 3 extra bytes for the length. If the user really wants to use a data type with a smaller max length, we can just report an error when the data column size is too small. We will need to check the length and throw an exception ourselves though; with MySQL we can't be sure that it is configured to raise errors when a value is truncated. Cheers Dan On 19 May 2011, at 19:06, Sanne Grinovero wrote: As mentioned on the user forum [1], people setting up a JDBC cacheloader need to be able to define the size of columns to be used. The Lucene Directory has a feature to autonomously chunk the segment contents at a configured byte size, and so has the GridFS; still there are other metadata objects which Lucene currently doesn't chunk as they're fairly small (but undefined and possibly growing), and in a more general sense anybody using the JDBC cacheloader would face the same problem: what's the dimension I need to use? While in most cases the maximum size can be estimated, this is still not good enough, as when you're wrong the byte array might get truncated, so I think the CacheLoader should take care of this. What would you think of: - adding a max_chunk_size option to JdbcStringBasedCacheStoreConfig and JdbcBinaryCacheStore - have them store in multiple rows the values which would be bigger than max_chunk_size - this will need transactions, which are currently not being used by the cacheloaders It looks to me like only the JDBC cacheloader has these issues, as the other stores I'm aware of are more blob oriented. Could it be worth building this abstraction at a higher level instead of in the JDBC cacheloader? Cheers, Sanne [1] - http://community.jboss.org/thread/166760 -- Manik Surtani ma...@jboss.org twitter.com/maniksurtani Lead, Infinispan http://www.infinispan.org -- Trustin Lee, http://gleamynode.net/
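A rough sketch of the explicit length check being discussed; the config accessor below is hypothetical, and the point is simply to fail fast instead of letting the database truncate silently:

    // Fail before the INSERT/UPDATE if the marshalled value cannot fit in the
    // configured data column; MySQL may otherwise truncate it with only a warning.
    byte[] marshalled = marshaller.objectToByteBuffer(storedValue);
    if (marshalled.length > config.getDataColumnSize()) { // hypothetical accessor
       throw new CacheLoaderException("Marshalled value is " + marshalled.length
             + " bytes, larger than the configured data column size of "
             + config.getDataColumnSize());
    }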
Re: [infinispan-dev] Adaptive marshaller buffer sizes - ISPN-1102
Hi Galder Sorry I'm replying so late On Thu, May 19, 2011 at 2:02 PM, Galder Zamarreño gal...@redhat.com wrote: Hi all, Re: https://issues.jboss.org/browse/ISPN-1102 First of all thanks to Dan for his suggestion on reservoir sampling+percentiles, very good suggestion :). So, I'm looking into this and Trustin's http://docs.jboss.org/netty/3.2/api/org/jboss/netty/channel/AdaptiveReceiveBufferSizePredictor.html but in this email I wanted to discuss the reservoir sampling mechanism (http://en.wikipedia.org/wiki/Reservoir_sampling). So, the way I look at it, to implement this you'd keep track of the N buffer sizes used so far, choose K samples out of those based on reservoir sampling, and then take the 90th percentile of those K samples. Calculating the percentile is easy with those K samples stored in an ordered collection. Now, my problem with this is that reservoir sampling is an O(n) operation and you would not want to be doing that for each request for a buffer that comes in. The way I see it, the R algorithm is only O(n) because you have to do a random(1, i) call for each request i. Because random(1, i) tends to return bigger and bigger values with every request, the probability of actually modifying the collection of samples becomes smaller and smaller. So it would be ok to compute the buffer size estimate on every modification, because those modifications are very rare after startup. One option I can think of is that instead of ever letting a user thread calculate this, the user thread could just feed the buffer size collection (a concurrent collection) and we could have a thread in the background that periodically, or based on some threshold, calculates the reservoir sample + percentile, and this is what's used as the next buffer size. My biggest problem here is the concurrent collection in the middle. You could have a priority queue ordered by buffer sizes, but it's unbounded. The concurrent collection does not need to be ordered though, the reservoir sampling could do that, but you want the collection bounded. But if it's bounded and the limit is hit, you would not want it to block, but instead overwrite values: remove the last element and insert again. You only care about the last N relevant buffer sizes... Based on my assumption that modifications become very rare after startup, I was thinking of using a fixed size array with a synchronized block around each access (we'd probably need synchronization for the Random, too). Note that you must use an array or a list, because you have to replace element random(1, i) and not the smallest/greatest/something else. Another option that would avoid the use of a concurrent collection would be if this was calculated per thread and stored in a thread local. The calculation could be done every X requests still in the client thread, or could be sent to a separate thread by wrapping it in a Callable and keeping the Future as a thread local; you could query it the next time the thread wants to marshall something. True, per-thread statistics could remove the need for synchronization completely. But using a separate thread for doing the computations would require synchronization again, and I don't think there's enough work to be done to justify it. I feel a bit more inclined towards the latter option, although it limits the calculation to be per-thread, for several reasons: - We already have org.jboss.marshalling.Marshaller and org.jboss.marshalling.Unmarshaller instances as thread locals which have proven to perform well.
- So we could tap into this setup to maintain not only the marshaller, but the size of the buffer too. - It could offer the possibility of being extended further to avoid creating buffers all the time and instead reuse them as long as the size is constant. After a size recalculation we'd ditch it and create a new one. I totally agree, combining adaptive size with buffer reuse would be really cool. I imagine when passing the buffer to JGroups we'd still make an arraycopy, but we'd get rid of a lot of arraycopy calls to resize the buffer when the average object size is 500 bytes. At the same time, if a small percentage of the objects are much bigger than the rest, we wouldn't reuse those huge buffers so we wouldn't waste too much memory. Thoughts? -- Galder Zamarreño Sr. Software Engineer Infinispan, JBoss Cache
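For reference, a minimal sketch of the Algorithm R variant Dan describes - a bounded array of samples where replacements become rarer as more sizes are observed, with the 90th percentile of the sample used as the next buffer size. This is only an illustration, not the eventual ISPN-1102 implementation:

    import java.util.Arrays;
    import java.util.Random;

    class BufferSizeReservoir {
       private final int[] samples;
       private final Random random = new Random();
       private long seen; // number of buffer sizes observed so far

       BufferSizeReservoir(int k) {
          samples = new int[k];
       }

       synchronized void record(int bufferSize) {
          if (seen < samples.length) {
             samples[(int) seen] = bufferSize;
          } else {
             // keep each new observation with probability k / (seen + 1);
             // replacements become rare once many sizes have been seen
             long slot = (long) (random.nextDouble() * (seen + 1));
             if (slot < samples.length) {
                samples[(int) slot] = bufferSize;
             }
          }
          seen++;
       }

       synchronized int nextBufferSize() {
          int n = (int) Math.min(seen, samples.length);
          if (n == 0) {
             return 128; // default initial buffer size
          }
          int[] sorted = Arrays.copyOf(samples, n);
          Arrays.sort(sorted);
          return sorted[(int) (0.9 * (n - 1))]; // 90th percentile
       }
    }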
Re: [infinispan-dev] Adaptive marshaller buffer sizes - ISPN-1102
On Mon, May 23, 2011 at 7:44 PM, Sanne Grinovero sanne.grinov...@gmail.com wrote: To keep stuff simple, I'd add an alternative feature instead: have the custom externalizers optionally recommend an allocation buffer size. In my experience people use a set of well-known types for the key, and maybe for the value as well, for which they actually know the output byte size, so there's no point in Infinispan trying to guess the size and then adapting to it; an exception being the often-used Strings, even in composite keys, but again as a user of the API I have a pretty good idea of the size I'm going to need for each object I store. Excellent idea, if the custom externalizer can give us the exact size of the serialized object we wouldn't need to do any guesswork. I'm a bit worried about over-zealous externalizers that will spend just as much time computing the size of a complex object graph as they spend on actually serializing the whole thing, but as long as our internal externalizers are good examples I think we're ok. Big plus: we could use the size of the serialized object to estimate the memory usage of each cache entry, so maybe with this we could finally constrain the cache to use a fixed amount of memory :) Also in MarshalledValue I see that an ExposedByteArrayOutputStream is created, and after serialization, if the buffer is found to be bigger than the data it references, a copy is made to create an exactly matching byte[]. What about revamping the interface there, to expose the ExposedByteArrayOutputStream instead of byte[], up to the JGroups level? In case the value is not stored in binary form, the expected life of the stream is very short anyway; after being pushed directly to network buffers we don't need it anymore... couldn't we pass the non-truncated stream directly to JGroups without this final size adjustment? Of course when values are stored in binary form it might make sense to save some memory, but again if that was an option I'd make use of it; in the Lucene case I can guess the size with a very good estimate (some bytes off), compared to buffer sizes of potentially many megabytes which I'd prefer to avoid copying - I'm especially not interested in it to save 2 bytes, even if I were to store values in binary. Yeah, but ExposedByteArrayOutputStream grows by 100%, so if you're off by 1 in your size estimate you'll waste 50% of the memory by keeping that buffer around. Even if your estimate is perfect you're still wasting at least 32 bytes on a 64-bit machine: 16 bytes for the buffer object header + 8 for the array reference + 4 (rounded up to 8) for the count, though I guess you could get that down to 4 bytes by keeping the byte[] and count as members of MarshalledValue. Besides, for Lucene couldn't you store the actual data separately as a byte[] so that Infinispan doesn't wrap it in a MarshalledValue? Then if we just keep the ExposedByteArrayOutputStream around in the MarshalledValue, we could save some copying by replacing the output.write(raw) in writeObject(ObjectOutput output, MarshalledValue mv) with output.write(byte[], offset, length); Cheers, Sanne 2011/5/23 Bela Ban b...@redhat.com: On 5/23/11 6:15 PM, Dan Berindei wrote: I totally agree, combining adaptive size with buffer reuse would be really cool. I imagine when passing the buffer to JGroups we'd still make an arraycopy, but we'd get rid of a lot of arraycopy calls to resize the buffer when the average object size is 500 bytes.
At the same time, if a small percentage of the objects are much bigger than the rest, we wouldn't reuse those huge buffers so we wouldn't waste too much memory. From my experience, reusing and syncing on a buffer will be slower than making a simple arraycopy. I used to reuse buffers in JGroups, but got better perf when I simply copied the buffer. Plus the reservoir sampling's complexity is another source of bugs... -- Bela Ban Lead JGroups / Clustering Team JBoss
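Sanne's size-hint suggestion could look something like the following; this is purely a sketch of the contract, not an agreed or actual API:

    // Hypothetical extension of the externalizer contract: a cheap (ideally
    // constant) estimate of the serialized size, letting the marshaller
    // allocate a right-sized buffer up front.
    public interface SizeHintingExternalizer<T> extends Externalizer<T> {
       // estimated serialized size in bytes, or -1 when unknown
       int estimatedSize(T object);
    }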
Re: [infinispan-dev] Adaptive marshaller buffer sizes - ISPN-1102
On Mon, May 23, 2011 at 10:12 PM, Sanne Grinovero sanne.grinov...@gmail.com wrote: 2011/5/23 Dan Berindei dan.berin...@gmail.com: On Mon, May 23, 2011 at 7:44 PM, Sanne Grinovero sanne.grinov...@gmail.com wrote: To keep stuff simple, I'd add an alternative feature instead: have the custom externalizers optionally recommend an allocation buffer size. In my experience people use a set of well-known types for the key, and maybe for the value as well, for which they actually know the output byte size, so there's no point in Infinispan trying to guess the size and then adapting to it; an exception being the often-used Strings, even in composite keys, but again as a user of the API I have a pretty good idea of the size I'm going to need for each object I store. Excellent idea, if the custom externalizer can give us the exact size of the serialized object we wouldn't need to do any guesswork. I'm a bit worried about over-zealous externalizers that will spend just as much time computing the size of a complex object graph as they spend on actually serializing the whole thing, but as long as our internal externalizers are good examples I think we're ok. I'd expect many to return a static constant, not to compute anything actually. I didn't prototype it, I'm just tossing around some ideas, so I might be missing something. I guess it depends on how we document it... Big plus: we could use the size of the serialized object to estimate the memory usage of each cache entry, so maybe with this we could finally constrain the cache to use a fixed amount of memory :) that's very interesting! Also in MarshalledValue I see that an ExposedByteArrayOutputStream is created, and after serialization, if the buffer is found to be bigger than the data it references, a copy is made to create an exactly matching byte[]. What about revamping the interface there, to expose the ExposedByteArrayOutputStream instead of byte[], up to the JGroups level? In case the value is not stored in binary form, the expected life of the stream is very short anyway; after being pushed directly to network buffers we don't need it anymore... couldn't we pass the non-truncated stream directly to JGroups without this final size adjustment? Of course when values are stored in binary form it might make sense to save some memory, but again if that was an option I'd make use of it; in the Lucene case I can guess the size with a very good estimate (some bytes off), compared to buffer sizes of potentially many megabytes which I'd prefer to avoid copying - I'm especially not interested in it to save 2 bytes, even if I were to store values in binary. Yeah, but ExposedByteArrayOutputStream grows by 100%, so if you're off by 1 in your size estimate you'll waste 50% of the memory by keeping that buffer around. Not really, the current implementation has some smart logic in that area which reduces the potential waste to 20%; there's a max doubling threshold in bytes, we could set that to the expected value or work on the idea. The max doubling threshold is now at 4MB, so it wouldn't kick in in your case. But since we know we are starting from a good enough estimate, I guess we could get rid of it and always grow the buffer by 25%. Anyway, if I happen to be often off by one, I'd better add +1 to what my externalizer computes as the estimate ;) I was thinking more of what happens when the structure of the object changes (e.g. a new field is added) and the externalizer is not updated.
Even if your estimate is perfect you're still wasting at least 32 bytes on a 64-bit machine: 16 bytes for the buffer object header + 8 for the array reference + 4 (rounded up to 8) for the count, though I guess you could get that down to 4 bytes by keeping the byte[] and count as members of MarshalledValue. This is true, but only if you have to keep the ExposedByteArrayOutputStream around after the RPC; otherwise the overhead is zero, as it's being thrown away very quickly - exactly as is done now. Maybe we should write a couple of alternative implementations for MarshalledValue, depending on which options are enabled, as it's getting complex. Besides, for Lucene couldn't you store the actual data separately as a byte[] so that Infinispan doesn't wrap it in a MarshalledValue? Exactly as we do ;) so in the Lucene case these optimizations apply to the keys only, which are mostly tuples like (long, int, String); considering that the strings contain very short filenames, I'd love to: a) default to something smaller than the current 128B; b) not need to make another copy if I've overestimated by a couple of bytes - the buffer is very short lived anyway. Hmm, if your keys are much shorter then I'm not sure eliminating the arraycopy is going to help performance that much anyway. Cheers, Sanne Then if we just keep the ExposedByteArrayOutputStream around in the MarshalledValue, we could save some copying
Re: [infinispan-dev] Adaptive marshaller buffer sizes - ISPN-1102
On Mon, May 23, 2011 at 11:50 PM, Bela Ban b...@redhat.com wrote: On 5/23/11 6:44 PM, Sanne Grinovero wrote: To keep stuff simple, I'd add an alternative feature instead: have the custom externalizers optionally recommend an allocation buffer size. In my experience people use a set of well-known types for the key, and maybe for the value as well, for which they actually know the output byte size, so there's no point in Infinispan trying to guess the size and then adapting to it; an exception being the often-used Strings, even in composite keys, but again as a user of the API I have a pretty good idea of the size I'm going to need for each object I store. Also in MarshalledValue I see that an ExposedByteArrayOutputStream is created, and after serialization, if the buffer is found to be bigger than the data it references, a copy is made to create an exactly matching byte[]. What about revamping the interface there, to expose the ExposedByteArrayOutputStream instead of byte[], up to the JGroups level? No need to expose the ExposedByteArrayOutputStream, a byte[] buffer, offset and length will do it, and we already use this today. In case the value is not stored in binary form, the expected life of the stream is very short anyway; after being pushed directly to network buffers we don't need it anymore... couldn't we pass the non-truncated stream directly to JGroups without this final size adjustment? The problem is that the byte[] first has to be copied to another buffer together with the rest of the ReplicableCommand before getting to JGroups. AFAIK in JGroups you must have 1 buffer for each message. You do that, yes. However, afair, the issue is not on the *sending*, but on the *receiving* side. That's where the larger-than-needed buffer sticks around. On the sending side, as you mentioned, Infinispan passes a buffer/offset/length to JGroups and JGroups passes this right on to the network layer, which copies that data into a buffer. I don't think so... on the receiving side the buffer size is controlled exclusively by JGroups, the unmarshaller doesn't create any buffers. The only buffers on the receiving side are those created by JGroups, and JGroups knows the message size before creating the buffer, so it doesn't have to worry about predicting buffer sizes. On sending, however, I understood that JGroups keeps the buffer with the offset and length in the NakReceivingWindow exactly as it got it from Infinispan, without any trimming, until it receives a STABLE message from all the other nodes in the cluster. -- Bela Ban Lead JGroups / Clustering Team JBoss
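On the sending side, the handoff Bela describes looks roughly like this, assuming the JGroups Message constructor that takes a buffer with an offset and a length:

    // Pass the possibly oversized marshalling buffer to JGroups without a trim
    // copy: (offset, length) delimit the real payload. The flip side, per Dan,
    // is that the whole buffer may be retained until STABLE comes around.
    byte[] raw = stream.getRawBuffer(); // ExposedByteArrayOutputStream accessor
    Message msg = new Message(destination, null, raw, 0, stream.size());
    channel.send(msg);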
Re: [infinispan-dev] Adaptive marshaller buffer sizes - ISPN-1102
On Tue, May 24, 2011 at 11:57 AM, Sanne Grinovero sanne.grinov...@gmail.com wrote: 2011/5/24 Galder Zamarreño gal...@redhat.com: Guys, Some interesting discussions here, keep them coming! Let me summarise what I submitted yesterday as a pull req for https://issues.jboss.org/browse/ISPN-1102 - I don't think users can really provide such accurate predictions of the object sizes, because first Java does not give you an easy way of figuring out how much space your object takes up, and most people don't have such knowledge. What I think could be more interesting is potentially having a buffer predictor that predicts sizes per type, so rather than calculating the next buffer size taking all objects into account, do that per object type. To enable doing this in the future, I'm gonna add the object to be marshalled as a parameter to https://github.com/infinispan/infinispan/pull/338/files#diff-2 - This enhancement allows for your suggestion of externalizers providing an estimated size to be implemented, but I'm not keen on that. - For a solution to ISPN-1102, I've gone for the simpler adaptive buffer size algorithm that Netty uses for determining the receiver buffer size. The use cases are different but I liked the simplicity of the algorithm, since calculating the next buffer size is an O(1) op and the size can grow both ways very easily. I agree that it might not be as exact as reservoir sampling+percentile, but at least it's cheaper to compute and it resolves the immediate problem of senders keeping too much memory for sent buffers before STABLE comes around. - The next step would be to go and test this and compare it with what Bela/Dan were seeing (+1 to another interactive debugging session), and if we are still not too happy about the memory consumption, maybe we can look into providing a different implementation of BufferSizePredictor that uses R sampling. - Finally, I think once ISPN-1102 is in, we should make the BufferSizePredictor implementation configurable programmatically and via XML - I'll create a separate JIRA for this. Great wrap-up, +1 on all points. BTW I definitely don't expect every user to be able to figure out the proper size, just that some of them might want (need?) to provide hints. Looks great Galder, although I could use some comments on how the possible buffer sizes are chosen in your algorithm :-) I guess we were thinking of different things with the externalizer extension. I was imagining something like an ObjectOutput implementation that doesn't really write anything but instead just records the size of the object that would be written. That way the size estimate would always be accurate, but of course the performance wouldn't be very good for complex object graphs. Still, I'd like to play with something like this to see if we can estimate the memory usage of the cache and base the eviction on the (estimated) memory usage instead of a fixed number of entries; it seems to me like that's the first question people ask when they start using Infinispan. Cheers Dan Cheers, Sanne Cheers, On May 24, 2011, at 8:12 AM, Bela Ban wrote: On 5/23/11 11:09 PM, Dan Berindei wrote: No need to expose the ExposedByteArrayOutputStream, a byte[] buffer, offset and length will do it, and we already use this today. In case the value is not stored in binary form, the expected life of the stream is very short anyway; after being pushed directly to network buffers we don't need it anymore... couldn't we pass the non-truncated stream directly to JGroups without this final size adjustment?
The problem is that the byte[] first has to be copied to another buffer together with the rest of the ReplicableCommand before getting to JGroups. AFAIK in JGroups you must have 1 buffer for each message. If you use ExposedByteArrayOutputStream, you should have access to the underlying buffer, so you don't need to copy it. You do that, yes. However, afair, the issue is not on the *sending*, but on the *receiving* side. That's where the larger-than-needed buffer sticks around. On the sending side, as you mentioned, Infinispan passes a buffer/offset/length to JGroups and JGroups passes this right on to the network layer, which copies that data into a buffer. I don't think so... on the receiving side the buffer size is controlled exclusively by JGroups, the unmarshaller doesn't create any buffers. The only buffers on the receiving side are those created by JGroups, and JGroups knows the message size before creating the buffer, so it doesn't have to worry about predicting buffer sizes. On sending, however, I understood that JGroups keeps the buffer with the offset and length in the NakReceivingWindow exactly as it got it from Infinispan, without any trimming, until it receives a STABLE message from all the other nodes in the cluster. Ah, ok. I think we should really do what we said before JBW, namely have an interactive debugging
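For comparison, the Netty-style predictor Galder refers to boils down to an O(1) scheme along these lines; the size table and the shrink hysteresis below are illustrative, not the actual algorithm's parameters:

    // Walk an index up/down a table of buffer sizes based on the last
    // observed size; shrinking requires two consecutive small observations.
    class AdaptiveBufferSizePredictor {
       private static final int[] SIZES = new int[24];
       static {
          int size = 16;
          for (int i = 0; i < SIZES.length; i++) {
             SIZES[i] = size;
             size <<= 1;
          }
       }
       private int index = 3; // start at 128 bytes
       private boolean decreaseNow;

       int nextSize() {
          return SIZES[index];
       }

       void record(int actualSize) {
          if (actualSize <= SIZES[Math.max(0, index - 1)]) {
             if (decreaseNow && index > 0) {
                index--;
                decreaseNow = false;
             } else {
                decreaseNow = true;
             }
          } else if (actualSize > SIZES[index] && index < SIZES.length - 1) {
             index++;
             decreaseNow = false;
          }
       }
    }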
Re: [infinispan-dev] chunking ability on the JDBC cacheloader
On Tue, May 24, 2011 at 11:42 AM, Sanne Grinovero sanne.grinov...@gmail.com wrote: 2011/5/24 Dan Berindei dan.berin...@gmail.com: On Mon, May 23, 2011 at 8:05 PM, Sanne Grinovero sanne.grinov...@gmail.com wrote: 2011/5/23 이희승 (Trustin Lee) trus...@gmail.com: On 05/23/2011 07:40 PM, Sanne Grinovero wrote: 2011/5/23 Dan Berindei dan.berin...@gmail.com: On Mon, May 23, 2011 at 7:04 AM, 이희승 (Trustin Lee) trus...@gmail.com wrote: On 05/20/2011 03:54 PM, Manik Surtani wrote: Is spanning rows the only real solution? As you say it would mandate using transactions to keep multiple rows coherent, and I'm not sure if everyone would want to enable transactions for this. There are more hidden overheads. To update a value, the cache store must determine how many chunks already exist in the cache store and selectively delete and update them. To simplify aggressively, we could delete all chunks and insert new chunks. Both at the cost of great overhead. I see no alternative to deleting all values for each key, as we don't know which part of the byte array is dirty; which overhead are you referring to? We would still store the same amount of data, split or not, but yes, multiple statements might require clever batching. Even MySQL supports a BLOB up to 4GiB, so I think it's better to update the schema? You mean by accommodating the column size only, or adding the chunk_id? I'm just asking, but all of your and Dan's feedback has already persuaded me that my initial idea of providing chunking should be avoided. I mean the user updating the column type in the schema. +1 BLOBs are only stored in external storage if the actual data can't fit in a normal table row, so the only penalty in using a LONGBLOB compared to a VARBINARY(255) is 3 extra bytes for the length. If the user really wants to use a data type with a smaller max length, we can just report an error when the data column size is too small. We will need to check the length and throw an exception ourselves though; with MySQL we can't be sure that it is configured to raise errors when a value is truncated. +1 it might be better to just check that the stored values fit within a maximum size; I'm not sure if we can guess the proper size from database metadata: not only is the column maximum size involved, but MySQL (to keep it as the reference example, though this might apply to others) also has a default maximum packet size for connections which is not very big; when using it with Infinispan I always had to reconfigure the database server. Also, as BLOBs are very poor as primary keys, people might want to use a limited and well-known byte size for their keys. So, shall we just add a method to check that a user-defined threshold has not been surpassed, checking both key and value but against different configurable sizes? Should an exception be raised in that case? An exception will be raised by the JDBC driver if the key doesn't fit into the key column, so we could simply wrap it? If that always happens, then I wouldn't wrap it. Entering the business of wrapping driver-specific exceptions is very tricky ;) I was more concerned about the fact that some database might not raise any exception? Not sure if that's the case, and possibly not our problem. By default MySQL only gives a warning if the value is truncated. We could throw an exception every time we got a warning from the DB, but the wrong value has already been inserted in the DB, and if the key was truncated then we don't even have enough information to delete it. Yes, I had some bell ringing; it was MySQL then indeed.
A better option to avoid checking ourselves may be to check on startup whether STRICT_ALL_TABLES (http://dev.mysql.com/doc/refman/5.5/en/server-sql-mode.html#sqlmode_strict_all_tables) is enabled with SELECT @@SESSION.sql_mode in the MySQL implementation, and refuse to start if it's not. There is another STRICT_TRANS_TABLES mode, but I don't know how to find out if a table is transactional or not... You check the transactional capabilities of a table by using SHOW CREATE TABLE and checking which engine it's using. Still, I don't think we should prevent people from shooting themselves in the foot; I'm not going to raise another exception if I don't detect that they have a proper backup policy either, nor if they're using a JDBC driver with known bugs. I think that what people need from us is a way to understand the size of what they're going to store; logging it as we did for ISPN-1125 is a first step, and maybe it's enough? Maybe it would be useful to collect max sizes, and print them regularly in the logs too, or expose that through MBeans? Ok, this probably won't be such a big problem with the redesigned JDBC cache store, but with the current design you can't put a new value without first reading the old value, and you can't read the old value because the data has been truncated, so the only way to get out of this mess
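The startup check mentioned above could be sketched like this, assuming a plain JDBC connection; whether to refuse startup or merely warn is the open question:

    // Query the session SQL mode and fail when MySQL would silently truncate
    // oversized keys or values instead of raising an error.
    Statement stmt = connection.createStatement();
    try {
       ResultSet rs = stmt.executeQuery("SELECT @@SESSION.sql_mode");
       String mode = rs.next() ? rs.getString(1) : "";
       if (!mode.contains("STRICT_ALL_TABLES") && !mode.contains("STRICT_TRANS_TABLES")) {
          throw new CacheLoaderException("MySQL is not running in strict mode; "
                + "truncated keys/values would only produce warnings");
       }
    } finally {
       stmt.close();
    }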
Re: [infinispan-dev] 5.0.0.CR5
I've only got ISPN-1092 to review with Pete and some improvements planned around ISPN-1000. Monday sounds good. Dan On Wed, Jun 8, 2011 at 9:08 AM, Mircea Markus mmar...@redhat.com wrote: I'll be mainly working on the perf benchmark, so no deliverables from me. On 7 Jun 2011, at 17:19, Manik Surtani ma...@jboss.org wrote: Hi guys What do we have in store for CR5? What's been committed since CR4, and what's in the pipeline for CR5? Does a release on Monday (13th) make sense? Cheers Manik -- Manik Surtani ma...@jboss.org twitter.com/maniksurtani Lead, Infinispan http://www.infinispan.org
Re: [infinispan-dev] frequent stack in testsuite
I don't think it's an externalizer issue, as I also see some exceptions on the node that generates state:

2011-06-09 18:16:18,250 ERROR [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (STREAMING_STATE_TRANSFER-sender-1,Infinispan-Cluster,NodeA-57902) ISPN00095: Caught while responding to state transfer request
org.infinispan.statetransfer.StateTransferException: java.util.concurrent.TimeoutException: STREAMING_STATE_TRANSFER-sender-1,Infinispan-Cluster,NodeA-57902 could not obtain exclusive processing lock after 10 seconds. Locks in question are java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock@a35c90[Read locks = 1] and java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock@111fb7f[Unlocked]
    at org.infinispan.statetransfer.StateTransferManagerImpl.generateState(StateTransferManagerImpl.java:177)
    at org.infinispan.remoting.InboundInvocationHandlerImpl.generateState(InboundInvocationHandlerImpl.java:248)
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport.getState(JGroupsTransport.java:585)
    at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.handleUpEvent(MessageDispatcher.java:690)
    at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:771)
    at org.jgroups.JChannel.up(JChannel.java:1484)
    at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1074)
    at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER$StateProviderHandler.process(STREAMING_STATE_TRANSFER.java:651)
    at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER$StateProviderThreadSpawner$1.run(STREAMING_STATE_TRANSFER.java:580)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
Caused by: java.util.concurrent.TimeoutException: STREAMING_STATE_TRANSFER-sender-1,Infinispan-Cluster,NodeA-57902 could not obtain exclusive processing lock after 10 seconds. Locks in question are java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock@a35c90[Read locks = 1] and java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock@111fb7f[Unlocked]
    at org.infinispan.remoting.transport.jgroups.JGroupsDistSync.acquireProcessingLock(JGroupsDistSync.java:100)
    at org.infinispan.statetransfer.StateTransferManagerImpl.generateTransactionLog(StateTransferManagerImpl.java:204)
    at org.infinispan.statetransfer.StateTransferManagerImpl.generateState(StateTransferManagerImpl.java:167)
    ... 11 more

I guess we could write an error marker in the stream to prevent the EOFException on the receiving side, but the end result would be the same.
Dan On Thu, Jun 9, 2011 at 5:58 PM, Sanne Grinovero sa...@infinispan.org wrote: Hello all, if I happen to look at the console while the tests are running, I see this exception pop up very often:

2011-06-09 15:32:18,092 ERROR [JGroupsTransport] (Incoming-1,Infinispan-Cluster,NodeB-32230) ISPN00096: Caught while requesting or applying state
org.infinispan.statetransfer.StateTransferException: java.io.EOFException: Read past end of file
    at org.infinispan.statetransfer.StateTransferManagerImpl.applyState(StateTransferManagerImpl.java:333)
    at org.infinispan.remoting.InboundInvocationHandlerImpl.applyState(InboundInvocationHandlerImpl.java:230)
    at org.infinispan.remoting.transport.jgroups.JGroupsTransport.setState(JGroupsTransport.java:602)
    at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.handleUpEvent(MessageDispatcher.java:711)
    at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:771)
    at org.jgroups.JChannel.up(JChannel.java:1441)
    at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1074)
    at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.connectToStateProvider(STREAMING_STATE_TRANSFER.java:523)
    at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.handleStateRsp(STREAMING_STATE_TRANSFER.java:462)
    at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.up(STREAMING_STATE_TRANSFER.java:223)
    at org.jgroups.protocols.FRAG2.up(FRAG2.java:189)
    at org.jgroups.protocols.FC.up(FC.java:479)
    at org.jgroups.protocols.pbcast.GMS.up(GMS.java:891)
    at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:246)
    at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:613)
    at org.jgroups.protocols.UNICAST.up(UNICAST.java:294)
    at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:703)
    at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:133)
    at org.jgroups.protocols.FD.up(FD.java:275)
    at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:275)
    at org.jgroups.protocols.MERGE2.up(MERGE2.java:209)
    at
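The error marker idea mentioned above might look like this on the state-producer side; the status bytes and method shapes are hypothetical:

    // Write a status byte after the state dump, so the receiver reads a
    // deliberate failure marker instead of hitting EOF on a closed stream.
    private static final byte STATE_OK = 0;
    private static final byte STATE_ERROR = 1;

    void writeStateWithMarker(ObjectOutput out) throws IOException {
       try {
          writeState(out); // hypothetical: the actual state dump
          out.writeByte(STATE_OK);
       } catch (Exception e) {
          out.writeByte(STATE_ERROR); // receiver can abort cleanly and report the cause
       }
    }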
Re: [infinispan-dev] Issue posted for JBoss Logging and slf4j
Hi Jeff On Mon, Jun 20, 2011 at 10:48 AM, Jeff Ramsdale jeff.ramsd...@gmail.com wrote: I've posted https://issues.jboss.org/browse/JBLOGGING-65 to address an issue I've run into converting to Infinispan 5.0.0.CR5. The issue concerns the switch to JBoss Logging and its failure to support log4j-over-slf4j, a jar provided by slf4j to adapt log4j events and route them to slf4j. JBoss Logging doesn't detect that org.apache.log4j.LogManager is provided by the log4j-over-slf4j jar and selects the Log4jLoggerProvider instead of the Slf4jLoggerProvider. I still haven't nailed down what's going on with logging in Infinispan 5.0.0, but I'm not getting any logging output and have run into several issues. See https://issues.jboss.org/browse/ISPN-1177 as well (it seems Maven 3 is required to build Infinispan 5?). I have concerns that folks converting from 4.x will have issues as I have, and wanted to raise a red flag before release. Anybody else seeing a degradation in logging behavior? I'm building Infinispan with Maven 2.2.1 on Fedora 14 and it works fine. I hadn't noticed ISPN-1177 before, but I do get the error from time to time when I build from IntelliJ, because the IntelliJ compiler doesn't do annotation processing (or at least I haven't been able to configure it properly). Then when you run 'mvn install', Maven sees that the class is up to date and doesn't run annotation processing either. Running 'mvn clean install' usually sorts everything out. Dan -jeff
Re: [infinispan-dev] Alert from a failing test
On Mon, Jun 20, 2011 at 11:42 PM, Sanne Grinovero sa...@infinispan.org wrote: 2011/6/20 Manik Surtani ma...@jboss.org: Oddly enough, I don't see any other tests exhibiting this behaviour. Let me know if you see it in more recent CI runs, and we'll investigate in detail. In fact there are not many tests in core which verify that a full stream is received; but as I mentioned in another thread, I was seeing the following exception relatively often (it never caught my attention some months ago):

Caused by: java.io.EOFException: The stream ended unexpectedly. Please check whether the source of the stream encountered any issues generating the stream.
    at org.infinispan.marshall.VersionAwareMarshaller.objectFromObjectStream(VersionAwareMarshaller.java:193)
    at org.infinispan.statetransfer.StateTransferManagerImpl.processCommitLog(StateTransferManagerImpl.java:218)
    at org.infinispan.statetransfer.StateTransferManagerImpl.applyTransactionLog(StateTransferManagerImpl.java:245)
    at org.infinispan.statetransfer.StateTransferManagerImpl.applyState(StateTransferManagerImpl.java:284)
    ... 27 more
Caused by: java.io.EOFException: Read past end of file
    at org.jboss.marshalling.SimpleDataInput.eofOnRead(SimpleDataInput.java:126)
    at org.jboss.marshalling.SimpleDataInput.readUnsignedByteDirect(SimpleDataInput.java:263)
    at org.jboss.marshalling.SimpleDataInput.readUnsignedByte(SimpleDataInput.java:224)
    at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209)
    at org.jboss.marshalling.AbstractObjectInput.readObject(AbstractObjectInput.java:37)
    at org.infinispan.marshall.jboss.GenericJBossMarshaller.objectFromObjectStream(GenericJBossMarshaller.java:191)
    at org.infinispan.marshall.VersionAwareMarshaller.objectFromObjectStream(VersionAwareMarshaller.java:191)
    ... 30 more

The line at org.jboss.marshalling.river.RiverUnmarshaller.doReadObject(RiverUnmarshaller.java:209) suggests EOF has been reached while reading the lead byte of the object, not a partial object. This is consistent with StateTransferManagerImpl.generateTransactionLog() getting a timeout while trying to acquire the processing lock (at StateTransferManagerImpl.java:192) and closing the stream in the middle of the transaction log, not a transmission error. We could probably get rid of the exception in the logs by inserting another delimiter here. Back to the original problem, if this was a stream corruption issue I'd expect a lot more instances of deserialization errors, because the length of the buffer would be smaller/larger than the number of bytes following it and the next object to be deserialized from the stream would find garbage instead. This looks to me more like an index segment has been created with size x on node A and also on node B, then it was updated with size y > x on node A but only the metadata got to node B; the segment's byte array remained the same. I don't know anything about the Lucene directory implementation yet, so I have no idea if/how this could happen, and I haven't been able to reproduce it on my machine. Is there a way to see the Jenkins test logs? Dan This looks like a suspicious correlation to me, as I think the reported errors are similar in nature.
Cheers, Sanne On 18 Jun 2011, at 20:18, Sanne Grinovero wrote: Hello all, I'm not in a state to fully debug the issue this week, but even though this failure happens in the Lucene Directory it looks like it's reporting an issue with Infinispan core: https://infinispan.ci.cloudbees.com/job/Infinispan-master-JDK6-tcp/90/org.infinispan$infinispan-lucene-directory/testReport/junit/org.infinispan.lucene/SimpleLuceneTest/org_infinispan_lucene_SimpleLuceneTest_testIndexWritingAndFinding/ In this test we're writing to the index, and then asserting on the expected state on both nodes, but while it is successful on the same node as the writer, it fails with java.io.IOException: Read past EOF on the second node. This exception can mean only one thing: the value, which is a byte[], was not completely transferred to the second node, which seems quite critical as the caches are using sync. I can't reproduce the error locally, but it's not the first time it is reported by CI: builds 60, 62, 65 for example (and more) show the same testcase fail in the same manner. Cheers, Sanne -- Manik Surtani ma...@jboss.org twitter.com/maniksurtani Lead, Infinispan http://www.infinispan.org
Re: [infinispan-dev] Atomic operations and transactions
On Tue, Jul 5, 2011 at 12:23 PM, Galder Zamarreño gal...@redhat.com wrote: On Jul 4, 2011, at 11:25 AM, Sanne Grinovero wrote: I agree they don't make sense, but only in the sense of exposed API during a transaction: some time ago I admit I was expecting them to just work: the API is there, nice public methods in the public interface with javadocs explaining that that was exactly what I was looking for, no warnings, no failures. Even worse, all works fine when running a local test, because of how the locks currently work - they are acquired locally first - so unless you're running such a test in DIST mode, and happen to *not* be the owner of the key being tested, people won't even notice that this is not supported. Still being able to use them is very important, also in combination with transactions: I might be running blocks of transactional code (like a CRUD operation via OGM) and still require to advance a sequence for primary key generation. This needs to be an atomic operation, and I should really not forget to suspend the transaction. Fair point. At first glance, the best way to deal with this is suspending the tx, cos that guarantees the API contract while not forcing locks to be acquired for too long. I'd advise though that whoever works on this needs to go over existing use cases and see if the end result could differ somehow if this change gets applied. If any divergences are found and are to be expected, these need to be thoroughly documented. I've gone through some cases and the end results would not differ at first glance if the atomic ops suspend the txs. The only thing that would change would be the expectations around lock acquisition timeouts for atomic ops within txs. For example: Cache contains: k1=galder 1. Tx1 does a cache.replace(k1, "galder", "sanne") - suspends tx and applies change - k1=sanne now 2. Tx2 does a cache.replace(k1, "galder", "manik") - suspends tx and is not able to apply the change 3. Tx2 commits 4. Tx1 commits End result: k1=sanne 1. Tx1 does a cache.replace(k1, "galder", "sanne") - acquires lock 2. Tx2 does a cache.replace(k1, "galder", "manik") - waits for lock 3. Tx2 rollback - times out acquiring lock 4. Tx1 commits - applies change End result: k1=sanne Galder, the big difference would be that with optimistic transactions you don't acquire the lock on the spot, so the tx will be rolled back if someone else modifies the key between your get and the prepare. We could make a compromise: instead of forcing the atomic operations to happen outside the transaction, we could force them to always use pessimistic locking. Actually Mircea and I discussed this Friday evening too, but I forgot about it until now. After all, Sanne has two use cases for atomic operations: sequences and reference counts. Sequences can and should happen outside transactions, but as we discussed on the train, we could make each node/thread acquire a range of say 100 seq numbers at a time and remove the need for any locking to get a new sequence number. Reference counts on the other hand should remain inside the transaction, because you would have to do the refcount rollback by hand (plus I don't think you should let other transactions see the modified refcount before they see the new data). Dan Sanne 2011/7/4 Galder Zamarreño gal...@redhat.com: Do these atomic operations really make sense within an (optimistic) transaction? For example, putIfAbsent(): it stores a k,v pair if the key is not present. But the key to its usability is that the return value of putIfAbsent can tell you whether the put succeeded or not.
Once you go into transactions, the result is only valid once the transaction has been prepared, unless pessimistic locking as per the definition in http://community.jboss.org/docs/DOC-16973 is in use, and that's already pretty confusing IMO. I get the feeling that those atomic operations are particularly useful when transactions are not used, cos they allow you to reduce the cache operations to one, hence avoiding the need to use a lock or synchronized block, or in our case, a transaction. On Jun 30, 2011, at 3:11 PM, Sanne Grinovero wrote: Hello all, some team members had a meeting yesterday, and one of the discussed subjects was about using atomic operations (putIfAbsent, etc..). Mircea just summarised it in the following proposal: The atomic operations, as defined by the ConcurrentHashMap, don't fit well within the scope of an optimistic transaction: this is because there is a discrepancy between the value returned by the operation and the fact that the operation is applied or not: E.g. putIfAbsent(k, v) might return true as there's no entry for k in the scope of the current transaction, but in fact there might be a value committed by another transaction, hidden by the fact we're running in repeatable read mode. Later on, at prepare time when the same operation is applied on the node that actually holds k,
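Concretely, the suspend-the-tx approach being discussed would amount to something like this (checked exceptions from the TransactionManager elided for brevity):

    // Run an atomic replace() outside the calling transaction by suspending it
    // first; the operation takes effect immediately, whatever the tx later does.
    TransactionManager tm = cache.getAdvancedCache().getTransactionManager();
    Transaction suspended = tm.suspend();
    boolean advanced;
    try {
       advanced = cache.replace("sequence", current, current + 1);
    } finally {
       tm.resume(suspended);
    }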
Re: [infinispan-dev] Atomic operations and transactions
On Tue, Jul 5, 2011 at 12:49 PM, Sanne Grinovero sa...@infinispan.org wrote: 2011/7/5 Dan Berindei dan.berin...@gmail.com: After all, Sanne has two use cases for atomic operations: sequences and reference counts. Sequences can and should happen outside transactions, but as we discussed on the train, we could make each node/thread acquire a range of say 100 seq numbers at a time and remove the need for any locking to get a new sequence number. Reference counts on the other hand should remain inside the transaction, because you would have to do the refcount rollback by hand (plus I don't think you should let other transactions see the modified refcount before they see the new data). To clarify my use case, I'm never going to need refcounts together with a running transaction. The refcount example was about why I need atomic counters, but this thread is about the dangers of exposing this atomic operations API, which is broken when using transactions. Oh, I understand refcounts are not really the topic you wanted to talk about, but I'd really like to know why you don't need transactions with refcounts. I was assuming the refcount is a link from a key A in the cache to another key B, so adding another link would involve incrementing the counter as well as adding the key A. Or do you mean that refcount increment/decrement should be a special operation that never fails, so it could be done after the commit of the tx that added key A?
Re: [infinispan-dev] Atomic operations and transactions
On Tue, Jul 5, 2011 at 12:46 PM, Sanne Grinovero sa...@infinispan.org wrote: 2011/7/5 Galder Zamarreño gal...@redhat.com: On Jul 4, 2011, at 11:25 AM, Sanne Grinovero wrote: I agree they don't make sense, but only in the sense of exposed API during a transaction: some time ago I admit I was expecting them to just work: the API is there, nice public methods in the public interface with javadocs explaining that that was exactly what I was looking for, no warnings, no failures. Even worse, all works fine when running a local test, because of how the locks currently work - they are acquired locally first - so unless you're running such a test in DIST mode, and happen to *not* be the owner of the key being tested, people won't even notice that this is not supported. Still being able to use them is very important, also in combination with transactions: I might be running blocks of transactional code (like a CRUD operation via OGM) and still require to advance a sequence for primary key generation. This needs to be an atomic operation, and I should really not forget to suspend the transaction. Fair point. At first glance, the best way to deal with this is suspending the tx, cos that guarantees the API contract while not forcing locks to be acquired for too long. I'd advise though that whoever works on this needs to go over existing use cases and see if the end result could differ somehow if this change gets applied. If any divergences are found and are to be expected, these need to be thoroughly documented. I've gone through some cases and the end results would not differ at first glance if the atomic ops suspend the txs. The only thing that would change would be the expectations around lock acquisition timeouts for atomic ops within txs. For example: Cache contains: k1=galder 1. Tx1 does a cache.replace(k1, "galder", "sanne") - suspends tx and applies change - k1=sanne now 2. Tx2 does a cache.replace(k1, "galder", "manik") - suspends tx and is not able to apply the change 3. Tx2 commits 4. Tx1 commits End result: k1=sanne Right. To clarify, this is what would happen with the current implementation: 1. Tx2 does a cache.get(k1) - it reads the value of k1, and is returned "galder" 2. Tx1 does a cache.replace(k1, "galder", "sanne") - k1=sanne in the scope of this transaction, but not seen by other txs 3. Tx2 does a cache.replace(k1, "galder", "manik") - k1=manik is assigned, as, because of repeatable read, we're still seeing "galder" 4. Tx2 and Tx1 commit ..and the end result depends on who commits first. 1. Tx1 does a cache.replace(k1, "galder", "sanne") - acquires lock 2. Tx2 does a cache.replace(k1, "galder", "manik") - waits for lock 3. Tx2 rollback - times out acquiring lock 4. Tx1 commits - applies change End result: k1=sanne I'm not sure we're on the same line here. 1) should apply the operation right away, so even if it might very briefly have to acquire a lock on it, it's immediately released (not at the end of the transaction), so why would Tx2 have to wait for it to the point that it needs to roll back? I think it would make sense to make atomic operations pessimistic by default, so they would behave like in Galder's example. Then if you wanted to reduce contention you could suspend/resume the transaction around your atomic operations and make them behave like you're expecting them to. Here is a contrived example: 1. Start tx Tx1 2. cache.get(k) -> v0 3. cache.replace(k, v0, v1) 4. cache.get(k) -> ??
With repeatable read and suspend/resume around atomic operations, I believe operation 4 would return v0, and that would be very surprising for a new user. So I'd rather require explicit suspend/resume calls to make sure anyone who uses atomic operations in a transaction understands what results he's going to get. Dan
Re: [infinispan-dev] Atomic operations and transactions
On Tue, Jul 5, 2011 at 1:45 PM, Sanne Grinovero sa...@infinispan.org wrote: That's an interesting case, but I wasn't thinking about that. So it might become useful later on. The refcount scenario I'd like to improve first is about garbage collection of old unused index segments: we're counting references from open searchers, and we allow queries to finish running on still-open segments while they are being deleted; at the same time, these delete operations are run in batches in background threads. I couldn't possibly run them in a transaction, as it would likely not have enough memory to complete, and anyway they're run async to the rest of the application. So timing is not very critical, but having a wrong increment/decrement on the counter can cause many issues, and so this must rely on atomic operations. The current implementation acquires a pessimistic lock on the integer; having a distributed AtomicInteger as discussed on the train would be a simple improvement. I think I got it now: your references are from application objects to cache keys, so you don't have anything else that you need to modify in the same transaction. You only have reads, unless the refcount reaches 0, in which case you also do a remove. It's clear to me now that the distributed AtomicInteger approach would be best in your case, so that means I haven't found a real-life problem that requires in-transaction atomic operations yet ;-) Dan Sanne Dan
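For completeness, a distributed counter built only on the ConcurrentMap-style atomic operations (no transaction involved) is just a CAS loop; a minimal sketch:

    // Retries on contention instead of locking; relies on putIfAbsent() and
    // replace() being atomic on the key's owner.
    int incrementAndGet(Cache<String, Integer> cache, String counterKey) {
       while (true) {
          Integer current = cache.get(counterKey);
          if (current == null) {
             if (cache.putIfAbsent(counterKey, 1) == null) {
                return 1;
             }
          } else if (cache.replace(counterKey, current, current + 1)) {
             return current + 1;
          }
       }
    }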
Re: [infinispan-dev] Atomic operations and transactions
On Tue, Jul 5, 2011 at 1:39 PM, Sanne Grinovero sa...@infinispan.org wrote: 2011/7/5 Dan Berindei dan.berin...@gmail.com: Here is a contrived example:
1. Start tx Tx1
2. cache.get(k) - v0
3. cache.replace(k, v0, v1)
4. cache.get(k) - ??
With repeatable read and suspend/resume around atomic operations, I believe operation 4 would return v0, and that would be very surprising for a new user. So I'd rather require explicit suspend/resume calls to make sure anyone who uses atomic operations in a transaction understands what results he's going to get. The problem is that as a use case it makes no sense to use an atomic operation without evaluating the return value. So 3) should actually read like:
3. boolean done = cache.replace(k, v0, v1)
and based on this value the application would branch in some way, so acquiring locks and waiting for each other is not enough: we can only support this if write skew checks are enabled, and mandate that the full transaction rolls back in the end. That might be one option, but I really don't like making transaction rollbacks likely; I'd prefer to have an alternative like a new flag which enforces a fresh read, skipping the repeatable read guarantees. Of course this wouldn't work if we're not actually sending the operations to the key owners, so suspending the transaction is a much nicer approach from the user perspective. Though I agree this behaviour should be selectable. Ok, I'm slowly remembering your arguments... do you think the fresh read flag should be available for all operations, or does it make sense to make it an internal flag that only the atomic operations will use? To summarize, with this example:
1. Start tx
2. cache.get(k) - v0
3. cache.replace(k, v0, v1)
4. cache.get(k) - ??
5. Commit tx
The options we could support are:
a. Tx suspend: no retries, but also no rollback for replace(), and 4) will not see the updated value
b. Optimistic locking + write skew check: if the key is modified by another tx between 2) and 5), the entire transaction has to be redone
c. Optimistic locking + write skew check + fresh read: we only have to redo the tx if the key is modified by another tx between 3) and 5)
d. Pessimistic locking: if the key is modified between 2) and 5), the entire transaction has to be redone
e. Pessimistic locking + fresh read: no redo, but decreased throughput because we hold the lock between 3) and 5)
I guess there is no reason to support option d), as we're making an RPC to the owner in order to get the lock anyway. I think I'm leaning towards supporting only a) and e), but there might be cases where b) and c) would perform better. Dan
Re: [infinispan-dev] Atomic operations and transactions
On Tue, Jul 5, 2011 at 4:04 PM, Sanne Grinovero sa...@infinispan.org wrote: 2011/7/5 Dan Berindei dan.berin...@gmail.com: On Tue, Jul 5, 2011 at 1:39 PM, Sanne Grinovero sa...@infinispan.org wrote: 2011/7/5 Dan Berindei dan.berin...@gmail.com: Here is a contrived example:
1. Start tx Tx1
2. cache.get(k) - v0
3. cache.replace(k, v0, v1)
4. cache.get(k) - ??
With repeatable read and suspend/resume around atomic operations, I believe operation 4 would return v0, and that would be very surprising for a new user. So I'd rather require explicit suspend/resume calls to make sure anyone who uses atomic operations in a transaction understands what results he's going to get. The problem is that as a use case it makes no sense to use an atomic operation without evaluating the return value. So 3) should actually read like:
3. boolean done = cache.replace(k, v0, v1)
and based on this value the application would branch in some way, so acquiring locks and waiting for each other is not enough: we can only support this if write skew checks are enabled, and mandate that the full transaction rolls back in the end. That might be one option, but I really don't like making transaction rollbacks likely; I'd prefer to have an alternative like a new flag which enforces a fresh read, skipping the repeatable read guarantees. Of course this wouldn't work if we're not actually sending the operations to the key owners, so suspending the transaction is a much nicer approach from the user perspective. Though I agree this behaviour should be selectable. Ok, I'm slowly remembering your arguments... do you think the fresh read flag should be available for all operations, or does it make sense to make it an internal flag that only the atomic operations will use? To summarize, with this example:
1. Start tx
2. cache.get(k) - v0
3. cache.replace(k, v0, v1)
4. cache.get(k) - ??
5. Commit tx
The options we could support are:
a. Tx suspend: no retries, but also no rollback for replace(), and 4) will not see the updated value
might work, but looks like a crazy user experience. We could support it by letting the user suspend/resume the tx manually. Then the only people experiencing this would be the ones who explicitly requested it.
b. Optimistic locking + write skew check: if the key is modified by another tx between 2) and 5), the entire transaction has to be redone
might work as well; since people opted in for optimistic, they should be prepared to experience failures. I'm not sure what the + stands for: how can you have optimistic locking without write skew checks? I was never sure in what situations we do the write skew check, so I thought I'd mention it to be clear. Certainly atomic operations would never work without write skew checks, but I don't think they're required for regular writes.
c. Optimistic locking + write skew check + fresh read: we only have to redo the tx if the key is modified by another tx between 3) and 5)
in this case we're breaking the repeatable read guarantees, so we should clarify this very well. Yes, it would definitely need to be documented that atomic operations always use read_committed if we go this way.
d. Pessimistic locking: if the key is modified between 2) and 5), the entire transaction has to be redone
I don't understand what's pessimistic about this? To be pessimistic it would attempt to guarantee success by locking at 2), during the get operation, before returning the value. Also, "if the key is modified" implies write skew checks, so how would this be different than the previous proposals?
Generally as a user, if I'm opting in for a pessimistic lock the only exception I'm prepared to handle is a timeout, definitely not a "try again, the value changed". Sorry, I meant if the key is modified between 2) and 3): since 3) would not read the key again, we'd have to check the value in the prepare phase. But you're right that it doesn't really make sense; since we're going to the owner to get the lock anyway, we might as well check the value in one go.
e. Pessimistic locking + fresh read: no redo, but decreased throughput because we hold the lock between 3) and 5)
I assume you really mean to do explicit pessimistic locking:
1. Start tx
2. cache.lock(k);
3. cache.get(k) - v0
4. cache.replace(k, v0, v1) // - throws an exception if we're not owning the lock
5. cache.get(k) - ??
6. Commit tx
No, I meant the replace call would do the locking itself and hold the lock until the end of the tx, just like a regular put. This option would be very similar to what we currently have, since any write command already acquires a lock on the key. The only difference would be that the value check would be done on the main data owner, with the current value of the key instead of the value in the invocation context. One big gotcha is that we will only update the value in the invocation context if the replace succeeds, so the user will have
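As an aside, the explicit-locking sequence sketched above translates fairly directly into user code; a rough sketch, assuming a transactional cache and the AdvancedCache.lock() API from around this era (illustrative, not a recommended recipe):

    import javax.transaction.TransactionManager;
    import org.infinispan.AdvancedCache;

    // Explicit pessimistic variant: lock k up front, so the conditional
    // replace cannot lose a race for the rest of the transaction.
    void lockThenReplace(TransactionManager tm, AdvancedCache<String, String> cache)
            throws Exception {
        tm.begin();
        boolean ok = false;
        try {
            cache.lock("k");                // held until the tx commits or rolls back
            String v0 = cache.get("k");     // fresh read under the lock
            cache.replace("k", v0, "v1");   // cannot fail on a concurrent change
            ok = true;
        } finally {
            if (ok) tm.commit(); else tm.rollback();
        }
    }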
Re: [infinispan-dev] Faster LRU
On Tue, Jul 5, 2011 at 7:23 PM, Vladimir Blagojevic vblag...@redhat.com wrote: Hey guys, In the past few days I've looked around at how to squeeze every bit of performance out of BCHM and particularly our LRU impl. What I did not like about the current LRU is that a search for an element in the queue is not a constant time operation but requires a full queue traversal if we need to find an element[1]. It would be particularly nice to have a hashmap with a constant cost for lookup operations. Something like LinkedHashMap. LinkedHashMap seems to be a good container for LRU, as it provides constant time lookup but also a hook for evicting the oldest entry, in the form of the removeEldestEntry callback. So why not implement our segment eviction policy by using a LinkedHashMap [2]? +1 Vladimir, I had a similar idea to implement a BCHM segment entirely using a single LinkedHashMap - obviously that requires a lot more work to integrate with the existing BCHM than using a LHM inside the eviction policy, but it should also be more memory efficient. I've seen about a 50% performance increase for smaller caches (100K) and even more for larger and more contended caches - about a 75% increase. After this change BCHM performance was not that much worse than CHM, and it was faster than a synchronized hashmap. Should we include this impl as FAST_LRU, as I would not want to remove the current LRU just yet? We have to prove this one is correct and that it does not have any unforeseen issues. I would definitely remove the old LRU; it's much harder to understand because of the batching, and the JDK already has tests to guarantee that LinkedHashMap is correct. One idea for testing I was discussing with Galder on my pull request's page was to simulate a real cache workload, with get misses triggering a put and also a small delay, in order to evaluate how good an eviction policy is at keeping the most used keys in. We definitely need to do *some* testing for this, we can't afford to wait for another community member to come in a year's time and prove to us that we're rubbish :) Dan
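For readers unfamiliar with the removeEldestEntry hook, this is the whole trick; a minimal sketch of an LRU segment built on LinkedHashMap (illustrative only, not the actual BCHM integration):

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Access-order LinkedHashMap + removeEldestEntry = O(1) LRU eviction:
    // every get()/put() moves the entry to the tail, and once the map grows
    // past maxEntries the head (least recently used) entry is dropped.
    class LruSegment<K, V> extends LinkedHashMap<K, V> {
        private final int maxEntries;

        LruSegment(int maxEntries) {
            super(16, 0.75f, true);   // true selects access order, not insertion order
            this.maxEntries = maxEntries;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxEntries;
        }
    }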
Re: [infinispan-dev] Faster LRU
On Tue, Jul 5, 2011 at 7:47 PM, Alex Kluge java_kl...@yahoo.com wrote: Hi, I have a completely in-line limit on the cache entries built on a clock cache, which is an approximation to an LRU cache and is extremely fast (O(1)). It is not a strict LRU, but chooses a not-recently-used item for removal. I'll provide some more details soon. I'm not sure how far along you are, as some of this is in the future tense, and some in the past. But I'll dig up this code - it's been a while since I worked on the cache. :) This sounds great Alex. We're not strictly LRU either, because the policy is enforced at the map segment level, so we don't always evict the oldest element in the map. Maybe you also have a good cache/eviction policy test? ;-) Cheers Dan
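The clock (second-chance) policy Alex mentions is worth sketching, since it shows where the O(1) comes from; this is the textbook algorithm, not Alex's actual code:

    // One "referenced" bit per slot; accesses set it, the clock hand clears it.
    // Eviction sweeps from the hand, giving referenced entries a second chance,
    // and evicts the first unreferenced slot - O(1) amortized, roughly LRU.
    class ClockPolicy {
        private final boolean[] referenced;
        private int hand;

        ClockPolicy(int capacity) {
            referenced = new boolean[capacity];
        }

        void recordAccess(int slot) {
            referenced[slot] = true;
        }

        int selectVictim() {
            while (referenced[hand]) {
                referenced[hand] = false;              // second chance spent
                hand = (hand + 1) % referenced.length;
            }
            int victim = hand;
            hand = (hand + 1) % referenced.length;
            return victim;
        }
    }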
Re: [infinispan-dev] Faster LRU
On Thu, Jul 7, 2011 at 1:21 PM, Manik Surtani ma...@jboss.org wrote: I think we leave the old LRU as LRU_OLD and mark it as deprecated. I for one am against keeping the old policy around, as the new LRU policy implements exactly the same algorithm, only in O(1) instead of O(n). It would have made sense to keep it if there was a difference in the algorithm, but Vladimir even kept the batching from the old LRU policy. We should test it as thoroughly as we can, and to that end I've been working on some additions to MapStressTest that try to measure how good the eviction algorithm is. I think it already helped me find a problem with the new LRU. I've updated pull #414 (https://github.com/infinispan/infinispan/pull/414) to work on top of Vladimir's pull request, in case you want to have a look. You might want to adjust the number of keys and/or disable some of the options in the data providers before running it though; it takes a lot of time to run (and it also needs -Xmx2000m). I've left it running overnight on the test cluster (cluster01 and cluster10), I'll send an update with the results in the morning. Cheers Dan
Re: [infinispan-dev] Faster LRU
On Fri, Jul 8, 2011 at 2:53 AM, Dan Berindei dan.berin...@gmail.com wrote: I've updated pull #414 (https://github.com/infinispan/infinispan/pull/414) to work on top of Vladimir's pull request, in case you want to have a look. You might want to adjust the number of keys and/or disable some of the options in the data providers before running it though; it takes a lot of time to run (and it also needs -Xmx2000m). I've left it running overnight on the test cluster (cluster01 and cluster10), I'll send an update with the results in the morning. Morning update: Ok, apparently -Xmx2000m wasn't enough for 2 million keys, so I had to start the tests again in the morning, running each scenario on a different machine. I haven't run the tests with concurrency level 512, as the total number of threads is only 100, but I suspect the old LRU still won't catch up with the new LRU's performance. It's interesting that in the writeOnMiss test the new LRU performance dropped when I increased the concurrency level from 32 to 128. I think it might be because the eviction.thresholdExpired() check in BCHM.attemptEviction() is done without a lock, and so it could return true simultaneously for multiple threads - which will all proceed to wait on the segment lock and attempt eviction at the same time. Another strange pattern is that neither eviction policy respects the capacity parameter exactly: LIRS rounds up the capacity to the next power of 2, and LRU/LRUOld do the same rounding and then multiply by 0.75. I'll report again once I've fixed these and once I update the reporting - I think the total number of misses might be more relevant than the standard deviation of the keys at the end. Cheers Dan
Testing independent read/write/remove performance with 50 max elements, 100 keys, 32 concurrency level, 90 readers, 9 writers, 1 removers
Container BCHM:LIRS     Average get ops/s 17692.17  Average put ops/s 3871.06   Average remove ops/s 2043.00   Size 515661  stdDev 175722.99
Container BCHM:LRU      Average get ops/s 36023.40  Average put ops/s 1084.62   Average remove ops/s 1131.47   Size 393216  stdDev 145373.51
Container BCHM:LRU_OLD  Average get ops/s 849.86    Average put ops/s 273.88    Average remove ops/s 244.41    Size 398940  stdDev 154739.53
Container CLHM          Average get ops/s 48761.40  Average put ops/s 47008.30  Average remove ops/s 49627.01  Size 50      stdDev 187244.21
Container SLHM          Average get ops/s 16866.18  Average put ops/s 23715.77  Average remove ops/s 15630.28  Size 50      stdDev 178134.07
[testng-MapStressTest] Test testReadWriteRemove(org.infinispan.stress.MapStressTest) succeeded. Test suite progress: tests succeeded: 1, failed: 0, skipped: 0.
Testing independent read/write/remove performance with 50 max elements, 100 keys, 128 concurrency level, 90 readers, 9 writers, 1 removers
Container BCHM:LIRS     Average get ops/s 18868.45  Average put ops/s 6599.76   Average remove ops/s 3316.55   Size 521216  stdDev 175810.97
Container BCHM:LRU      Average get ops/s 32797.94  Average put ops/s 7976.01   Average remove ops/s 8463.70   Size 393113  stdDev 153926.63
Container BCHM:LRU_OLD  Average get ops/s 4093.76   Average put ops/s 1675.52   Average remove ops/s 1335.27   Size 389083  stdDev 152948.92
Container CLHM          Average get ops/s 68179.61  Average put ops/s 64678.23  Average remove ops/s 70212.72  Size 499968  stdDev 181346.12
Container SLHM          Average get ops/s 17893.75  Average put ops/s 24506.01  Average remove ops/s 14617.09  Size 491341  stdDev 181870.67
[testng-MapStressTest] Test testReadWriteRemove(org.infinispan.stress.MapStressTest) succeeded.
Test suite progress: tests succeeded: 2, failed: 0, skipped: 0.
Testing combined read/write performance with 50 max elements, 100 keys, 32 concurrency level, 100 threads, 9:1 read:write ratio
Container BCHM:LIRS     Average ops/s 12903.45  Size 524258  stdDev 170614.40
Container BCHM:LRU      Average ops/s 25041.20  Size 393216  stdDev 154386.88
Container BCHM:LRU_OLD  Average ops/s 545.07    Size 393243  stdDev 156098.32
Container CLHM          Average ops/s 45277.37  Size 50      stdDev 169786.71
Container SLHM          Average ops/s 17522.45  Size 50      stdDev 171034.64
[testng-MapStressTest] Test testReadWrite
Re: [infinispan-dev] Faster LRU
On Fri, Jul 8, 2011 at 12:44 PM, Dan Berindei dan.berin...@gmail.com wrote: It's interesting that in the writeOnMiss test the new LRU performance dropped when I increased the concurrency level from 32 to 128. I think it might be because the eviction.thresholdExpired() check in BCHM.attemptEviction() is done without a lock, and so it could return true simultaneously for multiple threads - which will all proceed to wait on the segment lock and attempt eviction at the same time. I added another batch threshold check while holding the lock, and the performance anomaly disappeared. Another strange pattern is that neither eviction policy respects the capacity parameter exactly: LIRS rounds up the capacity to the next power of 2, and LRU/LRUOld do the same rounding and then multiply by 0.75. I updated BCHM to pass the initial capacity to the eviction policy, and now all the policies keep the same number of entries. I'll report again once I've fixed these and once I update the reporting - I think the total number of misses might be more relevant than the standard deviation of the keys at the end. I got the tests running on the test cluster again, and the new LRU is clearly better in every respect than the old LRU. In fact the only problem in the test results is that in the new writeOnMiss scenario the hit ratio of the new LRU is much better than that of all the other policies. There is probably a mistake somewhere in the test; if it was a random thing, I don't think it would have been so visible in a 20-minute test:
MapStressTest configuration: capacity 50, test running time 1200 seconds
Container BCHM:LIRS     Ops/s 11155.49  HitRatio 96.54  Size 499968  stdDev 193558.30
Container BCHM:LRU      Ops/s 31292.06  HitRatio 97.84  Size 50      stdDev 193168.07
Container BCHM:LRU_OLD  Ops/s 116.89    HitRatio 76.11  Size 500032  stdDev 197974.87
Testing write on miss performance with capacity 50, keys 200, concurrency level 32, threads 100
Container BCHM:LIRS     Ops/s 1684.01   HitRatio 63.13  Size 499968  stdDev 338637.40
Container BCHM:LRU      Ops/s 4884.57   HitRatio 84.47  Size 50      stdDev 353336.31
Container BCHM:LRU_OLD  Ops/s 50.69     HitRatio 41.34  Size 500032  stdDev 361239.68
MapStressTest configuration: capacity 50, test running time 1200 seconds
Testing independent read/write/remove performance with capacity 50, keys 100, concurrency level 32, readers 90, writers 9, removers 1
Container BCHM:LIRS     Ops/s 14865.45  Gets/s 13971.36  Puts/s 8424.86   Removes/s 4640.39   HitRatio 83.10  Size 499972  StdDev 168386.87
Container BCHM:LRU      Ops/s 35970.38  Gets/s 33800.61  Puts/s 19273.89  Removes/s 21815.97  HitRatio 78.64  Size 50      StdDev 192598.94
Container BCHM:LRU_OLD  Ops/s 496.10    Gets/s 467.32    Puts/s 258.91    Removes/s 269.63    HitRatio 69.66  Size 500032  StdDev 182468.36
Container CLHM          Ops/s 56074.13  Gets/s 50656.76  Puts/s 48274.11  Removes/s 53090.65  HitRatio 77.62  Size 50      StdDev 180395.29
Container SLHM          Ops/s 20603.49  Gets/s 18110.83  Puts/s 22696.60  Removes/s 20080.24  HitRatio 80.12  Size 499401  StdDev 179188.12
[testng-MapStressTest] Test testReadWriteRemove(org.infinispan.stress.MapStressTest) succeeded. Test suite progress: tests succeeded: 1, failed: 0, skipped: 0.
Testing independent read/write/remove performance with capacity 50, keys 100, concurrency level 128, readers 90, writers 9, removers 1
Container BCHM:LIRS     Ops/s 23567.68  Gets/s 21558.38  Puts/s 19691.52  Removes/s 3617.48   HitRatio 84.73  Size 498450  StdDev 167244.66
Container BCHM:LRU      Ops/s 38313.89  Gets/s 35514.87  Puts/s 24921.89  Removes/s 27617.36  HitRatio 77.67  Size 500096  StdDev 193880.45
Container BCHM:LRU_OLD  Ops/s 2717.51   Gets/s 2600.66   Puts/s 1035.20   Removes/s 1197.20   HitRatio 79.14  Size 499573  StdDev 180107.36
Container CLHM          Ops/s 77609.22  Gets/s 70243.55  Puts/s 65605.74  Removes/s 72464.47  HitRatio 78.64  Size 499968  StdDev 182996.20
Container SLHM          Ops/s 18812.45  Gets/s 16125.15  Puts/s 25319.02  Removes/s 14003.08  HitRatio 81.44  Size 496624  StdDev 183041.62
[testng-MapStressTest] Test testReadWriteRemove(org.infinispan.stress.MapStressTest) succeeded. Test suite progress: tests succeeded: 2, failed: 0, skipped: 0.
Testing independent read/write/remove performance with capacity 50, keys 200, concurrency level 32, readers 90, writers 9, removers 1
Container BCHM:LIRS     Ops/s 10579.80  Gets/s 10346.59  Puts/s 2109.26  Removes/s 2002.00   HitRatio 50.88  Size 499969  StdDev 245820.97
Container BCHM:LRU      Ops/s 35180.30  Gets/s 34372.36  Puts/s 6871.62  Removes/s 10867.85  HitRatio 50.08
Re: [infinispan-dev] Faster LRU
On Mon, Jul 11, 2011 at 7:29 PM, Vladimir Blagojevic vblag...@redhat.com wrote: On 11-07-08 5:44 AM, Dan Berindei wrote: I haven't run the tests with concurrency level 512, as the total number of threads is only 100, but I suspect the old LRU still won't catch up with the new LRU's performance. It's interesting that in the writeOnMiss test the new LRU performance dropped when I increased the concurrency level from 32 to 128. I think it might be because the eviction.thresholdExpired() check in BCHM.attemptEviction() is done without a lock, and so it could return true simultaneously for multiple threads - which will all proceed to wait on the segment lock and attempt eviction at the same time. I am not sure about this Dan. I looked at this code for hours! I do not see how two threads can call eviction#execute() concurrently. Sorry, I wasn't very clear: two threads can enter attemptEviction simultaneously; one will get the lock and perform eviction, and the other will also try to get the lock, and when it gets it, it will proceed to call eviction#execute() again. So eviction#execute() is not called concurrently, but it is called twice when it should have been called only once, and I think this dilutes the advantages of batching.
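In code, the fix mentioned earlier in the thread amounts to re-checking the threshold after acquiring the segment lock; a minimal sketch of the pattern (field and method names are my own illustration, not the actual BCHM internals):

    import java.util.concurrent.locks.ReentrantLock;

    // Double-check the eviction threshold: the unlocked pre-check keeps the
    // fast path cheap, and the re-check under the lock stops a second thread
    // from running a redundant eviction batch right after the first one.
    class EvictionGate {
        private final ReentrantLock segmentLock = new ReentrantLock();
        private final int batchThreshold;
        private volatile int accessesSinceEviction;

        EvictionGate(int batchThreshold) {
            this.batchThreshold = batchThreshold;
        }

        void maybeEvict(Runnable evictionBatch) {
            if (accessesSinceEviction < batchThreshold) return;     // cheap, unlocked
            segmentLock.lock();
            try {
                if (accessesSinceEviction < batchThreshold) return; // another thread won
                accessesSinceEviction = 0;
                evictionBatch.run();
            } finally {
                segmentLock.unlock();
            }
        }
    }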
Re: [infinispan-dev] IllegalStateException when receiving invalidation messages on a stopped/stopping cache
TBH I'm not sure why the patch helped with the error messages in the log; I always thought that doing this would just replace the IllegalStateException with an InterruptedException (since we do some more stuff after invalidation). Of course the best solution would be to not do rehashing at all when we know we're going to stop the entire cluster anyway (https://issues.jboss.org/browse/ISPN-1239). Dan On Thu, Jul 14, 2011 at 7:48 PM, Manik Surtani ma...@jboss.org wrote: Patch looks good. Go ahead and issue a pull req. On 14 Jul 2011, at 16:26, Sanne Grinovero wrote: Hello, I was looking into the many stacktraces being thrown from the testsuite, and this one caught my attention (see below). Can't we just discard such errors? If the cache is stopping, or stopped already, we shouldn't really care about invalidations - especially stacktraces and exceptions being returned to the caller. This doesn't solve all the EOF exceptions I'm still experiencing, but it seems to make things a bit better. I've hacked a solution which implies adding a method:
boolean ignoreCommandOnStatus(ComponentStatus status);
to the VisitableCommand interface; it seems to work fine. Shall I open a JIRA and send a pull request? Details: https://github.com/Sanne/infinispan/commit/ed962ed72bc68765078b6a0f172b95ea1c07d485#L7L142 Cheers, Sanne
2011-07-14 15:16:20,940 ERROR [RebalanceTask] (Rehasher,Infinispan-Cluster,NonStringKeyStateTransferTest-NodeC-7649) ISPN000145: Error during rehash java.lang.IllegalStateException: Default cache is in 'STOPPING' state and this is an invocation not belonging to an on-going transaction, so it does not accept new invocations. Either restart it or recreate the cache container.
at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:83)
at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:64)
at org.infinispan.commands.AbstractVisitor.visitInvalidateCommand(AbstractVisitor.java:120)
at org.infinispan.commands.AbstractVisitor.visitInvalidateL1Command(AbstractVisitor.java:124)
at org.infinispan.commands.write.InvalidateL1Command.acceptVisitor(InvalidateL1Command.java:177)
at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:274)
at org.infinispan.distribution.RebalanceTask.invalidateKeys(RebalanceTask.java:172)
at org.infinispan.distribution.RebalanceTask.performRehash(RebalanceTask.java:145)
at org.infinispan.distribution.RehashTask.call(RehashTask.java:67)
at org.infinispan.distribution.RehashTask.call(RehashTask.java:44)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
-- Manik Surtani ma...@jboss.org twitter.com/maniksurtani Lead, Infinispan http://www.infinispan.org
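A rough sketch of how the proposed hook could be implemented on the invalidation command (the method signature is the one Sanne proposes; the body and the enum constants I use are my own guess at the intent, not the committed code):

    // On InvalidateL1Command (sketch): skip execution entirely when the
    // receiving cache is already shutting down, instead of throwing
    // IllegalStateException back at the rehashing node.
    public boolean ignoreCommandOnStatus(ComponentStatus status) {
        switch (status) {
            case STOPPING:
            case TERMINATED:
                return true;   // invalidating a dying cache is pointless
            default:
                return false;
        }
    }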
Re: [infinispan-dev] IllegalStateException when receiving invalidation messages on a stopped/stopping cache
I was surprised because I would have thought your change would replace each IllegalStateException in the log with an InterruptedException, so I wouldn't have expected the log to be any less noisy. I mean you don't stop the rehashing during invalidation, you just skip invalidation and go to the next phase (notifyCoordinatorPushCompleted), which I thought should throw an InterruptedException. True, we could ignore InterruptedExceptions in RehashTask and not log anything, but based only on this change I was expecting the same number of exceptions in the log. Good to know that I was wrong :) Dan On Fri, Jul 15, 2011 at 1:46 PM, Sanne Grinovero sa...@infinispan.org wrote: Sorry for not being clear - I'm not assuming that this has anything to do with the InterruptedException/EOFs still being logged, just that the output is less noisy if I remove these exceptions, which were pointless. Sanne
[infinispan-dev] State transfer is transferring the same data twice
Hi guys While fixing https://issues.jboss.org/browse/ISPN-1243 I found that in certain cases state transfer will copy the in-memory data twice: first when copying the in-memory state and then again while copying the persistent state. I was getting duplicate key exceptions in StateTransferManagerImpl.applyPersistentState(), so I changed StateTransferManagerImpl.applyInMemoryState() to skip writing to the cache store if persistent state transfer is enabled and passivation is disabled. That seems to work, but I'm wondering if there are other cases that I missed. For 5.1, since we're going to use the rehashing code, we can check if a key is in memory and not include it again in the persistent state, but that seems too intrusive to include in 5.0. WDYT? Cheers Dan
Re: [infinispan-dev] BeanUtils.getterMethod
On Thu, Aug 4, 2011 at 2:03 AM, Sanne Grinovero sa...@infinispan.org wrote: 2011/8/3 Dan Berindei dan.berin...@gmail.com: I think it's supposed to be an automatic version of https://github.com/infinispan/infinispan/blob/master/core/src/main/java/org/infinispan/config/GlobalConfiguration.java#L326 I prefer the manual version in GlobalConfiguration, although I don't think there are any users for those components either. Dan I was thinking something similar, but would like to know if it's still used; somebody must know about this code? Initially I thought it was some clever way to extract instances from a user-customizable configuration instance, but since it's reflecting on the static definition of Configuration it doesn't look like it's meant to support extensions of it. If it's used, we could cache the references to existing methods and also track which methods don't exist, so we limit the number of exceptions to a single one per attempted method name (i.e. starting 500 caches would cost the same as starting one). Even a custom Configuration could register its objects in the component registry manually in an @Inject method, like GlobalConfiguration does. But I don't see any good reason for injecting bits of the configuration instead of injecting the whole thing and navigating the configuration tree in the component code. So I'd go with your initial suggestion and remove it. Dan
[infinispan-dev] Starting caches in parallel
Hi guys I've found a deadlock with transactions spanning multiple caches during rehashing if the joiner's caches are started sequentially (for more details see https://gist.github.com/1124740). After discussing a bit on IRC with Manik and Galderz it appears the only solution for 5.0.0.FINAL would be to have a mechanism to start all the caches in parallel. There are several options to implement that. All of them require the users to know that they should create the caches in parallel at application startup to avoid problems.
1. Advise the users about the problem, but let them create their own threads and call EmbeddedCacheManager.getCache() on those threads.
2. Add a method EmbeddedCacheManager.getCaches() to start multiple caches in parallel. This is not as straightforward as it may seem; first there is the question of whether to use template parameters or not:
2.a. Set<Cache> getCaches(String... cacheNames);
vs
2.b. Set<Cache<K, V>> getCaches(String... cacheNames);
I don't think having the same K and V for all the caches is going to be very common, so I'd go with 2.a. Then there is the problem of how to request the default cache. I think should be fine, but we'd need to document it and also change the regular getCache() methods to accept it.
3. Add an async version of EmbeddedCacheManager.getCache():
Future<Cache<K, V>> getCacheAsync(String cacheName);
Future<Cache<K, V>> getCacheAsync();
This nicely sidesteps the generics issue in solution 2 and it's also easier for the users than solution 1, so it's currently my favourite. What do you think? Dan
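For what it's worth, option 1 is expressible today with plain executor code; a minimal sketch (Java 6 era style, helper names are made up) of starting all caches in parallel from application startup:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import org.infinispan.Cache;
    import org.infinispan.manager.EmbeddedCacheManager;

    // Option 1 as user code: call getCache() for every cache on its own
    // thread, so all the joiner's rehash tasks can make progress together.
    public class ParallelCacheStarter {
        public static void startInParallel(final EmbeddedCacheManager cm, String... names)
                throws InterruptedException {
            ExecutorService pool = Executors.newFixedThreadPool(names.length);
            try {
                List<Callable<Cache<Object, Object>>> tasks =
                        new ArrayList<Callable<Cache<Object, Object>>>();
                for (final String name : names) {
                    tasks.add(new Callable<Cache<Object, Object>>() {
                        public Cache<Object, Object> call() {
                            return cm.getCache(name);  // blocks until this cache has joined
                        }
                    });
                }
                pool.invokeAll(tasks);                 // wait for every cache to start
            } finally {
                pool.shutdown();
            }
        }
    }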
Re: [infinispan-dev] Starting caches in parallel
On Thu, Aug 4, 2011 at 2:01 PM, Sanne Grinovero sa...@infinispan.org wrote: 2011/8/4 Dan Berindei dan.berin...@gmail.com: Hi guys I've found a deadlock with transactions spanning multiple caches during rehashing if the joiner's caches are started sequentially (for more details see https://gist.github.com/1124740). After discussing a bit on IRC with Manik and Galderz it appears the only solution for 5.0.0.FINAL would be to have a mechanism to start all the caches in parallel. There are several options to implement that. All of them require the users to know that they should create the caches in parallel at application startup to avoid problems.
1. Advise the users about the problem, but let them create their own threads and call EmbeddedCacheManager.getCache() on those threads.
2. Add a method EmbeddedCacheManager.getCaches() to start multiple caches in parallel. This is not as straightforward as it may seem; first there is the question of whether to use template parameters or not:
2.a. Set<Cache> getCaches(String... cacheNames);
vs
2.b. Set<Cache<K, V>> getCaches(String... cacheNames);
It turns out the proper generic version should be Map<String, Cache<? extends Object, ? extends Object>> createCaches(String...), but even this version produces unchecked warnings. We'd also have to make the return type a map, otherwise the user has no way of getting a specific cache. I don't think having the same K and V for all the caches is going to be very common, so I'd go with 2.a. Then there is the problem of how to request the default cache. I think should be fine, but we'd need to document it and also change the regular getCache() methods to accept it. You could use org.infinispan.manager.CacheContainer.DEFAULT_CACHE_NAME. I thought it was something hidden from the users, I wasn't aware it's in the public API. This is perfect.
3. Add an async version of EmbeddedCacheManager.getCache():
Future<Cache<K, V>> getCacheAsync(String cacheName);
Future<Cache<K, V>> getCacheAsync();
This nicely sidesteps the generics issue in solution 2 and it's also easier for the users than solution 1, so it's currently my favourite. What do you think? I think it's very bad, but it doesn't surprise me, since starting more than a single cache in sequence brings you into the scenario of ISPN-658, unless you're quick enough. So yes, it's not nice, but we highlighted the problem long ago, I think? The workaround for ISPN-658 is to start all caches on application startup. This scenario appeared specifically because I was trying to avoid an asymmetric cluster and moved the creation of all caches to a central location. Are you proposing a temporary API to make things work before ISPN-658 is solved? I don't like the Future approach, it's still unclear that I have to send all requests before blocking on any get. The fix for ISPN-658 will most likely prevent existing nodes from sending requests to a joiner until that joiner has finished the initial rehash, so I presume it will make my scenario impossible. There is another JIRA planned for 5.1 that is targeted more specifically as a workaround for ISPN-658: allow InboundInvocationHandler to create caches automatically when it has an incoming command for that cache (can't find the id now). I'd make a
void startCaches(String... names);
which implicitly includes the default cache too, and you throw an exception if getCache() is used on an unstarted cache, and unfortunately you should also throw an exception if startCaches() is invoked more than once.
I've had a long chat on IRC about this with Sanne and Pete, and our conclusion was that adding a startCaches method is the clearest option for the users. The intention of this API is only to start the caches, so the name should clearly state that. It will also eliminate an uncertainty for users: at the end of the startCaches call they will know that the caches have started and that further getCache calls won't fail. We can't afford to break getCache() for users though (including ourselves), so getCache() will still start the cache. It should still print a warning when doing so, directing people to call startCaches instead. I don't think we should throw an exception if startCaches is invoked more than once either, to match getCache() and all the other components. Cheers Dan
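Usage-wise the agreed API would look something like this (a sketch, assuming startCaches ends up on EmbeddedCacheManager as discussed; the cache names are made up):

    import java.io.IOException;
    import org.infinispan.Cache;
    import org.infinispan.manager.DefaultCacheManager;
    import org.infinispan.manager.EmbeddedCacheManager;

    // Application startup: start every cache up front so the joiner never
    // deadlocks on sequentially started caches; afterwards getCache() is
    // just a lookup of an already-running cache.
    public class Bootstrap {
        public static void main(String[] args) throws IOException {
            EmbeddedCacheManager cm = new DefaultCacheManager("infinispan.xml");
            cm.startCaches("users", "orders", "sequences");
            Cache<String, Object> users = cm.getCache("users"); // already started, no warning
            System.out.println("started: " + users.getName());
        }
    }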
Re: [infinispan-dev] Why DistTxInterceptor in use with Hot Rod ?
Going back to your original question Galder, the exception is most likely thrown because of this sequence of events:
0. Given a cluster {A, B}, a key k and a node C joining.
1. Put acquires the transaction lock on node A (blocking rehashing)
2. Put acquires the lock for key k on node A
3. Rehashing starts on node B, blocking transactions
4. Put tries to acquire the transaction lock on node B
Since it's impossible to finish rehashing while the put operation keeps the transaction lock on node A, the best option was to kill the put operation by throwing a RehashInProgressException. I was thinking in the context of transactions when I wrote this code (see https://github.com/danberindei/infinispan/commit/6ed94d3b2e184d4a48d4e781db8d404baf5915a3, this scenario became just a footnote to the generic case with multiple caches), but the scenario also occurs without transactions. Actually I renamed it to "state transfer lock" and I moved it to a separate interceptor in my ISPN-1194 branch. Maybe the locking changes in 5.1 will eliminate this scenario, but otherwise we could improve the user experience by retrying the command after the rehashing finishes. Dan On Tue, Sep 13, 2011 at 8:11 PM, Galder Zamarreño gal...@redhat.com wrote: On Sep 13, 2011, at 6:38 PM, Mircea Markus wrote: On 13 Sep 2011, at 17:22, Galder Zamarreño wrote: Hi, I'm looking at this failure http://goo.gl/NQw4h and I'm wondering why org.infinispan.distribution.RehashInProgressException: Timed out waiting for the transaction lock is thrown? This is thrown by DistTxInterceptor, which is added by the InterceptorChainFactory:
if (configuration.getCacheMode().isDistributed())
   interceptorChain.appendInterceptor(createInterceptor(DistTxInterceptor.class));
else
   interceptorChain.appendInterceptor(createInterceptor(TxInterceptor.class));
However, this is a Hot Rod test and the cache is not configured with a transaction manager, so is DistTxInterceptor really needed? In fact, why a TxInterceptor if no transaction manager is configured? (this kinda goes back to Mircea's email earlier today about a transactions enabled/disabled flag) All valid concerns, IMO. 5.1 will clarify this by not adding Tx interceptors for non-tx caches. Is this captured in some other JIRA? I'd like to reference the test and the JIRA from ISPN-1123 (stabilising testsuite jira) Cheers, -- Galder Zamarreño Sr. Software Engineer Infinispan, JBoss Cache
Re: [infinispan-dev] Why DistTxInterceptor in use with Hot Rod ?
On Thu, Sep 15, 2011 at 12:25 PM, Galder Zamarreño gal...@redhat.com wrote: On Sep 14, 2011, at 11:40 AM, Dan Berindei wrote: Going back to your original question Galder, the exception is most likely thrown because of this sequence of events:
0. Given a cluster {A, B}, a key k and a node C joining.
1. Put acquires the transaction lock on node A (blocking rehashing)
2. Put acquires the lock for key k on node A
3. Rehashing starts on node B, blocking transactions
4. Put tries to acquire the transaction lock on node B
Since it's impossible to finish rehashing while the put operation keeps the transaction lock on node A, the best option was to kill the put operation by throwing a RehashInProgressException. I was thinking in the context of transactions when I wrote this code (see https://github.com/danberindei/infinispan/commit/6ed94d3b2e184d4a48d4e781db8d404baf5915a3, this scenario became just a footnote to the generic case with multiple caches), but the scenario also occurs without transactions. Actually I renamed it to "state transfer lock" and I moved it to a separate interceptor in my ISPN-1194 branch. Right, but it shouldn't happen when transactions are off, right? It will still happen with transactions off, because state transfer will acquire the state transfer lock in exclusive mode, and WriteCommands will acquire the state transfer lock in shared mode (instead of PrepareCommands, as with transactions on). If we don't acquire the state transfer lock then we run the risk of pushing stale data to other nodes; are you saying that without transactions enabled this risk is acceptable? Maybe the locking changes in 5.1 will eliminate this scenario, but otherwise we could improve the user experience by retrying the command after the rehashing finishes. I'd prefer that, otherwise users are likely to code similar logic, which is not nice. After looking at the scenario again it became clear that step 2. is not necessary at all, and the locking enhancements will not change anything. We could change DistributionInterceptor to not hold the state transfer lock during remote calls, which would allow rehashing to proceed on the origin. But then the ownership of the keys might change during the call, so we'd need a retry phase anyway. Dan
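Conceptually the state transfer lock behaves like a ReadWriteLock with the roles flipped from their usual naming; a toy sketch of the shared/exclusive interplay described above (my own illustration, not the actual StateTransferLock component):

    import java.util.concurrent.locks.ReentrantReadWriteLock;

    // Writes take the shared (read) side, so unrelated writes run in
    // parallel; state transfer takes the exclusive (write) side, so it
    // waits out in-flight writes and blocks new ones while pushing state.
    class StateTransferGate {
        private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        void runWriteCommand(Runnable writeCommand) {
            lock.readLock().lock();
            try {
                writeCommand.run();      // blocked only while a rehash is running
            } finally {
                lock.readLock().unlock();
            }
        }

        void runStateTransfer(Runnable pushState) {
            lock.writeLock().lock();
            try {
                pushState.run();         // no writes can slip in and go stale
            } finally {
                lock.writeLock().unlock();
            }
        }
    }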
Re: [infinispan-dev] Versioned entries - overview of design, looking for comments
On Wed, Sep 14, 2011 at 7:48 PM, Sanne Grinovero sa...@infinispan.org wrote: Wouldn't the node performing the operation always do an RPC anyway iff the intended operation is to replace a specific value? I was thinking along the same lines: if there is no value in the InvocationContext (L0 cache?) then there's no point in doing any write skew check. Examples:
- If I do a put() operation which doesn't skip the return value, the RPC has to be performed, and we get the current version value, which is what we will check to be unchanged when committing the write operation. This includes putIfAbsent and all other atomic operations, which are in common use and all need an RPC anyway. This means that L1 caches will contain the version number as well.
- If I do a put() operation in which I skip the return value, or a remove() without any check (nor lock), it seems the user intention is to overwrite/delete whatever was there without any further check at commit time. In fact this doesn't acquire the optimistic lock. I see a problem with this reasoning, as either with Jokre or with the forthcoming void put() operation in JSR-107 the user will not make a conscious decision to skip the return value. But I still don't think the write skew check is useful if the value was not previously read by the user in that transaction.
- Any read operation will need an RPC anyway. (If relevant: I guess for this case we could have different options whether to rollback or proceed when detecting stale reads at commit time.) I thought the write skew check only applied to write operations. But it would be nice to force a write skew check on a key without a write (like an optimistic equivalent of lock()).
- Any lock() operation will force such an RPC.
I don't think I understood the difference between #1 and #2. In both cases the writing node needs to retrieve the version information, and the key owner will perform the version increment at commit time, if the skew check is happy. The way I understood it, #1 forces the remote get even if the user didn't request it, but as you said it could just skip the write skew check in that case. #2 looks to me like it doesn't fetch the remote version at all, but I'm having trouble understanding how the owner will perform the write skew check. Dan Sanne On 14 September 2011 16:03, Manik Surtani ma...@jboss.org wrote: So I've been hacking on versioned entries for a bit now, and want to run the designs by everyone. Adding an EntryVersion to each entry is easy; making this optional and null by default is easy too, with SimpleVersion being a wrapper around a long and PartitionTolerantVersion being a vector clock implementation. Also easy stuff: changing the entry hierarchy and the marshalling to ensure versions - if available - are shipped, etc. Comparing versions would happen in Mircea's optimistic locking code, on prepare, when a write skew check is done. If running in a non-clustered environment, the simple object-identity check we currently have is enough; otherwise an EntryVersion.compare() will need to happen, with one of 4 possible results: equal, newer than, older than, or concurrently modified. The last one can only happen if you have a PartitionTolerantVersion, and will indicate a split brain and simultaneous update. Now the hard part. Who increments the version? We have a few options, all seem expensive. 1) The modifying node. If the modifying node is a data owner, then easy. Otherwise the modifying node *has* to do a remote GET first (or at least a GET_VERSION) before doing a PUT.
Extra RPC per entry. Sucks. 2) The data owner. This would have to happen on the primary data owner only, and the primary data owner would need to perform the write skew check. NOT the modifying node. The modifying node would also need to increment and ship its own NodeClock along with the modification. Extra info to ship per commit. I'm guessing we go with #2, but would like to hear your thoughts. Cheers Manik -- Manik Surtani ma...@jboss.org twitter.com/maniksurtani Lead, Infinispan http://www.infinispan.org
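To make the four-way compare concrete, here is a minimal vector clock comparison sketch (my own illustration of what a PartitionTolerantVersion comparison could look like, assuming clocks of equal length indexed by node):

    // Compare two vector clocks component by component. The result is one
    // of the four outcomes listed above: EQUAL, NEWER, OLDER, or CONCURRENT
    // (some components ahead and some behind, i.e. a simultaneous update).
    enum VersionComparison { EQUAL, NEWER, OLDER, CONCURRENT }

    static VersionComparison compare(long[] ours, long[] theirs) {
        boolean ahead = false, behind = false;
        for (int i = 0; i < ours.length; i++) {
            if (ours[i] > theirs[i]) ahead = true;
            else if (ours[i] < theirs[i]) behind = true;
        }
        if (ahead && behind) return VersionComparison.CONCURRENT; // split brain case
        if (ahead) return VersionComparison.NEWER;
        if (behind) return VersionComparison.OLDER;
        return VersionComparison.EQUAL;
    }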
Re: [infinispan-dev] TEST_PING on a Mac?
I've seen the same thing on Linux but a lot less often, about 4 times in the whole test suite with DEBUG logging. I didn't see it at all when I switched TRACE logging on. Dan On Mon, Sep 26, 2011 at 7:22 PM, Manik Surtani ma...@jboss.org wrote: Guys - has anyone seen this? Happens on many tests when I run on OSX, but works fine on Linux.
--- T E S T S ---
Running TestSuite
Transaction manager used: JBossTM
Transport protocol stack used = tcp
2011-09-26 18:19:25,724 WARN [GMS] (testng-TxAndInvalidationTimeoutTest) join(TxAndInvalidationTimeoutTest-NodeB-47469) sent to TxAndInvalidationTimeoutTest-NodeA-11323 timed out (after 7000 ms), retrying
2011-09-26 18:19:33,241 WARN [GMS] (testng-TxAndInvalidationTimeoutTest) join(TxAndInvalidationTimeoutTest-NodeB-47469) sent to TxAndInvalidationTimeoutTest-NodeA-11323 timed out (after 7000 ms), retrying
2011-09-26 18:19:40,753 WARN [GMS] (testng-TxAndInvalidationTimeoutTest) join(TxAndInvalidationTimeoutTest-NodeB-47469) sent to TxAndInvalidationTimeoutTest-NodeA-11323 timed out (after 7000 ms), retrying
2011-09-26 18:19:48,267 WARN [GMS] (testng-TxAndInvalidationTimeoutTest) join(TxAndInvalidationTimeoutTest-NodeB-47469) sent to TxAndInvalidationTimeoutTest-NodeA-11323 timed out (after 7000 ms), retrying
2011-09-26 18:19:55,784 WARN [GMS] (testng-TxAndInvalidationTimeoutTest) join(TxAndInvalidationTimeoutTest-NodeB-47469) sent to TxAndInvalidationTimeoutTest-NodeA-11323 timed out (after 7000 ms), retrying
2011-09-26 18:20:03,316 WARN [GMS] (testng-TxAndInvalidationTimeoutTest) join(TxAndInvalidationTimeoutTest-NodeB-47469) sent to TxAndInvalidationTimeoutTest-NodeA-11323 timed out (after 7000 ms), retrying
2011-09-26 18:20:11,156 WARN [GMS] (testng-TxAndInvalidationTimeoutTest) join(TxAndInvalidationTimeoutTest-NodeB-47469) sent to TxAndInvalidationTimeoutTest-NodeA-11323 timed out (after 7000 ms), retrying
2011-09-26 18:20:19,298 WARN [GMS] (testng-TxAndInvalidationTimeoutTest) join(TxAndInvalidationTimeoutTest-NodeB-47469) sent to TxAndInvalidationTimeoutTest-NodeA-11323 timed out (after 7000 ms), retrying
2011-09-26 18:20:26,833 WARN [GMS] (testng-TxAndInvalidationTimeoutTest) join(TxAndInvalidationTimeoutTest-NodeB-47469) sent to TxAndInvalidationTimeoutTest-NodeA-11323 timed out (after 7000 ms), retrying
2011-09-26 18:20:34,347 WARN [GMS] (testng-TxAndInvalidationTimeoutTest) join(TxAndInvalidationTimeoutTest-NodeB-47469) sent to TxAndInvalidationTimeoutTest-NodeA-11323 timed out (after 7000 ms), retrying
2011-09-26 18:20:41,859 WARN [GMS] (testng-TxAndInvalidationTimeoutTest) join(TxAndInvalidationTimeoutTest-NodeB-47469) sent to TxAndInvalidationTimeoutTest-NodeA-11323 timed out (after 7000 ms), retrying
2011-09-26 18:20:49,366 WARN [GMS] (testng-TxAndInvalidationTimeoutTest) join(TxAndInvalidationTimeoutTest-NodeB-47469) sent to TxAndInvalidationTimeoutTest-NodeA-11323 timed out (after 7000 ms), retrying
2011-09-26 18:20:56,884 WARN [GMS] (testng-TxAndInvalidationTimeoutTest) join(TxAndInvalidationTimeoutTest-NodeB-47469) sent to TxAndInvalidationTimeoutTest-NodeA-11323 timed out (after 7000 ms), retrying
2011-09-26 18:21:04,416 WARN [GMS] (testng-TxAndInvalidationTimeoutTest) join(TxAndInvalidationTimeoutTest-NodeB-47469) sent to TxAndInvalidationTimeoutTest-NodeA-11323 timed out (after 7000 ms), retrying
2011-09-26 18:21:12,249 WARN [GMS] (testng-TxAndInvalidationTimeoutTest) join(TxAndInvalidationTimeoutTest-NodeB-47469) sent to TxAndInvalidationTimeoutTest-NodeA-11323 timed out (after 7000 ms), retrying
-- Manik Surtani ma...@jboss.org
twitter.com/maniksurtani Lead, Infinispan http://www.infinispan.org
Re: [infinispan-dev] ISPN-1384 - InboundInvocationHandlerImpl should wait for cache to be started? (not just defined)
On Tue, Sep 27, 2011 at 5:59 PM, Galder Zamarreño gal...@redhat.com wrote: Hi, Re: https://issues.jboss.org/browse/ISPN-1384 I've had a look at this, and this race condition could, in theory, be resolved by making InboundInvocationHandlerImpl.handle() wait not only for the cache to be defined, but for it to be started. Otherwise there'll always be a potential race condition like the one shown in the log. Actually I think it would be enough to wait until the cache has started joining (StateTransferManager.waitForJoinToStart()); that means all the other components have finished starting. With the asymmetric cache support in place I think we shouldn't have to wait at all, since we'll send the join request only after all the components have been started. We could either apply the commands (if we implement the non-blocking state transfer option) or reject them and tell the originator to retry after the state transfer is done (if we keep the blocking state transfer, since the other nodes shouldn't be sending commands to us anyway). In this particular case, this is a clustered get command being received from a clustered cache loader, which is arriving in the cache before it has started (and before the cache loader has been created, hence the NPE). Another question: is there any reason why CacheLoader is not a named cache component which can be initialised with a corresponding factory and into which other components can be injected (i.e. marshaller, cache...etc)? In this particular case, this would also resolve the issue, because ClusterCacheLoader.start() does nothing, so all the interceptor needs is a proper instance of ClusterCacheLoader available. The factory makes these available before injection. Thoughts? Cheers, p.s. Dan, I am aware of https://issues.jboss.org/browse/ISPN-1324, maybe you're solving this indirectly with the work for that JIRA? With asymmetric caches in place I don't think we'll need/want ISPN-1324 any more. Cheers Dan -- Galder Zamarreño Sr. Software Engineer Infinispan, JBoss Cache
Re: [infinispan-dev] Asymmetric caches and manual rehashing design
On Wed, Sep 28, 2011 at 12:59 PM, Bela Ban b...@redhat.com wrote: My 5 cents: Virtual Cache Views:
- The REQUEST_VIEW_UPDATE is a good idea to manually trigger flushing of pending views. However, I'm not sure what the goal of DISABLE_VIEW_UPDATES is. Since I assume this is for the entire cluster, can 1 member disable view delivery for the entire cluster? What scenario is this for? Indeed, one member could disable view delivery for the entire cluster. The main scenario I had in mind for this was stopping the entire cluster. A good rebalancing policy might handle such cases automatically, so I'm not 100% sure we need this.
- Are you clubbing (virtual) view updates and rebalancing together? And if so (I should probably read on first...), can't you have view installations *without* rebalancing? I'm not sure what benefits you would get from joining the cache view but not receiving state, compared to not sending the join request at all. Non-members are allowed to send/receive commands, so in the future we could even have separate server nodes (that join the cache view) and client nodes (joining the JGroups cluster to send commands, but not the cache view, and so not holding any data except L1). My idea was that the cache view is a representation of the caches that are able to service requests, so it doesn't make sense to include in the view caches that don't hold data. As a practical note, if we add nodes to the consistent hash as they come along, a key will have one set of owners if written to before the join and another set of owners if written to after the join. How should another node determine the current location of the key, especially considering that there can be many view changes before rebalancing is triggered?
- Do we need the complex PREPARE_VIEW / ROLLBACK_VIEW / COMMIT_VIEW 2PC handling? This adds a lot of complexity. Is it only used when we have a transactional cache? Nope, this doesn't have anything to do with transactional caches; instead it is all about computing the owner that will push a key during the rebalance operation. In order to do that deterministically we need to have a common "last good" consistent hash for the last rebalance that finished successfully, and each node must determine if it should push a key or not based on that last good CH. A rebalance operation can also fail for various reasons (e.g. the coordinator died). If that happens the new owners won't have all the state, so they should not receive requests for the state that they would have had in the pending CH.
- State is to be transferred *within* this 2PC time frame. Hmm, again, this ties rebalancing and view installation together (see my argument above)... If view installation wasn't tied to state transfer then we'd have to keep the last rebalanced view somewhere else. We would hold the last pending view (pending rebalance, that is) in the CacheViewsManager and the last rebalanced view in another component, and that component would have its own mechanism for synchronizing the last rebalanced view among cache members. So I think the 2PC approach in CacheViewsManager actually simplifies things.
On 9/27/11 6:22 PM, Dan Berindei wrote: Following the discussions last week I've written up a wiki page describing the strategies for cache view management and state transfer that will enable asymmetric caches and manual rehashing: http://community.jboss.org/wiki/AsymmetricCachesAndManualRehashingDesign The state transfer part is not very detailed, as you'll see we want to go with a non-blocking approach but I'm not sure we can finish that for 5.1 so we're keeping a blocking fallback option. Your comments are welcome, either on the list or on the wiki page. -- Bela Ban Lead JGroups (http://www.jgroups.org) JBoss / Red Hat ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev Cheers Dan ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
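For concreteness, a rough sketch of the PREPARE_VIEW / COMMIT_VIEW / ROLLBACK_VIEW bookkeeping described in this thread. The last *committed* view plays the role of the "last good CH" every node uses to decide which keys it must push; all types and names below are illustrative, not the real CacheViewsManager API:

import java.util.List;

public class CacheViewManagerSketch {
    private List<String> committedView;   // last view whose rebalance finished successfully
    private List<String> pendingView;     // view currently in the prepare phase

    public synchronized void prepareView(List<String> newView) {
        pendingView = newView;            // state transfer happens within this 2PC window
    }

    public synchronized void commitView() {
        committedView = pendingView;      // rebalance succeeded: new owners now hold the state
        pendingView = null;
    }

    public synchronized void rollbackView() {
        pendingView = null;               // e.g. the coordinator died mid-rebalance
    }

    // owners are always computed deterministically from the last good view
    public synchronized List<String> lastGoodView() {
        return committedView;
    }
}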
Re: [infinispan-dev] Asymmetric caches and manual rehashing design
On Sep 28, 2011 5:24 PM, Mircea Markus mircea.mar...@jboss.com wrote: Nice stuff. Some q: - When a node CM that has c1 and c2 caches running on it goes down, you'll end up having REQUEST_LEAVE(c1) and REQUEST_LEAVE(c2) at the same time. Is this handled in serial or in parallel? In parallel, although that shouldn't matter because the rebalance is triggered asynchronously. - The view updates can be disabled dynamically (DISABLE_VIEW_UPDATES). Shouldn't we also be able to re-enable them manually after that? "The view accept handler updates the pending view, acquires the view change lock in exclusive mode, then releases it immediately" Shouldn't this be: - acquire exclusive lock - update the pending view - release lock ? Right, what I meant to say is that it doesn't hold the lock for the entire duration of the rebalance task. On 27 Sep 2011, at 17:22, Dan Berindei wrote: Following the discussions last week I've written up a wiki page describing the strategies for cache view management and state transfer that will enable asymmetric caches and manual rehashing: http://community.jboss.org/wiki/AsymmetricCachesAndManualRehashingDesign The state transfer part is not very detailed; as you'll see we want to go with a non-blocking approach, but I'm not sure we can finish that for 5.1 so we're keeping a blocking fallback option. Your comments are welcome, either on the list or on the wiki page. Dan ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
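The lock discipline being clarified here, as a small self-contained sketch: the view handler takes the view-change lock exclusively just long enough to swap in the pending view, while transactions hold it in shared mode; the rebalance task itself runs outside the lock. A ReentrantReadWriteLock stands in for Infinispan's StateTransferLock, and all names are illustrative:

import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ViewChangeLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private volatile Object pendingView;

    public void installPendingView(Object newView) {
        lock.writeLock().lock();        // 1. acquire exclusive lock
        try {
            pendingView = newView;      // 2. update the pending view
        } finally {
            lock.writeLock().unlock();  // 3. release immediately; the rebalance task runs unlocked
        }
    }

    public void runTransaction(Runnable tx) {
        lock.readLock().lock();         // transactions only need the shared mode
        try {
            tx.run();
        } finally {
            lock.readLock().unlock();
        }
    }
}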
Re: [infinispan-dev] Asymmetric caches and manual rehashing design
On Thu, Sep 29, 2011 at 10:01 AM, Bela Ban b...@redhat.com wrote: On 9/28/11 1:48 PM, Dan Berindei wrote: On Wed, Sep 28, 2011 at 12:59 PM, Bela Ban b...@redhat.com wrote: My 5 cents: - Are you clubbing (virtual) view updates and rebalancing together ? And if so (I should probably read on first...), can't you have view installations *without* rebalancing ? I'm not sure what benefits you would get from joining the cache view but not receiving state compared to not sending the join request at all. Non-members are allowed to send/receive commands, so in the future we could even have separate server nodes (that join the cache view) and client nodes (joining the JGroups cluster to send commands, but not the cache view and so not holding any data except L1). I had the scenario in mind where you join 100 members and only *then* do a state transfer (rebalancing). My idea was that the cache view was a representation of the caches that are able to service requests, so it doesn't make sense to include in the view caches that don't hold data. OK. So with periodic rebalancing, you'd hold (virtual) views and state *until* the trigger fires, which then installs the new virtual views and rebalances the state ? In this case, tying view delivery and rebalancing together makes sense... Yes, I'm installing the views only when the trigger fires, and installing the views will also include transferring state. So the 100 members might ask the coordinator to join the virtual view, but they will only actually join (and receive state) when the new view that includes them is installed. - Do we need the complex PREPARE_VIEW / ROLLBACK_VIEW / COMMIT_VIEW 2PC handling ? This adds a lot of complexity. Is it only used when we have a transactional cache ? Nope, this doesn't have anything to do with transactional caches; instead it is all about computing the owner that will push the key during the rebalance operation. In order to do it deterministically we need to have a common last good consistent hash for the last rebalance that finished successfully, and each node must determine if it should push a key or not based on that last good CH. OK. I just hope this makes sense for large clusters, as it is a 2PC, which doesn't scale to a larger number of nodes. I mean, we don't use FLUSH in large clusters for the same reason. Hmm, on the upside, you don't run this algorithm a lot, so maybe running it only a few times amortizes its cost. Yeah, it is kind of bad with large clusters, at least with the blocking state transfer we have now. But in order to ensure we don't lose data we need to serialize the rebalance operations, and I don't think we can do that without 2PC. If we switch to the non-blocking state transfer proposal it shouldn't matter any more: the installation of the view will be delayed until we get the consensus, but transactions will still be able to proceed while a view is in the prepare phase. With this algorithm, I assume you won't need the transitory view anymore (UnionConsistentHashFunction or whatever it was called), which includes both current and new owners of a key ? We are going to need something similar: since we're waiting for the coordinator to tell us to install a new CH, our CH may contain nodes that are already stopped. So we will have to check the addresses returned by the CH against the list of nodes still running. 
But other than that we are just going to use the CH of the latest committed view (with blocking state transfer) or the CH of the latest pending view (with non-blocking state transfer). With blocking state transfer we just know that the old owners will hold the key until the new view is committed. The non-blocking state transfer proposal works by letting the new owners fetch the key from the old owners, so there is no need for the callers to fall back to the previous owners. Cheers Dan ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
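The membership check described above, as a hedged sketch: since the committed CH may still contain stopped nodes, the owner list is intersected with the members actually running. Address is a generic placeholder for the JGroups address type, and the names are illustrative:

import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public final class OwnerFilter {
    // chOwners: what the consistent hash returns; running: the current cluster members
    public static <Address> List<Address> liveOwners(List<Address> chOwners, Set<Address> running) {
        return chOwners.stream()
                       .filter(running::contains) // drop owners that already left
                       .collect(Collectors.toList());
    }
}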
Re: [infinispan-dev] Why no use async listener executor?
Galder, I didn't know about @Listener(sync = false), but a separate executor makes sense for state transfer because I didn't want concurrent state transfers, and I could set it up to automatically cancel any state transfer task that's waiting in the queue. For the StaleTransactionCleanup I suppose we could use an async listener, but it would definitely need a bigger thread pool because the user might have async listeners as well. If only you'd sent this email one day earlier! I've been trying to reuse the async transport executor in my cache views manager, but I didn't know it's also configured to use 1 thread max and I spent a good part of yesterday trying to understand what's going on :) I eventually moved on to create my own executor because I also wanted to name my threads better and I couldn't find a simple way to include the node name in the async transport executor's thread names, but I'm pretty sure 1 is not a reasonable default for the async transport either. Cheers Dan On Wed, Oct 5, 2011 at 6:53 PM, Galder Zamarreño gal...@redhat.com wrote: Hey guys, Re: https://issues.jboss.org/browse/ISPN-1396 While looking into this, I've discovered that we have been creating executors in cache level components, where we're calling submit from @Listener implementations. For example, BaseStateTransferManagerImpl.ViewChangeListener submits a rehash task and TransactionTable.StaleTransactionCleanup submits a task to break locks. Did the people that wrote these listeners (Dan and Mircea respectively) consider defining listeners to be called asynchronously? i.e. @Listener(sync = false) Unless you have very strict requirements, the async listener executor should just work, and would lead to reuse of executors, which results in lower resource consumption. The same applies to the singleton store executor btw. Surely, if we were to expand to other use cases, the async notification executor should have more than 1 max thread by default, otherwise we could easily hold up tasks. Cheers, -- Galder Zamarreño Sr. Software Engineer Infinispan, JBoss Cache ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
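For reference, what the @Listener(sync = false) route looks like with the notification API as of Infinispan 5.x (the listener body is illustrative):

import org.infinispan.notifications.Listener;
import org.infinispan.notifications.cachemanagerlistener.annotation.ViewChanged;
import org.infinispan.notifications.cachemanagerlistener.event.ViewChangedEvent;

@Listener(sync = false) // delivered on the shared async notification executor, not the JGroups thread
public class StaleTransactionCleanupListener {
    @ViewChanged
    public void onViewChanged(ViewChangedEvent event) {
        // break locks held by members that left the view; since sync = false this
        // may block without stalling view delivery
    }
}

Such a listener would be registered with cacheManager.addListener(new StaleTransactionCleanupListener()).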
Re: [infinispan-dev] Infinispan 5.1 + AS 7.1
Hi Paul Yeah, I have changed replicated mode's state transfer to transfer state in parallel from all the existing nodes and I'm using a consistent hash to determine which keys should be pushed by each node. This one looks tricky, the transport reports 0 members but it should always have at least the current node as a member. I'll try to run with REPL_ASYNC to see if it changes anything, but I don't think it should. Cheers Dan On Fri, Oct 14, 2011 at 1:27 AM, Paul Ferraro paul.ferr...@redhat.com wrote: Hey all, I'm a bit stuck with the Infinispan 5.1 upgrade in AS 7.1. I've tried both with BETA1 and a SNAPSHOT build from today. When a deployment forces a cache to start, I see the following stacktrace: 14:06:07,584 ERROR [org.jboss.msc.service.fail] (MSC service thread 1-8) MSC1: Failed to start service jboss.infinispan.web.repl: org.jboss.msc.service.StartException in service jboss.infinispan.web.repl: Failed to start service at org.jboss.msc.service.ServiceControllerImpl$StartTask.run(ServiceControllerImpl.java:1780) [jboss-msc-1.0.1.GA.jar:1.0.1.GA] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) [:1.6.0_23] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [:1.6.0_23] at java.lang.Thread.run(Thread.java:679) [:1.6.0_23] Caused by: org.infinispan.CacheException: Unable to invoke method private void org.infinispan.statetransfer.BaseStateTransferManagerImpl.start() throws java.lang.Exception on object at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:205) at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:825) at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:624) at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:527) at org.infinispan.factories.ComponentRegistry.start(ComponentRegistry.java:177) at org.infinispan.CacheImpl.start(CacheImpl.java:462) at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:574) at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:453) at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:467) at org.jboss.as.clustering.infinispan.DefaultEmbeddedCacheManager.getCache(DefaultEmbeddedCacheManager.java:84) at org.jboss.as.clustering.infinispan.DefaultEmbeddedCacheManager.getCache(DefaultEmbeddedCacheManager.java:73) at org.jboss.as.clustering.infinispan.subsystem.CacheService.start(CacheService.java:73) at org.jboss.msc.service.ServiceControllerImpl$StartTask.startService(ServiceControllerImpl.java:1824) [jboss-msc-1.0.1.GA.jar:1.0.1.GA] at org.jboss.msc.service.ServiceControllerImpl$StartTask.run(ServiceControllerImpl.java:1759) [jboss-msc-1.0.1.GA.jar:1.0.1.GA] ... 3 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [:1.6.0_23] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) [:1.6.0_23] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [:1.6.0_23] at java.lang.reflect.Method.invoke(Method.java:616) [:1.6.0_23] at org.infinispan.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:203) ... 
16 more Caused by: java.lang.IllegalArgumentException: Invalid cache list for consistent hash: [] at org.infinispan.distribution.ch.AbstractWheelConsistentHash.setCaches(AbstractWheelConsistentHash.java:96) at org.infinispan.distribution.ch.ConsistentHashHelper.createConsistentHash(ConsistentHashHelper.java:122) at org.infinispan.statetransfer.ReplicatedStateTransferManagerImpl.createConsistentHash(ReplicatedStateTransferManagerImpl.java:56) at org.infinispan.statetransfer.BaseStateTransferManagerImpl.start(BaseStateTransferManagerImpl.java:143) What's particularly puzzling is that this is a REPL_ASYNC cache with state transfer enabled. Why are we attempting to create a consistent hash here? Dan, is this related to your work? Thanks in advance, Paul ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] getting ISPN config as XML via JMX
On Fri, Oct 14, 2011 at 7:40 PM, Pete Muir pm...@redhat.com wrote: What is the use case for that method? I've never known anyone actually want to export a running config as XML. So I was planning to lose it. It may be useful to take the configuration out of a cache configured in AS' ISPN subsystem and run it in standalone mode to isolate bugs... Cheers Dan ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] Infinispan 5.1 + AS 7.1
I'm afraid I can't reproduce it here and I don't know of any situation where the transport could return 0 members. Could I get your code and debug it on my machine? Cheers Dan On Mon, Oct 17, 2011 at 5:25 PM, Paul Ferraro paul.ferr...@redhat.com wrote: Hey Dan, Any progress on this? I'd like to push the Infinispan/JGroups upgrade this week if possible, but this is currently blocking my progress. Thanks, Paul On Fri, 2011-10-14 at 15:36 +0300, Dan Berindei wrote: [earlier messages and stack trace quoted verbatim - snipped; see the previous message in this thread] ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] strictly not returning expired values
On Mon, Oct 17, 2011 at 7:51 PM, Sanne Grinovero sa...@infinispan.org wrote: I've noticed that every time we perform a get() operation (or any other retrieval) the value is NOT returned to the client if it's expired; the client receives a null instead. It looks like these checks are in place a) to prevent a race condition with the eviction process, b) to not return very old values when a very large wakeUpInterval leaves long gaps between eviction activity, and c) to load from cacheLoaders. Now for the first case, as a user of a caching library I would prefer to get the value instead. Expiry is in place only to prevent me from wasting memory, but this entry wasn't collected yet, so why is Infinispan denying me access to the value I want to retrieve? We're introducing unneeded cache misses. Since even with the current code we cannot guarantee that we won't ever return an expired entry (between the check and the actual return to user code we might be briefly suspended), I don't see why we should not make it clear that the expiry timeout is a best-effort time and relax our code checks a bit; there are many reasons to do so: 1) avoid fictitious cache misses. 2) Avoid the need to invoke System.currentTimeMillis() for each retrieval operation - that's expensive! 3) Some operations like size() or values() now explicitly iterate the whole result set twice to make sure all expired entries are removed. This is unnecessary following the reasoning above. Regarding the other cases: b) We don't need to care imo. If people need more precision they should use a lower wakeUpInterval. I don't think this is an option, as a 5 second wakeUpInterval was deemed too short not long ago: https://source.jboss.org/viewrep/Infinispan/core/src/main/java/org/infinispan/config/Configuration.java?r1=9fe9a8cb6e49308b9e45f09b2f5324a7daefdee3r2=9cda361dc6df7bcecde95abc31fcbab11f734360 c) We should keep this logic in CacheLoader, just because having an aggressive wakeUpInterval might be very expensive in this case. I'm asking as I've built a quick POC for ISPN-1459, but since now the current time needs to be passed to the entry, I end up needing to invoke the wall clock even for immortal cache entries, making many other use cases slower: not happy with that, I'd rather speed up both use cases. I guess you could pass in a Clock that lazily calls System.currentTimeMillis() and saves it for subsequent calls, but that will complicate things again. Dan Sanne ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
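Dan's lazily-initialized Clock suggestion could look like this minimal sketch: one instance per invocation context, deliberately not thread-safe, so the wall clock is read at most once per operation and only if a mortal entry actually needs it. The name and placement are assumptions:

public final class LazyClock {
    private long now = -1;

    public long currentTimeMillis() {
        if (now == -1) {
            now = System.currentTimeMillis(); // the first mortal entry pays the cost
        }
        return now;                           // later entries in the same operation reuse it
    }
}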
Re: [infinispan-dev] Time measurement and expiry
On Tue, Oct 18, 2011 at 1:32 AM, Mircea Markus mircea.mar...@jboss.com wrote: On 17 Oct 2011, at 14:13, Sanne Grinovero wrote: Very interesting, I knew that in Windows currentTimeMillis() basically just reads a volatile because I got bit by the 15 millisecond accuracy issue before, so I thought it would always be very fast. I had no idea on Linux it would have the same performance as nanoTime(). Indeed very nice! I miss the part where the article says nanoTime has the same performance on modern Linux as currentTimeMillis, are you sure? Yeah, the article didn't talk about Linux, but I found this article: http://blogs.oracle.com/ksrini/entry/we_take_java_performance_very There's also this JDK bug complaining about both being slow: http://bugs.sun.com/view_bug.do?bug_id=6876279 (the test output is in a weird format; ignore the 1st number, the 2nd one is nanos/call). Except when I actually ran the code in the bug report my timings were much better than those reported in the bug description: java version 1.6.0_22 OpenJDK Runtime Environment (IcedTea6 1.10.3) (fedora-59.1.10.3.fc15-x86_64) OpenJDK 64-Bit Server VM (build 20.0-b11, mixed mode) Intel(R) Core(TM) i5 CPU M 540 @ 2.53GHz currentTimeMillis: 36ns nanoTime: 28ns -server or -XX:AggressiveOpts don't seem to make a difference. I also ran the test on cluster10 and the results are slightly worse, but not significantly: java version 1.6.0_17 OpenJDK Runtime Environment (IcedTea6 1.7.10) (rhel-1.39.b17.el6_0-x86_64) OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode) Intel(R) Xeon(R) CPU E5640 @ 2.67GHz currentTimeMillis: 40ns nanoTime: 35ns It would be interesting if we could run the test on all our machines and see how the timings vary by machine and OS. It seems we're not the only ones with this problem either: Oracle (the database) apparently calls gettimeofday() a lot, so RHEL has some optimizations to remove the system call overhead and make it even faster (more like Windows I presume, but I don't have a Windows machine on hand to confirm): http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_MRG/1.3/html/Realtime_Tuning_Guide/sect-Realtime_Tuning_Guide-General_System_Tuning-gettimeofday_speedup.html http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_MRG/1.3/html/Realtime_Tuning_Guide/sect-Realtime_Tuning_Guide-Realtime_Specific_Tuning-RT_Specific_gettimeofday_speedup.html ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
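For anyone who wants to repeat the measurement, a self-contained loop roughly in the shape of the test from the JDK bug report (not the exact code; absolute numbers will vary by OS, JVM and CPU):

public class ClockBench {
    public static void main(String[] args) {
        final int n = 10000000;
        long sink = 0;

        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) sink += System.currentTimeMillis();
        long t1 = System.nanoTime();
        for (int i = 0; i < n; i++) sink += System.nanoTime();
        long t2 = System.nanoTime();

        System.out.printf("currentTimeMillis: %d ns/call%n", (t1 - t0) / n);
        System.out.printf("nanoTime:          %d ns/call%n", (t2 - t1) / n);
        System.out.println("sink=" + sink); // keep the loops from being optimized away
    }
}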
Re: [infinispan-dev] Reminders about GIT
Nice! On Tue, Oct 18, 2011 at 6:16 PM, Sanne Grinovero sa...@infinispan.org wrote: Almost forgot the main tip; use: git config branch.master.mergeoptions --ff-only It will prevent creating a non-linear merge, failing instead with the error: fatal: Not possible to fast-forward, aborting. Cheers, Sanne On 18 October 2011 16:09, Sanne Grinovero sa...@infinispan.org wrote: 1) If you have warnings about Merge made by recursive you have to fix it by rebasing. 2) If you have warnings about non-fast-forward you have to rebase. 3) If you see non-fast-forward updates were rejected you shall never use force on upstream! It means that another patch was merged before you, and you have to update your master again and rebase again. 4) force is allowed only in special maintenance circumstances. If you find you're needing it to handle a pull request, then you're doing it wrong, and the mistake might be a dangerous one. It's like the good rule of never committing when you're drunk (coding is allowed). ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] Infinispan 5.1.0.BETA2 is out with asymmetric cluster support
Nice writeup Galder! On Wed, Oct 19, 2011 at 10:54 AM, Manik Surtani ma...@jboss.org wrote: Nice one! On 19 Oct 2011, at 08:51, Galder Zamarreño wrote: Read all about it in http://goo.gl/C6BtO Cheers, -- Galder Zamarreño Sr. Software Engineer Infinispan, JBoss Cache ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Manik Surtani ma...@jboss.org twitter.com/maniksurtani Lead, Infinispan http://www.infinispan.org ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] Time measurement and expiry
On Tue, Oct 18, 2011 at 10:57 AM, Sanne Grinovero sa...@infinispan.org wrote: The replies so far are very interesting, but at this stage I'd be more interested in discussing the fundamental question: Why are we preferring to provide more cache misses and slower performance, to fake a slightly better precision in eviction? I guess this is the main question: how much worse would the eviction precision be if we only relied on the periodic eviction thread? I would agree with you if we could guarantee a constant precision in eviction, but with the current algorithm the eviction interval has to grow with the cache size, so we can't offer any guarantees. And that's even without considering CacheLoaders. Elias' suggestion of using a heap for eviction is interesting, but adding entries to the heap will slow puts. We could try a combined approach: every minute scan the data container and add the entries that will expire in the next minute to a heap. Only puts that expire before the deadline would add the entry to the eviction heap. Then another eviction thread can evict entries from the heap at a higher frequency. I'm sure there are better ideas, but we'd have to prototype them and show that we can keep the staleness under an upper bound (say 1 second) for reasonably large caches before changing the algorithm. Since we can't guarantee precision, I'd want to actually remove these checks altogether: is there any use case requiring high precision in the eviction process? I hope not, I wouldn't suggest relying on Infinispan as a reliable clock. Relying on (relatively) precise expiration is the kind of implicit assumption that people make and don't even realize they depend on until an upgrade breaks their application. I'm curious whether other cache impls offer any guarantees in this area. In the couple of cases where we should really keep this logic - likely the CacheLoader - we can discuss how to optimize it. For example I like Dan's idea of introducing a Clock component; we could explore such solutions for the remaining bits which will still need a time source, but I'd first want to remove the main bottleneck. --Sanne On 18 October 2011 07:57, Dan Berindei dan.berin...@gmail.com wrote: [earlier message quoted verbatim - snipped] ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
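The combined approach floated above might look like the following sketch (illustrative types and names only): a once-a-minute data container scan calls resetWindow and seeds the heap via onPut, puts only touch the heap when the entry expires before the current deadline, and a higher-frequency thread drains it:

import java.util.PriorityQueue;

public class ExpirationHeapSketch {
    private final PriorityQueue<Entry> heap =
            new PriorityQueue<Entry>((a, b) -> Long.compare(a.expiryTime, b.expiryTime));
    private long deadline; // end of the window currently covered by the heap

    static final class Entry {
        final Object key;
        final long expiryTime;
        Entry(Object key, long expiryTime) { this.key = key; this.expiryTime = expiryTime; }
    }

    // called by the periodic data container scan
    public synchronized void resetWindow(long newDeadline) {
        deadline = newDeadline;
    }

    // called on put: only entries expiring within the window pay the heap insertion cost
    public synchronized void onPut(Object key, long expiryTime) {
        if (expiryTime < deadline) heap.add(new Entry(key, expiryTime));
    }

    // called by the high-frequency eviction thread; returns null when nothing has expired
    public synchronized Object pollExpired(long now) {
        Entry head = heap.peek();
        return (head != null && head.expiryTime <= now) ? heap.poll().key : null;
    }
}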
Re: [infinispan-dev] JBoss Logging upgrade breaking build
David, I've upgraded to 3.1.0.Beta3 and I'm getting the same errors in the query module: [INFO] diagnostic error: All message bundles and message logger messageMethods must have or inherit a message. ... [ERROR] error on execute: error during compilation Dan On Mon, Oct 24, 2011 at 2:14 AM, David M. Lloyd david.ll...@redhat.com wrote: Okay, I've gotten to the root of the build issue. If you update your jboss-logging to 3.1.0.Beta3 and do a clean rebuild, the problem should go away. The issue was caused by me changing the retention on the annotations - cross-artifact interface hierarchies would stop working in this case. On 10/23/2011 01:59 PM, David M. Lloyd wrote: On 10/23/2011 03:20 AM, Galder Zamarreño wrote: Hi, This JBoss Logging upgrade is rather annoying for a couple of reasons: 1. IDE integration is broken: Before, we had the JBoss Logging Processor as a dependency that allowed the logging processor to be found in the classpath. Now this is no longer in the classpath, so IntelliJ forces you to point to a jar. However, the new processor is split between two jars: ./jboss-logging-processor/1.0.0.CR3/jboss-logging-processor-1.0.0.CR3.jar and ./jboss-logging-generator/1.0.0.CR3/jboss-logging-generator-1.0.0.CR3.jar - and the generation won't work without both jars being pointed to by the annotation processor. David, why did you split these jars? We split the JARs at the request of the Seam Solder project, who wanted to reuse the generator. It should however still work just fine if you add both JARs as provided dependencies; I haven't had a problem here, even with IDEA. James and I have been discussing the ultimate wisdom of this strategy though. We might consider re-merging the JARs if it proves to cause insurmountable issues (they aren't getting *that* much value from our code, really). But I'm really surprised this is turning out to be a serious issue. 2. It breaks our build, see https://infinispan.ci.cloudbees.com/job/Infinispan-master-JDK6-tcp/268/org.infinispan$infinispan-query/console - What are these errors about? And why is it an [INFO] if it breaks the compilation? :) [INFO] diagnostic error: All message bundles and message logger messageMethods must have or inherit a message. I've never seen this before. Maybe this error is a result of extending the other message bundle interface, that isn't in the same artifact? We'll see about putting in some more tests for this case. What is wrong with https://github.com/infinispan/infinispan/blob/master/query/src/main/java/org/infinispan/query/logging/Log.java ? Cheers, p.s. Can we please test these upgrades thoroughly before committing them? Thx :) Of course we test, but of course things can be missed. In particular I've integrated the new version with several projects without any issues. -- - DML ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] Preloading from disk versus state transfer Re: ISPN-1384 - InboundInvocationHandlerImpl should wait for cache to be started? (not just defined)
ISPN-1470 (https://issues.jboss.org/browse/ISPN-1470) raises an interesting question: if the preloading happens before joining, the preloading code won't know anything about the consistent hash. It will load everything from the cache store, including the keys that are owned by other nodes. I think there is a check in place already so that the joiner won't push stale data from its cache store to the other nodes, but we should also discard the keys that don't map locally or we'll have stale data (since we don't have a way to check if those keys are stale and register to receive invalidations for those keys). What do you think, should I discard the non-local keys with the fix for ISPN-1470 or should I let them be and warn the user about potentially stale data? Cheers Dan On Mon, Oct 3, 2011 at 3:09 AM, Manik Surtani ma...@jboss.org wrote: On 28 Sep 2011, at 10:56, Dan Berindei wrote: I'm not sure if the comment is valid though, since the old StateTransferManager had priority 55 and it also cleared the data container before applying the state from the coordinator. I'm not sure how preloading and state transfer are supposed to interact, maybe Manik can help clear this up? Hmm - this is interesting. I think preloading should happen first, since the cache store may contain old data. -- Manik Surtani ma...@jboss.org twitter.com/maniksurtani Lead, Infinispan http://www.infinispan.org ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] Preloading from disk versus state transfer Re: ISPN-1384 - InboundInvocationHandlerImpl should wait for cache to be started? (not just defined)
On Mon, Oct 24, 2011 at 4:42 PM, Sanne Grinovero sa...@infinispan.org wrote: On 24 October 2011 12:58, Dan Berindei dan.berin...@gmail.com wrote: Hi Galder On Mon, Oct 24, 2011 at 1:46 PM, Galder Zamarreño gal...@redhat.com wrote: On Oct 24, 2011, at 12:04 PM, Dan Berindei wrote: ISPN-1470 (https://issues.jboss.org/browse/ISPN-1470) raises an interesting question: if the preloading happens before joining, the preloading code won't know anything about the consistent hash. It will load everything from the cache store, including the keys that are owned by other nodes. It's been defined to work that way: https://docs.jboss.org/author/display/ISPN/CacheLoaders Tbh, that will only happen with shared cache stores. In non-shared ones, you'll only have data that belongs to that node. Not really... in distributed mode, every time the cache starts it will have another position on the hash wheel. That means even with a non-shared cache store, it's likely most of the stored keys will no longer be local. Actually I just noticed that you've fixed ISPN-1404, which looks like it would solve my problem when the cache is created by a HotRod server. I would like to extend it to work like this by default, e.g. by using the transport's nodeName as the seed. I think there is a check in place already so that the joiner won't push stale data from its cache store to the other nodes, but we should also discard the keys that don't map locally or we'll have stale data (since we don't have a way to check if those keys are stale and register to receive invalidations for those keys). +1, only for shared cache stores. What do you think, should I discard the non-local keys with the fix for ISPN-1470 or should I let them be and warn the user about potentially stale data? Discard only for shared cache stores. Cache configurations should be symmetrical, so if other nodes preload, they'll preload only data local to them with your change. Discarding works fine from the correctness POV, but for performance it's not that great: we may do a lot of work to preload keys and have nothing to show for it at the end. Can't you just skip loading state and be happy with the state you receive from peers? More data will be lazily loaded. This applies of course only when you're not the only/first node in the grid, in which case you have to load. Right, we could preload only on the first node. With a shared cache store this should work great, we just have to start preloading after we connect to the cluster and before we send the join request. But I have trouble visualizing how a persistent (purgeOnStartup = false) non-shared cache store should work until we have some validation mechanism like in https://issues.jboss.org/browse/ISPN-1195. Should we even allow this kind of setup? The only alternative I see is to be able to find the boundaries of the keys you own, and change the CacheLoader API to load keys by the identified range - this should work with multiple boundaries too for virtual nodes, but it is something that not all CacheLoaders will be able to implement, so it should be an optional API; for now I'd stick with the first option above, as I don't see how we can be more efficient loading the state from CacheLoaders than via JGroups. Sanne ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
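The "discard non-local keys" option for ISPN-1470, as a hedged sketch: once the join has completed and the consistent hash is known, preloaded entries this node no longer owns are dropped instead of being served as potentially stale data. isLocal is a stand-in for a CH locality check; the names are illustrative:

import java.util.Map;
import java.util.function.Predicate;

public final class PreloadFilter {
    public static void discardNonLocal(Map<Object, Object> preloaded, Predicate<Object> isLocal) {
        // removeIf on the key set mutates the backing map
        preloaded.keySet().removeIf(key -> !isLocal.test(key));
    }
}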
Re: [infinispan-dev] Cache ops after view changes result in RehashInProgressException very easily
Hi Galder, sorry it took so long to reply. On Mon, Oct 24, 2011 at 4:16 PM, Galder Zamarreño gal...@redhat.com wrote: Btw, forgot to attach the log: On Oct 24, 2011, at 3:13 PM, Galder Zamarreño wrote: Hi Dan, Re: http://goo.gl/TGwrP There are a few of these in the Hot Rod server+client testsuites. It's easy to replicate locally. Cache operations right after a cache has started seem to be rather problematic. In a local execution of HotRodReplicationTest, I was able to replicate the issue when trying to test topology changes. Please find attached the log file, but here are the interesting bits: 1. A new view installation is being prepared with NodeA and NodeB: 2011-10-24 14:36:09,046 4221 TRACE [org.infinispan.cacheviews.CacheViewsManagerImpl] (OOB-1,Infinispan-Cluster,NodeB-15806:___hotRodTopologyCache) ___hotRodTopologyCache: Preparing cache view CacheView{viewId=4, members=[NodeA-63227, NodeB-15806]}, committed view is CacheView{viewId=3, members=[NodeA-63227, NodeB-15806, NodeC-17654]} … 2011-10-24 14:36:09,047 4222 DEBUG [org.infinispan.statetransfer.StateTransferLockImpl] (OOB-1,Infinispan-Cluster,NodeB-15806:___hotRodTopologyCache) Blocking new transactions 2011-10-24 14:36:09,047 4222 TRACE [org.infinispan.statetransfer.StateTransferLockImpl] (OOB-1,Infinispan-Cluster,NodeB-15806:___hotRodTopologyCache) Acquiring exclusive state transfer shared lock, shared holders: 0 2011-10-24 14:36:09,047 4222 TRACE [org.infinispan.statetransfer.StateTransferLockImpl] (OOB-1,Infinispan-Cluster,NodeB-15806:___hotRodTopologyCache) Acquired state transfer lock in exclusive mode 2. The cluster coordinator discovers a view change and requests NodeA and NodeB to remove NodeC from the topology view: 2011-10-24 14:36:09,048 4223 TRACE [org.infinispan.interceptors.InvocationContextInterceptor] (OOB-3,Infinispan-Cluster,NodeB-15806:___hotRodTopologyCache) Invoked with command RemoveCommand{key=NodeC-17654, value=null, flags=null} and InvocationContext [NonTxInvocationContext{flags=null}] 3. NodeB has not yet finished installing the cache view, so the remove times out: 2011-10-24 14:36:09,049 4224 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (OOB-3,Infinispan-Cluster,NodeB-15806:___hotRodTopologyCache) ISPN000136: Execution error org.infinispan.distribution.RehashInProgressException: Timed out waiting for the transaction lock A way to solve this is to avoid relying on cluster view changes, and instead wait for the cache view to be installed and only then do the operations. Is there any way to wait till then? One way would be to have some CacheView installed callbacks or similar. This could be a good option cos I could have a CacheView listener for the hot rod topology cache whose callbacks I can check for isPre=false and then do the cache ops safely. Initially I was thinking of allowing multiple cache view listeners for each cache and making StateTransferManager one of them, but I decided against it because I realized it needs a different interface than our regular listeners. I know that it was only a matter of time until someone needed it... An alternative solution would be to retry all operations, like we do with commits now, when we receive a RehashInProgressException from the remote node. That's what I was planning to do first, as it helps in other use cases as well. Otherwise, code like the one I used for keeping the Hot Rod topology is gonna be racing against your cache view installation code. You seem to have some pieces in place for this, i.e. 
CacheViewListener, but it seems only designed for internal core/ work. Any other suggestions? Cheers, -- Galder Zamarreño Sr. Software Engineer Infinispan, JBoss Cache ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev -- Galder Zamarreño Sr. Software Engineer Infinispan, JBoss Cache ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
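The retry approach Dan mentions, as a bounded sketch; RehashInProgressException below is an illustrative stand-in for the real exception in org.infinispan.distribution, and the retry count and backoff are assumptions:

import java.util.concurrent.Callable;

public final class RetryingInvoker {
    static final int MAX_RETRIES = 3;

    public static <T> T invoke(Callable<T> operation) throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.call();
            } catch (RehashInProgressException e) {
                if (attempt == MAX_RETRIES) throw e; // give up after the last attempt
                Thread.sleep(100L * attempt);        // simple linear backoff
            }
        }
    }

    // illustrative stand-in for the real exception type
    public static class RehashInProgressException extends RuntimeException {}
}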
Re: [infinispan-dev] Cache ops after view changes result in RehashInProgressException very easily
I was gunning for beta3 but I don't think I'm going to make it. On Wed, Oct 26, 2011 at 12:38 PM, Galder Zamarreño gal...@redhat.com wrote: On Oct 26, 2011, at 9:46 AM, Dan Berindei wrote: [quote of the previous message snipped down to the relevant part] An alternative solution would be to retry all operations, like we do with commits now, when we receive a RehashInProgressException from the remote node. That's what I was planning to do first, as it helps in other use cases as well. Ok, do you have time to include this today ahead of the BETA3 release? I think this is a very important fix cos as you can see in the testsuite, it's very easy to get this error with Hot Rod servers. [remainder of quoted message snipped] -- Galder Zamarreño Sr. Software Engineer Infinispan, JBoss Cache ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] Time measurement and expiry
Cool link Elias, I read the paper linked in the javadoc and it's very interesting. Netty seems to use scheme 6 (a single array of lists with hashing and without any overflow). I believe in general our timeouts will be much longer, so I suspect scheme 5 or scheme 7 would perform better for us. Scheme 5 should be much better than 6 if most of the timeouts use the same value, and I believe we fit right in that use case. On the other hand I think we could save some memory if we implement scheme 7 by not actually storing the higher level wheels and building them on the fly by iterating the data container. The downside is that we'll need periodic sweeps of the data container, but those can be on a separate thread. One scenario that we should keep in mind is the L1 cache, whose entries I would guess are invalidated much more often than they expire after their 1 hour lifetime. If the user stores only immortal entries + L1 entries in the cache, we might end up with a ticker task that rarely does anything useful, so we should minimize its impact. It might also be worth it to implement our own list type to make adding and removing timeouts faster. I see Netty uses IdentityHashMap-backed sets for the bucket lists, but as Sanne found out they aren't the fastest thing around. Using a doubly-linked list and keeping a reference to the Node in the cache entry should make it much faster. Cheers Dan On Tue, Oct 25, 2011 at 8:02 PM, Elias Ross gen...@noderunner.net wrote: Rather than a heap, I recall Netty had a hash wheel timer [1], which should perform better than a heap. You could get clever by having multiple wheels, segmented by orders of magnitude. For example wheel 1 could cover 1 minute, wheel 2 1 hour, etc. Netty uses a lot of short lived timers and performs well. [1] http://docs.jboss.org/netty/3.1/api/org/jboss/netty/util/HashedWheelTimer.html Obviously this uses some memory, but you could probably optimize it somewhat to only work with Node instances. ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
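The doubly-linked-list idea in code, as a minimal sketch: each cache entry keeps a direct reference to its node in the wheel bucket, so cancelling on invalidation is an O(1) unlink with no hashing or lookup (unlike an IdentityHashMap-backed set). The type is illustrative, not Netty's:

final class TimeoutNode {
    TimeoutNode prev, next; // links within one wheel bucket
    final Object key;

    TimeoutNode(Object key) {
        this.key = key;
    }

    // O(1) removal from the bucket; safe to call more than once
    void unlink() {
        if (prev != null) prev.next = next;
        if (next != null) next.prev = prev;
        prev = next = null;
    }
}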
Re: [infinispan-dev] ISPN-1543 Split CacheImpl into TransactionalCache and NonTransactionalCache
Any pros? On Tue, Nov 22, 2011 at 2:11 AM, Mircea Markus mircea.mar...@jboss.com wrote: This is simple to implement and I don't see how it can cause any API incompatibilities, but as it is in the very core I'd like to ask your opinion - any cons for adding it into next release? ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
Re: [infinispan-dev] Shipping logical address String in Hot Rod protocol v1.1?
On Tue, Nov 29, 2011 at 5:09 PM, Bela Ban b...@redhat.com wrote: On 11/29/11 3:50 PM, Galder Zamarreño wrote: On Nov 29, 2011, at 2:23 PM, Bela Ban wrote: On 11/29/11 2:10 PM, Galder Zamarreño wrote: Hi, We've been having a discussion this morning with regards to the Hot Rod changes introduced in 5.1 with regards to hashing. When the Hot Rod server is deployed in AS, in order to start correctly it requires the Hot Rod server to start before any other (clustered) caches in AS. This is because any hashing can only be done once the hash has been calculated on the Hot Rod endpoint. Although this can be fixed in a hacky way (have all caches configured to start lazily and let the Hot Rod server start the topology cache and then all defined caches, ugh), we're considering a slight change to Hot Rod protocol v1.1 (https://docs.jboss.org/author/display/ISPN/Hot+Rod+Protocol+-+Version+1.1) that would solve this problem. Instead of hashing on the Hot Rod endpoint address (host:port), we could hash on the JGroups Address' UTF-8 toString representation, which is the logical address. The advantage here is that any cache can calculate the hash on this from the start; there is no need to wait for the Hot Rod server to start. The downside is that Hot Rod clients need to be aware of this UTF-8 string in order to hash to the same thing, so it'd mean shipping this back to the clients alongside the Hot Rod endpoint info. I have a few concerns, maybe not an issue with the way HotRod uses this, but I wanted to bring this up anyway: These are very valid concerns. - What if a logical name is null ? A user can do JChannel.setName(null); Hmmm, what's the JGroups address toString() representation in that case? The underlying UUID would be printed (not a nice sight !)... Hmm, I sometimes see UUIDs in the test suite logs. This makes me think, can we rely on the discovery protocol always giving us the logical names of the other cluster members during/after the viewAccepted() callback? - What if the logical name changes at runtime ? Certainly not recommended, but the API allows this, or I should rather say, doesn't prevent it... :-( If the logical name changes, is a view change triggered? No. Maybe I should disallow changing a logical name after connect() has been called... How are other nodes gonna locate this node if its logical address has changed? Through the discovery process. Again, maybe I should just disallow this use case, as it IMO doesn't make any sense. Looks dangerous from a JGroups perspective, why is it allowed? I like to give users a lot of rope to hang themselves ... :-) ha ha OK, you convinced me: https://issues.jboss.org/browse/JGRP-1395, for 3.0.2 and 3.1.0 +1 - What if we have multiple members with the same logical name ? Can that happen? Yes It's not very likely with standalone Infinispan, as we also add a random number to the configured node name. We also have the option of enforcing a unique node name by just failing to start when we detect duplicates. Don't know how AS sets the node names, but I'm pretty sure they also ensure a unique node name. How can JGroups work correctly if multiple members have the same logical name? How can unicasts be directed correctly in this case? Because the logical name is just a nice way to hide a UUID. JGroups never uses the logical address for anything but pretty-printing; instead the underlying UUID is used for addressing purposes. - At startup, we may not yet have the logical names of all members, as this is done as part of the discovery protocol… Hmmm, don't think that's important. 
What matters is that when node A starts, after the channel is open, its logical address is available locally. OK, that'll be the case That's the same right now. When Hot Rod starts, it maps the running node's JGroups address to its Hot Rod endpoint address. OK - Can't you hash on the address ? It'll be available as soon as connect() returns. Is that too late ? That would mean hashing a Java object, which is not portable, hence we need to hash on something that's portable, i.e. a UTF-8 String. Can't you grab the UUID which is 2 longs and simply send 2 longs across the wire ? The UUID is guaranteed to be unique. If we commit to sending a UUID to the HotRod clients it will be harder to send something else in the future. One of the reasons we chose to send a string was so that we could change the way we generate that string without breaking the clients. For example I'm very interested in starting nodes with a fixed hash seed, so that after a restart the local cache store is still relevant for this node. It would also be nice if the order of nodes on the hash wheel didn't change between test runs. As a compromise we could send a 32-bit hash to the clients and specify the algorithm to generate virtual node hashes from it in the HotRod protocol. That way we won't require
Re: [infinispan-dev] Shipping logical address String in Hot Rod protocol v1.1?
On Wed, Nov 30, 2011 at 10:04 AM, Bela Ban b...@redhat.com wrote: On 11/30/11 8:59 AM, Dan Berindei wrote: Why don't you send the UUID as a (16 byte) string then ? Yeah, that would work. However, a UUID is not always a valid UTF-8 string, so we should probably define it in the protocol as an array of bytes (without any meaning). Yes. We did something similar in JGroups, take a look at ENCRYPT.byteArrayToHexString(). Since HotRod is a binary protocol I meant to send the UUID as it is (no encoding whatsoever). I think sending the UUID as a hex-encoded string, a raw byte array, or a 32-bit hash (since that's all we need on the client) are all valid choices. -- Bela Ban Lead JGroups (http://www.jgroups.org) JBoss / Red Hat ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
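A sketch of the "opaque bytes on the wire" option: the server serializes the two longs of the JGroups UUID, and, if the 32-bit-hash compromise is chosen, both sides derive the wheel position from those bytes in the same specified way. The FNV-1a mixing function below is only one possible portable choice, not the algorithm HotRod actually specifies:

import java.nio.ByteBuffer;

public final class AddressBytes {
    public static byte[] toBytes(long mostSig, long leastSig) {
        return ByteBuffer.allocate(16).putLong(mostSig).putLong(leastSig).array();
    }

    // FNV-1a over the raw bytes, as one portable 32-bit hash
    public static int hash32(byte[] bytes) {
        int h = 0x811c9dc5;
        for (byte b : bytes) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }
}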
Re: [infinispan-dev] Suspect Exceptions
Sanne, I think I changed this with my fix for ISPN-1493 and now the StateTransferLockInterceptor retries the operation (at the moment, for up to 3 times). Dan On Fri, Dec 9, 2011 at 4:56 PM, Slorg1 slo...@gmail.com wrote: Hi, On Fri, Dec 9, 2011 at 09:53, Sanne Grinovero sa...@infinispan.org wrote: Thanks Vladimir, I can do that as a short term solution, but do you think we could have Infinispan care for it in the future? I don't like to mandate wrapping all invocations to Infinispan in a try catch block looking for such errors, and maybe others. I am using 5.1.0 BETA 5 and when members are suspected I do not get the exception. Infinispan takes care of it for me. Regards, Slorg1 -- Please consider the environment - do you really need to print this email ?
Re: [infinispan-dev] boolean properties in config
On Tue, Dec 13, 2011 at 10:44 AM, Galder Zamarreño gal...@redhat.com wrote: On Dec 12, 2011, at 3:45 AM, Mircea Markus wrote: On 10 Dec 2011, at 10:04, Pete Muir wrote: https://issues.jboss.org/browse/ISPN-1474 Currently we do a mix of: enableXXX() disableXXX() enabled(boolean b) XXX(boolean b) where XXX is something to enable or disable (e.g. purgeOnStartup). We also scatter the word use around in a very inconsistent fashion. this is a bit ugly indeed +1 I would like to rationalise this, and would propose that every boolean has: enableXXX disableXXX xxxEnabled(boolean b) The former 2 are nice for hardcoded config, the latter is nice when you are adapting one format to another (e.g. Paul). I think this would be much more readable. My only comment is that we allow two ways of doing the same thing, which might be slightly confusing for the users. Also the easiness of migration from old to new configuration is important, but IMO not as important as making the new config simple. I agree with Mircea that having two ways of doing the same thing might confuse users. If I had to choose I'd probably go for the latter option. How about: xxxEnabled() xxxEnabled(boolean b) However, I don't think enabled is going to work in every situation (loader.sharedEnabled?), so we might want to keep the xxx()/xxx(boolean b) form in some places. I would deprecate the usage of XXX(boolean b) and useXXX(boolean b) and remove in 6. Thoughts? -- Galder Zamarreño Sr. Software Engineer Infinispan, JBoss Cache
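To make the naming proposal concrete, here is a hypothetical builder interface (illustrative only, not the actual Infinispan configuration API) contrasting the two surviving forms:

// Hypothetical fluent configuration interface; names are illustrative only.
public interface LoaderConfig {

    // Form 1: paired enable/disable methods, convenient for hardcoded config.
    LoaderConfig enablePurgeOnStartup();
    LoaderConfig disablePurgeOnStartup();

    // Form 2: a single boolean setter, convenient when adapting one
    // configuration format to another (the value arrives as a boolean).
    LoaderConfig purgeOnStartupEnabled(boolean enabled);
}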
[infinispan-dev] ISPN-1586 - inconsistent cache data in replication cluster with local (not shared) cache store
Hi guys For a little background, see the discussion at https://issues.jboss.org/browse/ISPN-1586 How do you feel about discarding the contents of the cache store on all cache (virtual) nodes except the first to start? Configuring purgeOnStartup=true is not ok in data grid scenarios, because you'd lose all the data on restart. But loading entries from the cache store of a node that started later is almost guaranteed to lead to stale data. In some use cases that stale data might be acceptable, but in most cases it probably isn't. So perhaps it makes sense to make this configurable? Cheers Dan
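A sketch of the proposed behaviour; StoreLike, the firstToStart check, and the purgeStaleStoreOnJoin flag are all made up for illustration, not existing Infinispan SPI:

// Hypothetical policy for a local (not shared) cache store in a replicated
// cluster; every name here is a stand-in, not the real CacheStore SPI.
public class StaleStorePolicy {

    interface StoreLike { void clear(); }

    private final boolean purgeStaleStoreOnJoin; // the proposed config flag

    StaleStorePolicy(boolean purgeStaleStoreOnJoin) {
        this.purgeStaleStoreOnJoin = purgeStaleStoreOnJoin;
    }

    void onStart(StoreLike store, boolean firstToStart) {
        if (!firstToStart && purgeStaleStoreOnJoin) {
            // A later joiner's local store is almost guaranteed to be stale:
            // discard it and rely on state transfer from the running nodes.
            store.clear();
        }
        // Otherwise keep the contents: either this is the first node to
        // start, or the user explicitly accepted potentially stale data.
    }
}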
Re: [infinispan-dev] 5.1.0.CR2 tomorrow, are you ready?
I've been working on some state transfer/virtual cache view improvements: ISPN-1606. I'll have a pull request up in an hour. Dan On Thu, Dec 15, 2011 at 8:58 PM, Galder Zamarreño gal...@redhat.com wrote: On Dec 15, 2011, at 2:21 PM, Vladimir Blagojevic wrote: If someone can deal with my pull https://github.com/infinispan/infinispan/pull/722 I would appreciate it. It is very non-affecting to the rest of the codebase. Thx, just had a quick look and commented. It needs some tweaking IMO. Also, I have sent out a fix proposal for https://issues.jboss.org/browse/ISPN-1546 to Sanne for testing, I would like to squeeze that one in if he approves. On 11-12-15 9:24 AM, Pete Muir wrote: I'm hoping to replace the parser, I have it mostly implemented, just testing. On 15 Dec 2011, at 12:21, Galder Zamarreño wrote: Hi, We're planning to release CR2 tomorrow. How are things looking on your side? I have the PFER stuff to commit, assuming Mircea's happy with it (see dev list thread), and https://issues.jboss.org/browse/ISPN-1609 Cheers, -- Galder Zamarreño Sr. Software Engineer Infinispan, JBoss Cache
Re: [infinispan-dev] inefficient usage of CacheLoader with REPL
Dan, thanks for looking. So it ignores the return values: wouldn't it be more efficient to ask the remote node to not serialize it at all in the first place? I mean ReplicationInterceptor should send over both Flag.SKIP_CACHE_LOAD (*not* X_LOAD) and Flag.SKIP_REMOTE_LOOKUP on any write operation. Flag.SKIP_REMOTE_LOOKUP shouldn't be necessary, the ResponseGenerator on the remote node checks the command ID before serializing the response and sends a null instead if the command is not one of those listed in DefaultResponseGenerator.requiresResponse (for replicated and invalidation caches). The ResponseGenerator does seem a bit magic, so perhaps it would be better to use Flag.SKIP_REMOTE_LOOKUP instead and maybe add a generic Flag.IGNORE_REMOTE_RESPONSE if there are other commands that would benefit from it. Definitely something for 5.2 :) Cheers Dan
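For reference, this is roughly how a caller passes such flags per invocation; withFlags() is the standard AdvancedCache entry point, while the exact flag combination is the subject of the discussion above:

import org.infinispan.AdvancedCache;
import org.infinispan.Cache;
import org.infinispan.context.Flag;

public class FlaggedWrite {

    // Skip loading the previous value from the cache store and from remote
    // nodes when the caller doesn't care about put()'s return value.
    static void fastPut(Cache<String, String> cache, String key, String value) {
        AdvancedCache<String, String> advanced = cache.getAdvancedCache();
        advanced.withFlags(Flag.SKIP_CACHE_LOAD, Flag.SKIP_REMOTE_LOOKUP)
                .put(key, value);
    }
}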
Re: [infinispan-dev] Infinispan 5.1.0.CR3 release
I still have some state transfer failures, I was hoping to fix them by this morning but I'm still having troubles so I might finish very late today. Dan On Mon, Jan 9, 2012 at 1:39 PM, Galder Zamarreño gal...@redhat.com wrote: Not on my side. I still need to check what 2LC looks like with the fixes it needed, which I'll do today (w/ snapshot) or tomorrow (w/ CR3) On Jan 9, 2012, at 12:35 PM, Manik Surtani wrote: Yes please. Everyone, shall we defer our status update today for tomorrow, and instead focus on getting this release out today? Anything you feel that is important that *won't* make the cut? On 9 Jan 2012, at 10:29, Galder Zamarreño wrote: Guys, how does CR3 today sound? On Jan 4, 2012, at 5:24 PM, Galder Zamarreño wrote: I'm on holiday on Friday (instead of Monday) so I'm hoping to finish all my CR3 stuff by tomorrow. So, if stuff is ready, maybe someone else can release on Friday instead of me? Cheers, On Jan 4, 2012, at 4:56 PM, Manik Surtani wrote: I'm happy with one this week even, if we're ready in time. On 4 Jan 2012, at 15:53, Galder Zamarreño wrote: Btw, I had originally planned to release CR3 on the 12th but based on Tristan's request I'd like to do it earlier, probably on the 9th or 10th (latest). How does that sound? Let's get cracking with all those remaining issues! :) On Jan 4, 2012, at 4:41 PM, Tristan Tarrant wrote: On 01/04/2012 04:39 PM, Galder Zamarreño wrote: We're in CR stages though so what's the point of releasing a CR which actually is not really a CR? ;) I can see the point of faster, more frequent releases in BETA/ALPHA stages because we can push over things, but it doesn't make sense doing that in CR. Galder, our CRs are not real CRs anyway, so it's only a naming thing. Tristan
Re: [infinispan-dev] IgnoreExtraResponsesValidityFilter
Good catch Manik! I just copied the approach from ClusteredGetResponseFilter and I didn't think too much about performance issues. Bela, the BitSet approach sounds a little better than a simple counter from a debugging perspective. But I think we're also missing some synchronization at the moment, and the counter approach could be made thread-safe with minimal overhead by using an AtomicInteger. I assume using only a counter is safe (i.e. RspFilter.isAcceptable() will never be called twice for the same target). ClusteredGetResponseFilter could probably use the same treatment, except the targets list is quite small so we can search in the list directly instead of creating a HashSet with the same contents. Cheers Dan On Sat, Jan 14, 2012 at 4:06 PM, Bela Ban b...@redhat.com wrote: Another implementation could take the current view, and create a BitSet with the size being the length of the view, where every bit corresponds to the position of a member in the view, e.g.: V1={A,B,C,D,E,F}, bitset={0,0,1,0,1,1} means that responses have been received from C, E and F. On 1/13/12 8:20 PM, Manik Surtani wrote: Looking at IgnoreExtraResponsesValidityFilter - this seems to be a scalability issue! It seems to copy a set of every address in the cluster and gradually remove entries as responses come in. Since - assuming a large cluster - for every prepare command, we create a collection, copy into it the entire list of addresses (think hundreds of nodes x hundreds of concurrent threads) only to count down on the responses. I'm almost certain there is a better way to do this! :) Maybe even maintain a shared list of members (updated whenever there is a view change) to test for responses from non-members, a counter, and assume that members don't respond to the same request more than once? Cheers Manik -- Manik Surtani ma...@jboss.org twitter.com/maniksurtani Lead, Infinispan http://www.infinispan.org -- Bela Ban Lead JGroups (http://www.jgroups.org) JBoss / Red Hat
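A minimal sketch of the counter idea (assuming, as stated above, that isAcceptable() is never called twice for the same target); RspFilter is the JGroups interface, the rest is illustrative:

import java.util.concurrent.atomic.AtomicInteger;

import org.jgroups.Address;
import org.jgroups.blocks.RspFilter;

// Accepts every response and stops waiting once the expected number of
// targets has replied, without copying the membership list per request.
public class CountingValidityFilter implements RspFilter {

    private final AtomicInteger remaining;

    public CountingValidityFilter(int expectedResponses) {
        this.remaining = new AtomicInteger(expectedResponses);
    }

    public boolean isAcceptable(Object response, Address sender) {
        remaining.decrementAndGet();
        return true;
    }

    public boolean needMoreResponses() {
        return remaining.get() > 0;
    }
}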
Re: [infinispan-dev] moving asyncMarshalling to jgroups
On Thu, Jan 19, 2012 at 7:46 PM, Mircea Markus mircea.mar...@jboss.com wrote: On 19 Jan 2012, at 17:36, Galder Zamarreño wrote: On Jan 19, 2012, at 3:43 PM, Bela Ban wrote: This may not give you any performance increase: #1 In my experience, serialization is way faster than de-serialization. Unless you're doing something fancy in your serializer No. I think Mircea didn't explain this very well. What really happens here is that when asyncMarshalling is turned on (the name is confusing...), async transport sending is activated. What does this mean? When the request needs to be passed onto JGroups, this is done in a separate thread, which indirectly results in marshalling happening in a different thread. This returns faster because we return control earlier rather than waiting for JGroups to do its thing of marshalling and sending the command, and not waiting for the response. See the use of RpcManagerImpl.asyncExecutor for more info. Thanks for the clarification Galder. So basically with asyncMarshalling=true (this only works for async x-casts): - user thread(1) passes control to marshalling thread(2). At this point the user thread returns - marshalling thread serialises and builds the Message object, then passes control to the JGroups thread(3) that sends the message async Are you sure about this Mircea? AFAIK the message is always sent by JGroups from our thread (either the user thread or the async transport thread). The only difference between sync (GET_ALL) and async calls (GET_NONE) at the JGroups level is that with GET_NONE that thread doesn't wait for the response from the target nodes. Cheers Dan
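To illustrate Dan's point, the only JGroups-level difference between the two call styles is the ResponseMode on the request options; a sketch against the JGroups 3.x MessageDispatcher API (the timeout value is arbitrary):

import java.util.Collection;

import org.jgroups.Address;
import org.jgroups.Message;
import org.jgroups.blocks.MessageDispatcher;
import org.jgroups.blocks.RequestOptions;
import org.jgroups.blocks.ResponseMode;
import org.jgroups.util.RspList;

public class SyncVsAsyncCast {

    // Sync: the calling thread blocks until all targets reply (or time out).
    static RspList<Object> syncCast(MessageDispatcher dispatcher,
            Collection<Address> dests, byte[] buf) throws Exception {
        return dispatcher.castMessage(dests, new Message(null, null, buf),
                new RequestOptions(ResponseMode.GET_ALL, 15000));
    }

    // Async: the same send path, but GET_NONE means no waiting for responses.
    static void asyncCast(MessageDispatcher dispatcher,
            Collection<Address> dests, byte[] buf) throws Exception {
        dispatcher.castMessage(dests, new Message(null, null, buf),
                new RequestOptions(ResponseMode.GET_NONE, 15000));
    }
}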
Re: [infinispan-dev] Keeping track of locked nodes
In this particular case it just means that the commit is sync even if it's configured to be async. But there are more places where we check the remote lock nodes list, e.g. BaseRpcInterceptor.shouldInvokeRemotely or AbstractEnlistmentAdapter - which, incidentally, could probably make do with a hasRemoteLocks boolean flag instead of the nodes collection. TransactionXaAdapter.forgetSuccessfullyCompletedTransaction does use it properly when recovery is enabled - if we didn't keep the collection around it would have to compute it again from the list of keys. The same with PessimisticLockingInterceptor.releaseLocksOnFailureBeforePrepare, but there we're on the failure path already so recomputing the address set wouldn't hurt that much. I may have missed others... Cheers Dan On Fri, Jan 20, 2012 at 8:52 AM, Manik Surtani ma...@jboss.org wrote: Well, a view change may not mean that the nodes prepared on have changed. But still, there would almost certainly be a better way to do this than a collection of addresses. So the goal is to determine whether the set of nodes the prepare has been sent to and the current set of affected nodes at the time of commit has changed? And what are the consequences of getting this (pessimistically) wrong, just that the prepare gets repeated on some nodes? On 20 Jan 2012, at 03:54, Mircea Markus wrote: On 19 Jan 2012, at 19:22, Sanne Grinovero wrote: I just noticed that org.infinispan.transaction.LocalTransaction is keeping track of Addresses on which locks were acquired. That's surprising me .. why should it ever be interested in the specific Address? I'd expect it to be able to figure that out when needed, especially since the Address owning the lock might change over time. I don't understand the need to track a specific node. This information is required at least here[1], but I'm not sure whether we can rely on the viewId which is now associated with every transaction to replace that logic. [1] http://bit.ly/wgMIHH -- Manik Surtani ma...@jboss.org twitter.com/maniksurtani Lead, Infinispan http://www.infinispan.org
Re: [infinispan-dev] DIST.retrieveFromRemoteSource
Hi Sanne On Wed, Jan 25, 2012 at 1:22 AM, Sanne Grinovero sa...@infinispan.org wrote: Hello, in the method: org.infinispan.distribution.DistributionManagerImpl.retrieveFromRemoteSource(Object, InvocationContext, boolean) we have: List<Address> targets = locate(key); // if any of the recipients has left the cluster since the command was issued, just don't wait for its response targets.retainAll(rpcManager.getTransport().getMembers()); But then we use ResponseMode.WAIT_FOR_VALID_RESPONSE, which means we're not going to wait for all responses anyway, and I think we might assume to get a reply from a node which actually is in the cluster. So the retainAll method is unneeded and can be removed? I'm wondering, because it's not safe anyway; actually it seems very unlikely to me that the view would change just between a locate(key) and the retainAll, so it's not something we should be relying on anyway. I'd rather assume that such a get method might be checked and eventually dropped by the receiver. The locate method will return a list of owners based on the committed cache view, so there is a non-zero probability that one of the owners has already left. If I remember correctly, I added the retainAll call because otherwise ClusteredGetResponseFilter.needMoreResponses() would keep returning true if one of the targets left the cluster. Coupled with the fact that null responses are not considered valid (unless *all* responses are null), this meant that a remote get for a key that doesn't exist would throw a TimeoutException after 15 seconds instead of returning null immediately. We could revisit the decision to make null responses invalid, and then as long as there is still one of the old owners left in the cluster you'll get the null result immediately. You may still get an exception if all the old owners left the cluster, but I'm not sure. I wish I had added a test for this... We may be able to add a workaround in FutureCollator as well - just remember that we use the same FutureCollator for writes in REPL mode so it needs to work with GET_ALL as well as with GET_FIRST. Slightly related, I wonder if Manik's comment is still true: (if at all possible, try not to use JGroups' ANYCAST for now. Multiple (parallel) UNICASTs are much faster.) Intuitively it shouldn't be true, unicasts+FutureCollator do basically the same thing as anycast+GroupRequest. Cheers Dan Cheers, Sanne
Re: [infinispan-dev] default value for virtualNodes
On Fri, Jan 27, 2012 at 2:35 PM, Mircea Markus mircea.mar...@jboss.com wrote: I've created a JIRA to track this: https://issues.jboss.org/browse/ISPN-1801 I understand everybody has to have the exact same number of vnodes for reads and writes to hit the correct node, right? Yes. That's true, but it is not a good thing: numVirtNodes should be proportional to the node's capacity, i.e. more powerful machines in the cluster should be assigned more virtual nodes. This way we can better control the load. A node would need to send its configured numVirtualNodes when joining in order to support this, but that's a thing we already do for TACH. We should use a different mechanism than the TopologyAwareUUID we use for TACH, because the address is sent with every command. The capacity instead should be fairly static. We may want to make it changeable at runtime, but it will take a state transfer to propagate that info to all the members of the cluster (because the nodes' CHs need to stay in sync). In fact, I can imagine users wanting to balance key ownership between machines/racks/sites with TACH, but without actually using RELAY - the TopologyAwareUUID is just an overhead for them. Cheers Dan
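A toy illustration of the capacity idea (this is not Infinispan's ConsistentHash): a node registered with more virtual nodes claims more positions on the wheel, and so becomes primary owner for proportionally more keys:

import java.util.SortedMap;
import java.util.TreeMap;

// Toy hash wheel with capacity-weighted virtual nodes, for illustration only.
public class WeightedWheel {

    private final SortedMap<Integer, String> wheel = new TreeMap<Integer, String>();

    // A more powerful node is registered with more virtual nodes.
    void addNode(String node, int numVirtualNodes) {
        for (int i = 0; i < numVirtualNodes; i++)
            wheel.put((node + "#" + i).hashCode(), node);
    }

    // The primary owner is the first virtual node at or after the key's hash.
    String primaryOwner(Object key) {
        SortedMap<Integer, String> tail = wheel.tailMap(key.hashCode());
        return tail.isEmpty() ? wheel.get(wheel.firstKey()) : tail.get(tail.firstKey());
    }

    public static void main(String[] args) {
        WeightedWheel wheel = new WeightedWheel();
        wheel.addNode("smallBox", 16);
        wheel.addNode("bigBox", 48); // 3x the capacity, roughly 3x the keys
        System.out.println(wheel.primaryOwner("someKey"));
    }
}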
Re: [infinispan-dev] default value for virtualNodes
On Fri, Jan 27, 2012 at 2:53 PM, Mircea Markus mircea.mar...@jboss.com wrote: That's true, but it is not a good thing: numVirtNodes should be proportional to the node's capacity, i.e. more powerful machines in the cluster should be assigned more virtual nodes. This way we can better control the load. A node would need to send its configured numVirtualNodes when joining in order to support this, but that's a thing we already do for TACH. We should use a different mechanism than the TopologyAwareUUID we use for TACH, because the address is sent with every command. so every command sends cluster, rack and machine info? That sounds a bit redundant. Can't we just send them once with the JOIN request? When RELAY is enabled, it actually needs the topology info in order to relay messages between sites. I agree that the topology info should never change, but RELAY requires it for now so we can't avoid it (in the general case). Cheers Dan
Re: [infinispan-dev] very interesting performance results
On Fri, Jan 27, 2012 at 3:43 PM, Mircea Markus mircea.mar...@jboss.com wrote: On 27 Jan 2012, at 13:31, Sanne Grinovero wrote: My experiments were using the default JVM settings regarding compile settings, with these others: -Xmx2G -Xms2G -XX:MaxPermSize=128M -XX:+HeapDumpOnOutOfMemoryError -Xss512k -XX:HeapDumpPath=/tmp/java_heap -Djava.net.preferIPv4Stack=true -Djgroups.bind_addr=127.0.0.1 -Dlog4j.configuration=file:/opt/log4j.xml -XX:+PrintCompilation -Xbatch -server -XX:+UseCompressedOops -XX:+UseLargePages -XX:LargePageSizeInBytes=2m -XX:+AlwaysPreTouch And it's with these options that after 30 minutes of full-stress it's still not finished warming up all of Infinispan+JGroups code. After that I stated I was going to *experiment* with -XX:CompileThreshold=10, to see if I could get it to compile in a shorter time, just to save me from waiting too long in performance tests. It doesn't seem to matter much, so I'm reverting it back to the above values (default for this parameter for my VM is 1). That's surprising, I'd say that in 30 mins of invocations all the critical paths are touched many more times than 1. E.g. the number of reads/sec is in the thousands (20k on the cluster lab). Might be that this param is ignored or it collides with other -XX ? PrintCompilation tells you which methods are compiled, but it doesn't tell you which methods were inlined with it. So something like this can (and probably does) happen: 1. ConcurrentSkipListMap.doPut is compiled 2. UNICAST2.down() (assuming JGroups 3.0.3.Final) calls AckSenderWindow.add() - ConcurrentSkipListMap.put - doPut. This gets compiled, and everything is inlined in it. 3. The conditions where ConcurrentSkipListMap.put is called change dramatically - so the initial optimizations are no longer valid. 4. Eventually the timers and everything else that's using ConcurrentSkipListMap call put() 1 times and ConcurrentSkipListMap.doPut is compiled again. AFAIK oprofile will not report any inlined methods, so if ConcurrentSkipListMap appears in the report it means that it is still called a fair number of times. On the other hand, oprofile also doesn't report interpreted methods - so if it appears as a separate method in the oprofile report then it means it is compiled. You should also try running the test without -Xbatch, apparently it's only good for debugging the JVM: http://stackoverflow.com/questions/3369791/java-vm-tuning-xbatch-and-xcomp It shouldn't change what methods get compiled, but compilation should be less expensive without it. Cheers Dan
Re: [infinispan-dev] DIST.retrieveFromRemoteSource
Manik, Bela, I think we send the requests sequentially as well. In ReplicationTask.call: for (Address a : targets) { NotifyingFuture<Object> f = sendMessageWithFuture(constructMessage(buf, a), opts); futureCollator.watchFuture(f, a); } In MessageDispatcher.sendMessageWithFuture: UnicastRequest<T> req=new UnicastRequest<T>(msg, corr, dest, options); req.setBlockForResults(false); req.execute(); Did we use to send each request on a separate thread? Cheers Dan On Fri, Jan 27, 2012 at 1:21 PM, Bela Ban b...@redhat.com wrote: yes. On 1/27/12 12:13 PM, Manik Surtani wrote: On 25 Jan 2012, at 09:42, Bela Ban wrote: No, parallel unicasts will be faster, as an anycast to A,B,C sends the unicasts sequentially Is this still the case in JG 3.x? -- Bela Ban Lead JGroups (http://www.jgroups.org) JBoss / Red Hat
Re: [infinispan-dev] The need for a 5.1.1
On Fri, Jan 27, 2012 at 5:31 PM, Bela Ban b...@redhat.com wrote: On 1/27/12 4:26 PM, Mircea Markus wrote: On 27 Jan 2012, at 15:08, Bela Ban wrote: Build the JGroups JAR with ./build.sh jar, *not* via maven ! I attached the JAR for you. Thanks! JGroups *is* and *will remain* JAR less ! :-) Sorry for losing the faith :) Might make sense to have the mvn install work as well though, I think people would expect it to behave correctly when they see a pom.xml. Are you volunteering ? I'd be happy to integrate your changes ! :-) Tests probably don't work, but it's enough to get it building: https://github.com/belaban/JGroups/pull/25 Cheers Dan
Re: [infinispan-dev] default value for virtualNodes
Manik, I'm assigning ISPN-1801 to myself - I need to add my key distribution test and the results anyway. Cheers Dan On Mon, Jan 30, 2012 at 1:17 PM, Manik Surtani ma...@jboss.org wrote: On 29 Jan 2012, at 14:57, Galder Zamarreño wrote: To reiterate what I said in another thread, the memory effect of virtual nodes on Hot Rod clients is none since version 1.1 of the protocol (included in 5.1). I enhanced the protocol so that clients would generate virtual node hashes and so avoid sending them over the wire. Yes, I remember. Perfect. :) So for https://issues.jboss.org/browse/ISPN-1801 I'll set the default to 48. Any objections? Cheers Manik -- Manik Surtani ma...@jboss.org twitter.com/maniksurtani Lead, Infinispan http://www.infinispan.org
Re: [infinispan-dev] JBoss Libra
On Tue, Jan 31, 2012 at 4:49 PM, Bela Ban b...@redhat.com wrote: I don't know, but I don't see why not. If you look at the java.lang:type={Memory,GarbageCollector,MemoryPool} MBeans, there are a lot of values you can look at, including the details on eden and old. I just looked at G1 with JConsole, it does provide memory usage information in (java.lang:type=MemoryPool, name={G1 Young Gen, G1 Survivor, G1 Old Gen}).Usage. However, the locations are not standard, so we'd have to keep a bunch of rules about where to look for each garbage collector. Plus I'm not sure what looking at each generation separately would buy us over looking only at Runtime.freeMemory(). You can even register for notifications when free memory drops below a certain threshold, but I've never tried this out and I heard some impls don't provide this... The biggest problem I think with looking at the amount of used/free memory is that used memory includes dead objects. So you'll get a free memory threshold notification and start evicting entries, but you won't know how much memory you've freed (or even if you needed to evict any entries in the first place) without triggering an extremely expensive full GC. I second Mircea's idea of using the serialized data size, that's the only reliable data we've got. Cheers Dan On 1/31/12 3:32 PM, Tristan Tarrant wrote: On 01/31/2012 03:32 PM, Bela Ban wrote: IIRC you can look at the size of the young and old generation via JMX, and there you can see how much memory has accumulated. Does that work when using G1 ? Tristan -- Bela Ban Lead JGroups (http://www.jgroups.org) JBoss / Red Hat
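For reference, reading those per-pool numbers goes through the standard java.lang.management API; the pool names vary by collector, which is exactly the "bunch of rules" problem Dan mentions:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class PoolUsageDump {

    public static void main(String[] args) {
        // Pool names are collector-specific ("G1 Old Gen" vs "PS Old Gen"...),
        // so finding "the old generation" needs per-collector rules.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            System.out.printf("%s: used=%d committed=%d max=%d%n",
                    pool.getName(), usage.getUsed(), usage.getCommitted(), usage.getMax());
        }
        // The coarse-grained alternative, which also counts dead objects:
        System.out.println("free=" + Runtime.getRuntime().freeMemory());
    }
}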
Re: [infinispan-dev] again: no physical address
Hi Bela I guess it's pretty clear now... In Sanne's thread dump the main thread is blocked in a cache.put() call after the cluster has supposedly already formed: org.infinispan.benchmark.Transactional.main() prio=10 tid=0x7ff4045de000 nid=0x7c92 in Object.wait() [0x7ff40919d000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0x0007f61997d0 (a org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$FutureCollator) at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$FutureCollator.getResponseList(CommandAwareRpcDispatcher.java:372) ... at org.infinispan.distribution.DistributionManagerImpl.retrieveFromRemoteSource(DistributionManagerImpl.java:169) ... at org.infinispan.CacheSupport.put(CacheSupport.java:52) at org.infinispan.benchmark.Transactional.start(Transactional.java:110) at org.infinispan.benchmark.Transactional.main(Transactional.java:70) State transfer was disabled, so during the cluster startup the nodes only had to communicate with the coordinator and not between them. The put command had to get the old value from another node, so it needed the physical address and had to block until PING would retrieve it. Does PING use RSVP or does it wait for the normal STABLE timeout for retransmission? Note that everything is blocked at this point, we won't send another message in the entire cluster until we got the physical address. I'm sure you've already considered it before, but why not make the physical addresses a part of the view installation message? This should ensure that every node can communicate with every other node by the time the view is installed. I'm also not sure what to make of these lines: [org.jgroups.protocols.UDP] sanne-55119: no physical address for sanne-53650, dropping message [org.jgroups.protocols.pbcast.GMS] JOIN(sanne-55119) sent to sanne-53650 timed out (after 3000 ms), retrying It appears that sanne-55119 knows the logical name of sanne-53650, and the fact that it's coordinator, but not its physical address. Shouldn't all of this information have arrived at the same time? Cheers Dan On Tue, Jan 31, 2012 at 4:31 PM, Bela Ban b...@redhat.com wrote: This happens every now and then, when multiple nodes join at the same time, on the same host and PING has a small num_initial_mbrs. Since 2.8, the identity of a member is not an IP address:port anymore, but a UUID. The UUID has to be mapped to an IP address (and port), and every member maintains a table of UUIDs/IP addresses. This table is populated at startup, but the shipping of the IP address/UUID association is unreliable (in the case of UDP), so packets do get dropped when there are traffic spikes, like concurrent startup, or when high CPU usage slows things down. If we need to send a unicast message to P, and the table doesn't have a mapping for P, PING multicasts a discovery request, and drops the message. Every member responds with the IP address of P, which is then added to the table. The next time the message is sent (through retransmission), P's IP address will be available, and the unicast send should succeed. Of course, if the multicast or unicast response is dropped too, we'll run this protocol again... and again ... and again, until we finally have a valid IP address for P. On 1/31/12 11:29 AM, Manik Surtani wrote: I have sporadically seen this before when running some perf tests as well … curious to know what's up.
On 30 Jan 2012, at 17:45, Sanne Grinovero wrote: Hi Bela, this is the same error we were having in Boston when preparing the Infinispan nodes for some of the demos. So I didn't see it for a long time, but today it returned, especially to add a special twist to my performance tests. Dan, when this happened it looked like I had a deadlock: the benchmark is not making any more progress, it looks like they are all waiting for answers. JConsole didn't detect a deadlock, and unfortunately I don't have more logs than this from either JGroups or Infinispan (since it was supposed to be a performance test!). I'm attaching a threaddump in case it interests you, but I hope not: this is a DIST test with 12 nodes (in the same VM from this dump). I didn't have time to inspect it myself as I have to run, and I think the interesting news here is with the no physical address ideas? [org.jboss.logging] Logging Provider: org.jboss.logging.Log4jLoggerProvider [org.jgroups.protocols.UDP] sanne-55119: no physical address for sanne-53650, dropping message [org.jgroups.protocols.pbcast.GMS] JOIN(sanne-55119) sent to sanne-53650 timed out (after 3000 ms), retrying [org.jgroups.protocols.pbcast.GMS] sanne-55119 already present; returning existing view [sanne-53650|5] [sanne-53650, sanne-49978, sanne-27401, sanne-4741, sanne-29196, sanne-55119] [org.jgroups.protocols.UDP]
Re: [infinispan-dev] DIST.retrieveFromRemoteSource
It's true, but then JGroups' GroupRequest does exactly the same thing... socket.send() takes some time too, I thought sending the request in parallel would mean calling socket.send() on a separate thread for each recipient. Cheers Dan On Fri, Jan 27, 2012 at 6:41 PM, Manik Surtani ma...@jboss.org wrote: Doesn't setBlockForResults(false) mean that we're not waiting on a response, and can proceed to the next message to the next recipient? On 27 Jan 2012, at 16:34, Dan Berindei wrote: Manik, Bela, I think we send the requests sequentially as well. In ReplicationTask.call: for (Address a : targets) { NotifyingFuture<Object> f = sendMessageWithFuture(constructMessage(buf, a), opts); futureCollator.watchFuture(f, a); } In MessageDispatcher.sendMessageWithFuture: UnicastRequest<T> req=new UnicastRequest<T>(msg, corr, dest, options); req.setBlockForResults(false); req.execute(); Did we use to send each request on a separate thread? Cheers Dan On Fri, Jan 27, 2012 at 1:21 PM, Bela Ban b...@redhat.com wrote: yes. On 1/27/12 12:13 PM, Manik Surtani wrote: On 25 Jan 2012, at 09:42, Bela Ban wrote: No, parallel unicasts will be faster, as an anycast to A,B,C sends the unicasts sequentially Is this still the case in JG 3.x? -- Bela Ban Lead JGroups (http://www.jgroups.org) JBoss / Red Hat -- Manik Surtani ma...@jboss.org twitter.com/maniksurtani Lead, Infinispan http://www.infinispan.org
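For clarity, the threading shape being discussed would look something like this; send() here is a hypothetical stand-in for the socket/dispatcher call, purely to show one-thread-per-recipient fan-out:

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrative only: fan out one send() per recipient on an executor,
// instead of looping over the recipients on the caller's thread.
public class ParallelSender {

    private final ExecutorService executor = Executors.newCachedThreadPool();

    // Hypothetical transport call standing in for socket.send().
    void send(String recipient, byte[] payload) {
        System.out.println("sending " + payload.length + " bytes to " + recipient);
    }

    void sendToAll(List<String> recipients, final byte[] payload) {
        for (final String recipient : recipients) {
            executor.execute(new Runnable() {
                public void run() {
                    send(recipient, payload);
                }
            });
        }
    }
}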
Re: [infinispan-dev] again: no physical address
On Wed, Feb 1, 2012 at 9:48 AM, Bela Ban b...@redhat.com wrote: On 1/31/12 10:55 PM, Dan Berindei wrote: Hi Bela I guess it's pretty clear now... In Sanne's thread dump the main thread is blocked in a cache.put() call after the cluster has supposedly already formed: org.infinispan.benchmark.Transactional.main() prio=10 tid=0x7ff4045de000 nid=0x7c92 in Object.wait() [0x7ff40919d000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on 0x0007f61997d0 (a org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$FutureCollator) at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher$FutureCollator.getResponseList(CommandAwareRpcDispatcher.java:372) ... at org.infinispan.distribution.DistributionManagerImpl.retrieveFromRemoteSource(DistributionManagerImpl.java:169) ... at org.infinispan.CacheSupport.put(CacheSupport.java:52) at org.infinispan.benchmark.Transactional.start(Transactional.java:110) at org.infinispan.benchmark.Transactional.main(Transactional.java:70) State transfer was disabled, so during the cluster startup the nodes only had to communicate with the coordinator and not between them. The put command had to get the old value from another node, so it needed the physical address and had to block until PING would retrieve it. That's not the way it works; at startup of F, it sends its IP address with the discovery request. Everybody returns its IP address with the discovery response, so even though we have F only talking to A (the coordinator) initially, F will also know the IP addresses of A,B,C,D and E. Ok, I stand corrected... since we start all the nodes on the same thread, each of them should reply to the discovery request of the next nodes. However, num_initial_members was set to 3 (the Infinispan default). Could that make PING not wait for all the responses? If it's like that, then I suggest we set a (much) higher num_initial_members and a lower timeout in the default configuration. Does PING use RSVP No: (1) I don't want a dependency of Discovery on RSVP and (2) the discovery is unreliable; discovery requests or responses can get dropped. Right, I keep forgetting that every protocol is optional! or does it wait for the normal STABLE timeout for retransmission? Note that everything is blocked at this point, we won't send another message in the entire cluster until we've got the physical address. As I said; this is an exceptional case, probably caused by Sanne starting 12 channels inside the same JVM, at the same time, therefore causing a traffic spike, which results in dropped discovery requests or responses. Bela, we create the caches on a single thread, so we never have more than one node joining at the same time. At most we could have some extra activity if one node can't join the existing cluster and starts a separate partition, but hardly enough to cause congestion. After that, when F wants to talk to C, it asks the cluster for C's IP address, and that should be a few ms at most. Ok, so when F wanted to send the ClusteredGetCommand request to C, PING got the physical address right away. But the ClusteredGetCommand had to wait for STABLE to kick in and for C to ask for retransmission (because we didn't send any other messages). Maybe *we* should use RSVP for our ClusteredGetCommands, since those can never block...
Actually, we don't want to retransmit the request if we already got a response from another node, so it would be best if we could ask for retransmission of a particular request explicitly ;-) I wonder if we could also decrease desired_avg_gossip and stability_delay in STABLE. After all, an extra STABLE round can't slow us down when we're not doing anything, and when we are busy we're going to hit the max_bytes limit much sooner than the desired_avg_gossip time limit anyway. I'm sure you've already considered it before, but why not make the physical addresses a part of the view installation message? This should ensure that every node can communicate with every other node by the time the view is installed. There's a few reasons: - I don't want to make GMS dependent on logical addresses. GMS is completely independent and shouldn't know about physical addresses - At the time GMS kicks in, it's already too late. Remember, F needs to send a unicast JOIN request to A, but at this point it doesn't yet know A's address - MERGE{2,3} also use discovery to detect sub-partitions to be merged, so discovery needs to be a separate piece of functionality - A View is already big as it is, and I've managed to reduce its size even more, but adding physical addresses would blow up the size of View even more, especially in large clusters Thanks for the explanation. I'm also not sure what to make of these lines
Re: [infinispan-dev] Proposal: ISPN-1394 Manual rehashing in 5.2
Bela, you're right, this is essentially what we talked about in Lisbon: https://community.jboss.org/wiki/AsymmetricCachesAndManualRehashingDesign For joins I actually started working on a policy of coalescing joins that happen one after the other in a short time interval. The current implementation is very primitive, as I shifted focus to stability, but it does coalesce joins 1 second after another join started (or while that join is still running). I don't quite agree with Sanne's assessment that it's fine for getCache() to block for 5 minutes until the administrator allows the new node to join. We should modify startCaches() instead to signal to the coordinator that we are ready to receive data for one or all of the defined caches, and wait with a customizable time limit until the caches have properly joined the cluster. The getCache() timeout should not be increased at all. Instead I would propose that getCache() returns a functional cache immediately, even if the cache didn't receive any data, and it works solely as an L1 cache until the administrator allows it to join. I'd even make it possible to designate a cache as an L1-only cache, so it's never an owner for any key. For leaves, the main problem is that every node has to compute the same primary owner for a key, at all times. So we need a 2PC cache view installation immediately after any leave to ensure that every node determines the primary owner in the same way - we can't coalesce or postpone leaves. For 5.2 I will try to decouple the cache view installation from the state transfer, so in theory we will be able to coalesce/postpone the state transfer for leaves as well (https://issues.jboss.org/browse/ISPN-1827). I kind of need it for non-blocking state transfer, because with the current implementation a leave forces us to cancel any state transfer in progress and restart with the updated cache view - a state transfer rollback will be very expensive with NBST. Erik does raise a valid point - with TACH, if we bring up a node with a different siteId, then it will be an owner for all the keys in the cache. That node probably isn't provisioned to hold all the keys, so it would very likely run out of memory or evict much of the data. I guess that makes it a 5.2 issue? Shutting down a site should be possible even with what we have now - just insert a DISCARD protocol in the JGroups stack of all the nodes that are shutting down, and when FD finally times out on the nodes in the surviving datacenter they won't have any state transfer to do (although it may cause a few failed state transfer attempts). We could make it simpler though. Cheers Dan On Tue, Jan 31, 2012 at 6:21 PM, Erik Salter an1...@hotmail.com wrote: ...such as bringing up a backup data center. -Original Message- From: infinispan-dev-boun...@lists.jboss.org [mailto:infinispan-dev-boun...@lists.jboss.org] On Behalf Of Bela Ban Sent: Tuesday, January 31, 2012 11:18 AM To: infinispan-dev@lists.jboss.org Subject: Re: [infinispan-dev] Proposal: ISPN-1394 Manual rehashing in 5.2 I cannot volunteer either, but I find it important to be done in 5.2 ! Unless rehashing works flawlessly with a large number of nodes joining at the same time, I think manual rehashing is crucial... On 1/31/12 5:13 PM, Sanne Grinovero wrote: On 31 January 2012 16:06, Bela Ban b...@redhat.com wrote: This is essentially what I suggested at the Lisbon meeting, right ? Yes! I think Dan had a design wiki on this somewhere...
Just raising it here as it was moved to 6.0, while I think it deserves a dedicated thread to better think about it. If it's not hard, I think it should be done sooner. But while I started the thread to wake up the brilliant minds, I can't volunteer for this to make it happen. Sanne On 1/31/12 4:53 PM, Sanne Grinovero wrote: I think this is an important feature to have soon; My understanding of it: We default with the feature off, and newly discovered nodes are added/removed as usual. With a JMX-operable switch, one can disable this: If a remote node is joining the JGroups view, but rehash is off: it will be added to a to-be-installed view, but this won't be installed until rehash is enabled again. This gives time to add more changes before starting the rehash, and would help a lot to start larger clusters. If the [self] node is booting and joining a cluster with manual rehash off, the start process and any getCache() invocation should block and wait for it to be enabled. This would need of course to override the usually low timeouts. When a node is suspected it's a bit of a different story as we need to make sure no data is lost. The principle is the same, but maybe we should have two flags: one which is a soft request to avoid rehashes of less than N members (and refuse N<=numOwners ?), one which just disables it and doesn't care: data might be in a cachestore, data might not be important. Which reminds me, we should consider
Re: [infinispan-dev] Proposal: ISPN-1394 Manual rehashing in 5.2
On Wed, Feb 1, 2012 at 1:46 PM, Sanne Grinovero sa...@infinispan.org wrote: On 1 February 2012 11:23, Dan Berindei dan.berin...@gmail.com wrote: Bela, you're right, this is essentially what we talked about in Lisbon: https://community.jboss.org/wiki/AsymmetricCachesAndManualRehashingDesign For joins I actually started working on a policy of coalescing joins that happen one after the other in a short time interval. The current implementation is very primitive, as I shifted focus to stability, but it does coalesce joins 1 second after another join started (or while that join is still running). I don't quite agree with Sanne's assessment that it's fine for getCache() to block for 5 minutes until the administrator allows the new node to join. We should modify startCaches() instead to signal to the coordinator that we are ready to receive data for one or all of the defined caches, and wait with a customizable time limit until the caches have properly joined the cluster. The getCache() timeout should not be increased at all. Instead I would propose that getCache() returns a functional cache immediately, even if the cache didn't receive any data, and it works solely as an L1 cache until the administrator allows it to join. I'd even make it possible to designate a cache as an L1-only cache, so it's never an owner for any key. I agree that would be very nice, but it makes it much more complex to implement in 5.2 as well: functional L1 means that the other nodes must accept this node as part of the grid, including for L1 invalidation purposes. I don't think L1 would be a problem, the L1 code doesn't assume that the requestor is in the CH. That would basically be the only difference between a normal node and an L1-only node. However, I'm sure there will be problems in other areas, so I wouldn't push this for 5.2. So my proposal on blocking until ready is to make a first step, and I think it would still be very useful for people wanting to boot some ~100 nodes. Blocking the application is not a big deal, as you're delaying boot of an application which was likely not even powered on before. When adding several new nodes, I just want them to be added all at once, so preventing intermediate rehashing: until all have joined you should block rehash - that's a manual (or more likely automated externally) step and will not be engaged for long, nor would it replace normal behaviour when disabled. Actually even more useful would be to start a node with such a no-rehash option enabled. The rehash or no-rehash decision has to be made on the coordinator, he's the one that's going to trigger the rehash. So any flag that allows or disallows rehashing has to be propagated to the coordinator, and maybe even to the entire cluster in case the coordinator goes down. For leaves, the main problem is that every node has to compute the same primary owner for a key, at all times. So we need a 2PC cache view installation immediately after any leave to ensure that every node determines the primary owner in the same way - we can't coalesce or postpone leaves. For 5.2 I will try to decouple the cache view installation from the state transfer, so in theory we will be able to coalesce/postpone the state transfer for leaves as well (https://issues.jboss.org/browse/ISPN-1827). I kind of need it for non-blocking state transfer, because with the current implementation a leave forces us to cancel any state transfer in progress and restart with the updated cache view - a state transfer rollback will be very expensive with NBST.
Erik does raise a valid point - with TACH, if we bring up a node with a different siteId, then it will be an owner for all the keys in the cache. That node probably isn't provisioned to hold all the keys, so it would very likely run out of memory or evict much of the data. I guess that makes it a 5.2 issue? That's exactly my reasoning as well. There are situations in which you want to add several nodes at once, which is very different from adding them in rapid sequence, as that would storm the network with data shuffling in all directions. I don't think the network traffic would be that bad, if the first rehash takes a long time then all other joiners will queue up and join at the same time. So we're looking at just one extra rehash, but the problem is with that first rehash. Shutting down a site should be possible even with what we have now - just insert a DISCARD protocol in the JGroups stack of all the nodes that are shutting down, and when FD finally times out on the nodes in the surviving datacenter they won't have any state transfer to do (although it may cause a few failed state transfer attempts). We could make it simpler though. Yes, use a pair of scissors ;-) --Sanne Cheers Dan On Tue, Jan 31, 2012 at 6:21 PM, Erik Salter an1...@hotmail.com wrote: ...such as bringing up a backup data center. -Original Message- From: infinispan-dev
Re: [infinispan-dev] again: no physical address
Hi Bela I think I found why you weren't seeing the warnings. The bench-log4j.xml in github master is configured to log only to the log file (benchmark.log). If you add an <appender-ref ref="CONSOLE"/> you'll see the warnings on the console as well. I am now able to reproduce it pretty reliably, even from IntelliJ. I created a run configuration for the Transactional class, I set the module to infinispan-5.1-SNAPSHOT, and I added these JVM arguments: -Dbench.loops=3000 -Dbench.writerThreads=33 -Dbench.readerThreads=10 -Dbench.dist=true -Dbench.transactional=true -Dbench.numkeys=500 -Dbench.payloadsize=10240 -Dbench.nodes=12 -Djava.net.preferIPv4Stack -Dlog4j.configuration=bench-log4j.xml In Sanne's thread dump it was a ClusteredGetCommand that was waiting for a physical address, but in my test runs it's usually a JOIN that's failing. And it seems to happen because we got a discovery response from 3 nodes (that is our num_initial_members), but none of them was the coordinator: 2012-02-02 08:04:41,672 TRACE [org.jgroups.protocols.PING] (main) discovery took 8 ms: responses: 3 total (2 servers (0 coord), 1 clients) 2012-02-02 08:04:41,672 TRACE [org.jgroups.protocols.pbcast.GMS] (main) localhost-47135: initial_mbrs are localhost-36539 localhost-30921 2012-02-02 08:04:41,672 DEBUG [org.jgroups.protocols.pbcast.GMS] (main) election results: {localhost-32436=2} 2012-02-02 08:04:41,672 DEBUG [org.jgroups.protocols.pbcast.GMS] (main) sending JOIN(localhost-47135) to localhost-32436 2012-02-02 08:04:41,673 TRACE [org.jgroups.protocols.UNICAST2] (main) localhost-47135: created connection to localhost-32436 (conn_id=0) 2012-02-02 08:04:41,673 TRACE [org.jgroups.protocols.UNICAST2] (main) localhost-47135 -- DATA(localhost-32436: #1, conn_id=0, first) 2012-02-02 08:04:41,673 TRACE [org.jgroups.protocols.UDP] (main) sending msg to localhost-32436, src=localhost-47135, headers are GMS: GmsHeader[JOIN_REQ]: mbr=localhost-47135, UNICAST2: DATA, seqno=1, first, UDP: [channel_name=ISPN] 2012-02-02 08:04:41,673 WARN [org.jgroups.protocols.UDP] (main) localhost-47135: no physical address for localhost-32436, dropping message Cheers Dan On Wed, Feb 1, 2012 at 7:12 PM, Sanne Grinovero sa...@infinispan.org wrote: On 1 February 2012 16:40, Bela Ban b...@redhat.com wrote: Your benchmark is giving me the creeps ! Manik was the original author, I've only been adapting it slightly to identify performance issues. I wouldn't have used Maven either, but it's serving me well, especially since it turns out I have to frequently change the JGroups version ;-) First, which version of JGroups / Infinispan does this use ? Second, is there a way to start this in an IDE rather than through maven ? Third, I don't think bench-jgroups.xml is picked up at all ! How do I make a change to bench-jgroups.xml and have Transactional *use* it ? 1) The intention is to compare Infinispan versions and see how we make progress, so the JGroups version is whichever one each Infinispan version was released with, or in the case of a SNAPSHOT, the version it's currently using. So, assuming you have a fresh snapshot of Infinispan 5.1.0-SNAPSHOT, it's JGroups 3.0.4.Final 2) As I pointed out in the previous mail, I've reconfigured it for you to not pick up the bench-jgroups.xml but go with Infinispan's default UDP configuration. You have to comment line 15 of bench.sh to use the bench-jgroups.xml, but then I can't reproduce the issue anymore.
This maven crap hides so much, I never know what's going on under the covers !@$@ Do I have to do mvn install or mvn package when I make a change to bench-jgroups.xml ? Aren't you using bench.sh as I pointed out in the script in my previous mail? It does mvn install before running it. But it's also configured to NOT use bench-jgroups.xml, but rather jgroups-udp.xml. I don't think I'll ever switch to this f*cking piece of shit ! I don't think Maven is to be blamed today! What's wrong? Anyway one of the nice things of this little benchmark is exactly that it's a single class with a main method, so you can just import it in your IDE and run. Any IDE will pick the correct dependencies, thanks to Maven. Just that if you do, it will use the default test properties as hardcoded in the test class org.infinispan.benchmark.Transactional: please set the same environment variables as bench.sh does, unless you don't want to run my same configuration. Just ping me for any doubt. Cheers, Sanne Please ping me as soon as you've calmed down... :-) Cheers, On 2/1/12 4:52 PM, Sanne Grinovero wrote: On 1 February 2012 15:18, Bela Ban b...@redhat.com wrote: On 2/1/12 10:25 AM, Dan Berindei wrote: That's not the way it works; at startup of F, it sends its IP address with the discovery request. Everybody returns its IP address with the discovery response, so even though we have F only talking to A (the coordinator) initially, F will also know the IP addresses
Re: [infinispan-dev] DIST.retrieveFromRemoteSource
On Sat, Feb 4, 2012 at 5:23 PM, Bela Ban b...@redhat.com wrote: No, Socket.send() is not a hotspot in the Java code. On the networking level, with UDP datagrams, a packet is placed into a buffer and send() returns after the packet has been placed into the buffer successfully. Some network stacks simply discard the packet if the buffer is full, I assume. With TCP, send() blocks until the packet has been placed into the buffer and an ack has been received. If the receiver can't keep up with the sending rate, it'll reduce the window to 0, so send() blocks until the receiver catches up. So if you send 2 requests in parallel over TCP, both threads would block on send(). So I don't think parallelization would make sense here, unless of course you're doing something else, such as serialization, lock acquisition etc... You're probably right... When I sent the email I had only seen that DatagramSocket.send takes 0.4% of the get worker's total time (which is huge considering that 99.3% is spent waiting for the remote node's response). I didn't notice that it's synchronized and the actual sending only takes half of that - sending the messages in parallel would create more lock contention on the socket *and* in JGroups. On the other hand, we're sending the messages to two different nodes, on two different sockets, so sending in parallel *may* improve the response time in scenarios with few worker threads. Certainly not worth making our heavy-load performance worse, but I thought it was worth mentioning. Anyway, my primary motivation for this question was that I believe we could use GroupRequest instead of our FutureCollator to send our commands in parallel. They both do essentially the same thing, except FutureCollator has an extra indirection layer because it uses UnicastRequest. If there is any performance discrepancy at the moment between GroupRequest and FutureCollator+UnicastRequest, I don't think there is any fundamental reason why we can't fix GroupRequest to be just as efficient as FutureCollator+UnicastRequest. I think I just saw one such fix: in RequestCorrelator.sendRequest, if anycasting is enabled then it's making a copy of the buffer for each target member. I don't think that is necessary at all, in fact I think it should reuse both the buffer and the headers and only change the destination address. Cheers Dan On 2/4/12 3:43 PM, Manik Surtani wrote: Is that a micro-optimisation? Do we know that socket.send() really is a hotspot? On 1 Feb 2012, at 00:11, Dan Berindei wrote: It's true, but then JGroups' GroupRequest does exactly the same thing... socket.send() takes some time too, I thought sending the request in parallel would mean calling socket.send() on a separate thread for each recipient. Cheers Dan On Fri, Jan 27, 2012 at 6:41 PM, Manik Surtani ma...@jboss.org wrote: Doesn't setBlockForResults(false) mean that we're not waiting on a response, and can proceed to the next message to the next recipient? On 27 Jan 2012, at 16:34, Dan Berindei wrote: Manik, Bela, I think we send the requests sequentially as well. In ReplicationTask.call: for (Address a : targets) { NotifyingFuture<Object> f = sendMessageWithFuture(constructMessage(buf, a), opts); futureCollator.watchFuture(f, a); } In MessageDispatcher.sendMessageWithFuture: UnicastRequest<T> req=new UnicastRequest<T>(msg, corr, dest, options); req.setBlockForResults(false); req.execute(); Did we use to send each request on a separate thread? Cheers Dan On Fri, Jan 27, 2012 at 1:21 PM, Bela Ban b...@redhat.com wrote: yes.
On 1/27/12 12:13 PM, Manik Surtani wrote: On 25 Jan 2012, at 09:42, Bela Ban wrote: No, parallel unicasts will be faster, as an anycast to A,B,C sends the unicasts sequentially Is this still the case in JG 3.x? -- Bela Ban Lead JGroups (http://www.jgroups.org) JBoss / Red Hat -- Manik Surtani ma...@jboss.org twitter.com/maniksurtani Lead, Infinispan http://www.infinispan.org ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
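[Editorial note: a minimal sketch of what "calling socket.send() on a separate thread for each recipient" would look like, for readers following the thread. The names targets, buf, opts, constructMessage, sendMessageWithFuture and futureCollator are taken from the ReplicationTask.call snippet quoted above and assumed to be in scope; the executor and the notifyException error path are hypothetical, not actual Infinispan code.]

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical variant of ReplicationTask.call that dispatches each send
// on its own thread instead of looping sequentially. Whether this helps
// depends on the socket/JGroups lock-contention trade-off discussed above.
ExecutorService sendExecutor = Executors.newCachedThreadPool();
for (final Address a : targets) {
    sendExecutor.submit(new Runnable() {
        public void run() {
            try {
                // Each send now acquires the (synchronized) socket from its
                // own thread: less time in the caller, more lock contention.
                NotifyingFuture<Object> f = sendMessageWithFuture(constructMessage(buf, a), opts);
                futureCollator.watchFuture(f, a);
            } catch (Exception e) {
                futureCollator.notifyException(e); // hypothetical error path
            }
        }
    });
}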
Re: [infinispan-dev] Proposal: ISPN-1394 Manual rehashing in 5.2
On Sat, Feb 4, 2012 at 4:49 PM, Manik Surtani ma...@jboss.org wrote: On 1 Feb 2012, at 12:23, Dan Berindei wrote: Bela, you're right, this is essentially what we talked about in Lisbon: https://community.jboss.org/wiki/AsymmetricCachesAndManualRehashingDesign

For joins I actually started working on a policy of coalescing joins that happen one after the other in a short time interval. The current implementation is very primitive, as I shifted focus to stability, but it does coalesce joins 1 second after another join started (or while that join is still running).

I don't quite agree with Sanne's assessment that it's fine for getCache() to block for 5 minutes until the administrator allows the new node to join. We should modify startCaches() instead to signal to the coordinator that we are ready to receive data for one or all of the defined caches, and wait with a customizable time limit until the caches have properly joined the cluster. The getCache() timeout should not be increased at all. Instead I would propose that getCache() returns a functional cache immediately, even if the cache didn't receive any data, and it works solely as an L1 cache until the administrator allows it to join. I'd even make it possible to designate a cache as an L1-only cache, so it's never an owner for any key.

I presume this would be encoded in the Address? That would make sense for a node permanently designated as an L1 node. But then how would this work for a node temporarily acting as L1 only, until it has been allowed to join? Change the Address instance on the fly? A delegating Address? :/

Nope, not in the Address. Since we now have virtual cache views, every node has to explicitly request to join each cache. It would be quite easy to add an L1-only flag to the join request, and nodes with that flag would never be included in the proper cache view. They would however get a copy of the cache view on join and on every cache view installation, so they could update their CH and send requests to the proper owners. Nodes acting temporarily as L1-only would send a normal join request, but they would also receive a copy of the cache view, and getCache() would return immediately instead of waiting for the node to receive state. When the coordinator finally installs a cache view that includes them, they will perform the initial state transfer as they do now.

For leaves, the main problem is that every node has to compute the same primary owner for a key, at all times. So we need a 2PC cache view installation immediately after any leave to ensure that every node determines the primary owner in the same way - we can't coalesce or postpone leaves.

Yes, manual rehashing would probably just be for joins. Controlled shutdown in itself is manual, and crashes, well, need to be dealt with immediately IMO.

We could extend the policy for crashes as well, by adding a minNumOwners setting and only triggering an automatic rehash when there is a segment on the hash wheel with <= minNumOwners owners. We would have a properly installed CH, which guarantees that at some point in the past each key had numOwners owners, and a filter on top of it that removes any leavers from the result of DM.locate(), DM.getPrimaryLocation() etc. It would probably undo our recent optimizations around locate and getPrimaryLocation, so it's slowing down the normal case (without any leavers) in order to make the exceptional case (organized shutdown of a part of the cluster) faster.
The question is how big the cluster has to get before the exceptional case becomes common enough that it's worth optimizing for...

For 5.2 I will try to decouple the cache view installation from the state transfer, so in theory we will be able to coalesce/postpone the state transfer for leaves as well (https://issues.jboss.org/browse/ISPN-1827). I kind of need it for non-blocking state transfer, because with the current implementation a leave forces us to cancel any state transfer in progress and restart with the updated cache view - a state transfer rollback will be very expensive with NBST.

Erik does raise a valid point - with TACH, if we bring up a node with a different siteId, then it will be an owner for all the keys in the cache. That node probably isn't provisioned to hold all the keys, so it would very likely run out of memory or evict much of the data. I guess that makes it a 5.2 issue? Yes.

Shutting down a site should be possible even with what we have now - just insert a DISCARD protocol in the JGroups stack of all the nodes that are shutting down, and when FD finally times out on the nodes in the surviving datacenter they won't have any state transfer to do (although it may cause a few failed state transfer attempts). We could make it simpler though. Cheers Dan

On Tue, Jan 31, 2012 at 6:21 PM, Erik Salter an1...@hotmail.com wrote: ...such as bringing up a backup data center. -Original Message
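[Editorial note: a minimal sketch of the leaver-filtering idea described above, assuming the Infinispan 5.x-style ConsistentHash.locate(key, replCount) API and the org.infinispan Address/ConsistentHash types; the wrapper class and method names are illustrative, not actual Infinispan code.]

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import org.infinispan.distribution.ch.ConsistentHash;
import org.infinispan.remoting.transport.Address;

// Illustrative only: wrap the last properly installed CH and strip any
// nodes known to have left, so DM.locate()-style lookups never return a
// leaver. This is the extra filtering step that would slow the common case.
public class LeaverFilteringLocator {
    private final ConsistentHash installedCh;   // last properly installed CH
    private final Collection<Address> leavers;  // nodes gone since installation

    public LeaverFilteringLocator(ConsistentHash ch, Collection<Address> leavers) {
        this.installedCh = ch;
        this.leavers = leavers;
    }

    public List<Address> locate(Object key, int numOwners) {
        List<Address> owners = new ArrayList<Address>(installedCh.locate(key, numOwners));
        owners.removeAll(leavers); // leavers can no longer serve the key
        return owners;
    }

    public Address getPrimaryLocation(Object key, int numOwners) {
        List<Address> owners = locate(key, numOwners);
        return owners.isEmpty() ? null : owners.get(0);
    }
}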
Re: [infinispan-dev] DIST.retrieveFromRemoteSource
On Sun, Feb 5, 2012 at 12:24 PM, Bela Ban b...@redhat.com wrote: On 2/5/12 9:44 AM, Dan Berindei wrote: On Sat, Feb 4, 2012 at 5:23 PM, Bela Ban b...@redhat.com wrote: No, Socket.send() is not a hotspot in the Java code. On the networking level, with UDP datagrams, a packet is placed into a buffer and send() returns after the packet has been placed into the buffer successfully. Some network stacks simply discard the packet if the buffer is full, I assume. With TCP, send() blocks until the packet has been placed into the buffer and an ack has been received. If the receiver can't keep up with the sending rate, it'll reduce the window to 0, so send() blocks until the receiver catches up. So if you send 2 requests in parallel over TCP, both threads would block on the send. So I don't think parallelization would make sense here, unless of course you're doing something else, such as serialization, lock acquisition etc...

You're probably right... When I sent the email I had only seen that DatagramSocket.send takes 0.4% of the get worker's total time (which is huge considering that 99.3% is spent waiting for the remote node's response). I didn't notice that it's synchronized and that the actual sending only takes half of that - sending the messages in parallel would create more lock contention on the socket *and* in JGroups. On the other hand, we're sending the messages to two different nodes, on two different sockets, so sending in parallel *may* improve the response time in scenarios with few worker threads. Certainly not worth making our heavy-load performance worse, but I thought it was worth mentioning.

If you think this makes a difference, why don't you make a temporary code change and measure its effect on performance?

I will give it a try, it's just that writing emails is a bit easier ;-)

Anyway, my primary motivation for this question was that I believe we could use GroupRequest instead of our FutureCollator to send our commands in parallel. They both do essentially the same thing, except FutureCollator has an extra indirection layer because it uses UnicastRequest.

Ah, ok. Yes, I think that 1 GroupRequest of 2 members might be more efficient than 2 UnicastRequests for sending an 'anycast' message to 2 members. Perhaps the marshalling is done only once (I'd have to confirm that though) and we're only creating 1 data structure and adding it to a hashmap (request/response correlation)... Certainly worth giving a try...

We are only serializing once with the FutureCollator approach, and we don't copy the buffer either.

If there is any performance discrepancy at the moment between GroupRequest and FutureCollator+UnicastRequest, I don't think there is any fundamental reason why we can't fix GroupRequest to be just as efficient as FutureCollator+UnicastRequest.

You're assuming that FutureCollator+UnicastRequest is faster than GroupRequest; what are you basing your assumption on? As I outlined above, I'd rather assume the opposite, although I don't know FutureCollator.

I did say *if*... Maybe we should have a chat next week to go through the code that sends an anycast to 2 members, wdyt?

Sounds good, we should also talk about how to implement staggered get requests best.

I think I just saw one such fix: in RequestCorrelator.sendRequest, if anycasting is enabled then it's making a copy of the buffer for each target member. I don't think that is necessary at all; in fact I think it should reuse both the buffer and the headers and only change the destination address.
No, this *is* necessary, I don't just make copies because I think this is fun!! :-) Remember that a message might be retransmitted, so it is placed into a retransmit buffer. If M1 has destination A and M2 has destination B, and we send M1 first (to A), then change M1's destination to B and send it, everything is fine. However, if we later get a retransmit request from B, we'd resend the message to A instead! This is just 1 example; modification of headers is another one. Note that the copy does *not* copy the buffer (payload) itself, but only references it, so this is fast. Of course, nobody is supposed to modify the contents of the buffer itself...

I wasn't clear enough, but I didn't mean we should reuse the entire Message object. I meant we should copy the Message but not the buffer or the headers. I see now that protocols may be adding new headers, so it wouldn't be safe to reuse the headers collection. I think this line in RequestCorrelator.sendRequest (RequestCorrelator.java:152) means that the contents of the buffer are copied into the new message, not just the buffer reference:

Message copy=msg.copy(true);

Cheers Dan ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
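[Editorial note: for context, a simplified sketch of the per-target copy being debated, loosely modeled on the RequestCorrelator.sendRequest behaviour described above; it is not the literal JGroups source, and the send() hand-off is a stand-in for the real transport call.]

import org.jgroups.Address;
import org.jgroups.Message;

// Simplified sketch: when anycasting, one Message copy per target, all
// sharing the same payload buffer by reference (see the copy semantics
// clarified in the follow-up below). 'msg' and 'targets' are assumed in scope.
for (Address dest : targets) {
    Message copy = msg.copy(true); // new Message, headers copied, buffer shared by reference
    copy.setDest(dest);            // only the destination differs per target
    send(copy);                    // stand-in for handing off to the transport
}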
Re: [infinispan-dev] DIST.retrieveFromRemoteSource
On Sun, Feb 5, 2012 at 5:56 PM, Bela Ban b...@redhat.com wrote: On 2/5/12 4:40 PM, Dan Berindei wrote: Remember that a message might be retransmitted, so it is placed into a retransmit buffer. If M1 has destination A and M2 has destination B, and we send M1 first (to A), then change M1's destination to B and send it, everything is fine. However, if we later get a retransmit request from B, we'd resend the message to A instead! This is just 1 example; modification of headers is another one. Note that the copy does *not* copy the buffer (payload) itself, but only references it, so this is fast. Of course, nobody is supposed to modify the contents of the buffer itself...

I wasn't clear enough, but I didn't mean we should reuse the entire Message object. I meant we should copy the Message but not the buffer or the headers. I see now that protocols may be adding new headers, so it wouldn't be safe to reuse the headers collection. I think this line in RequestCorrelator.sendRequest (RequestCorrelator.java:152) means that the contents of the buffer are copied into the new message, not just the buffer reference:

Message copy=msg.copy(true);

No, this does *not* copy the buffer, but simply references the same buffer.

Aha, I thought copy_buffer == true meant copy the contents and copy_buffer == false meant share the contents. I see copy_buffer == true actually means copy the reference and share the contents, and copy_buffer == false means don't copy anything. I will modify our CommandAwareRpcDispatcher to use GroupRequest and see how they compare, then we can continue this discussion with the results. Cheers Dan ___ infinispan-dev mailing list infinispan-dev@lists.jboss.org https://lists.jboss.org/mailman/listinfo/infinispan-dev
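[Editorial note: a tiny illustration of the copy_buffer semantics agreed on above, written as a sketch against the JGroups Message API; it is not a test from the JGroups suite.]

import org.jgroups.Message;

// copy_buffer == true: new Message that *references* the same byte[];
// copy_buffer == false: new Message with no buffer at all.
byte[] payload = {1, 2, 3};
Message orig = new Message(null, null, payload);

Message shared = orig.copy(true);  // contents shared by reference, not cloned
Message bare = orig.copy(false);   // buffer not carried over

// Mutating the original payload is therefore visible through 'shared',
// which is why nobody is supposed to touch the buffer after sending.
payload[0] = 42;
assert shared.getRawBuffer()[0] == 42;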