Re: Casandara and Jackrabbit
Well, I'll try to answer that.

As far as I know from looking into the PM source code, the javadoc for store(change log) says that the change log must be saved completely or not at all. That basically means: if it fails, roll back.

If you use a bundle approach, as I did, each bundle is a node and its properties. The bundle id is my row key in Cassandra, so I save one bundle per row. Since each bundle is stored under its own row key, a change log will typically contain many bundles, which means many rows or keys. At this point Cassandra does not assure you that all of those invocations will be successful.

So far, Cassandra supports transactions only at the row level. That is, a row is either saved completely or not at all. I read that there are some intentions to add transactions across multiple keys in the future.

Please feel free to question my thoughts. I would like to get to a better approach if there is a way.

Regards
Patricio

On Thu, Feb 18, 2010 at 1:14 AM, Ian Boston wrote:
> I havent looked at the PM yet, but have been doing a JGroups based
> ClusterNode/Journal as a start, was not intending to put the Journal in
> Cassandra (that might be a mistake)
>
> Remind me,
> Is the problem that Cassandra might not take multiple key updates to
> Bundle, Binval and Refs atomically or that its not possible to have the
> "transaction monitor" inside the PM. IIRC the PM is a form of transaction
> monitor ?
>
> Also, (potentially dumb question), could you merge Bundle, Binval and Refs
> ? (or would that just kill performance)
> Ian

--
Patricio.-
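The gap between an all-or-nothing change log and per-row atomicity can be illustrated with a small sketch. Everything here is an assumption made for illustration: `RowStore`, `InMemoryRowStore` and `BestEffortChangeLogWriter` are hypothetical names, not Jackrabbit or Cassandra APIs, and the in-memory store only simulates a backend whose writes are atomic per row. The writer saves one row per bundle and, if a write fails, restores the rows it has already written. This is a best-effort compensation and not a real transaction: other readers can see the partial state in the meantime, and the restore itself can fail.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a store whose writes are atomic per row only. */
interface RowStore {
    byte[] get(String rowKey);              // returns null if the row does not exist
    void put(String rowKey, byte[] value);  // atomic for this single row
    void delete(String rowKey);
}

/** In-memory RowStore that can be told to fail on one key, for demonstration. */
class InMemoryRowStore implements RowStore {
    private final Map<String, byte[]> rows = new HashMap<>();
    String failOnKey; // when non-null, put() on this key throws

    public byte[] get(String rowKey) { return rows.get(rowKey); }

    public void put(String rowKey, byte[] value) {
        if (rowKey.equals(failOnKey)) {
            throw new IllegalStateException("simulated write failure: " + rowKey);
        }
        rows.put(rowKey, value);
    }

    public void delete(String rowKey) { rows.remove(rowKey); }

    int size() { return rows.size(); }
}

/** Writes a change log as one row per bundle and compensates on failure. */
class BestEffortChangeLogWriter {
    private final RowStore store;

    BestEffortChangeLogWriter(RowStore store) { this.store = store; }

    void store(Map<String, byte[]> bundlesById) {
        List<String> written = new ArrayList<>();
        Map<String, byte[]> previous = new HashMap<>();
        try {
            for (Map.Entry<String, byte[]> e : bundlesById.entrySet()) {
                previous.put(e.getKey(), store.get(e.getKey())); // remember the old row
                store.put(e.getKey(), e.getValue());             // row-level atomic write
                written.add(e.getKey());
            }
        } catch (RuntimeException failure) {
            // Undo the rows already written, newest first.
            for (int i = written.size() - 1; i >= 0; i--) {
                String key = written.get(i);
                byte[] old = previous.get(key);
                if (old == null) store.delete(key); else store.put(key, old);
            }
            throw failure;
        }
    }
}

class ChangeLogDemo {
    public static void main(String[] args) {
        InMemoryRowStore store = new InMemoryRowStore();
        store.failOnKey = "bundle-3";
        Map<String, byte[]> changeLog = new LinkedHashMap<>();
        changeLog.put("bundle-1", new byte[] {1});
        changeLog.put("bundle-2", new byte[] {2});
        changeLog.put("bundle-3", new byte[] {3});
        try {
            new BestEffortChangeLogWriter(store).store(changeLog);
        } catch (IllegalStateException e) {
            System.out.println("store failed, rows left behind: " + store.size());
        }
    }
}
```

The sketch makes the limitation visible: between the first write and the compensation, a second cluster node reading the same rows would observe a partially applied change log.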
Re: Casandara and Jackrabbit
On 18 Feb 2010, at 06:39, Patricio Echagüe wrote:

> Hi, I have written a Cassandra Persistence Manager but as Jukka mentioned,
> there are some issues to resolve.
> The other problem I see is that Cassandra does not support transaction
> across multiple keys.

I haven't looked at the PM yet, but have been working on a JGroups-based ClusterNode/Journal as a start. I was not intending to put the Journal in Cassandra (that might be a mistake).

Remind me: is the problem that Cassandra might not apply multiple-key updates to Bundle, Binval and Refs atomically, or that it's not possible to have the "transaction monitor" inside the PM? IIRC the PM is a form of transaction monitor?

Also (potentially dumb question), could you merge Bundle, Binval and Refs? Or would that just kill performance?

Ian

> The JR PM assumes that the change logs are saved all or none of them. That
> represents another issue. (I assume a bundle approach)
>
> Just to keep the discussion, this is my Keyspace
>
> Patricio
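One way to picture the "merge Bundle, Binval and Refs" question is a single row per node with one column per kind of data, so that a node's bundle, references and binary values go into the same row-level write. The sketch below is only an illustration under assumptions: `NodeRowLayout` and its column names are hypothetical, and it assumes references and binary values can be keyed by the owning node id, which has not been checked against the bundle PM. It also does nothing for a change log that spans several nodes, since those still live in different rows.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical single-row layout: all data for one node under one row key. */
class NodeRowLayout {
    static final String BUNDLE = "bundle";
    static final String REFS = "refs";
    static final String BINVAL_PREFIX = "binval:";

    /** Builds the column-name to value map that would be written as one row. */
    static Map<String, byte[]> columnsFor(byte[] bundle, byte[] refs,
                                          Map<String, byte[]> binaryValues) {
        Map<String, byte[]> columns = new TreeMap<>();
        columns.put(BUNDLE, bundle);
        if (refs != null) {
            columns.put(REFS, refs); // only present when something references the node
        }
        for (Map.Entry<String, byte[]> e : binaryValues.entrySet()) {
            // One column per binary value, so a reader can skip them by name.
            columns.put(BINVAL_PREFIX + e.getKey(), e.getValue());
        }
        return columns;
    }

    public static void main(String[] args) {
        Map<String, byte[]> blobs = new HashMap<>();
        blobs.put("blob-1", new byte[] {42});
        Map<String, byte[]> row = columnsFor(new byte[] {1}, null, blobs);
        System.out.println(row.keySet());
    }
}
```

Whether this hurts performance would depend on whether the store lets a reader fetch only the bundle column without also pulling large binary columns, which is the concern raised in the question.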
Re: Casandara and Jackrabbit
Hi,

I have written a Cassandra Persistence Manager but, as Jukka mentioned, there are some issues to resolve.

The other problem I see is that Cassandra does not support transactions across multiple keys. The JR PM assumes that a change log is saved either completely or not at all. That represents another issue. (I assume a bundle approach.)

Just to keep the discussion going, this is my Keyspace

Patricio

On Wed, Feb 17, 2010 at 6:47 AM, Ian Boston wrote:
> Ok, I'm looking into ways to address that. The other one, I suspect is
> going to be the sequence ID on the journal event stream, which IIRC needs to
> be sequential so the journal can be replayed, also since the biggest source
> of eventually consistent errors is going to be that journal stream I am
> thinking binding the journal with the PM might allow remote nodes to know
> when a local item is out of date, if an item in the local PM is stale, and
> which server to get the latest item copy from. That assumes nothing changes
> without a journal record being emitted. I want to avoid sending the journal
> via storage.
>
> I commented on you JR3 rfi, but I am still thinking about JR2

--
Patricio.-
Re: Casandara and Jackrabbit
On 17 Feb 2010, at 14:34, Jukka Zitting wrote:

> Hi,
>
> On Wed, Feb 17, 2010 at 11:53 AM, Ian Boston wrote:
>> On 17 Feb 2010, at 10:43, Jukka Zitting wrote:
>>> I'm not aware of anything like that, though there's been some
>>> discussion about persistence on top of distributed databases or hash
>>> tables. The main problem with such approaches is the eventual
>>> consistency model that can be troublesome for the current Jackrabbit
>>> architecture.
>>
>> Is that because, in a cluster one JR node might get an event that an
>> Item exists, but its not yet present on the backend its connected to,
>> and there are no guarantees over the order in which items appear,
>> so, for instance, the hierarchy manager might find a child but not
>> the parent?
>
> Exactly. The current Jackrabbit architecture assumes that the
> underlying persistence store is always (not just eventually)
> consistent.

Ok, I'm looking into ways to address that. The other one, I suspect, is going to be the sequence ID on the journal event stream, which IIRC needs to be sequential so the journal can be replayed. Also, since the biggest source of eventual-consistency errors is going to be that journal stream, I am thinking that binding the journal to the PM might allow remote nodes to know when an item in the local PM is stale, and which server to get the latest copy of the item from. That assumes nothing changes without a journal record being emitted. I want to avoid sending the journal via storage.

>> Do you have any pointers to the discussions so I can go and read ?
>
> It's been mostly coffee room discussions so far, but I'll bring up the
> topic soon on dev@ as a part of a larger Jackrabbit 3 roadmap
> discussion.

I commented on your JR3 rfi, but I am still thinking about JR2.

> BR,
>
> Jukka Zitting
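The idea of binding the journal to the PM can be sketched roughly as follows. This is a minimal illustration, not Jackrabbit's journal: `JournalRecord` and `StalenessTracker` are hypothetical names, and the sketch assumes every change emits exactly one record carrying a strictly sequential revision, the changed item id, and the server that made the change. A node then compares the revision of its local copy with the latest revision seen in the journal to decide whether the copy is stale and where to fetch a newer one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical journal record: revision, changed item, and originating server. */
final class JournalRecord {
    final long revision;
    final String itemId;
    final String originNode;

    JournalRecord(long revision, String itemId, String originNode) {
        this.revision = revision;
        this.itemId = itemId;
        this.originNode = originNode;
    }
}

/** Tracks, per item, the newest journal revision versus the locally stored one. */
class StalenessTracker {
    private long lastRevision = 0;
    private final Map<String, JournalRecord> latestByItem = new HashMap<>();
    private final Map<String, Long> localRevisionByItem = new HashMap<>();

    /** Accepts records strictly in sequence so the journal stays replayable. */
    void onJournalRecord(JournalRecord record) {
        if (record.revision != lastRevision + 1) {
            throw new IllegalStateException("journal gap: expected "
                    + (lastRevision + 1) + " but got " + record.revision);
        }
        lastRevision = record.revision;
        latestByItem.put(record.itemId, record);
    }

    /** Notes that the local PM now holds the item as of the given revision. */
    void onLocalCopyUpdated(String itemId, long revision) {
        localRevisionByItem.put(itemId, revision);
    }

    /** Returns the server to fetch from if the local copy lags the journal. */
    Optional<String> staleSource(String itemId) {
        JournalRecord latest = latestByItem.get(itemId);
        if (latest == null) {
            return Optional.empty(); // the journal has never mentioned this item
        }
        long local = localRevisionByItem.getOrDefault(itemId, 0L);
        return local < latest.revision
                ? Optional.of(latest.originNode)
                : Optional.empty();
    }
}

class JournalDemo {
    public static void main(String[] args) {
        StalenessTracker tracker = new StalenessTracker();
        tracker.onJournalRecord(new JournalRecord(1, "node-a", "server-1"));
        System.out.println(tracker.staleSource("node-a").isPresent());
    }
}
```

The strict sequence check reflects the replay requirement mentioned above; a real cluster would also need a way to allocate those sequence numbers across servers, which this sketch leaves out.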
Re: Casandara and Jackrabbit
Hi,

On Wed, Feb 17, 2010 at 11:53 AM, Ian Boston wrote:
> On 17 Feb 2010, at 10:43, Jukka Zitting wrote:
>> I'm not aware of anything like that, though there's been some
>> discussion about persistence on top of distributed databases or hash
>> tables. The main problem with such approaches is the eventual
>> consistency model that can be troublesome for the current Jackrabbit
>> architecture.
>
> Is that because, in a cluster one JR node might get an event that an
> Item exists, but its not yet present on the backend its connected to,
> and there are no guarantees over the order in which items appear,
> so, for instance, the hierarchy manager might find a child but not
> the parent?

Exactly. The current Jackrabbit architecture assumes that the underlying persistence store is always (not just eventually) consistent.

> Do you have any pointers to the discussions so I can go and read ?

It's been mostly coffee room discussions so far, but I'll bring up the topic soon on dev@ as a part of a larger Jackrabbit 3 roadmap discussion.

BR,

Jukka Zitting
Re: Casandara and Jackrabbit
On 17 Feb 2010, at 10:43, Jukka Zitting wrote:

> Hi,
>
> On Wed, Feb 17, 2010 at 11:29 AM, Ian Boston wrote:
>> Has anyone written a PM or SPI implementation based on Casandra ?
>
> I'm not aware of anything like that, though there's been some
> discussion about persistence on top of distributed databases or hash
> tables. The main problem with such approaches is the eventual
> consistency model that can be troublesome for the current Jackrabbit
> architecture.

Is that because, in a cluster, one JR node might get an event saying that an Item exists while the Item is not yet present on the backend that node is connected to, and there are no guarantees about the order in which items appear? So, for instance, the hierarchy manager might find a child but not the parent?

Do you have any pointers to the discussions so I can go and read?

Thanks
Ian

> BR,
>
> Jukka Zitting
Re: Casandara and Jackrabbit
Hi,

On Wed, Feb 17, 2010 at 11:29 AM, Ian Boston wrote:
> Has anyone written a PM or SPI implementation based on Casandra ?

I'm not aware of anything like that, though there's been some discussion about persistence on top of distributed databases or hash tables. The main problem with such approaches is the eventual consistency model that can be troublesome for the current Jackrabbit architecture.

BR,

Jukka Zitting
Casandara and Jackrabbit
Hi,

Has anyone written a PM or SPI implementation based on Cassandra?

Ian