Re: Casandara and Jackrabbit

2010-02-18 Thread Ian Boston

On 18 Feb 2010, at 06:39, Patricio Echagüe wrote:

 Hi, I have written a Cassandra Persistence Manager but as Jukka mentioned,
 there are some issues to resolve.
 The other problem I see is that Cassandra does not support transaction
 across multiple keys.

I haven't looked at the PM yet, but have been doing a JGroups-based 
ClusterNode/Journal as a start. I was not intending to put the Journal in 
Cassandra (that might be a mistake).

Remind me: is the problem that Cassandra might not take multiple key updates 
to Bundle, Binval and Refs atomically, or that it's not possible to have the 
transaction monitor inside the PM? IIRC the PM is a form of transaction monitor?

Also (potentially dumb question), could you merge Bundle, Binval and Refs? 
(Or would that just kill performance?)
Ian
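
To make the merge question concrete, here is a rough sketch of what a single merged row per node could look like. The class and the column names ("bundle", "binval:...", "ref:...") are hypothetical and are not taken from Patricio's PM or from the Cassandra API. It would turn one node's three records into a single row write, though a change log that touches several nodes would still span several rows, and large binaries end up in the same row as the bundle.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch: one row per node holding bundle, binary values and references together. */
class MergedRowSketch {

    /** Hypothetical row layout; the column names are illustrative assumptions. */
    static final class NodeRow {
        final Map<String, byte[]> columns = new TreeMap<>();

        NodeRow bundle(byte[] data) {
            columns.put("bundle", data);
            return this;
        }

        NodeRow binval(String propertyName, byte[] data) {
            columns.put("binval:" + propertyName, data);
            return this;
        }

        NodeRow ref(String referringNodeId) {
            // Only the column name matters for a reference, so the value is empty.
            columns.put("ref:" + referringNodeId, new byte[0]);
            return this;
        }

        /** Total payload size, to show how binaries inflate the row. */
        long sizeInBytes() {
            long total = 0;
            for (byte[] value : columns.values()) {
                total += value.length;
            }
            return total;
        }
    }

    public static void main(String[] args) {
        NodeRow row = new NodeRow()
                .bundle(new byte[64])
                .binval("jcr:data", new byte[1024 * 1024])
                .ref("node-42");
        System.out.println(row.columns.keySet() + " " + row.sizeInBytes() + " bytes");
    }
}
```

The size helper is only there to show the performance worry: every read of the bundle would drag the binary columns along unless the store can fetch columns selectively.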



 The JR PM assumes that the change logs are saved all or none of them. That
 represents another issue. (I assume a bundle approach)
 
 Just to keep the discussion, this is my Keyspace
 
  <Keyspaces>
    <Keyspace Name="myCassandraPM">
 
      <ColumnFamily CompareWith="BytesType" Name="WORKSPACE_Bundle"/>
      <ColumnFamily CompareWith="BytesType" Name="WORKSPACE_Binval"/>
      <ColumnFamily CompareWith="BytesType" Name="WORKSPACE_Refs"/>
 
      <ColumnFamily CompareWith="BytesType" Name="VERSION_Bundle"/>
      <ColumnFamily CompareWith="BytesType" Name="VERSION_Binval"/>
      <ColumnFamily CompareWith="BytesType" Name="VERSION_Refs"/>
 
      <ColumnFamily CompareWith="BytesType" Name="SECURITY_Bundle"/>
      <ColumnFamily CompareWith="BytesType" Name="SECURITY_Binval"/>
      <ColumnFamily CompareWith="BytesType" Name="SECURITY_Refs"/>
 
    </Keyspace>
  </Keyspaces>
 
 Patricio
 
 On Wed, Feb 17, 2010 at 6:47 AM, Ian Boston i...@tfd.co.uk wrote:
 
 
 On 17 Feb 2010, at 14:34, Jukka Zitting wrote:
 
 Hi,
 
 On Wed, Feb 17, 2010 at 11:53 AM, Ian Boston i...@tfd.co.uk wrote:
 On 17 Feb 2010, at 10:43, Jukka Zitting wrote:
 I'm not aware of anything like that, though there's been some
 discussion about persistence on top of distributed databases or hash
 tables. The main problem with such approaches is the eventual
 consistency model that can be troublesome for the current Jackrabbit
 architecture.
 
 Is that because, in a cluster one JR node might get an event that an
 Item exists, but its not yet present on the backend its connected to,
 and there are no guarantees over the order in which items appear,
 so, for instance, the hierarchy manager might find a child but not
 the parent?
 
 Exactly. The current Jackrabbit architecture assumes that the
 underlying persistence store is always (not just eventually)
 consistent.
 
 
 Ok, I'm looking into ways to address that. The other one, I suspect is
 going to be the sequence ID on the journal event stream, which IIRC needs to
 be sequential so the journal can be replayed, also since the biggest source
 of eventually consistent errors is going to be that journal stream I am
 thinking binding the journal with the PM might allow remote nodes to know
 when a local item is out of date, if an item in the local PM is stale, and
 which server to get the latest item copy from. That assumes nothing changes
 without a journal record being emitted. I want to avoid sending the journal
 via storage.
 
 
 Do you have any pointers to the discussions so I can go and read ?
 
 It's been mostly coffee room discussions so far, but I'll bring up the
 topic soon on dev@ as a part of a larger Jackrabbit 3 roadmap
 discussion.
 
 I commented on you JR3 rfi, but I am still thinking about JR2
 
 
 BR,
 
 Jukka Zitting
 
 
 
 
 -- 
 Patricio.-



Re: Casandara and Jackrabbit

2010-02-18 Thread Patricio Echagüe
Well, I'll try to answer that. As far as I can tell from the PM source code,
the javadoc for store(ChangeLog) says that the change log must be saved
either completely or not at all.
That basically means: if it fails, roll back.

With a bundle approach, which is what I used, each bundle holds a node and
its properties. The bundle id is my row key in Cassandra, so I save one
bundle per row.

Since each bundle is stored under its own row key, a change log will typically
contain many bundles, which means many rows or keys. At this point
Cassandra does not assure you that all of those writes will succeed.

So far, Cassandra supports transactions only at the row level. That is, a row
is either saved or not saved at all.

I read that there is some intention to add transactions across multiple keys
in the future.
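
To illustrate the gap between "all or nothing" for a change log and "atomic per row", here is a minimal sketch. RowStore is a hypothetical in-memory stand-in, not the Cassandra client API, and storeAll is not the Jackrabbit method; both names are assumptions made for the example. It writes each bundle as its own row and, if one write fails, puts back the previous values of the rows already written. This is only a compensating rollback: other readers can still see the partial state in between, and the rollback itself could fail, which is exactly why it is not a real transaction.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Sketch: emulating an all-or-nothing change log on a store that is atomic only per row. */
class ChangeLogSketch {

    /** Hypothetical store with per-row atomic writes; not the Cassandra API. */
    static final class RowStore {
        final Map<String, byte[]> rows = new TreeMap<>();
        String failOnKey; // hook to simulate a failed write for one key

        void put(String key, byte[] value) {
            if (key.equals(failOnKey)) {
                throw new IllegalStateException("write failed for " + key);
            }
            rows.put(key, value);
        }
    }

    /** Writes every bundle; on failure restores the rows touched so far. Returns true on success. */
    static boolean storeAll(RowStore store, Map<String, byte[]> changeLog) {
        // Previous value of each row we touch (null means the row did not exist).
        Map<String, byte[]> undo = new LinkedHashMap<>();
        try {
            for (Map.Entry<String, byte[]> entry : changeLog.entrySet()) {
                undo.put(entry.getKey(), store.rows.get(entry.getKey()));
                store.put(entry.getKey(), entry.getValue());
            }
            return true;
        } catch (IllegalStateException failure) {
            // Compensate: put back what was there before this change log started.
            for (Map.Entry<String, byte[]> entry : undo.entrySet()) {
                if (entry.getValue() == null) {
                    store.rows.remove(entry.getKey());
                } else {
                    store.rows.put(entry.getKey(), entry.getValue());
                }
            }
            return false;
        }
    }

    public static void main(String[] args) {
        RowStore store = new RowStore();
        Map<String, byte[]> changeLog = new LinkedHashMap<>();
        changeLog.put("bundle-1", new byte[] {1});
        changeLog.put("bundle-2", new byte[] {2});

        store.failOnKey = "bundle-2"; // second row write fails
        System.out.println(storeAll(store, changeLog) + " " + store.rows.keySet());

        store.failOnKey = null; // retry without the failure
        System.out.println(storeAll(store, changeLog) + " " + store.rows.keySet());
    }
}
```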

Please feel free to question my thoughts. I would like to get to a better
approach if there is one.

Regards
Patricio

On Thu, Feb 18, 2010 at 1:14 AM, Ian Boston i...@tfd.co.uk wrote:


 On 18 Feb 2010, at 06:39, Patricio Echagüe wrote:

  Hi, I have written a Cassandra Persistence Manager but as Jukka mentioned,
  there are some issues to resolve.
  The other problem I see is that Cassandra does not support transaction
  across multiple keys.

 I havent looked at the PM yet, but have been doing a JGroups based
 ClusterNode/Journal as a start, was not intending to put the Journal in
 Cassandra (that might be a mistake)

 Remind me,
 Is the problem that Cassandra might not take multiple key updates to Bundle,
 Binval and Refs atomically or that its not possible to have the transaction
 monitor inside the PM. IIRC the PM is a form of transaction monitor ?

 Also, (potentially dumb question), could you merge Bundle, Binval and Refs ?
 (or would that just kill performance)
 Ian



  The JR PM assumes that the change logs are saved all or none of them. That
  represents another issue. (I assume a bundle approach)
 
  Just to keep the discussion, this is my Keyspace
 
   <Keyspaces>
     <Keyspace Name="myCassandraPM">
 
       <ColumnFamily CompareWith="BytesType" Name="WORKSPACE_Bundle"/>
       <ColumnFamily CompareWith="BytesType" Name="WORKSPACE_Binval"/>
       <ColumnFamily CompareWith="BytesType" Name="WORKSPACE_Refs"/>
 
       <ColumnFamily CompareWith="BytesType" Name="VERSION_Bundle"/>
       <ColumnFamily CompareWith="BytesType" Name="VERSION_Binval"/>
       <ColumnFamily CompareWith="BytesType" Name="VERSION_Refs"/>
 
       <ColumnFamily CompareWith="BytesType" Name="SECURITY_Bundle"/>
       <ColumnFamily CompareWith="BytesType" Name="SECURITY_Binval"/>
       <ColumnFamily CompareWith="BytesType" Name="SECURITY_Refs"/>
 
     </Keyspace>
   </Keyspaces>
 
  Patricio
 
  On Wed, Feb 17, 2010 at 6:47 AM, Ian Boston i...@tfd.co.uk wrote:
 
 
  On 17 Feb 2010, at 14:34, Jukka Zitting wrote:
 
  Hi,
 
  On Wed, Feb 17, 2010 at 11:53 AM, Ian Boston i...@tfd.co.uk wrote:
  On 17 Feb 2010, at 10:43, Jukka Zitting wrote:
  I'm not aware of anything like that, though there's been some
  discussion about persistence on top of distributed databases or hash
  tables. The main problem with such approaches is the eventual
  consistency model that can be troublesome for the current Jackrabbit
  architecture.
 
  Is that because, in a cluster one JR node might get an event that an
  Item exists, but its not yet present on the backend its connected to,
  and there are no guarantees over the order in which items appear,
  so, for instance, the hierarchy manager might find a child but not
  the parent?
 
  Exactly. The current Jackrabbit architecture assumes that the
  underlying persistence store is always (not just eventually)
  consistent.
 
 
  Ok, I'm looking into ways to address that. The other one, I suspect is
  going to be the sequence ID on the journal event stream, which IIRC needs to
  be sequential so the journal can be replayed, also since the biggest source
  of eventually consistent errors is going to be that journal stream I am
  thinking binding the journal with the PM might allow remote nodes to know
  when a local item is out of date, if an item in the local PM is stale, and
  which server to get the latest item copy from. That assumes nothing changes
  without a journal record being emitted. I want to avoid sending the journal
  via storage.
 
 
  Do you have any pointers to the discussions so I can go and read ?
 
  It's been mostly coffee room discussions so far, but I'll bring up the
  topic soon on dev@ as a part of a larger Jackrabbit 3 roadmap
  discussion.
 
  I commented on you JR3 rfi, but I am still thinking about JR2
 
 
  BR,
 
  Jukka Zitting
 
 
 
 
  --
  Patricio.-




-- 
Patricio.-


Re: Casandara and Jackrabbit

2010-02-17 Thread Jukka Zitting
Hi,

On Wed, Feb 17, 2010 at 11:29 AM, Ian Boston i...@tfd.co.uk wrote:
 Has anyone written a PM or SPI implementation based on Cassandra?

I'm not aware of anything like that, though there's been some
discussion about persistence on top of distributed databases or hash
tables. The main problem with such approaches is the eventual
consistency model that can be troublesome for the current Jackrabbit
architecture.

BR,

Jukka Zitting


Re: Casandara and Jackrabbit

2010-02-17 Thread Ian Boston

On 17 Feb 2010, at 10:43, Jukka Zitting wrote:

 Hi,
 
 On Wed, Feb 17, 2010 at 11:29 AM, Ian Boston i...@tfd.co.uk wrote:
 Has anyone written a PM or SPI implementation based on Cassandra?
 
 I'm not aware of anything like that, though there's been some
 discussion about persistence on top of distributed databases or hash
 tables. The main problem with such approaches is the eventual
 consistency model that can be troublesome for the current Jackrabbit
 architecture.

Is that because, in a cluster, one JR node might get an event that an Item 
exists, but it's not yet present on the backend it's connected to, and there 
are no guarantees over the order in which items appear, so, for instance, the 
hierarchy manager might find a child but not the parent?

Do you have any pointers to the discussions so I can go and read them?


Thanks
Ian

 
 BR,
 
 Jukka Zitting



Re: Casandara and Jackrabbit

2010-02-17 Thread Jukka Zitting
Hi,

On Wed, Feb 17, 2010 at 11:53 AM, Ian Boston i...@tfd.co.uk wrote:
 On 17 Feb 2010, at 10:43, Jukka Zitting wrote:
 I'm not aware of anything like that, though there's been some
 discussion about persistence on top of distributed databases or hash
 tables. The main problem with such approaches is the eventual
 consistency model that can be troublesome for the current Jackrabbit
 architecture.

 Is that because, in a cluster one JR node might get an event that an
 Item exists, but its not yet present on the backend its connected to,
 and there are no guarantees over the order in which items appear,
 so, for instance, the hierarchy manager might find a child but not
 the parent?

Exactly. The current Jackrabbit architecture assumes that the
underlying persistence store is always (not just eventually)
consistent.
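
As a small illustration of the failure mode discussed here, the sketch below checks one replica's view for items whose parent has not arrived yet. The map-based "replica" and the orphans helper are hypothetical and only model the ordering problem; they are not Jackrabbit's hierarchy manager.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: a replica that has received a child item before its parent. */
class OrphanCheckSketch {

    /** Returns the ids of items whose parent id is not present in the given replica view. */
    static List<String> orphans(Map<String, String> parentById, String rootId) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : parentById.entrySet()) {
            boolean isRoot = entry.getKey().equals(rootId);
            if (!isRoot && !parentById.containsKey(entry.getValue())) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // View of one backend: item id -> parent id (the root has no parent).
        Map<String, String> replica = new LinkedHashMap<>();
        replica.put("root", null);
        replica.put("child", "parent"); // "parent" itself has not been replicated yet
        System.out.println("orphans before parent arrives: " + orphans(replica, "root"));

        replica.put("parent", "root"); // replication catches up
        System.out.println("orphans after parent arrives: " + orphans(replica, "root"));
    }
}
```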

 Do you have any pointers to the discussions so I can go and read ?

It's been mostly coffee room discussions so far, but I'll bring up the
topic soon on dev@ as a part of a larger Jackrabbit 3 roadmap
discussion.

BR,

Jukka Zitting


Re: Casandara and Jackrabbit

2010-02-17 Thread Ian Boston

On 17 Feb 2010, at 14:34, Jukka Zitting wrote:

 Hi,
 
 On Wed, Feb 17, 2010 at 11:53 AM, Ian Boston i...@tfd.co.uk wrote:
 On 17 Feb 2010, at 10:43, Jukka Zitting wrote:
 I'm not aware of anything like that, though there's been some
 discussion about persistence on top of distributed databases or hash
 tables. The main problem with such approaches is the eventual
 consistency model that can be troublesome for the current Jackrabbit
 architecture.
 
 Is that because, in a cluster one JR node might get an event that an
 Item exists, but its not yet present on the backend its connected to,
 and there are no guarantees over the order in which items appear,
 so, for instance, the hierarchy manager might find a child but not
 the parent?
 
 Exactly. The current Jackrabbit architecture assumes that the
 underlying persistence store is always (not just eventually)
 consistent.


Ok, I'm looking into ways to address that. The other one, I suspect, is going to 
be the sequence ID on the journal event stream, which IIRC needs to be 
sequential so the journal can be replayed. Also, since the biggest source of 
eventual-consistency errors is going to be that journal stream, I am thinking 
that binding the journal to the PM might allow remote nodes to know when a local 
item is out of date (that is, when an item in the local PM is stale) and which 
server to get the latest copy of the item from. That assumes nothing changes 
without a journal record being emitted. I want to avoid sending the journal via 
storage.
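
A rough sketch of that idea, under stated assumptions: the journal hands out strictly increasing revision numbers, every change emits exactly one record, and each record names the node that holds the latest copy. Journal, Record and NodeView are hypothetical names for this example; none of them is the Jackrabbit or JGroups API, and a real journal would be distributed rather than an in-memory list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch: sequential journal revisions used to detect stale items in a local PM. */
class JournalSketch {

    /** One journal record: revision, changed item, and the node holding the latest copy. */
    static final class Record {
        final long revision;
        final String itemId;
        final String originNode;

        Record(long revision, String itemId, String originNode) {
            this.revision = revision;
            this.itemId = itemId;
            this.originNode = originNode;
        }
    }

    /** Hypothetical in-memory journal with revisions 1, 2, 3, ... */
    static final class Journal {
        private final List<Record> records = new ArrayList<>();

        synchronized Record append(String itemId, String originNode) {
            Record record = new Record(records.size() + 1, itemId, originNode);
            records.add(record);
            return record;
        }

        /** Records with a revision greater than the given one, in order. */
        synchronized List<Record> since(long revision) {
            return new ArrayList<>(records.subList((int) revision, records.size()));
        }
    }

    /** What one cluster node knows: revisions held locally and the latest journal record per item. */
    static final class NodeView {
        final Map<String, Long> localRevision = new HashMap<>();
        final Map<String, Record> latestKnown = new HashMap<>();
        long replayedUpTo = 0;

        void replay(Journal journal) {
            for (Record record : journal.since(replayedUpTo)) {
                latestKnown.put(record.itemId, record);
                replayedUpTo = record.revision;
            }
        }

        boolean isStale(String itemId) {
            Record latest = latestKnown.get(itemId);
            return latest != null && localRevision.getOrDefault(itemId, 0L) < latest.revision;
        }

        /** Node to ask for the latest copy, or null if the journal never mentioned the item. */
        String fetchFrom(String itemId) {
            Record latest = latestKnown.get(itemId);
            return latest == null ? null : latest.originNode;
        }
    }

    public static void main(String[] args) {
        Journal journal = new Journal();
        journal.append("item-a", "node1");
        journal.append("item-a", "node2");

        NodeView view = new NodeView();
        view.localRevision.put("item-a", 1L); // local PM still holds revision 1
        view.replay(journal);
        System.out.println(view.isStale("item-a") + " -> fetch from " + view.fetchFrom("item-a"));
    }
}
```

The gap-free numbering is what lets a node replay from its last seen revision; if records could arrive out of order, the since() lookup would need a different structure.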

 
 Do you have any pointers to the discussions so I can go and read ?
 
 It's been mostly coffee room discussions so far, but I'll bring up the
 topic soon on dev@ as a part of a larger Jackrabbit 3 roadmap
 discussion.

I commented on your JR3 RFI, but I am still thinking about JR2.

 
 BR,
 
 Jukka Zitting



Re: Casandara and Jackrabbit

2010-02-17 Thread Patricio Echagüe
Hi, I have written a Cassandra Persistence Manager but, as Jukka mentioned,
there are some issues to resolve.
The other problem I see is that Cassandra does not support transactions
across multiple keys.
The JR PM assumes that a change log is saved either entirely or not at all.
That represents another issue. (I assume a bundle approach.)

Just to keep the discussion going, this is my keyspace:

  <Keyspaces>
    <Keyspace Name="myCassandraPM">

      <ColumnFamily CompareWith="BytesType" Name="WORKSPACE_Bundle"/>
      <ColumnFamily CompareWith="BytesType" Name="WORKSPACE_Binval"/>
      <ColumnFamily CompareWith="BytesType" Name="WORKSPACE_Refs"/>

      <ColumnFamily CompareWith="BytesType" Name="VERSION_Bundle"/>
      <ColumnFamily CompareWith="BytesType" Name="VERSION_Binval"/>
      <ColumnFamily CompareWith="BytesType" Name="VERSION_Refs"/>

      <ColumnFamily CompareWith="BytesType" Name="SECURITY_Bundle"/>
      <ColumnFamily CompareWith="BytesType" Name="SECURITY_Binval"/>
      <ColumnFamily CompareWith="BytesType" Name="SECURITY_Refs"/>

    </Keyspace>
  </Keyspaces>

Patricio

On Wed, Feb 17, 2010 at 6:47 AM, Ian Boston i...@tfd.co.uk wrote:


 On 17 Feb 2010, at 14:34, Jukka Zitting wrote:

  Hi,
 
  On Wed, Feb 17, 2010 at 11:53 AM, Ian Boston i...@tfd.co.uk wrote:
  On 17 Feb 2010, at 10:43, Jukka Zitting wrote:
  I'm not aware of anything like that, though there's been some
  discussion about persistence on top of distributed databases or hash
  tables. The main problem with such approaches is the eventual
  consistency model that can be troublesome for the current Jackrabbit
  architecture.
 
  Is that because, in a cluster one JR node might get an event that an
  Item exists, but its not yet present on the backend its connected to,
  and there are no guarantees over the order in which items appear,
  so, for instance, the hierarchy manager might find a child but not
  the parent?
 
  Exactly. The current Jackrabbit architecture assumes that the
  underlying persistence store is always (not just eventually)
  consistent.


 Ok, I'm looking into ways to address that. The other one, I suspect is
 going to be the sequence ID on the journal event stream, which IIRC needs to
 be sequential so the journal can be replayed, also since the biggest source
 of eventually consistent errors is going to be that journal stream I am
 thinking binding the journal with the PM might allow remote nodes to know
 when a local item is out of date, if an item in the local PM is stale, and
 which server to get the latest item copy from. That assumes nothing changes
 without a journal record being emitted. I want to avoid sending the journal
 via storage.

 
  Do you have any pointers to the discussions so I can go and read ?
 
  It's been mostly coffee room discussions so far, but I'll bring up the
  topic soon on dev@ as a part of a larger Jackrabbit 3 roadmap
  discussion.

 I commented on you JR3 rfi, but I am still thinking about JR2

 
  BR,
 
  Jukka Zitting




-- 
Patricio.-