cli composite type literal with empty string component
I have a CF defined like this in CLI syntax:

    create column family Test
      with key_validation_class = UTF8Type
      and comparator = 'CompositeType(AsciiType, UTF8Type)'
      and default_validation_class = UTF8Type
      and column_metadata = [
        { column_name : 'deleted:', validation_class : BooleanType },
        { column_name : 'version:', validation_class : LongType },
      ];

I expected these columns to map to ('deleted', '') and ('version', '') in pycassa, but this is not the case:

    >>> TEST.insert('r1', { ('deleted', ''): False, ('version', ''): 1, ('a', 'b'): 'c' })
    AttributeError: 'int' object has no attribute 'encode'
    >>> TEST.column_validators
    {'\x00\x07deleted\x00': 'BooleanType', '\x00\x07version\x00': 'LongType'}

The obvious workaround is to use pycassa to define the schema:

    SYSTEM_MANAGER.create_column_family('test', 'Test2',
        key_validation_class=UTF8_TYPE,
        comparator_type=CompositeType(ASCII_TYPE, UTF8_TYPE),
        default_validation_class=UTF8_TYPE,
        column_validation_classes={ ('version', ''): LONG_TYPE,
                                    ('deleted', ''): BOOLEAN_TYPE })

and this really does produce a different schema:

    >>> TEST2.column_validators
    {'\x00\x07version\x00\x00\x00\x00': 'LongType', '\x00\x07deleted\x00\x00\x00\x00': 'BooleanType'}

To mimic what the CLI does, I leave off the last component instead of using '':

    SYSTEM_MANAGER.create_column_family('test', 'Test3',
        key_validation_class=UTF8_TYPE,
        comparator_type=CompositeType(ASCII_TYPE, UTF8_TYPE),
        default_validation_class=UTF8_TYPE,
        column_validation_classes={ ('version',): LONG_TYPE,
                                    ('deleted',): BOOLEAN_TYPE })

    >>> TEST3.column_validators
    {'\x00\x07deleted\x00': 'BooleanType', '\x00\x07version\x00': 'LongType'}

But I see no way to address these columns from pycassa. I have a workaround, but I find the inconsistency perplexing, and would rather not do the busywork of converting my schema syntax. Is there a way to address columns with an empty string component in the CLI?

Thanks,
Bryce
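The validator keys shown above are just the raw CompositeType serialization of the column names: each component is encoded as a two-byte big-endian length, the component bytes, and a one-byte end-of-component marker. A small standalone sketch (an illustration inferred from the byte strings above, not pycassa internals) reproduces both observed encodings - the CLI's single-component 'deleted:' form and pycassa's explicit two-component ('deleted', '') form:

```python
import struct

def encode_composite(*components):
    """Encode string components the way CompositeType serializes column
    names: 2-byte big-endian length + bytes + end-of-component 0x00."""
    out = b""
    for c in components:
        raw = c.encode("ascii")
        out += struct.pack(">H", len(raw)) + raw + b"\x00"
    return out

# CLI's 'deleted:' yields a single encoded component and nothing more:
print(encode_composite("deleted"))        # b'\x00\x07deleted\x00'
# pycassa's ('deleted', '') additionally encodes an empty second component:
print(encode_composite("deleted", ""))    # b'\x00\x07deleted\x00\x00\x00\x00'
```

The extra `\x00\x00\x00` on the Test2 validators is exactly the encoded empty second component, which is why the two schemas sort and match differently.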
Re: cli composite type literal with empty string component
Never mind; the issue with addressing composite column names with empty components was fixed in the latest pycassa, which is why I was even able to create them in the Test3 schema below. I get an error in 1.2.1, which I had been running, but it all seems to work in 1.4.0.

-Bryce

On Wed, 8 Feb 2012 10:25:07 -0600 Bryce Allen bal...@ci.uchicago.edu wrote:
> I have a CF defined like this in CLI syntax: [...]
Re: cli composite type literal with empty string component
In case anyone else is curious about what is going on here:

https://github.com/pycassa/pycassa/issues/112

The links to the Cassandra JIRA are instructive.

-Bryce

On Wed, 8 Feb 2012 10:59:37 -0600 Bryce Allen bal...@ci.uchicago.edu wrote:
> Never mind; the issue with addressing composite column names with
> empty components was fixed in the latest pycassa [...]
Re: two dimensional slicing
to do the index lookup). It's definitely not much more complicated when using RP; I was caught up in some nuances of our old model when I wrote the last email.

-Bryce

aaron morton wrote:
> Could you re-write the entire list every version update?
>
>     CF: VersionedList
>     row: list_name:version
>     col_name: name
>     col_value: last updated version
>
> So you slice one row at the upper version and discard all the columns
> where the value is less than the lower version?
>
> Cheers
> -
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 27/01/2012, at 5:31 AM, Bryce Allen wrote:
> [...]
Re: two dimensional slicing
On Mon, 30 Jan 2012 11:14:37 -0600 Bryce Allen bal...@ci.uchicago.edu wrote:
> With RP, the idea is to query many versions in ListVersionIndex
> starting at the desired version going backward, hoping that it will
> hit a compact version. We could also maintain a separate
> CompactVersion index, and accept another query.

Actually, a better way to handle this is to store the latest compacted version with each delta version in the index. When doing compaction, all the deltas between it and the next compaction (or the end) are updated to point at the new compaction. E.g.:

    ts0:  20;20  - compacted version
    ts1:  21;20
    ts2:  22;20
    ...
    ts9:  29;20
    ts10: 30;20
    ts11: 31;20

compaction is done on version 30:

    ...
    ts9:  29;20
    ts10: 30;30  - new compacted version
    ts11: 31;30

Perhaps compaction is a bad term because it already has meaning in Cassandra, but I can't think of a better name at the moment.

-Bryce
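The pointer update above can be sketched in a few lines of plain Python (an illustration only; the in-memory dict stands in for the index CF, and the names are mine):

```python
def compact(index, new_compaction):
    """After compacting at version `new_compaction`, repoint the
    compacted entry and every later delta at it; earlier deltas keep
    pointing at the previous compaction."""
    for ts, (version, compacted) in index.items():
        if version >= new_compaction:
            index[ts] = (version, new_compaction)

# Each entry maps a timestamp to (delta_version, latest_compacted_version).
index = {"ts0": (20, 20), "ts1": (21, 20), "ts9": (29, 20),
         "ts10": (30, 20), "ts11": (31, 20)}

compact(index, 30)
print(index["ts9"])   # (29, 20)  - still points at the old compaction
print(index["ts10"])  # (30, 30)  - new compacted version
print(index["ts11"])  # (31, 30)
```

A single read at the query timestamp then yields both the target version and the compaction to start from, with no backward scan.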
Re: two dimensional slicing
Thanks, comments inline:

On Mon, 23 Jan 2012 20:59:34 +1300 aaron morton aa...@thelastpickle.com wrote:
> It depends a bit on the data and the query patterns.
>
> * How many versions do you have ?

We may have 10k versions in some cases, with up to a million names total in any given version, but more often 10K. To manage this we are currently using two CFs, one for storing compacted complete lists and one for storing deltas on the compacted list. Based on usage, we will create a new compacted list and start writing deltas against that. We should be able to limit the number of deltas in a single row to below 100; I'd like to keep it lower, but I'm not sure we can maintain that under all load scenarios. The compacted lists are straightforward, but there are many ways to structure the deltas, and they all have trade-offs. A CF with composite columns that supported two-dimensional slicing would be perfect.

> * How many names in each version ?

We plan on limiting to a total of 1 million names, and around 10,000 per version (by limiting the batch size), but many deltas will have 10 names.

> * When querying do you know the versions numbers you want to query
>   from ? How many are there normally?

Currently we don't know the version numbers in advance - they are timestamps, and we are querying for versions less than or equal to the desired timestamp. We have talked about using vector clock versions and maintaining an index mapping time to version numbers, in which case we would know the exact versions after the index lookup, at the expense of another RTT on every operation.

> * How frequent are the updates and the reads ?

We expect reads to be more frequent than writes. Unfortunately we don't have solid numbers on what to expect, but I would guess 20x. Update operations will involve several reads to determine where to write.

> I would lean towards using two standard CF's, one to list all the
> version numbers (in a single row probably) and one to hold the names
> in a particular version. To do your query slice the first CF and then
> run multi gets to the second. Thats probably not the best solution,
> if you can add some more info it may get better.

I'm actually leaning back toward BOP, as I run into more issues and complexity with the RP models. I'd really like to implement both and compare them, but at this point I need to focus on one to get things working, so I'm trying to make a best initial guess.

On 21/01/2012, at 6:20 AM, Bryce Allen wrote:
> [...]
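The compacted-lists-plus-deltas model described above has a simple read path: take the latest compacted list at or before the requested version, then apply later deltas in order. A toy sketch (my own illustration; plain dicts stand in for the two CFs, and all names are hypothetical):

```python
# One store holds full compacted lists keyed by version; the other holds
# deltas as (adds, removes) sets keyed by version.
compacted = {0: {"a", "b"}, 10: {"a", "c", "d"}}
deltas = {3: ({"c"}, set()), 7: ({"d"}, {"b"}), 12: ({"e"}, {"a"})}

def names_at(version):
    """Rebuild the name list at `version`: start from the latest
    compaction <= version, then apply later deltas in order."""
    base = max(v for v in compacted if v <= version)
    names = set(compacted[base])
    for v in sorted(deltas):
        if base < v <= version:
            adds, removes = deltas[v]
            names = (names | adds) - removes
    return names

print(sorted(names_at(8)))   # ['a', 'c', 'd']
print(sorted(names_at(12)))  # ['c', 'd', 'e']
```

Keeping the delta count per compaction low (the "below 100" target above) bounds how much of this replay any single read has to do.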
two dimensional slicing
I'm storing very large versioned lists of names, and I'd like to query a range of names within a given range of versions - a two-dimensional slice - in a single query. This is easy to do using ByteOrderedPartitioner, but seems to require multiple (non-parallel) queries and extra CFs when using RandomPartitioner.

I see two approaches when using RP:

1) Data is stored in a super column family, with one dimension being the super column names and the other the sub column names. Since slicing on sub columns requires a list of super column names, a second standard CF is needed to get a range of names before doing a query on the main super CF. With CASSANDRA-2710, the same is possible using a standard CF with composite types instead of a super CF.

2) If one of the dimensions is small, a two-dimensional slice isn't required. The data can be stored in a standard CF with linear ordering on a composite type (large_dimension, small_dimension). Data is queried based on the large dimension, and the client throws out the extra data in the other dimension.

Neither of the above solutions is ideal. Does anyone else have a use case where two-dimensional slicing is useful? Given the disadvantages of BOP, is it practical to make the composite column query model richer to support this sort of use case?

Thanks,
Bryce
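Approach 2 above can be sketched with plain Python (an illustration only, not Cassandra code): columns sit in one linear order keyed by the composite (version, name), the slice covers the large dimension, and the client filters the small dimension afterward.

```python
import bisect

# Columns linearly ordered by a composite (version, name) - approach 2.
columns = sorted([
    (1, "a"), (1, "b"), (2, "a"), (2, "c"), (3, "b"), (4, "a"),
])

def slice_2d(columns, v_lo, v_hi, names):
    """One contiguous slice on the large dimension (version); the
    client then discards names outside the wanted set."""
    lo = bisect.bisect_left(columns, (v_lo,))
    hi = bisect.bisect_right(columns, (v_hi, chr(0x10FFFF)))
    return [(v, n) for v, n in columns[lo:hi] if n in names]

print(slice_2d(columns, 1, 2, {"a"}))  # [(1, 'a'), (2, 'a')]
```

The wasted work is everything fetched and discarded by the filter, which is why this only pays off when the small dimension really is small.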
Re: How to reliably achieve unique constraints with Cassandra?
On Fri, 6 Jan 2012 10:38:17 -0800 Mohit Anchlia mohitanch...@gmail.com wrote:
> It could be as simple as reading before writing to make sure that
> email doesn't exist. But I think you are looking at how to handle 2
> concurrent requests for same email? Only way I can think of is:
>
> 1) Create new CF say tracker
> 2) write email and time uuid to CF tracker
> 3) read from CF tracker
> 4) if you find a row other than yours then wait and read again from
>    tracker after few ms
> 5) read from USER CF
> 6) write if no rows in USER CF
> 7) delete from tracker
>
> Please note you might have to modify this logic a little bit, but
> this should give you some ideas of how to approach this problem
> without locking.

Distributed locking is pretty subtle; I haven't seen a correct solution that uses just Cassandra, even with QUORUM read/write. I suspect it's not possible.

With the above proposal, in step 4 two processes could both have inserted an entry in the tracker before either gets a chance to check, so you need a way to order the requests. I don't think the timestamp works for ordering, because it's set by the client (even the internal timestamp is set by the client), and will likely be different from when the data is actually committed and available to read by other clients. For example:

* At time 0ms, client 1 starts insert of u...@example.org
* At time 1ms, client 2 also starts insert for u...@example.org
* At time 2ms, client 2 data is committed
* At time 3ms, client 2 reads tracker and sees that it's the only one, so enters the critical section
* At time 4ms, client 1 data is committed
* At time 5ms, client 1 reads tracker, and sees that it's not the only one, but since it has the lowest timestamp (0ms vs 1ms), it enters the critical section.

I don't think Cassandra counters work for ordering either.

This approach is similar to the Zookeeper lock recipe:
http://zookeeper.apache.org/doc/current/recipes.html#sc_recipes_Locks
but Zookeeper has sequence nodes, which provide a consistent way of ordering the requests. Zookeeper also avoids the busy waiting.

I'd be happy to be proven wrong. But even if it is possible, if it involves a lot of complexity and busy waiting, it's probably not worth it. There's a reason people are using Zookeeper with Cassandra.

-Bryce
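To make the interleaving concrete, here is a toy replay of the tracker protocol's failure (my own illustration, not real Cassandra client code): each client enters the critical section if it is alone in the tracker or holds the lowest timestamp, and because commits land out of timestamp order, both get in.

```python
# The tracker holds (client, timestamp) entries as they become visible.
tracker = []
entered = []

def try_enter(client, ts):
    """Enter the critical section if alone, or if holding the lowest
    timestamp among tracker entries - the rule the recipe implies."""
    others = [t for c, t in tracker if c != client]
    if not others or ts < min(others):
        entered.append(client)

# t=2ms: client 2's entry (timestamped 1ms) is committed first
tracker.append(("client2", 1))
# t=3ms: client 2 checks, sees only itself, enters
try_enter("client2", 1)
# t=4ms: client 1's entry (timestamped 0ms) finally commits
tracker.append(("client1", 0))
# t=5ms: client 1 checks, is not alone, but has the lowest timestamp
try_enter("client1", 0)

print(entered)  # ['client2', 'client1'] - both in the critical section
```

The ordering key (client timestamp) and the visibility order (commit time) disagree, which is exactly the gap that ZooKeeper's server-assigned sequence nodes close.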
Re: How to reliably achieve unique constraints with Cassandra?
On Fri, 6 Jan 2012 10:03:38 -0800 Drew Kutcharian d...@venarc.com wrote:
> I know that this can be done using a lock manager such as ZooKeeper
> or HazelCast, but the issue with using either of them is that if
> ZooKeeper or HazelCast is down, then you can't be sure about the
> reliability of the lock. So this potentially, in the very rare
> instance where the lock manager is down and two users are registering
> with the same email, can cause major issues.

For most applications, if the lock manager is down, you don't acquire the lock, so you don't enter the critical section. Rather than allowing inconsistency, you become unavailable (at least to writes that require a lock).

-Bryce
Re: How to reliably achieve unique constraints with Cassandra?
This looks like it:
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Implementing-locks-using-cassandra-only-tp5527076p5527076.html

There are also some interesting JIRA tickets related to locking/CAS:
https://issues.apache.org/jira/browse/CASSANDRA-2686
https://issues.apache.org/jira/browse/CASSANDRA-48

-Bryce

On Fri, 06 Jan 2012 14:53:21 -0600 Jeremiah Jordan jeremiah.jor...@morningstar.com wrote:
> Correct, any kind of locking in Cassandra requires clocks that are in
> sync, and requires you to wait the possible clock out-of-sync time
> before reading to check if you got the lock, to prevent the issue you
> describe below. There was a pretty detailed discussion of locking
> with only Cassandra a month or so back on this list.
>
> -Jeremiah
>
> On 01/06/2012 02:42 PM, Bryce Allen wrote:
> [...]
Re: How to reliably achieve unique constraints with Cassandra?
I don't think it's just clock drift; there is also the period of time between when the client selects a timestamp and when the data ends up committed to Cassandra. That drift seems harder to control when the nodes and/or clients are under load.

I agree that it would be nice to have something like this in Cassandra core, but from the JIRA tickets it looks like this has been tried before, and for various reasons was not added. It's definitely non-trivial to get right.

On Fri, 6 Jan 2012 13:33:02 -0800 Mohit Anchlia mohitanch...@gmail.com wrote:
> This looks like the right way to do it. But remember this still
> doesn't guarantee anything if your clock drifts way too much. But
> it's a trade-off with having to manage one additional component or
> use something internal to C*. It would be good to see similar
> functionality implemented in C* so that clients don't have to deal
> with it explicitly.
>
> On Fri, Jan 6, 2012 at 1:16 PM, Bryce Allen bal...@ci.uchicago.edu wrote:
> [...]
Re: How to reliably achieve unique constraints with Cassandra?
That's a good question, and I'm not sure - I'm fairly new to both ZK and Cassandra. I found this wiki page:
http://wiki.apache.org/hadoop/ZooKeeper/FailureScenarios
and I think the lock recipe still works even if a stale read happens, assuming that wiki page is correct. There is still subtlety to locking with ZK, though - see the "Locks based on ephemeral nodes" thread from the ZK mailing list in October:
http://mail-archives.apache.org/mod_mbox/zookeeper-user/201110.mbox/thread?0

-Bryce

On Fri, 6 Jan 2012 13:36:52 -0800 Drew Kutcharian d...@venarc.com wrote:
> Bryce,
>
> I'm not sure about ZooKeeper, but I know if you have a partition
> between HazelCast nodes, the nodes can acquire the same lock
> independently in each divided partition. How does ZooKeeper handle
> this situation?
>
> -- Drew
>
> On Jan 6, 2012, at 12:48 PM, Bryce Allen wrote:
> [...]
Re: Choosing a Partitioner Type for Random java.util.UUID Row Keys
Thanks, that definitely has advantages over using a super column. We ran into thrift timeouts when the super column got large, and with the super column range query there is no way (AFAIK) to batch the request at the subcolumn level. -Bryce On Thu, 22 Dec 2011 10:06:58 +1300 aaron morton aa...@thelastpickle.com wrote: AFAIK there are no plans kill the BOP, but I would still try to make your life easier by using the RP. . My understanding of the problem is at certain times you snapshot the files in a dir; and the main query you want to handle is At what points between time t0 and time t1 did files x,y and z exist?. You could consider: 1) Partitioning the time series data in across each row, then make the row key is the timestamp for the start of the partition. If you have rollup partitions consider making the row key timestamp : partition_size , e.g. 123456789.1d for a 1 day partition that starts at 123456789 2) In each row use column names that have the form timestamp : file_name where time stamp is the time of the snapshot. To query between two times (t0 and t1): 1) Determine which partitions the time span covers, this will give you a list of rows. 2) Execute a multi-get slice for the all rows using t0:* and t1:* (I'm using * here as a null, check with your client to see how to use composite columns.) Hope that helps. Aaron - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 21/12/2011, at 9:03 AM, Bryce Allen wrote: I wasn't aware of CompositeColumns, thanks for the tip. However I think it still doesn't allow me to do the query I need - basically I need to do a timestamp range query, limiting only to certain file names at each timestamp. With BOP and a separate row for each timestamp, prefixed by a random UUID, and file names as column names, I can do this query. With CompositeColumns, I can only query one contiguous range, so I'd have to know the timestamps before hand to limit the file names. 
I can resolve this using indexes, but on paper it looks like this would be significantly slower (it would take me 5 round trips instead of 3 to complete each query, and the query is made multiple times on every single client request). The two down sides I've seen listed for BOP are balancing issues and hotspots. I can understand why RP is recommended, from the balancing issues alone. However these aren't problems for my application. Is there anything else I am missing? Does the Cassandra team plan on continuing to support BOP? I haven't completely ruled out RP, but I like having BOP as an option, it opens up interesting modeling alternatives that I think have real advantages for some (if uncommon) applications. Thanks, Bryce On Wed, 21 Dec 2011 08:08:16 +1300 aaron morton aa...@thelastpickle.com wrote: Bryce, Have you considered using CompositeColumns and a standard CF? Row key is the UUID column name is (timestamp : dir_entry) you can then slice all columns with a particular time stamp. Even if you have a random key, I would use the RP unless you have an extreme use case. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 21/12/2011, at 3:06 AM, Bryce Allen wrote: I think it comes down to how much you benefit from row range scans, and how confident you are that going forward all data will continue to use random row keys. I'm considering using BOP as a way of working around the non indexes super column limitation. In my current schema, row keys are random UUIDs, super column names are timestamps, and columns contain a snapshot in time of directory contents, and could be quite large. If instead I use row keys that are (uuid)-(timestamp), and use a standard column family, I can do a row range query and select only specific columns. I'm still evaluating if I can do this with BOP - ideally the token would just use the first 128 bits of the key, and I haven't found any documentation on how it compares keys of different length. 
Another trick with BOP is to use MD5(rowkey)-rowkey for data that has non-uniform row keys. I think it's reasonable to use if most data is uniform and benefits from range scans, but a few items are added that aren't uniform or don't benefit. This trick does make the keys larger, which increases storage cost and IO load, so it's probably a bad idea if a significant subset of the data requires it.

Disclaimer - I wrote that wiki article to fill in a documentation gap, since there were no examples of BOP, and I wasted a lot of time before I noticed the hex byte array vs decimal distinction for specifying the initial tokens (which, to be fair, is documented, just easy to miss on a skim). I'm also new to Cassandra; I'm just describing what makes sense to me on paper. FWIW, I confirmed that random UUID (type 4) row keys really do evenly distribute when using BOP.
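The MD5(rowkey)-rowkey trick above can be sketched as follows. This is a hedged illustration of the key construction only; the helper name is mine, and reads would need to reapply the same transformation.

```python
import hashlib

def bop_key(row_key):
    # Prefix the natural key with its MD5 digest so ByteOrderedPartitioner
    # sees uniformly distributed leading bytes. The original key is kept as
    # a suffix so it can still be recovered from the stored key.
    digest = hashlib.md5(row_key.encode("utf-8")).hexdigest()
    return digest + "-" + row_key
```

Note the cost mentioned above: every key grows by a 32-character hex digest plus a separator, which is pure overhead on storage and IO.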
Re: Choosing a Partitioner Type for Random java.util.UUID Row Keys
I think it comes down to how much you benefit from row range scans, and how confident you are that going forward all data will continue to use random row keys. I'm considering using BOP as a way of working around the lack of indexes on super columns. In my current schema, row keys are random UUIDs, super column names are timestamps, and columns contain a snapshot in time of directory contents, which could be quite large. If I instead use row keys that are (uuid)-(timestamp) and a standard column family, I can do a row range query and select only specific columns. I'm still evaluating whether I can do this with BOP - ideally the token would just use the first 128 bits of the key, and I haven't found any documentation on how it compares keys of different length.

Another trick with BOP is to use MD5(rowkey)-rowkey for data that has non-uniform row keys. I think it's reasonable to use if most data is uniform and benefits from range scans, but a few items are added that aren't uniform or don't benefit. This trick does make the keys larger, which increases storage cost and IO load, so it's probably a bad idea if a significant subset of the data requires it.

Disclaimer - I wrote that wiki article to fill in a documentation gap, since there were no examples of BOP, and I wasted a lot of time before I noticed the hex byte array vs decimal distinction for specifying the initial tokens (which, to be fair, is documented, just easy to miss on a skim). I'm also new to Cassandra; I'm just describing what makes sense to me on paper. FWIW, I confirmed that random UUID (type 4) row keys really do evenly distribute when using BOP.

-Bryce

On Mon, 19 Dec 2011 19:01:00 -0800 Drew Kutcharian d...@venarc.com wrote:

Hey Guys,
I just came across http://wiki.apache.org/cassandra/ByteOrderedPartitioner and it got me thinking. If the row keys are java.util.UUID, which are generated randomly (and securely), then what type of partitioner would be the best?
Since the key values are already random, would it make a difference to use RandomPartitioner, or could one use ByteOrderedPartitioner or OrderPreservingPartitioner as well and get the same result?

-- Drew
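Bryce's confirmation upthread that type 4 UUID keys distribute evenly under BOP is easy to sanity-check. The sketch below (my own check, not from the thread) buckets random UUIDs by their first four bits, the bits a byte-ordered token range would split on first:

```python
import uuid
from collections import Counter

def first_nibble_counts(n=16000):
    # Bucket n random (type 4) UUIDs by the high 4 bits of their first byte.
    # Under BOP, token ranges split on leading key bytes, so these buckets
    # approximate how keys would spread across 16 evenly spaced tokens.
    return Counter(uuid.uuid4().bytes[0] >> 4 for _ in range(n))

counts = first_nibble_counts()
# Each of the 16 buckets should hold roughly n/16 keys, since the version
# bits of a type 4 UUID live in byte 6, not the leading byte.
```

This only demonstrates uniformity of the keys themselves; it says nothing about hotspots from skewed access patterns, which is the other BOP concern raised above.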
Re: Choosing a Partitioner Type for Random java.util.UUID Row Keys
I wasn't aware of CompositeColumns, thanks for the tip. However, I think it still doesn't allow me to do the query I need - basically I need to do a timestamp range query, limiting only to certain file names at each timestamp. With BOP and a separate row for each timestamp, prefixed by a random UUID, and file names as column names, I can do this query. With CompositeColumns, I can only query one contiguous range, so I'd have to know the timestamps beforehand to limit the file names.

I can resolve this using indexes, but on paper it looks like it would be significantly slower (it would take me 5 round trips instead of 3 to complete each query, and the query is made multiple times on every single client request).

The two downsides I've seen listed for BOP are balancing issues and hotspots. I can understand why RP is recommended, from the balancing issues alone. However, these aren't problems for my application. Is there anything else I am missing? Does the Cassandra team plan on continuing to support BOP?

I haven't completely ruled out RP, but I like having BOP as an option; it opens up interesting modeling alternatives that I think have real advantages for some (if uncommon) applications.

Thanks,
Bryce

On Wed, 21 Dec 2011 08:08:16 +1300 aaron morton aa...@thelastpickle.com wrote:

Bryce,
Have you considered using CompositeColumns and a standard CF? The row key is the UUID, the column name is (timestamp : dir_entry), and you can then slice all columns with a particular timestamp.

Even if you have a random key, I would use the RP unless you have an extreme use case.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 21/12/2011, at 3:06 AM, Bryce Allen wrote:

I think it comes down to how much you benefit from row range scans, and how confident you are that going forward all data will continue to use random row keys. I'm considering using BOP as a way of working around the lack of indexes on super columns.
In my current schema, row keys are random UUIDs, super column names are timestamps, and columns contain a snapshot in time of directory contents, which could be quite large. If I instead use row keys that are (uuid)-(timestamp) and a standard column family, I can do a row range query and select only specific columns. I'm still evaluating whether I can do this with BOP - ideally the token would just use the first 128 bits of the key, and I haven't found any documentation on how it compares keys of different length.

Another trick with BOP is to use MD5(rowkey)-rowkey for data that has non-uniform row keys. I think it's reasonable to use if most data is uniform and benefits from range scans, but a few items are added that aren't uniform or don't benefit. This trick does make the keys larger, which increases storage cost and IO load, so it's probably a bad idea if a significant subset of the data requires it.

Disclaimer - I wrote that wiki article to fill in a documentation gap, since there were no examples of BOP, and I wasted a lot of time before I noticed the hex byte array vs decimal distinction for specifying the initial tokens (which, to be fair, is documented, just easy to miss on a skim). I'm also new to Cassandra; I'm just describing what makes sense to me on paper. FWIW, I confirmed that random UUID (type 4) row keys really do evenly distribute when using BOP.

-Bryce

On Mon, 19 Dec 2011 19:01:00 -0800 Drew Kutcharian d...@venarc.com wrote:

Hey Guys,
I just came across http://wiki.apache.org/cassandra/ByteOrderedPartitioner and it got me thinking. If the row keys are java.util.UUID, which are generated randomly (and securely), then what type of partitioner would be the best? Since the key values are already random, would it make a difference to use RandomPartitioner, or could one use ByteOrderedPartitioner or OrderPreservingPartitioner as well and get the same result?

-- Drew
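The (uuid)-(timestamp) row-key layout described above can be sketched as follows. Zero-padding the timestamp to a fixed width sidesteps the open question about how BOP compares keys of different lengths, because all keys for a given directory then have the same length and byte order matches numeric order. The helper names and the 12-digit width are my assumptions.

```python
import uuid

def snapshot_row_key(dir_uuid, ts):
    # Fixed-width decimal timestamp so lexicographic byte order equals
    # numeric order, and BOP never compares keys of different lengths.
    return "%s-%012d" % (dir_uuid, ts)

def scan_bounds(dir_uuid, t0, t1):
    # Start/end keys for a BOP row range scan over one directory's
    # snapshots between t0 and t1 inclusive.
    return snapshot_row_key(dir_uuid, t0), snapshot_row_key(dir_uuid, t1)
```

With keys built this way, each snapshot becomes its own row, so the range scan can also request only the specific file-name columns needed - the property Bryce wants and that a single composite slice can't provide.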