Partitioner type

2013-07-04 Thread Vivek Mishra
Hi,
Is it possible to know the type of partitioner programmatically at runtime?

-Vivek


Re: Partitioner type

2013-07-04 Thread Shubham Mittal
Yeah, it's possible.
It depends on which client you're using.

e.g. in pycassa (a Python client for Cassandra), I use:
 from pycassa.system_manager import SystemManager
 sys = SystemManager('hostname:portnumber')
 print(sys.describe_partitioner())




On Thu, Jul 4, 2013 at 5:32 PM, Vivek Mishra mishra.v...@gmail.com wrote:

 Hi,
 Is it possible to know the type of partitioner programmatically at runtime?

 -Vivek



Re: Partitioner type

2013-07-04 Thread Haithem Jarraya
Yes, you can query the local CF in the system keyspace:

 select partitioner from system.local;


H


On 4 July 2013 13:02, Vivek Mishra mishra.v...@gmail.com wrote:

 Hi,
 Is it possible to know the type of partitioner programmatically at runtime?

 -Vivek



Re: Partitioner type

2013-07-04 Thread Vivek Mishra
Just saw the Thrift API's describe_partitioner() method.

Thanks for the quick suggestions.

-Vivek


On Thu, Jul 4, 2013 at 5:40 PM, Haithem Jarraya
haithem.jarr...@struq.comwrote:

 Yes, you can query the local CF in the system keyspace:

  select partitioner from system.local;


 H


 On 4 July 2013 13:02, Vivek Mishra mishra.v...@gmail.com wrote:

 Hi,
 Is it possible to know the type of partitioner programmatically at runtime?

 -Vivek





Custom Partitioner Type

2012-08-13 Thread A J
Is it possible to use a custom partitioner type (other than RP or BOP)?
Say my row keys are all integers and I want all even keys to go to
node1 and odd keys to node2; is that feasible? How would I go about it?

Thanks.


Re: Custom Partitioner Type

2012-08-13 Thread aaron morton
Yes, you need to implement the org.apache.cassandra.dht.IPartitioner interface; 
there are a couple of abstract implementations you could base it on. 

 I want all even keys to go to
 node1 and odd keys to node2, is it feasible ?

I'm not endorsing the idea of doing this, but as a hack to see the effects 
you could use the BOP and format the keys as (Python):

str(key % 2) + "{0:010d}".format(key)

So all keys are 11 digit strings, even keys start with 0 and odd with 1. 
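As a quick sketch of that formatting hack (the helper name is hypothetical, just to illustrate the key layout described above):

```python
def bop_key(key):
    """Format an integer key for BOP: one parity digit followed by the
    zero-padded key, giving an 11-character string. Even keys start
    with 0 and odd keys with 1, so they sort to different ring ranges."""
    return str(key % 2) + "{0:010d}".format(key)

print(bop_key(42))  # "00000000042" -> even, starts with 0
print(bop_key(7))   # "10000000007" -> odd, starts with 1
```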

Hope that helps. 

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/08/2012, at 7:33 AM, A J s5a...@gmail.com wrote:

 Is it possible to use a custom partitioner type (other than RP or BOP)?
 Say my row keys are all integers and I want all even keys to go to
 node1 and odd keys to node2; is that feasible? How would I go about it?
 
 Thanks.



Re: Choosing a Partitioner Type for Random java.util.UUID Row Keys

2011-12-23 Thread aaron morton
No problems. 

IMHO you should develop a sizable bruise banging your head against a wall using 
standard CFs and the RandomPartitioner before using something else. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com


Re: Choosing a Partitioner Type for Random java.util.UUID Row Keys

2011-12-22 Thread Bryce Allen
Thanks, that definitely has advantages over using a super column. We
ran into thrift timeouts when the super column got large, and with the
super column range query there is no way (AFAIK) to batch the request at
the subcolumn level.

-Bryce


Re: Choosing a Partitioner Type for Random java.util.UUID Row Keys

2011-12-21 Thread aaron morton
AFAIK there are no plans to kill the BOP, but I would still try to make your life 
easier by using the RP. 

My understanding of the problem: at certain times you snapshot the files in a 
dir, and the main query you want to handle is "At what points between time t0 
and time t1 did files x, y and z exist?"

You could consider:

1) Partition the time series data across rows, making the row key the timestamp 
for the start of the partition. If you have rollup partitions, consider making 
the row key timestamp : partition_size, e.g. 123456789.1d for a 1-day 
partition that starts at 123456789.
2) In each row, use column names of the form timestamp : file_name, where 
timestamp is the time of the snapshot. 

To query between two times (t0 and t1):

1) Determine which partitions the time span covers; this will give you a list 
of rows. 
2) Execute a multi-get slice for all the rows, using t0:* and t1:* (I'm 
using * here as a null; check with your client to see how to use composite 
columns.)
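A rough sketch of those two steps in Python (the function names and the "start.1d" key format are assumptions based on the scheme above, not a fixed API):

```python
DAY = 86400  # seconds in a 1-day rollup partition

def partition_row_keys(t0, t1, partition=DAY):
    # Step 1: row keys for every partition the [t0, t1] span touches,
    # using the "start_timestamp.1d" naming from the example above.
    start = (t0 // partition) * partition
    keys = []
    while start <= t1:
        keys.append("%d.1d" % start)
        start += partition
    return keys

def slice_bounds(t0, t1):
    # Step 2: column slice bounds; "*" stands in for the null component
    # (how composite columns are expressed is client-specific).
    return ("%d:*" % t0, "%d:*" % t1)

print(partition_row_keys(100, 90000))  # ['0.1d', '86400.1d']
```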

Hope that helps. 
Aaron


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com


Re: Choosing a Partitioner Type for Random java.util.UUID Row Keys

2011-12-20 Thread Filipe Gonçalves
Generally, RandomPartitioner is the recommended one.
If you already provide randomized keys it doesn't make much of a
difference; the nodes should be balanced with any partitioner.
However, unless you have UUIDs in all keys of all column families
(highly unlikely), ByteOrderedPartitioner and
OrderPreservingPartitioner will lead to hotspots and unbalanced
rings.





-- 
Filipe Gonçalves


Re: Choosing a Partitioner Type for Random java.util.UUID Row Keys

2011-12-20 Thread Bryce Allen
I think it comes down to how much you benefit from row range scans, and
how confident you are that going forward all data will continue to use
random row keys.

I'm considering using BOP as a way of working around the non-indexed
super column limitation. In my current schema, row keys are random
UUIDs, super column names are timestamps, and columns contain a
snapshot in time of directory contents, and could be quite large. If
instead I use row keys that are (uuid)-(timestamp), and use a standard
column family, I can do a row range query and select only specific
columns. I'm still evaluating if I can do this with BOP - ideally the
token would just use the first 128 bits of the key, and I haven't found
any documentation on how it compares keys of different length.
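A minimal sketch of that (uuid)-(timestamp) row key layout (the byte layout is an assumption; as noted above, how BOP compares keys of different lengths is an open question):

```python
import struct
import uuid

def bop_row_key(u, timestamp):
    # 16 raw UUID bytes followed by a big-endian 64-bit timestamp, so all
    # rows for one UUID sort contiguously, and in time order, under BOP.
    return u.bytes + struct.pack(">Q", timestamp)

key = bop_row_key(uuid.uuid4(), 1324339200)
print(len(key))  # 24 bytes: 16 for the UUID, 8 for the timestamp
```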

Another trick with BOP is to use MD5(rowkey)-rowkey for data that has
non-uniform row keys. I think it's reasonable to use if most data is
uniform and benefits from range scans, but a few added items aren't
uniform or don't benefit. This trick does make the keys larger, which
increases storage cost and IO load, so it's probably a bad idea if a
significant subset of the data requires it.
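The MD5(rowkey)-rowkey trick might look like this (a sketch; the separator and exact layout are assumptions):

```python
import hashlib

def uniform_bop_key(rowkey):
    # Prefix the key with its MD5 digest so keys spread uniformly across
    # a BOP ring; the original key is kept after the digest so it can
    # still be recovered, at the cost of 17 extra bytes per key.
    return hashlib.md5(rowkey).digest() + b"-" + rowkey

print(uniform_bop_key(b"file42"))
```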

Disclaimer - I wrote that wiki article to fill in a documentation gap,
since there were no examples of BOP and I wasted a lot of time before I
noticed the hex byte array vs decimal distinction for specifying the
initial tokens (which to be fair is documented, just easy to miss on a
skim). I'm also new to Cassandra; I'm just describing what makes sense
to me on paper. FWIW I confirmed that random UUID (type 4) row keys
really do evenly distribute when using BOP.

-Bryce





Re: Choosing a Partitioner Type for Random java.util.UUID Row Keys

2011-12-20 Thread aaron morton
Bryce, 
	Have you considered using CompositeColumns and a standard CF? The row key 
is the UUID; the column name is (timestamp : dir_entry); you can then slice all 
columns with a particular timestamp. 

Even if you have a random key, I would use the RP unless you have an 
extreme use case. 

 Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com




Re: Choosing a Partitioner Type for Random java.util.UUID Row Keys

2011-12-20 Thread Bryce Allen
I wasn't aware of CompositeColumns, thanks for the tip. However I think
it still doesn't allow me to do the query I need - basically I need to
do a timestamp range query, limiting only to certain file names at
each timestamp. With BOP and a separate row for each timestamp,
prefixed by a random UUID, and file names as column names, I can do this
query. With CompositeColumns, I can only query one contiguous range, so
I'd have to know the timestamps beforehand to limit the file names. I
can resolve this using indexes, but on paper it looks like this would be
significantly slower (it would take me 5 round trips instead of 3 to
complete each query, and the query is made multiple times on every
single client request).

The two downsides I've seen listed for BOP are balancing issues and
hotspots. I can understand why RP is recommended from the balancing
issues alone. However these aren't problems for my application. Is
there anything else I am missing? Does the Cassandra team plan on
continuing to support BOP? I haven't completely ruled out RP, but I
like having BOP as an option, it opens up interesting modeling
alternatives that I think have real advantages for some
(if uncommon) applications.

Thanks,
Bryce





Choosing a Partitioner Type for Random java.util.UUID Row Keys

2011-12-19 Thread Drew Kutcharian
Hey Guys,

I just came across http://wiki.apache.org/cassandra/ByteOrderedPartitioner and 
it got me thinking. If the row keys are java.util.UUID which are generated 
randomly (and securely), then what type of partitioner would be the best? Since 
the key values are already random, would it make a difference to use 
RandomPartitioner, or could one use ByteOrderedPartitioner or 
OrderPreservingPartitioner as well and get the same result?

-- Drew



Re: about the partitioner type

2011-01-24 Thread aaron morton
The OrderPreservingPartitioner will treat the key byte array as a UTF8 string. 
Specifically it uses the nio Charset decoder 
http://download.oracle.com/javase/1.4.2/docs/api/java/nio/charset/Charset.html 
to turn the byte array into a string.

The ByteOrderedPartitioner will treat the key as a simple byte array. 
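An illustrative sketch of the difference in plain Python (not Cassandra internals): OPP decodes the key bytes into a UTF-8 string token, while BOP keeps and compares the raw bytes.

```python
key = "café".encode("utf-8")  # b'caf\xc3\xa9', 5 bytes on the wire

opp_token = key.decode("utf-8")  # OPP: compared as a string, 4 characters
bop_token = bytes(key)           # BOP: compared byte by byte, 5 bytes

print(opp_token, len(opp_token))  # café 4
print(len(bop_token))             # 5
```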

Aaron

On 24/01/2011, at 5:10 PM, raoyixuan (Shandy) wrote:

 How do you compare the key itself (as a byte array) to the tokens in the ring in 
 OrderPreservingPartitioner? As far as I know, the node whose token value is larger 
 than the key value handles the key under the random partitioner.
  



about the partitioner type

2011-01-23 Thread raoyixuan (Shandy)
Do the random/order partitioners specify the token for the node, not for the key?

华为技术有限公司 Huawei Technologies Co., Ltd.

Phone: 28358610
Mobile: 13425182943
Email: raoyix...@huawei.com
Huawei Technologies Co., Ltd.
Bantian, Longgang District, Shenzhen 518129, P.R. China
http://www.huawei.com


Re: about the partitioner type

2011-01-23 Thread Tyler Hobbs
http://www.datastax.com/docs/0.7/operations/clustering#tokens-partitioners-and-the-ring

- Tyler




RE: about the partitioner type

2011-01-23 Thread raoyixuan (Shandy)
How do you compare the key itself (as a byte array) to the tokens in the ring in 
OrderPreservingPartitioner? As far as I know, the node whose token value is larger 
than the key value handles the key under the random partitioner.
