Re: Is column update column-atomic or row atomic?

2011-03-16 Thread Peter Schuller
 Sorry for the rather primitive question, but it's not clear to me if I need
 to fetch the whole row, add a column as a dictionary entry and re-insert it
 if I want to expand the row by one column. Help will be appreciated.

As was pointed out, reading and re-inserting is definitely not the way
to go. But note that when inserting a column, there is never going to
be a guarantee that other columns are not inserted/deleted
concurrently by other writers (unless there is external
synchronization).

Your question makes me believe you're trying to ensure some kind of
consistency across multiple columns in a row. Maybe describe your
use-case and we can suggest an approach.

-- 
/ Peter Schuller


Re: Getting list of active cassandra nodes

2011-03-16 Thread aaron morton
moving to user list. 

describe_ring() will give you a list of the token ranges and the nodes that are 
responsible for them http://wiki.apache.org/cassandra/API . It does not include 
information on which nodes are up or down or bootstrapping. 

Information about the state of the nodes is available on the StorageService JMX 
MBean.
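
For example, a minimal sketch of calling describe_ring from Python, using the
Thrift bindings generated from cassandra.thrift (the module layout and the
keyspace name 'Keyspace1' are assumptions, not something from this thread):

    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from cassandra import Cassandra  # generated Thrift interface; location may differ

    socket = TSocket.TSocket('localhost', 9160)
    transport = TTransport.TFramedTransport(socket)  # 0.7 uses framed transport by default
    client = Cassandra.Client(TBinaryProtocol.TBinaryProtocol(transport))
    transport.open()

    # describe_ring returns one TokenRange per range; endpoints lists the
    # nodes responsible for that range.
    for token_range in client.describe_ring('Keyspace1'):
        print token_range.start_token, token_range.end_token, token_range.endpoints

    transport.close()

Note this only tells you which endpoints own which ranges; as above, up/down state
still has to come from the StorageService JMX MBean (or nodetool ring).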

Aaron

On 16 Mar 2011, at 15:10, Anurag Gujral wrote:

 Hi All,
  How can I get the list of active Cassandra nodes using the Cassandra
 API in 0.7?
 
 Thanks a ton,
 Anurag



swap setting on linux

2011-03-16 Thread ruslan usifov
Dear community!

Please share your settings for swap on a Linux box.


Re: swap setting on linux

2011-03-16 Thread Maki Watanabe
According to the Cassandra wiki, the best strategy is no swap at all.
http://wiki.apache.org/cassandra/MemtableThresholds#Virtual_Memory_and_Swap

2011/3/16 ruslan usifov ruslan.usi...@gmail.com:
 Dear community!

 Please share your settings for swap on a Linux box.

-- 
w3m


Upgrade to a different version?

2011-03-16 Thread Jake Maizel
We are running 0.6.6 and are considering upgrading to either 0.6.8 or
one of the 0.7.x releases.  What is the recommended version and
procedure?  What are the issues we face?  Are there any specific
storage gotchas we need to be aware of?  Are there any docs around
this process for review?

Thanks,

jake

-- 
Jake Maizel
Soundcloud

Mail & GTalk: j...@soundcloud.com
Skype: jakecloud

Rosenthaler strasse 13, 101 19, Berlin, DE


Re: where to find the stress testing programs?

2011-03-16 Thread Eric Gilmore
There are both Python and Java stress testing tools.  I found the Java
version easier to use.  These directions (which echo the README for
stress.java) may help get you going:
http://www.datastax.com/docs/0.7/utilities/stress_java

On Tue, Mar 15, 2011 at 9:25 AM, Jeremy Hanna jeremy.hanna1...@gmail.comwrote:

 contrib is only in the source download of cassandra

 On Mar 15, 2011, at 11:23 AM, Jonathan Colby wrote:

  According to the Cassandra wiki and the O'Reilly book, there is supposedly a
  contrib directory within the Cassandra download containing the
  Python stress test script stress.py.  It's not in the binary tarball
  of 0.7.3.
 
  Anyone know where to find it?
 
  Anyone know of other, maybe better stress testing scripts?
 
  Jon




Re: Upgrade to a different version?

2011-03-16 Thread Paul Pak
Hi Jake,

I'm sending this privately, because I wanted to tell you my opinion frankly.

I don't know about the .6 series or .74, but so far, all of the .7
series of cassandra has been a disaster.  I would think twice about
switching to anything in the .7 series in production until things stabilize
and at least one reasonably large site starts using cassandra .7. 
Jonathan claims reddit is using cassandra, but it can't be a good
experience with the type of bugs that have been found.

.70 had data corruption issues
.71 also had data corruption issues, had major issues with anything over
2 gigs in memory
.72 issues with reading properly
.73 had major issues with anything over 2 gigs in memory, had issues
with performance due to flushing rules being broken, many people had
huge issues with large amounts of insertions, and a few had startup issues.
.74 too new to say.

In either case, do a lot of testing for your use case before switching
as things in the .7 series are still way in development.  I've talked to
Jonathan about putting it into beta status because of the severity of
the bugs, but so far, there has been no decision to do so.  Good luck.

Paul

On 3/16/2011 1:21 PM, Jake Maizel wrote:
 We are running 0.6.6 and are considering upgrading to either 0.6.8 or
 one of the 0.7.x releases.  What is the recommended version and
 procedure?  What are the issues we face?  Are there any specific
 storage gotchas we need to be aware of?  Are there any docs around
 this process for review?

 Thanks,

 jake




memory usage for secondary indexes

2011-03-16 Thread aaron morton
Was just reading through the code to get an understanding of the memory impact 
for secondary indexes. The index CF is created with the same memtable settings 
as the parent CF (in CFMetaData.newIndexMetadata). 

Does this mean that when estimating JVM heap size each index should be 
considered as another CF? I'll update the wiki with the answer 
http://wiki.apache.org/cassandra/MemtableThresholds 
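
If the answer is yes, a rough sizing sketch could look like this (this just encodes
the rule-of-thumb heap estimate from that wiki page as I recall it, and counts each
index as one more hot CF; the numbers below are made-up placeholders):

    # memtable_throughput_in_mb * 3 * (hot CFs + index CFs) + ~1GB overhead + caches
    def estimated_heap_mb(memtable_throughput_mb, hot_cfs, index_cfs,
                          key_cache_mb=0, row_cache_mb=0):
        return (memtable_throughput_mb * 3 * (hot_cfs + index_cfs)
                + 1024 + key_cache_mb + row_cache_mb)

    print estimated_heap_mb(memtable_throughput_mb=64, hot_cfs=4, index_cfs=2)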

Cheers
Aaron
 

Re: On 0.6.6 to 0.7.3 migration, DC-aware traffic and minimising data transfer

2011-03-16 Thread Jonathan Ellis
That should work then, assuming SimpleStrategy/RackUnawareStrategy.
Otherwise figuring out which machines share which data gets
complicated.

Note that if you have room on the machines, it's going to be faster to
copy the entire data set to each machine and run cleanup, than to have
repair fix 3 of 4 replicas from scratch.  Repair would work,
eventually, but it's kind of a worst-case scenario for it.

On Mon, Mar 14, 2011 at 10:39 AM, Jedd Rashbrooke j...@visualdna.com wrote:
  Jonathon, thank you for your answers here.

  To explain this bit ...

 On 11 March 2011 20:46, Jonathan Ellis jbel...@gmail.com wrote:
 On Thu, Mar 10, 2011 at 6:06 AM, Jedd Rashbrooke j...@visualdna.com wrote:
  Copying a cluster between AWS DC's:
  We have ~ 150-250GB per node, with a Replication Factor of 4.
  I ack that 0.6 -> 0.7 is necessarily STW, so in an attempt to
  minimise that outage period I was wondering if it's possible to
  drain & stop the cluster, then copy over only the 1st, 5th, 9th,
  and 13th nodes' worth of data (which should be a full copy of
  all our actual data - we are nicely partitioned, despite the
  disparity in GB per node) and have Cassandra re-populate the
  new destination 16 nodes from those four data sets.  If this is
  feasible, is it likely to be more expensive (in terms of time the
  new cluster is unresponsive as it rebuilds) than just copying
  across all 16 sets of data - about 2.7TB.

 I'm confused.  You're trying to upgrade and add a DC at the same time?

  Yeah, I know, it's probably not the sanest route - but the hardware
  (virtualised, Amazonish EC2 that it is) will be the same between
  the two sites, so that reduces some of the usual roll in / roll out
  migration risk.

  But more importantly for us it would mean we'd have just the
  one major outage, rather than two (relocation and 0.6 -> 0.7)

  cheers,
  Jedd.




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: Upgrade to a different version?

2011-03-16 Thread Paul Pak
Sorry guys, that was meant to be private.  My opinion stands, but I
didn't want to hurt any of the devs' feelings by being too frank.  I
think the progress has been good in new features, but I feel we have
taken a step back in reliability and scalability since so many features
were added without adequate testing.  Hopefully, at some point soon, it
will get better, and a data import job won't bring a cassandra
cluster to its knees and we won't experience stop-the-world GC issues
and out-of-memory errors from routine usage.

Paul

On 3/16/2011 2:13 PM, Paul Pak wrote:
 Hi Jake,

 I'm sending this privately, because I wanted to tell you my opinion frankly.

 I don't know about the .6 series or .74, but so far, all of the .7
 series of cassandra has been a disaster.  I would think twice about
 switching to anything in .7 series to production until things stabilize
 and at least one reasonably large site starts using cassandra .7. 
 Jonathan claims reddit is using cassandra, but it can't be a good
 experience with the type of bugs that have been found.

 .70 had data corruption issues
 .71 also had data corruption issues, had major issues with anything over
 2 gigs in memory
 .72 issues with reading properly
 .73 had major issues with anything over 2 gigs in memory, had issues
 with performance due to flushing rules being broken, many people had
 huge issues with large amounts of insertions, and a few had startup issues.
 .74 too new to say.

 In either case, do a lot of testing for your use case before switching
 as things in the .7 series are still way in development.  I've talked to
 Jonathan about putting it into beta status because of the severity of
 the bugs, but so far, there has been no decision to do so.  Good luck.

 Paul

 On 3/16/2011 1:21 PM, Jake Maizel wrote:
 We are running 0.6.6 and are considering upgrading to either 0.6.8 or
 one of the 0.7.x releases.  What is the recommended version and
 procedure?  What are the issues we face?  Are there any specific
 storage gotchas we need to be aware of?  Are there any docs around
 this process for review?

 Thanks,

 jake





replace one node with another

2011-03-16 Thread ruslan usifov
Hello

For example, if we want to replace one server with another, with an IP address
change too, what is the easiest way to do that? For now we do nodetool removetoken,
then set autobootstrap: true on the new server (with the token that was on the old node).


Re: Upgrade to a different version?

2011-03-16 Thread Joshua Partogi
So did you downgrade it back to the 0.6.x series?

On Thu, Mar 17, 2011 at 6:36 AM, Paul Pak p...@yellowseo.com wrote:
 Sorry guys, that was meant to be private.  My opinion stands, but I
 didn't want to hurt any of the dev's feelings by being too frank.  I
 think the progress has been good in new features, but I feel we have
 taken a step back in relability and scalability since so many features
 were added without adequate testing.  Hopefully, at some point soon, it
 will get better and doing a data import job won't take a cassandra
 cluster to it's knees or we won't experience stop the world GC issues
 and have out of memory errors from routine usage.

 Paul

 On 3/16/2011 2:13 PM, Paul Pak wrote:
 Hi Jake,

 I'm sending this privately, because I wanted to tell you my opinion frankly.

 I don't know about the .6 series or .74, but so far, all of the .7
 series of cassandra has been a disaster.  I would think twice about
 switching to anything in .7 series to production until things stabilize
 and at least one reasonably large site starts using cassandra .7.
 Jonathan claims reddit is using cassandra, but it can't be a good
 experience with the type of bugs that have been found.

 .70 had data corruption issues
 .71 also had data corruption issues, had major issues with anything over
 2 gigs in memory
 .72 issues with reading properly
 .73 had major issues with anything over 2 gigs in memory, had issues
 with performance due to flushing rules being broken, many people had
 huge issues with large amounts of insertions, and a few had startup issues.
 .74 too new to say.

 In either case, do a lot of testing for your use case before switching
 as things in the .7 series are still way in development.  I've talked to
 Jonathan about putting it into beta status because of the severity of
 the bugs, but so far, there has been no decision to do so.  Good luck.

 Paul

 On 3/16/2011 1:21 PM, Jake Maizel wrote:
 We are running 0.6.6 and are considering upgrading to either 0.6.8 or
 one of the 0.7.x releases.  What is the recommended version and
 procedure?  What are the issues we face?  Are there any specific
 storage gotchas we need to be aware of?  Are there any docs around
 this process for review?

 Thanks,

 jake







-- 
http://twitter.com/jpartogi


Please help decipher /proc/cpuinfo for optimal Cassandra config

2011-03-16 Thread buddhasystem
Dear All,
this is from my new Cassandra server. It obviously uses hyperthreading, I
just don't know how to translate this to concurrent readers and writers in
cassandra.yaml -- can somebody take a look and tell me what number of cores
I need to assume for concurrent_reads and concurrent_writes. Is it 24?
Thanks!

[cassandra@cassandra01 bin]$ cat /proc/cpuinfo
processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 44
model name  : Intel(R) Xeon(R) CPU   X5650  @ 2.67GHz
stepping: 2
cpu MHz : 1596.000
cache size  : 12288 KB
physical id : 0
siblings: 12
core id : 0
cpu cores   : 6
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 11
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc
aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16
xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm arat tpr_shadow vnmi
flexpriority ept vpid
bogomips: 5333.91
clflush size: 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model   : 44
model name  : Intel(R) Xeon(R) CPU   X5650  @ 2.67GHz
stepping: 2
cpu MHz : 1596.000
cache size  : 12288 KB
physical id : 0
siblings: 12
core id : 1
cpu cores   : 6
apicid  : 2
initial apicid  : 2
fpu : yes
fpu_exception   : yes
cpuid level : 11
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc
aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16
xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm arat tpr_shadow vnmi
flexpriority ept vpid
bogomips: 5333.15
clflush size: 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor   : 2
vendor_id   : GenuineIntel
cpu family  : 6
model   : 44
model name  : Intel(R) Xeon(R) CPU   X5650  @ 2.67GHz
stepping: 2
cpu MHz : 1596.000
cache size  : 12288 KB
physical id : 0
siblings: 12
core id : 2
cpu cores   : 6
apicid  : 4
initial apicid  : 4
fpu : yes
fpu_exception   : yes
cpuid level : 11
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc
aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16
xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm arat tpr_shadow vnmi
flexpriority ept vpid
bogomips: 5333.15
clflush size: 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor   : 3
vendor_id   : GenuineIntel
cpu family  : 6
model   : 44
model name  : Intel(R) Xeon(R) CPU   X5650  @ 2.67GHz
stepping: 2
cpu MHz : 1596.000
cache size  : 12288 KB
physical id : 0
siblings: 12
core id : 8
cpu cores   : 6
apicid  : 16
initial apicid  : 16
fpu : yes
fpu_exception   : yes
cpuid level : 11
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc
aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16
xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm arat tpr_shadow vnmi
flexpriority ept vpid
bogomips: 5333.15
clflush size: 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:

processor   : 4
vendor_id   : GenuineIntel
cpu family  : 6
model   : 44
model name  : Intel(R) Xeon(R) CPU   X5650  @ 2.67GHz
stepping: 2
cpu MHz : 1596.000
cache size  : 12288 KB
physical id : 0
siblings: 12
core id : 9
cpu cores   : 6
apicid  : 18
initial apicid  : 18
fpu : yes
fpu_exception   : yes
cpuid level : 11
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov
pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb
rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc
aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16

Re: Is column update column-atomic or row atomic?

2011-03-16 Thread buddhasystem
Hello Peter, thanks for the note.

I'm not looking for anything fancy. It's just when I'm looking at the
following bit of Pycassa docs, it's not 100% clear to me that it won't
overwrite the entire row for the key, if I want to simply add an extra
column {'foo':'bar'} to the already existing row. I don't care about
cross-node consistency at this point.

insert(key, columns[, timestamp][, ttl][, write_consistency_level])

Insert or update columns in the row with key key.

columns should be a dictionary of columns or super columns to insert or
update. If this is a standard column family, columns should look like
{column_name: column_value}. If this is a super column family, columns
should look like {super_column_name: {sub_column_name: value}}

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Is-column-update-column-atomic-or-row-atomic-tp6174445p6179492.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Upgrade to a different version?

2011-03-16 Thread Jeremy Hanna
Paul,

Don't feel like you have to hold back when it comes to feedback.  There is a 
place to vote on releases.  If you have something that could potentially be 
critical that you can isolate, by all means chime in.  Even if your vote isn't
binding because you are not a committer, votes with something credible behind them
get taken seriously.  Votes happen on the dev@cassandra mailing list.  
Alternately, feel free to create Jira tickets any time.

Also, there are unit tests, integration tests, and distributed tests.  If you 
feel like you can add to any of these, please get involved.  It sounds like you 
already do internal testing so it might be fairly simple to add to some of 
these tests.  With regard to the distributed tests, some devs at Twitter along with others
have contributed a distributed test harness for Cassandra which has been in 0.7
since 0.7.1.  See CASSANDRA-1859 for the beginning and
http://svn.apache.org/repos/asf/cassandra/branches/cassandra-0.7/test/ for the
latest.  This uses Apache Whirr to spin up some nodes and runs tests over them.

In any case, we all want to make a solid release and if you have specifics on 
what can make it better, it would benefit the whole community.

Jeremy

On Mar 16, 2011, at 2:36 PM, Paul Pak wrote:

 Sorry guys, that was meant to be private.  My opinion stands, but I
 didn't want to hurt any of the dev's feelings by being too frank.  I
 think the progress has been good in new features, but I feel we have
 taken a step back in relability and scalability since so many features
 were added without adequate testing.  Hopefully, at some point soon, it
 will get better and doing a data import job won't take a cassandra
 cluster to it's knees or we won't experience stop the world GC issues
 and have out of memory errors from routine usage.
 
 Paul
 
 On 3/16/2011 2:13 PM, Paul Pak wrote:
 Hi Jake,
 
 I'm sending this privately, because I wanted to tell you my opinion frankly.
 
 I don't know about the .6 series or .74, but so far, all of the .7
 series of cassandra has been a disaster.  I would think twice about
 switching to anything in .7 series to production until things stabilize
 and at least one reasonably large site starts using cassandra .7. 
 Jonathan claims reddit is using cassandra, but it can't be a good
 experience with the type of bugs that have been found.
 
 .70 had data corruption issues
 .71 also had data corruption issues, had major issues with anything over
 2 gigs in memory
 .72 issues with reading properly
 .73 had major issues with anything over 2 gigs in memory, had issues
 with performance due to flushing rules being broken, many people had
 huge issues with large amounts of insertions, and a few had startup issues.
 .74 too new to say.
 
 In either case, do a lot of testing for your use case before switching
 as things in the .7 series are still way in development.  I've talked to
 Jonathan about putting it into beta status because of the severity of
 the bugs, but so far, there has been no decision to do so.  Good luck.
 
 Paul
 
 On 3/16/2011 1:21 PM, Jake Maizel wrote:
 We are running 0.6.6 and are considering upgrading to either 0.6.8 or
 one of the 0.7.x releases.  What is the recommended version and
 procedure?  What are the issues we face?  Are there any specific
 storage gotchas we need to be aware of?  Are there any docs around
 this process for review?
 
 Thanks,
 
 jake
 
 
 



Re: Please help decipher /proc/cpuinfo for optimal Cassandra config

2011-03-16 Thread Edward Capriolo
On Wed, Mar 16, 2011 at 9:58 PM, buddhasystem potek...@bnl.gov wrote:
 Dear All,
 this is from my new Cassandra server. It obviously uses hyperthreading, I
 just don't know how to translate this to concurrent readers and writers in
 cassandra.yaml -- can somebody take a look and tell me what number of cores
 I need to assume for concurrent_reads and concurrent_writes. Is it 24?
 Thanks!

 [~140 lines of quoted /proc/cpuinfo output trimmed; same listing as in the original message above]

Re: Is column update column-atomic or row atomic?

2011-03-16 Thread Tyler Hobbs
insert() will only overwrite (or insert) the columns that you supply in the
dictionary.  So, if you do:

  cf.insert('key', {'foo': 'bar'})

and the column 'foo' doesn't exist in that row yet, the column will simply
be added to the other columns in the row.
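
A slightly fuller sketch of the same thing (the keyspace/CF names are just
placeholders, and the connection call depends on your pycassa version):

    import pycassa

    pool = pycassa.ConnectionPool('Keyspace1')    # placeholder keyspace
    cf = pycassa.ColumnFamily(pool, 'Standard1')  # placeholder standard CF

    cf.insert('key', {'a': '1', 'b': '2'})  # row 'key' now has columns a and b
    cf.insert('key', {'foo': 'bar'})        # touches only column foo
    print cf.get('key')                     # a, b and foo are all present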

On Wed, Mar 16, 2011 at 9:00 PM, buddhasystem potek...@bnl.gov wrote:

 Hello Peter, thanks for the note.

 I'm not looking for anything fancy. It's just when I'm looking at the
 following bit of Pycassa docs, it's not 100% clear to me that it won't
 overwrite the entire row for the key, if I want to simply add an extra
 column {'foo':'bar'} to the already existing row. I don't care about
 cross-node consistency at this point.

 insert(key, columns[, timestamp][, ttl][, write_consistency_level])

Insert or update columns in the row with key key.

columns should be a dictionary of columns or super columns to insert or
 update. If this is a standard column family, columns should look like
 {column_name: column_value}. If this is a super column family, columns
 should look like {super_column_name: {sub_column_name: value}}

 --
 View this message in context:
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Is-column-update-column-atomic-or-row-atomic-tp6174445p6179492.html
 Sent from the cassandra-u...@incubator.apache.org mailing list archive at
 Nabble.com.




-- 
Tyler Hobbs
Software Engineer, DataStax (http://datastax.com/)
Maintainer of the pycassa (http://github.com/pycassa/pycassa) Cassandra
Python client library


Re: reduced cached mem; resident set size growth

2011-03-16 Thread Zhu Han
On Thu, Feb 3, 2011 at 1:49 AM, Ryan King r...@twitter.com wrote:

 On Wed, Feb 2, 2011 at 6:22 AM, Chris Burroughs
 chris.burrou...@gmail.com wrote:
  On 01/28/2011 09:19 PM, Chris Burroughs wrote:
  Thanks Oleg and Zhu.  I swear that wasn't a new hotspot version when I
  checked, but that's obviously not the case.  I'll update one node to the
  latest as soon as I can and report back.
 
 
  RSS over 48 hours with java 6 update 23:
 
  http://img716.imageshack.us/img716/5202/u2348hours.png
 
  I'll continue monitoring but RSS still appears to grow without bounds.
  Zhu reported a similar problem with Ubuntu 10.04.  While possible, it
  would seem extraordinarily unlikely that there is a glibc or kernel
  bug affecting us both.

 We're seeing a similar problem with one of our clusters (but over a
 longer time scale). Its possible that its not a leak, but just
 fragmentation. Unless you've told it otherwise, the jvm uses glibc's
 malloc implementation for off-heap allocations. We're currently
 running a test with jemalloc on one node to see if the problem goes
 away.


Ryan, does jemalloc solve the RSS growth problem in your test?

-ryan



Re: Is column update column-atomic or row atomic?

2011-03-16 Thread buddhasystem
Thanks for clarification, Tyler, sorry again for the basic question. I've
been doing straight inserts from Oracle so far but now I need to update rows
with new columns.


--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Is-column-update-column-atomic-or-row-atomic-tp6174445p6179536.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: Please help decipher /proc/cpuinfo for optimal Cassandra config

2011-03-16 Thread buddhasystem
Thanks! The docs say it's good to set it to 8*Ncores; are you saying you see 8 cores
in this output? I know I need to go way above the default of 32 with this setup.
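
For my own notes, here's a minimal sketch for counting what's actually in that
output: physical cores are the distinct (physical id, core id) pairs and the 24
entries are hyperthreads. The 8*Ncores rule is the one from the docs mentioned
above, so treat the result as a starting point rather than a tuned value:

    def cpu_counts(path='/proc/cpuinfo'):
        logical, cores, phys = 0, set(), None
        for line in open(path):
            key, _, value = line.partition(':')
            key, value = key.strip(), value.strip()
            if key == 'processor':
                logical += 1
            elif key == 'physical id':
                phys = value
            elif key == 'core id':
                cores.add((phys, value))  # 'physical id' appears before 'core id'
        return logical, len(cores)

    logical, physical = cpu_counts()
    print 'logical CPUs:', logical, 'physical cores:', physical
    print 'suggested concurrent_writes:', 8 * physical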

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Please-help-decipher-proc-cpuinfo-for-optimal-Cassandra-config-tp6179487p6179539.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: reduced cached mem; resident set size growth

2011-03-16 Thread Zhu Han
On Thu, Mar 17, 2011 at 10:27 AM, Zhu Han schumi@gmail.com wrote:



 On Thu, Feb 3, 2011 at 1:49 AM, Ryan King r...@twitter.com wrote:

 On Wed, Feb 2, 2011 at 6:22 AM, Chris Burroughs
 chris.burrou...@gmail.com wrote:
  On 01/28/2011 09:19 PM, Chris Burroughs wrote:
  Thanks Oleg and Zhu.  I swear that wasn't a new hotspot version when I
  checked, but that's obviously not the case.  I'll update one node to
 the
  latest as soon as I can and report back.
 
 
  RSS over 48 hours with java 6 update 23:
 
  http://img716.imageshack.us/img716/5202/u2348hours.png
 
  I'll continue monitoring but RSS still appears to grow without bounds.
  Zhu reported a similar problem with Ubuntu 10.04.  While possible, it
  would seem extraordinarily unlikely that there is a glibc or kernel
  bug affecting us both.

 We're seeing a similar problem with one of our clusters (but over a
 longer time scale).


Does that mean not all of your clusters running Cassandra observed the same RSS
growth problem?



 Its possible that its not a leak, but just
 fragmentation. Unless you've told it otherwise, the jvm uses glibc's
 malloc implementation for off-heap allocations. We're currently
 running a test with jemalloc on one node to see if the problem goes
 away.


 Ryan, does jemalloc solve the RSS growth problem in your test?

  -ryan





super_column.name?

2011-03-16 Thread Michael Fortin
Hi,

I've been working on a Scala-based API for Cassandra.  I've built it directly
on top of Thrift.  I'm having a problem getting a slice of a SuperColumn.  When
I get a ColumnOrSuperColumn back and call 'cos.super_column.name' and
deserialize the bytes, I'm not getting the expected output.

Here's what's in Cassandra
---
RowKey: key
=> (super_column=super-col-0,
 (column=column, value=76616c756530, timestamp=1300330948240)
 (column=column1, value=76616c756530, timestamp=1300330948244))
….

and this is the deserialized string

? get_slicesuper-col-0
columnvalue0
.?æ?column1value0
.?æ?super-col-1columnvalue1
.?æ?column1value1
.?æ?super-col-2columnvalue2
.?æ?column1value2
.?æ?super-col-3columnvalue3
.?æ?column1value3
.?æ?

I would expect 
super-col-0

Any ideas on what I'm doing wrong?
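
For comparison, here is roughly what I'd expect the same read to look like through
pycassa (a hedged sketch; the keyspace/CF/key names are placeholders). There the
super column name comes back on its own as the outer dictionary key, so the fact
that my string contains 'get_slice' and the rest of the response makes me suspect
I'm deserializing more of the Thrift frame than just the super_column.name bytes.

    import pycassa

    pool = pycassa.ConnectionPool('Keyspace1')  # placeholder keyspace
    cf = pycassa.ColumnFamily(pool, 'Super1')   # placeholder super column family

    # For a super CF, get() returns {super_column_name: {column_name: value}}
    print cf.get('key')
    # roughly {'super-col-0': {'column': 'value0', 'column1': 'value0'}, ...}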

Thanks,
Mike

Re: AW: problems while TimeUUIDType-index-querying with two expressions

2011-03-16 Thread Jonathan Ellis
Thanks for tracking that down, Roland.  I've created
https://issues.apache.org/jira/browse/CASSANDRA-2347 to fix this.

On Wed, Mar 16, 2011 at 10:37 AM, Roland Gude roland.g...@yoochoose.com wrote:
 I have applied the suggested changes in my local source tree and did run all
 my testcases (the supplied ones as well as those with real data).

 They do work now.



 From: Roland Gude [mailto:roland.g...@yoochoose.com]
 Sent: Wednesday, 16 March 2011 16:29

 To: user@cassandra.apache.org
 Subject: AW: AW: problems while TimeUUIDType-index-querying with two
 expressions



 While debugging into it, I found something that might be the issue (please
 correct me if I am wrong):

 In ColumnFamilyStore.java lines 1597 to 1613 is the code that checks whether
 some column satisfies an index expression.

 In line 1608 it compares the column's value with the value given in the
 expression.



 For this comparison it uses the comparator of the column family, while it
 should use the comparator of the column's validation class.



     private static boolean satisfies(ColumnFamily data, IndexClause clause, IndexExpression first)
     {
         for (IndexExpression expression : clause.expressions)
         {
             // (we can skip first since we already know it's satisfied)
             if (expression == first)
                 continue;

             // check column data vs expression
             IColumn column = data.getColumn(expression.column_name);
             if (column == null)
                 return false;

             int v = data.getComparator().compare(column.value(), expression.value);
             if (!satisfies(v, expression.op))
                 return false;
         }
         return true;
     }





 Line 1608 should be changed from:

     int v = data.getComparator().compare(column.value(), expression.value);

 to:

     int v = data.metadata().getValueValidator(expression.column_name).compare(column.value(), expression.value);







 greetings roland





 From: Roland Gude [mailto:roland.g...@yoochoose.com]
 Sent: Wednesday, 16 March 2011 14:50
 To: user@cassandra.apache.org
 Subject: AW: AW: problems while TimeUUIDType-index-querying with two
 expressions



 Hi Aaron,



 now I am completely confused.

 The code that did not work for days now – like a miracle – works even
 against the unpatched Cassandra 0.7.3 but the testcase still does not…

 There seems to be some randomness in whether it works or not (which is a bad
 sign I think)… I will debug a little deeper into this and report anything I
 find.



 Greetings,

 roland



 From: aaron morton [mailto:aa...@thelastpickle.com]
 Sent: Wednesday, 16 March 2011 01:15
 To: user@cassandra.apache.org
 Subject: Re: AW: problems while TimeUUIDType-index-querying with two
 expressions



 Have attached a patch
 to https://issues.apache.org/jira/browse/CASSANDRA-2328



 Can you give it a try? You should not get an InvalidRequestException when
 you send an invalid name or value in the query expression.



 Aaron



 On 16 Mar 2011, at 10:30, aaron morton wrote:



 Will have the Jira I created finished soon; it's a legitimate issue, we
 should be validating the column names and values when a get_indexed_slice()
 request is sent. The error in your original email shows that.



 WRT your code example: you are using the TimeUUID validator for the column
 name when creating the index expression, but are using a string serialiser
 for the value...

 IndexedSlicesQuery<String, UUID, String> indexQuery = HFactory
     .createIndexedSlicesQuery(keyspace, stringSerializer, UUID_SERIALIZER, stringSerializer);
 indexQuery.addEqualsExpression(MANDATOR_UUID, mandator);

 But your schema is saying it is a bytes type...



 column_metadata=[{column_name: --1000--,
 validation_class: BytesType, index_name: mandatorIndex, index_type: KEYS},
 {column_name: 0001--1000--, validation_class:
 BytesType, index_name: useridIndex, index_type: KEYS}];

 On 15 Mar 2011, at 22:41,



 Once I have the patch can you apply it and run your test again?



 You may also want to ask on the Hector list whether it automagically checks that you
 are using the correct types when creating an IndexedSlicesQuery.



 Aaron



 Roland Gude wrote:



 Forgot to attach the source code… here it comes



 From: Roland Gude [mailto:roland.g...@yoochoose.com]
 Sent: Tuesday, 15 March 2011 10:39
 To: user@cassandra.apache.org
 Subject: AW: problems while TimeUUIDType-index-querying with two expressions



 Actually it's not the column values that should be UUIDs in our case, but the
 column keys. The CF uses TimeUUID ordering and the values are just some
 byte arrays. Even with changing the code to use UUIDSerializer instead of
 serializing the UUIDs manually, the issue still exists.



 As far as I can see, there is 

Cassandra c++ client

2011-03-16 Thread Anurag Gujral
Hi All,
   Does anyone know of a stable C++ client for Cassandra?
Thanks
Anurag


Re: Cassandra c++ client

2011-03-16 Thread Primal Wijesekera
You could try this,

https://github.com/posulliv/libcassandra


- primal




From: Anurag Gujral anurag.guj...@gmail.com
To: user@cassandra.apache.org
Sent: Wed, March 16, 2011 9:36:25 PM
Subject: Cassandra c++ client

Hi All,
    Does anyone know of a stable C++ client for Cassandra?
Thanks
Anurag



  

Re: Cassandra c++ client

2011-03-16 Thread Narendra Sharma
libcassandra isn't very active. Since we already had an object pool library,
we went with using raw Thrift in C++ instead of any other library.

Thanks,
Naren

On Wed, Mar 16, 2011 at 10:03 PM, Primal Wijesekera 
primalwijesek...@yahoo.com wrote:

 You could try this,

 https://github.com/posulliv/libcassandra

 - primal

 --
 *From:* Anurag Gujral anurag.guj...@gmail.com
 *To:* user@cassandra.apache.org
 *Sent:* Wed, March 16, 2011 9:36:25 PM
 *Subject:* Cassandra c++ client

 Hi All,
    Does anyone know of a stable C++ client for Cassandra?
 Thanks
 Anurag