Effective allocation of multiple disks

2010-03-10 Thread Eric Rosenberry
Based on the documentation, it is clear that with Cassandra you want to have
one disk for commitlog, and one disk for data.

My question is: If you think your workload is going to require more io
performance to the data disks than a single disk can handle, how would you
recommend effectively utilizing additional disks?

It would seem a number of vendors sell 1U boxes with four 3.5 inch disks.
 If we use one for commitlog, is there a way to have Cassandra itself
equally split data across the three remaining disks?  Or is this something
that needs to be handled by the hardware level, or operating system/file
system level?

Options include a hardware RAID controller in a RAID 0 stripe (this is more
$$$ and for what gain?), or utilizing a volume manager like LVM.

Along those same lines, if you do implement some type of striping, what RAID
stripe size is recommended?  (I think Todd Burruss asked this earlier but I
did not see a response)

Thanks for any input!

-Eric


RE: Effective allocation of multiple disks

2010-03-10 Thread Stu Hood
You can list multiple DataFileDirectories, and Cassandra will scatter files 
across all of them. Use 1 disk for the commitlog, and 3 disks for data 
directories.

See http://wiki.apache.org/cassandra/CassandraHardware#Disk

Thanks,
Stu
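
For reference, the DataFileDirectories Stu mentions are set in storage-conf.xml. A sketch with illustrative mount points (one dedicated commitlog disk, one DataFileDirectory per data disk — paths here are examples, not defaults):

```xml
<!-- Illustrative: one disk for the commitlog, three for data. -->
<CommitLogDirectory>/disk1/cassandra/commitlog</CommitLogDirectory>
<DataFileDirectories>
    <DataFileDirectory>/disk2/cassandra/data</DataFileDirectory>
    <DataFileDirectory>/disk3/cassandra/data</DataFileDirectory>
    <DataFileDirectory>/disk4/cassandra/data</DataFileDirectory>
</DataFileDirectories>
```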

-Original Message-
From: Eric Rosenberry epros...@gmail.com
Sent: Wednesday, March 10, 2010 2:00am
To: cassandra-user@incubator.apache.org
Subject: Effective allocation of multiple disks

Based on the documentation, it is clear that with Cassandra you want to have
one disk for commitlog, and one disk for data.

My question is: If you think your workload is going to require more io
performance to the data disks than a single disk can handle, how would you
recommend effectively utilizing additional disks?

It would seem a number of vendors sell 1U boxes with four 3.5 inch disks.
 If we use one for commitlog, is there a way to have Cassandra itself
equally split data across the three remaining disks?  Or is this something
that needs to be handled by the hardware level, or operating system/file
system level?

Options include a hardware RAID controller in a RAID 0 stripe (this is more
$$$ and for what gain?), or utilizing a volume manager like LVM.

Along those same lines, if you do implement some type of striping, what RAID
stripe size is recommended?  (I think Todd Burruss asked this earlier but I
did not see a response)

Thanks for any input!

-Eric




Re: Effective allocation of multiple disks

2010-03-10 Thread Eric Rosenberry
Ahh, thanks!  I had read that, but I had assumed the reference to "use one
or more devices" for DataFileDirectories was referring to somehow making
multiple physical devices into one logical device via some underlying RAID
system.

So then, as far as free space on the disks goes, I have seen references to
keeping utilization below 50% to handle compaction.  Would it not be true to
say that you only need as much free space as it takes to hold another copy of
the largest data file you have?  (i.e. perhaps less than 50% of the disk)

Due to the compaction space requirement, would it be more efficient to do
RAID 0 somewhere under the hood?

Just simply being able to specify multiple DataFileDirectories does
indeed sound appealing...

Thanks.

-Eric

On Wed, Mar 10, 2010 at 12:08 AM, Stu Hood stu.h...@rackspace.com wrote:

 You can list multiple DataFileDirectories, and Cassandra will scatter files
 across all of them. Use 1 disk for the commitlog, and 3 disks for data
 directories.

 See http://wiki.apache.org/cassandra/CassandraHardware#Disk

 Thanks,
 Stu

 -Original Message-
 From: Eric Rosenberry epros...@gmail.com
 Sent: Wednesday, March 10, 2010 2:00am
 To: cassandra-user@incubator.apache.org
 Subject: Effective allocation of multiple disks

 Based on the documentation, it is clear that with Cassandra you want to
 have
 one disk for commitlog, and one disk for data.

 My question is: If you think your workload is going to require more io
 performance to the data disks than a single disk can handle, how would you
 recommend effectively utilizing additional disks?

 It would seem a number of vendors sell 1U boxes with four 3.5 inch disks.
  If we use one for commitlog, is there a way to have Cassandra itself
 equally split data across the three remaining disks?  Or is this something
 that needs to be handled by the hardware level, or operating system/file
 system level?

 Options include a hardware RAID controller in a RAID 0 stripe (this is more
 $$$ and for what gain?), or utilizing a volume manager like LVM.

 Along those same lines, if you do implement some type of striping, what
 RAID
 stripe size is recommended?  (I think Todd Burruss asked this earlier but I
 did not see a response)

 Thanks for any input!

 -Eric





CassandraHardware link on the wiki FrontPage

2010-03-10 Thread Eric Rosenberry
Would it be possible to add a link to the CassandraHardware page from the
FrontPage of the wiki?

I think other new folks to Cassandra may find it useful.  ;-)

(I would do it myself, though that page is Immutable)

http://wiki.apache.org/cassandra/FrontPage

http://wiki.apache.org/cassandra/CassandraHardware

Thanks!

-Eric


RE: CassandraHardware link on the wiki FrontPage

2010-03-10 Thread Stu Hood
Anyone can edit any page once they have an account: click the Login link at 
the top right next to the search box to create an account.

Thanks,
Stu

-Original Message-
From: Eric Rosenberry e...@rosenberry.org
Sent: Wednesday, March 10, 2010 2:52am
To: cassandra-user@incubator.apache.org
Subject: CassandraHardware link on the wiki FrontPage

Would it be possible to add a link to the CassandraHardware page from the
FrontPage of the wiki?

I think other new folks to Cassandra may find it useful.  ;-)

(I would do it myself, though that page is Immutable)

http://wiki.apache.org/cassandra/FrontPage

http://wiki.apache.org/cassandra/CassandraHardware

Thanks!

-Eric




Re: schema design question

2010-03-10 Thread Matteo Caprari
Well, I don't like clunky and I'm java friendly. I'll go for the abstract class.

Thanks for the help.

On Tue, Mar 9, 2010 at 7:33 PM, Jonathan Ellis jbel...@gmail.com wrote:
 On Tue, Mar 9, 2010 at 7:30 AM, Matteo Caprari matteo.capr...@gmail.com 
 wrote:
 On Tue, Mar 9, 2010 at 1:23 PM, Jonathan Ellis jbel...@gmail.com wrote:
 That's true.  So you'd want to use a custom comparator where first 64
 bits is the Long and the rest is the userid, for instance.

 (Long + something else is common enough that we might want to add it
 to the defaults...)

 What about using a SuperColumn for each like-count and then the list
 of users that hit that level?

 That would also work, it's just a little clunky pulling things out of
 a nested structure when really you want a flat list.  But if you are
 allergic to Java that is the way to go so you don't have to write a
 custom AbstractType subclass. :)

 -Jonathan




-- 
:Matteo Caprari
matteo.capr...@gmail.com
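
The custom comparator discussed above — a key whose first 64 bits are the Long, followed by the userid — can be sketched in Python. `make_key` is a hypothetical helper, and this assumes non-negative counts (negative values would need a sign bias to keep the byte order correct):

```python
import struct

def make_key(count, userid):
    """Pack a 64-bit count plus a userid into one byte string.

    Big-endian packing makes byte-wise comparison agree with numeric
    order (for non-negative counts), so a bytes-style comparator sorts
    these keys by count first, then by userid.
    """
    return struct.pack(">q", count) + userid.encode("ascii")

keys = sorted([make_key(42, "bob"), make_key(5, "zoe"), make_key(5, "alice")])
# keys are now ordered: (5, "alice"), (5, "zoe"), (42, "bob")
```

A Java AbstractType subclass doing the same split (first 8 bytes as a long, remainder as the userid) would give Cassandra the flat, count-ordered column list the thread describes.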


Re: Bad read performances: 'few rows of many columns' vs 'many rows of few columns'

2010-03-10 Thread Sylvain Lebresne
Well, I've found the reason.
The default Cassandra configuration uses a 10% row cache, and the row cache
reads the whole row each time. So it was indeed reading the full row on every
request, even though the request was asking for only one column.

My bad (at least I learned something).

--
Sylvain

On Tue, Mar 9, 2010 at 9:49 PM, Brandon Williams dri...@gmail.com wrote:
 On Tue, Mar 9, 2010 at 2:28 PM, Sylvain Lebresne sylv...@yakaz.com wrote:

  A row causes a disk seek while columns are contiguous.  So if the row
  isn't
  in the cache, you're being impaired by the seeks.  In general, fatter
  rows
  should be more performant than skinny ones.

 Sure, I understand that.  Still, I get 400 columns per second (i.e., 400
 seeks per second) when the rows have only one column each, while I get 10
 columns per second when the rows have 100 columns, even though I read only
 the first column.

 Doesn't that imply the disk is having to seek further for the rows with more
 columns?
 -Brandon


Login Failure Error

2010-03-10 Thread shirish
hello,

I have just downloaded the source code from trunk using svn, and I have set up
the following configuration:

Created a different user and group named cassandra.
When I run *cassandra -f*, the following is the output I get:

 INFO 18:02:16,697 Auto DiskAccessMode determined to be standard
 INFO 18:02:16,995 Saved Token not found. Using
4812241153415237834436824812586788175
 INFO 18:02:17,008 Creating new commitlog segment
/u02/cassandra/commitlog/CommitLog-1268224337008.log
 INFO 18:02:17,105 Starting up server gossip
 INFO 18:02:17,163 Binding thrift service to localhost/127.0.0.1:9160
 INFO 18:02:17,169 Cassandra starting up...

and next I ran *cassandra-cli --host 127.0.0.1 --port 9160* and I get the
following:

Login failure. Did you specify 'keyspace', 'username' and 'password'?
Welcome to cassandra CLI.

Type 'help' or '?' for help. Type 'quit' or 'exit' to quit.


What could have gone wrong ?


Re: Login Failure Error Attached to storage-conf.xml file

2010-03-10 Thread shirish
Shirish Reddy P
(Student)
Indian Institute Of Information Technology, Allahabad
Mob No. +919651418099



On Wed, Mar 10, 2010 at 6:16 PM, shirish shirishredd...@gmail.com wrote:

 hello,

 I have just download the source code from the trunk using svn, I have set
 up the following configuration

 Created a different user and group named cassandra
 When i do *cassandra -f* the following is the output I get

  INFO 18:02:16,697 Auto DiskAccessMode determined to be standard
  INFO 18:02:16,995 Saved Token not found. Using
 4812241153415237834436824812586788175
  INFO 18:02:17,008 Creating new commitlog segment
 /u02/cassandra/commitlog/CommitLog-1268224337008.log
  INFO 18:02:17,105 Starting up server gossip
  INFO 18:02:17,163 Binding thrift service to localhost/127.0.0.1:9160
  INFO 18:02:17,169 Cassandra starting up...

 and next i ran* cassandra-cli --host 127.0.0.1 --port 9160* I get the
 following

 Login failure. Did you specify 'keyspace', 'username' and 'password'?
 Welcome to cassandra CLI.

 Type 'help' or '?' for help. Type 'quit' or 'exit' to quit.


 What could have gone wrong ?





<!--
 ~ Licensed to the Apache Software Foundation (ASF) under one
 ~ or more contributor license agreements.  See the NOTICE file
 ~ distributed with this work for additional information
 ~ regarding copyright ownership.  The ASF licenses this file
 ~ to you under the Apache License, Version 2.0 (the
 ~ "License"); you may not use this file except in compliance
 ~ with the License.  You may obtain a copy of the License at
 ~
 ~    http://www.apache.org/licenses/LICENSE-2.0
 ~
 ~ Unless required by applicable law or agreed to in writing,
 ~ software distributed under the License is distributed on an
 ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 ~ KIND, either express or implied.  See the License for the
 ~ specific language governing permissions and limitations
 ~ under the License.
-->
<Storage>
  <!--======================================================================-->
  <!-- Basic Configuration                                                  -->
  <!--======================================================================-->

  <!--
   ~ The name of this cluster.  This is mainly used to prevent machines in
   ~ one logical cluster from joining another.
  -->
  <ClusterName>Test Cluster</ClusterName>

  <!--
   ~ Turn on to make new [non-seed] nodes automatically migrate the right data
   ~ to themselves.  (If no InitialToken is specified, they will pick one
   ~ such that they will get half the range of the most-loaded node.)
   ~ If a node starts up without bootstrapping, it will mark itself bootstrapped
   ~ so that you can't subsequently accidentally bootstrap a node with
   ~ data on it.  (You can reset this by wiping your data and commitlog
   ~ directories.)
   ~
   ~ Off by default so that new clusters and upgraders from 0.4 don't
   ~ bootstrap immediately.  You should turn this on when you start adding
   ~ new nodes to a cluster that already has data on it.  (If you are upgrading
   ~ from 0.4, start your cluster with it off once before changing it to true.
   ~ Otherwise, no data will be lost but you will incur a lot of unnecessary
   ~ I/O before your cluster starts up.)
  -->
  <AutoBootstrap>false</AutoBootstrap>

  <!--
   ~ Keyspaces and ColumnFamilies:
   ~ A ColumnFamily is the Cassandra concept closest to a relational
   ~ table.  Keyspaces are separate groups of ColumnFamilies.  Except in
   ~ very unusual circumstances you will have one Keyspace per application.
   ~
   ~ There is an implicit keyspace named 'system' for Cassandra internals.
  -->
  <Keyspaces>
    <Keyspace Name="Keyspace1">
      <!--
       ~ ColumnFamily definitions have one required attribute (Name)
       ~ and several optional ones.
       ~
       ~ The CompareWith attribute tells Cassandra how to sort the columns
       ~ for slicing operations.  The default is BytesType, which is a
       ~ straightforward lexical comparison of the bytes in each column.
       ~ Other options are AsciiType, UTF8Type, LexicalUUIDType, TimeUUIDType,
       ~ and LongType.  You can also specify the fully-qualified class
       ~ name to a class of your choice extending
       ~ org.apache.cassandra.db.marshal.AbstractType.
       ~
       ~ SuperColumns have a similar CompareSubcolumnsWith attribute.
       ~
       ~ BytesType: Simple sort by byte value.  No validation is performed.
       ~ AsciiType: Like BytesType, but validates that the input can be
       ~            parsed as US-ASCII.
       ~ UTF8Type: A string encoded as UTF8
       ~ LongType: A 64bit long
       ~ LexicalUUIDType: A 128bit UUID, compared lexically (by byte value)
       ~ TimeUUIDType: a 128bit version 1 UUID, compared by timestamp
       ~
       ~ (To get the closest approximation to 0.3-style supercolumns, you
       ~ would use CompareWith=UTF8Type CompareSubcolumnsWith=LongType.)
       ~
       ~ An optional `Comment` attribute may be used to attach additional
       ~

Re: Login Failure Error

2010-03-10 Thread Jonathan Ellis
Please don't use trunk unless you're actively fixing bugs.  If you
want the latest & greatest, get the 0.6 branch from svn.

On Wed, Mar 10, 2010 at 6:46 AM, shirish shirishredd...@gmail.com wrote:
 hello,

 I have just download the source code from the trunk using svn, I have set up
 the following configuration

 Created a different user and group named cassandra
 When i do cassandra -f the following is the output I get

  INFO 18:02:16,697 Auto DiskAccessMode determined to be standard
  INFO 18:02:16,995 Saved Token not found. Using
 4812241153415237834436824812586788175
  INFO 18:02:17,008 Creating new commitlog segment
 /u02/cassandra/commitlog/CommitLog-1268224337008.log
  INFO 18:02:17,105 Starting up server gossip
  INFO 18:02:17,163 Binding thrift service to localhost/127.0.0.1:9160
  INFO 18:02:17,169 Cassandra starting up...

 and next i ran cassandra-cli --host 127.0.0.1 --port 9160 I get the
 following

 Login failure. Did you specify 'keyspace', 'username' and 'password'?
 Welcome to cassandra CLI.

 Type 'help' or '?' for help. Type 'quit' or 'exit' to quit.


 What could have gone wrong ?







Re: Login Failure Error

2010-03-10 Thread shirish
Everything ran fine using the stable release.  I wanted to start
contributing, hence I downloaded the source code.  What could possibly be
giving this error?

On Wed, Mar 10, 2010 at 6:49 PM, Jonathan Ellis jbel...@gmail.com wrote:

 Please don't use trunk unless you're actively fixing bugs.  If you
 want the latest & greatest, get the 0.6 branch from svn.

 On Wed, Mar 10, 2010 at 6:46 AM, shirish shirishredd...@gmail.com wrote:
  hello,
 
  I have just download the source code from the trunk using svn, I have set
 up
  the following configuration
 
  Created a different user and group named cassandra
  When i do cassandra -f the following is the output I get
 
   INFO 18:02:16,697 Auto DiskAccessMode determined to be standard
   INFO 18:02:16,995 Saved Token not found. Using
  4812241153415237834436824812586788175
   INFO 18:02:17,008 Creating new commitlog segment
  /u02/cassandra/commitlog/CommitLog-1268224337008.log
   INFO 18:02:17,105 Starting up server gossip
   INFO 18:02:17,163 Binding thrift service to localhost/127.0.0.1:9160
   INFO 18:02:17,169 Cassandra starting up...
 
  and next i ran cassandra-cli --host 127.0.0.1 --port 9160 I get the
  following
 
  Login failure. Did you specify 'keyspace', 'username' and 'password'?
  Welcome to cassandra CLI.
 
  Type 'help' or '?' for help. Type 'quit' or 'exit' to quit.
 
 
  What could have gone wrong ?
 
 
 
 
 



exception with python client

2010-03-10 Thread Matteo Caprari
Hi.

On Cassandra 0.6 beta-2

I have this schema:
<Keyspace Name="KS">
  <ColumnFamily Name="Users" CompareWith="BytesType"/>
  <ColumnFamily Name="Items" CompareWith="BytesType" ColumnType="Super"
                CompareSubcolumnsWith="BytesType"/>
</Keyspace>

I'm trying the batch_mutate api using python:

socket = TSocket.TSocket("localhost", 9160)
transport = TTransport.TBufferedTransport(socket)
protocol = TBinaryProtocol.TBinaryProtocolAccelerated(transport)
client = Cassandra.Client(protocol)
transport.open()

m = {
'exmpl_item_id': {
'Items': 
[Mutation(ColumnOrSuperColumn(super_column=SuperColumn('users',[Column('name','matteo')])))]
}}
client.batch_mutate('KS', m, ConsistencyLevel.ONE)

I get an exception, but it's a shy one and I can't figure out what
I'm doing wrong.

Thanks.

Traceback (most recent call last):
  File "test-migrate.py", line 23, in <module>
    client.batch_mutate('KS', m, ConsistencyLevel.ONE)
  File "/Users/dikappa/Documents/workspace/likelike/python/cassandra/Cassandra.py", line 771, in batch_mutate
    self.recv_batch_mutate()
  File "/Users/dikappa/Documents/workspace/likelike/python/cassandra/Cassandra.py", line 784, in recv_batch_mutate
    (fname, mtype, rseqid) = self._iprot.readMessageBegin()
  File "build/bdist.macosx-10.6-i386/egg/thrift/protocol/TBinaryProtocol.py", line 126, in readMessageBegin
  File "build/bdist.macosx-10.6-i386/egg/thrift/protocol/TBinaryProtocol.py", line 203, in readI32
  File "build/bdist.macosx-10.6-i386/egg/thrift/transport/TTransport.py", line 58, in readAll
  File "build/bdist.macosx-10.6-i386/egg/thrift/transport/TTransport.py", line 155, in read
  File "build/bdist.macosx-10.6-i386/egg/thrift/transport/TSocket.py", line 94, in read
thrift.transport.TTransport.TTransportException: None




-- 
:Matteo Caprari
matteo.capr...@gmail.com


Re: exception with python client

2010-03-10 Thread Gary Dusbabek
On Wed, Mar 10, 2010 at 08:33, Matteo Caprari matteo.capr...@gmail.com wrote:

 protocol = TBinaryProtocol.TBinaryProtocolAccelerated(transport)
 client = Cassandra.Client(protocol)
 transport.open()

before attempting the mutation, try adding:

client.transport = transport

Gary.


RE: Bad read performances: 'few rows of many columns' vs 'many rows of few columns'

2010-03-10 Thread David Dabbs
So did you disable the row cache entirely?

 From: Sylvain Lebresne 
 
 Well, I've found the reason.
 The default cassandra configuration use a 10% row cache.
 And the row cache reads all the row each time. So it was indeed reading
 the
 full row each time even though the request was asking for only one
 column.
 
 Sylvain




Re: Bad read performances: 'few rows of many columns' vs 'many rows of few columns'

2010-03-10 Thread Sylvain Lebresne
 So did you disable the row cache entirely?

Yes (getting back reasonable performances).

 From: Sylvain Lebresne

 Well, I've found the reason.
 The default cassandra configuration use a 10% row cache.
 And the row cache reads all the row each time. So it was indeed reading
 the
 full row each time even though the request was asking for only one
 column.

 Sylvain





Re: Bad read performances: 'few rows of many columns' vs 'many rows of few columns'

2010-03-10 Thread Jonathan Ellis
For the record, I note that no row cache is the default on
user-defined CFs; we include it in the sample configuration file as an
example only.
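
In the 0.6-era storage-conf.xml, row caching is an opt-in attribute on each ColumnFamily; a hedged sketch (the name and the 10% value are illustrative, matching the sample file's style):

```xml
<!-- RowsCached is off unless set; it takes a fraction or an absolute count. -->
<ColumnFamily Name="Standard1" CompareWith="BytesType" RowsCached="10%"/>
<!-- Omitting RowsCached leaves the row cache disabled for this CF. -->
```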

On Wed, Mar 10, 2010 at 9:58 AM, Sylvain Lebresne sylv...@yakaz.com wrote:
 So did you disable the row cache entirely?

 Yes (getting back reasonable performances).

 From: Sylvain Lebresne

 Well, I've found the reason.
 The default cassandra configuration use a 10% row cache.
 And the row cache reads all the row each time. So it was indeed reading
 the
 full row each time even though the request was asking for only one
 column.

 Sylvain






Re: cassandra 0.6.0 beta 2 download contains beta 1?

2010-03-10 Thread Eric Evans
On Tue, 2010-03-09 at 12:38 -0800, Omer van der Horst Jansen wrote:
 The apache-cassandra-0.6.0-beta2-bin.tar.gz download contains both these files
 in the apache-cassandra-0.6.0-beta2/lib directory:
 
  apache-cassandra-0.6.0-beta1.jar
  apache-cassandra-0.6.0-beta2.jar

Ugh, my bad. I must have failed to `clean' in between the aborted beta1
and beta2.

 Given the way the classpath is constructed, it's possible that anyone using
 this download is actually running beta 1 rather than beta 2...

The only difference between beta1 and beta2 was a couple of bytes worth
of version metadata, so it really wouldn't matter in this context.

-- 
Eric Evans
eev...@rackspace.com



Re: Effective allocation of multiple disks

2010-03-10 Thread Jonathan Ellis
Thanks for testing that, added a note to
http://wiki.apache.org/cassandra/CassandraHardware on stripe size.

On Wed, Mar 10, 2010 at 11:03 AM, B. Todd Burruss bburr...@real.com wrote:
 with the file sizes we're talking about with cassandra and other database
 products, the stripe size doesn't seem to matter.  i suppose there may be a
 modicum of overhead with a small stripe size, but i'm not sure.  mine is set
 to 128k, which produced the same results as 16k and 256k.

 i will say the number of drives within the RAID 0 setup does seem to matter.
  the more you have, the more parallelism you can get with a good RAID controller.

 Eric Rosenberry wrote:

 Based on the documentation, it is clear that with Cassandra you want to
 have one disk for commitlog, and one disk for data.

 My question is: If you think your workload is going to require more io
 performance to the data disks than a single disk can handle, how would you
 recommend effectively utilizing additional disks?

 It would seem a number of vendors sell 1U boxes with four 3.5 inch disks.
  If we use one for commitlog, is there a way to have Cassandra itself
 equally split data across the three remaining disks?  Or is this something
 that needs to be handled by the hardware level, or operating system/file
 system level?

 Options include a hardware RAID controller in a RAID 0 stripe (this is
 more $$$ and for what gain?), or utilizing a volume manager like LVM.

 Along those same lines, if you do implement some type of striping, what
 RAID stripe size is recommended?  (I think Todd Burruss asked this earlier
 but I did not see a response)

 Thanks for any input!

 -Eric



Re: schema design question

2010-03-10 Thread Jonathan Ellis
if you want to select stuff out w/ one query, then single CF is the
only sane choice

if not then 2 CFs may be more performant

On Wed, Mar 10, 2010 at 4:42 AM, Matteo Caprari
matteo.capr...@gmail.com wrote:
 I can't quite decide if to go with a flat schema, with keys repeated
 in different CFs
 or have one CF with nested supercolumns.

 I guess there is no straight answer here,  but what's a good reasoning
 about the choice?

 These two mutation maps should clarify my dilemma:

 deep_mutation_map = {
        'example_item': {
                'Items': [
                        Mutation(SuperColumn('details', [
                                Column('title', 'an article'),
                                Column('link', 'www.example.com')
                        ])),
                        Mutation(SuperColumn('likers', [
                                Column('user_1', 'xx'),
                                Column('user_2', 'xx')
                        ]))
                ]
        }
 }

 flat_mutation_map = {
        'example_item': {
                'Item_Info': [
                        Mutation(Column('title', 'an_article')),
                        Mutation(Column('link', 'www.example.com')),
                ],
                'Item_likers': [
                        Mutation(Column('user_1', 'xx')),
                        Mutation(Column('user_2', 'xx'))
                ]
        }
 }


 On Tue, Mar 9, 2010 at 7:33 PM, Jonathan Ellis jbel...@gmail.com wrote:
 On Tue, Mar 9, 2010 at 7:30 AM, Matteo Caprari matteo.capr...@gmail.com 
 wrote:
 On Tue, Mar 9, 2010 at 1:23 PM, Jonathan Ellis jbel...@gmail.com wrote:
 That's true.  So you'd want to use a custom comparator where first 64
 bits is the Long and the rest is the userid, for instance.

 (Long + something else is common enough that we might want to add it
 to the defaults...)

 What about using a SuperColumn for each like-count and then the list
 of users that hit that level?

 That would also work, it's just a little clunky pulling things out of
 a nested structure when really you want a flat list.  But if you are
 allergic to Java that is the way to go so you don't have to write a
 custom AbstractType subclass. :)

 -Jonathan




 --
 :Matteo Caprari
 matteo.capr...@gmail.com



Re: exception with python client

2010-03-10 Thread Matteo Caprari
There was indeed a very clear message in the logs.
I was missing the timestamp in the Column declaration.

Thanks
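
For reference, the fix Matteo describes means every Column carries an explicit timestamp. The classes below are simplified stand-ins for the Thrift-generated ones (the real `Column`, `SuperColumn`, `ColumnOrSuperColumn`, and `Mutation` come from the generated cassandra module), shown only to make the corrected shape concrete:

```python
import time
from collections import namedtuple

# Stand-ins for the Thrift-generated types, for illustration only.
Column = namedtuple("Column", "name value timestamp")
SuperColumn = namedtuple("SuperColumn", "name columns")
ColumnOrSuperColumn = namedtuple("ColumnOrSuperColumn", "super_column")
Mutation = namedtuple("Mutation", "column_or_supercolumn")

# Microseconds since the epoch is the usual client convention.
ts = int(time.time() * 1_000_000)

mutation_map = {
    'exmpl_item_id': {
        'Items': [Mutation(ColumnOrSuperColumn(super_column=SuperColumn(
            'users', [Column('name', 'matteo', ts)])))],  # timestamp included
    }
}
```

With the real generated classes, this map is what gets passed to `client.batch_mutate` as in the original snippet.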

On Wed, Mar 10, 2010 at 3:42 PM, Eric Evans eev...@rackspace.com wrote:
 On Wed, 2010-03-10 at 14:33 +, Matteo Caprari wrote:
 I get an exception, but it's a shy one and can't figure out what is
 that I'm doing wrong.

 Thanks.

 Traceback (most recent call last):
   File "test-migrate.py", line 23, in <module>
     client.batch_mutate('KS', m, ConsistencyLevel.ONE)
   File "/Users/dikappa/Documents/workspace/likelike/python/cassandra/Cassandra.py", line 771, in batch_mutate
     self.recv_batch_mutate()
   File "/Users/dikappa/Documents/workspace/likelike/python/cassandra/Cassandra.py", line 784, in recv_batch_mutate
     (fname, mtype, rseqid) = self._iprot.readMessageBegin()
   File "build/bdist.macosx-10.6-i386/egg/thrift/protocol/TBinaryProtocol.py", line 126, in readMessageBegin
   File "build/bdist.macosx-10.6-i386/egg/thrift/protocol/TBinaryProtocol.py", line 203, in readI32
   File "build/bdist.macosx-10.6-i386/egg/thrift/transport/TTransport.py", line 58, in readAll
   File "build/bdist.macosx-10.6-i386/egg/thrift/transport/TTransport.py", line 155, in read
   File "build/bdist.macosx-10.6-i386/egg/thrift/transport/TSocket.py", line 94, in read
 thrift.transport.TTransport.TTransportException: None

 I believe this simply means that the read didn't return a response.
 Start by checking the cassandra logs to see if there are any exceptions,
 and double check your connection parameters, network setup, etc.

 --
 Eric Evans
 eev...@rackspace.com





-- 
:Matteo Caprari
matteo.capr...@gmail.com


Re: Hackathon?!?

2010-03-10 Thread Peter Chang
Sweet I'm in!

Is there going to be a more formal invite? If not, can we get the details on
where Digg is and where at Digg?

Peter

On Tue, Mar 9, 2010 at 9:28 PM, Dan Di Spaltro dan.dispal...@gmail.com wrote:

 Great, that would probably get us a lot more room.  Sweet, so it's settled,
 we'll do it at Digg WHQ!

 On Tue, Mar 9, 2010 at 9:13 PM, Chris Goffinet goffi...@digg.com wrote:

 +1 from Digg if you wanna have it at our place as well, got the OK from
 the boss.

  -Chris

 On Mar 9, 2010, at 6:05 PM, Dan Di Spaltro wrote:

 Alright guys, we have settled on a date for the Cassandra meetup on...

 April 15th, better known as, Tax day!

 We can host it here at Cloudkick, unless a cooler startup wants to host
 it.

 http://maps.google.com/maps/ms?ie=UTF8&hl=en&msa=0&msid=100290781618196563860.000478354937656785449&z=19
 1499 Potrero Ave San Francisco CA 94110

 Bottom line, it would be great to get some folks together and spend some
 time doing an intro, cover some deployments, data models and try to address
 all the other burning questions out there.

 We pushed it out from PyCON and hopefully settled on a good day, lets get
 a count for how many folks are interested!

 Thanks,

  On Tue, Feb 9, 2010 at 3:10 PM, Reuben Smith reuben.sm...@gmail.com wrote:

 I live in the city and I'd like to add my vote for an Intro to
 Cassandra night.

 Reuben

 On Tue, Feb 9, 2010 at 10:43 AM, Dan Di Spaltro dan.dispal...@gmail.com
 wrote:
  I think the tentative plans would be to push this out a bit farther
  away from PyCon, to get a bigger attendance.
 
  It sounds like an Intro to Cassandra would be a better theme; focus
  on the education piece.
 
  But it will happen! So stay tuned.
 
  On Tue, Feb 9, 2010 at 3:53 AM, Wayne Lewis wa...@lewisclan.org
 wrote:
 
  Hi Dan,
 
  Are you still planning for end of Feb?
 
  Please add me to the very interested list.
 
  Thanks!
  Wayne Lewis
 
 
  On Jan 26, 2010, at 8:42 PM, Dan Di Spaltro wrote:
 
  Would anyone be interested in a Cassandra hack-a-thon at the end of
  February in San Francisco?
 
  I think it would be great to get everyone together, since the last
  hack-a-thon was at the Twitter office back around OSCON time.   We
  could provide space in the Mission area or someone else could too,
 our
  office is in a pretty interesting area
 
   (
  http://maps.google.com/maps/ms?ie=UTF8&hl=en&msa=0&msid=100290781618196563860.000478354937656785449&z=17
  ).
 
  Tell me what you guys think!
 
  --
  Dan Di Spaltro
 
 
 
 
 
  --
  Dan Di Spaltro
 




 --
 Dan Di Spaltro





 --
 Dan Di Spaltro



Re: Hackathon?!?

2010-03-10 Thread Jonathan Ellis
I'm in either way, but if we push it a week later then the twitter
guys could (a) make it and (b) pimp it at their own conference.

On Wed, Mar 10, 2010 at 12:26 AM, Jeff Hodges jhod...@twitter.com wrote:
 Ah, hell. Thought this was the first day. Can't make it.
 --
 Jeff

 On Mar 9, 2010 9:32 PM, Ryan King r...@twitter.com wrote:

 I'm already committed to talking about cassandra that day at our
 company's developer conference (chirp.twitter.com).

 -ryan

 On Tue, Mar 9, 2010 at 6:26 PM, Jeff Hodges jhod...@twitter.com wrote:
 I'm down.
 --
 Jeff

 ...


Re: Hackathon?!?

2010-03-10 Thread Chris Goffinet
I'll work on putting together the formal invite. Stay tuned.

-Chris

On Mar 10, 2010, at 9:54 AM, Peter Chang wrote:

 Sweet I'm in! 
 
 Is there going to be a more formal invite? If not, can we get the details on 
 where Digg is and where at Digg?
 
 Peter
 
 On Tue, Mar 9, 2010 at 9:28 PM, Dan Di Spaltro dan.dispal...@gmail.com 
 wrote:
 Great, that would probably get us a lot more room.  Sweet, so its settled, 
 we'll do it at Digg WHQ!
 
 On Tue, Mar 9, 2010 at 9:13 PM, Chris Goffinet goffi...@digg.com wrote:
 +1 from Digg if you wanna have it at our place as well, got the OK from the 
 boss.
 
 -Chris
 
 On Mar 9, 2010, at 6:05 PM, Dan Di Spaltro wrote:
 
 Alright guys, we have settled on a date for the Cassandra meetup on...
 
 April 15th, better known as, Tax day!
 
 We can host it here at Cloudkick, unless a cooler startup wants to host it.
  http://maps.google.com/maps/ms?ie=UTF8&hl=en&msa=0&msid=100290781618196563860.000478354937656785449&z=19
 1499 Potrero Ave San Francisco CA 94110 
 
 Bottom line, it would be great to get some folks together and spend some 
 time doing an intro, cover some deployments, data models and try to address 
 all the other burning questions out there.
 
 We pushed it out from PyCON and hopefully settled on a good day, lets get a 
 count for how many folks are interested!
 
 Thanks,
 
 On Tue, Feb 9, 2010 at 3:10 PM, Reuben Smith reuben.sm...@gmail.com wrote:
 I live in the city and I'd like to add my vote for an Intro to
 Cassandra night.
 
 Reuben
 
 On Tue, Feb 9, 2010 at 10:43 AM, Dan Di Spaltro dan.dispal...@gmail.com 
 wrote:
  I think the tentative plans would be to push this out a bit farther
  away from PyCon, to get a bigger attendance.
 
  It sounds like an Intro to Cassandra would be a better theme; focus
  on the education piece.
 
  But it will happen! So stay tuned.
 
  On Tue, Feb 9, 2010 at 3:53 AM, Wayne Lewis wa...@lewisclan.org wrote:
 
  Hi Dan,
 
  Are you still planning for end of Feb?
 
  Please add me to the very interested list.
 
  Thanks!
  Wayne Lewis
 
 
  On Jan 26, 2010, at 8:42 PM, Dan Di Spaltro wrote:
 
  Would anyone be interested in a Cassandra hack-a-thon at the end of
  February in San Francisco?
 
  I think it would be great to get everyone together, since the last
  hack-a-thon was at the Twitter office back around OSCON time.   We
  could provide space in the Mission area or someone else could too, our
  office is in a pretty interesting area
 
  (http://maps.google.com/maps/ms?ie=UTF8&hl=en&msa=0&msid=100290781618196563860.000478354937656785449&z=17).
 
  Tell me what you guys think!
 
  --
  Dan Di Spaltro
 
 
 
 
 
  --
  Dan Di Spaltro
 
 
 
 
 -- 
 Dan Di Spaltro
 
 
 
 
 -- 
 Dan Di Spaltro
 



Re: cassandra 0.6.0 beta 2 download contains beta 1?

2010-03-10 Thread Vick Khera
On Wed, Mar 10, 2010 at 11:30 AM, Eric Evans eev...@rackspace.com wrote:
  apache-cassandra-0.6.0-beta1.jar
  apache-cassandra-0.6.0-beta2.jar

 Ugh, my bad. I must have failed to `clean' in between the aborted beta1
 and beta2.


The beta2 also does not include the other support jar files like
log4j.  Not being a java person, I didn't know what to do so I just
started my experimentation with the 0.5.1 release which has it all
bundled.


Re: NoSQL live tomorrow

2010-03-10 Thread Tim Haines
Hey Jonathan,

What event is this and will it be livecasted/recorded?

Cheers,

Tim.

On Thu, Mar 11, 2010 at 10:21 AM, Jonathan Ellis jbel...@gmail.com wrote:

 Ryan King and I will have 20 minutes to talk about Cassandra in the
 Lab part of the program.

 20 minutes isn't enough to present a whole lot in a structured manner
 so we are planning to just do Q&A the whole time.  So if you are going
 to be there, come with your questions.

 I will also bring a few slides about 0.6 / 0.7 features to kick things
 off in case we have a slow start.

 -Jonathan



Re: Strategy to delete/expire keys in cassandra

2010-03-10 Thread Weijun Li
Hi Sylvain,

I applied your patch to 0.5 but it seems that it's not compilable:

1) column.getTtl() is not defined in RowMutation.java
public static RowMutation getRowMutation(String table, String key,
Map<String, List<ColumnOrSuperColumn>> cfmap)
{
    RowMutation rm = new RowMutation(table, key.trim());
    for (Map.Entry<String, List<ColumnOrSuperColumn>> entry : cfmap.entrySet())
    {
        String cfName = entry.getKey();
        for (ColumnOrSuperColumn cosc : entry.getValue())
        {
            if (cosc.column == null)
            {
                assert cosc.super_column != null;
                for (org.apache.cassandra.service.Column column : cosc.super_column.columns)
                {
                    rm.add(new QueryPath(cfName, cosc.super_column.name, column.name),
                           column.value, column.timestamp, column.getTtl());
                }
            }
            else
            {
                assert cosc.super_column == null;
                rm.add(new QueryPath(cfName, null, cosc.column.name),
                       cosc.column.value, cosc.column.timestamp, cosc.column.getTtl());
            }
        }
    }
    return rm;
}

2) CassandraServer.java: Column.setTtl() is not defined.
if (column instanceof ExpiringColumn)
{
thrift_column.setTtl(((ExpiringColumn)
column).getTimeToLive());
}

3) CliClient.java: type mismatch for ColumnParent
thriftClient_.insert(tableName, key, new ColumnParent(columnFamily,
superColumnName),
 new Column(columnName, value.getBytes(),
System.currentTimeMillis()), ConsistencyLevel.ONE);

It seems that the patch doesn't add getTtl()/setTtl() stuff to Column.java?

Thanks,
-Weijun

-Original Message-
 From: Sylvain Lebresne [mailto:sylv...@yakaz.com]
 Sent: Thursday, February 25, 2010 2:23 AM
 To: Weijun Li
 Cc: cassandra-user@incubator.apache.org
 Subject: Re: Strategy to delete/expire keys in cassandra

 Hi,

  Should I just run command (in Cassandra 0.5 source folder?) like:
   patch -p1 -i 0001-Add-new-ExpiringColumn-class.patch
  for all of the five patches in your ticket?

 Well, actually I lied. The patches were made for a version a little after
 0.5.
 If you really want to try, I attach a version of those patches that
 (should)
 work with 0.5 (There is only the 3 first patch, but the fourth one is for
 tests so not necessary per se). Apply them with your patch command.
 Still, to compile that you will have to regenerate the thrift java
 interface (with ant gen-thrift-java), but for that you will have to
 install the right svn revision of thrift (which is libthrift-r820831 for
 0.5). And if you manage to make it work, you will have to dig into
 cassandra.thrift, as the patch makes changes to it.

 In the end, remember that this is not an official patch yet and it *will
 not* make it into Cassandra in its current form. All I can tell you is
 that I need these expiring columns for quite a few of my own use cases,
 and I will do what I can to get this feature included if and when
 possible.

  Also what’s your opinion on extending ExpiringColumn to expire a key
  completely? Otherwise it will be difficult to track what are expired
  or old rows in Cassandra.

 I'm not sure how to make full rows (or even full superColumns, for that
 matter) expire. What if you set a row to expire after some time, then add
 new columns before that expiration? Should you update the expiration of
 the row? Which is to say, a row should expire when its last column
 expires, and that is almost what you get with expiring columns.
 The one thing you may want, though, is that when all the columns of a row
 expire (or, to be precise, get physically deleted), the row itself is
 deleted. Looking at the code, I'm not convinced this happens, and I'm not
 sure why.
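
 [Editor's note: until something like ExpiringColumn lands officially,
 expiry can be approximated client-side by filtering on the column
 timestamp at read time. A minimal sketch, with a generic helper that is
 not part of any Cassandra API (Cassandra convention stores column
 timestamps in microseconds):]

```java
import java.util.concurrent.TimeUnit;

public class ClientSideTtl {
    // Treat a column as expired once its age exceeds the ttl.
    // timestampMicros follows the Cassandra convention of microseconds.
    static boolean isExpired(long timestampMicros, long ttlSeconds, long nowMillis) {
        long ageMillis = nowMillis - TimeUnit.MICROSECONDS.toMillis(timestampMicros);
        return ageMillis > TimeUnit.SECONDS.toMillis(ttlSeconds);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long wroteMicros = TimeUnit.MILLISECONDS.toMicros(now - 10000); // written 10s ago
        System.out.println(isExpired(wroteMicros, 5, now));    // older than a 5s ttl
        System.out.println(isExpired(wroteMicros, 3600, now)); // within a 1h ttl
    }
}
```

 [Physically removing the data would still need a periodic sweep that
 issues deletes for expired columns.]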

 --
 Sylvain




Re: NoSQL live tomorrow

2010-03-10 Thread Jonathan Ellis
http://nosqlboston.eventbrite.com/

don't know about recording / casting plans.

On Wed, Mar 10, 2010 at 3:25 PM, Tim Haines tmhai...@gmail.com wrote:
 Hey Jonathan,
 What event is this and will it be livecasted/recorded?
 Cheers,
 Tim.

 On Thu, Mar 11, 2010 at 10:21 AM, Jonathan Ellis jbel...@gmail.com wrote:

 Ryan King and I will have 20 minutes to talk about Cassandra in the
 Lab part of the program.

 20 minutes isn't enough to present a whole lot in a structured manner
 so we are planning to just do Q&A the whole time.  So if you are going
 to be there, come with your questions.

 I will also bring a few slides about 0.6 / 0.7 features to kick things
 off in case we have a slow start.

 -Jonathan




Re: Effective allocation of multiple disks

2010-03-10 Thread Stu Hood
Yea, I suppose major compactions are the wildcard here. Nonetheless, the 
situation where you only have 1 SSTable should be very rare.

I'll open a ticket though, because we really ought to be able to utilize those 
disks more thoroughly, and I have some ideas there.
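
[Editor's note: for reference, the relevant section of a 0.5/0.6-era
storage-conf.xml looks roughly like the sketch below; the directory paths
are placeholders.]

```xml
<CommitLogDirectory>/disk1/cassandra/commitlog</CommitLogDirectory>
<DataFileDirectories>
    <DataFileDirectory>/disk2/cassandra/data</DataFileDirectory>
    <DataFileDirectory>/disk3/cassandra/data</DataFileDirectory>
    <DataFileDirectory>/disk4/cassandra/data</DataFileDirectory>
</DataFileDirectories>
```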


-Original Message-
From: Anthony Molinaro antho...@alumni.caltech.edu
Sent: Wednesday, March 10, 2010 3:38pm
To: cassandra-user@incubator.apache.org
Subject: Re: Effective allocation of multiple disks

This is incorrect, as discussed a few weeks ago.  I have a setup with multiple
disks, and as soon as compaction occurs all the data ends up on one disk.  If
you need the additional io, you will want raid0.  But simply listing multiple
DataFileDirectories will not work.
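
[Editor's note: for the OS-level route Anthony describes, one sketch of
striping the three data disks follows. The device names, mount point, and
the 256 KB stripe size are placeholder choices to benchmark against your
own workload, not recommendations; both variants require root and real
devices.]

```shell
# RAID0 via md: stripe three disks into one device, then filesystem + mount.
mdadm --create /dev/md0 --level=0 --chunk=256 --raid-devices=3 \
      /dev/sdb /dev/sdc /dev/sdd
mkfs.xfs /dev/md0
mount /dev/md0 /var/lib/cassandra/data

# Or the LVM equivalent: -i = number of stripes, -I = stripe size in KB.
pvcreate /dev/sdb /dev/sdc /dev/sdd
vgcreate cassdata /dev/sdb /dev/sdc /dev/sdd
lvcreate -i 3 -I 256 -l 100%FREE -n data cassdata
```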

-Anthony

On Wed, Mar 10, 2010 at 02:08:13AM -0600, Stu Hood wrote:
 You can list multiple DataFileDirectories, and Cassandra will scatter files 
 across all of them. Use 1 disk for the commitlog, and 3 disks for data 
 directories.
 
 See http://wiki.apache.org/cassandra/CassandraHardware#Disk
 
 Thanks,
 Stu
 
 -Original Message-
 From: Eric Rosenberry epros...@gmail.com
 Sent: Wednesday, March 10, 2010 2:00am
 To: cassandra-user@incubator.apache.org
 Subject: Effective allocation of multiple disks
 
 Based on the documentation, it is clear that with Cassandra you want to have
 one disk for commitlog, and one disk for data.
 
 My question is: If you think your workload is going to require more io
 performance to the data disks than a single disk can handle, how would you
 recommend effectively utilizing additional disks?
 
 It would seem a number of vendors sell 1U boxes with four 3.5 inch disks.
  If we use one for commitlog, is there a way to have Cassandra itself
 equally split data across the three remaining disks?  Or is this something
 that needs to be handled by the hardware level, or operating system/file
 system level?
 
 Options include a hardware RAID controller in a RAID 0 stripe (this is more
 $$$ and for what gain?), or utilizing a volume manager like LVM.
 
 Along those same lines, if you do implement some type of striping, what RAID
 stripe size is recommended?  (I think Todd Burruss asked this earlier but I
 did not see a response)
 
 Thanks for any input!
 
 -Eric
 
 

-- 

Anthony Molinaro   antho...@alumni.caltech.edu




Re: Testing row cache feature in trunk: write should put record in cache

2010-03-10 Thread Jonathan Ellis
Thanks for that, Daniel.

I'm pretty heads down finishing off the last 0.6 issues right now, but
this is on my list to get to.

On Mon, Mar 8, 2010 at 1:25 PM, Daniel Kluesing d...@bluekai.com wrote:
 This is interesting for the use cases I'm looking at Cassandra for, so if 
 that offer still stands I'll take you up on it. I took a crack at it in 
 https://issues.apache.org/jira/browse/CASSANDRA-860 - also in large part to 
 get my feet wet with the code.

 -Original Message-
 From: Jonathan Ellis [mailto:jbel...@gmail.com]
 Sent: Tuesday, February 16, 2010 9:22 PM
 To: cassandra-user@incubator.apache.org
 Subject: Re: Testing row cache feature in trunk: write should put record in 
 cache

 ... tell you what, if you write the option-processing part in
 DatabaseDescriptor I will do the actual cache part. :)

 On Tue, Feb 16, 2010 at 11:07 PM, Jonathan Ellis jbel...@gmail.com wrote:
 https://issues.apache.org/jira/secure/CreateIssue!default.jspa, but
 this is pretty low priority for me.

 On Tue, Feb 16, 2010 at 8:37 PM, Weijun Li weiju...@gmail.com wrote:
 Just tried to make a quick change to enable it, but it didn't work out :-(

    ColumnFamily cachedRow = cfs.getRawCachedRow(mutation.key());

     // What I modified
     if( cachedRow == null ) {
         cfs.cacheRow(mutation.key());
         cachedRow = cfs.getRawCachedRow(mutation.key());
     }

     if (cachedRow != null)
         cachedRow.addAll(columnFamily);

 How can I open a ticket for you to make the change (enable row cache write
 through with an option)?

 Thanks,
 -Weijun

 On Tue, Feb 16, 2010 at 5:20 PM, Jonathan Ellis jbel...@gmail.com wrote:

 On Tue, Feb 16, 2010 at 7:17 PM, Jonathan Ellis jbel...@gmail.com wrote:
  On Tue, Feb 16, 2010 at 7:11 PM, Weijun Li weiju...@gmail.com wrote:
  Just started to play with the row cache feature in trunk: it seems to
  be
  working fine so far except that for RowsCached parameter you need to
  specify
  number of rows rather than a percentage (e.g., 20% doesn't work).
 
  20% works, but it's 20% of the rows at server startup.  So on a fresh
  start that is zero.
 
  Maybe we should just get rid of the % feature...

 (Actually, it shouldn't be hard to update this on flush, if you want
 to open a ticket.)
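
[Editor's note: the idea being discussed (update a row in the cache only
when it is already cached, rather than populating the cache on every
write) can be sketched generically. This is an illustrative stand-in, not
Cassandra's actual ColumnFamilyStore code; the class and method names are
hypothetical.]

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WriteThroughRowCache {
    // Access-ordered LRU map; the capacity bound stands in for RowsCached.
    private final Map<String, StringBuilder> cache =
        new LinkedHashMap<String, StringBuilder>(16, 0.75f, true) {
            protected boolean removeEldestEntry(Map.Entry<String, StringBuilder> e) {
                return size() > 1000;
            }
        };

    // Reads populate the cache.
    void read(String key, String row) {
        cache.put(key, new StringBuilder(row));
    }

    // Writes update the cached copy only if the row is already cached,
    // so the cache is not polluted by rows nobody reads. The write always
    // goes to the real store as well (omitted here).
    void write(String key, String column) {
        StringBuilder cached = cache.get(key);
        if (cached != null)
            cached.append(',').append(column);
    }

    String cached(String key) {
        StringBuilder b = cache.get(key);
        return b == null ? null : b.toString();
    }
}
```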






problem with running simple example using cassandra-cli with 0.6.0-beta2

2010-03-10 Thread Bill Au
I am checking out 0.6.0-beta2 since I need the batch-mutate function.  I am
just trying to run the example in the cassandra-cli Wiki:

http://wiki.apache.org/cassandra/CassandraCli

Here is what I am getting:

cassandra> set Keyspace1.Standard1['jsmith']['first'] = 'John'
Value inserted.
cassandra> get Keyspace1.Standard1['jsmith']
=> (column=6669727374, value=John, timestamp=1268261785077)
Returned 1 results.

The column name being returned by get (6669727374) does not match what is
set (first).  This is true for all column names.

cassandra> set Keyspace1.Standard1['jsmith']['last'] = 'Smith'
Value inserted.
cassandra> set Keyspace1.Standard1['jsmith']['age'] = '42'
Value inserted.
cassandra> get Keyspace1.Standard1['jsmith']
=> (column=6c617374, value=Smith, timestamp=1268262480130)
=> (column=6669727374, value=John, timestamp=1268261785077)
=> (column=616765, value=42, timestamp=1268262484133)
Returned 3 results.

Is this a problem in 0.6.0-beta2 or am I doing anything wrong?

Bill


Re: cassandra 0.6.0 beta 2 download contains beta 1?

2010-03-10 Thread Bill Au
I am building from source and found the same problem.  I manually copied all
the jar files from build/lib/jars to lib and that seems to do the trick.

Bill

On Wed, Mar 10, 2010 at 1:39 PM, Vick Khera vi...@khera.org wrote:

 On Wed, Mar 10, 2010 at 11:30 AM, Eric Evans eev...@rackspace.com wrote:
   apache-cassandra-0.6.0-beta1.jar
   apache-cassandra-0.6.0-beta2.jar
 
  Ugh, my bad. I must have failed to `clean' in between the aborted beta1
  and beta2.
 

 The beta2 also does not include the other support jar files like
 log4j.  Not being a java person, I didn't know what to do so I just
 started my experimentation with the 0.5.1 release which has it all
 bundled.



Re: problem with running simple example using cassandra-cli with 0.6.0-beta2

2010-03-10 Thread Brandon Williams
On Wed, Mar 10, 2010 at 5:09 PM, Bill Au bill.w...@gmail.com wrote:

 I am checking out 0.6.0-beta2 since I need the batch-mutate function.  I am
 just trying to run the example in the cassandra-cli Wiki:

 http://wiki.apache.org/cassandra/CassandraCli

 Here is what I am getting:

 cassandra> set Keyspace1.Standard1['jsmith']['first'] = 'John'
 Value inserted.
 cassandra> get Keyspace1.Standard1['jsmith']
 => (column=6669727374, value=John, timestamp=1268261785077)
 Returned 1 results.

 The column name being returned by get (6669727374) does not match what is
 set (first).  This is true for all column names.

 cassandra> set Keyspace1.Standard1['jsmith']['last'] = 'Smith'
 Value inserted.
 cassandra> set Keyspace1.Standard1['jsmith']['age'] = '42'
 Value inserted.
 cassandra> get Keyspace1.Standard1['jsmith']
 => (column=6c617374, value=Smith, timestamp=1268262480130)
 => (column=6669727374, value=John, timestamp=1268261785077)
 => (column=616765, value=42, timestamp=1268262484133)
 Returned 3 results.

 Is this a problem in 0.6.0-beta2 or am I doing anything wrong?

 Bill


This is normal.  You've added the 'first', 'last', and 'age' columns to the
'jsmith' row, and then asked for the entire row, so you got all 3 columns
back.

-Brandon


Re: Strategy to delete/expire keys in cassandra

2010-03-10 Thread Weijun Li
Never mind. Figured out I forgot to compile thrift :)

Thanks,

-Weijun

On Wed, Mar 10, 2010 at 1:43 PM, Weijun Li weiju...@gmail.com wrote:

 Hi Sylvain,

 I applied your patch to 0.5 but it seems that it's not compilable:

 1) column.getTtl() is not defined in RowMutation.java
 public static RowMutation getRowMutation(String table, String key,
 Map<String, List<ColumnOrSuperColumn>> cfmap)
 {
     RowMutation rm = new RowMutation(table, key.trim());
     for (Map.Entry<String, List<ColumnOrSuperColumn>> entry : cfmap.entrySet())
     {
         String cfName = entry.getKey();
         for (ColumnOrSuperColumn cosc : entry.getValue())
         {
             if (cosc.column == null)
             {
                 assert cosc.super_column != null;
                 for (org.apache.cassandra.service.Column column : cosc.super_column.columns)
                 {
                     rm.add(new QueryPath(cfName, cosc.super_column.name, column.name),
                            column.value, column.timestamp, column.getTtl());
                 }
             }
             else
             {
                 assert cosc.super_column == null;
                 rm.add(new QueryPath(cfName, null, cosc.column.name),
                        cosc.column.value, cosc.column.timestamp, cosc.column.getTtl());
             }
         }
     }
     return rm;
 }

 2) CassandraServer.java: Column.setTtl() is not defined.
 if (column instanceof ExpiringColumn)
 {
 thrift_column.setTtl(((ExpiringColumn)
 column).getTimeToLive());
 }

 3) CliClient.java: type mismatch for ColumnParent
 thriftClient_.insert(tableName, key, new ColumnParent(columnFamily,
 superColumnName),
  new Column(columnName, value.getBytes(),
 System.currentTimeMillis()), ConsistencyLevel.ONE);

 It seems that the patch doesn't add getTtl()/setTtl() stuff to Column.java?


 Thanks,
 -Weijun

  -Original Message-
 From: Sylvain Lebresne [mailto:sylv...@yakaz.com]
 Sent: Thursday, February 25, 2010 2:23 AM
 To: Weijun Li
 Cc: cassandra-user@incubator.apache.org
 Subject: Re: Strategy to delete/expire keys in cassandra

 Hi,

  Should I just run command (in Cassandra 0.5 source folder?) like:
  patch -p1 -i 0001-Add-new-ExpiringColumn-class.patch
  for all of the five patches in your ticket?

 Well, actually I lied. The patches were made for a version a little after
 0.5.
 If you really want to try, I attach a version of those patches that
 (should)
 work with 0.5 (There is only the 3 first patch, but the fourth one is for
 tests so not necessary per se). Apply them with your patch command.
 Still, to compile that you will have to regenerate the thrift java
 interface (with ant gen-thrift-java), but for that you will have to
 install the right svn revision of thrift (which is libthrift-r820831 for
 0.5). And if you manage to make it work, you will have to dig into
 cassandra.thrift, as the patch makes changes to it.

 In the end, remember that this is not an official patch yet and it *will
 not* make it into Cassandra in its current form. All I can tell you is
 that I need these expiring columns for quite a few of my own use cases,
 and I will do what I can to get this feature included if and when
 possible.

  Also what’s your opinion on extending ExpiringColumn to expire a key
  completely? Otherwise it will be difficult to track what are expired
  or old rows in Cassandra.

 I'm not sure how to make full rows (or even full superColumns, for that
 matter) expire. What if you set a row to expire after some time, then add
 new columns before that expiration? Should you update the expiration of
 the row? Which is to say, a row should expire when its last column
 expires, and that is almost what you get with expiring columns.
 The one thing you may want, though, is that when all the columns of a row
 expire (or, to be precise, get physically deleted), the row itself is
 deleted. Looking at the code, I'm not convinced this happens, and I'm not
 sure why.

 --
 Sylvain






Re: NoSQL live tomorrow

2010-03-10 Thread B. Todd Burruss

does anyone know if there is a plan for nosql seattle anytime soon?

Jonathan Ellis wrote:

http://nosqlboston.eventbrite.com/

don't know about recording / casting plans.

On Wed, Mar 10, 2010 at 3:25 PM, Tim Haines tmhai...@gmail.com wrote:
  

Hey Jonathan,
What event is this and will it be livecasted/recorded?
Cheers,
Tim.

On Thu, Mar 11, 2010 at 10:21 AM, Jonathan Ellis jbel...@gmail.com wrote:


Ryan King and I will have 20 minutes to talk about Cassandra in the
Lab part of the program.

20 minutes isn't enough to present a whole lot in a structured manner
so we are planning to just do Q&A the whole time.  So if you are going
to be there, come with your questions.

I will also bring a few slides about 0.6 / 0.7 features to kick things
off in case we have a slow start.

-Jonathan
  



Re: NoSQL live tomorrow

2010-03-10 Thread David Timothy Strauss
I will be at NoSQL Live, but I have a client call for most of the lab part.

--Original Message--
From: Jonathan Ellis
To: cassandra-user@incubator.apache.org
ReplyTo: cassandra-user@incubator.apache.org
Subject: NoSQL live tomorrow
Sent: Mar 10, 2010 21:21

Ryan King and I will have 20 minutes to talk about Cassandra in the
Lab part of the program.

20 minutes isn't enough to present a whole lot in a structured manner
so we are planning to just do Q&A the whole time.  So if you are going
to be there, come with your questions.

I will also bring a few slides about 0.6 / 0.7 features to kick things
off in case we have a slow start.

-Jonathan




Re: Hackathon?!?

2010-03-10 Thread Chris Goffinet
We could do it on April 22 (1 week later), that's my birthday :-) What better 
way to celebrate haha.

-Chris

On Mar 10, 2010, at 9:58 AM, Jonathan Ellis wrote:

 I'm in either way, but if we push it a week later then the twitter
 guys could (a) make it and (b) pimp it at their own conference.
 
 On Wed, Mar 10, 2010 at 12:26 AM, Jeff Hodges jhod...@twitter.com wrote:
 Ah, hell. Thought this was the first day. Can't make it.
 --
 Jeff
 
 On Mar 9, 2010 9:32 PM, Ryan King r...@twitter.com wrote:
 
 I'm already committed to talking about cassandra that day at our
 company's developer conference (chirp.twitter.com).
 
 -ryan
 
 On Tue, Mar 9, 2010 at 6:26 PM, Jeff Hodges jhod...@twitter.com wrote:
 I'm down.
 --
 Jeff
 
 ...



Re: problem with running simple example using cassandra-cli with 0.6.0-beta2

2010-03-10 Thread Jonathan Ellis
I think he means that the column names are rendered as raw hex bytes while
the values are rendered as strings.
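
[Editor's note: those hex names decode straight back to ASCII; a small
sketch for the record (the class name here is hypothetical, not part of
the CLI):]

```java
public class HexColumns {
    // Decode a hex-rendered column name back to its ASCII/UTF-8 string form.
    static String decodeHex(String hex) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < hex.length(); i += 2) {
            sb.append((char) Integer.parseInt(hex.substring(i, i + 2), 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(decodeHex("6669727374")); // first
        System.out.println(decodeHex("6c617374"));   // last
        System.out.println(decodeHex("616765"));     // age
    }
}
```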

On Wed, Mar 10, 2010 at 5:22 PM, Brandon Williams dri...@gmail.com wrote:
 On Wed, Mar 10, 2010 at 5:09 PM, Bill Au bill.w...@gmail.com wrote:

 I am checking out 0.6.0-beta2 since I need the batch-mutate function.  I
 am just trying to run the example in the cassandra-cli Wiki:

 http://wiki.apache.org/cassandra/CassandraCli

 Here is what I am getting:

 cassandra> set Keyspace1.Standard1['jsmith']['first'] = 'John'
 Value inserted.
 cassandra> get Keyspace1.Standard1['jsmith']
 => (column=6669727374, value=John, timestamp=1268261785077)
 Returned 1 results.

 The column name being returned by get (6669727374) does not match what is
 set (first).  This is true for all column names.

 cassandra> set Keyspace1.Standard1['jsmith']['last'] = 'Smith'
 Value inserted.
 cassandra> set Keyspace1.Standard1['jsmith']['age'] = '42'
 Value inserted.
 cassandra> get Keyspace1.Standard1['jsmith']
 => (column=6c617374, value=Smith, timestamp=1268262480130)
 => (column=6669727374, value=John, timestamp=1268261785077)
 => (column=616765, value=42, timestamp=1268262484133)
 Returned 3 results.
 Returned 3 results.

 Is this a problem in 0.6.0-beta2 or am I doing anything wrong?

 Bill

 This is normal.  You've added the 'first', 'last', and 'age' columns to the
 'jsmith' row, and then asked for the entire row, so you got all 3 columns
 back.
 -Brandon


Re: Effective allocation of multiple disks

2010-03-10 Thread Jonathan Ellis
On Wed, Mar 10, 2010 at 9:31 PM, Anthony Molinaro
antho...@alumni.caltech.edu wrote:
 I would almost
 recommend just keeping things simple and removing multiple data directories
 from the config altogether and just documenting that you should plan on using
 OS level mechanisms for growing diskspace and io.

I think that is a pretty sane suggestion actually.

-Jonathan


Strategies for storing lexically ordered data in supercolumns

2010-03-10 Thread Peter Chang
I'm wondering about good strategies for picking keys that I want to be
lexically sorted in a super column family. For example, my data looks like
this:

[user1_uuid][connections][some_key_for_user2] = 
[user1_uuid][connections][some_key_for_user3] = 

I was thinking that I wanted some_key_for_user2 to be sorted by a user's
name. So I was thinking I set the subcolumn compareWith to UTF8Type or
BytesType and construct a key

[user's lastname + user's firstname + user's uuid]

This would result in a sorted subcolumn and user list. That's fine. But I
wonder what would happen if, say, a user changes their last name. It happens
rarely, but I imagine people getting married and modifying their name. Now
the sort is no longer correct. There seem to be some bad consequences to
creating keys based on data that can change.

So what is the general (elegant, easy to maintain) strategy here? Always
sort in your server-side code and don't bother trying to have the data
sorted?
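
[Editor's note: one common approach is to treat the sort key as derived
data: when the underlying field changes, delete the old subcolumn and
reinsert the value under a freshly built key. Note that a rename still has
to touch every row that lists that user as a connection. A sketch of the
key construction, with hypothetical helper names rather than any Cassandra
API:]

```java
public class ConnectionKeys {
    // Hypothetical helper: build a lexically sortable subcolumn name from
    // mutable name fields plus an immutable uuid as a uniqueness tiebreaker.
    static String connectionKey(String lastName, String firstName, String uuid) {
        return lastName.toLowerCase() + ":" + firstName.toLowerCase() + ":" + uuid;
    }

    public static void main(String[] args) {
        String before = connectionKey("Smith", "John", "u2");
        String after = connectionKey("Jones", "John", "u2"); // after a rename
        // A rename is handled as: remove the subcolumn stored under `before`,
        // then reinsert the same value under `after`.
        System.out.println(before.compareTo(after) > 0); // "smith..." sorts after "jones..."
    }
}
```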

I'm a cassandra noob with all my experience in relational DBMS.

TIA
Pete