Re: reads are slow
On Tue, Feb 23, 2010 at 10:06 AM, Jonathan Ellis jbel...@gmail.com wrote: the standard workaround is to change your data model to use non-super columns instead. supercolumns are really only for relatively small numbers of subcolumns until 598 is addressed.

is there any limit on the number of supercolumns i can have?
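[A sketch of the workaround Jonathan describes: instead of row -> supercolumn -> subcolumn, store flat columns whose names encode both levels. The ":"-joined naming scheme and the helper names below are illustrative assumptions, not an official API.]

```python
# Model the two-level supercolumn layout with plain columns by encoding
# "supercolumn:subcolumn" into the column name. The ":" separator is an
# arbitrary choice for illustration (pick one that can't appear in names).

def flatten(super_columns):
    """Flatten {supercolumn: {subcolumn: value}} into {name: value}."""
    flat = {}
    for sc_name, subcolumns in super_columns.items():
        for col_name, value in subcolumns.items():
            flat["%s:%s" % (sc_name, col_name)] = value
    return flat

def slice_supercolumn(flat, sc_name):
    """Recover one 'supercolumn' worth of columns from the flat layout,
    e.g. via a get_slice over the name range [prefix, prefix+0xff)."""
    prefix = sc_name + ":"
    return {name[len(prefix):]: v
            for name, v in flat.items() if name.startswith(prefix)}

flat = flatten({"sc_2": {"a": "1", "b": "2"}, "sc_3": {"c": "3"}})
print(slice_supercolumn(flat, "sc_2"))  # {'a': '1', 'b': '2'}
```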
reads are slow
to load 100 columns of a super column takes over a second. how can i improve this performance? i am using cassandra version 0.5.

output of nodeprobe info:
29814395632524962303611017038378268216
Load : 9.18 GB
Generation No: 1266945238
Uptime (seconds) : 638131
Heap Memory (MB) : 47.55 / 10237.94

the full config file is here http://pastie.org/838916
I have given 10GB RAM in cassandra.in.sh (-Xmx10G). i have increased KeysCachedFraction to 0.04. i have two different drives for the commitlog and data directory. i have about 3 million rows. what can i do to improve read speed? thanks a lot!
Re: reads are slow
i dont think so. /dev/sdc1 is the commitlog drive, and /dev/sdd1 is the data directory. http://pastie.org/838943

On Tue, Feb 23, 2010 at 9:35 AM, Jonathan Ellis jbel...@gmail.com wrote: are you i/o bound? http://spyced.blogspot.com/2010/01/linux-performance-basics.html

On Tue, Feb 23, 2010 at 11:33 AM, kevin kevincastigli...@gmail.com wrote: to load 100 columns of a super column takes over a second. how can i improve this performance? i am using cassandra version 0.5. output of nodeprobe info: 29814395632524962303611017038378268216 Load : 9.18 GB Generation No: 1266945238 Uptime (seconds) : 638131 Heap Memory (MB) : 47.55 / 10237.94 the full config file is here http://pastie.org/838916 I have given 10GB RAM in cassandra.in.sh (-Xmx10G). i have increased KeysCachedFraction to 0.04. i have two different drives for the commitlog and data directory. i have about 3 million rows. what can i do to improve read speed? thanks a lot!
Re: reads are slow
On Tue, Feb 23, 2010 at 9:51 AM, Brandon Williams dri...@gmail.com wrote: On Tue, Feb 23, 2010 at 11:33 AM, kevin kevincastigli...@gmail.com wrote: I have given 10GB RAM in cassandra.in.sh (-Xmx10G). i have increased KeysCachedFraction to 0.04. i have two different drives for the commitlog and data directory. i have about 3 million rows. what can i do to improve read speed? thanks a lot! Since 0.5 doesn't have row caching, 10G is probably too much to give the JVM and is hampering the OS cache. Try between 2-6GB instead and see if that helps.

the machine has 24GB of RAM, with about 4G used and 17G cached. http://pastie.org/838979
do you still think reducing the 10G will help? it seems a little counter-intuitive. i assumed giving the JVM more RAM would help! what is row caching? how do i enable it? thanks
Re: reads are slow
On Tue, Feb 23, 2010 at 10:06 AM, Jonathan Ellis jbel...@gmail.com wrote: the standard workaround is to change your data model to use non-super columns instead.

is there any limit on the number of standard column families that i can set up on cassandra?
Re: reads are slow
On Tue, Feb 23, 2010 at 11:49 AM, Jonathan Ellis jbel...@gmail.com wrote: On Tue, Feb 23, 2010 at 12:12 PM, kevin kevincastigli...@gmail.com wrote: On Tue, Feb 23, 2010 at 10:07 AM, Jonathan Ellis jbel...@gmail.com wrote: you enable row caching by upgrading to 0.6. :) where can i get 0.6 from? svn trunk? svn branches/cassandra-0.6 like I said, we're voting on a beta today, so you should probably just wait for that.

thanks! i saw the changes and it says batch_insert is being deprecated, and a new command batch_mutate is coming in. can you guys add an example to the wiki API page?
* add batch_mutate thrift command, deprecating batch_insert (CASSANDRA-336)
http://wiki.apache.org/cassandra/API
thanks a lot
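[A sketch of the structure batch_mutate takes in 0.6, per CASSANDRA-336: a map of row key -> column family -> list of Mutations. Plain dicts stand in for the generated thrift Mutation/ColumnOrSuperColumn/Column objects here, since the real classes come from the generated cassandra.ttypes module; treat the shapes as an assumption to check against the wiki once documented.]

```python
import time

def make_mutation(name, value):
    # Stand-in for Mutation(column_or_supercolumn=ColumnOrSuperColumn(
    #     column=Column(name=..., value=..., timestamp=...)))
    return {"column": {"name": name, "value": value,
                       "timestamp": int(time.time() * 1e6)}}

# batch_mutate's mutation map: {row_key: {column_family: [Mutation, ...]}}
mutation_map = {
    "mykey": {
        "Standard1": [make_mutation("col1", "v1"),
                      make_mutation("col2", "v2")],
    }
}
# with a real client (hypothetical usage):
# client.batch_mutate(keyspace, mutation_map, ConsistencyLevel.ONE)
```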
cassandra browser
hi guys, is there an admin tool (like pgAdmin III) to browse the data stored in cassandra? thanks
Re: [VOTE] Graduation
+23 On Mon, Jan 25, 2010 at 2:50 PM, Chris Goffinet goffi...@digg.com wrote: +1 --- Chris Goffinet goffi...@digg.com On Jan 25, 2010, at 1:11 PM, Eric Evans wrote: There was some additional discussion[1] concerning Cassandra's graduation on the incubator list, and as a result we've altered the initial resolution to expand the size of the PMC by three to include our active mentors (new draft attached). I propose a vote for Cassandra's graduation to a top-level project. We'll leave this open for 72 hours, and assuming it passes, we can then take it to a vote with the Incubator PMC. +1 from me! [1] http://thread.gmane.org/gmane.comp.apache.incubator.general/24427 -- Eric Evans eev...@rackspace.com cassandra-resolution.txt
Re: disk space and load after removing data
i did
nodeprobe -host localhost flush Keyspace1
nodeprobe -host localhost compact
nodeprobe -host localhost cleanup
and there is no change in the disk usage or Load info.

On Sat, Jan 23, 2010 at 11:31 PM, kevin kevincastigli...@gmail.com wrote: after deleting a lot of column families and columns, the disk space used by cassandra and the Load reported by nodeprobe info is not going down at all. is there something i have to do to recover the disk space? also what about the Load reported by nodeprobe info? thanks
Re: disk space and load after removing data
any help here guys? thanks a lot!

On Sun, Jan 24, 2010 at 10:51 AM, kevin kevincastigli...@gmail.com wrote: i did nodeprobe -host localhost flush Keyspace1, nodeprobe -host localhost compact, nodeprobe -host localhost cleanup, and there is no change in the disk usage or Load info. On Sat, Jan 23, 2010 at 11:31 PM, kevin kevincastigli...@gmail.com wrote: after deleting a lot of column families and columns, the disk space used by cassandra and the Load reported by nodeprobe info is not going down at all. is there something i have to do to recover the disk space? also what about the Load reported by nodeprobe info? thanks
Re: multiget_slice failed: unknown result
i also get this error randomly!

data = client.multiget_slice(keyspace, keys, column_parent, predicate, ConsistencyLevel.ONE)
  File "/home/work/common/lazyboy/connection.py", line 109, in func
    return getattr(client, attr).__call__(*args, **kwargs)
  File "/home/work/common/cassandra/Cassandra.py", line 300, in multiget_slice
    return self.recv_multiget_slice()
  File "/home/work/common/cassandra/Cassandra.py", line 322, in recv_multiget_slice
    result.read(self._iprot)
  File "/home/work/common/cassandra/Cassandra.py", line 1699, in read
    fastbinary.decode_binary(self, iprot.trans, (self.__class__, self.thrift_spec))
TypeError: got wrong ttype while reading field
while reading upstream,

On Mon, Jan 18, 2010 at 12:02 PM, kevin kevincastigli...@gmail.com wrote: hi guys, i keep getting this error pretty sporadically. how to fix this? i am using cassandra 0.5rc3. thanks!
data = client.multiget_slice(keyspace, keys, column_parent, predicate, ConsistencyLevel.ONE)
  File "/home/work/common/lazyboy/connection.py", line 116, in func
    raise exc.ErrorThriftMessage(message)
ErrorThriftMessage: multiget_slice failed: unknown result
while reading upstream,
refill claimed to have refilled the buffer, but didn't
hi, i get this error sporadically. how do i figure out what is going on? thanks

column_families = client.get_slice(keyspace, key, column_parent, predicate, ConsistencyLevel.ONE)
  File "/home/work/common/lazyboy/connection.py", line 109, in func
    return getattr(client, attr).__call__(*args, **kwargs)
  File "/home/work/common/cassandra/Cassandra.py", line 214, in get_slice
    return self.recv_get_slice()
  File "/home/work/common/cassandra/Cassandra.py", line 236, in recv_get_slice
    result.read(self._iprot)
  File "/home/work/common/cassandra/Cassandra.py", line 1276, in read
    fastbinary.decode_binary(self, iprot.trans, (self.__class__, self.thrift_spec))
TypeError: refill claimed to have refilled the buffer, but didn't!!
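[Sporadic decode errors like these often come from a broken or shared thrift connection. Until the root cause is found, a client-side retry is a common stopgap; this generic wrapper is a sketch, not part of lazyboy or the generated client, and which exception types are worth retrying depends on your client.]

```python
import time

def with_retries(fn, attempts=3, delay=0.1, retry_on=(TypeError, IOError)):
    """Call fn(), retrying a few times on transient errors with backoff."""
    last_error = None
    for i in range(attempts):
        try:
            return fn()
        except retry_on as e:
            last_error = e
            time.sleep(delay * (2 ** i))  # simple exponential backoff
    raise last_error

# hypothetical usage with the thrift client from the traceback above:
# data = with_retries(lambda: client.multiget_slice(
#     keyspace, keys, column_parent, predicate, ConsistencyLevel.ONE))
```

Reconnecting (a fresh client per retry) rather than reusing the possibly-corrupt transport is usually the safer variant of this.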
Re: Thrift 0.2.0 release
is this patch available?

On Wed, Dec 16, 2009 at 1:55 PM, Jake Luciani jak...@gmail.com wrote: Would you accept a patch that fixes the Cassandra impl? Least we can do. On Dec 16, 2009, at 4:37 PM, Jonathan Ellis jbel...@gmail.com wrote: 0.2 shipped with a regression that breaks Cassandra's internal use of thrift (https://issues.apache.org/jira/browse/THRIFT-529), so Cassandra devs cannot easily upgrade to the 0.2 compiler, but on-the-wire compatibility should be fine from a client perspective. -Jonathan

On Wed, Dec 16, 2009 at 3:32 PM, Anthony Molinaro antho...@alumni.caltech.edu wrote: Hi, So I seem to recall some sort of issues with the 0.2.0 release of thrift and cassandra. Were those ever resolved? Is it safe to use them together? Has anyone tested them? Thanks, -Anthony -- Anthony Molinaro antho...@alumni.caltech.edu
Re: Jsondra - Start of an http/json interface for Cassandra using tornado and lazyboy
thanks for jsondra, it looks super exciting. is it possible to read an entire super_column_family with jsondra? thanks!

On Wed, Dec 9, 2009 at 7:40 PM, Joseph Bowman bowman.jos...@gmail.com wrote: There was some concern about the approach I had taken with Jsondra, basically, that it wasn't really an HTTP interface as everything was done with GET/POST and verbiage was included in the url. I've rewritten it; it uses the proper HTTP GET/PUT/POST/DELETE methods. Urls map to keys, ie http://localhost/keyspace/columnfamily/key/ The server, and node.js client, are functional now. I'll be adding the ability for batch operations soon. By the way, this really is pretty darn simple, with all the functionality being provided by Cassandra for the most part. Awesome job guys.

On Sat, Dec 5, 2009 at 10:55 PM, Joseph Bowman bowman.jos...@gmail.com wrote: I'm trying to wrap my head around node.js right now. Basically, trying to figure out its event model to create a nonblocking client to Jsondra. The one thing I loved about Lazyboy was it was easy to make it so Jsondra could have the json interface, but everything is stored as individual columns, which should make the data easier to interact with for other applications that may have to interact with the same dataset. I wrote what currently exists for Jsondra in about an hour and a half, and am about 6 hours into trying to figure out node.js. Go figure.

On Sat, Dec 5, 2009 at 10:49 PM, Rich Atkinson atkins...@gmail.com wrote: That is cool! Having played around with couchdb a bit, it does have some great features that will help its adoption; most notably the json/http API. I think being REST-like provides a familiar, cosy environment; although couch does seem to force you to map/reduce everything. Conceptually, something along the lines of node.js could be an awesome fit for cassandra. Rich

On Sun, Dec 6, 2009 at 11:29 AM, Jonathan Ellis jbel...@gmail.com wrote: Cool, I know several people have mentioned wanting cassandra-over-http.
On Sat, Dec 5, 2009 at 1:37 PM, Joseph Bowman bowman.jos...@gmail.com wrote: Hi everyone, I wanted to take a look at node.js as an alternative to tornado for an application idea I'm working on, and since I didn't see a real javascript interface for thrift, I threw together this Jsondra app really quick. It uses tornado and lazyboy basically because I was already using them, and they were the quickest to implement. Currently it supports get/put/delete for individual keys only. It returns json, and you must submit json-encoded values for put requests. Just threw it together this morning in order to play with node.js. I'll probably only update it on an as-needed basis, but thought I'd throw it out there in case anyone else might find it useful. It's Apache licensed, same as tornado. I believe if I understand Digg's license for lazyboy, everything is in compliance license-wise. Here's the URL - http://github.com/joerussbowman/jsondra
Re: Jsondra - Start of an http/json interface for Cassandra using tornado and lazyboy
i dont think lazyboy supports it even though cassandra supports it. im using the cassdict python client, which supports it and is working great. can you please check it to see if it is a better fit for jsondra instead of lazyboy? thanks!

On Fri, Dec 11, 2009 at 2:33 PM, Joseph Bowman bowman.jos...@gmail.com wrote: You mean pull a super column and everything under it? Not sure, I'll see if I can test it when I get a chance sometime this weekend. The quick answer is that jsondra supports what lazyboy supports. At the moment it uses record objects for everything; support for recordsets and views will come later.

On Dec 11, 2009 5:06 PM, kevin kevincastigli...@gmail.com wrote: thanks for jsondra, it looks super exciting. is it possible to read an entire super_column_family with jsondra? thanks! On Wed, Dec 9, 2009 at 7:40 PM, Joseph Bowman bowman.jos...@gmail.com wrote: There was some...
Re: urgent: missing data!
thanks jbellis for the fix https://issues.apache.org/jira/browse/CASSANDRA-583 On Tue, Nov 24, 2009 at 6:24 PM, kevin kevincastigli...@gmail.com wrote: On Tue, Nov 24, 2009 at 6:05 PM, kevin kevincastigli...@gmail.com wrote: hi guys i have been using cassandra this version Path: . URL: http://svn.apache.org/repos/asf/incubator/cassandra/trunk Repository Root: http://svn.apache.org/repos/asf Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68 Revision: 831540 Node Kind: directory Schedule: normal Last Changed Author: jbellis Last Changed Rev: 831433 Last Changed Date: 2009-10-30 12:45:38 -0700 (Fri, 30 Oct 2009) ./bin/nodeprobe -host localhost info 29814395632524962303611017038378268216 Load : 753.64 MB Generation No: 1259113951 Uptime (seconds) : 443 Heap Memory (MB) : 121.32 / 12285.94 i just have one server. and the config is here http://pastie.org/713889 all of a sudden super columns stored in the column family Supern1 are disappearing. I tried flushing the cassandra node and starting again, and still the same problem. Any suggestions how to figure out what the problem is and to retrieve back the data? The size of the cassandra data directory hasnt reduced. thanks a lot! this is the settings in file cassandra.in.sh http://pastie.org/713922
urgent: missing data!
hi guys, i have been using cassandra this version:
Path: .
URL: http://svn.apache.org/repos/asf/incubator/cassandra/trunk
Repository Root: http://svn.apache.org/repos/asf
Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
Revision: 831540
Node Kind: directory
Schedule: normal
Last Changed Author: jbellis
Last Changed Rev: 831433
Last Changed Date: 2009-10-30 12:45:38 -0700 (Fri, 30 Oct 2009)

./bin/nodeprobe -host localhost info
29814395632524962303611017038378268216
Load : 753.64 MB
Generation No: 1259113951
Uptime (seconds) : 443
Heap Memory (MB) : 121.32 / 12285.94

i just have one server, and the config is here http://pastie.org/713889
all of a sudden, super columns stored in the column family Supern1 are disappearing. I tried flushing the cassandra node and starting again, and still the same problem. Any suggestions how to figure out what the problem is and to retrieve back the data? The size of the cassandra data directory hasnt reduced. thanks a lot!
Re: urgent: missing data!
On Tue, Nov 24, 2009 at 6:05 PM, kevin kevincastigli...@gmail.com wrote: hi guys, i have been using cassandra this version:
Path: .
URL: http://svn.apache.org/repos/asf/incubator/cassandra/trunk
Repository Root: http://svn.apache.org/repos/asf
Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
Revision: 831540
Node Kind: directory
Schedule: normal
Last Changed Author: jbellis
Last Changed Rev: 831433
Last Changed Date: 2009-10-30 12:45:38 -0700 (Fri, 30 Oct 2009)

./bin/nodeprobe -host localhost info
29814395632524962303611017038378268216
Load : 753.64 MB
Generation No: 1259113951
Uptime (seconds) : 443
Heap Memory (MB) : 121.32 / 12285.94

i just have one server, and the config is here http://pastie.org/713889
all of a sudden, super columns stored in the column family Supern1 are disappearing. I tried flushing the cassandra node and starting again, and still the same problem. Any suggestions how to figure out what the problem is and to retrieve back the data? The size of the cassandra data directory hasnt reduced. thanks a lot!

this is the settings in file cassandra.in.sh http://pastie.org/713922
Re: Cassandra Database Format Compatibility
On Mon, Nov 23, 2009 at 3:27 PM, Brandon Williams dri...@gmail.com wrote: On Mon, Nov 23, 2009 at 4:47 PM, Jon Graham sjclou...@gmail.com wrote: Is there an estimated release date for the 0.5 release? Beta should happen fairly soon -- it's up for a vote in the IPMC right now. what does ipmc mean?
error when doing nodeprobe flush
i get this error on cassandra when i do nodeprobe flush.

nodeprobe -host x.x.x.x flush
11:36:53,008 ERROR DebuggableThreadPoolExecutor:120 - Error in executor futuretask
java.util.concurrent.ExecutionException: java.lang.AssertionError
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
    at java.util.concurrent.FutureTask.get(FutureTask.java:83)
    at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.afterExecute(DebuggableThreadPoolExecutor.java:112)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.AssertionError
    at org.apache.cassandra.db.BinaryMemtable.getSortedKeys(BinaryMemtable.java:127)
    at org.apache.cassandra.db.ColumnFamilyStore$2.run(ColumnFamilyStore.java:947)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    ... 2 more

am running svn version:
Revision: 831540
Node Kind: directory
Schedule: normal
Last Changed Author: jbellis
Last Changed Rev: 831433
Last Changed Date: 2009-10-30 12:45:38 -0700 (Fri, 30 Oct 2009)
Re: error when doing nodeprobe flush
On Wed, Nov 18, 2009 at 11:49 AM, Jonathan Ellis jbel...@gmail.com wrote: This happens when you flush and there was no data in the binarymemtable. It's harmless (everything that does have data still gets flushed).

thanks for the info.

Since you're running from svn you can update to the latest 0.4 code though, which fixes the exception.

can you tell me how to upgrade? is the data and everything compatible? do i just stop and start cassandra? thanks
Re: [VOTE] Website
looks gr8! +1 On Wed, Nov 11, 2009 at 2:35 PM, Ryan King r...@twitter.com wrote: Looks great. +1 -ryan On Wed, Nov 11, 2009 at 2:22 PM, Johan Oskarsson jo...@oskarsson.nu wrote: +1. A great step forward from the current version and a good base to improve upon. /Johan Eric Evans wrote: The current website is quite ugly, and I don't know about you, but I'm itching to put the new project logo to use, so I'd like to propose publishing http://cassandra.deadcafe.org (to http://incubator.apache.org/cassandra). This is a slightly tweaked version of Daniel Lundin's work from CASSANDRA-231[1] (thanks Daniel!), and the content is nearly identical to what is on the current site. I do not consider this to be the final word on the matter, I think there is still much to be done. For example, the logo w/text is something I cobbled together in Gimp and should be considered a placeholder. Still, it's much better than what we currently have and we can incrementally improve it as we go forward. [1] https://issues.apache.org/jira/browse/CASSANDRA-231
Re: Incr/Decr Counters in Cassandra
counters in cassandra will be awesome! On Wed, Nov 4, 2009 at 1:32 PM, Chris Goffinet goffi...@digg.com wrote: Hey, At Digg we've been thinking about counters in Cassandra. In a lot of our use cases we need this type of support from a distributed storage system. Anyone else out there who has such needs as well? Zookeeper actually has such support and we might use that if we can't get the support in Cassandra. --- Chris Goffinet goffi...@digg.com
lazyboy: is it possible to get entire supercolumn family
is it possible to get all the super columns of a supercolumn family with lazyboy?

from lazyboy import *
from lazyboy.key import Key
connection.add_pool('Keyspace1', ['localhost:9160'])
x = Record()
x.load(Key('Keyspace1', 'Supern1', 'user_id'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mark/work/common/lazyboy/record.py", line 140, in load
    dict([(obj.column.name, obj.column) for obj in _slice]))
AttributeError: 'NoneType' object has no attribute 'name'
Re: are columns of a supercolumn name sorted?
i got it, with what i specified the order of super columns is reversed:
client.get_slice(keyspace, key, ColumnParent(column_family="Super1"), SlicePredicate(slice_range=SliceRange(start="", finish="", reversed=True)), ConsistencyLevel.ONE)
is there any way i can specify reversed here so that the order of columns in all the super columns is also reversed? if i set the super_column variable in the ColumnParent I can get the desired behavior of columns in reverse chronological order.

On Sun, Oct 25, 2009 at 8:23 PM, kevin kevincastigli...@gmail.com wrote: thanks for the hint. i have inserted 3 columns into a super column with values 1, 2 and 3 in that order. but the reversed variable in SliceRange has no effect on the order of columns received. with reversed set to either True or False in SliceRange, i get the columns in the same order. it is not getting reversed. Can you tell what is wrong here?

import lazyboy, time, pprint, uuid
from cassandra import Cassandra
from cassandra.ttypes import *
client = lazyboy.connection.Client(['localhost:9160'])
keyspace = "Keyspace1"
key = 'mykeyx'
column_path = ColumnPath(column_family="Super1", column=uuid.uuid1().bytes, super_column='sc_2')
client.insert(keyspace, key, column_path, '1', time.time(), ConsistencyLevel.ONE)
column_path = ColumnPath(column_family="Super1", column=uuid.uuid1().bytes, super_column='sc_2')
client.insert(keyspace, key, column_path, '2', time.time(), ConsistencyLevel.ONE)
column_path = ColumnPath(column_family="Super1", column=uuid.uuid1().bytes, super_column='sc_2')
client.insert(keyspace, key, column_path, '3', time.time(), ConsistencyLevel.ONE)

client.get_slice(keyspace, key, ColumnParent(column_family="Super1"), SlicePredicate(slice_range=SliceRange(start="", finish="", reversed=True)), ConsistencyLevel.ONE)
[ColumnOrSuperColumn(column=None, super_column=SuperColumn(name='sc_2', columns=[Column(timestamp=1256527256, name='\x93T\x12\x1c\xc1\xde\x11\xde\xa2;\x00%K\xcb\xc7F', value='1'), Column(timestamp=1256527257, name='\x93\x95`\xc1\xde\x11\xde\xa4\x0c\x00%K\xcb\xc7F', value='2'), Column(timestamp=1256527257, name='\x93\xb1\x11T\xc1\xde\x11\xde\xae\xd7\x00%K\xcb\xc7F', value='3')]))]

client.get_slice(keyspace, key, ColumnParent(column_family="Super1"), SlicePredicate(slice_range=SliceRange(start="", finish="", reversed=False)), ConsistencyLevel.ONE)
[ColumnOrSuperColumn(column=None, super_column=SuperColumn(name='sc_2', columns=[Column(timestamp=1256527256, name='\x93T\x12\x1c\xc1\xde\x11\xde\xa2;\x00%K\xcb\xc7F', value='1'), Column(timestamp=1256527257, name='\x93\x95`\xc1\xde\x11\xde\xa4\x0c\x00%K\xcb\xc7F', value='2'), Column(timestamp=1256527257, name='\x93\xb1\x11T\xc1\xde\x11\xde\xae\xd7\x00%K\xcb\xc7F', value='3')]))]

On Sun, Oct 25, 2009 at 8:00 PM, Jonathan Ellis jbel...@gmail.com wrote: it's the column attribute of the column_path parameter. uuids have a specific meaning: http://en.wikipedia.org/wiki/Universally_Unique_Identifier test/system/test_server.py has an example of passing time uuids in python.

On Sun, Oct 25, 2009 at 8:52 PM, kevin kevincastigli...@gmail.com wrote: i tried the TimeUUIDType and I get the error. can you tell me which here should be a UUID? what is a time based uuid? and which parameter here should be a uuid?
thanks

import lazyboy, time, pprint
from cassandra import Cassandra
from cassandra.ttypes import *
client = lazyboy.connection.Client(['localhost:9160'])
keyspace = "Keyspace1"
key = 'mykeyx'
column_path = ColumnPath(column_family="Super1", column="x", super_column='sc_2')
client.insert(keyspace, key, column_path, 'a', time.time(), ConsistencyLevel.ONE)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/kevin/common/lazyboy/connection.py", line 109, in func
    raise e
cassandra.ttypes.InvalidRequestException: InvalidRequestException(why='UUIDs must be exactly 16 bytes')

this is the config in storage-conf.xml:
<ColumnFamily CompareWith="BytesType" Name="Standard1" FlushPeriodInMinutes="60"/>
<ColumnFamily CompareWith="UTF8Type" Name="Standard2"/>
<ColumnFamily CompareWith="TimeUUIDType" Name="StandardByUUID1"/>
<ColumnFamily ColumnType="Super" CompareWith="UTF8Type" CompareSubcolumnsWith="TimeUUIDType" Name="Super1"/>

On Sun, Oct 25, 2009 at 5:40 PM, Jonathan Ellis jbel...@gmail.com wrote: Sorry, the paragraph about subcolumns always being sorted by time is outdated. (I've taken it out on the 0.4 branch now -- it was already gone in trunk.) Read just below that about how CompareWith and CompareSubcolumnsWith work. Sounds like using TimeUUIDType for subcolumns is what you want (with the reverse option to slice, to get most-recent-first). On Sun, Oct 25, 2009 at 5:17 PM, kevin kevincastigli...@gmail.com wrote: i am inserting three columns x,a,z
Re: are columns of a supercolumn name sorted?
On Mon, Oct 26, 2009 at 9:08 AM, Jonathan Ellis jbel...@gmail.com wrote: if you're slicing supercolumns, you're going to get all the subcolumns back anyway, so there's not much point in reversing them on the server

correct me if i am wrong, but i can only get the count [100 by default, or more] number of subcolumns, right?

-- it just adds overhead. But you can iterate in reverse order client side w/ no performance penalty (since you don't actually have to re-order the list to do that).

if i have a large number of subcolumns then it would be better to do it on the server, right?

On Mon, Oct 26, 2009 at 9:51 AM, kevin kevincastigli...@gmail.com wrote: i got it, with what i specified the order of super columns is reversed: client.get_slice(keyspace, key, ColumnParent(column_family="Super1"), SlicePredicate(slice_range=SliceRange(start="", finish="", reversed=True)), ConsistencyLevel.ONE) is there any way i can specify reversed here so that the order of columns in all the super columns is also reversed? if i set the super_column variable in the ColumnParent I can get the desired behavior of columns in reverse chronological order.

On Sun, Oct 25, 2009 at 8:23 PM, kevin kevincastigli...@gmail.com wrote: thanks for the hint. i have inserted 3 columns into a super column with values 1, 2 and 3 in that order. but the reversed variable in SliceRange has no effect on the order of columns received. with reversed set to either True or False in SliceRange, i get the columns in the same order. it is not getting reversed. Can you tell what is wrong here?
import lazyboy,time, pprint,uuid from cassandra import Cassandra from cassandra.ttypes import * client=lazyboy.connection.Client(['localhost:9160']) keyspace = Keyspace1 key='mykeyx' column_path = ColumnPath(column_family=Super1,column=uuid.uuid1().bytes,super_column='sc_2') client.insert(keyspace, key, column_path, '1', time.time(), ConsistencyLevel.ONE); column_path = ColumnPath(column_family=Super1,column=uuid.uuid1().bytes,super_column='sc_2') client.insert(keyspace, key, column_path, '2', time.time(), ConsistencyLevel.ONE); column_path = ColumnPath(column_family=Super1,column=uuid.uuid1().bytes,super_column='sc_2') client.insert(keyspace, key, column_path, '3', time.time(), ConsistencyLevel.ONE); client.get_slice(keyspace, key, ColumnParent(column_family=Super1), SlicePredicate(slice_range=SliceRange(start=, finish=,reversed=True)), ConsistencyLevel.ONE) [ColumnOrSuperColumn(column=None, super_column=SuperColumn(name='sc_2', columns=[Column(timestamp=1256527256, name='\x93T\x12\x1c\xc1\xde\x11\xde\xa2;\x00%K\xcb\xc7F', value='1'), Column(timestamp=1256527257, name='\x93\x95`\xc1\xde\x11\xde\xa4\x0c\x00%K\xcb\xc7F', value='2'), Column(timestamp=1256527257, name='\x93\xb1\x11T\xc1\xde\x11\xde\xae\xd7\x00%K\xcb\xc7F', value='3')]))] client.get_slice(keyspace, key, ColumnParent(column_family=Super1), SlicePredicate(slice_range=SliceRange(start=, finish=,reversed=False)), ConsistencyLevel.ONE) [ColumnOrSuperColumn(column=None, super_column=SuperColumn(name='sc_2', columns=[Column(timestamp=1256527256, name='\x93T\x12\x1c\xc1\xde\x11\xde\xa2;\x00%K\xcb\xc7F', value='1'), Column(timestamp=1256527257, name='\x93\x95`\xc1\xde\x11\xde\xa4\x0c\x00%K\xcb\xc7F', value='2'), Column(timestamp=1256527257, name='\x93\xb1\x11T\xc1\xde\x11\xde\xae\xd7\x00%K\xcb\xc7F', value='3')]))] On Sun, Oct 25, 2009 at 8:00 PM, Jonathan Ellis jbel...@gmail.com wrote: it's the column attribute of the column_path parameter. 
uuids have a specific meaning: http://en.wikipedia.org/wiki/Universally_Unique_Identifier test/system/test_server.py has an example of passing time uuids in python. On Sun, Oct 25, 2009 at 8:52 PM, kevin kevincastigli...@gmail.com wrote: i tried the TimeUUIDType and I get the error. can you tell me which here should be a UUID? what is time based uuid? and which parameter here should be uuid? thanks import lazyboy,time, pprint from cassandra import Cassandra from cassandra.ttypes import * client=lazyboy.connection.Client(['localhost:9160']) keyspace = Keyspace1 key='mykeyx' column_path = ColumnPath(column_family=Super1,column=x,super_column='sc_2') client.insert(keyspace, key, column_path, 'a', time.time(), ConsistencyLevel.ONE); Traceback (most recent call last): File stdin, line 1, in module File /Users/kevin/common/lazyboy/connection.py, line 109, in func raise e cassandra.ttypes.InvalidRequestException: InvalidRequestException(why='UUIDs must be exactly 16 bytes') this is the config in storage-conf.xml ColumnFamily CompareWith=BytesType Name=Standard1 FlushPeriodInMinutes=60/ ColumnFamily CompareWith=UTF8Type Name=Standard2
Re: are columns of a supercolumn name sorted?
On Mon, Oct 26, 2009 at 9:35 AM, Jonathan Ellis jbel...@gmail.com wrote: On Mon, Oct 26, 2009 at 10:25 AM, kevin kevincastigli...@gmail.com wrote: correct me if i am wrong, but i can only get the count [100 by default, or more] number of subcolumns, right? if you are slicing supercolumns, you always get all the subcolumns of each

great, thanks for clarifying this. when i'm slicing super columns i will get count number of super columns, with each super column having all of its sub columns, which i can reverse in my client.
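[The client-side reversal described above is trivial in Python; a sketch with plain (name, value) tuples standing in for the thrift Column objects that get_slice returns:]

```python
# Subcolumns come back sorted by the comparator (oldest-first for
# TimeUUIDType); reversing client side costs nothing extra, since
# iterating backwards doesn't re-order the underlying list.
columns = [("uuid1", "1"), ("uuid2", "2"), ("uuid3", "3")]  # (name, value)

# newest-to-oldest, without copying:
for name, value in reversed(columns):
    pass  # process most recent first

newest_first = columns[::-1]  # or materialize a reversed copy
print(newest_first[0])  # ('uuid3', '3')
```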
Re: are columns of a supercolumn name sorted?
i tried the TimeUUIDType and I get the error. can you tell me which here should be a UUID? what is a time based uuid? and which parameter here should be a uuid? thanks

import lazyboy, time, pprint
from cassandra import Cassandra
from cassandra.ttypes import *
client = lazyboy.connection.Client(['localhost:9160'])
keyspace = "Keyspace1"
key = 'mykeyx'
column_path = ColumnPath(column_family="Super1", column="x", super_column='sc_2')
client.insert(keyspace, key, column_path, 'a', time.time(), ConsistencyLevel.ONE)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/kevin/common/lazyboy/connection.py", line 109, in func
    raise e
cassandra.ttypes.InvalidRequestException: InvalidRequestException(why='UUIDs must be exactly 16 bytes')

this is the config in storage-conf.xml:
<ColumnFamily CompareWith="BytesType" Name="Standard1" FlushPeriodInMinutes="60"/>
<ColumnFamily CompareWith="UTF8Type" Name="Standard2"/>
<ColumnFamily CompareWith="TimeUUIDType" Name="StandardByUUID1"/>
<ColumnFamily ColumnType="Super" CompareWith="UTF8Type" CompareSubcolumnsWith="TimeUUIDType" Name="Super1"/>

On Sun, Oct 25, 2009 at 5:40 PM, Jonathan Ellis jbel...@gmail.com wrote: Sorry, the paragraph about subcolumns always being sorted by time is outdated. (I've taken it out on the 0.4 branch now -- it was already gone in trunk.) Read just below that about how CompareWith and CompareSubcolumnsWith work. Sounds like using TimeUUIDType for subcolumns is what you want (with the reverse option to slice, to get most-recent-first).

On Sun, Oct 25, 2009 at 5:17 PM, kevin kevincastigli...@gmail.com wrote: i am inserting three columns x,a,z into a super column named 'sc_2'. the config file says that the columns of a super column are time-sorted, but when i get_slice it is sorted by the name of the columns. how do i get it time sorted so that i get the most recently inserted/updated column first? cassandra version apache-cassandra-incubating-0.4.1-bin.tar.gz, and lazyboy latest git clone.
thanks

import lazyboy, time, pprint
from cassandra import Cassandra
from cassandra.ttypes import *

client = lazyboy.connection.Client(['localhost:9160'])
keyspace = 'Keyspace1'
key = 'mykeyx'
column_path = ColumnPath(column_family='Super1', column='x', super_column='sc_2')
client.insert(keyspace, key, column_path, 'a', time.time(), ConsistencyLevel.ONE)
column_path = ColumnPath(column_family='Super1', column='a', super_column='sc_2')
client.insert(keyspace, key, column_path, 'a', time.time(), ConsistencyLevel.ONE)
column_path = ColumnPath(column_family='Super1', column='z', super_column='sc_2')
client.insert(keyspace, key, column_path, 'a', time.time(), ConsistencyLevel.ONE)

slice_range = SliceRange(start='', finish='')
predicate = SlicePredicate(slice_range=slice_range)
column_parent = ColumnParent(column_family='Super1')
client.get_slice(keyspace, key, column_parent, predicate, ConsistencyLevel.ONE)

[ColumnOrSuperColumn(column=None, super_column=SuperColumn(name='sc_2', columns=[Column(timestamp=1256512261, name='a', value='a'), Column(timestamp=1256512252, name='x', value='a'), Column(timestamp=1256512267, name='z', value='a')]))]
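Both symptoms in this thread can be checked without a running cluster. Under TimeUUIDType, a subcolumn name must be the raw 16 bytes of a version-1 (time-based) UUID, which is what the "UUIDs must be exactly 16 bytes" error is complaining about; and under a name comparator such as BytesType or UTF8Type, subcolumns come back sorted by name, which is why inserting x, a, z yields a, x, z. A minimal sketch using only the standard library:

```python
import uuid

# TimeUUIDType subcolumn names must be the raw 16 bytes of a
# version-1 (time-based) UUID -- not a string like 'x' and not
# a float from time.time().
name = uuid.uuid1().bytes
assert len(name) == 16
assert uuid.UUID(bytes=name).version == 1

# Under a name comparator (BytesType/UTF8Type), subcolumns come back
# sorted by name, which is why inserting x, a, z yields a, x, z.
assert sorted(['x', 'a', 'z']) == ['a', 'x', 'z']
```

In the insert calls above, passing `uuid.uuid1().bytes` as the `column` argument instead of `'x'` is what the schema's CompareSubcolumnsWith=TimeUUIDType expects.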
Re: are columns of a supercolumn name-sorted?
thanks for the hint. i have inserted 3 columns into a super column with values 1, 2 and 3, in that order. but the reversed flag in SliceRange has no effect on the order of the columns received: with reversed set to either True or False i get the columns in the same order. it is not getting reversed. can you tell what is wrong here?

import lazyboy, time, pprint, uuid
from cassandra import Cassandra
from cassandra.ttypes import *

client = lazyboy.connection.Client(['localhost:9160'])
keyspace = 'Keyspace1'
key = 'mykeyx'
column_path = ColumnPath(column_family='Super1', column=uuid.uuid1().bytes, super_column='sc_2')
client.insert(keyspace, key, column_path, '1', time.time(), ConsistencyLevel.ONE)
column_path = ColumnPath(column_family='Super1', column=uuid.uuid1().bytes, super_column='sc_2')
client.insert(keyspace, key, column_path, '2', time.time(), ConsistencyLevel.ONE)
column_path = ColumnPath(column_family='Super1', column=uuid.uuid1().bytes, super_column='sc_2')
client.insert(keyspace, key, column_path, '3', time.time(), ConsistencyLevel.ONE)

client.get_slice(keyspace, key, ColumnParent(column_family='Super1'), SlicePredicate(slice_range=SliceRange(start='', finish='', reversed=True)), ConsistencyLevel.ONE)
[ColumnOrSuperColumn(column=None, super_column=SuperColumn(name='sc_2', columns=[Column(timestamp=1256527256, name='\x93T\x12\x1c\xc1\xde\x11\xde\xa2;\x00%K\xcb\xc7F', value='1'), Column(timestamp=1256527257, name='\x93\x95`\xc1\xde\x11\xde\xa4\x0c\x00%K\xcb\xc7F', value='2'), Column(timestamp=1256527257, name='\x93\xb1\x11T\xc1\xde\x11\xde\xae\xd7\x00%K\xcb\xc7F', value='3')]))]

client.get_slice(keyspace, key, ColumnParent(column_family='Super1'), SlicePredicate(slice_range=SliceRange(start='', finish='', reversed=False)), ConsistencyLevel.ONE)
[ColumnOrSuperColumn(column=None, super_column=SuperColumn(name='sc_2', columns=[Column(timestamp=1256527256, name='\x93T\x12\x1c\xc1\xde\x11\xde\xa2;\x00%K\xcb\xc7F', value='1'), Column(timestamp=1256527257, name='\x93\x95`\xc1\xde\x11\xde\xa4\x0c\x00%K\xcb\xc7F', value='2'), Column(timestamp=1256527257, name='\x93\xb1\x11T\xc1\xde\x11\xde\xae\xd7\x00%K\xcb\xc7F', value='3')]))]

On Sun, Oct 25, 2009 at 8:00 PM, Jonathan Ellis jbel...@gmail.com wrote: it's the column attribute of the column_path parameter. uuids have a specific meaning: http://en.wikipedia.org/wiki/Universally_Unique_Identifier test/system/test_server.py has an example of passing time uuids in python.
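One likely explanation for the no-op `reversed=True` above (an assumption about the 0.4 Thrift API, not confirmed in this thread): the ColumnParent has no `super_column` set, so the slice targets the row's super columns and `reversed` reorders those, not the subcolumns inside 'sc_2'; setting `super_column='sc_2'` on the ColumnParent would make the range, and thus `reversed`, apply to the subcolumns themselves. What can be checked locally is the ordering TimeUUIDType is defined on, the 60-bit timestamp embedded in a version-1 UUID:

```python
import uuid

# Three time-based UUIDs, created in order. CPython's uuid1 bumps the
# embedded 60-bit timestamp (u.time) so it is strictly increasing
# within one process.
ids = [uuid.uuid1() for _ in range(3)]

# TimeUUIDType compares by this embedded timestamp, so a reversed
# slice should yield most-recent-first: the last UUID created sorts
# to the front.
newest_first = sorted(ids, key=lambda u: u.time, reverse=True)
assert newest_first[0] == ids[-1]
assert newest_first[-1] == ids[0]
```

Note that raw byte order of a version-1 UUID does not match time order (the low bits of the timestamp come first), which is exactly why the TimeUUIDType comparator exists rather than plain BytesType.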
Re: cassandra slows down after inserts
Jonathan, good to know this is normal. but the reason i sent this is, once i stop and start cassandra it gets ready in a few seconds and insertions and get_slices are super fast, a few milliseconds. but then it starts to slow down, and even after 10 hours it is still slow. why is this the case?

On Mon, Jul 13, 2009 at 6:14 AM, Jonathan Ellis jbel...@gmail.com wrote: Cassandra is replaying the transaction log and preloading SSTable indexes. This is normal.

On Mon, Jul 13, 2009 at 8:10 AM, rkmr...@gmail.com rkmr...@gmail.com wrote: when i stop cassandra and start it again, this is what is printed. it takes just a couple of seconds for this to run. and after that it becomes really fast.

Listening for transport dt_socket at address:
DEBUG - Loading settings from ./../conf/storage-conf.xml
DEBUG - adding Super1 as 0
DEBUG - adding Standard2 as 1
DEBUG - adding Standard1 as 2
DEBUG - adding StandardByTime1 as 3
DEBUG - adding LocationInfo as 4
DEBUG - adding HintsColumnFamily as 5
DEBUG - INDEX LOAD TIME for /home/mark/local/var/cassandra/data/Table1-Super1-9-Data.db: 400 ms.
DEBUG - INDEX LOAD TIME for /home/mark/local/var/cassandra/data/Table1-Super1-52-Data.db: 300 ms.
DEBUG - INDEX LOAD TIME for /home/mark/local/var/cassandra/data/Table1-Super1-92-Data.db: 300 ms.
DEBUG - INDEX LOAD TIME for /home/mark/local/var/cassandra/data/Table1-Super1-138-Data.db: 751 ms.
DEBUG - INDEX LOAD TIME for /home/mark/local/var/cassandra/data/Table1-Super1-150-Data.db: 100 ms.
DEBUG - INDEX LOAD TIME for /home/mark/local/var/cassandra/data/Table1-Super1-152-Data.db: 50 ms.
DEBUG - INDEX LOAD TIME for /home/mark/local/var/cassandra/data/Table1-Super1-154-Data.db: 100 ms.
INFO - Compacting [/home/mark/local/var/cassandra/data/Table1-Super1-138-Data.db,/home/mark/local/var/cassandra/data/Table1-Super1-150-Data.db,/home/mark/local/var/cassandra/data/Table1-Super1-152-Data.db,/home/mark/local/var/cassandra/data/Table1-Super1-154-Data.db]
DEBUG - INDEX LOAD TIME for /home/mark/local/var/cassandra/data/Table1-Standard1-2-Data.db: 0 ms.
DEBUG - INDEX LOAD TIME for /home/mark/local/var/cassandra/data/Table1-Standard1-4-Data.db: 50 ms.
DEBUG - INDEX LOAD TIME for /home/mark/local/var/cassandra/data/Table1-Standard1-6-Data.db: 0 ms.
INFO - Replaying /home/mark/local/var/cassandra/commitlog/CommitLog-1247454203796.log
DEBUG - index size for bloom filter calc for file : /home/mark/local/var/cassandra/data/Table1-Super1-138-Data.db : 73600
DEBUG - index size for bloom filter calc for file : /home/mark/local/var/cassandra/data/Table1-Super1-150-Data.db : 84224
DEBUG - index size for bloom filter calc for file : /home/mark/local/var/cassandra/data/Table1-Super1-152-Data.db : 94848
DEBUG - index size for bloom filter calc for file : /home/mark/local/var/cassandra/data/Table1-Super1-154-Data.db : 105472
DEBUG - Expected bloom filter size : 105472
INFO - Compacted to /home/mark/local/var/cassandra/data/Table1-Super1-139-Data.db. 0/28831084 bytes for 104856/104860 keys read/written. Time: 8119ms.
INFO - Flushing Memtable(Super1)@552364977
DEBUG - Submitting Super1 for compaction
INFO - Completed flushing Memtable(Super1)@552364977
INFO - Flushing Memtable(Standard1)@1290243769
DEBUG - Submitting Standard1 for compaction
INFO - Completed flushing Memtable(Standard1)@1290243769
INFO - Compacting [/home/mark/local/var/cassandra/data/Table1-Standard1-2-Data.db,/home/mark/local/var/cassandra/data/Table1-Standard1-4-Data.db,/home/mark/local/var/cassandra/data/Table1-Standard1-6-Data.db,/home/mark/local/var/cassandra/data/Table1-Standard1-8-Data.db]
DEBUG - index size for bloom filter calc for file : /home/mark/local/var/cassandra/data/Table1-Standard1-2-Data.db : 256
DEBUG - index size for bloom filter calc for file : /home/mark/local/var/cassandra/data/Table1-Standard1-4-Data.db : 512
DEBUG - index size for bloom filter calc for file : /home/mark/local/var/cassandra/data/Table1-Standard1-6-Data.db : 768
DEBUG - index size for bloom filter calc for file : /home/mark/local/var/cassandra/data/Table1-Standard1-8-Data.db : 1024
DEBUG - Expected bloom filter size : 1024
INFO - Compacted to /home/mark/local/var/cassandra/data/Table1-Standard1-3-Data.db. 0/210 bytes for 0/1 keys read/written. Time: 301ms.
DEBUG - Starting to listen on 127.0.0.1:7001
INFO - Cassandra starting up...

On Mon, Jul 13, 2009 at 6:06 AM, rkmr...@gmail.com rkmr...@gmail.com wrote: how do i find out if the JVM is GCing?

On Sun, Jul 12, 2009 at 10:37 PM, Sandeep Tata sandeep.t...@gmail.com wrote: What hardware are you running on?

dual quad-core intel xeon 2.0 GHz, 32GB ram, and hardware raid; the operating system is fedora core 9

How long does the slowdown last?

i stopped inserting data after the slowdown started, and it is still slow now after over 10 hours. however
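Two cheap checks for the questions in this thread: GC activity is visible if the JVM is started with the standard -verbose:gc flag (or watched with jstat), and "super fast" versus "slow" reads can be put in numbers with a tiny timing harness. A minimal sketch; the get_slice call in the comment is a placeholder for whatever read you are measuring:

```python
import time

def avg_latency_ms(fn, n=100):
    """Average wall-clock latency of fn() in milliseconds over n calls."""
    start = time.time()
    for _ in range(n):
        fn()
    return (time.time() - start) * 1000.0 / n

# Hypothetical usage against the Thrift client from the earlier threads:
#   avg_latency_ms(lambda: client.get_slice(keyspace, key, column_parent,
#                                           predicate, ConsistencyLevel.ONE))
```

Running this before and after a bulk-insert session makes it easy to tell whether the slowdown is gradual or a sharp cliff, which in turn hints at whether compaction backlog or GC is the culprit.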
Re: Up and Running with Cassandra
any ideas when this will happen? thanks

On Tue, Jul 7, 2009 at 10:52 AM, Evan Weaver ewea...@gmail.com wrote: It will; I don't think the change is committed yet. Evan

On Tue, Jul 7, 2009 at 10:50 AM, Kevin Castiglione kevincastigli...@gmail.com wrote: thanks for this post! you have said that the on-disk storage format is expected to change in version 0.4.0. i'm using svn latest revision 791696. will the on-disk storage format change affect this version?

On Mon, Jul 6, 2009 at 11:18 PM, Evan Weaver ewea...@gmail.com wrote: In case you missed it, a big introductory post: http://blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra/ Evan

-- Evan Weaver