Re: Strategies for storing lexically ordered data in supercolumns
On Fri, Mar 12, 2010 at 7:21 PM, Peter Chang pete...@gmail.com wrote:

> My original post is probably confusing. I was originally talking about columns and I don't see what the solution is.

Sorry, I misunderstood.

> So I was thinking I set the subcolumn compareWith to UTF8Type or BytesType and construct a key [for the subcolumn, not a row key]:
>
>     [user's lastname + user's firstname + user's uuid]
>
> This would result in sorted subcolumns and a sorted user list. Nevertheless, I still don't see/understand the solution. Let's say the person's name changes. The sort is no longer valid. That column value would need to be changed in order for the sort to be correct.

When their name changes, you delete the existing column and insert a new one with the correct name, which will then sort correctly.

-Brandon
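A minimal sketch of the rename approach, with a plain dict standing in for the subcolumns of a supercolumn (`make_sort_key` is a hypothetical helper, not part of any client library; Cassandra itself keeps columns sorted by the comparator, so sorting on read here only simulates that):

```python
import uuid

def make_sort_key(lastname, firstname, user_id):
    # Concatenate so byte-wise column ordering sorts by last name, then first.
    return "%s:%s:%s" % (lastname, firstname, user_id)

columns = {}  # stands in for the subcolumns under one supercolumn
uid = str(uuid.uuid4())
columns[make_sort_key("Chang", "Peter", uid)] = "user data"

# Name change: delete the old column, insert a new one keyed on the new name.
del columns[make_sort_key("Chang", "Peter", uid)]
columns[make_sort_key("Smith", "Peter", uid)] = "user data"

# The row now sorts under the new name; the old entry is gone.
assert sorted(columns)[0].startswith("Smith:Peter:")
assert len(columns) == 1
```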
Re: problem with running simple example using cassandra-cli with 0.6.0-beta2
On Wed, Mar 10, 2010 at 5:09 PM, Bill Au bill.w...@gmail.com wrote:

> I am checking out 0.6.0-beta2 since I need the batch-mutate function. I am just trying to run the example in the cassandra-cli wiki: http://wiki.apache.org/cassandra/CassandraCli Here is what I am getting:
>
> cassandra> set Keyspace1.Standard1['jsmith']['first'] = 'John'
> Value inserted.
> cassandra> get Keyspace1.Standard1['jsmith']
> => (column=6669727374, value=John, timestamp=1268261785077)
> Returned 1 results.
>
> The column name being returned by get (6669727374) does not match what is set (first). This is true for all column names.
>
> cassandra> set Keyspace1.Standard1['jsmith']['last'] = 'Smith'
> Value inserted.
> cassandra> set Keyspace1.Standard1['jsmith']['age'] = '42'
> Value inserted.
> cassandra> get Keyspace1.Standard1['jsmith']
> => (column=6c617374, value=Smith, timestamp=1268262480130)
> => (column=6669727374, value=John, timestamp=1268261785077)
> => (column=616765, value=42, timestamp=1268262484133)
> Returned 3 results.
>
> Is this a problem in 0.6.0-beta2 or am I doing anything wrong? Bill

This is normal. You've added the 'first', 'last', and 'age' columns to the 'jsmith' row, and then asked for the entire row, so you got all 3 columns back. The cli is displaying the column names as hex-encoded bytes: 6669727374 is simply 'first'.

-Brandon
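The column names the cli printed are just the hex encoding of the bytes that were set; a quick check in Python:

```python
# The cli shows column names as hex-encoded bytes; decoding recovers
# exactly the names that were inserted.
names_hex = ["6669727374", "6c617374", "616765"]
decoded = [bytes.fromhex(h).decode("ascii") for h in names_hex]
assert decoded == ["first", "last", "age"]
```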
Re: Bad read performances: 'few rows of many columns' vs 'many rows of few columns'
On Tue, Mar 9, 2010 at 1:14 PM, Sylvain Lebresne sylv...@yakaz.com wrote:

> I've inserted 1000 rows of 100 columns each (python stress.py -t 2 -n 1000 -c 100 -i 5). When I read, I get roughly the same number of rows per second whether I read the whole row (python stress.py -t 10 -n 1000 -o read -r -c 100) or only the first column (python stress.py -t 10 -n 1000 -o read -r -c 1). And that's less than 10 rows per second. So sure, when I read the whole row, that's almost 1000 columns per second, which is roughly 50M/s throughput, which is quite good. But when I read only the first column, I get 10 columns per second, that's 500K/s, which is less good. Now, from what I've understood so far, cassandra doesn't deserialize the whole row to read a single column (I'm not using supercolumns here), so I don't understand those numbers.

A row causes a disk seek while columns are contiguous. So if the row isn't in the cache, you're being impaired by the seeks. In general, fatter rows should be more performant than skinny ones.

-Brandon
Re: Bad read performances: 'few rows of many columns' vs 'many rows of few columns'
On Tue, Mar 9, 2010 at 2:28 PM, Sylvain Lebresne sylv...@yakaz.com wrote:

>> A row causes a disk seek while columns are contiguous. So if the row isn't in the cache, you're being impaired by the seeks. In general, fatter rows should be more performant than skinny ones.
>
> Sure, I understand that. Still, I get 400 columns per second (ie, 400 seeks per second) when the rows only have one column each, while I get 10 columns per second when the rows have 100 columns, even though I read only the first column.

Doesn't that imply the disk is having to seek further for the rows with more columns?

-Brandon
Re: Using Cassandra via the Erlang Thrift Client API (HOW ??)
On Thu, Mar 4, 2010 at 11:27 AM, J T jt4websi...@googlemail.com wrote:

> Once I have something working I'll write a new post back with a couple of examples here to help future newbies on how to talk to cassandra from erlang, since those examples are not present on the cassandra/thrift wiki as far as I can tell.

Could you update the wiki instead? :) http://wiki.apache.org/cassandra/ClientExamples

-Brandon
Re: finding Cassandra servers
2010/3/3 Ted Zlatanov t...@lifelogs.com:

> On Wed, 3 Mar 2010 09:04:37 -0800 Ryan King r...@twitter.com wrote:
>
> RK> Something like RRDNS is no more complex than managing a list of seed nodes.
>
> My concern is that both RRDNS and seed node lists are vulnerable to individual node failure.

They're not. That's why they're lists. If one doesn't work out, move along to the next.

> Updating DNS when a node dies means you have to wait until the TTL expires, and if you lower the TTL too much your server will get killed.

Don't do that. Make your clients keep trying. Any failure is likely to be transient anyway, so running around messing with DNS every time a machine is offline doesn't make much sense.

-Brandon
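The "keep trying" advice amounts to simple client-side failover over the host list; a hypothetical sketch (the host names and `fake_connect` are made up for illustration, standing in for a real Thrift connection attempt):

```python
def first_reachable(hosts, connect):
    """Walk the host list until one connect attempt succeeds."""
    last_err = None
    for host in hosts:
        try:
            return connect(host)
        except ConnectionError as e:
            last_err = e  # likely transient: move along to the next host
    raise last_err

def fake_connect(host):
    # Simulate the first node being down and the second accepting.
    if host == "cass2.example.com":
        return "connected:" + host
    raise ConnectionError(host + " is down")

conn = first_reachable(["cass1.example.com", "cass2.example.com"], fake_connect)
assert conn == "connected:cass2.example.com"
```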
Re: Is Cassandra a document based DB?
On Mon, Mar 1, 2010 at 5:34 AM, HHB hubaghd...@yahoo.ca wrote:

> What are the advantages/disadvantages of Cassandra over HBase?

Ease of setup: all nodes are the same.
No single point of failure: all nodes are the same.
Speed: http://www.brianfrankcooper.net/pubs/ycsb-v4.pdf
Richer model: supercolumns.
Multi-datacenter awareness.

There are likely other things I'm forgetting, but those stand out for me.

-Brandon
Re: problem about bootstrapping when used in huge node
On Tue, Feb 23, 2010 at 7:31 AM, Jonathan Ellis jbel...@gmail.com wrote:

>> (2) How do I use a node that has 12 1TB disks?
>
> You should use a better filesystem than ext3. :)

We use xfs at rackspace. Also, don't use RAID5. Let Cassandra's replication handle disk failure scenarios instead, and supply multiple DataFileDirectory directives pointed at unique mount points. If you must use RAID, RAID0, 1, or 10 would be better.

-Brandon
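For illustration, a sketch of the relevant storage-conf.xml section with one DataFileDirectory per mount point (the mount paths are hypothetical; the element names match the 0.5/0.6 config format):

```xml
<DataFileDirectories>
    <DataFileDirectory>/mnt/disk01/cassandra/data</DataFileDirectory>
    <DataFileDirectory>/mnt/disk02/cassandra/data</DataFileDirectory>
    <DataFileDirectory>/mnt/disk03/cassandra/data</DataFileDirectory>
</DataFileDirectories>
```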
Re: reads are slow
On Tue, Feb 23, 2010 at 11:33 AM, kevin kevincastigli...@gmail.com wrote:

> I have given 10GB RAM in cassandra.in.sh (-Xmx10G). I have increased KeysCachedFraction to 0.04. I have two different drives for the commitlog and data directory. I have about 3 million rows. What can I do to improve read speed? thanks a lot!

Since 0.5 doesn't have row caching, 10G is probably too much to give the JVM and is hampering the OS cache. Try between 2-6GB instead and see if that helps.

-Brandon
Re: Cassandra paging, gathering stats
On Tue, Feb 23, 2010 at 11:54 AM, Sonny Heer sonnyh...@gmail.com wrote:

>> Columns can easily be paginated via the 'start' and 'finish' parameters. You can't jump to a random page, but you can provide next/previous behavior.
>
> Do you have an example of this? From a client, they can pass in the last key, which can then be used as the start with some predefined count. But how can you do previous?

To go backwards, you pass the first column seen as the finish parameter and use an empty start parameter with an appropriate count.

-Brandon
Re: Cassandra paging, gathering stats
On Tue, Feb 23, 2010 at 2:28 PM, Jonathan Ellis jbel...@gmail.com wrote:

> you'd actually use first column as start, empty finish, count=pagesize, and reversed=True, unless I'm misunderstanding something.

Oops, Jonathan is correct.

-Brandon
Re: Cassandra paging, gathering stats
On Mon, Feb 22, 2010 at 1:40 PM, Sonny Heer sonnyh...@gmail.com wrote:

> Hey, We are in the process of implementing a cassandra application service. We have already ingested TBs of data using the cassandra bulk loader (StorageService). One of the requirements is to get a data explosion factor as a result of denormalization. Since the writes are going to the memory tables, I'm not sure how I could grab stats. I can't get the size of the data before ingest since some of the data may be duplicated.

Are you talking about duplication across nodes due to the replication factor, or because some rows may still be in the memtable? I think what you want to do is bin/nodeprobe flush, bin/nodeprobe compact, wait until the system is idle and then sum the size of everything in your data paths that starts with the name of your column family.

> Also a general problem we are running into is an easy way to do paging over the data set (not just rows but columns). Looks like now the API has ways to do count, but no offset.

Columns can easily be paginated via the 'start' and 'finish' parameters. You can't jump to a random page, but you can provide next/previous behavior.

-Brandon
Re: Row with many columns
On Wed, Feb 17, 2010 at 9:48 AM, ruslan usifov ruslan.usi...@gmail.com wrote:

> Hello. For example if we have a table which has rows with many columns (1 or more), how will this data be partitioned? I expect that one row may be split across several nodes. But looking at the source of cassandra I think that one row is stored on one node and never splits, or am I mistaken?

You are correct, a row must fit on a node.

-Brandon
Re: Cassandra benchmark shows OK throughput but high read latency ( 100ms)?
On Tue, Feb 16, 2010 at 2:32 AM, Dr. Martin Grabmüller martin.grabmuel...@eleven.de wrote:

> In my tests I have observed that good read latency depends on keeping the number of data files low. In my current test setup, I have stored 1.9 TB of data on a single node, which is in 21 data files, and read latency is between 10 and 60ms (for small reads; larger reads of course take more time). In earlier stages of my test, I had up to 5000 data files, and read performance was quite bad: my configured 10-second RPC timeout was regularly encountered.

I believe it is known that crossing sstables is O(NlogN) but I'm unable to find the ticket on this at the moment. Perhaps Stu Hood will jump in and enlighten me, but in any case I believe https://issues.apache.org/jira/browse/CASSANDRA-674 will eventually solve it. Keeping write volume low enough that compaction can keep up is one solution, and throwing hardware at the problem is another, if necessary. Also, the row caching in trunk (soon to be 0.6 we hope) helps greatly for repeat hits.

-Brandon
Re: Nodeprobe Not Working Properly
On Tue, Feb 16, 2010 at 11:08 AM, Shahan Khan cont...@shahan.me wrote:

> I can ping to the other server using db1a instead of the host name.

By 'host name' I assume you mean IP address.

> 192.168.1.13 db1a
> ::1 localhost ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> ff02::3 ip6-allhosts
> # Auto-generated hostname. Please do not remove this comment.
> 127.0.0.1 db1b.domain.com localhost db1b localhost.localdomain
>
> db1b:~$ ping db1a
> PING db1a (192.168.1.13) 56(84) bytes of data.
> 64 bytes from db1a (192.168.1.13): icmp_seq=1 ttl=64 time=0.252 ms
> 64 bytes from db1a (192.168.1.13): icmp_seq=2

So db1b's host resolution appears to be ok. Is this output from db1a, or db1b? It appears to be db1b, but your last issue was with db1a resolving db1b's IP address. Cassandra doesn't do anything magical with hostname resolution, it relies on the underlying system for that.

-Brandon
Re: Cassandra benchmark shows OK throughput but high read latency ( 100ms)?
On Tue, Feb 16, 2010 at 11:50 AM, Weijun Li weiju...@gmail.com wrote:

> Dumped 50mil records into my 2-node cluster overnight, and made sure that there are not many data files (around 30 only) per Martin's suggestion. The size of the data directory is 63GB. Now when I read records from the cluster the read latency is still ~44ms -- there's no write happening during the read. And iostat shows that the disk (RAID10, 4x 250GB 15k SAS) is saturated:
>
> Device:  rrqm/s  wrqm/s    r/s   w/s    rsec/s  wsec/s  avgrq-sz  avgqu-sz  await  svctm  %util
> sda       47.67   67.67 190.33 17.00  23933.33  677.33    118.70      5.24  25.25   4.64  96.17
> sda1       0.00    0.00   0.00  0.00      0.00    0.00      0.00      0.00   0.00   0.00   0.00
> sda2      47.67   67.67 190.33 17.00  23933.33  677.33    118.70      5.24  25.25   4.64  96.17
> sda3       0.00    0.00   0.00  0.00      0.00    0.00      0.00      0.00   0.00   0.00   0.00
>
> CPU usage is low. Does this mean disk i/o is the bottleneck for my case? Will it help if I increase KCF to cache all sstable indexes?

That's exactly what this means. Disk is slow :(

> Also, this is almost a read-only test, and in reality our write/read ratio is close to 1:1, so I'm guessing read latency will go even higher in that case because it will be difficult for cassandra to find a good moment to compact the data files that are busy being written.

Reads that cause disk seeks are always going to slow things down, since disk seeks are inherently the slowest operation in a machine. Writes in Cassandra should always be fast, as they do not cause any disk seeks.

-Brandon
Re: Cassandra benchmark shows OK throughput but high read latency ( 100ms)?
On Tue, Feb 16, 2010 at 11:56 AM, Weijun Li weiju...@gmail.com wrote:

> One more thought about Martin's suggestion: is it possible to put the data files into multiple directories that are located on different physical disks? This should help with the i/o bottleneck issue.

Yes, you can already do this, just add more DataFileDirectory directives pointed at multiple drives.

> Has anybody tested the row-caching feature in trunk (slated for 0.6)?

Row cache and key cache both help tremendously if your read pattern has a decent repeat rate. Completely random io can only be so fast, however.

-Brandon
Re: Cassandra benchmark shows OK throughput but high read latency ( 100ms)?
On Tue, Feb 16, 2010 at 12:16 PM, Weijun Li weiju...@gmail.com wrote:

> Thanks for the DataFileDirectory trick, I'll give it a try. Just noticed the impact of the number of data files: node A has 13 data files with a read latency of 20ms and node B has 27 files with a read latency of 60ms. After I ran nodeprobe compact on node B its read latency went up to 150ms. The read latency of node A became as low as 10ms. Is this normal behavior? I'm using the random partitioner and the hardware/JVM settings are exactly the same for these two nodes.

It sounds like the latency jumped to 150ms because the newly written file was not in the OS cache.

> Another problem is that Java heap usage is always 900mb out of 6GB. Is there any way to utilize all of the heap space to decrease the read latency?

By default, Cassandra will use a 1GB heap, as set in bin/cassandra.in.sh. You can adjust the jvm heap there via the -Xmx option, but generally you want to balance the jvm vs the OS cache. With 6GB, I would probably give 2GB to the jvm. If you aren't having issues now, increasing the jvm's memory probably won't provide any performance gains, but it's worth noting that with row cache in 0.6 this may change.

-Brandon
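A sketch of where the heap is set in bin/cassandra.in.sh, assuming the 2GB-of-6GB split suggested above (the exact option list varies by version; only -Xmx is the point here):

```shell
# Excerpt (sketch) of the JVM options block in bin/cassandra.in.sh.
# On a 6GB box, give ~2GB to the JVM and leave the rest for the OS cache.
JVM_OPTS=" \
        -ea \
        -Xms128M \
        -Xmx2G \
        -XX:SurvivorRatio=8 \
        -XX:+UseParNewGC \
        -XX:+UseConcMarkSweepGC"
```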
Re: Nodeprobe Not Working Properly
On Mon, Feb 15, 2010 at 1:13 PM, Shahan Khan cont...@shahan.me wrote:

> db1b:~# nodeprobe -host db1a ring
> Error connecting to remote JMX agent!
> java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is:

This seems to indicate that db1a resolves as 127.0.0.1 on db1b, when it actually needs to resolve to the 192.168 address. Try passing the ip address as the host and it should work.

-Brandon
Re: Best design in Cassandra
On Tue, Feb 2, 2010 at 9:27 AM, Erik Holstad erikhols...@gmail.com wrote:

>> A supercolumn can still only compare subcolumns in a single way.
>
> Yeah, I know that, but you can have a super column per sort order without having to restart the cluster.

You get a CompareWith for the columns, and a CompareSubcolumnsWith for the subcolumns. If you need more column types to get different sort orders, you need another ColumnFamily.

-Brandon
Re: How to retrieve keys from Cassandra ?
2010/2/2 Sébastien Pierre sebastien.pie...@gmail.com:

> Hi Jonathan, In my case, I'll have many more columns (thousands to millions) than keys in logs (campaign x days), so it's not an issue to retrieve all of them.

If that's the case, your dataset is small enough that you could maintain an index of the keys in another CF. If it needs to scale further, you can segment the index keys by year, month, etc.

-Brandon
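A sketch of the key-index idea, simulated with dicts standing in for the two column families (the CF and row names are hypothetical): each insert into the data CF also records the key as a column in an index row, segmented by month so the index row itself stays bounded.

```python
data_cf = {}   # stands in for the log data column family
index_cf = {}  # stands in for the key-index column family

def insert_log(key, value, month):
    data_cf[key] = value
    # One index row per month; column name = data key, value unused.
    index_cf.setdefault("keyindex:" + month, {})[key] = ""

insert_log("campaign42:20100201", "log blob", "2010-02")
insert_log("campaign42:20100202", "log blob", "2010-02")

# Enumerating all keys for a month is then a single row read on the index CF.
keys = sorted(index_cf["keyindex:2010-02"])
assert keys == ["campaign42:20100201", "campaign42:20100202"]
```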
Re: Reverse sort order comparator?
On Tue, Feb 2, 2010 at 11:21 AM, Erik Holstad erikhols...@gmail.com wrote:

> Hey! I'm looking for a comparator that sorts columns in reverse order, for example on bytes. I saw that you can write your own comparator class, but just thought that someone must have done that already.

When you get_slice, just set reversed to true in the SliceRange and it will reverse the order.

-Brandon
Re: Reverse sort order comparator?
On Tue, Feb 2, 2010 at 11:29 AM, Erik Holstad erikhols...@gmail.com wrote:

> Thanks guys! So I want to use SliceRange, but I'm thinking about using the count parameter. For example: give me the first x columns, then on the next call I pass a start value and a count. If I were to use the reversed param in SliceRange, I would have to fetch all the columns first, right?

If you pass reversed as true, then instead of getting the first x columns, you'll get the last x columns. If you want to head backwards toward the beginning, you can pass the first column you've seen as the start value with reversed=True.

-Brandon
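The next/previous behavior from these two threads can be sketched with a pure-Python stand-in for get_slice over a sorted row (`slice_cols` is a simulation of the SliceRange semantics described above, not a real client call):

```python
def slice_cols(cols, start="", finish="", reversed_=False, count=100):
    """Simulate get_slice: walk the sorted column names from start toward
    finish, in reverse order if reversed_ is set, up to count names."""
    ordered = sorted(cols, reverse=reversed_)
    out = []
    for name in ordered:
        if start and (name < start if not reversed_ else name > start):
            continue  # before the start of the slice
        if finish and (name > finish if not reversed_ else name < finish):
            continue  # past the end of the slice
        out.append(name)
        if len(out) == count:
            break
    return out

cols = ["a", "b", "c", "d", "e"]
page1 = slice_cols(cols, count=2)                  # first page
assert page1 == ["a", "b"]
# Next page: resume at the last column seen, drop the duplicate.
page2 = slice_cols(cols, start="b", count=3)[1:]
assert page2 == ["c", "d"]
# Previous page: first column seen as start, reversed=True, drop duplicate.
prev = slice_cols(cols, start="c", reversed_=True, count=3)[1:]
assert prev == ["b", "a"]
```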
Re: get_slice() slow if more number of columns present in a SCF.
On Tue, Feb 2, 2010 at 9:27 AM, envio user enviou...@gmail.com wrote:

> All, Here are some tests [batch_insert() and get_slice()] I performed on cassandra. [snip] I am ok with TEST1A and TEST1B. I want to populate the SCF with 500 columns and read 25 columns per key. [snip] This test is more worrying for us. We can't even do 1000 reads per second. Is there any limitation on cassandra which will not work with a larger number of columns? Or am I doing something wrong here? Please let me know.

I think you're mostly being limited by http://issues.apache.org/jira/browse/CASSANDRA-598 Can you try with a simple CF?

-Brandon
Re: Internal structure of api calls
On Mon, Feb 1, 2010 at 3:48 PM, Erik Holstad erikhols...@gmail.com wrote:

> Hey guys! I'm totally new to Cassandra and have a couple of questions about the internal structure of some of the calls. When using the slicerange(count) for the get calls, is the actual result being truncated on the server or is it happening on the client, ie is it more efficient than the regular call?

It happens on the server.

> Is there an internal counter for the get_count call that keeps track of the count, or do you only save on return IO?

No, get_count currently has to deserialize the row and count the columns, excluding tombstones.

-Brandon
Re: Best design in Cassandra
On Mon, Feb 1, 2010 at 5:20 PM, Erik Holstad erikhols...@gmail.com wrote:

> Hey! Have a couple of questions about the best way to use Cassandra. Using the random partitioner + the multi_get calls vs order preservation + range_slice calls?

When you use an OPP, the distribution of your keys becomes your problem. If you don't have an even distribution, this will be reflected in the load on the nodes, while the RP gives you even distribution.

> What is the benefit of using multiple families vs a super column?

http://issues.apache.org/jira/browse/CASSANDRA-598 is currently why I prefer simple CFs instead of supercolumns.

> For example in the case of sorting in different orders. One good thing that I can see here when using a super column is that you don't have to restart your cluster every time you want to add a new sort order.

A supercolumn can still only compare subcolumns in a single way. When http://issues.apache.org/jira/browse/CASSANDRA-44 is completed, you will be able to add CFs without restarting.

-Brandon
Re: Error running chiton GTK
On Thu, Jan 28, 2010 at 7:10 AM, Richard Grossman richie...@gmail.com wrote:

> Hi, I need an admin tool for cassandra so I would like to try chiton GTK. I've made a clean install, all modules are Ok, but when I launch the application I get:
>
> Traceback (most recent call last):
>   File ./chiton-client, line 6, in module
>     from chiton.viewer import ChitonViewer
> ImportError: No module named chiton.viewer
>
> Could someone help me, maybe?

Make sure your PYTHONPATH variable is set correctly. It probably needs to look something like PYTHONPATH=/path/to/telephus:/path/to/chiton (assuming you did not install telephus from the debian package.)

-Brandon
Re: map/reduce on Cassandra
On Mon, Jan 25, 2010 at 1:13 PM, Ryan Daum r...@thimbleware.com wrote:

> I agree with what Jeff says here about RandomPartitioner support being key.

+1

> For my purposes with map/reduce I'd personally be fine with some general all-keys dump utility that wrote the contents of one node to a file, and then just write my own integration from that file into Hadoop, etc. I guess I'm thinking something similar to sstable2json, except that unfortunately sstable2json will dump replica data, not just the local node's data. Getting the contents of the commitlog into the file would be nice, too.

bin/sstablekeys will dump just the keys from an sstable without row deserialization overhead, but it can't introspect a commitlog.

-Brandon
Re: [VOTE] Graduation
On Mon, Jan 25, 2010 at 3:11 PM, Eric Evans eev...@rackspace.com wrote:

> There was some additional discussion[1] concerning Cassandra's graduation on the incubator list, and as a result we've altered the initial resolution to expand the size of the PMC by three to include our active mentors (new draft attached). I propose a vote for Cassandra's graduation to a top-level project. We'll leave this open for 72 hours, and assuming it passes, we can then take it to a vote with the Incubator PMC. +1 from me!
>
> [1] http://thread.gmane.org/gmane.comp.apache.incubator.general/24427

+1

-Brandon
Re: 'large' node configuration question
2010/1/21 Ted Zlatanov t...@lifelogs.com:

> On Wed, 20 Jan 2010 21:14:27 -0600 Jonathan Ellis jbel...@gmail.com wrote:
>
> JE> (I only mention that caveat because I don't know how well the JVM
> JE> scales to heaps that large.)
>
> Sun mentions garbage collection issues in the HotSpot FAQ: http://java.sun.com/docs/hotspot/HotSpotFAQ.html#64bit_description Based on that, it seems like a good idea to enable the parallel or concurrent garbage collectors with large heaps. We're looking at this at our site as well so I'm curious about people's experiences.

Cassandra already uses the ParNew and CMS GCs by default (in cassandra.in.sh).

-Brandon
Re: 'large' node configuration question
2010/1/21 Ted Zlatanov t...@lifelogs.com:

> may not be right for a heap 16-64 times larger than the 1 GB heap specified in cassandra.in.sh.

Using a heap that large probably does not make sense; you want that ram for filesystem cache.

> Also, maybe these options:
>
> -ea \
> -Xdebug \
> -XX:+HeapDumpOnOutOfMemoryError \
> -Xrunjdwp:transport=dt_socket,server=y,address=,suspend=n \
>
> should go in a debugging configuration, triggered by setting $CASSANDRA_DEBUG? With a 60+ GB heap, dumping it to a file could be very painful.

It's pretty bad with a smaller heap too. You can always override the CASSANDRA_INCLUDE environment variable and point it at a file with your own options.

-Brandon
Re: Cassandra to store logs as a list
2010/1/20 Sébastien Pierre sebastien.pie...@gmail.com:

> Hi Mark, The most common query would be basically get all the logs for this particular day (and campaign) or get all the logs since this particular time stamp (and campaign), where everything would be aggregated by campaign id (it's for an ad server). In this case, would using a key like the following improve balancing: campaign:HEX_PADDED_CAMPAIGN_ID:NANOTIMESTAMP? Also, if I add a prefix (like campaign:HEX_PADDED_CAMPAIGN_ID:), would the key have to be UTF8Type instead of TimeUUIDType?

If this is your only query, then you don't need an OPP and don't have to worry about balancing with the RandomPartitioner. I would make the keys something between campaign_id:year and campaign_id:year:month:day:hour depending on how much volume you expect, so as not to overload a row.

-Brandon
Re: Cassandra to store logs as a list
2010/1/20 Sébastien Pierre sebastien.pie...@gmail.com:

> Hmmm, but the only thing that is not clear is how I would store a lot of values for the same key? With redis, I was using keys like campaign:campaign_id:MMDD to store a *list* of JSON-serialized log info, and the list could scale to literally millions of entries. From my understanding, Cassandra can only store 1 value per (column key, field) couple, can't it?

Each row in Cassandra can have an arbitrary number of columns consisting of a name and value (and timestamp.) The columns are sorted on the name based on the type used, which is why I recommended the TimeUUIDType so you would get time-based sorting. So your row keys would be like campaign:campaign_id:MMDD, your column names a TimeUUIDType, and your values the JSON data. Millions of columns in a row is ok, I would begin using caution beyond perhaps 100M though.

-Brandon
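A sketch of the layout just described, simulated with a dict: one row per campaign per day, time-uuid column names, JSON strings as values (the key format and `append_log` helper are illustrative; sorting on the uuid's embedded time field mimics what TimeUUIDType does server-side):

```python
import uuid

def row_key(campaign_id, day):
    # One row per campaign per day, as suggested above.
    return "campaign:%s:%s" % (campaign_id, day)

rows = {}

def append_log(campaign_id, day, json_blob):
    # uuid1 embeds a timestamp; TimeUUIDType sorts columns by that time.
    col_name = uuid.uuid1()
    rows.setdefault(row_key(campaign_id, day), {})[col_name] = json_blob

append_log("42", "20100120", '{"event": "click"}')
append_log("42", "20100120", '{"event": "view"}')

row = rows[row_key("42", "20100120")]
# Sorting by the uuid's embedded 100ns timestamp mimics TimeUUIDType order.
ordered = sorted(row, key=lambda u: u.time)
assert [row[c] for c in ordered] == ['{"event": "click"}', '{"event": "view"}']
```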
Re: something bizzare occured
On Sat, Jan 16, 2010 at 11:00 AM, Todd Burruss bburr...@real.com wrote:

> do these patches work for the 0.5 branch? they don't seem to be in the tip of the branch

They might, I've not tried. However, 685 was deemed a large enough change to apply to trunk only, not 0.5, which is why you don't see them.

-Brandon
Re: something bizzare occured
On Fri, Jan 15, 2010 at 5:43 PM, B. Todd Burruss bburr...@real.com wrote:

> so i changed to QUORUM and retested. puts again work as expected when a node is down. thx! however, the response time for puts went from about 5ms to 400ms because i took 1 of the 5 nodes out. ROW-MUTATION-STAGE pendings jumped into the 100's on one of the remaining nodes and the WriteLatency for the column family on this node also went thru the roof. i added the server back and the performance immediately went back to the way it was. is cassandra trying to constantly connect to the downed server? or what might be causing the performance to drop so dramatically?

It sounds like you're running into: http://issues.apache.org/jira/browse/CASSANDRA-658

-Brandon
Re: easy interface to Cassandra
2010/1/12 Ted Zlatanov t...@lifelogs.com:

> Map latest = client.get(new String[] { row1 }, Values/-1[]);

Reminds me of the old colon-separated CF format. I'm not fond of passing parameters to my functions that have their own special syntax. +1 to language-specific idiomaticness instead.

-Brandon
Re: Graduation
On Mon, Jan 11, 2010 at 1:14 PM, Eric Evans eev...@rackspace.com wrote:

> The response to this was quite favorable and consensus seems to be that we are ready. How many people had a chance to review the draft board resolution that was attached to the original mail (and is attached again to this one)?

I have reviewed it and everything looks good. +1

-Brandon
Re: async calls in cassandra
On Thu, Dec 31, 2009 at 8:46 AM, Ran Tavory ran...@gmail.com wrote:

> Does cassandra/thrift support asynchronous IO calls? Is this planned for an upcoming release?

Cassandra does not, since it uses a thread per connection model. I spent some time trying to enable it (because I use twisted python) by switching to the thrift THsHaServer, but this did not work and dropped performance significantly. That said, I don't find using a connection pool to be much trouble, and it actually becomes an advantage when you're interacting with multiple hosts.

-Brandon
Re: create only - no update
On Tue, Dec 15, 2009 at 5:19 PM, Brian Burruss bburr...@real.com wrote:

> can the cassandra client (java specifically) specify that a particular put should be create only, do not update? If the value already exists in the database, i want the put to fail. for instance, two users want the exact same username, so they both do a get to determine if the username already exists, it doesn't, so they create. the last one to create wins, correct?

Correct. You would need to implement a locking mechanism such as Zookeeper in your application to get around this.

-Brandon
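A sketch of why the check-then-create pattern races: Cassandra resolves concurrent writes by timestamp (last write wins), simulated here with a dict of (value, timestamp) pairs (the key name is illustrative):

```python
store = {}  # stands in for a column family: key -> (value, timestamp)

def put(key, value, ts):
    # Last-write-wins: the higher timestamp replaces the lower one.
    existing = store.get(key)
    if existing is None or ts > existing[1]:
        store[key] = (value, ts)

# Both clients did a get, saw the username as free, and both write.
put("username:jsmith", "user-A", ts=1000)
put("username:jsmith", "user-B", ts=1001)  # later timestamp wins

assert store["username:jsmith"][0] == "user-B"
```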
Re: Cassandra Database Format Compatibility
On Mon, Nov 23, 2009 at 4:47 PM, Jon Graham sjclou...@gmail.com wrote:

> Hello Everyone, Will the Cassandra database format for the current Cassandra source trunk be compatible with the 0.5 Cassandra release?

Yes. Only the commitlog format changes between 0.4 and 0.5, and that isn't a problem since you can just nodeprobe flush to get around it.

> If there are database version differences, is there a migration path to convert older data formats to the new versions?

In the future if the format changes, there will be a migration path.

> Is there an estimated release date for the 0.5 release?

Beta should happen fairly soon -- it's up for a vote in the IPMC right now.

-Brandon
GUI/web interfaces to cassandra
The question of GUI/web interfaces to cassandra comes up from time to time on irc, so I thought I'd send a note to the ML to describe the current options. For a web interface, there's contrib/cassandra_browser in trunk. This allows both retrieving and inserting data. For a GTK-based GUI, I've created http://github.com/driftx/chiton but it only allows browsing of the data at the moment since that meets my needs. If anyone knows of others, feel free to chime in. -Brandon
Re: Cassandra backup and restore procedures
On Thu, Nov 19, 2009 at 1:18 PM, Freeman, Tim tim.free...@hp.com wrote:

> I'm not going to be on Amazon, but I'm planning to use hostnames instead of IP's and a dynamically generated /etc/hosts file and I think that would deal with this problem. I'm sure a private DNS server would be just as good. My real motive in saying this is so someone will scream at me if I'm wrong and save me the time of exploring the bad solution. :-)

This is exactly what I do and it has worked great for me.

-Brandon
Re: Why cassandra single node so slow?
On Sat, Nov 14, 2009 at 5:47 AM, ruslan usifov ruslan.usi...@gmail.com wrote:

> Hello! I'm new to cassandra so I may misunderstand some things. In the following benchmark I have inserted 400 records like this: *snip*
>
> # Define your cluster(s)
> connection.add_pool('test', ['localhost:9160'])

This is your problem, you're only using one connection. Use 20+ and it will be much faster, however it's unlikely that a single python process will be able to truly push Cassandra to the limit. That said, take a look at test/system/stress.py.

-Brandon
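A hypothetical sketch of the "20+ connections" idea: a round-robin pool so requests aren't serialized on a single socket (`make_conn` stands in for opening a real Thrift connection; this is not the API of any particular client library):

```python
import itertools

class Pool:
    """Round-robin over a fixed set of pre-opened connections."""
    def __init__(self, make_conn, size=20):
        self.conns = [make_conn(i) for i in range(size)]
        self._next = itertools.cycle(range(size))

    def get(self):
        # Hand out connections in rotation so load spreads across them.
        return self.conns[next(self._next)]

# Demo with string stand-ins instead of real connections.
pool = Pool(lambda i: "conn-%d" % i, size=3)
used = [pool.get() for _ in range(6)]
assert used == ["conn-0", "conn-1", "conn-2", "conn-0", "conn-1", "conn-2"]
```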
Re: Re: bandwidth limiting Cassandra's replication and access control
On Wed, Nov 11, 2009 at 9:40 AM, Coe, Robin robin@bluecoat.com wrote:

> IMO, auth services should be left to the application layer that interfaces to Cassandra and not built into Cassandra. In the tutorial snippet included below, the access being granted is at the codebase level, not the transaction level. Since users of Cassandra will generally be fronted by a service layer, the java security manager isn't going to suffice. What this snippet could do, though, and may be the rationale for the request, is to ensure that unauthorized users cannot instantiate a new Cassandra server. However, if a user has physical access to the machine on which Cassandra is installed, they could easily bypass that layer of security.

What if Cassandra IS the application you're exposing? Imagine a large company that creates one large internal Cassandra deployment, and has multiple departments it wants to create separate keyspaces for. You can do that now, but there's nothing except a gentlemen's agreement to prevent one department from trashing another department's keyspace, and accidents do happen. You can front the service with some kind of application layer, but then you have another API to maintain, and you'll lose some performance this way.

-Brandon
Re: important performance note
On Thu, Oct 22, 2009 at 11:08 PM, Igor Katkov ikat...@gmail.com wrote:

> What OS and what JVM version was it?

Debian lenny amd64, Java(TM) SE Runtime Environment (build 1.6.0_12-b04). However, I suspect it affects a wide range of platforms and JVMs.

-Brandon
Re: [VOTE] Project Logo
~~{ Ballot }~~
[ 3]   2  http://99designs.com/contests/28940/entries/002
[ 9]  30  http://99designs.com/contests/28940/entries/030
[ 4]  32  http://99designs.com/contests/28940/entries/032
[ 6]  33  http://99designs.com/contests/28940/entries/033
[ 7]  90  http://99designs.com/contests/28940/entries/090
[10] 173  http://99designs.com/contests/28940/entries/173
[11] 175  http://99designs.com/contests/28940/entries/175
[ 5] 291  http://99designs.com/contests/28940/entries/291
[ 2] 369  http://99designs.com/contests/28940/entries/369
[ 8] 478  http://99designs.com/contests/28940/entries/478
[12] 576  http://99designs.com/contests/28940/entries/576
[ 1] 598  http://99designs.com/contests/28940/entries/598
[13] NOTA
~~{ Ballot }~~