Re: Direct control over where data is stored?

2011-06-06 Thread Watanabe Maki
You can find out which endpoints Cassandra will store your key on with getNaturalEndpoints, but you can't use this API to specify the endpoint you want. The partitioner decides which key goes to which node. With OPP, you may be able to predict which key range will be stored on a node, so you can
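A toy model of the placement rule described above, assuming OPP-style string tokens. Each node owns the keys after the previous node's token up to and including its own, wrapping around the ring, and extra replicas are taken from the next nodes clockwise (a SimpleStrategy-like assumption). The class and method names are hypothetical; this is not Cassandra's getNaturalEndpoints implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy ring with order-preserving (string) tokens.
public class TokenRingSketch {
    private final TreeMap<String, String> ring = new TreeMap<>(); // token -> node

    public void addNode(String token, String node) {
        ring.put(token, node);
    }

    // Primary owner: first node whose token >= key, else wrap to the lowest token.
    public String primaryFor(String key) {
        Map.Entry<String, String> e = ring.ceilingEntry(key);
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // Assumed SimpleStrategy-like placement: primary plus the next rf-1 nodes clockwise.
    public List<String> replicasFor(String key, int rf) {
        List<String> out = new ArrayList<>();
        if (ring.isEmpty()) {
            return out;
        }
        String token = ring.ceilingKey(key);
        if (token == null) {
            token = ring.firstKey();
        }
        for (int i = 0; i < Math.min(rf, ring.size()); i++) {
            out.add(ring.get(token));
            token = ring.higherKey(token);
            if (token == null) {
                token = ring.firstKey(); // wrap around the ring
            }
        }
        return out;
    }

    public static void main(String[] args) {
        TokenRingSketch r = new TokenRingSketch();
        r.addNode("g", "nodeA");
        r.addNode("n", "nodeB");
        r.addNode("t", "nodeC");
        System.out.println(r.primaryFor("apple"));    // nodeA
        System.out.println(r.primaryFor("zebra"));    // nodeA (wraps)
        System.out.println(r.replicasFor("kiwi", 2)); // [nodeB, nodeC]
    }
}
```

With an order-preserving partitioner the key itself is the token, which is why the owning range of a key can be predicted from the node tokens alone.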

Replication-aware compaction

2011-06-06 Thread David Boxenhorn
Is there some deep architectural reason why compaction can't be replication-aware? What I mean is, if one node is doing compaction, its replicas shouldn't be doing compaction at the same time. Or, at least a quorum of nodes should be available at all times. For example, if RF=3, and one node is
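The question implies Cassandra 0.7 does not coordinate compaction across replicas. As a sketch of only the arithmetic behind the proposed policy: QUORUM is a majority of replicas, (RF / 2) + 1, so the number of replicas of one range that could compact at once while a quorum stays uncompacted is RF minus that. The class and method names are hypothetical.

```java
// Hypothetical budget for the policy proposed above; not a Cassandra feature.
public class CompactionBudget {
    // Majority of replicas, as used for QUORUM reads and writes.
    static int quorum(int rf) {
        return rf / 2 + 1;
    }

    // Replicas that could be busy compacting while a quorum remains free.
    static int maxConcurrentlyCompacting(int rf) {
        return rf - quorum(rf);
    }

    public static void main(String[] args) {
        for (int rf = 1; rf <= 5; rf++) {
            System.out.printf("RF=%d quorum=%d maxCompacting=%d%n",
                    rf, quorum(rf), maxConcurrentlyCompacting(rf));
        }
    }
}
```

For RF=3 this gives quorum=2 and maxCompacting=1, i.e. only one replica of a range could compact at a time; for RF=1 or RF=2 no replica could.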

Re: [SPAM] Re: slow insertion rate with secondary index

2011-06-06 Thread David Boxenhorn
Is there really a 10x difference between indexed CFs and non-indexed CFs? On Mon, Jun 6, 2011 at 11:05 AM, Donal Zang zan...@ihep.ac.cn wrote: On 06/06/2011 05:38, Jonathan Ellis wrote: Index updates require read-before-write (to find out what the prior version was, if any, and update the
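A toy in-memory model (not Cassandra's code) of why an indexed column needs a read before each write: to drop the stale index entry, the prior value of the column must first be looked up. In a real node that lookup can hit disk when the row isn't cached, which is where the extra cost comes from. All names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical table with one indexed column, kept entirely in memory.
public class IndexedColumnSketch {
    private final Map<String, String> rows = new HashMap<>();       // row key -> indexed value
    private final Map<String, Set<String>> index = new HashMap<>(); // indexed value -> row keys
    private int reads = 0;                                          // extra reads forced by the index

    public void put(String rowKey, String value) {
        // Read-before-write: without the prior value we could not find the stale index entry.
        String prior = rows.get(rowKey);
        reads++;
        if (prior != null) {
            Set<String> keys = index.get(prior);
            keys.remove(rowKey);
            if (keys.isEmpty()) {
                index.remove(prior);
            }
        }
        rows.put(rowKey, value);
        index.computeIfAbsent(value, v -> new HashSet<>()).add(rowKey);
    }

    public Set<String> lookup(String value) {
        return Collections.unmodifiableSet(index.getOrDefault(value, Collections.emptySet()));
    }

    public int readsPerformed() {
        return reads;
    }
}
```

Writing "red" and then "blue" to the same row leaves lookup("red") empty; skipping the read would leave that row wrongly listed under "red".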

Re: [SPAM] Re: slow insertion rate with secondary index

2011-06-06 Thread Donal Zang
On 06/06/2011 10:15, David Boxenhorn wrote: Is there really a 10x difference between indexed CFs and non-indexed CFs? Well, as for my test, it is! I'm using 0.7.6-2, 9 nodes, 3 replicas, write_consistency_level QUORUM, about 90,000,000 rows (~1K per row). I use 20 processes, 20 rows for each

Migration question

2011-06-06 Thread Eric Czech
Hi, I have a quick question about migrating a cluster. We have a cassandra cluster with 10 nodes that we'd like to move to a new DC and what I was hoping to do is just copy the SSTables for each node to a corresponding node in the new DC (the new cluster will also have 10 nodes). Is there any

Re: problems with many columns on a row

2011-06-06 Thread Mario Micklisch
:-) There are several data Files: # ls -al *-Data.db -rw-r--r-- 1 cassandra cassandra 53785327 2011-06-05 14:44 CFTest-g-21-Data.db -rw-r--r-- 1 cassandra cassandra 56474656 2011-06-05 18:04 CFTest-g-38-Data.db -rw-r--r-- 1 cassandra cassandra 21705904 2011-06-05 20:02 CFTest-g-45-Data.db

Re: Migration question

2011-06-06 Thread aaron morton
Sounds like you are OK to turn off the existing cluster first. Assuming so, deliver any hints using JMX, then do a nodetool flush to write out all the memtables and checkpoint the commit logs. You can then copy the data directories. The System data directory contains the node's token and the

Re: Setting up cluster and nodetool ring in 0.8.0

2011-06-06 Thread David McNelis
Just to close this out, in case anyone was interested... my problem was firewall related, in that I didn't have my messaging/data port (7000) open on my seed node. Allowing traffic on this port resolved my issues. On Fri, Jun 3, 2011 at 1:43 PM, David McNelis dmcne...@agentisenergy.com wrote:
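Since the fix here was opening port 7000 on the seed, a quick way to check that from another node is to attempt a plain TCP connection. This is a generic sketch: the host name is a placeholder, and a successful connect only shows the port is reachable, not that Cassandra is healthy behind it.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Reachability check for a node's internal messaging port (7000 in this thread).
public class PortCheck {
    static boolean isPortOpen(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // refused, filtered by a firewall (timeout), or unresolvable host
        }
    }

    public static void main(String[] args) {
        String seed = args.length > 0 ? args[0] : "seed-node.example"; // hypothetical host
        System.out.println(seed + ":7000 reachable? " + isPortOpen(seed, 7000, 2000));
    }
}
```

A firewall that silently drops packets usually shows up as a timeout rather than an immediate refusal, so keep the timeout short when checking several nodes.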

Re: [RELEASE] 0.8.0

2011-06-06 Thread Jonathan Ellis
Has this been running w/ default settings (i.e. relying on the new memtable_total_space_in_mb) or was this an upgrade from 0.7 (or otherwise had the per-CF memtable settings applied?) On Mon, Jun 6, 2011 at 12:00 AM, Terje Marthinussen tmarthinus...@gmail.com wrote: 0.8 under load may turn out

Re: [SPAM] Re: slow insertion rate with secondary index

2011-06-06 Thread Jonathan Ellis
On Mon, Jun 6, 2011 at 6:28 AM, Donal Zang zan...@ihep.ac.cn wrote: Another thing I noticed is : if you first do insertion, and then build the secondary index use update column family ..., and then do select based on the index, the result is not right (seems the index is still being built

Re: [SPAM] Re: slow insertion rate with secondary index

2011-06-06 Thread David Boxenhorn
Jonathan, are Donal Zang's results (10x slowdown) typical? On Mon, Jun 6, 2011 at 3:14 PM, Jonathan Ellis jbel...@gmail.com wrote: On Mon, Jun 6, 2011 at 6:28 AM, Donal Zang zan...@ihep.ac.cn wrote: Another thing I noticed is : if you first do insertion, and then build the secondary index

Re: Replication-aware compaction

2011-06-06 Thread David Boxenhorn
Version 0.7.3. Yes, I am talking about minor compactions. I have three nodes, RF=3. 3G data (before replication). Not many users (yet). It seems like 3 nodes should be plenty. But when all 3 nodes are compacting, I sometimes get timeouts on the client, and I see in my logs that each one is full

Re: [SPAM] Re: slow insertion rate with secondary index

2011-06-06 Thread Jonathan Ellis
If the rows you are updating are not cached, yes. (Otherwise maybe 10% slower.) On Mon, Jun 6, 2011 at 7:29 AM, David Boxenhorn da...@citypath.com wrote: Jonathan, are Donal Zang's results (10x slowdown) typical? On Mon, Jun 6, 2011 at 3:14 PM, Jonathan Ellis jbel...@gmail.com wrote: On

Re: [RELEASE] 0.8.0

2011-06-06 Thread Terje Marthinussen
Of course I talked too soon. I saw a corrupted commitlog some days back after killing cassandra and I just came across a committed hints file after a cluster restart for some config changes :( Will look into that. Otherwise, not defaults, but close. The dataset is fed from scratch so yes,

Re: [RELEASE] 0.8.0

2011-06-06 Thread Marcos Ortiz
On 6/6/2011 1:00 AM, Terje Marthinussen wrote: 0.8 under load may turn out to be more stable and well behaving than any release so far. Been doing a few test runs stuffing more than 1 billion records into a 12 node cluster and things look better than ever. VMs stable and nice at 11GB. No

Re: Troubleshooting IO performance ?

2011-06-06 Thread Philippe
hum... no, it wasn't swapping. Cassandra was the only thing running on that server and I was querying the same keys over and over. I restarted Cassandra and did the same thing: IO is now down to zero while CPU is up, which doesn't surprise me as much. I'll report if it happens again. On 5 June

Re: [RELEASE] 0.8.0

2011-06-06 Thread Terje Marthinussen
How did that typo happen... "across a committed hints file" should be "across a corrupted hints file". Seems like the last supercolumn in the hints file has 0 subcolumns. This actually seems to be correctly serialized, but my code has a bug and fails to read it. When that is said, I wonder why the hint

working with time uuid

2011-06-06 Thread Patrick Julien
How does this work exactly? If you're using version 1 time UUIDs for your keys to get ordering, doesn't this mean the keys need to be generated all on the same host when you either query or insert? Or does Cassandra only inspect the bits that represent the timestamp of the UUID when

Re: [RELEASE] 0.8.0

2011-06-06 Thread Sylvain Lebresne
On Mon, Jun 6, 2011 at 4:17 PM, Terje Marthinussen tmarthinus...@gmail.com wrote: How did that typo happen... across a committed hints file should be across a corrupted hints file. Seems like the last supercolumn in the hints file has 0 subcolumns. This actually seems to be correctly

Re: working with time uuid

2011-06-06 Thread Paul Loy
private static int compareTimestampBytes(ByteBuffer o1, ByteBuffer o2) { int o1Pos = o1.position(); int o2Pos = o2.position(); int d = (o1.get(o1Pos+6) & 0xF) - (o2.get(o2Pos+6) & 0xF); if (d != 0) return d; d = (o1.get(o1Pos+7) & 0xFF) -

Re: working with time uuid

2011-06-06 Thread Paul Loy
Well, to clarify, it first checks the timestamp bytes, then the rest, so it doesn't say they're the same if they came from 2 different servers. On Mon, Jun 6, 2011 at 4:52 PM, Paul Loy ketera...@gmail.com wrote: private static int compareTimestampBytes(ByteBuffer o1, ByteBuffer o2) {

Re: working with time uuid

2011-06-06 Thread Jonathan Ellis
... although it does break ties by comparing the other bytes. On Mon, Jun 6, 2011 at 10:52 AM, Paul Loy ketera...@gmail.com wrote: private static int compareTimestampBytes(ByteBuffer o1, ByteBuffer o2) { int o1Pos = o1.position(); int o2Pos = o2.position();

Re: working with time uuid

2011-06-06 Thread Patrick Julien
thanks On Mon, Jun 6, 2011 at 11:52 AM, Paul Loy ketera...@gmail.com wrote: private static int compareTimestampBytes(ByteBuffer o1, ByteBuffer o2) { int o1Pos = o1.position(); int o2Pos = o2.position(); int d = (o1.get(o1Pos+6) & 0xF) - (o2.get(o2Pos+6) & 0xF);
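The thread's answer is that the comparator orders by the timestamp bytes first and only then breaks ties with the remaining bytes, so UUIDs from different hosts don't collide. Here is the same idea sketched with java.util.UUID rather than Cassandra's byte-level comparator; the tie-break (UUID.compareTo) is an assumption and may not match Cassandra's exact byte order. timeUuid is a hypothetical helper for building test values.

```java
import java.util.Comparator;
import java.util.UUID;

// Timestamp-first ordering for version 1 UUIDs, with a deterministic tie-break.
public class TimeUuidOrder {
    static int compare(UUID a, UUID b) {
        int c = Long.compare(a.timestamp(), b.timestamp()); // throws if not version 1
        return c != 0 ? c : a.compareTo(b);                 // same instant: compare remaining bits
    }

    static final Comparator<UUID> BY_TIME_THEN_REST = TimeUuidOrder::compare;

    // Hypothetical helper: build a version 1 UUID from a 60-bit timestamp and 48-bit node id.
    static UUID timeUuid(long ts, long node) {
        long msb = ((ts & 0xFFFFFFFFL) << 32)      // time_low
                 | (((ts >>> 32) & 0xFFFFL) << 16) // time_mid
                 | 0x1000L                         // version 1
                 | ((ts >>> 48) & 0x0FFFL);        // time_hi
        long lsb = 0x8000000000000000L | (node & 0xFFFFFFFFFFFFL); // IETF variant + node
        return new UUID(msb, lsb);
    }
}
```

For timeUuid(1L << 32, 0) and timeUuid(2, 0), plain UUID.compareTo puts the first one earlier, while compare puts it later, because the timestamp's low bits sit in the most significant position of the UUID. That is why a time-aware comparator is needed at all.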

Re: [RELEASE] 0.8.0

2011-06-06 Thread Ryan King
On Mon, Jun 6, 2011 at 6:09 AM, Terje Marthinussen tmarthinus...@gmail.com wrote: Of course I talked too soon. I saw a corrupted commitlog some days back after killing cassandra and I just came across a committed hints file after a cluster restart for some config changes :( Will look into

Re: Replication-aware compaction

2011-06-06 Thread aaron morton
You should consider upgrading to 0.7.6 to get a fix to Gossip. Earlier 0.7 releases were prone to marking nodes up and down when they should not have been. See https://github.com/apache/cassandra/blob/cassandra-0.7/CHANGES.txt#L22 Are the TimedOutExceptions to the client for read or write

Re: problems with many columns on a row

2011-06-06 Thread aaron morton
Can you upgrade to the official 0.8 release and try again with logging set to DEBUG ? Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 6 Jun 2011, at 23:41, Mario Micklisch wrote: :-) There are several data Files: # ls

Re: hector-jpa

2011-06-06 Thread Patrick Julien
It also doesn't run, I get Exception in thread "main" javax.persistence.PersistenceException: Failed to load provider from META-INF/services at

Re: Troubleshooting IO performance ?

2011-06-06 Thread Philippe
Ok, here it goes again... No swapping at all...
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b  swpd  free  buff   cache si so     bi bo    in   cs us sy id wa
 1 63 32044 88736 37996 7116524  0  0 227156  0 18314 5607 30  5 11 53
 1 63 32044

Re: hector-jpa

2011-06-06 Thread Ed Anuff
That's a work in progress and actually represents the next generation of JPA in Hector. There is a more lightweight version present in the release version of Hector called Hector Object Mapper. I'm sure Nate or Todd who've worked more on hector-jpa can elaborate. Ed On Mon, Jun 6, 2011 at 2:58

Re: hector-jpa

2011-06-06 Thread Patrick Julien
So what's recommended for right now? The DataNucleus plugin? I don't need the query parts or anything, I just don't want to have to translate columns to Java fields and vice versa. On Mon, Jun 6, 2011 at 6:25 PM, Ed Anuff e...@anuff.com wrote: That's a work in progress and actually

Re: hector-jpa

2011-06-06 Thread Ed Anuff
I'd recommend looking into Hector Object Mapper, it provides annotation-based mapping of Java object fields to columns: https://github.com/rantav/hector/wiki/Hector-Object-Mapper-%28HOM%29 On Mon, Jun 6, 2011 at 3:27 PM, Patrick Julien pjul...@gmail.com wrote: So what's recommended for right
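To illustrate what "annotation-based mapping of Java object fields to columns" means, here is a minimal reflection sketch. The @Col annotation and all class names are hypothetical and are NOT Hector Object Mapper's API; it handles only String, int and long fields and no relationships such as @OneToMany.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Hypothetical annotation naming the column a field maps to.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Col {
    String value();
}

// Example entity used for the round trip.
class User {
    @Col("name") String name;
    @Col("age") int age;
}

public class ColumnMapperSketch {
    // Columns (name -> string value) into a new instance of type.
    static <T> T fromColumns(Class<T> type, Map<String, String> cols) throws ReflectiveOperationException {
        Constructor<T> ctor = type.getDeclaredConstructor();
        ctor.setAccessible(true);
        T obj = ctor.newInstance();
        for (Field f : type.getDeclaredFields()) {
            Col c = f.getAnnotation(Col.class);
            if (c == null || !cols.containsKey(c.value())) continue;
            String raw = cols.get(c.value());
            f.setAccessible(true);
            if (f.getType() == int.class) f.setInt(obj, Integer.parseInt(raw));
            else if (f.getType() == long.class) f.setLong(obj, Long.parseLong(raw));
            else f.set(obj, raw); // assumes any other annotated field is a String
        }
        return obj;
    }

    // Annotated fields back into columns; null fields are skipped.
    static Map<String, String> toColumns(Object obj) throws IllegalAccessException {
        Map<String, String> cols = new HashMap<>();
        for (Field f : obj.getClass().getDeclaredFields()) {
            Col c = f.getAnnotation(Col.class);
            if (c == null) continue;
            f.setAccessible(true);
            Object v = f.get(obj);
            if (v != null) cols.put(c.value(), String.valueOf(v));
        }
        return cols;
    }
}
```

Encoding every value as a string keeps the sketch short; a real mapper would use typed serializers per column.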

Re: hector-jpa

2011-06-06 Thread Patrick Julien
yeah, I saw that one, I was more interested in hector-jpa because I don't think this one supports @OneToMany. Was hoping the test failures were a temporary thing On Mon, Jun 6, 2011 at 6:41 PM, Ed Anuff e...@anuff.com wrote: I'd recommend looking into Hector Object Mapper, it provides

Re: hector-jpa

2011-06-06 Thread Nate McCall
All tests pass and everything compiles on a clean checkout of hector-jpa for me. I'm not sure how you got that error from hector-jpa, there is no reference in that project to me.prettyprint.hom.CassandraPersistenceProvider. Are you sure you don't have a reference to hector-object-mapper (which is

Re: CQL How to do

2011-06-06 Thread Nate McCall
It is specific to the Hector client API, but I just started on a guide that may be of some help, particularly in regards to column configuration and query encoding: https://github.com/rantav/hector/wiki/Using-CQL 2011/6/4 Yonder zy...@yahoo.com.cn: Hi, In Cassandra 0.8, CQL become the primary

multiple clusters communicating

2011-06-06 Thread Jeffrey Wang
Hey all, We're seeing a strange issue in which two completely separate clusters (0.7.3) on the same subnet (X.X.X.146 through X.X.X.150) with 3 machines (146-148) and 2 machines (149-150). Both of them are seeded with the respective machines in their cluster, yet when we run them they end up

Re: multiple clusters communicating

2011-06-06 Thread Jonathan Ellis
Set the internal port to be different. On Mon, Jun 6, 2011 at 7:01 PM, Jeffrey Wang jw...@palantir.com wrote: Hey all, We’re seeing a strange issue in which two completely separate clusters (0.7.3) on the same subnet (X.X.X.146 through X.X.X.150) with 3 machines (146-148) and 2 machines

Re: [RELEASE] 0.8.0

2011-06-06 Thread Terje Marthinussen
Yes, I am aware of it but it was not an alternative for this project which will face production soon. The patch I have is fairly non-intrusive (especially vs. 674) so I think it can be interesting depending on how quickly 674 will be integrated into cassandra releases. I plan to take a closer

Re: Troubleshooting IO performance ?

2011-06-06 Thread aaron morton
There is a big IO queue and reads are spending a lot of time in the queue. Some more questions: - what version are you on ? - what is the concurrent_reads config setting ? - what is nodetool tpstats showing during the slow down ? - exactly how much data are you asking for ? how many rows and

Installing Thrift with Solandra

2011-06-06 Thread Jean-Nicolas Boulay Desjardins
I am trying to install Thrift with Solandra. Normally, when I just want to install Thrift with Cassandra, I follow this tutorial: https://wiki.fourkitchens.com/display/PF/Using+Cassandra+with+PHP But how can I do the same for Solandra? Thrift with PHP... Using Ubuntu Server. Thanks in

Re: Installing Thrift with Solandra

2011-06-06 Thread Jake Luciani
To access Cassandra in Solandra, it's the same as regular Cassandra. To access Solr, you use one of the PHP Solr libraries: http://wiki.apache.org/solr/SolPHP On Mon, Jun 6, 2011 at 11:04 PM, Jean-Nicolas Boulay Desjardins jnbdzjn...@gmail.com wrote: I am trying to install Thrift with

Backups, Snapshots, SSTable Data Files, Compaction

2011-06-06 Thread AJ
Hi, I am working on a backup strategy and am trying to understand what is going on in the data directory. I notice that after a write to a CF and then flush, a new set of data files are created with an index number incremented in their names, such as: Initially: Users-e-1-Filter.db

Re: Backups, Snapshots, SSTable Data Files, Compaction

2011-06-06 Thread Benjamin Coverston
Hi AJ, inline: On 6/6/11 11:03 PM, AJ wrote: Hi, I am working on a backup strategy and am trying to understand what is going on in the data directory. I notice that after a write to a CF and then flush, a new set of data files are created with an index number incremented in their names,
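For a backup script it helps to split the data file names seen in this thread (Users-e-1-Filter.db, CFTest-g-21-Data.db) into their parts. The layout assumed here, column family, format version letter, generation (the "index number" that increments on each flush), then component, is inferred from those two listings only; the class is hypothetical.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Assumed layout: <ColumnFamily>-<format version>-<generation>-<Component>.db
public class SSTableName {
    private static final Pattern NAME =
            Pattern.compile("^(\\w+)-([a-z]+)-(\\d+)-(\\w+)\\.db$");

    final String columnFamily;
    final String version;
    final int generation;
    final String component;

    private SSTableName(String columnFamily, String version, int generation, String component) {
        this.columnFamily = columnFamily;
        this.version = version;
        this.generation = generation;
        this.component = component;
    }

    // Empty for anything that doesn't match the assumed SSTable name shape.
    static Optional<SSTableName> parse(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new SSTableName(
                m.group(1), m.group(2), Integer.parseInt(m.group(3)), m.group(4)));
    }

    // Highest generation present for one column family, or -1 if none.
    static int latestGeneration(List<String> fileNames, String columnFamily) {
        return fileNames.stream()
                .map(SSTableName::parse)
                .filter(Optional::isPresent)
                .map(Optional::get)
                .filter(n -> n.columnFamily.equals(columnFamily))
                .mapToInt(n -> n.generation)
                .max()
                .orElse(-1);
    }
}
```

Recording the highest generation at each backup gives a simple way to tell which files appeared since the previous one.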

Re: Installing Thrift with Solandra

2011-06-06 Thread Jean-Nicolas Boulay Desjardins
Thanks again :) Ok... But in the tutorial it says that I need to build a Thrift interface for Cassandra: ./compiler/cpp/thrift -gen php ../PATH-TO-CASSANDRA/interface/cassandra.thrift How do I do this? Where is the interface folder? Again, tjake, thanks a lot for your time and help. On Mon,

Re: Installing Thrift with Solandra

2011-06-06 Thread Jean-Nicolas Boulay Desjardins
I just saw a post you made on Stackoverflow, where you said: The Solandra project which is replacing Lucandra no longer uses thrift, only Solr. So I use Solr to access my data in Cassandra? Thanks again... On Tue, Jun 7, 2011 at 1:39 AM, Jean-Nicolas Boulay Desjardins jnbdzjn...@gmail.com