Re: 15 seconds to increment 17k keys?

2011-09-06 Thread Oleg Anastastasyev
 in the family. There are millions of rows. Each operation consists of
 doing a batch_insert through pycassa, which increments ~17k keys. A
 majority of these keys are new in each batch.
 
  Each operation is taking up to 15 seconds. For our system this is a
 significant bottleneck.
 

Try splitting your batch into smaller pieces and launching them in parallel. This way
you may get better performance, because all cores are employed and there is
less copying/rebuilding of large structures inside Thrift and Cassandra. I found
that 1k rows per batch behaves better than 10k.
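A minimal sketch of the chunk-and-parallelize idea (the `cf` object stands in for a pycassa-style client with a `batch_insert(rows)` method; the chunk size and worker count are illustrative, not tuned values):

```python
from concurrent.futures import ThreadPoolExecutor

def chunked(rows, size):
    """Split a dict of rows into smaller dicts of at most `size` rows each."""
    items = list(rows.items())
    return [dict(items[i:i + size]) for i in range(0, len(items), size)]

def parallel_batch_insert(cf, rows, chunk_size=1000, workers=8):
    """Issue one smaller batch_insert per chunk, dispatched in parallel."""
    chunks = chunked(rows, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk becomes its own RPC, so Thrift never has to build
        # one giant 17k-row structure.
        list(pool.map(cf.batch_insert, chunks))
    return len(chunks)
```

With ~17k rows and a 1k chunk size this issues 17 smaller inserts concurrently instead of one huge one.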

It is also a good idea to split the batch into slices according to the replication
strategy and send each slice directly to its natural endpoint.
This reduces the necessary intercommunication between nodes.






Re: Using 5-6 bytes for cassandra timestamps vs 8…

2011-09-06 Thread Oleg Anastastasyev

 
 I have a patch for trunk which I just have to get time to test a bit before I
submit.
 It is for super columns and will use the super column's timestamp as the base
and only store variant-encoded offsets in the underlying columns. 
 

Could you please measure how much real benefit it brings (in actual RAM
consumption by the JVM)? It is hard to tell whether it will give noticeable results.
AFAIK the memory structures used for the memtable consume much more memory, and a
64-bit JVM allocates memory aligned to 64-bit word boundaries. So a 37% reduction
in memory consumption looks doubtful.
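The alignment point can be made concrete with a back-of-envelope calculation (a sketch assuming the usual 8-byte allocation alignment of 64-bit HotSpot; real per-object overhead also includes headers and references):

```python
def aligned(nbytes, alignment=8):
    """Round a field/object size up to the JVM's allocation alignment."""
    return (nbytes + alignment - 1) // alignment * alignment

# Shrinking an 8-byte timestamp to a 5-byte variant-encoded offset saves
# nothing once the enclosing allocation is padded back to an 8-byte
# boundary: aligned(5) == aligned(8) == 8.
```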




[RELEASE] Apache Cassandra 0.7.9

2011-09-06 Thread Eric Evans
I'm pleased to announce the release of Cassandra 0.7.9.

0.7.9 contains a number of important bug-fixes (full list here[1]),
and should be an easy upgrade from previous 0.7 releases.

Source and binary distributions are available from the Downloads
page[3], and users of Debian and derivative distros can
install/upgrade in the usual manner[4].

If you spot any problems be sure to submit an issue[5], and if you
have any questions, don't hesitate to ask[6].

Thanks!

[1]: http://goo.gl/uenGn (CHANGES.txt)
[2]: http://goo.gl/AQ2KY (NEWS.txt)
[3]: http://cassandra.apache.org/download
[4]: http://wiki.apache.org/cassandra/DebianPackaging
[5]: https://issues.apache.org/jira/browse/CASSANDRA
[6]: user-subscr...@cassandra.apache.org

-- 
Eric Evans
Acunu | http://www.acunu.com | @acunu


Re: Why no need to query all nodes on secondary index lookup?

2011-09-06 Thread Kaj Magnus Lindberg
Hi Martin

Yes that was helpful, thanks

(I had no idea you were reading the Cassandra users list!  :-)  )

Thanks, (Kaj) Magnus L


On Mon, Sep 5, 2011 at 10:57 PM, Martin von Zweigbergk
martin.von.zweigbe...@gmail.com wrote:
 Hi Magnus,

 I think the answer might be on
 https://issues.apache.org/jira/browse/CASSANDRA-749. For example,
 Jonathan writes:

 quote
 Is it worth creating a secondary index that only contains local data, versus 
 a distributed secondary index (a normal ColumnFamily?)

 I think my initial reasoning was wrong here. I was anti-local-indexes
 because we have to query the full cluster for any index lookup, since
 we are throwing away our usual partitioning scheme.

 Which is true, but it ignores the fact that, in most cases, you will
 have to query the full cluster to get the actual matching rows, b/c
 the indexed rows will be spread across all machines. So, having local
 indexes is better in the common case, since it actually saves a round
 trip from querying the index to querying the rows.

 Also, having each node index the rows it has locally means you don't
 have to worry about sharding a very large index since it happens
 automatically.

 Finally, it lets us use the local commitlog to keep index + data in sync.
 /quote

 Hope that helps,
 Martin

 On Mon, Sep 5, 2011 at 1:52 AM, Kaj Magnus Lindberg
 kajmagnu...@gmail.com wrote:
 Hi,

 (This is the 2nd time I'm sending this message. I sent it the first
 time a few days ago but it does not appear in the archives.)

 I have a follow up question on a question from February 2011. In
 short, I wonder why one won't have to query all Cassandra nodes when
 doing a secondary index lookup -- although each node only indexes data
 that it holds locally.

 The question and answer was:
  ( http://www.mail-archive.com/user@cassandra.apache.org/msg10506.html  )
 === Question ===
 As far as I understand automatic secondary indexes are generated for
 node local data.
   In this case a query by secondary index involves all nodes storing part of the
  column family to get results(?). So (if I am right), if data is spread across
  50 nodes, then 50 nodes are involved in a single query?
 [...]
 === Answer ===
 In practice, local secondary indexes scale to {RF * the limit of a single
 machine} for -low cardinality- values (ex: users living in a certain state)
 since the first node is likely to be able to answer your question. This also
 means they are good for performing filtering for analytics.
 [...]

 === Now I wonder ===
 Why would the first node be likely to be able to answer the question?
 It stores only index entries for users on that particular machine,
     (says http://wiki.apache.org/cassandra/SecondaryIndexes:
     Each node only indexes data that it holds locally )
 but users might be stored by user name? And would thus be stored on
 many different machines? Even if they happen to live in the same
 state?

 Why won't the client need to query the indexes of [all servers that
 store info on users] to find all relevant users, when doing a user
 property lookup?


 Best regards, KajMagnus
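The "first node is likely to be able to answer" claim hinges on cardinality, which a tiny simulation makes concrete (a sketch only: round-robin placement stands in for Cassandra's real token-based partitioning, and the numbers are invented):

```python
def place(users, n_nodes):
    """Round-robin stand-in for partitioning rows across nodes by key."""
    nodes = [[] for _ in range(n_nodes)]
    for i, user in enumerate(users):
        nodes[i % n_nodes].append(user)
    return nodes

# 1000 users across 50 nodes; 200 of them live in "CA" (low cardinality).
users = [("user%d" % i, "CA" if i < 200 else "NY") for i in range(1000)]
nodes = place(users, 50)
# Because a low-cardinality value appears on every node, the first node's
# local index can usually satisfy a small LIMIT on its own -- the
# coordinator only fans out further if it needs more rows.
```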




CQL and schema-less column family

2011-09-06 Thread osishkin osishkin
Sorry for the newbie question, but I failed to find a clear answer.
Can CQL be used to query a schema-less column family? Can such columns be indexed?
That is, can you query for column names that do not necessarily exist in all
rows and were not defined in advance when the column family was
created?

Thank you


Reg Cassandra load balance

2011-09-06 Thread Thamizh

Hi All,
I am using Cassandra-0.7.8 on a cluster of 4 machines. I have uploaded some files 
using Map/Reduce.
It looks like the files got distributed among only 2 nodes. When I used RF=3, the data 
got distributed equally across all 4 nodes with the configuration below.

Here are some config info's:

1. ByteOrderedPartitioner
2. Replication Factor = 1 (since I have storage issues; RF will be increased 
later)
3. initial_token value has not been set.
4. create keyspace ipinfo with replication_factor = 1 and placement_strategy = 
'org.apache.cassandra.locator.SimpleStrategy';

[cassandra@cassandra01 apache-cassandra-0.7.8]$ bin/nodetool -h 172.27.10.131 
ring
Address         Status State   Load        Owns    Token
                                                   Token(bytes[fddfd9bae90f0836cd9bff20b27e3c04])
172.27.10.132   Up     Normal  11.92 GB    25.00%  Token(bytes[3ddfd9bae90f0836cd9bff20b27e3c04])
172.27.15.80    Up     Normal  10.21 GB    25.00%  Token(bytes[7ddfd9bae90f0836cd9bff20b27e3c04])
172.27.10.131   Up     Normal  54.34 KB    25.00%  Token(bytes[bddfd9bae90f0836cd9bff20b27e3c04])
172.27.15.78    Up     Normal  58.79 KB    25.00%  Token(bytes[fddfd9bae90f0836cd9bff20b27e3c04])


Can you suggest how I should balance the load on my cluster?

Regards,
Thamizhannal

Re: Reg Cassandra load balance

2011-09-06 Thread Radim Kolar

switch to random (hash) partitioner

OR

move the tokens of your empty nodes to different positions in the ring; split 
your full nodes in half. The ring will then look like: owns 14% 14% 14% 
rest of ring.
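If you do switch to the random partitioner, the standard way to balance a ring is to space initial tokens evenly over its 0..2^127 token space (a sketch of the well-known formula; ByteOrderedPartitioner tokens are byte strings instead, so this applies only after the switch):

```python
def balanced_tokens(n_nodes):
    """Evenly spaced initial_token values for the RandomPartitioner,
    whose token space is 0 .. 2**127 - 1."""
    return [i * (2 ** 127) // n_nodes for i in range(n_nodes)]

tokens = balanced_tokens(4)
# With these tokens each of the 4 nodes owns exactly 25% of the ring.
```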


new to github: Casbase: distributed secondary indexes for Cassandra

2011-09-06 Thread Edward Capriolo
https://github.com/edwardcapriolo/casbase

What is it?

There are many great articles about building secondary Cassandra indexes
such as http://www.anuff.com/2011/02/indexing-in-cassandra.html. In a
nutshell, index building boils down to turning a single insert into multiple
inserts to support different types of searches. Casbase attempts to make
this 'friendly' and reusable. It is made friendly by allowing the user to
define Tables and Indexes, then when the insert method is called, Casbase
takes care of updating all the indexes.

String tablename = "ncars";
Table t = new Table();
t.name = tablename;
t.columns.add(new Col("vin".getBytes(), Col.ColType.LONG, false));
Index i = new Index();
i.columns.add("vin".getBytes());
i.it = Index.IndexType.ORDERED_BUCKETS;
i.indexOptions = 3;
i.name = "vinidx";
t.key = new Col("key".getBytes(), Col.ColType.BYTES, false);

db.create(t);

for (int k = 0; k < 7; k++) {
   Map<byte[], byte[]> cols = new HashMap<byte[], byte[]>();
   cols.put("make".getBytes(), "honda".getBytes());
   cols.put("model".getBytes(), "civic".getBytes());
   cols.put("vin".getBytes(), CasBaseUtil.longToBytes(k));
   db.insert(tablename, ("car" + k).getBytes(), cols);
}

Casbase is related to/a hybrid of:

https://github.com/edanuff/CassandraIndexedCollections
https://github.com/riptano/Cassandra-EZ-Client
https://github.com/rantav/hector/wiki/Hector-Object-Mapper-%28HOM%29

There are currently two secondary index implementations, HASH and
ORDERED_BUCKETS. The ORDERED_BUCKETS implementation uses composite columns
and sharding to allow !distributed ranged queries! on an index (i.e. something
like 'where column > 5 and column < 7').
Dragons: yes, distributed secondary indexes via ORDERED_BUCKETS involve a get_slice
on N buckets on the read path (you can also use multi_get_slice). Yes,
distributed indexes are not as fast as local indexes, but they are what
they are.
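The ORDERED_BUCKETS read path can be sketched in a few lines (a toy Python stand-in for illustration, not Casbase's actual API: each bucket plays the role of one sorted composite-column row, and the range query does one slice per bucket plus a merge):

```python
import bisect

class OrderedBuckets:
    """Toy sharded ordered index: writes hash to one of B sorted buckets;
    a range read slices every bucket (the N-bucket get_slice) and merges."""

    def __init__(self, n_buckets=3):
        self.buckets = [[] for _ in range(n_buckets)]

    def insert(self, value, row_key):
        bucket = self.buckets[hash(row_key) % len(self.buckets)]
        bisect.insort(bucket, (value, row_key))  # keep each bucket ordered

    def range_query(self, lo, hi):
        hits = []
        for bucket in self.buckets:              # one slice per bucket
            i = bisect.bisect_left(bucket, (lo,))
            while i < len(bucket) and bucket[i][0] < hi:
                hits.append(bucket[i])
                i += 1
        return sorted(hits)                      # merge step
```

The write stays cheap (one bucket touched); the cost shows up on reads, which is exactly the N-bucket trade-off described above.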

Status:
Code is still in an academic phase. It was started in the last week and, as
evidenced by my 50 commits this holiday, it is not stable either. Have fun.
Stay tuned.


Re: commodity server spec

2011-09-06 Thread China Stoffen
In general, more smaller is better than fewer big. Probably go for
what's cost-effective.

The cost-effective solution for us is a few fat servers, because that also saves hosting 
cost.



The exception to that would be if you're truly only caring about
writes and have *very* few reads that are not latency critical (so
you're okay with waiting for several disk seeks on reads and the
number of reads is low enough that serving them from platters will
work). In such cases it might make sense to have fewer Big Fat
Machines with lots of memory and a lot of disk space. But... even so.
I would not recommend huge 48 tb nodes... unless you really know what
you're doing.

I want writes to be as fast as possible, but reads don't need to be in 
milliseconds. 
If you don't recommend 48 TB, then what is the maximum disk space per node I can go with?






- Original Message -
From: Peter Schuller peter.schul...@infidyne.com
To: user@cassandra.apache.org; China Stoffen chinastof...@yahoo.com
Cc: 
Sent: Saturday, September 3, 2011 1:08 PM
Subject: Re: commodity server spec

 Is there any recommendation about commodity server hardware specs if 100TB
 database size is expected and its heavily write application.
 Should I got with high powered CPU (12 cores) and 48TB HDD and 640GB RAM and
 total of 3 servers of this spec. Or many smaller commodity servers are
 recommended?

In general, more smaller is better than fewer big. Probably go for
what's cost-effective.

In your case, 100 TB is *quite* big. I would definitely recommend
against doing anything like your 3 server setup. You'll probably want
100-1000 small servers.

The exception to that would be if you're truly only caring about
writes and have *very* few reads that are not latency critical (so
you're okay with waiting for several disk seeks on reads and the
number of reads is low enough that serving them from platters will
work). In such cases it might make sense to have fewer Big Fat
Machines with lots of memory and a lot of disk space. But... even so.
I would not recommend huge 48 tb nodes... unless you really know what
you're doing.

In reality, more information about your use-case would be required to
offer terribly useful advice.

-- 
/ Peter Schuller (@scode on twitter)


Professional Support

2011-09-06 Thread China Stoffen
There is a link on the Cassandra homepage to a page which lists a few professional 
support providers. I have contacted a few of them; a couple no longer provide 
support, and the others didn't reply. So, do you know of any 
professional support provider for Cassandra solutions, and how much do they charge 
per year?


Re: Professional Support

2011-09-06 Thread Jim Ancona
We use Datastax (http://www.datastax.com) and we have been very happy
with the support we've received.

We haven't tried any of the other providers on that page, so I can't
comment on them.

Jim
(Disclaimer: no connection with Datastax other than as a satisfied customer.)

On Tue, Sep 6, 2011 at 1:15 PM, China Stoffen chinastof...@yahoo.com wrote:
 There is a link to a page which lists few professional support providers on
 Cassandra homepage. I have contacted few of them and couple are just out of
 providing support and others didn't reply. So, do you know about any
 professional support provider for Cassandra solutions and how much they
 charge per year?



Re: Professional Support

2011-09-06 Thread William Oberman
I also have used datastax with great success (same disclaimer).

A specific example:
-I set up a one-on-one call to talk through an issue, in my case a server
reconfiguration.  It took 2 days to find a time to meet, though that was my
fault as I believe they could have worked me in within a day.  I wanted to
split an existing cluster into 'oltp' and 'analytics', similar to what brisk
does now out of the box.
-During the call they walked me through all of the steps I'd have to do,
answered any questions I had, and filled in the blanks for some of the
reasoning behind their recommendations.
-After the call I received constant support through the reconfiguration.
 For example: I found out that Ec2Snitch doesn't play nicely with
PropertyFileSnitch in a rolling restart (all of the Ec2Snitch based servers
stopped working immediately as soon as a PropFileSnitch server joined the
ring, this is in 0.8.4), and they wrote a custom patch for me that made it
work within a day.
-In particular, Ben and Jackson helped me, so if either of you read the user
list, thanks again!

will

On Tue, Sep 6, 2011 at 1:25 PM, Jim Ancona j...@anconafamily.com wrote:

 We use Datastax (http://www.datastax.com) and we have been very happy
 with the support we've received.

 We haven't tried any of the other providers on that page, so I can't
 comment on them.

 Jim
 (Disclaimer: no connection with Datastax other than as a satisfied
 customer.)

 On Tue, Sep 6, 2011 at 1:15 PM, China Stoffen chinastof...@yahoo.com
 wrote:
  There is a link to a page which lists few professional support providers
 on
  Cassandra homepage. I have contacted few of them and couple are just out
 of
  providing support and others didn't reply. So, do you know about any
  professional support provider for Cassandra solutions and how much they
  charge per year?
 



Re: Professional Support

2011-09-06 Thread China Stoffen
Thanks for sharing the info. I contacted DataStax using the contact form, 
but no reply yet after more than a week.
Probably I need to contact Ben directly.




From: Ben Coverston ben.covers...@datastax.com
To: user@cassandra.apache.org
Sent: Tuesday, September 6, 2011 11:15 PM
Subject: Re: Professional Support


We were glad to help William, thanks for sharing!


On Tue, Sep 6, 2011 at 12:13 PM, William Oberman ober...@civicscience.com 
wrote:

I also have used datastax with great success (same disclaimer).  


A specific example:

-I set up a one-on-one call to talk through an issue, in my case a server 
reconfiguration.  It took 2 days to find a time to meet, though that was my 
fault as I believe they could have worked me in within a day.  I wanted to 
split an existing cluster into 'oltp' and 'analytics', similar to what brisk 
does now out of the box.
-During the call they walked me through all of the steps I'd have to do, 
answered any questions I had, and filled in the blanks for some of the 
reasoning behind their recommendations.
-After the call I received constant support through the reconfiguration.  For 
example: I found out that Ec2Snitch doesn't play nicely with 
PropertyFileSnitch in a rolling restart (all of the Ec2Snitch based servers 
stopped working immediately as soon as a PropFileSnitch server joined the 
ring, this is in 0.8.4), and they wrote a custom patch for me that made it 
work within a day.
-In particular, Ben and Jackson helped me, so if either of you read the user 
list, thanks again!


will



On Tue, Sep 6, 2011 at 1:25 PM, Jim Ancona j...@anconafamily.com wrote:

We use Datastax (http://www.datastax.com) and we have been very happy
with the support we've received.

We haven't tried any of the other providers on that page, so I can't
comment on them.

Jim
(Disclaimer: no connection with Datastax other than as a satisfied customer.)


On Tue, Sep 6, 2011 at 1:15 PM, China Stoffen chinastof...@yahoo.com wrote:
 There is a link to a page which lists few professional support providers on
 Cassandra homepage. I have contacted few of them and couple are just out of
 providing support and others didn't reply. So, do you know about any
 professional support provider for Cassandra solutions and how much they
 charge per year?





Calculate number of nodes required based on data

2011-09-06 Thread Hefeng Yuan
Hi,

Is there any suggested way of calculating number of nodes needed based on data? 

We currently have 6 nodes (each with 8GB of memory) and RF=5 (because we want to be 
able to survive the loss of 2 nodes).
The memtable flush happens around every 30 min (while not doing compaction), 
with ~9M serialized bytes.

The problem is that we see more than 3 nodes doing compaction at the same time, 
which slows down the application.
(We tried to increase/decrease compaction_throughput_mb_per_sec; it did not help much.)

So I'm thinking we should probably add more nodes, but I'm not sure how many 
to add. 
Based on the data rate, is there any suggested way of calculating the number of 
nodes required?

Thanks,
Hefeng
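There is no single formula for this, but a rough back-of-envelope estimate is a common starting point (a sketch only; the capacity and headroom numbers below are illustrative assumptions, not recommendations):

```python
import math

def nodes_needed(total_data_gb, rf, usable_per_node_gb, headroom=0.5):
    """Estimate node count from raw data size, replication factor, and
    per-node disk capacity, keeping headroom free because compaction can
    temporarily need space comparable to the data being compacted."""
    effective_per_node = usable_per_node_gb * headroom
    return math.ceil(total_data_gb * rf / effective_per_node)

# e.g. 200 GB of raw data, RF=5, 500 GB disks, 50% compaction headroom:
# 200 * 5 / 250 -> 4 nodes minimum before any performance considerations.
```

Compaction pressure like that described above is often a throughput problem rather than a capacity one, so treat any such estimate as a floor, not a target.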

Re: commodity server spec

2011-09-06 Thread Bill

MongoDB, last time I looked, does not scale horizontally.

I've seen reasonable behaviour putting Cassandra database tables onto 
remote filers, but you absolutely have to test against the SAN 
configuration and carefully manage things like concurrent reader/writer 
settings, the fs and Cassandra caches, etc. You generally won't be 
advised to use a NAS/SAN for this class of system.


The commitlogs work best on attached (dedicated) disk.

Bill

On 04/09/11 14:08, China Stoffen wrote:

Then what will be the sweetspot for Cassandra? I am more interested in
Cassandra because my application is write heavy.

Till now what I have understood is that Cassandra will not work best for
SANs too?

P.S
Mongodb is also a nosql database and designed for horizontal scaling
then how its good for the same hardware for which Cassandra is not a
good candidate?


- Original Message -
From: Bill b...@dehora.net
To: user@cassandra.apache.org
Cc:
Sent: Sunday, September 4, 2011 4:34 AM
Subject: Re: commodity server spec

[100% agree with Chris]

China, the machines you're describing sound nice for
mongodb/postgres/mysql, but probably not the sweetspot for Cassandra.

Obviously (well depending on near term load) you don't want to get
burned on excess footprint. But a realistic, don't lose data, be fairly
available deployment is going to span at least 2 racks/power supplies
and have data replicated offsite (at least as passive for DR). So I
would consider 6-9 relatively weaker servers rather than 3 scale up
joints. You'll save some capex, and the amount of opex overhead is
probably worth it traded off against the operational risk. 3 is an
awkward number to operate for anything that needs to be available
(although many people seem to start with that, I am guessing because
triplication is traditionally understood under failure) as it
immediately puts 50% extra load on the remaining 2 when one node goes
away. One will go away, even transiently, when it is upgraded, crashes,
gets into a funk due to compaction or garbage collection, and load will
then be shunted onto the other 2 - remember Cassandra has no
backoff/throttling in place. I'd allow for something breaking at some
point (dbs even the mature ones, fail from time to time) and 2 doesn't
give you much room to maneuver in production.

Bill


On 03/09/11 23:05, Chris Goffinet wrote:
  It will also depend on how long you can handle recovery time. So imagine
  this case:
 
  3 nodes w/ RF of 3
  Each node has 30TB of space used (you never want to fill up entire node).
  If one node fails and you must recover, that will take over 3.6 days in
  just transferring data alone. That's with a sustained 800megabit/s
  (100MB/s). In the real world it's going to fluctuate so add some
  padding. Also, since you will be saturating one of the other nodes, now
  you're network latency performance is suffering and you only have 1
  machine to handle the remaining traffic while you're recovering. And if
  you want to expand the cluster in the future (more nodes), the amount of
  data to transfer is going to be very large and most likely days to add
  machines. From my experience it's must better to have a larger cluster
  setup upfront for future growth than getting by with 6-12 nodes at the
  start. You will feel less pain, easier to manage node failures (bad
  disks, mem, etc).
 
  3 nodes with RF of 1 wouldn't make sense.
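Chris's recovery figure above is easy to reproduce (a sketch; 1 TB is taken as 10^12 bytes here, so the result lands slightly under his "over 3.6 days"):

```python
def recovery_days(data_tb, throughput_mb_per_s):
    """Days needed to restream one node's data at a sustained rate."""
    seconds = data_tb * 1e12 / (throughput_mb_per_s * 1e6)
    return seconds / 86400.0

# 30 TB at a sustained 100 MB/s is roughly 3.5 days of pure transfer,
# before any real-world fluctuation or padding.
```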
 
 
   On Sat, Sep 3, 2011 at 4:05 AM, China Stoffen chinastof...@yahoo.com wrote:
 
  Many small servers would drive up the hosting cost way too high so
  want to avoid this solution if we can.
 
  - Original Message -
   From: Radim Kolar h...@sendmail.cz
   To: user@cassandra.apache.org
  Cc:
  Sent: Saturday, September 3, 2011 9:37 AM
  Subject: Re: commodity server spec
 
  many smaller servers way better
 
 





Re: Cassandra client loses connectivity to cluster

2011-09-06 Thread Jim Ancona
Since we finally fixed this issue, I thought I'd document the
solution, with the hope that it makes it easier for others who might
run into it.

During the time this issue was occurring Anthony Ikeda reported a very
similar issue, although without the strange pattern of occurrences we
saw: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Trying-to-find-the-problem-with-a-broken-pipe-td6645526.html

It turns out that our problem was the same as Anthony's: exceeding
Thrift's maximum frame size, as set by
thrift_framed_transport_size_in_mb in cassandra.yaml. This problem was
extremely hard to troubleshoot, for the following reasons:

* TFramedTransport responds to an oversized frame by throwing a
TTransportException, which is a generic exception thrown for various
types of network or protocol errors. Because such errors are common,
many servers (TSimpleServer, TThreadPoolServer, and Cassandra's
CustomTThreadPoolServer) swallow TTransportException without a log
message. I've filed https://issues.apache.org/jira/browse/THRIFT-1323
and https://issues.apache.org/jira/browse/CASSANDRA-3142 to address
the lack of logging. (We finally found the issue by adding logging
code to our production Cassandra deploy. The fact that we could do
that is a big win for open source.)
* After the TTransportException occurs, the server closes the
underlying socket. To the client (Hector in our case), this appears as
a broken socket, most likely caused by a network problem or a failed
server node. Hector responds by marking the server node down and
retrying the too-large request on another node, where it also fails.
Repeated, this process leads to the entire cluster being marked down
(see https://github.com/rantav/hector/issues/212).
* Ideally, sending an oversized frame should trigger a recognizable
error on the client, so that the client knows that it has made an error
and avoids compounding the mistake by retrying. Thrift's framed
transport is pretty simple and I assume there isn't a good way for the
server to communicate the error to the client. As a second-best
solution, I've logged a bug against Thrift
(https://issues.apache.org/jira/browse/THRIFT-1324) saying that
TFramedTransport should enforce the configured frame size limit on
writes. At least that way people can avoid the issue by configuring a
client frame size to match their servers'. If that is implemented then
clients like Hector will be able to detect the frame too large case
and return an error instead of retrying it.

In addition to the issues above, some other things made this issue
difficult to solve:
* The pattern of occurrences (only occurring at a certain time of day,
on a single server at a time, only on weekdays, etc.) was something of
a distraction.
* Even after finding out that Anthony's similar problem was caused by
an oversized frame, I was convinced that we could not be generating an
operation large enough to exceed the configured value (15 mb).

It turns out that I was almost right: out of our hundreds of thousands
of customers, exactly one was working with data that large, and that
customer was doing so not because of anomalous behavior on their part,
but because of a bug in our system. So the fact that it was a single
customer explained the regular occurrences, and the bug explained the
unexpectedly large data size. Of course in this case almost right
wasn't good enough; my BOTE (back-of-the-envelope) calculation failed to take the bug into
account. Plus, as I tweeted immediately after I figured out what was
going on: Lesson: when you have millions of users it becomes easier
to say things about averages, but harder to do the same for extremes.
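The "second-best solution" above, enforcing the frame limit on the client before writing, can be sketched like this (a stand-in for illustration, not Thrift's actual TFramedTransport API; the class and exception names are invented):

```python
class FrameTooLarge(Exception):
    """Raised locally instead of letting the server silently drop the socket."""

class CheckedFramedWriter:
    """Framed-transport writer that rejects oversized frames client-side."""

    def __init__(self, sock, max_frame_bytes=15 * 1024 * 1024):
        self.sock = sock
        self.max_frame_bytes = max_frame_bytes  # should match the server's limit

    def write_frame(self, payload):
        if len(payload) > self.max_frame_bytes:
            # A recognizable local error: the caller knows its own request
            # was oversized and must not retry it on other nodes.
            raise FrameTooLarge("frame of %d bytes exceeds limit of %d"
                                % (len(payload), self.max_frame_bytes))
        # Standard framing: 4-byte big-endian length prefix, then payload.
        self.sock.sendall(len(payload).to_bytes(4, "big") + payload)
```

With a check like this, a client such as Hector could surface the error immediately instead of marking node after node down.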

Jim

On Wed, Jun 29, 2011 at 5:42 PM, Jim Ancona j...@anconafamily.com wrote:
 In reviewing client logs as part of our Cassandra testing, I noticed
 several Hector All host pools marked down exceptions in the logs.
 Further investigation showed a consistent pattern of
 java.net.SocketException: Broken pipe and java.net.SocketException:
 Connection reset messages. These errors occur for all 36 hosts in the
 cluster over a period of seconds, as Hector tries to find a working
 host to connect to. Failing to find a host results in the All host
 pools marked down messages. These messages recur for a period ranging
 from several seconds up to almost 15 minutes, clustering around two to
 three minutes. Then connectivity returns and when Hector tries to
 reconnect it succeeds.

 The clients are instances of a JBoss 5 web application. We use Hector
 0.7.0-29 (plus a patch that was pulled in advance of -30) The
 Cassandra cluster has 72 nodes split between two datacenters. It's
 running 0.7.5 plus a couple of bug fixes pulled in advance of 0.7.6.
 The keyspace uses NetworkTopologyStrategy and RF=6 (3 in each
 datacenter). The clients are reading and writing at LOCAL_QUORUM to
 the 36 nodes in their own data center. Right now the second datacenter
 is for failover only, so there are no clients actually writing there.

 There's 

Re: Cassandra client loses connectivity to cluster

2011-09-06 Thread Jonathan Ellis
Thanks for the followup, Jim!

We'll review https://issues.apache.org/jira/browse/CASSANDRA-3142 shortly.

On Tue, Sep 6, 2011 at 2:58 PM, Jim Ancona j...@anconafamily.com wrote:
 Since we finally fixed this issue, I thought I'd document the
 solution, with the hope that it makes it easier for others who might
 run into it.

 During the time this issue was occurring Anthony Ikeda reported a very
 similar issue, although without the strange pattern of occurrences we
 saw: 
 http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Trying-to-find-the-problem-with-a-broken-pipe-td6645526.html

 It turns out that our problem was the same as Anthony's: exceeding
 Thrift's maximum frame size, as set by
 thrift_framed_transport_size_in_mb in cassandra.yaml. This problem was
 extremely hard to troubleshoot, for the following reasons:

 * TFramedTransport responds to an oversized frame by throwing a
 TTransportException, which is a generic exception thrown for various
 types of network or protocol errors. Because such errors are common,
 many servers (TSimpleServer, TThreadPoolServer, and Cassandra's
 CustomTThreadPoolServer) swallow TTransportException without a log
 message. I've filed https://issues.apache.org/jira/browse/THRIFT-1323
 and https://issues.apache.org/jira/browse/CASSANDRA-3142 to address
 the lack of logging. (We finally found the issue by adding logging
 code to our production Cassandra deploy. The fact that we could do
 that it a big win for open source.)
 * After the TTransportException occurs, the server closes the
 underlying socket. To the client (Hector in our case), this appears as
 broken socket, most likely caused by a network problem or failed
 server node. Hector responds by marking the server node down and
 retrying the too-large request on another node, where it also fails.
 This process repeated leads to the entire cluster being marked down
 (see https://github.com/rantav/hector/issues/212).
 * Ideally, sending an oversized frame should trigger a recognizable
 error on the client, so that the client knows that it has made a error
 and avoids compounding the mistake by retrying. Thrift's framed
 transport is pretty simple and I assume there isn't a good way for the
 server to communicate the error to the client. As a second-best
 solution, I've logged a bug against Thrift
 (https://issues.apache.org/jira/browse/THRIFT-1324) saying that
 TFramedTransport should enforce the configured frame size limit on
 writes. At least that way people can avoid the issue by configuring a
 client frame size to match their servers'. If that is implemented then
 clients like Hector will be able to detect the frame too large case
 and return an error instead of retrying it.

 In addition to the issues above, some other things made this issue
 difficult to solve:
 * The pattern of occurrences (only occurring at a certain time of day,
 on a single server at a time, only on weekdays, etc.) was something of
 a distraction.
 * Even after finding out that Anthony's similar problem was caused by
 an oversized frame, I was convinced that we could not be generating an
 operation large enough to exceed the configured value (15 mb).

 It turns out that I was almost right: out of our hundreds of thousands
 of customers, exactly one was working with data that large, and that
 customer was doing so not because of anomalous behavior on their part,
 but because of a bug in our system. So the fact that it was a single
 customer explained the regular occurrences, and the bug explained the
 unexpectedly large data size. Of course in this case almost right
 wasn't good enough, my BOTE calculation failed to take the bug into
 account. Plus, as I tweeted immediately after I figured out what was
 going on, Lesson: when you have millions of users it becomes easier
 to say things about averages, but harder to do the same for extremes.

 Jim

 On Wed, Jun 29, 2011 at 5:42 PM, Jim Ancona j...@anconafamily.com wrote:
 In reviewing client logs as part of our Cassandra testing, I noticed
 several Hector All host pools marked down exceptions in the logs.
 Further investigation showed a consistent pattern of
 java.net.SocketException: Broken pipe and java.net.SocketException:
 Connection reset messages. These errors occur for all 36 hosts in the
 cluster over a period of seconds, as Hector tries to find a working
 host to connect to. Failing to find a host results in the All host
 pools marked down messages. These messages recur for a period ranging
 from several seconds up to almost 15 minutes, clustering around two to
 three minutes. Then connectivity returns and when Hector tries to
 reconnect it succeeds.

 The clients are instances of a JBoss 5 web application. We use Hector
 0.7.0-29 (plus a patch that was pulled in advance of -30) The
 Cassandra cluster has 72 nodes split between two datacenters. It's
 running 0.7.5 plus a couple of bug fixes pulled in advance of 0.7.6.
 The keyspace uses NetworkTopologyStrategy and RF=6 
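A practical takeaway from the oversized-frame incident above is to bound batch size on the client before Thrift rejects the frame. The sketch below is illustrative only (`BatchSplitter` is not a Hector or pycassa API); it assumes the mutations have already been serialized to byte arrays, and uses a 15 MB limit matching the `thrift_framed_transport_size_in_mb` value discussed in the thread:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchSplitter {
    // Assumed server-side Thrift frame limit (thrift_framed_transport_size_in_mb: 15).
    static final int FRAME_LIMIT = 15 * 1024 * 1024;

    // Split serialized mutations into chunks whose total size stays under maxBytes.
    // A single mutation larger than maxBytes still gets its own (oversized) chunk,
    // so callers should also sanity-check individual mutation sizes.
    static List<List<byte[]>> split(List<byte[]> mutations, int maxBytes) {
        List<List<byte[]>> chunks = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        int size = 0;
        for (byte[] m : mutations) {
            if (!current.isEmpty() && size + m.length > maxBytes) {
                chunks.add(current);
                current = new ArrayList<>();
                size = 0;
            }
            current.add(m);
            size += m.length;
        }
        if (!current.isEmpty()) chunks.add(current);
        return chunks;
    }

    public static void main(String[] args) {
        List<byte[]> mutations = new ArrayList<>();
        for (int i = 0; i < 10; i++) mutations.add(new byte[4]);
        // Ten 4-byte mutations with a 10-byte limit pack two per chunk.
        System.out.println(split(mutations, 10).size() + " chunks"); // prints "5 chunks"
    }
}
```

Sending each chunk as its own batch call keeps every frame under the limit, at the cost of losing atomicity across chunks; this also matches the advice elsewhere in the thread that smaller batches (around 1k rows) behave better than very large ones.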

Re: UnavailableException while storing with EACH_QUORUM and RF=3

2011-09-06 Thread Anthony Ikeda
Jonathan, do you know when 0.8.5 will be released? We are looking at a
production deployment soon and this fix is something that we would need.

Alternatively, what is the stability of the trunk for a production
deployment.

Anthony

On Mon, Sep 5, 2011 at 3:35 PM, Evgeniy Ryabitskiy 
evgeniy.ryabits...@wikimart.ru wrote:

 great thanks!

 Evgeny.



Re: UnavailableException while storing with EACH_QUORUM and RF=3

2011-09-06 Thread Jonathan Ellis
0.8.5 is being voted on now on the dev list.  I'd encourage you to test it.

I do not recommend running trunk.

On Tue, Sep 6, 2011 at 5:32 PM, Anthony Ikeda
anthony.ikeda@gmail.com wrote:
 Jonathan, do you know when 0.8.5 will be released? We are looking at a
 production deployment soon and this fix is something that we would need.
 Alternatively, what is the stability of the trunk for a production
 deployment.
 Anthony
 On Mon, Sep 5, 2011 at 3:35 PM, Evgeniy Ryabitskiy
 evgeniy.ryabits...@wikimart.ru wrote:

 great thanks!

 Evgeny.





-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Re: UnavailableException while storing with EACH_QUORUM and RF=3

2011-09-06 Thread Anthony Ikeda
Thanks Jonathan, I'll consult with the team.

Anthony

On Tue, Sep 6, 2011 at 3:34 PM, Jonathan Ellis jbel...@gmail.com wrote:

 0.8.5 is being voted on now on the dev list.  I'd encourage you to test it.

 I do not recommend running trunk.

 On Tue, Sep 6, 2011 at 5:32 PM, Anthony Ikeda
 anthony.ikeda@gmail.com wrote:
  Jonathan, do you know when 0.8.5 will be released? We are looking at a
  production deployment soon and this fix is something that we would need.
  Alternatively, what is the stability of the trunk for a production
  deployment.
  Anthony
  On Mon, Sep 5, 2011 at 3:35 PM, Evgeniy Ryabitskiy
  evgeniy.ryabits...@wikimart.ru wrote:
 
  great thanks!
 
  Evgeny.
 
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



Re: UnavailableException while storing with EACH_QUORUM and RF=3

2011-09-06 Thread Anthony Ikeda
Do you have a link to the downloadable?

Anthony


On Tue, Sep 6, 2011 at 3:38 PM, Anthony Ikeda
anthony.ikeda@gmail.comwrote:

 Thanks Jonathan, I'll consult with the team.

 Anthony


 On Tue, Sep 6, 2011 at 3:34 PM, Jonathan Ellis jbel...@gmail.com wrote:

 0.8.5 is being voted on now on the dev list.  I'd encourage you to test
 it.

 I do not recommend running trunk.

 On Tue, Sep 6, 2011 at 5:32 PM, Anthony Ikeda
 anthony.ikeda@gmail.com wrote:
  Jonathan, do you know when 0.8.5 will be released? We are looking at a
  production deployment soon and this fix is something that we would need.
  Alternatively, what is the stability of the trunk for a production
  deployment.
  Anthony
  On Mon, Sep 5, 2011 at 3:35 PM, Evgeniy Ryabitskiy
  evgeniy.ryabits...@wikimart.ru wrote:
 
  great thanks!
 
  Evgeny.
 
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com





Re: UnavailableException while storing with EACH_QUORUM and RF=3

2011-09-06 Thread Jonathan Ellis
It's linked from the vote thread:
http://mail-archives.apache.org/mod_mbox/cassandra-dev/201109.mbox/%3ccakkz8q12k2o7zm5uy9hxnk7kyesqidwcyxbq_uzfna+yaty...@mail.gmail.com%3E

On Tue, Sep 6, 2011 at 5:41 PM, Anthony Ikeda
anthony.ikeda@gmail.com wrote:
 Do you have a link to the downloadable?
 Anthony

 On Tue, Sep 6, 2011 at 3:38 PM, Anthony Ikeda anthony.ikeda@gmail.com
 wrote:

 Thanks Jonathan, I'll consult with the team.
 Anthony

 On Tue, Sep 6, 2011 at 3:34 PM, Jonathan Ellis jbel...@gmail.com wrote:

 0.8.5 is being voted on now on the dev list.  I'd encourage you to test
 it.

 I do not recommend running trunk.

 On Tue, Sep 6, 2011 at 5:32 PM, Anthony Ikeda
 anthony.ikeda@gmail.com wrote:
  Jonathan, do you know when 0.8.5 will be released? We are looking at a
  production deployment soon and this fix is something that we would
  need.
  Alternatively, what is the stability of the trunk for a production
  deployment.
  Anthony
  On Mon, Sep 5, 2011 at 3:35 PM, Evgeniy Ryabitskiy
  evgeniy.ryabits...@wikimart.ru wrote:
 
  great thanks!
 
  Evgeny.
 
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com






-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com


Solved: NoSuchMethodError with google guava/collections starting embedded cassandra service

2011-09-06 Thread David Hawthorne
I ran into this problem today.  It's common enough that it shows up in Google, 
but not common enough to have a documented resolution, so here's one.

[junit] 
com.google.common.collect.ImmutableSet.copyOf(Ljava/util/Collection;)Lcom/google/common/collect/ImmutableSet;
[junit] java.lang.NoSuchMethodError: 
com.google.common.collect.ImmutableSet.copyOf(Ljava/util/Collection;)Lcom/google/common/collect/ImmutableSet;
[junit] at 
org.apache.cassandra.db.DataTracker$View.replace(DataTracker.java:479)
[junit] at 
org.apache.cassandra.db.DataTracker.replace(DataTracker.java:248)
[junit] at 
org.apache.cassandra.db.DataTracker.addSSTables(DataTracker.java:219)
[junit] at 
org.apache.cassandra.db.ColumnFamilyStore.init(ColumnFamilyStore.java:294)
[junit] at 
org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:466)
[junit] at 
org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:436)
[junit] at org.apache.cassandra.db.Table.initCf(Table.java:369)
[junit] at org.apache.cassandra.db.Table.init(Table.java:306)
[junit] at org.apache.cassandra.db.Table.open(Table.java:111)
[junit] at 
org.apache.cassandra.db.SystemTable.checkHealth(SystemTable.java:212)
[junit] at 
org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:149)
[junit] at 
org.apache.cassandra.service.AbstractCassandraDaemon.init(AbstractCassandraDaemon.java:237)
[junit] at 
org.apache.cassandra.service.EmbeddedCassandraService.start(EmbeddedCassandraService.java:57)

Resolution:

I was creating a monolithic jar file that takes the contents of all component 
jar files and puts them into one big jar, to simplify deployment of the 
cassandra client I'm writing.  One of the component jars was also a monolithic 
jar and had the contents of google-collections in it. Guava is a superset of 
Google Collections, and the google-collections classes were overwriting the 
expansion of the guava jar.

Removing the component jar file (which was an option in this case) solved the 
problem.
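A quick way to confirm which jar (or directory) actually supplied a conflicting class is to ask its ProtectionDomain for the code source. This is a generic JDK facility, not anything Cassandra- or guava-specific; the class name `JarLocator` is made up for this sketch:

```java
import java.security.CodeSource;

public class JarLocator {
    // Returns the jar/directory a class was loaded from, or a placeholder
    // for classes supplied by the bootstrap class loader.
    public static String locationOf(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return "(bootstrap classpath)";
        }
        return src.getLocation().toString();
    }

    public static void main(String[] args) {
        // In the broken deployment above you would pass
        // com.google.common.collect.ImmutableSet.class to see which merged
        // jar's copy won; this example uses JarLocator itself so it runs
        // without guava on the classpath.
        System.out.println(locationOf(JarLocator.class));
    }
}
```

Running it against com.google.common.collect.ImmutableSet in the broken deployment would have pointed straight at the stale google-collections copy inside the monolithic jar.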




Re: UnavailableException while storing with EACH_QUORUM and RF=3

2011-09-06 Thread Anthony Ikeda
Thanks Jonathan.

On Tue, Sep 6, 2011 at 3:53 PM, Jonathan Ellis jbel...@gmail.com wrote:

 It's linked from the vote thread:

 http://mail-archives.apache.org/mod_mbox/cassandra-dev/201109.mbox/%3ccakkz8q12k2o7zm5uy9hxnk7kyesqidwcyxbq_uzfna+yaty...@mail.gmail.com%3E

 On Tue, Sep 6, 2011 at 5:41 PM, Anthony Ikeda
 anthony.ikeda@gmail.com wrote:
  Do you have a link to the downloadable?
  Anthony
 
  On Tue, Sep 6, 2011 at 3:38 PM, Anthony Ikeda 
 anthony.ikeda@gmail.com
  wrote:
 
  Thanks Jonathan, I'll consult with the team.
  Anthony
 
  On Tue, Sep 6, 2011 at 3:34 PM, Jonathan Ellis jbel...@gmail.com
 wrote:
 
  0.8.5 is being voted on now on the dev list.  I'd encourage you to test
  it.
 
  I do not recommend running trunk.
 
  On Tue, Sep 6, 2011 at 5:32 PM, Anthony Ikeda
  anthony.ikeda@gmail.com wrote:
   Jonathan, do you know when 0.8.5 will be released? We are looking at
 a
   production deployment soon and this fix is something that we would
   need.
   Alternatively, what is the stability of the trunk for a production
   deployment.
   Anthony
   On Mon, Sep 5, 2011 at 3:35 PM, Evgeniy Ryabitskiy
   evgeniy.ryabits...@wikimart.ru wrote:
  
   great thanks!
  
   Evgeny.
  
  
 
 
 
  --
  Jonathan Ellis
  Project Chair, Apache Cassandra
  co-founder of DataStax, the source for professional Cassandra support
  http://www.datastax.com
 
 
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of DataStax, the source for professional Cassandra support
 http://www.datastax.com



create ks failure with latest github source

2011-09-06 Thread Yang
I pulled the latest code from
github 3e77792d31344be0253c89355c1d96ffe03c0659

and used my old commands to create a regular KS, and it failed.
quick debugging shows that the client hits an NPE somewhere.

am I missing some new mandatory args?

Thanks
Yang


[default@unknown] connect localhost/9160;
Connected to: Test Cluster on localhost/9160
[default@unknown] create keyspace blah  with strategy_options =
[{replication_factor:1}] and placement_strategy =
'org.apache.cassandra.locator.SimpleStrategy';
null
[default@unknown]


Re: create ks failure with latest github source

2011-09-06 Thread Yang
OK, it seems that the '[ ]' should not be there now


On Tue, Sep 6, 2011 at 4:56 PM, Yang tedd...@gmail.com wrote:

 I pulled the latest code from
 github 3e77792d31344be0253c89355c1d96ffe03c0659

 and used my old commands to create a regular KS, and it failed.
 quick debugging shows that the client hits an NPE somewhere.

 am I missing some new mandatory args ?

 Thanks
 Yang


 [default@unknown] connect localhost/9160;
 Connected to: Test Cluster on localhost/9160
 [default@unknown] create keyspace blah  with strategy_options =
 [{replication_factor:1}] and placement_strategy =
 'org.apache.cassandra.locator.SimpleStrategy';
 null
 [default@unknown]




Cassandra 0.8.4 - doesn't support defining keyspaces in cassandra.yaml?

2011-09-06 Thread Roshan Dawrani
Hi,

I have just started the process of upgrading Cassandra from 0.7.2 to 0.8.4,
and I am facing some issues with embedded cassandra that we utilize in our
application.

With 0.7.2, we define our keyspace in cassandra.yaml and use Hector to give
us an embedded cassandra instance loaded with schema from cassandra.yaml. Is
it not possible to do the same with Cassandra / Hector 0.8.x?

Can someone shed some light on this, please?

Thanks.

-- 
Roshan
Blog: http://roshandawrani.wordpress.com/
Twitter: @roshandawrani http://twitter.com/roshandawrani
Skype: roshandawrani