Re: Questions while evaluating Cassandra

2010-03-02 Thread Jonathan Ellis
On Tue, Mar 2, 2010 at 6:43 AM, Eran Kutner e...@gigya.com wrote:
 Is the procedure described in the description of ticket CASSANDRA-44 really
 the way to do schema changes in the latest release? I'm not sure what your
 thoughts are on this, but our experience is that every release of our software
 requires schema changes because we add new column families for indexes.

Yes, that is how it is for 0.5 and 0.6.  0.7 will add online schema
changes (i.e., fix -44), Gary is working on that now.

 Any idea on the timeframe for 0.7?

We are trying for 3-4 months, i.e. roughly the same as our last 4 releases.

 Our application needs a lot of range scans. Is there anything being done to
 improve the poor range scan performance as reflected here:
 http://www.brianfrankcooper.net/pubs/ycsb-v4.pdf ?

https://issues.apache.org/jira/browse/CASSANDRA-821 is open, also for
the 0.7 release.  Johan is working on this.

 What is the reason for the replication strategy with two DCs? As far as I
 understand it means that only one replica will exist in the second DC. It
 also means that quorum reads will fail when attempted on the second DC while
 the first DC is down. Am I missing something?

Yes:
 - That strategy is meant for doing reads w/ CL.ONE; it guarantees at
least one replica in each DC, for low latency with that CL
 -  Quorum is based on the whole cluster, not per-DC.
DatacenterShardStrategy will put multiple replicas in each DC, for use
with CL.DCQUORUM, that is, a majority of replicas in the same DC as
the coordinator node for the current request.  DCQUORUM is not yet
finished, though; currently it behaves the same as CL.ALL.
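
A minimal Java sketch of the quorum arithmetic above (not Cassandra's code). It assumes the usual majority rule, RF/2 + 1 with integer division, and a hypothetical layout of RF=3 with two replicas in DC1 and one in DC2; the class and method names are made up. DCQUORUM is modeled as described, even though 0.6 currently treats it like CL.ALL.

```java
import java.util.Map;

// Illustrative only: consistency-level arithmetic, not Cassandra's implementation.
final class QuorumMath {
    // Cluster-wide QUORUM: a majority of all replicas, regardless of datacenter.
    static int quorum(int totalReplicas) {
        return totalReplicas / 2 + 1;
    }

    // DCQUORUM as described above: a majority of the replicas in the coordinator's DC.
    static int dcQuorum(int replicasInLocalDc) {
        return replicasInLocalDc / 2 + 1;
    }

    // Can a cluster-wide QUORUM be met using only replicas in datacenters that are up?
    static boolean quorumReachable(Map<String, Integer> replicasPerDc, Map<String, Boolean> dcUp) {
        int total = 0;
        int live = 0;
        for (Map.Entry<String, Integer> e : replicasPerDc.entrySet()) {
            total += e.getValue();
            if (dcUp.getOrDefault(e.getKey(), false)) {
                live += e.getValue();
            }
        }
        return live >= quorum(total);
    }

    public static void main(String[] args) {
        // Hypothetical layout: RF=3, two replicas in DC1, one in DC2.
        Map<String, Integer> layout = Map.of("DC1", 2, "DC2", 1);
        System.out.println(quorumReachable(layout, Map.of("DC1", false, "DC2", true))); // false
        System.out.println(dcQuorum(1)); // 1
    }
}
```

With DC1 down only one of the three replicas can answer, short of the cluster-wide quorum of 2, which is why this strategy is meant for CL.ONE reads.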

 Are there any plans to have an inter-cluster replication option? I mean
 having two clusters running in two DCs, each standalone, but replicating
 data between themselves.

No.  This is worse in every respect, since it means you get to
reinvent the existing repair, hinted handoff, etc code for when
replication breaks, poorly.

 This can avoid the problem mentioned
 above, as well as avoid the high cost of inter-DC traffic when doing
 Read-Repairs for every read.

Of course if you don't RR then you can read inconsistent data until
your next full repair.   Not a good trade.  Remember RR is done in the
background so the latency doesn't matter.
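
Read repair boils down to comparing the replicas' answers and writing the newest version back to any replica that is behind. A rough sketch, assuming (as Cassandra does) that the highest timestamp wins; the names are hypothetical, and the background scheduling is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Illustrative read-repair comparison; running it in the background is not modeled.
final class ReadRepairSketch {
    // One replica's copy of a column: value plus write timestamp.
    record Version(String value, long timestamp) {}

    // The version with the highest timestamp wins; null if there were no responses.
    static Version resolve(Map<String, Version> responses) {
        Version newest = null;
        for (Version v : responses.values()) {
            if (newest == null || v.timestamp() > newest.timestamp()) {
                newest = v;
            }
        }
        return newest;
    }

    // Replicas holding an older copy, i.e. the ones that would receive a repair write.
    static List<String> staleReplicas(Map<String, Version> responses) {
        List<String> stale = new ArrayList<>();
        Version newest = resolve(responses);
        if (newest == null) {
            return stale;
        }
        for (Map.Entry<String, Version> e : responses.entrySet()) {
            if (e.getValue().timestamp() < newest.timestamp()) {
                stale.add(e.getKey());
            }
        }
        Collections.sort(stale); // deterministic order
        return stale;
    }
}
```

Skipping this step saves the cross-DC traffic, but a stale replica then keeps serving its old value until the next full repair, which is the trade-off being warned about.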

 From everything I've read I didn't understand if load balancing is local or
 global. In other words, what happens exactly when a new node is added? Will
 it only balance its two neighbors on the ring or will the re-balance
 propagate through the ring and all the nodes will be rebalanced evenly?

The former.  Cascading data moves around the ring is a Bad Idea.
(Since you read the Yahoo hbase/cassandra paper -- if hbase does this,
maybe that is why adding a new node basically kills their cluster for
several minutes?)
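
The local rebalancing can be sketched with a token ring modeled as a sorted map from token to node, assuming each node owns the range (previous token, own token] and a toy token space of 0..99. The new node's token is approximated as the midpoint of the range it takes over, loosely mirroring the "assume load from" line in Brian's bootstrap log below; all names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative token ring: token -> node name. Not Cassandra's token/partitioner code.
final class RingSketch {
    // Size of the range (previous token, token] owned by the node at `token`, with wraparound.
    static long rangeSize(TreeMap<Long, String> ring, long token, long ringSize) {
        Long prev = ring.lowerKey(token);
        long start = (prev != null) ? prev : ring.lastKey() - ringSize; // first node wraps
        return token - start;
    }

    // Node name -> amount of token space it owns.
    static Map<String, Long> ownership(TreeMap<Long, String> ring, long ringSize) {
        Map<String, Long> out = new TreeMap<>();
        for (Map.Entry<Long, String> e : ring.entrySet()) {
            out.put(e.getValue(), rangeSize(ring, e.getKey(), ringSize));
        }
        return out;
    }

    // Token halfway through the range currently owned by the node at `token`.
    static long midpoint(TreeMap<Long, String> ring, long token, long ringSize) {
        long size = rangeSize(ring, token, ringSize);
        return Math.floorMod(token - size / 2, ringSize);
    }

    public static void main(String[] args) {
        long ringSize = 100; // toy token space 0..99
        TreeMap<Long, String> ring = new TreeMap<>(Map.of(0L, "A", 30L, "B", 60L, "C"));
        System.out.println(ownership(ring, ringSize)); // {A=40, B=30, C=30}
        ring.put(midpoint(ring, 0L, ringSize), "D");   // D joins inside A's range
        System.out.println(ownership(ring, ringSize)); // {A=20, B=30, C=30, D=20}
    }
}
```

Only A's range shrinks when D joins; B and C keep exactly what they had, so nothing cascades around the ring.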

 I see that Hadoop support is coming in 0.6 but from following the ticket on
 Jira (CASSANDRA-342) I didn't understand if it will support the
 orderPreservingPartitioner or not.

It supports all partitioners.

 Do the clients have to be recompiled and deployed when a new version of
 Cassandra is deployed, or are new releases backward compatible?

The short answer is, we maintained backwards compatibility for 0.4 -
0.5 - 0.6, but we are going to break things in 0.7 moving from String
keys to byte[] and possibly other changes.
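
For client code, the move from String keys to byte[] will mostly mean encoding keys before sending them. A minimal sketch, assuming UTF-8 is the encoding you want for existing String keys; the helper names are made up, and the 0.7 client API was not final at the time of this thread.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helpers for migrating String keys to byte[] keys (UTF-8 assumed).
final class KeyBytes {
    // String key -> bytes to send as the row key.
    static byte[] encode(String key) {
        return key.getBytes(StandardCharsets.UTF_8);
    }

    // Row key bytes -> the original String key.
    static String decode(byte[] key) {
        return new String(key, StandardCharsets.UTF_8);
    }
}
```

Picking one encoding and using it everywhere matters more than which one it is, since the same String encoded two different ways would address two different rows.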

-Jonathan


Re: Adjusting Token Spaces and Rebalancing Data

2010-03-02 Thread Jon Graham
Hello,

I am running 32-bit Linux (kernel 2.6.27.24). My original data set was
copied from a 64-bit Cassandra cluster to a 32-bit Cassandra cluster, and I am
trying to load balance the data on the 32-bit cluster.

Does the CASSANDRA-795 issue apply to 32-bit Linux too, for the 0.5.0
release?

Thanks,
Jon
On Mon, Mar 1, 2010 at 4:55 PM, Jonathan Ellis jbel...@gmail.com wrote:

 On Mon, Mar 1, 2010 at 5:39 PM, Jon Graham sjclou...@gmail.com wrote:
  Reached an EOL or something bizzare occured. Reading from: /192.168.2.13
  BufferSizeRemaining: 16

 This one is harmless

  java.io.IOException: Value too large for defined data type
  at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
  at sun.nio.ch.FileChannelImpl.transferToDirectly(Unknown Source)
  at sun.nio.ch.FileChannelImpl.transferTo(Unknown Source)
  at org.apache.cassandra.net.TcpConnection.stream(TcpConnection.java:226)
  at org.apache.cassandra.net.FileStreamTask.run(FileStreamTask.java:55)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
  at java.lang.Thread.run(Unknown Source)

 This one is killing you.

 Are you on Windows?  If so
 https://issues.apache.org/jira/browse/CASSANDRA-795 should fix it.
 That's in both 0.5.1 and 0.6 beta.

 -Jonathan



Re: Adjusting Token Spaces and Rebalancing Data

2010-03-02 Thread Jonathan Ellis
After some googling: this is a different JRE bug than the one addressed
by 795: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6253145.
It is marked fixed in JDK 6u18, so try upgrading to that.
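
To check what a node is actually running, one option is to parse the java.version system property. A sketch with hypothetical names; it only understands the legacy "1.6.0_NN" format and treats any 1.6 JRE before update 18 as needing the upgrade.

```java
// Illustrative JRE version check for the 6u18 fix mentioned above.
final class JreCheck {
    // Update number from a legacy version string such as "1.6.0_13" or "1.6.0_13-b03"; -1 if absent.
    static int updateNumber(String version) {
        int underscore = version.indexOf('_');
        if (underscore < 0) {
            return -1;
        }
        int end = underscore + 1;
        while (end < version.length() && Character.isDigit(version.charAt(end))) {
            end++;
        }
        if (end == underscore + 1) {
            return -1;
        }
        return Integer.parseInt(version.substring(underscore + 1, end));
    }

    // True for a 1.6 JRE older than update 18.
    static boolean needsUpgrade(String version) {
        return version.startsWith("1.6.0") && updateNumber(version) < 18;
    }

    public static void main(String[] args) {
        String v = System.getProperty("java.version");
        System.out.println(v + (needsUpgrade(v) ? ": older than 6u18, consider upgrading" : ": OK"));
    }
}
```

For the 1.6.0_13 JRE reported in this thread, updateNumber returns 13, so needsUpgrade is true.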

-Jonathan




Re: Adjusting Token Spaces and Rebalancing Data

2010-03-02 Thread Jon Graham
Thanks Jonathan,

My 32-bit Java version is 1.6.0_13-b03, so I'll try a Java upgrade.
This is consistent with the -tmp- Data file size being exactly MaxInt.

Jon




Connect during bootstrapping?

2010-03-02 Thread Brian Frank Cooper
Hi folks,

I'm running 0.5 and I had 2 nodes up and running, then added a 3rd node in 
bootstrap mode. I understand from other discussion list threads that the new 
node doesn't serve reads while it is bootstrapping, but does that mean it won't 
connect at all? When I try to connect from my Java client or cassandra-cli, I
get the exception below. Is this the expected behavior? (Also, cassandra-cli says
“Connected to xxx.yahoo.com” even though it isn't really connected...)

Thanks!

brian

Exception java.net.ConnectException: Connection refused
org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
at org.apache.cassandra.cli.CliMain.connect(CliMain.java:65)
at org.apache.cassandra.cli.CliClient.executeConnect(CliClient.java:464)
at org.apache.cassandra.cli.CliClient.executeCLIStmt(CliClient.java:87)
at org.apache.cassandra.cli.CliMain.processCLIStmt(CliMain.java:131)
at org.apache.cassandra.cli.CliMain.main(CliMain.java:172)
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:525)
at java.net.Socket.connect(Socket.java:475)
at org.apache.thrift.transport.TSocket.open(TSocket.java:180)
... 5 more


--
Brian Cooper
Principal Research Scientist
Yahoo! Research



Re: Connect during bootstrapping?

2010-03-02 Thread Jonathan Ellis
On Tue, Mar 2, 2010 at 1:54 PM, Brian Frank Cooper
coop...@yahoo-inc.com wrote:
 Hi folks,

 I’m running 0.5 and I had 2 nodes up and running, then added a 3rd node in
 bootstrap mode. I understand from other discussion list threads that the new
 node doesn’t serve reads while it is bootstrapping, but does that mean it
 won’t connect at all?

it doesn't start the thrift listener until it is bootstrapped, so yes.

(you can tell when it's bootstrapped by when it appears in nodeprobe
ring.  0.6 also adds bootstrap progress reporting via jmx.)
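
Because the Thrift listener only comes up after bootstrap, a client pointed at a bootstrapping node gets "Connection refused". A minimal retry sketch using plain java.net sockets; port 9160 (assumed to be the configured Thrift port) and the retry parameters are assumptions, and in practice a client would more likely fail over to another node.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Illustrative retry loop: wait until a node accepts TCP connections on a port.
final class WaitForThrift {
    // Returns true as soon as a connection succeeds, false after maxAttempts failures.
    static boolean waitForPort(String host, int port, int maxAttempts, long delayMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), 1000); // 1s connect timeout
                return true;
            } catch (IOException e) {
                // "Connection refused" is expected while the node is still bootstrapping.
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical values: 9160 is assumed to be the node's Thrift port.
        String host = args.length > 0 ? args[0] : "localhost";
        boolean up = waitForPort(host, 9160, 30, 10_000);
        System.out.println(up ? "Thrift port is accepting connections" : "gave up waiting");
    }
}
```

A successful TCP connect only shows the listener is up; it is not a Thrift-level health check.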

 When I try to connect from my java client, or
 cassandra-cli, I get the exception below. Is it the expected behavior?
 (Also, cassandra-cli says “Connected to xxx.yahoo.com” even though it isn’t
 really connected...)

This is fixed in https://issues.apache.org/jira/browse/CASSANDRA-807
for 0.6, fwiw.

-Jonathan


Index values: data or pointers?

2010-03-02 Thread Jeremey.Barrett
I'm exploring data layouts and it seems like the common practice is to store an 
index in one CF (e.g. userid for row key and thingid for column name) and then 
to fetch all the things by their thingids separately... so get index, and then 
get each key in the index.

If a thing changes relatively infrequently but gets read often, seems like it 
would be more performant (especially with writes being very fast) to just stuff 
whole objects into indexes rather than simply ids. A whole object could be a 
JSON object or a serialized class or who knows what.

Are there drawbacks to that approach, other than space?

Thanks,
Jeremey.





Re: Index values: data or pointers?

2010-03-02 Thread Jonathan Ellis
On Tue, Mar 2, 2010 at 4:13 PM,  jeremey.barr...@nokia.com wrote:
 I'm exploring data layouts and it seems like the common practice is to store 
 an index in one CF (e.g. userid for row key and thingid for column name) and 
 then to fetch all the things by their thingids separately... so get index, 
 and then get each key in the index.

 If a thing changes relatively infrequently but gets read often, seems like it 
 would be more performant (especially with writes being very fast) to just 
 stuff whole objects into indexes rather than simply ids. A whole object 
 could be a JSON object or a serialized class or who knows what.

Yes.  This is one place supercolumns can be very useful, since it
allows doing this w/o nasty hacks like you mention. :)
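
The trade-off can be sketched with in-memory maps standing in for column families. This is a hypothetical model, not the Thrift API: the pointer layout needs one read for the index row plus one per thing, while the supercolumn layout (row = userid, supercolumn = thingid, subcolumns = the thing's fields) returns everything in one read. The price is that every denormalized copy must be rewritten when a thing changes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-ins for column families (row key -> sorted columns).
final class IndexLayouts {
    int reads = 0; // simulated round trips to the cluster

    // Pointer layout: index row = userid, column names = thingids; things stored separately.
    Map<String, Map<String, String>> viaPointers(Map<String, SortedMap<String, String>> index,
                                                 Map<String, SortedMap<String, String>> things,
                                                 String userId) {
        reads++; // read the index row
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        for (String thingId : index.getOrDefault(userId, new TreeMap<>()).keySet()) {
            reads++; // one extra read per thing
            result.put(thingId, things.get(thingId));
        }
        return result;
    }

    // Supercolumn layout: row = userid, supercolumn = thingid, subcolumns = the thing's fields.
    Map<String, Map<String, String>> viaSupercolumns(
            Map<String, SortedMap<String, SortedMap<String, String>>> superIndex, String userId) {
        reads++; // one read returns every supercolumn with its subcolumns
        return new LinkedHashMap<>(superIndex.getOrDefault(userId, new TreeMap<>()));
    }
}
```

For a user with two things, the pointer layout costs three reads and the supercolumn layout one, for the same data.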

-Jonathan


Re: Index values: data or pointers?

2010-03-02 Thread Jeremey.Barrett
On Mar 2, 2010, at 4:17 PM, ext Jonathan Ellis wrote:

 On Tue, Mar 2, 2010 at 4:13 PM,  jeremey.barr...@nokia.com wrote:
 I'm exploring data layouts and it seems like the common practice is to store 
 an index in one CF (e.g. userid for row key and thingid for column name) and 
 then to fetch all the things by their thingids separately... so get index, 
 and then get each key in the index.
 
 If a thing changes relatively infrequently but gets read often, seems like 
 it would be more performant (especially with writes being very fast) to just 
 stuff whole objects into indexes rather than simply ids. A whole object 
 could be a JSON object or a serialized class or who knows what.
 
 Yes.  This is one place supercolumns can be very useful, since it
 allows doing this w/o nasty hacks like you mention. :)

Good point. :)

I got it in my head that supercolumns aren't indexed (from the ticket of that 
name http://issues.apache.org/jira/browse/CASSANDRA-598), but actually it's the 
subcolumns that aren't indexed, correct? (the former never made any sense to me)

Thanks again,
Jeremey.



Looking for work

2010-03-02 Thread Peter Halliday
I'm looking for work.  My previous employer was a non-profit that lost
funding and my position was cut.  I would love to find a position that
utilizes Cassandra.  I have experience in programming using Python, Perl,
PHP, and C/C++ (mostly Python and Perl).  I have experience with system
and network administration as well.  I would be happy to send a
resume describing my experience in more detail.


Peter Halliday
Excelsior Systems
(Phone:) 607-936-2172
(Cell:) 607-329-6905
(Fax:) 607-398-7928


Re: Looking for work

2010-03-02 Thread Anthony Molinaro
If you are willing to relocate to Pasadena, OpenX is hiring (feel free to
forward me a resume if interested).  I may have to question Digg's claim
to being first to production.  We were running 0.3.0 in production last August
and currently have 3 deployments spread over 30 machines in EC2, so heavy
users.

-Anthony

On Tue, Mar 02, 2010 at 06:11:16PM -0800, Chris Goffinet wrote:
 Ditto here at Digg as well. We were the first to production using the open
 source version and have a major investment in the project.
 
 -Chris
 
 -- 
 Chris Goffinet

-- 

Anthony Molinaro   antho...@alumni.caltech.edu


Re: Looking for work

2010-03-02 Thread Joe Stump
Us too at SimpleGeo! We're Python, Cassandra, Erlang, and a smattering  
of Java and C++.


We have offices in Boulder, CO and SF.

--Joe

--
Typed with big fingers on a small keyboard.



Re: Looking for work

2010-03-02 Thread Ryan Daum
Maybe the wiki needs a job board?





Re: Looking for work

2010-03-02 Thread Jonathan Ellis
(This is not to say that I think job posts are off-topic here, because
they are not.)

On Tue, Mar 2, 2010 at 10:43 PM, Jonathan Ellis jbel...@gmail.com wrote:
 If there's one thing that's worse than a mailing list as a job board,
 it's a wiki. :)






Re: Connect during bootstrapping?

2010-03-02 Thread Brian Frank Cooper
Thanks for the note.

Can you help me with something else? I can't seem to get any data to transfer 
during bootstrapping...I must be doing something wrong.

Here is what I did: I took 0.6.0-beta2, loaded 2 machines with 60-70GB each. 
Then I started a third node, with AutoBootstrap true. The node claims it is 
bootstrapping:

INFO - Auto DiskAccessMode determined to be mmap
INFO - Saved Token not found. Using Rb0mePN3PheW3haA
INFO - Creating new commitlog segment 
/home/cooperb/cassandra/commitlog/CommitLog-1267594407761.log
INFO - Starting up server gossip
INFO - Joining: getting load information
INFO - Sleeping 9 ms to wait for load information...
INFO - Node /98.137.30.37 is now part of the cluster
INFO - Node /98.137.30.38 is now part of the cluster
INFO - InetAddress /98.137.30.37 is now UP
INFO - InetAddress /98.137.30.38 is now UP
INFO - Joining: getting bootstrap token
INFO - New token will be user148315419 to assume load from /98.137.30.38
INFO - Joining: sleeping 3 for pending range setup
INFO - Bootstrapping

But when I run nodetool streams, no streams are transferring:

Mode: Bootstrapping
Not sending any streams.
Not receiving any streams.

And it doesn't look like the node is getting any data. Any ideas?

Thanks for the help...

Brian


--
Brian Cooper
Principal Research Scientist
Yahoo! Research



What's the ideal size of a column?

2010-03-02 Thread Cool BSD
In short: what's the ideal column size in the real world?

Longer version: I'm working on a prototype. The application is a data
store holding blobs ranging from a couple of KB to hundreds of MB, close
to 1GB in the worst case. The data model is really simple: the key is a string
(UUID-like), the value is the blob, and the only operations are set,
get, and delete.

I picked Cassandra for its high availability and dynamic growth; high
write throughput is also a great advantage, since the read/write ratio is
about 1:100. Another idea is to use a simple key-value store to keep the
UUID-to-location mapping and store the blob data as files on an NFS
server, but managing growth that way is not as straightforward.

If blobs this size are too big to fit into Cassandra, what's the ideal size?
In that case I would try to cut them into slices but still keep
everything in Cassandra. Is that better than the NFS solution?

Thanks,

CB

P.S. The real reason I want to try Cassandra is that I want to play with
something new


Re: Connect during bootstrapping?

2010-03-02 Thread Jonathan Ellis
What are the other nodes doing?  The first step is for them to copy
out locally the data they will send to the new one, that usually takes
a while.  (They will log AntiCompacting ... AntiCompacted when doing
this.)
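
One way to see whether the existing nodes are still in that step is to compare how many "AntiCompacting" and "AntiCompacted" lines appear in their logs. This sketch assumes only that those words appear in the relevant log lines; the default log path in main is a guess, so pass the real one as an argument.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Illustrative log check for the anticompaction phase of bootstrap.
final class AntiCompactionWatch {
    // True if more anticompactions have started than finished in the given log lines.
    // Assumes only that start/end lines contain "AntiCompacting" / "AntiCompacted".
    static boolean inProgress(List<String> logLines) {
        int started = 0;
        int finished = 0;
        for (String line : logLines) {
            if (line.contains("AntiCompacting")) {
                started++;
            } else if (line.contains("AntiCompacted")) {
                finished++;
            }
        }
        return started > finished;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default path; pass the node's actual log file as the first argument.
        Path log = Path.of(args.length > 0 ? args[0] : "/var/log/cassandra/system.log");
        List<String> lines = Files.readAllLines(log);
        System.out.println(inProgress(lines) ? "anticompaction still running" : "no anticompaction pending");
    }
}
```

Run it against the logs of the nodes that will send data, not the bootstrapping node.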




Re: Connect during bootstrapping?

2010-03-02 Thread Stu Hood
You are probably in the portion of bootstrap where data to be transferred is 
split out to disk, which can take a while: see 
https://issues.apache.org/jira/browse/CASSANDRA-579

Look for a 'streaming' subdirectory in your data directories to confirm.
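
A small sketch of that check: walk each data directory a couple of levels deep and report any directory named "streaming". The directory paths and the search depth are assumptions; point it at the data directories configured for your node.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Illustrative check for a "streaming" subdirectory under the data directories.
final class StreamingDirCheck {
    // Returns every directory named "streaming" within maxDepth levels of each data directory.
    static List<Path> find(List<Path> dataDirs, int maxDepth) throws IOException {
        List<Path> found = new ArrayList<>();
        for (Path dir : dataDirs) {
            if (!Files.isDirectory(dir)) {
                continue; // skip missing or misconfigured paths
            }
            try (Stream<Path> paths = Files.walk(dir, maxDepth)) {
                paths.filter(Files::isDirectory)
                     .filter(p -> p.getFileName() != null
                             && p.getFileName().toString().equals("streaming"))
                     .forEach(found::add);
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Pass your configured data directories as arguments; depth 2 is an arbitrary choice.
        List<Path> dirs = new ArrayList<>();
        for (String a : args) {
            dirs.add(Path.of(a));
        }
        List<Path> hits = find(dirs, 2);
        System.out.println(hits.isEmpty() ? "no streaming directory found" : "found: " + hits);
    }
}
```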

-----Original Message-----
From: Brian Frank Cooper coop...@yahoo-inc.com
Sent: Tuesday, March 2, 2010 11:50pm
To: cassandra-user@incubator.apache.org cassandra-user@incubator.apache.org
Subject: Re: Connect during bootstrapping?






Re: What's the ideal size of a column?

2010-03-02 Thread Jonathan Ellis
On Tue, Mar 2, 2010 at 11:57 PM, Cool BSD c...@coolbsd.com wrote:
 In short: what's the ideal column size in the real world?

 Longer version: I'm working on a prototype. The application is a data
 store holding blobs ranging from a couple of KB to hundreds of MB, close
 to 1GB in the worst case.

You should be fine.  Single digits of MB is a good rule of thumb.
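
If you do split the larger blobs, one simple layout is one row per blob, with fixed-size chunks stored as columns whose names are zero-padded indexes so they sort back into order. A minimal sketch: the 4 MB chunk size is one pick within the "single digits of MB" guideline, the names are hypothetical, the ordering assumes a byte- or UTF-8-ordered column comparator, and a real loader would stream the blob rather than hold up to 1GB in memory.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.SortedMap;
import java.util.TreeMap;

// Illustrative chunking of one large blob into several columns of one row.
final class BlobChunker {
    // Hypothetical chunk size, picked within the "single digits of MB" guideline.
    static final int DEFAULT_CHUNK_BYTES = 4 * 1024 * 1024;

    // Column name = zero-padded chunk index, so names sort in chunk order.
    static SortedMap<String, byte[]> split(byte[] blob, int chunkBytes) {
        if (chunkBytes <= 0) {
            throw new IllegalArgumentException("chunkBytes must be positive");
        }
        SortedMap<String, byte[]> columns = new TreeMap<>();
        int index = 0;
        for (int offset = 0; offset < blob.length; ) {
            int end = (int) Math.min((long) offset + chunkBytes, blob.length);
            columns.put(String.format(Locale.ROOT, "%08d", index++),
                        Arrays.copyOfRange(blob, offset, end));
            offset = end;
        }
        return columns;
    }

    // Reassemble the blob by concatenating the chunks in column-name order.
    static byte[] join(SortedMap<String, byte[]> columns) {
        int total = 0;
        for (byte[] chunk : columns.values()) {
            total += chunk.length;
        }
        byte[] blob = new byte[total];
        int offset = 0;
        for (byte[] chunk : columns.values()) {
            System.arraycopy(chunk, 0, blob, offset, chunk.length);
            offset += chunk.length;
        }
        return blob;
    }
}
```

At 4 MB per chunk, a blob close to 1GB becomes roughly 256 columns in a single row.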