Re: Questions while evaluating Cassandra
On Tue, Mar 2, 2010 at 6:43 AM, Eran Kutner e...@gigya.com wrote:
> Is the procedure described in the description of ticket CASSANDRA-44 really the way to do schema changes in the latest release? I'm not sure what your thoughts are about this, but our experience is that every release of our software requires schema changes because we add new column families for indexes.

Yes, that is how it is for 0.5 and 0.6. 0.7 will add online schema changes (i.e., fix -44); Gary is working on that now.

> Any idea on the timeframe for 0.7?

We are trying for 3-4 months, i.e. roughly the same as our last 4 releases.

> Our application needs a lot of range scans. Is there anything being done to improve the poor range scan performance as reflected here: http://www.brianfrankcooper.net/pubs/ycsb-v4.pdf ?

https://issues.apache.org/jira/browse/CASSANDRA-821 is open, also for the 0.7 release. Johan is working on this.

> What is the reason for the replication strategy with two DCs? As far as I understand it, it means that only one replica will exist in the second DC. It also means that quorum reads will fail when attempted on the second DC while the first DC is down. Am I missing something?

Yes:

- That strategy is meant for doing reads with CL.ONE; it guarantees at least one replica in each DC, for low latency with that CL.
- Quorum is based on the whole cluster, not per-DC.

DatacenterShardStrategy will put multiple replicas in each DC, for use with CL.DCQUORUM, that is, a majority of replicas in the same DC as the coordinator node for the current request. DCQUORUM is not yet finished, though; currently it behaves the same as CL.ALL.

> Are there any plans to have an inter-cluster replication option? I mean having two clusters running in two DCs, each standalone, but replicating data between themselves.

No. This is worse in every respect, since it means you get to reinvent the existing repair, hinted handoff, etc. code for when replication breaks, poorly.

> This can avoid the problem mentioned above, as well as avoid the high cost of inter-DC traffic when doing Read-Repairs for every read.

Of course, if you don't RR then you can read inconsistent data until your next full repair. Not a good trade. Remember RR is done in the background, so the latency doesn't matter.

> From everything I've read I didn't understand if load balancing is local or global. In other words, what happens exactly when a new node is added? Will it only balance its two neighbors on the ring, or will the rebalance propagate through the ring so that all the nodes are rebalanced evenly?

The former. Cascading data moves around the ring is a Bad Idea. (Since you read the Yahoo HBase/Cassandra paper: if HBase does this, maybe that is why adding a new node basically kills their cluster for several minutes?)

> I see that Hadoop support is coming in 0.6, but from following the ticket on Jira (CASSANDRA-342) I didn't understand if it will support the OrderPreservingPartitioner or not.

It supports all partitioners.

> Do the clients have to be recompiled and deployed when a new version of Cassandra is deployed, or are new releases backward compatible?

The short answer is: we maintained backwards compatibility for 0.4 - 0.5 - 0.6, but we are going to break things in 0.7, moving from String keys to byte[] and possibly other changes.

-Jonathan
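To make the quorum point in the two-DC answer concrete, here is a toy model of which consistency levels a read can meet when one DC is down. It is a sketch under stated assumptions, not Cassandra code: the DC names, the replica counts (for example a replication factor of 3 with 2 replicas in DC1 and 1 in DC2), and the function names are all hypothetical, and DCQUORUM is modelled with its intended meaning rather than its current behaves-like-ALL state.

```python
def majority(n):
    # Strict majority of n replicas: 2 of 3, 4 of 6, 1 of 1.
    return n // 2 + 1


def satisfiable(level, replicas_per_dc, live_per_dc, coordinator_dc):
    """Can a read at `level` succeed, given replica counts and live replica counts per DC?

    Toy model only: DC names and counts are hypothetical. DCQUORUM uses its
    intended meaning (a majority of the coordinator's local replicas), even
    though the thread notes it currently behaves the same as ALL.
    """
    total = sum(replicas_per_dc.values())
    live = sum(live_per_dc.get(dc, 0) for dc in replicas_per_dc)
    if level == "ONE":
        return live >= 1
    if level == "QUORUM":
        # QUORUM counts replicas across the whole cluster, not per DC.
        return live >= majority(total)
    if level == "DCQUORUM":
        local_total = replicas_per_dc.get(coordinator_dc, 0)
        local_live = live_per_dc.get(coordinator_dc, 0)
        return local_total > 0 and local_live >= majority(local_total)
    if level == "ALL":
        return live == total
    raise ValueError("unknown consistency level: " + level)


if __name__ == "__main__":
    # Hypothetical two-DC layout with one replica in the second DC; DC1 is down.
    two_dc = {"DC1": 2, "DC2": 1}
    dc1_down = {"DC1": 0, "DC2": 1}
    for level in ("ONE", "QUORUM"):
        print(level, satisfiable(level, two_dc, dc1_down, "DC2"))
```

With that layout, ONE still succeeds from DC2 while QUORUM needs 2 live replicas out of 3 and fails, which is the behaviour Eran describes. Putting several replicas in each DC (the DatacenterShardStrategy case) is what lets a per-DC majority succeed locally.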
Re: Adjusting Token Spaces and Rebalancing Data
Hello,

I am running 32-bit Linux, kernel version 2.6.27.24. My original data set was copied from a 64-bit Cassandra cluster to a 32-bit Cassandra cluster, and I am trying to load balance the data on the 32-bit cluster. Is the CASSANDRA-795 issue applicable to 32-bit Linux too, for the 0.5.0 release?

Thanks,
Jon

On Mon, Mar 1, 2010 at 4:55 PM, Jonathan Ellis jbel...@gmail.com wrote:
> On Mon, Mar 1, 2010 at 5:39 PM, Jon Graham sjclou...@gmail.com wrote:
>> Reached an EOL or something bizzare occured. Reading from: /192.168.2.13 BufferSizeRemaining: 16
>
> This one is harmless.
>
>> java.io.IOException: Value too large for defined data type
>>         at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
>>         at sun.nio.ch.FileChannelImpl.transferToDirectly(Unknown Source)
>>         at sun.nio.ch.FileChannelImpl.transferTo(Unknown Source)
>>         at org.apache.cassandra.net.TcpConnection.stream(TcpConnection.java:226)
>>         at org.apache.cassandra.net.FileStreamTask.run(FileStreamTask.java:55)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>>         at java.lang.Thread.run(Unknown Source)
>
> This one is killing you. Are you on Windows? If so, https://issues.apache.org/jira/browse/CASSANDRA-795 should fix it. That's in both 0.5.1 and 0.6 beta.
>
> -Jonathan
Re: Adjusting Token Spaces and Rebalancing Data
Doing some googling, this is a different JRE bug than the one addressed by 795: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6253145. It is marked fixed in JDK 6u18, so try upgrading to that.

-Jonathan

On Tue, Mar 2, 2010 at 10:46 AM, Jon Graham sjclou...@gmail.com wrote:
> Is the CASSANDRA-795 issue applicable to 32-bit Linux too, for the 0.5.0 release?
Re: Adjusting Token Spaces and Rebalancing Data
Thanks Jonathan,

My 32-bit Java version is 1.6.0_13-b03. I'll try a Java upgrade. This tracks well with the -tmp- Data file size being exactly MaxInt.

Jon

On Tue, Mar 2, 2010 at 9:15 AM, Jonathan Ellis jbel...@gmail.com wrote:
> Doing some googling, this is a different JRE bug than the one addressed by 795: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6253145. It is marked fixed in JDK 6u18, so try upgrading to that.
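Since the failure tracks a -tmp- Data file that stopped at exactly MaxInt bytes, a quick way to spot nodes in the same state is to list such files. This is a diagnostic sketch only, assuming that the files of interest contain both "-tmp-" and "Data" in their names (taken from the thread; the exact SSTable naming scheme is an assumption) and that the data directory path is passed in by you. It does not work around the JRE bug; upgrading the JDK is the suggested fix.

```python
import os

# Largest value of a signed 32-bit int (Java's Integer.MAX_VALUE): 2**31 - 1.
MAX_INT = 2**31 - 1


def oversized_tmp_data_files(data_dir, limit=MAX_INT):
    """Return sorted (path, size) pairs for '-tmp-' Data files of at least `limit` bytes.

    The '-tmp-' and 'Data' name filters are assumptions based on the thread;
    adjust them to match the files your nodes actually write.
    """
    hits = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if "-tmp-" in name and "Data" in name:
                path = os.path.join(root, name)
                size = os.path.getsize(path)
                if size >= limit:
                    hits.append((path, size))
    return sorted(hits)
```

The `limit` parameter defaults to MaxInt but can be lowered, for example to watch files approaching the boundary before a transfer fails.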
Connect during bootstrapping?
Hi folks,

I'm running 0.5 and I had 2 nodes up and running, then added a 3rd node in bootstrap mode. I understand from other discussion list threads that the new node doesn't serve reads while it is bootstrapping, but does that mean it won't connect at all? When I try to connect from my Java client, or cassandra-cli, I get the exception below. Is it the expected behavior? (Also, cassandra-cli says "Connected to xxx.yahoo.com" even though it isn't really connected...)

Thanks!
brian

Exception java.net.ConnectException: Connection refused
org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused
        at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
        at org.apache.cassandra.cli.CliMain.connect(CliMain.java:65)
        at org.apache.cassandra.cli.CliClient.executeConnect(CliClient.java:464)
        at org.apache.cassandra.cli.CliClient.executeCLIStmt(CliClient.java:87)
        at org.apache.cassandra.cli.CliMain.processCLIStmt(CliMain.java:131)
        at org.apache.cassandra.cli.CliMain.main(CliMain.java:172)
Caused by: java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
        at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
        at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
        at java.net.Socket.connect(Socket.java:525)
        at java.net.Socket.connect(Socket.java:475)
        at org.apache.thrift.transport.TSocket.open(TSocket.java:180)
        ... 5 more

--
Brian Cooper
Principal Research Scientist
Yahoo! Research
Re: Connect during bootstrapping?
On Tue, Mar 2, 2010 at 1:54 PM, Brian Frank Cooper coop...@yahoo-inc.com wrote:
> Hi folks, I'm running 0.5 and I had 2 nodes up and running, then added a 3rd node in bootstrap mode. I understand from other discussion list threads that the new node doesn't serve reads while it is bootstrapping, but does that mean it won't connect at all?

It doesn't start the Thrift listener until it is bootstrapped, so yes. (You can tell when it's bootstrapped by when it appears in nodeprobe ring. 0.6 also adds bootstrap progress reporting via JMX.)

> When I try to connect from my Java client, or cassandra-cli, I get the exception below. Is it the expected behavior? (Also, cassandra-cli says "Connected to xxx.yahoo.com" even though it isn't really connected...)

This is fixed in https://issues.apache.org/jira/browse/CASSANDRA-807 for 0.6, fwiw.

-Jonathan
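Because a bootstrapping node refuses connections until its Thrift listener starts, a client or deploy script can simply poll the port instead of treating "Connection refused" as a hard failure. This is a minimal sketch under stated assumptions: 9160 is assumed as the default Thrift port (pass your own if it differs), and the function name and retry parameters are hypothetical. A successful TCP connect only shows the listener is up; it does not perform a Thrift handshake.

```python
import socket
import time


def wait_for_thrift(host, port=9160, attempts=30, delay=2.0, timeout=1.0):
    """Poll until a TCP connection to host:port succeeds; return False after `attempts` tries.

    Assumption: 9160 is used as the default Thrift port. Success only proves
    something is listening (i.e. bootstrap has finished), nothing more.
    """
    for attempt in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            # ConnectionRefusedError is a subclass of OSError; per the thread,
            # that is what a still-bootstrapping node produces.
            if attempt + 1 < attempts:
                time.sleep(delay)
    return False
```

Polling the client port complements, rather than replaces, checking nodeprobe ring: the ring shows when the node has joined, while this only shows when it will accept client connections.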
Index values: data or pointers?
I'm exploring data layouts and it seems like the common practice is to store an index in one CF (e.g. userid for row key and thingid for column name) and then to fetch all the things by their thingids separately... so get index, and then get each key in the index. If a thing changes relatively infrequently but gets read often, seems like it would be more performant (especially with writes being very fast) to just stuff whole objects into indexes rather than simply ids. A whole object could be a JSON object or a serialized class or who knows what. Are there drawbacks to that approach, other than space? Thanks, Jeremey.
Re: Index values: data or pointers?
On Tue, Mar 2, 2010 at 4:13 PM, jeremey.barr...@nokia.com wrote:
> If a thing changes relatively infrequently but gets read often, seems like it would be more performant (especially with writes being very fast) to just stuff whole objects into indexes rather than simply ids. A whole object could be a JSON object or a serialized class or who knows what.

Yes. This is one place supercolumns can be very useful, since it allows doing this w/o nasty hacks like you mention. :)

-Jonathan
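The difference between the two layouts shows up in the number of reads. Below is a toy, dict-backed model (not a Cassandra client; the class, column family names, and read counter are hypothetical) comparing a pointer-style index (row of thing ids, then one read per thing) with a supercolumn-style index (one supercolumn per thing, holding its fields as subcolumns).

```python
class Store:
    """Toy dict-backed model of two index layouts; not a Cassandra client."""

    def __init__(self):
        self.things = {}       # Things CF: thing_id -> {field: value}
        self.id_index = {}     # standard CF: user_id -> {thing_id: ""} (pointers only)
        self.super_index = {}  # super CF: user_id -> {thing_id: {field: value}}
        self.reads = 0         # counts simulated row reads

    def save(self, user_id, thing_id, fields):
        # Writes are cheap, so the object is written to every layout.
        self.things[thing_id] = dict(fields)
        self.id_index.setdefault(user_id, {})[thing_id] = ""
        self.super_index.setdefault(user_id, {})[thing_id] = dict(fields)

    def load_via_pointers(self, user_id):
        self.reads += 1  # read the index row
        result = {}
        for thing_id in self.id_index.get(user_id, {}):
            self.reads += 1  # one extra read per referenced thing
            result[thing_id] = self.things[thing_id]
        return result

    def load_denormalized(self, user_id):
        self.reads += 1  # a single row read returns every supercolumn
        return self.super_index.get(user_id, {})
```

For a user with N things, the pointer layout costs N + 1 reads and the supercolumn layout costs 1. The price, beyond space, is that every change to a thing has to be rewritten into each index row holding a copy, which is why it suits data that changes rarely and is read often.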
Re: Index values: data or pointers?
On Mar 2, 2010, at 4:17 PM, ext Jonathan Ellis wrote:
> Yes. This is one place supercolumns can be very useful, since it allows doing this w/o nasty hacks like you mention. :)

Good point. :) I got it in my head that supercolumns aren't indexed (from the ticket of that name, http://issues.apache.org/jira/browse/CASSANDRA-598), but actually it's the subcolumns that aren't indexed, correct? (The former never made any sense to me.)

Thanks again,
Jeremey.
Looking for work
I'm looking for work. My previous employer was a non-profit that lost funding, and my position was cut. I would love to find a position that utilizes Cassandra. I have programming experience in Python, Perl, PHP, and C/C++ (mostly Python and Perl). I have experience with system and network administration as well. I would certainly be willing to send a resume describing my experience in more detail.

Peter Halliday
Excelsior Systems
(Phone:) 607-936-2172
(Cell:) 607-329-6905
(Fax:) 607-398-7928
Re: Looking for work
If you are willing to relocate to Pasadena, OpenX is hiring (feel free to forward me a resume if interested).

I may have to question Digg's claim to be first to production. We were running 0.3.0 in production last August and currently have 3 deployments spread over 30 machines in EC2, so we are heavy users.

-Anthony

On Tue, Mar 02, 2010 at 06:11:16PM -0800, Chris Goffinet wrote:
> Ditto here at Digg as well. We were the first to production using the open source version and have a major investment in the project.
>
> -Chris
>
> On Tue, Mar 2, 2010 at 6:01 PM, Peter Halliday phalli...@excelsiorsystems.net wrote:
>> I'm looking for work. [...]
>
> --
> Chris Goffinet

--
Anthony Molinaro antho...@alumni.caltech.edu
Re: Looking for work
Us too at SimpleGeo! We're Python, Cassandra, Erlang, and a smattering of Java and C++. We have offices in Boulder, CO and SF.

--Joe

--
Typed with big fingers on a small keyboard.

On Mar 2, 2010, at 19:01, Peter Halliday phalli...@excelsiorsystems.net wrote:
> I'm looking for work. [...]
Re: Looking for work
Maybe the wiki needs a job board?

On Tue, Mar 2, 2010 at 10:15 PM, Joe Stump j...@joestump.net wrote:
> Us too at SimpleGeo! [...]
Re: Looking for work
(This is not to say that I think job posts are off-topic here, because they are not.)

On Tue, Mar 2, 2010 at 10:43 PM, Jonathan Ellis jbel...@gmail.com wrote:
> If there's one thing that's worse than a mailing list as a job board, it's a wiki. :)
>
> On Tue, Mar 2, 2010 at 10:39 PM, Ryan Daum r...@thimbleware.com wrote:
>> Maybe the wiki needs a job board?
Re: Connect during bootstrapping?
Thanks for the note. Can you help me with something else? I can't seem to get any data to transfer during bootstrapping... I must be doing something wrong.

Here is what I did: I took 0.6.0-beta2 and loaded 2 machines with 60-70GB each. Then I started a third node, with AutoBootstrap true. The node claims it is bootstrapping:

INFO - Auto DiskAccessMode determined to be mmap
INFO - Saved Token not found. Using Rb0mePN3PheW3haA
INFO - Creating new commitlog segment /home/cooperb/cassandra/commitlog/CommitLog-1267594407761.log
INFO - Starting up server gossip
INFO - Joining: getting load information
INFO - Sleeping 9 ms to wait for load information...
INFO - Node /98.137.30.37 is now part of the cluster
INFO - Node /98.137.30.38 is now part of the cluster
INFO - InetAddress /98.137.30.37 is now UP
INFO - InetAddress /98.137.30.38 is now UP
INFO - Joining: getting bootstrap token
INFO - New token will be user148315419 to assume load from /98.137.30.38
INFO - Joining: sleeping 3 for pending range setup
INFO - Bootstrapping

But when I run nodetool streams, no streams are transferring:

Mode: Bootstrapping
Not sending any streams.
Not receiving any streams.

And it doesn't look like the node is getting any data. Any ideas?

Thanks for the help...
Brian

On 3/2/10 12:22 PM, Jonathan Ellis jbel...@gmail.com wrote:
> it doesn't start the thrift listener until it is bootstrapped, so yes. (you can tell when it's bootstrapped by when it appears in nodeprobe ring. 0.6 also adds bootstrap progress reporting via jmx.) [...]

--
Brian Cooper
Principal Research Scientist
Yahoo! Research
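The log line "New token will be user148315419 to assume load from /98.137.30.38" matches the earlier answer that adding a node only rebalances locally: the new token falls inside one existing node's range and splits it. The toy model below illustrates that with string tokens, as with an order-preserving partitioner. It is a simplified sketch: it tracks primary ranges only (ignoring the replication factor), the tokens and node names are hypothetical, and it does not reproduce how Cassandra actually chooses the bootstrap token.

```python
def ranges(ring):
    """Map each node to the (start, end] token range it owns as a primary replica.

    `ring` maps token -> node. Tokens are compared as plain strings; the
    lowest token's range wraps around from the highest token.
    """
    tokens = sorted(ring)
    # For i == 0, tokens[-1] is the highest token, which gives the wraparound range.
    return {ring[t]: (tokens[i - 1], t) for i, t in enumerate(tokens)}


def add_node(ring, token, node):
    # Joining inserts one token; only the range containing it is split.
    new_ring = dict(ring)
    new_ring[token] = node
    return new_ring


def changed_nodes(before, after):
    # Existing nodes whose owned range differs after the join.
    old, new = ranges(before), ranges(after)
    return sorted(n for n in old if old[n] != new[n])
```

For a hypothetical ring {"g": "A", "p": "B", "x": "C"}, adding node D at token "m" gives D the range ("g", "m"], shrinks B to ("m", "p"], and leaves A and C untouched, so only B streams data to the newcomer.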
What's the ideal size of a column?
In short: what's the ideal column size in the real world?

Longer description: I'm working on a prototype. The application is a data store holding blobs ranging from a couple of KB to hundreds of MB, close to 1GB in the worst case. The data model is really simple: the key is a string (a UUID-like thing), the value is the blob, and the only operations are set, get, and delete. The reason I picked Cassandra is its high availability and dynamic growth; high write throughput is also a great advantage, since the read/write ratio is about 1:100.

Another idea is to use a simple key-value store to keep a UUID-to-location mapping and store the blob data as files on an NFS server, but managing growth is not that straightforward.

If the blob size is too big to fit into Cassandra, what's the ideal size? And if so, I will try to cut each blob into slices but still keep everything in Cassandra; is this better than the NFS solution?

Thanks,
CB

P.S. The real reason I want to try Cassandra is that I want to play with something new.
Re: Connect during bootstrapping?
What are the other nodes doing? The first step is for them to copy out locally the data they will send to the new one; that usually takes a while. (They will log AntiCompacting ... AntiCompacted when doing this.)

On Tue, Mar 2, 2010 at 11:50 PM, Brian Frank Cooper coop...@yahoo-inc.com wrote:
> Thanks for the note. Can you help me with something else? I can't seem to get any data to transfer during bootstrapping...I must be doing something wrong. [...] And it doesn't look like the node is getting any data. Any ideas?
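To check what the existing nodes are doing, you can scan each one's log for the two words Jonathan mentions. This is a rough sketch assuming only that the log contains lines with the substrings "AntiCompacting" and "AntiCompacted"; no other part of the log format is relied on, the function name is hypothetical, and you supply the log file yourself.

```python
def anticompaction_state(log_lines):
    """Classify anticompaction progress from an iterable of log lines.

    Assumption: only the substrings "AntiCompacting" (start) and
    "AntiCompacted" (finish) are matched; everything else is ignored.
    Returns "not started", "in progress", or "finished".
    """
    started = finished = 0
    for line in log_lines:
        if "AntiCompacting" in line:
            started += 1
        elif "AntiCompacted" in line:
            finished += 1
    if started == 0:
        return "not started"
    return "in progress" if finished < started else "finished"
```

Since it accepts any iterable of lines, an open file object works directly, e.g. `with open(log_path) as f: print(anticompaction_state(f))`, where `log_path` is the node's log file on your system. "in progress" on the source nodes would explain why `nodetool streams` shows nothing yet on the new node.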
Re: Connect during bootstrapping?
You are probably in the portion of bootstrap where data to be transferred is split out to disk, which can take a while: see https://issues.apache.org/jira/browse/CASSANDRA-579

Look for a 'streaming' subdirectory in your data directories to confirm.

-----Original Message-----
From: Brian Frank Cooper coop...@yahoo-inc.com
Sent: Tuesday, March 2, 2010 11:50pm
To: cassandra-user@incubator.apache.org cassandra-user@incubator.apache.org
Subject: Re: Connect during bootstrapping?

Thanks for the note. Can you help me with something else? I can't seem to get any data to transfer during bootstrapping...I must be doing something wrong. [...]
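The check suggested above (look for a 'streaming' subdirectory in the data directories) is easy to script across several directories. A minimal sketch, assuming you pass in your configured data directories; since the thread does not say how deep the directory sits, this searches the whole tree under each one. The function name is hypothetical.

```python
import os


def find_streaming_dirs(data_dirs):
    """Return sorted paths of every directory named 'streaming' under the given data dirs.

    The full tree is searched because the depth of the 'streaming'
    directory is not specified in the thread.
    """
    found = []
    for base in data_dirs:
        for root, dirs, _files in os.walk(base):
            if "streaming" in dirs:
                found.append(os.path.join(root, "streaming"))
    return sorted(found)
```

An empty result on the source nodes means the split-out phase has not started (or has already been cleaned up); a non-empty one is consistent with the node still preparing data to stream.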
Re: What's the ideal size of a column?
On Tue, Mar 2, 2010 at 11:57 PM, Cool BSD c...@coolbsd.com wrote:
> In short: what's the ideal column size in the real world?
> Longer description: I'm working on a prototype. The application is a data store holding blobs ranging from a couple of KB to hundreds of MB, close to 1GB in the worst case.

You should be fine. Single digits of MB is a good rule of thumb.
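If blobs are sliced to stay within that rule of thumb, set/get/delete turn into operations over a group of chunks. Here is a minimal sketch assuming a dict-like `store` standing in for a column family; the 4 MiB chunk size (an arbitrary pick within "single digits of MB"), the "<id>:<index>" chunk keys, and the "<id>:meta" chunk-count key are all hypothetical choices, not a Cassandra convention.

```python
# Arbitrary pick within the "single digits of MB" rule of thumb.
CHUNK_SIZE = 4 * 1024 * 1024


def chunk_key(blob_id, index):
    # Hypothetical key scheme: one entry per chunk, e.g. "<id>:000003".
    return f"{blob_id}:{index:06d}"


def put_blob(store, blob_id, data, chunk_size=CHUNK_SIZE):
    """Split `data` into chunks, write each under its own key, and record the count.

    `store` is any dict-like stand-in for a column family; a real client
    would issue one insert per chunk. Returns the number of chunks written.
    """
    count = 0
    for offset in range(0, len(data), chunk_size):
        store[chunk_key(blob_id, count)] = data[offset:offset + chunk_size]
        count += 1
    store[f"{blob_id}:meta"] = str(count)
    return count


def get_blob(store, blob_id):
    # Read the chunk count first, then reassemble the chunks in order.
    count = int(store[f"{blob_id}:meta"])
    return b"".join(store[chunk_key(blob_id, i)] for i in range(count))


def delete_blob(store, blob_id):
    # Remove the count entry and every chunk it refers to.
    count = int(store.pop(f"{blob_id}:meta"))
    for i in range(count):
        store.pop(chunk_key(blob_id, i), None)
```

Writing the count last means a reader that finds it can expect all chunks to be present, at least in this single-process model. The trade-off is that each logical get becomes several reads, in exchange for keeping every individual column near the suggested size.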