Error installing Cassandra from package on Ubuntu

2010-03-03 Thread Beier Cai
I'm trying to install Cassandra from package on Ubuntu 8.04 based on the info from the following page: http://wiki.apache.org/cassandra/DebianPackaging I added the two mentioned sources to /etc/apt/sources.list, did an apt-get update, but got the following error W: GPG error: http://www.apache.org un

failed to identify others in a 3-node ring

2010-03-03 Thread Pahud
Hello list, I just set up a 3-node ring in a VirtualBox bridging environment. By running 'cassandra -f' the log indicates it discovers other nodes, but if I execute 'nodeprobe -host ring', it only displays one node. guestOS ~ # /usr/local/src/apache-cassandra-0.5.1/bin/nodeprobe -host 192.168.1

Re: Looking for work

2010-03-03 Thread Joseph Stein
I just started a LinkedIn group called "Cassandra NoSQL" http://www.linkedin.com/groups?gid=2822930 I invite folks to join as it has a job board and is a good place for networking stuff too =8^) I look forward to the continued nitty gritty that goes on here perhaps the linked in discussions can gr

Re: Error installing Cassandra from package on Ubuntu

2010-03-03 Thread Hernan Badenes
Beier, This may help: http://daily-hacking.blogspot.com/2009/06/nopubkey-during-apt-get-update.html (Also: It would be a good idea to add this to the wiki.) Regards, Hernan. From: Beier Cai To: cassandra-user@incubator.apache.org Date: 03/03/2010 07:26 AM Subject: Error installing Cassandra

Re: Error installing Cassandra from package on Ubuntu

2010-03-03 Thread Michael Shuler
On 03/03/2010 04:22 AM, Beier Cai wrote: I'm trying to install Cassandra from package on Ubuntu 8.04 based on the info from the following page: http://wiki.apache.org/cassandra/DebianPackaging I added the two mentioned sources to /etc/apt/sources.list, did an apt-get update, but got the following

Re: finding Cassandra servers

2010-03-03 Thread Ted Zlatanov
On Mon, 01 Mar 2010 12:15:11 -0600 Ted Zlatanov wrote: TZ> I need to find Cassandra servers on my network from several types of TZ> clients and platforms. The goal is to make adding and removing servers TZ> painless, assuming a leading window of at least 1 hour. The discovery TZ> should be aut

Re: Connect during bootstrapping?

2010-03-03 Thread Jonathan Ellis
Providing "what is going on, nothing seems to be happening" visibility is something we have struggled with here. When we get https://issues.apache.org/jira/browse/CASSANDRA-579 done for 0.7 we won't have the big "waiting to stream" problem since we'll stream directly from the data files w/o antico

why have ColumnFamilies?

2010-03-03 Thread Ted Zlatanov
I don't understand the advantages of ColumnFamilies over a SuperColumnFamily with just one supercolumn. Why have the former if the latter is functionally equivalent? Thanks Ted

Re: why have ColumnFamilies?

2010-03-03 Thread Jonathan Ellis
http://issues.apache.org/jira/browse/CASSANDRA-598 2010/3/3 Ted Zlatanov : > I don't understand the advantages of ColumnFamilies over a > SuperColumnFamily with just one supercolumn.  Why have the former if the > latter is functionally equivalent? > > Thanks > Ted > >

Re: why have ColumnFamilies?

2010-03-03 Thread Alexandre Conrad
2010/3/3 Ted Zlatanov : > I don't understand the advantages of ColumnFamilies over a > SuperColumnFamily with just one supercolumn.  Why have the former if the > latter is functionally equivalent? Being pretty new here with Cassandra's terminology, I'm not sure what a SuperColumnFamily is. Or mayb

Re: finding Cassandra servers

2010-03-03 Thread Gary Dusbabek
2010/3/3 Ted Zlatanov : > On Mon, 01 Mar 2010 12:15:11 -0600 Ted Zlatanov wrote: > > TZ> I need to find Cassandra servers on my network from several types of > TZ> clients and platforms.  The goal is to make adding and removing servers > TZ> painless, assuming a leading window of at least 1 hour.

Re: why have ColumnFamilies?

2010-03-03 Thread Ted Zlatanov
On Wed, 3 Mar 2010 07:23:48 -0600 Jonathan Ellis wrote: JE> 2010/3/3 Ted Zlatanov : >> I don't understand the advantages of ColumnFamilies over a >> SuperColumnFamily with just one supercolumn.  Why have the former if the >> latter is functionally equivalent? JE> http://issues.apache.org/jira/b

Re: why have ColumnFamilies?

2010-03-03 Thread Ted Zlatanov
On Wed, 3 Mar 2010 14:43:14 +0100 Alexandre Conrad wrote: AC> 2010/3/3 Ted Zlatanov : >> I don't understand the advantages of ColumnFamilies over a >> SuperColumnFamily with just one supercolumn.  Why have the former if the >> latter is functionally equivalent? AC> As far as I understand, ther

Re: why have ColumnFamilies?

2010-03-03 Thread Gary Dusbabek
On Wed, Mar 3, 2010 at 07:43, Alexandre Conrad wrote: > As far as I understand, there's how I organize Cassandra entities: > > http://paste.pocoo.org/show/185126/ > > Is this somehow correct? > > -- > Alex > twitter.com/alexconrad > It is basically correct. Your diagram could be extended by indi

Re: why have ColumnFamilies?

2010-03-03 Thread Alexandre Conrad
2010/3/3 Ted Zlatanov : > KeySpace->Row->ColumnFamily->Column[name, value] > (a two-level map) > > KeySpace->Row->SuperColumnFamily->SuperColumn[name]->Column[name, value] > (a three-level map) Thanks for the explanation. So this means that entities under SuperColumnFamily can only be SuperColumns

Re: finding Cassandra servers

2010-03-03 Thread Ted Zlatanov
On Wed, 3 Mar 2010 07:57:32 -0600 Gary Dusbabek wrote: GD> 2010/3/3 Ted Zlatanov : TZ> I need to find Cassandra servers on my network from several types of TZ> clients and platforms.  The goal is to make adding and removing servers TZ> painless, assuming a leading window of at least 1 hour.  The

Re: why have ColumnFamilies?

2010-03-03 Thread Ted Zlatanov
On Wed, 3 Mar 2010 15:21:42 +0100 Alexandre Conrad wrote: AC> 2010/3/3 Ted Zlatanov : KeySpace-> Row->ColumnFamily->Column[name, value] >> (a two-level map) >> KeySpace-> Row->SuperColumnFamily->SuperColumn[name]->Column[name, value] >> (a three-level map) AC> Thanks for the explanation. So t

Re: why have ColumnFamilies?

2010-03-03 Thread Alexandre Conrad
2010/3/3 Gary Dusbabek : > It is basically correct.  Your diagram could be extended by indicating > that columns consist of name, value, timestamp and that super columns > have names and that a column family consists exclusively of columns or > super columns (never both). Thanks Gary for the preci
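The two layouts discussed in this thread can be pictured as nested maps. What follows is only an illustrative sketch, not Cassandra's API: the `Column` class, the row keys, the sample values and the `depth` helper are all hypothetical names invented here. It assumes, as Gary describes, that a column carries a name, a value and a timestamp, and that a column family holds either columns or super columns, never both.

```python
from dataclasses import dataclass


@dataclass
class Column:
    # Per the thread: a column consists of name, value, timestamp.
    name: str
    value: str
    timestamp: int


# ColumnFamily: row key -> column name -> Column (a two-level map).
users = {
    "row1": {
        "email": Column("email", "ted@example.com", 1),
    },
}

# SuperColumnFamily: row key -> supercolumn name -> column name -> Column
# (a three-level map).
addresses = {
    "row1": {
        "home": {
            "city": Column("city", "Boston", 1),
        },
    },
}


def depth(mapping) -> int:
    """Count the nested dict levels that sit above the first Column."""
    levels = 0
    node = mapping
    while isinstance(node, dict):
        levels += 1
        node = next(iter(node.values()))
    return levels
```

Seen this way, Ted's question amounts to asking whether a three-level map with a single fixed key at the middle level is any different from a two-level map.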

Re: finding Cassandra servers

2010-03-03 Thread Gary Dusbabek
2010/3/3 Ted Zlatanov : > > I don't think routing multicasts across subnets is a burden. Try telling that to a network administrator who is concerned about flooding his routers with multicast chatter. First, you'll have to find a network administrator who is willing to even have that conversation

Re: why have ColumnFamilies?

2010-03-03 Thread Jonathan Ellis
I would rather move to a more flexible model ("as many levels of nesting as you want") than a less-flexible one. 2010/3/3 Ted Zlatanov : > On Wed, 3 Mar 2010 07:23:48 -0600 Jonathan Ellis wrote: > > JE> 2010/3/3 Ted Zlatanov : >>> I don't understand the advantages of ColumnFamilies over a >>> Sup

Re: finding Cassandra servers

2010-03-03 Thread Ted Zlatanov
On Wed, 3 Mar 2010 08:41:18 -0600 Gary Dusbabek wrote: GD> It wouldn't be a lot work for you to write a mdns service that would GD> query the seeds for endpoints and publish it to interested clients. GD> It could go in contrib. This requires knowledge of the seeds so I need to at least look in

Re: why have ColumnFamilies?

2010-03-03 Thread Ted Zlatanov
On Wed, 3 Mar 2010 08:56:16 -0600 Jonathan Ellis wrote: JE> I would rather move to a more flexible model ("as many levels of JE> nesting as you want") than a less-flexible one. That's very exciting. I've often wished for "just one more level" while putting Cassandra schemas together, so I hope

Re: finding Cassandra servers

2010-03-03 Thread Gary Dusbabek
2010/3/3 Ted Zlatanov : > On Wed, 3 Mar 2010 08:41:18 -0600 Gary Dusbabek wrote: > > GD> It wouldn't be a lot work for you to write a mdns service that would > GD> query the seeds for endpoints and publish it to interested clients. > GD> It could go in contrib. > > This requires knowledge of the s

Re: finding Cassandra servers

2010-03-03 Thread Ted Zlatanov
On Wed, 3 Mar 2010 09:32:33 -0600 Gary Dusbabek wrote: GD> 2010/3/3 Ted Zlatanov : >> This requires knowledge of the seeds so I need to at least look in >> storage-conf.xml to find them.  Are you saying there's no chance of >> Cassandra nodes (or just seeds) announcing themselves, even if it's >

Re: finding Cassandra servers

2010-03-03 Thread Gary Dusbabek
-1 core. +1 contrib. +10 github. Client-endpoint discovery is not currently addressed at all in the codebase. I don't think it is a job we should take up because needs will vary across applications and there isn't a general solution that will work for everybody. Gary 2010/3/3 Ted Zlatanov : > O

Re: finding Cassandra servers

2010-03-03 Thread Eric Evans
On Wed, 2010-03-03 at 10:05 -0600, Ted Zlatanov wrote: > I can do a patch+ticket for this in the core, making it optional and > off by default, or do the same for a contrib/ service as you > suggested. So I'd appreciate a +1/-1 quick vote on whether this can > go in the core to save me from rewrit

Re: finding Cassandra servers

2010-03-03 Thread Christopher Brind
So is the current general practice to connect to a known node, e.g. by ip address? If so, what happens if that node is down? Is the entire cluster effectively broken at that point? Or do clients simply maintain a list of nodes and just connect to the first available in the list? Thanks in advance
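The "list of nodes, first available wins" approach asked about here can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any particular Cassandra client does it: `connect_first_available` is a hypothetical helper, the addresses are made up, and port 9160 is assumed to be the Thrift port the nodes listen on.

```python
import socket


def connect_first_available(nodes, timeout=1.0, connect=socket.create_connection):
    """Return ((host, port), connection) for the first node that accepts.

    `nodes` is an ordered list of (host, port) pairs. A node that is down
    is skipped, so one failed node does not make the cluster unreachable.
    """
    errors = []
    for host, port in nodes:
        try:
            return (host, port), connect((host, port), timeout=timeout)
        except OSError as exc:
            # Remember the failure and fall through to the next node.
            errors.append(((host, port), exc))
    raise ConnectionError(f"no node reachable: {errors}")


# Example (hypothetical addresses, assumed Thrift port 9160):
# node, conn = connect_first_available([("192.168.1.10", 9160),
#                                       ("192.168.1.11", 9160)])
```

The `connect` parameter exists only so the function can be exercised without a network; in normal use the default `socket.create_connection` is enough.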

Re: finding Cassandra servers

2010-03-03 Thread Ted Zlatanov
On Wed, 03 Mar 2010 10:43:19 -0600 Eric Evans wrote: EE> It's entirely possible that you've identified a problem that others EE> can't see, or haven't yet encountered. I don't see it, but then maybe EE> I'm just thick. Getting back to my original question, how do you (and others) find usable Ca

Re: finding Cassandra servers

2010-03-03 Thread Ryan King
2010/3/3 Ted Zlatanov : > On Wed, 03 Mar 2010 10:43:19 -0600 Eric Evans wrote: > > EE> It's entirely possible that you've identified a problem that others > EE> can't see, or haven't yet encountered. I don't see it, but then maybe > EE> I'm just thick. > > Getting back to my original question, how

Re: finding Cassandra servers

2010-03-03 Thread Ian Holsman
+1 on Eric's comments. We could create a branch or git fork where you guys could develop it, and if it reaches a usable state and others find it interesting it could get integrated in then. On 3/3/10, Eric Evans wrote: > On Wed, 2010-03-03 at 10:05 -0600, Ted Zlatanov wrote: >> I can do a patch+tic

Re: finding Cassandra servers

2010-03-03 Thread Ted Zlatanov
On Wed, 3 Mar 2010 09:04:37 -0800 Ryan King wrote: RK> Something like RRDNS is no more complex than managing a list of seed nodes. How do your clients at Twitter find server nodes? Do you just run them local to each node? My concern is that both RRDNS and seed node lists are vulnerable to ind

Re: finding Cassandra servers

2010-03-03 Thread Chris Goffinet
At Digg we have automated infrastructure. We use Puppet + our own in-house system that allows us to query pools of nodes for 'seeds'. Config files like storage-conf.xml are auto generated on the fly, and we randomly pick a set of seeds. Seeds can be per datacenter as well. As soon as a machine

Re: finding Cassandra servers

2010-03-03 Thread Brandon Williams
2010/3/3 Ted Zlatanov > On Wed, 3 Mar 2010 09:04:37 -0800 Ryan King wrote: > > RK> Something like RRDNS is no more complex than managing a list of seed > nodes. > > My concern is that both RRDNS and seed node lists are vulnerable to > individual node failure. They're not. That's why they're l

Re: finding Cassandra servers

2010-03-03 Thread Ted Zlatanov
On Wed, 3 Mar 2010 12:08:06 -0500 Ian Holsman wrote: IH> We could create a branch or git fork where you guys could develop it, IH> and if it reaches a usable state and others find it interesting it IH> could get integrated in then Thanks, Ian. Would it be OK to do it as a patch in http://issue

Re: Anti-compaction Diskspace issue even when latest patch applied

2010-03-03 Thread Jonathan Ellis
You are proposing manually moving your data from a 5TB disk to a 12TB disk, and that is the only change you want to make? Then just keep the IP the same when you restart it after moving, and you won't have to do anything else, it will just look like the node was down temporarily and is now back up

Re: finding Cassandra servers

2010-03-03 Thread Jonathan Ellis
We appear to be reaching consensus that this is solving a non-problem, so I have closed that ticket. 2010/3/3 Ted Zlatanov : > On Wed, 3 Mar 2010 12:08:06 -0500 Ian Holsman wrote: > > IH> We could create a branch or git fork where you guys could develop it, > IH> and if it reaches a usable state

Re: finding Cassandra servers

2010-03-03 Thread Eric Evans
On Wed, 2010-03-03 at 16:49 +, Christopher Brind wrote: > So is the current general practice to connect to a known node, e.g. by > ip address? There are so many ways you could tackle this but... If you're talking about provisioning/startup of new nodes, just use the IPs of 2-4 nodes in the se

Re: finding Cassandra servers

2010-03-03 Thread Ted Zlatanov
On Wed, 3 Mar 2010 09:19:28 -0800 Chris Goffinet wrote: CG> At Digg we have automated infrastructure. We use Puppet + our own CG> in-house system that allows us to query pools of nodes for CG> 'seeds'. Config files like storage-conf.xml are auto generated on CG> the fly, and we randomly pick a s

Re: finding Cassandra servers

2010-03-03 Thread Christopher Brind
Great, thanks Eric On 3 Mar 2010 17:27, "Eric Evans" wrote: On Wed, 2010-03-03 at 16:49 +, Christopher Brind wrote: > So is the current general practice to ... There are so many ways you could tackle this but... If you're talking about provisioning/startup of new nodes, just use the IPs of

Re: finding Cassandra servers

2010-03-03 Thread Ryan King
2010/3/3 Ted Zlatanov : > On Wed, 3 Mar 2010 09:04:37 -0800 Ryan King wrote: > > RK> Something like RRDNS is no more complex than managing a list of seed > nodes. > > How do your clients at Twitter find server nodes?  Do you just run them > local to each node? RRDNS + loading the token map to di
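For readers unfamiliar with RRDNS (round-robin DNS): a single name resolves to several node addresses, and the client picks among them. A rough sketch of the client side follows. It is an assumption-laden illustration, not Twitter's setup: `resolve_all` is a hypothetical helper and the host name in the comment is invented. It does not cover the second half of Ryan's answer, loading the token map.

```python
import socket


def resolve_all(name, port, resolver=socket.getaddrinfo):
    """Return every distinct IP address that `name` resolves to, in order."""
    seen = []
    for _family, _type, _proto, _canon, sockaddr in resolver(
        name, port, type=socket.SOCK_STREAM
    ):
        ip = sockaddr[0]
        if ip not in seen:
            seen.append(ip)
    return seen


# Example (hypothetical name, assumed Thrift port 9160):
# candidates = resolve_all("cassandra.example.internal", 9160)
```

A client would then try the returned addresses in turn and handle connection failure on each, which is the point Ryan and Brandon make elsewhere in the thread.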

Re: finding Cassandra servers

2010-03-03 Thread Ryan King
On Wed, Mar 3, 2010 at 9:27 AM, Eric Evans wrote: > On Wed, 2010-03-03 at 16:49 +, Christopher Brind wrote: >> So is the current general practice to connect to a known node, e.g. by >> ip address? > > There are so many ways you could tackle this but... > > If you're talking about provisioning/

Re: finding Cassandra servers

2010-03-03 Thread Ted Zlatanov
On Wed, 3 Mar 2010 09:35:31 -0800 Ryan King wrote: >> With seed node lists, if I get unlucky I'd be trying to hit a downed >> node in which case I may as well just use RRDNS and deal with connection >> failure from the start. RK> Why would you not deal with connection failure? I mean it's simp

Re: Connect during bootstrapping?

2010-03-03 Thread Brian Frank Cooper
Sure... now that I understand what is going on it is easy to see the signs (looking in the data/usertable/stream directory, then looking for the tmp files). A small script (or some special logic in nodetool) that just looked for those signs even, and said "things are in progress," would be helpf

Re: Looking for work

2010-03-03 Thread Tatu Saloranta
On Wed, Mar 3, 2010 at 4:47 AM, Joseph Stein wrote: > I just started a LinkedIn group called "Cassandra NoSQL" > http://www.linkedin.com/groups?gid=2822930 I invite folks to join as > it has a job board and is a good place for networking stuff too =8^) > > I look forward to the continued nitty gri

Re: Looking for work

2010-03-03 Thread Jeffrey Johnson
Wouldn't nosqlgigs.com a la djangogigs.com be simpler than ning? Seems like this could/should be about a 1-2 hour job for someone to stand up? Jeff On Wed, Mar 3, 2010 at 11:04 AM, Tatu Saloranta wrote: > On Wed, Mar 3, 2010 at 4:47 AM, Joseph Stein wrote: >> I just started a LinkedIn group calle

Re: Looking for work

2010-03-03 Thread Jesse McConnell
i dunno, one mail to a mailing list seemed to have worked pretty well :P cheers, jesse -- jesse mcconnell jesse.mcconn...@gmail.com On Wed, Mar 3, 2010 at 13:06, Jeffrey Johnson wrote: > wouldnt nosqlgigs.com ala djangogigs.com be simpler than ning? Seems > like this could/should be about a 1

Re: Looking for work

2010-03-03 Thread Jeremey.Barrett
And the only thing worse than a wiki is a wiki plus a blog plus five social networks plus a website built on Rails plus SMS notifications plus ... :) Jeremey. On Mar 2, 2010, at 10:43 PM, ext Jonathan Ellis wrote: > If there's one thing that's worse than a mailing list as a job board, > it's a

Re: Looking for work

2010-03-03 Thread Tatu Saloranta
On Wed, Mar 3, 2010 at 11:06 AM, Jeffrey Johnson wrote: > wouldnt nosqlgigs.com ala djangogigs.com be simpler than ning? Seems > like this could/should be about a 1-2 hour job for someone to stand > up? It all depends. Just wanted to mention it -- Ning networks are general purpose, not optimized

Re: why have ColumnFamilies?

2010-03-03 Thread Tatu Saloranta
On Wed, Mar 3, 2010 at 6:56 AM, Jonathan Ellis wrote: > I would rather move to a more flexible model ("as many levels of > nesting as you want") than a less-flexible one. +1 This is one of patterns that I have seen many times: providing for "as many levels as you want" may not be more difficult

RAID 0 stripe size

2010-03-03 Thread B. Todd Burruss
Has anyone performed any testing on the effect of RAID 0 stripe size on Cassandra performance? I have 3 drives in a RAID 0 setup with a 128k stripe size, and I think the performance could be better once the sstable file sizes start to grow. Any advice?

Re: failed to identify others in a 3-node ring

2010-03-03 Thread Jonathan Ellis
You probably assigned all nodes the same token. Don't do that. :) On Wed, Mar 3, 2010 at 4:41 AM, Pahud wrote: > Hello list, > I just setup a 3-node ring in a virtualbox bridging environment. By running > the 'cassandra -f' the log indicates it discovers other nodes but if I > execute 'nodeprobe

Re: failed to identify others in a 3-node ring

2010-03-03 Thread Pahud
Oh yes! I just cloned my vbox after the 1st node generated its token, so all other clones have the same token. I fixed this problem. Thanks. pahud On Thu, Mar 4, 2010 at 9:35 AM, Jonathan Ellis wrote: > You probably assigned all nodes the same token. Don't do that. :)
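The failure mode in this thread, several nodes sharing one token, is easy to check for once the tokens are collected. The sketch below is illustrative only: `duplicate_tokens` and `evenly_spaced_tokens` are hypothetical helpers, the node names are made up, and the token range of 0 to 2**127 is an assumption that holds for RandomPartitioner but not for other partitioners.

```python
from collections import Counter

# Assumed size of the token space (RandomPartitioner).
RING_SIZE = 2 ** 127


def duplicate_tokens(node_tokens):
    """Map each token claimed by more than one node to those nodes.

    `node_tokens` is a dict of node name -> token.
    """
    counts = Counter(node_tokens.values())
    return {
        token: sorted(n for n, t in node_tokens.items() if t == token)
        for token, count in counts.items()
        if count > 1
    }


def evenly_spaced_tokens(node_count):
    """Return one distinct token per node, spread evenly over the ring."""
    return [i * RING_SIZE // node_count for i in range(node_count)]
```

For cloned VMs like the ones described above, the practical point is simply that each clone needs its own token before it joins the ring; how the token is set or regenerated depends on the Cassandra version and is not shown here.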