Re: Cassandra users survey

2009-11-24 Thread Ted Zlatanov
On Mon, 23 Nov 2009 23:30:51 -0500 Matt Revelle mreve...@gmail.com wrote: 

MR Are you both using timestamps as row keys?  Would be great to hear
MR more details.

I'm using super column keys in a super column.

So let's say your resource is routerA.

Your data will be:

Row routerA
 SuperColumn Status
  SuperColumn key T0 (this morning)
   Columns { status: connected, location: USA, ... }
  SuperColumn key T1 (T0 + 10 seconds for example)
   Columns { status: disconnected, location: Europe, ... }
  SuperColumn key T2 (T1 + 10 seconds for example)
   Columns { status: connected, ... } // no location specified

Then you can say "give me the latest super column key" (limit = 1,
order = reversed, start == end == 0) and you'll get T2.
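To make the layout concrete, here is a minimal in-memory sketch of the same idea. It is only a stand-in built from plain dicts, not the Cassandra client API: the T0 value and the get_slice helper are made up for illustration. With the three entries above, a reversed slice with a limit of 1 returns the newest key, T2.

```python
# Hypothetical in-memory stand-in for the layout above; not the Cassandra client API.
T0 = 1259020800  # assumed epoch seconds for "this morning"
T1 = T0 + 10
T2 = T1 + 10

# Row "routerA" -> super column "Status" -> timestamp keys -> columns.
row_router_a = {
    "Status": {
        T0: {"status": "connected", "location": "USA"},
        T1: {"status": "disconnected", "location": "Europe"},
        T2: {"status": "connected"},  # no location specified
    }
}


def get_slice(super_columns, limit, reverse=False):
    """Return up to `limit` (key, columns) pairs in key order, optionally reversed."""
    keys = sorted(super_columns, reverse=reverse)
    return [(key, super_columns[key]) for key in keys[:limit]]


# "Give me the latest entry": reversed order, limit 1.
latest_key, latest_columns = get_slice(row_router_a["Status"], limit=1, reverse=True)[0]
print(latest_key == T2, latest_columns)
```

Because the keys are timestamps, "latest" needs no extra index: it is just the last key in sort order.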

Ted



Re: cassandra over hbase

2009-11-24 Thread Ted Zlatanov
On Mon, 23 Nov 2009 11:58:08 -0800 Jun Rao jun...@almaden.ibm.com wrote: 

JR After chatting with some Facebook guys, we realized that one potential
JR benefit from using HDFS is that the recovery from losing partial data in a
JR node is more efficient. Suppose that one lost a single disk at a node. HDFS
JR can quickly rebuild the blocks on the failed disk in parallel. This is a
JR bit hard to do in cassandra, since we can't easily find the data on the
JR failed disk from another node. 

This is an architectural issue, right?  IIUC Cassandra simply doesn't
care about disks.  I think that's a plus, actually, because it
simplifies the code and filesystems in my experience are better left up
to the OS.  For instance, we're evaluating Lustre and for many specific
reasons it's significantly better for our needs than HDFS, so HDFS would
be a tough sell.

JR So, when this happens, the whole node probably has to be taken out
JR and bootstrapped. The same problem exists when a single sstable file
JR is corrupted.

I think recovering a single sstable is a useful thing, and it seems like
a better problem to solve.

Ted



Re: Wish list [from users survey thread]

2009-11-24 Thread Ted Zlatanov
On Mon, 23 Nov 2009 13:45:09 -0600 Jonathan Ellis jbel...@gmail.com wrote: 

JE 1. Increment/decrement: atomic is a dirty word in a system
JE emphasizing availability, but incr/decr can be provided in an
JE eventually consistent manner with vector clocks.  There are other
JE possible approaches but this is probably the best fit for us.  We'd
JE want to allow ColumnFamilies with either traditional (for Cassandra)
JE long timestamps, or vector clocks, but not mixed.  The bad news is,
JE this is a very substantial change and will probably not be in 0.9
JE unless someone steps up to do the work.  (This would also cover
JE flexible conflict resolution, which came up as well.)

Just for my benefit, can someone explain the reasons why atomic inc/dec
are needed inside Cassandra if 64-bit time stamps and UUIDs are
available?  I have not needed them in my usage but am curious about
other schemas that do.
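For readers wondering what an "eventually consistent" increment could look like, here is a rough sketch. It is not the design Jonathan proposes and not anything in Cassandra today. It assumes each replica keeps its own tally and that copies are merged by taking the per-replica maximum, which is the same merge rule vector clocks use. The class and node names are hypothetical.

```python
class ReplicaCounter:
    """Counter kept as per-replica increment and decrement tallies."""

    def __init__(self):
        self.incs = {}  # replica id -> total increments accepted by that replica
        self.decs = {}  # replica id -> total decrements accepted by that replica

    def add(self, replica, amount=1):
        """Record an increment (or a decrement, if amount is negative) at `replica`."""
        tallies = self.incs if amount >= 0 else self.decs
        tallies[replica] = tallies.get(replica, 0) + abs(amount)

    def merge(self, other):
        """Combine two copies by taking the per-replica maximum of each tally."""
        merged = ReplicaCounter()
        for name in ("incs", "decs"):
            mine, theirs = getattr(self, name), getattr(other, name)
            combined = {
                replica: max(mine.get(replica, 0), theirs.get(replica, 0))
                for replica in mine.keys() | theirs.keys()
            }
            setattr(merged, name, combined)
        return merged

    def value(self):
        return sum(self.incs.values()) - sum(self.decs.values())


# Two replicas accept writes independently, e.g. while partitioned.
a, b = ReplicaCounter(), ReplicaCounter()
a.add("node_a", 3)
b.add("node_b", 2)
b.add("node_b", -1)

# Merging in either order, or more than once, gives the same total.
total = a.merge(b).value()
print(total)
```

The point of the per-replica tallies is that no write has to wait on another node, yet no increment is lost when the copies are reconciled later. A single long timestamp per column cannot do that: of two concurrent writes, one simply wins.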

Thanks
Ted



Re: Wish list [from users survey thread]

2009-11-24 Thread Ian Holsman
well.
I'd like to see how many times a specific user hits the site, without having to 
add them up every time.
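A small sketch of the difference, using plain dicts as a stand-in for column families (the names are made up, and this is not Cassandra client code). With one column per hit keyed by a UUID, every count is a scan over all of that user's columns. With a stored counter, it is a single lookup.

```python
import uuid

# Approach 1: one column per hit, keyed by a unique id.
hits_by_user = {}


def record_hit(user):
    hits_by_user.setdefault(user, {})[uuid.uuid4()] = b""


def count_hits_by_scanning(user):
    # Has to touch every column for the user; cost grows with the number of hits.
    return len(hits_by_user.get(user, {}))


# Approach 2: a stored counter that is bumped on every hit.
hit_counters = {}


def increment_hits(user):
    hit_counters[user] = hit_counters.get(user, 0) + 1


for _ in range(1000):
    record_hit("alice")
    increment_hits("alice")

print(count_hits_by_scanning("alice"), hit_counters["alice"])
```

Both approaches give the same number. The second needs the read-modify-write in increment_hits to be safe under concurrent writers, which is exactly what a server-side incr/decr would provide.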

On Nov 24, 2009, at 9:47 AM, Ted Zlatanov wrote:

 Just for my benefit, can someone explain the reasons why atomic inc/dec
 are needed inside Cassandra if 64-bit time stamps and UUIDs are
 available?  I have not needed them in my usage but am curious about
 other schemas that do.
 
 Thanks
 Ted
 

--
Ian Holsman
i...@holsman.net





Re: Wish list [from users survey thread]

2009-11-24 Thread Jonathan Ellis
On Mon, Nov 23, 2009 at 1:45 PM, Jonathan Ellis jbel...@gmail.com wrote:
 9. Design documentation: also agreed.  Chris has started on this
 (http://wiki.apache.org/cassandra/ArchitectureSSTable) and I will try
 to at least sketch out some more this week.

http://wiki.apache.org/cassandra/ArchitectureInternals


Re: ring state out of sync in build 883477

2009-11-24 Thread Jonathan Ellis
Looks like this is another symptom of
https://issues.apache.org/jira/browse/CASSANDRA-150, which is on track
to be fixed soon

On Tue, Nov 24, 2009 at 11:19 AM, B. Todd Burruss bburr...@real.com wrote:
 they all were restarted at various times.

 for vmguest85 the other three are seed nodes.


 On Mon, 2009-11-23 at 19:21 -0600, Jonathan Ellis wrote:
 So vmguest85 was restarted, but gen-app02 hasn't told it that there
 are 2 other nodes that are down?

 Which one is the seed node?

 On Mon, Nov 23, 2009 at 6:38 PM, B. Todd Burruss bburr...@real.com wrote:
  i'm observing the following on a cluster that started with 4 nodes.  i have
  been killing and restarting the various nodes as i test cassandra, and now
  i'm seeing a lot of NotFoundExceptions in the client because, i believe, the
  ring state is out of sync between the two nodes that are still up and
  available.  The first ring state shown below reflects the current state of
  the cluster.  Also, i have seen similar issues when one of the nodes thinks
  another node is still available when in fact it has been killed.  it seems
  to be related to bringing up and killing nodes too fast and not letting them
  figure out when a node is dead.  in this case i see TimedOutException
  related to the NIO SocketChannel class.
 
  thx!
 
  [cassandra.883477]$ bin/nodeprobe -host gen-app02.dev.real.com -port 8080 ring
  Address         Status  Load      Range                                      Ring
                                    144038903974614862325597275257769797985
  172.27.128.186  Down    22.17 MB  31124469348629903091013930339840898757     |--|
  172.27.128.23   Down    22.17 MB  64378740291415296162944450043143967518     |   |
  172.27.128.22   Up      22.17 MB  121134220722269938669001112695509564769    |   |
  172.27.128.185  Up      14.69 MB  144038903974614862325597275257769797985    |--|
 
  [cassandra.883477]$ bin/nodeprobe -host vmguest85.prognet.com -port 8080 ring
  Address         Status  Load      Range                                      Ring
                                    144038903974614862325597275257769797985
  172.27.128.22   Up      22.17 MB  121134220722269938669001112695509564769    |--|
  172.27.128.185  Up      14.69 MB  144038903974614862325597275257769797985    |--|
  [cassandra.883477]$
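The disagreement between the two views is easy to spot mechanically. Below is a small sketch that copies the Range and Address values from the two nodeprobe outputs above into dicts and lists the endpoints one node knows about and the other does not. The helper names are made up, and this is just a comparison of the printed output, not a description of how Cassandra gossips ring state.

```python
# Ring views copied from the two nodeprobe outputs: range token -> (address, status).
view_gen_app02 = {
    31124469348629903091013930339840898757: ("172.27.128.186", "Down"),
    64378740291415296162944450043143967518: ("172.27.128.23", "Down"),
    121134220722269938669001112695509564769: ("172.27.128.22", "Up"),
    144038903974614862325597275257769797985: ("172.27.128.185", "Up"),
}
view_vmguest85 = {
    121134220722269938669001112695509564769: ("172.27.128.22", "Up"),
    144038903974614862325597275257769797985: ("172.27.128.185", "Up"),
}


def addresses(view):
    """Set of endpoint addresses that appear in a ring view."""
    return {address for address, _status in view.values()}


def missing_endpoints(reference, other):
    """Addresses present in `reference` but absent from `other`, sorted as text."""
    return sorted(addresses(reference) - addresses(other))


missing = missing_endpoints(view_gen_app02, view_vmguest85)
print(missing)
```

Here the view from vmguest85 lacks exactly the two endpoints that gen-app02 reports as Down, so the two nodes disagree about which ranges exist at all, not just about who is alive.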
 
 
 





Re: Cassandra access control

2009-11-24 Thread Ted Zlatanov
Looks like I could use:

PAM auth: http://jpam.sourceforge.net/

LDAP/AD auth: http://www.openldap.org/jldap/

The first is definitely OK (Apache license), but I'm not sure about the
second one (OpenLDAP public license).  Looks BSDish to me.  It claims to
support Windows auth and is officially provided by the OpenLDAP project.
Has anyone used it?

Thanks
Ted



Re: Cassandra users survey

2009-11-24 Thread Rich Atkinson
We at Platform46 are building an on-premise enterprise appliance which
provides twitter-like, open-follower, short messaging services for
internal corporate networks.

We are using Cassandra, Python and RabbitMQ to help us build a
scalable solution where appliances may be configured as true peers and
installed in different data centers.

--
rich

http://www.platform46.com/


urgent: missing data!

2009-11-24 Thread kevin
hi guys
i have been using this version of cassandra:

Path: .
URL: http://svn.apache.org/repos/asf/incubator/cassandra/trunk
Repository Root: http://svn.apache.org/repos/asf
Repository UUID: 13f79535-47bb-0310-9956-ffa450edef68
Revision: 831540
Node Kind: directory
Schedule: normal
Last Changed Author: jbellis
Last Changed Rev: 831433
Last Changed Date: 2009-10-30 12:45:38 -0700 (Fri, 30 Oct 2009)

./bin/nodeprobe -host localhost info
29814395632524962303611017038378268216
Load : 753.64 MB
Generation No: 1259113951
Uptime (seconds) : 443
Heap Memory (MB) : 121.32 / 12285.94




i just have one server. and the config is here

http://pastie.org/713889

all of a sudden, super columns stored in the column family Supern1 are
disappearing. I tried flushing the cassandra node and restarting it, and
the problem is still there. Any suggestions on how to figure out what the
problem is and how to get the data back?
The size of the cassandra data directory hasn't shrunk.
thanks a lot!


Re: Cassandra users survey

2009-11-24 Thread matthew hawthorne
I work for Comcast, and we have tons of data that we are migrating
into non-relational storage.

we recently evaluated cassandra, riak, voldemort, and hdfs.  I focused
on cassandra, which is why you may have seen me asking dumb questions
over IRC :-)

A few desirables for cassandra:

1) I'm not a huge fan of thrift.  it would be nice if the client jar
came packaged with cassandra  (I had to manually build it from the
thrift-generated java).

also, the lack of streaming support is troubling.  a lot of our
internal services are http, and I'd like to be able to connect a
column's input stream to the output stream of an http response,
instead of loading it all into memory.

2) a practical/situational view of managing a cassandra cluster
(deployment guide, maybe) would be nice.  for my evaluation, I was
seeking answers to questions like:

- how do I add capacity?

- how do I remove capacity? (I believe you're calling it decommissioning)

- what files should I back up?

- how can I mitigate the risk of lost writes during a power failure?

- how can I ensure that my writes go to multiple data centers?
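To illustrate the streaming wish in item 1: the sketch below copies a large value to an output stream in fixed-size chunks, so only one chunk is held in memory at a time. The io.BytesIO objects are stand-ins for a column's input stream and an HTTP response body. Cassandra's Thrift interface does not expose a column as a stream, so this shows only the desired shape, with a made-up helper name.

```python
import io


def stream_copy(source, sink, chunk_size=64 * 1024):
    """Copy from source to sink in fixed-size chunks; return the bytes copied."""
    copied = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # empty read means end of stream
            break
        sink.write(chunk)
        copied += len(chunk)
    return copied


column_value = io.BytesIO(b"x" * 1_000_000)  # stand-in for a large column value
http_body = io.BytesIO()                     # stand-in for an HTTP response stream
bytes_sent = stream_copy(column_value, http_body)
print(bytes_sent)
```

The standard library's shutil.copyfileobj does the same job. The loop is spelled out here to show that peak memory is bounded by chunk_size rather than by the size of the value.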

I think overall the docs are good (I found answers to most of my
questions), but since a lot of groups are analyzing cassandra in this
fashion, and needing to make a sales pitch to management, ops, etc. --
it would be nice to have a more comprehensive deployment guide.

you fellows at Rackspace should consider offering Cassandra support.
I know that the ability to have some paid professionals come in and
train our ops team on how to monitor + manage a cassandra cluster
would have made a huge difference for us.

thanks!

-matt


On Fri, Nov 20, 2009 at 4:17 PM, Jonathan Ellis jbel...@gmail.com wrote:
 Hi all,

 I'd love to get a better feel for who is using Cassandra and what kind
 of applications it is seeing.  If you are using Cassandra, could you
 share what you're using it for and what stage you are at with it
 (evaluation / testing / production)? Also, what alternatives you
 evaluated/are evaluating would be useful.  Finally, feel free to throw
 in "I'd love to use Cassandra if only it did X" wishes. :)

 I can start: Rackspace is using Cassandra for stats collection
 (testing, almost production) and as a backend for the Mail & Apps
 division (early testing).  We evaluated HBase, Hypertable, dynomite,
 and Voldemort as well.

 Thanks,

 -Jonathan

 (If you're in stealth mode or don't want to say anything in public,
 feel free to reply to me privately and I will keep it off the record.)



Re: urgent: missing data!

2009-11-24 Thread kevin
On Tue, Nov 24, 2009 at 6:05 PM, kevin kevincastigli...@gmail.com wrote:


these are the settings in the file cassandra.in.sh:
http://pastie.org/713922