Re: Reason for not allowing null values for in Column

2010-03-08 Thread Erik Holstad
On Mon, Mar 8, 2010 at 10:14 AM, Jonathan Ellis wrote: > On Mon, Mar 8, 2010 at 12:07 PM, Erik Holstad > wrote: > > So why is it again that the value field in the Column cannot be null if > it > > is not the > > value field in the map, but just a part of the value f

Re: Reason for not allowing null values for in Column

2010-03-08 Thread Erik Holstad
On Mon, Mar 8, 2010 at 9:30 AM, Jonathan Ellis wrote: > On Mon, Mar 8, 2010 at 11:22 AM, Erik Holstad > wrote: > > I was probably a little bit unclear here. I'm wondering about the two > byte[] > > in Column. > > One for name and one for value. I w

Re: Reason for not allowing null values for in Column

2010-03-08 Thread Erik Holstad
On Mon, Mar 8, 2010 at 9:10 AM, Jonathan Ellis wrote: > On Mon, Mar 8, 2010 at 11:07 AM, Erik Holstad > wrote: > > Why is it that null column values are not allowed? > > It's semantically unnecessary and potentially harmful at an > implementation level. (Many java

Reason for not allowing null values for in Column

2010-03-08 Thread Erik Holstad
Hey! Been looking at the src and have a couple of questions: Why is it that null column values are not allowed? What is the reason for using a ConcurrentSkipListMap for columns_ in ColumnFamily compared to using the set version and using the comparator to sort on the name field in IColumn? For the

Re: ColumnFamilies vs composite rows in one table.

2010-03-06 Thread Erik Holstad
Thanks David and Jonathan! @David Yes, rows don't have a name; I'm just using the word name for anything, like cluster name, table name, row name etc., that is my bad. Yes, I did change two things, that was probably stupid, but the reason for the second change is space efficiency. You are totall

ColumnFamilies vs composite rows in one table.

2010-03-05 Thread Erik Holstad
What are the benefits of using multiple ColumnFamilies compared to using a composite row name? Example: You have messages that you want to index on sent and to. So you can either have ColumnFamilyFrom:userTo:{userFrom->messageid} ColumnFamilyTo:userFrom:{userTo->messageid} or something like Colu

Re: Storage format

2010-03-02 Thread Erik Holstad
Thank you!

Re: Storage format

2010-03-01 Thread Erik Holstad
On Mon, Mar 1, 2010 at 2:51 PM, Jonathan Ellis wrote: > On Mon, Mar 1, 2010 at 4:49 PM, Erik Holstad > wrote: > > Haha! > > Thanks. Well I'm a little bit worried about this but since the indexes > are > > pretty > > small I don't think it is going to

Re: Storage format

2010-03-01 Thread Erik Holstad
Haha! Thanks. Well, I'm a little bit worried about this, but since the indexes are pretty small I don't think it is going to be too bad. But I was mostly thinking about performance and having the index row as a bottleneck for writing, since the partition is per row. -- Regards Erik

Re: Is Cassandra a document based DB?

2010-03-01 Thread Erik Holstad
Yes, Cassandra has supercolumns and HBase versions and you are probably correct that supercolumns are more used than versions, but I don't really think you can compare them since versions are not a serialized structure. The reason that I didn't include table and family in the mapping is as I've u

Re: Storage format

2010-03-01 Thread Erik Holstad
So that is kind of what I want to do, but I want to go from a row with multiple columns to multiple rows with one column. Maybe I'm not hearing you here and you are trying to tell me that the columns, not supercolumns, are not stored together in a row structure? -- Regards Erik

Re: Is Cassandra a document based DB?

2010-03-01 Thread Erik Holstad
On Mon, Mar 1, 2010 at 4:41 AM, Brandon Williams wrote: > On Mon, Mar 1, 2010 at 5:34 AM, HHB wrote: > >> >> What are the advantages/disadvantages of Cassandra over HBase? >> > > Ease of setup: all nodes are the same. > > No single point of failure: all nodes are the same. > > Speed: http://www.

Re: Storage format

2010-03-01 Thread Erik Holstad
Sorry about that! Continuing: And in that case when using rows as indexes instead of columns we only need to read that specific row and might be more efficient in that case than to read a big row every time? -- Regards Erik

Re: Storage format

2010-03-01 Thread Erik Holstad
ure first need to be deserialized and then we can get the columns we are looking for? And in that case when using rows as indexes instead of columns we only need to read On Mon, Mar 1, 2010 at 11:24 AM, Jonathan Ellis wrote: > On Mon, Mar 1, 2010 at 12:50 PM, Erik Holstad > wrote: >

Storage format

2010-03-01 Thread Erik Holstad
I've been looking at the source, but can't quite find the things I'm looking for, so I have a few questions. Are columns for a row stored in a serialized data structure on disk or stored individually and put into a data structure when the call is being made? Because of the slice query, does that mean

Deleted rows showing up when doing a get_range_slice query

2010-02-24 Thread Erik Holstad
When deleting rows from a table and then using a get_range_slice query, the keys of the deleted rows show up, with no name/value pairs. What is the reasoning behind this? I have also seen a weird issue when using an md5-generated byte[] as a column name; it doesn't seem to actually work. I can't

Re: Getting the keys in your system?

2010-02-24 Thread Erik Holstad
Haha! Yeah, fortunately we are only in the testing phase so this is not that big of a deal. Thanks a lot! -- Regards Erik

Re: Getting the keys in your system?

2010-02-24 Thread Erik Holstad
Thanks Jonathan! We are thinking about moving over to the OPP to be able to do this, and to use an md5 for some of the data just to get the data written to different nodes for some of the cases where order is not really needed. Is there anything we need to think about when making the swi

Getting the keys in your system?

2010-02-24 Thread Erik Holstad
If you have a system set up using the RandomPartitioner and have a couple of indexes set up for your data, but realize that you need to add another index, how do you get the keys for your data, so that you can know where to point your indexes? I guess what I'm really asking is, is there a way to get y

Re: Row with many columns

2010-02-18 Thread Erik Holstad
Hey Rusian! Maybe you should do what Ted suggested, look at what Cassandra is good at and then try to change your data structure from 10 rows with 10 columns to maybe 10 rows with 10 columns each. I think the best way to solve a problem is to look at the tools that you have at hand and try

Re: Using column plus value or only column?

2010-02-02 Thread Erik Holstad
Don't be silly, thanks a lot for helping me out! -- Regards Erik

Re: Using column plus value or only column?

2010-02-02 Thread Erik Holstad
I don't understand what you mean ;) Will see what happens when we are done with this first project, will see if we can get some time to give back. -- Regards Erik

Re: Using column plus value or only column?

2010-02-02 Thread Erik Holstad
s I dont have anything reporting-ish like you describe with > SuperColumns (yet). I will defer to more experienced folks with this. > > Regards, > -Nate > > > On Tue, Feb 2, 2010 at 3:02 PM, Erik Holstad > wrote: > > @Nathan > > So what I'm planning to do i

Re: Using column plus value or only column?

2010-02-02 Thread Erik Holstad
e number of SuperColumns for a key, but make > > sure you understand get_slice vs. get_range_slice before you commit to > > a design. Hopefully I understood your example correctly, if not, do > > you have anything more concrete? > > > > Cheers, > > -Nate > >

Re: Using column plus value or only column?

2010-02-02 Thread Erik Holstad
Thanks Nate for the example. I was thinking more along the lines of something like: If you have a family Data : { row1 : { col1:val1, row2 : { col1:val2, ... } } Using Sorts : { sort_row : { sortKey1_datarow1: [], sortKey2_datarow2: [] } } Instead of Sorts : {

Re: Key/row names?

2010-02-02 Thread Erik Holstad
Thank you! On Tue, Feb 2, 2010 at 9:41 AM, Jonathan Ellis wrote: > On Tue, Feb 2, 2010 at 11:36 AM, Erik Holstad > wrote: > > Is there a way to use a byte[] as the key instead of a string? > > no. > > > If not what is the main reason for using strings for the key

Re: Reverse sort order comparator?

2010-02-02 Thread Erik Holstad
On Tue, Feb 2, 2010 at 9:57 AM, Brandon Williams wrote: > On Tue, Feb 2, 2010 at 11:39 AM, Erik Holstad wrote: > >> >> Wow that sounds really good. So you are saying if I set it to reverse sort >> order and count 10 for the first round I get the last 10, >> for the

Using column plus value or only column?

2010-02-02 Thread Erik Holstad
Sorry that there are a lot of questions from me this week, just trying to better understand the best way to use Cassandra :) Let us say that you know the length of your key, everything is standardized, are there people out there that just tag the value onto the key so that you don't have to pay t
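
A minimal sketch of the idea in this question, assuming a fixed key length: the value is appended to the key to form the column name, and the column's own value is left empty. This is plain Python for illustration, not Cassandra client code; `KEY_LEN`, `pack`, `unpack` and the sample bytes are hypothetical names chosen here.

```python
KEY_LEN = 8  # assumed fixed key length (hypothetical)

def pack(key: bytes, value: bytes) -> bytes:
    """Build a column name by appending the value to a fixed-length key."""
    if len(key) != KEY_LEN:
        raise ValueError("key must be exactly %d bytes" % KEY_LEN)
    return key + value

def unpack(name: bytes):
    """Split a packed column name back into (key, value)."""
    return name[:KEY_LEN], name[KEY_LEN:]

# The column would then be stored as name -> empty value.
name = pack(b"user0001", b"hello")
key, value = unpack(name)
```

Because the key length is fixed, no separator is needed and the split is unambiguous; the trade-off is that the value can no longer be updated without changing the column name.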

Re: Reverse sort order comparator?

2010-02-02 Thread Erik Holstad
On Tue, Feb 2, 2010 at 9:35 AM, Brandon Williams wrote: > On Tue, Feb 2, 2010 at 11:29 AM, Erik Holstad wrote: > >> Thanks guys! >> So I want to use sliceRange but thinking about using the count parameter. >> For example give me >> the first x columns, next call

Key/row names?

2010-02-02 Thread Erik Holstad
Is there a way to use a byte[] as the key instead of a string? If not what is the main reason for using strings for the key but the columns and the values can be byte[]? Is it just to be able to use it as the key in a Map etc or are there other reasons? -- Regards Erik

Re: Reverse sort order comparator?

2010-02-02 Thread Erik Holstad
? On Tue, Feb 2, 2010 at 9:23 AM, Brandon Williams wrote: > On Tue, Feb 2, 2010 at 11:21 AM, Erik Holstad wrote: > >> Hey! >> I'm looking for a comparator that sort columns in reverse order on for >> example bytes? >> I saw that you can write your own comp

Reverse sort order comparator?

2010-02-02 Thread Erik Holstad
Hey! I'm looking for a comparator that sorts columns in reverse order on, for example, bytes. I saw that you can write your own comparator class, but just thought that someone must have done that already. -- Regards Erik
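
For illustration only, the ordering being asked for can be sketched outside Cassandra. This is plain Python, not a Cassandra comparator class (those are written in Java), and `reverse_bytes_cmp` and the sample column names are hypothetical. It inverts the usual unsigned byte-wise order, so taking the first few entries yields the names that would normally sort last.

```python
from functools import cmp_to_key

def reverse_bytes_cmp(a: bytes, b: bytes) -> int:
    """Compare two column names in reverse unsigned byte-wise order."""
    # Python compares bytes objects lexicographically by unsigned byte value,
    # so swapping the sign of the result reverses the order.
    if a < b:
        return 1
    if a > b:
        return -1
    return 0

columns = [b"col-03", b"col-01", b"col-10", b"col-02"]
ordered = sorted(columns, key=cmp_to_key(reverse_bytes_cmp))

# With the order reversed, a "count" of 2 returns the last two names first.
first_two = ordered[:2]
```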

Re: Best design in Cassandra

2010-02-02 Thread Erik Holstad
On Tue, Feb 2, 2010 at 7:45 AM, Brandon Williams wrote: > On Tue, Feb 2, 2010 at 9:27 AM, Erik Holstad wrote: >> >> A supercolumn can still only compare subcolumns in a single way. >>> >> Yeah, I know that, but you can have a super column per sort order without &g

Re: How to retrieve keys from Cassandra ?

2010-02-02 Thread Erik Holstad
Hi Sebastien! I'm totally new to Cassandra, but as far as I know there is no way of getting just the keys that are in the database, they are not stored separately but only with the data itself. Why do you want a list of keys, what are you going to use them for? Maybe there is another way of solvin

Re: Best design in Cassandra

2010-02-02 Thread Erik Holstad
On Mon, Feb 1, 2010 at 3:31 PM, Brandon Williams wrote: > On Mon, Feb 1, 2010 at 5:20 PM, Erik Holstad wrote: > >> Hey! >> Have a couple of questions about the best way to use Cassandra. >> Using the random partitioner + the multi_get calls vs order preservatio

Re: Sample applications

2010-02-02 Thread Erik Holstad
Hi Carlos! I'm also really new to Cassandra but here are a couple of links that I found useful: http://wiki.apache.org/cassandra/ClientExamples http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model and one of the presentations like: http://www.slideshare.net/jhammerb/data-presentations-ca

Best design in Cassandra

2010-02-01 Thread Erik Holstad
Hey! Have a couple of questions about the best way to use Cassandra. Using the random partitioner + the multi_get calls vs order preservation + range_slice calls? What is the benefit of using multiple families vs super column? For example in the case of sorting in different orders. One good thing

Re: Internal structure of api calls

2010-02-01 Thread Erik Holstad
Thanks a lot Brandon!

Internal structure of api calls

2010-02-01 Thread Erik Holstad
Hey guys! I'm totally new to Cassandra and have a couple of questions about the internal structure of some of the calls. When using the slicerange(count) for the get calls, is the actual result truncated on the server or is it happening on the client, i.e. is it more efficient than the regula