Re: error using get_range_slice with random partitioner

2010-08-09 Thread Thomas Heller
…am the only one encountering a bug. My money is on 1), of course. I can check the Thrift API against what my Scala client is calling under the hood. -Adam

Re: error using get_range_slice with random partitioner

2010-08-07 Thread Thomas Heller
On Sat, Aug 7, 2010 at 11:41 AM, Peter Schuller peter.schul...@infidyne.com wrote: Remember the returned results are NOT sorted, so whenever you drop the first result by default, you might be dropping a good one. At least that would be my guess here. Sorry, I may be forgetting something…
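
To make the paging pattern under discussion concrete, here is a minimal Python sketch built around a hypothetical fetch_keys(start_key, count) helper standing in for the real get_range_slices call; the point is to compare against the previous batch's last key instead of blindly dropping the first result of every batch:

    def iterate_all_keys(fetch_keys, batch_size=100):
        """Yield every row key exactly once while paging row batches.

        fetch_keys(start_key, count) is a hypothetical helper that returns
        the row keys of one batch, in the order the server returns them
        (token order under RandomPartitioner, so not sorted by key).
        """
        start_key = ''          # empty start_key means "from the beginning"
        last_key = None
        while True:
            batch = fetch_keys(start_key, batch_size)
            if not batch:
                return
            for key in batch:
                # The first key of a follow-up batch repeats the previous
                # batch's last key (start_key is inclusive) -- skip only that.
                if key == last_key:
                    continue
                yield key
            last_key = batch[-1]
            if len(batch) < batch_size:
                return          # short batch: we have reached the end
            start_key = last_key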

Re: Columns limit

2010-08-07 Thread Thomas Heller
Ok, I think the part I was missing was the concatenation of the key and partition to do the lookups. Is this the preferred way of accomplishing needs such as this? Are there alternative ways? Depending on your needs you can concat the row key or use super columns. How would one then query…
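
As a hypothetical example of the "concat the row key" approach, a short Python sketch of a per-day key scheme (the remote_addr:YYYYMMDD layout is an illustrative choice, not a fixed convention):

    from datetime import date

    def log_row_key(remote_addr, day):
        """Build a per-day row key by concatenating the entity with the
        day bucket, e.g. '192.0.2.1:20100806'. Purely illustrative key
        layout -- any stable, parseable concatenation works."""
        return '%s:%s' % (remote_addr, day.strftime('%Y%m%d'))

    print(log_row_key('192.0.2.1', date(2010, 8, 6)))  # 192.0.2.1:20100806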

Re: error using get_range_slice with random partitioner

2010-08-06 Thread Thomas Heller
Wild guess here, but are you using start_token/end_token when you should be using start_key? Looks to me like you are trying end_token = ''. HTH, /thomas On Thursday, August 5, 2010, Adam Crain adam.cr...@greenenergycorp.com wrote: Hi, I'm on 0.6.4. Previous tickets in the JIRA in…
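
For illustration, a hedged sketch of the two KeyRange variants in question, using the 0.6-era Thrift-generated Python bindings (the import path and field names are recalled from that era and should be treated as assumptions):

    from cassandra.ttypes import KeyRange

    # Key-based range: what row paging normally wants under
    # RandomPartitioner; empty strings mean "unbounded" on both ends.
    key_range = KeyRange(start_key='', end_key='', count=100)

    # Token-based range: walks the token ring directly; start_token and
    # end_token of '0' cover the whole ring. Passing end_token='' where a
    # key-based range was intended is the suspected mistake above.
    token_range = KeyRange(start_token='0', end_token='0', count=100)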

Re: error using get_range_slice with random partitioner

2010-08-06 Thread Thomas Heller
…of the random partitioner. I really don't care about the order of the iteration; what matters is that I see each key exactly once and that I see all keys. -Adam

Re: error using get_range_slice with random partitioner

2010-08-06 Thread Thomas Heller
Another way to do it is to filter results to exclude columns received twice due to being on iteration end points. Well, that depends on the size of your rows; keeping lists of 1mil+ column names will eventually become really slow (at least in Ruby). This is useful because it is not always…
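
A small Python sketch contrasting the two de-duplication strategies mentioned here; the column objects and the surrounding slicing loop are assumed, and the point is only the memory trade-off:

    def dedupe_with_set(columns, seen):
        """(a) Remember every column name seen so far. Simple, but 'seen'
        grows with the row -- 1mil+ names gets slow and memory-hungry."""
        for col in columns:
            if col.name not in seen:
                seen.add(col.name)
                yield col

    def dedupe_with_boundary(columns, last_name_of_previous_slice):
        """(b) Remember only the boundary. The next slice starts
        (inclusively) at the previous slice's last column, so only that
        single name can repeat; one variable replaces the growing set."""
        for col in columns:
            if col.name != last_name_of_previous_slice:
                yield col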

Re: error using get_range_slice with random partitioner

2010-08-06 Thread Thomas Heller
On Sat, Aug 7, 2010 at 1:05 AM, Adam Crain adam.cr...@greenenergycorp.com wrote: I took this approach... reject the first result of subsequent get_range_slice requests. If you look back at the output I posted (below), you'll notice that not all of the 30 keys [key1...key30] get listed! The…

Re: Columns limit

2010-08-06 Thread Thomas Heller
Howdy, thought I'd jump in here. I did something similar, meaning I had lots of items coming in per day and wanted to partition them somehow to avoid running into the column limit (it was also logging related). The solution was pretty simple: log data is immutable, so no SuperColumn is needed.
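
One possible shape of that layout, sketched in Python: one row per day, one column per immutable log entry, with a time-ordered column name (a version-1 UUID here, as would match a TimeUUIDType comparator); the helper and the exact layout are assumptions for illustration:

    import time
    import uuid

    def log_entry_column(entry_bytes):
        """One column per log entry: a time-ordered name (version-1 UUID,
        as a TimeUUIDType comparator would expect) and the immutable log
        line as the value. Returns (name, value, timestamp) for whatever
        insert call the client exposes."""
        name = uuid.uuid1().bytes            # time-ordered, unique per entry
        value = entry_bytes                  # written once, never updated
        timestamp = int(time.time() * 1e6)   # microsecond client timestamp
        return name, value, timestamp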

Re: Columns limit

2010-08-06 Thread Thomas Heller
Thanks for the suggestion. I somewhat understand all that; the point where my head begins to explode is when I want to figure out something like, continuing with your example: over the last X days, give me all the logs for remote_addr:XXX. I'm guessing I would need to create a…
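
Continuing that hypothetical per-day key scheme, a sketch of the read side: for "the last X days for remote_addr:XXX" the row keys can be generated up front and handed to a multiget-style call, so no range scan is needed (the client and helper names are assumptions):

    from datetime import date, timedelta

    def keys_for_last_days(remote_addr, days, today=None):
        """Row keys to read for 'the last X days of logs for remote_addr',
        assuming the per-day key scheme sketched earlier."""
        today = today or date.today()
        return ['%s:%s' % (remote_addr,
                           (today - timedelta(days=i)).strftime('%Y%m%d'))
                for i in range(days)]

    # With one row per day there is no range scan: the keys are known up
    # front and can be handed to a multiget-style call (hypothetical
    # client wrapper shown in the comment only).
    # rows = client.multiget_slice(keys_for_last_days('192.0.2.1', 7), ...)
    print(keys_for_last_days('192.0.2.1', 3, today=date(2010, 8, 7)))
    # -> ['192.0.2.1:20100807', '192.0.2.1:20100806', '192.0.2.1:20100805']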

Re: Iterate all keys - doing it as the faq fails for me :(

2010-07-13 Thread Thomas Heller
I'm not entirely sure, but I think you can only use get_range_slices with start_key/end_key on a cluster using OrderPreservingPartitioner. Don't know if that is intentional or buggy, as Jonathan suggests, but I saw the same duplicates behaviour when trying to iterate all rows using RP and…

Learning-by-doing (also announcing a new Ruby Client Codename: Greek Architect)

2010-06-18 Thread Thomas Heller
Howdy! So, last week I finally got around to playing with Cassandra. After a while I felt I understood the basics. To test that assumption I started working on my own client implementation, since Learning-by-doing is what I do and the existing Ruby clients (which are awesome) already abstracted too much for…

Beginner Assumptions

2010-06-12 Thread Thomas Heller
Hey, I've been planning to play around with Cassandra for quite some time and finally got around to it. I like what I've seen/used so far a lot, but my SQL brain keeps popping up and trying to convince me that SQL is fine. Anyway, I want to store some (a lot of) time-series data in Cassandra and…