R: Re: Sorting in Cassandra

2010-10-06 Thread cbert...@libero.it
Aaron, first of all thanks for your time. 1. You cannot return just the super columns, you have to get their sub columns as well. The returned data is ordered, please provide an example of where it is not. I don't know what I did before but now I checked and data are sorted as I expected

Re: Cassandra + Pig + PHP

2010-10-06 Thread Petr Odut
Hi, PHP: I basically need to start a Pig program from a PHP script (via Thrift or something?). PIG: there is a LoadFunc that loads data from Cassandra; is there also a StoreFunc? On Tue, Oct 5, 2010 at 9:22 PM, Aaron Morton aa...@thelastpickle.com wrote: There is an example for pig in contrib/pig

Null Pointer Exception / Secondary Indices

2010-10-06 Thread J T
Hi, I've been battling against some errors that only seem to crop up when I'm messing around with secondary indices in 0.7-beta2. Namely I seem to get errors like this start to happen, after I 'delete' a row in a CF that has a couple of secondary indices on it and then at some point later try to

Re: Cassandra + Pig + PHP

2010-10-06 Thread Jeff Zhang
Pig does not have a Thrift interface, but I believe you can create one. Another way, I think, is to create a web service for your Pig service and call the web service from your PHP. On Wed, Oct 6, 2010 at 4:17 PM, Petr Odut petr.o...@gmail.com wrote: Hi, PHP: I basically need to start a Pig program

Retaining commit logs

2010-10-06 Thread Narendra Sharma
Cassandra Version: 0.6.5. I am running a long-duration test and I need to keep the commit log to see the sequence of operations to debug a few application issues. Is it possible to retain the commit logs? Apart from increasing the value of CommitLogRotationThresholdInMB, what is the other way to

Re: R: Re: Sorting in Cassandra

2010-10-06 Thread Aaron Morton
You're sort of right for point two. The comparators you define in the keyspace def are for the names of the columns (or super columns), not their values. So it's not possible to sort by the value of your name column; you'll need to do it client side. The indexing features in 0.7 can sort the
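Aaron's point is that comparators order column names, not values. Sorting by a value therefore has to happen in the client. Below is a minimal Python sketch. It assumes the client already holds the super columns as (name, {sub column: value}) pairs. That shape and the function name are illustrative and are not any client library's API:

```python
from typing import Dict, List, Tuple

# Assumed client-side shape: (super column name, {sub column name: value}).
SuperColumn = Tuple[str, Dict[str, str]]


def sort_by_subcolumn(super_columns: List[SuperColumn], sub: str) -> List[SuperColumn]:
    """Order super columns by the value of one sub column.

    The CF comparator only orders names, so ordering by a value is done
    after the data is fetched. Super columns lacking the sub column go last.
    """
    return sorted(
        super_columns,
        key=lambda sc: (sub not in sc[1], sc[1].get(sub, "")),
    )


rows = [
    ("id-3", {"name": "carla"}),
    ("id-1", {"name": "bruno"}),
    ("id-2", {"name": "anna"}),
]
print([name for name, _ in sort_by_subcolumn(rows, "name")])  # ['id-2', 'id-1', 'id-3']
```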

Re: Retaining commit logs

2010-10-06 Thread Aaron Morton
If you turn the log level up to DEBUG that will include information about each request. Would that help? You could restrict it by setting a logging configuration for the specific classes that output the message you are interested in. Not sure about retaining the commit logs. Aaron On 6 Oct

Re: Cassandra + Pig + PHP

2010-10-06 Thread Jeremy Hanna
PHP: I basically need to start a Pig program from a PHP script (via Thrift or something?) Can't you just execute a Pig script from PHP by calling Pig with a PHP exec function call? I'm not sure what you're trying to do with it, but that's one way you could do it. PIG: there is a LoadFunc

Re: Cassandra + Pig + PHP

2010-10-06 Thread Aaron Morton
AFAIK you can submit a Pig job to the Hadoop job server via the Pig command line interface. If you have not done so already, have a read of the Hadoop book; it discusses Pig as well: http://bit.ly/9gGRyH Not sure how you go about monitoring the Hadoop job though. There is support for hadoop to

Re: Cassandra + Pig + PHP

2010-10-06 Thread Jeremy Hanna
Yes - the HadoopSupport should be updated for the functionality that is added to 0.7. It's still a little in flux. There is an output format and output streaming support on trunk/0.7 beta2. The output format has a java example in the contrib/word_count example code. The output streaming,

Re: Tuning cassandra to use less memory

2010-10-06 Thread Oleg Anastasyev
Hi All, We're currently starting to get OOM exceptions in our cluster. I'm trying to push the limitations of our machines. Currently we have 1.7 GB of memory (ec2-medium). I'm wondering if, by tweaking some of Cassandra's configuration settings, it is possible to make it live in peace with less memory.

Re: Retaining commit logs

2010-10-06 Thread Oleg Anastasyev
Is it possible to retain the commit logs? In off-the-shelf Cassandra 0.6.5 this is not possible, AFAIK. I developed a patch that we use internally in our company for commit log archiving and replay. I can share the patch with you, if you dare to patch the Cassandra sources yourself ;-) PS. Are

Read Latency

2010-10-06 Thread Wayne
I have been seeing some strange trends in read latency that I wanted to throw out there to find some explanations. We are running .6.5 in a 10 node cluster rf=3. We find that the read latency reported by the cfstats is always about 1/4 of the actual time it takes to get the data back to the python

Column TTL

2010-10-06 Thread Dan Hendry
Hi, I have a quick and quite frankly ridiculous question regarding the column TTL value: what are the time units? Milliseconds, seconds, or something else? I initially thought milliseconds, given that it is Java and that is what timestamps are in, but the data type used in the setTtl() Java

Re: Column TTL

2010-10-06 Thread Michal Augustýn
Hi, I checked Cassandra.thrift file and found: @param ttl. An optional, positive delay (in seconds) after which the column will be automatically deleted. Augi 2010/10/6 Dan Hendry d...@ec2.dustbunnytycoon.com Hi, I have a quick and quite frankly ridiculous question regarding the column
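Here is a small Python sketch of the unit conversion. It assumes you start from a retention period. `ttl_seconds` is a hypothetical helper, and its result is the integer that would go into the column's ttl field:

```python
from datetime import timedelta


def ttl_seconds(retention: timedelta) -> int:
    """Convert a retention period into a column TTL.

    Per the Cassandra.thrift comment quoted above, ttl is a positive delay
    in seconds (not milliseconds) after which the column is deleted.
    """
    seconds = int(retention.total_seconds())
    if seconds <= 0:
        raise ValueError("ttl must be a positive number of seconds")
    return seconds


# A column that should expire after one day:
print(ttl_seconds(timedelta(days=1)))  # 86400
```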

RE: Column TTL

2010-10-06 Thread Dan Hendry
Ah, great, thanks. I was looking under trunk/src/java/... instead of trunk/interface/... Dan From: Michal Augustýn [mailto:augustyn.mic...@gmail.com] Sent: October-06-10 10:38 To: user@cassandra.apache.org Subject: Re: Column TTL Hi, I checked Cassandra.thrift file and found:

RE: Null Pointer Exception / Secondary Indices

2010-10-06 Thread Stu Hood
Hey JT, I believe this issue should be fixed by CASSANDRA-1571... if you're able to test that patch, it would be very helpful. Thanks, Stu -Original Message- From: J T jt4websi...@googlemail.com Sent: Tuesday, October 5, 2010 9:50pm To: cassandra-u...@incubator.apache.org Subject: Null

get keys based on values??

2010-10-06 Thread Brayton Thompson
Ok, I am VERY new to Cassandra and trying to get my head around its core ideas. So let's say I have a CF of Users that contains all the info I would ever want to know about them. One day I decide (for some reason) that I want to send a mass email to

Re: Retaining commit logs

2010-10-06 Thread Peter Schuller
PS. Are other ppl interested in this functionality ? I could file it to JIRA as well... I was about to post that such a thing was useful for point-in-time recovery before reading your post, so yes :) -- / Peter Schuller

Re: Tuning cassandra to use less memory

2010-10-06 Thread Utku Can Topçu
Hi Oleg, I've also been looking into this after some research. I've been tackling it with: 1. Setting the default max and min heap from 1G to 1500M. 2. I'm not using row caches, and the key caches are set to 1000; before, they were 200K by default. 3. I've lowered the memtable throughput to 32MB. 4.

Re: Query on sstable2json - possible bug

2010-10-06 Thread Narendra Sharma
Has anyone used sstable2json on 0.6.5 and noticed the issue I described in my email below? This doesn't look like a data corruption issue, as sstablekeys shows the keys. Thanks, Naren On Tue, Oct 5, 2010 at 8:09 PM, Narendra Sharma narendra.sha...@gmail.com wrote: 0.6.5 -Naren On Tue, Oct

Re: Retaining commit logs

2010-10-06 Thread Narendra Sharma
Thanks Oleg! Could you please share the patch? I have built Cassandra from source before, so I can definitely give it a try. -Naren On Wed, Oct 6, 2010 at 3:55 AM, Oleg Anastasyev olega...@gmail.com wrote: Is it possible to retain the commit logs? In off-the-shelf cassandra 0.6.5 this is not

Newbie Question about restarting Cassandra

2010-10-06 Thread Alberto Velandia
Hi, I've stopped Cassandra by hitting Ctrl + Z, tried to restart it, and got this message: INFO 11:46:16,039 JNA not found. Native methods will be disabled. INFO 11:46:16,159 DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap ERROR 11:46:16,449 Fatal exception during

Re: Newbie Question about restarting Cassandra

2010-10-06 Thread Norman Maurer
CTRL + Z does not stop a program, it just suspends it. You will need to resume it with fg and then hit CTRL + C to stop it. For some basic background: http://linuxreviews.org/beginner/jobs/ Bye, Norman 2010/10/6 Alberto Velandia b...@yogadigital.net: Hi I've stopped cassandra hitting Ctrl + Z

Re: Newbie Question about restarting Cassandra

2010-10-06 Thread Alberto Velandia
So, is Ctrl + C how you stop Cassandra, or am I better off doing it another way? Thanks On Oct 6, 2010, at 11:59 AM, Norman Maurer wrote: CTRL + Z does not stop a program, it just suspends it. You will need to resume it with fg and then hit CTRL + C to stop it. For some basic background:

API mismatch Cassandra and Pycassa versions

2010-10-06 Thread Dipti Mathur
Hi All, I was trying to connect to Cassandra using the pycassa module. It looks like there is an API version mismatch. Any ideas where I can get the right version of the APIs? I am using: INFO 22:11:50,860 Cassandra version: 0.7.0-beta2 INFO 22:11:50,861 Thrift API version: 17.1.0 Error message on

Re: get keys based on values??

2010-10-06 Thread Tyler Hobbs
If you're interested in only checking part of a column's value, you can generally just store that part of the value in a different column. So, have an email_addr column and an email_domain column, which stores aol.com, for example. Then you can just use a secondary index on the email_domain
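Tyler's suggestion amounts to denormalizing at write time. Below is a minimal Python sketch. It uses the column names email_addr and email_domain from his message. The helper name and the lowercasing of the domain are illustrative choices, not something the thread specifies:

```python
from typing import Dict


def user_columns(email: str) -> Dict[str, str]:
    """Build the columns to store for a user, with the domain split out.

    Keeping the domain in its own column (email_domain) lets an index on
    that column answer "all users at aol.com" without parsing values.
    """
    local, sep, domain = email.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {email!r}")
    return {"email_addr": email, "email_domain": domain.lower()}


print(user_columns("someone@AOL.com"))
# {'email_addr': 'someone@AOL.com', 'email_domain': 'aol.com'}
```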

Re: API mismatch Cassandra and Pycassa versions

2010-10-06 Thread Tyler Hobbs
Hmm, I thought the Thrift API was moved to 18 before beta2 was released. I'll make a matching release for pycassa in just a moment. Thanks for the notice. By the way, there is a pycassa specific mailing list, pycassa-disc...@googlegroups.com - Tyler On Wed, Oct 6, 2010 at 12:13 PM, Dipti
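The mismatch above is between major API versions (17 vs. 18), so a client can check for it up front. This sketch assumes that only the major version has to match. The server's string could come from the Thrift describe_version() call. The helper names are hypothetical:

```python
from typing import Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """Turn a dotted API version like '17.1.0' into (17, 1, 0)."""
    return tuple(int(part) for part in v.split("."))


def compatible(server_api: str, client_api: str) -> bool:
    """Assumed rule: client and server must share the major API version."""
    return parse_version(server_api)[0] == parse_version(client_api)[0]


print(compatible("17.1.0", "18.0.0"))  # False
print(compatible("17.1.0", "17.0.0"))  # True
```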

Re: get keys based on values??

2010-10-06 Thread Brayton Thompson
Are secondary indexes available in 0.6.5, or are they only in 0.7? On Oct 6, 2010, at 1:15 PM, Tyler Hobbs wrote: If you're interested in only checking part of a column's value, you can generally just store that part of the value in a different column. So, have an email_addr column and a

Re: get keys based on values??

2010-10-06 Thread Norman Maurer
Only in 0.7 Bye, Norman 2010/10/6 Brayton Thompson thomp...@grnoc.iu.edu: Are secondary indexes available in 0.6.5, or are they only in 0.7? On Oct 6, 2010, at 1:15 PM, Tyler Hobbs wrote: If you're interested in only checking part of a column's value, you can generally just store that part

Re: get keys based on values??

2010-10-06 Thread Matthew Dennis
As Norman said, secondary indexes are only in .7 but you can create standard indexes in both .6 and .7 Basically have a email_domain_idx CF where the row key is the domain and the column names have the row id of the user (the column value is unused in this scenario). This sounds basically like
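Matthew's hand-built index can be modelled with two dictionaries standing in for the column families. This is an in-memory Python sketch, not pycassa code. The CF name email_domain_idx comes from his message. insert_user and users_in_domain are hypothetical:

```python
from collections import defaultdict
from typing import Dict, List

# In-memory stand-ins for two column families:
#   Users:            row key = user id, columns = user attributes
#   email_domain_idx: row key = domain,  column names = user ids (values unused)
users: Dict[str, Dict[str, str]] = {}
email_domain_idx: Dict[str, Dict[str, bytes]] = defaultdict(dict)


def insert_user(user_id: str, email: str) -> None:
    old = users.get(user_id)
    if old is not None:
        # The client maintains the index, so it must drop stale entries too.
        email_domain_idx[old["email_domain"]].pop(user_id, None)
    domain = email.rpartition("@")[2].lower()
    users[user_id] = {"email_addr": email, "email_domain": domain}
    email_domain_idx[domain][user_id] = b""  # column name carries the data


def users_in_domain(domain: str) -> List[str]:
    # One index-row read instead of scanning every user row.
    return sorted(email_domain_idx.get(domain.lower(), {}))


insert_user("u1", "a@aol.com")
insert_user("u2", "b@example.org")
insert_user("u3", "c@aol.com")
print(users_in_domain("aol.com"))  # ['u1', 'u3']
```

In a real cluster the user write and the index write go to different rows. A failure between them can therefore leave the index stale, and the application has to tolerate or repair that.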

Re: Null Pointer Exception / Secondary Indices

2010-10-06 Thread J T
Hi, On a first pass, that patch seems to have solved the problem. I'll be testing that functionality repeatedly in the next day or so, and I'll let you know how it fares. Thanks Jason On Wed, Oct 6, 2010 at 4:06 PM, Stu Hood stu.h...@rackspace.com wrote: Hey JT, I believe this issue should be

Re: Tuning cassandra to use less memory

2010-10-06 Thread Rob Coli
On 10/6/10 9:05 AM, Utku Can Topçu wrote: The nodes are still swapping, even though the swappiness is set to zero right now. After swapping comes the OOM. https://issues.apache.org/jira/browse/CASSANDRA-1214 ? =Rob

Re: Newbie Question about restarting Cassandra

2010-10-06 Thread Scott Mann
Yes. ctrl-C if running in the foreground. Use kill pid, if running in the background (see the man page for kill if you are unfamiliar with it). Killing Cassandra is the only way to terminate it. On Wed, Oct 6, 2010 at 11:03 AM, Alberto Velandia b...@yogadigital.net wrote: So, is ctrl + C how you
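Norman's and Scott's points can be demonstrated with a throwaway child process standing in for Cassandra. This is a POSIX-only Python sketch. It uses SIGSTOP as a stand-in for the SIGTSTP that Ctrl+Z sends, because SIGSTOP cannot be ignored:

```python
import os
import signal
import subprocess
import sys

# A throwaway child process standing in for Cassandra.
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

# Ctrl+Z sends SIGTSTP to the foreground job; SIGSTOP has the same effect
# here. The process is paused, not terminated.
os.kill(proc.pid, signal.SIGSTOP)
print(proc.poll())  # None: the process still exists

# "fg" resumes a stopped job by sending SIGCONT...
os.kill(proc.pid, signal.SIGCONT)
# ...and a plain "kill <pid>" sends SIGTERM, which ends it.
os.kill(proc.pid, signal.SIGTERM)
print(proc.wait() == -signal.SIGTERM)  # True
```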

Re: get keys based on values??

2010-10-06 Thread Brayton Thompson
Ok, let me tweak the scenario a tiny bit. What if I wanted something extremely arbitrary, for instance... simple comparisons like a WHERE clause in SQL get Users.someuser['uuid'] where Users.someuser['age']33 From what I've read, this functionality defeats the point of Cassandra

Strange Behavior : Commitlog data is not flushed

2010-10-06 Thread Rana Aich
Hello Experts, I see a queer behavior from one of the Cassandra nodes in my cluster, where the data is not flushed off the commitlogs and the commitlog files grow in number. I was inserting data into the cluster, and since yesterday this node has had more than 900 commitlog files. -rw-r--r-- 1 dev dev

Re: Strange Behavior : Commitlog data is not flushed

2010-10-06 Thread Jonathan Ellis
Commitlog segments remain until all the data in them has been flushed. Reduce MemtableFlushAfterMinutes. If I had to guess without your error log why the node went down, I would guess you exceeded the open file handle allowance. You can increase that with the standard ulimit or
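The following toy Python model shows why segments stay around. It is an assumed simplification, not Cassandra's implementation. Each segment tracks which column families still have unflushed data in it, and a segment can only be deleted once that set is empty. The CF names are made up:

```python
from typing import List, Set


class CommitLogModel:
    """Toy model of commit log segment retention (not Cassandra's code).

    Each segment remembers which column families wrote into it and have not
    flushed since. A segment can be deleted only when that set is empty.
    """

    def __init__(self) -> None:
        self.segments: List[Set[str]] = [set()]  # last entry = active segment

    def write(self, cf: str) -> None:
        self.segments[-1].add(cf)

    def roll(self) -> None:
        # Start a new active segment, as the rotation threshold would.
        self.segments.append(set())

    def flush(self, cf: str) -> None:
        # Flushing a CF's memtable persists everything it held, so the CF
        # is no longer dirty in any segment.
        for dirty in self.segments:
            dirty.discard(cf)

    def deletable(self) -> List[int]:
        # Indexes of closed segments with no unflushed data left in them.
        return [i for i, dirty in enumerate(self.segments[:-1]) if not dirty]


log = CommitLogModel()
log.write("Users")
log.write("AuditLog")  # segment 0 holds data for both CFs
log.roll()
log.write("Users")     # segment 1 holds data for Users only
log.roll()
log.flush("Users")
print(log.deletable())  # [1]: segment 0 is pinned by AuditLog's memtable
log.flush("AuditLog")   # e.g. forced by MemtableFlushAfterMinutes
print(log.deletable())  # [0, 1]
```

A rarely written CF whose memtable never fills up pins every segment it touched. Lowering MemtableFlushAfterMinutes bounds how long that can last.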

Re: get keys based on values??

2010-10-06 Thread Morten Wegelbye Nissen
So would my best bet be to simply get ALL of my users' uuids and ages, then throw away all of those that do not meet the required test? And in fact this is also what a traditional database does when it needs a table scan. And this will happen if you have not prepared an index on that column. (
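Morten's "table scan" in client form means reading every row and keeping the keys whose columns pass a test. This Python sketch assumes rows arrive as (key, columns) pairs, for example from paging through the CF with get_range_slices, and that values have been decoded to strings. The age > 33 condition is purely illustrative:

```python
from typing import Callable, Dict, Iterable, Iterator, Tuple

Row = Tuple[str, Dict[str, str]]  # (row key, columns) as the client holds them


def scan(rows: Iterable[Row], predicate: Callable[[Dict[str, str]], bool]) -> Iterator[str]:
    """Client-side table scan: yield keys of rows whose columns match.

    Every row is still read from the cluster; filtering only spares the
    caller from keeping rows it does not want.
    """
    for key, columns in rows:
        if predicate(columns):
            yield key


sample = [
    ("u1", {"uuid": "u1", "age": "41"}),
    ("u2", {"uuid": "u2", "age": "29"}),
    ("u3", {"uuid": "u3", "age": "35"}),
]
# Values are strings here, so convert before comparing numerically.
print(list(scan(sample, lambda c: int(c.get("age", "0")) > 33)))  # ['u1', 'u3']
```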

Re: get keys based on values??

2010-10-06 Thread Jonathan Ellis
On Wed, Oct 6, 2010 at 1:49 PM, Brayton Thompson thomp...@grnoc.iu.edu wrote: Ok, let me tweak the scenario a tiny bit. What if I wanted something extremely arbitrary, for instance... simple comparisons like a WHERE clause in SQL get Users.someuser['uuid'] where Users.someuser['age']    33

Re: get keys based on values??

2010-10-06 Thread Brayton Thompson
Ok, Thank you all. More reading to do :) On Oct 6, 2010, at 3:21 PM, Jonathan Ellis wrote: On Wed, Oct 6, 2010 at 1:49 PM, Brayton Thompson thomp...@grnoc.iu.edu wrote: Ok, let me tweak the scenario a tiny bit. What if I wanted something extremely arbitrary, for instance... simple

Re: Tuning cassandra to use less memory

2010-10-06 Thread Aaron Morton
There is an explanation of how to lock the JVM into memory here: http://www.riptano.com/blog/whats-new-cassandra-065 However, from the JVM Heap Size section here: http://wiki.apache.org/cassandra/MemtableThresholds For a rough rule of thumb, Cassandra's internal datastructures will require

Re: Newbie Question about restarting Cassandra

2010-10-06 Thread Matthew Dennis
Some relevant reading if you're interested: http://dslab.epfl.ch/pubs/crashonly/ http://web.archive.org/web/20060426230247/http://crash.stanford.edu/ On Wed, Oct 6, 2010 at 1:46 PM, Scott Mann sdm...@gmail.com wrote: Yes. ctrl-C if running in the foreground. Use kill pid, if running in the

Re: Retaining commit logs

2010-10-06 Thread Matthew Dennis
PS. Are other ppl interested in this functionality ? I could file it to JIRA as well... Yes, please file it to Jira. It seems like it would be pretty useful for various things and fairly easy to change the code to move it to another directory whenever C* thinks it should be deleted...
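Below is Matthew's suggestion, moving a segment to another directory instead of deleting it, sketched in Python for illustration only. Cassandra itself is written in Java, and this is not Oleg's patch. retire_segment is a hypothetical hook, and the directory layout is assumed:

```python
import shutil
import tempfile
from pathlib import Path


def retire_segment(segment: Path, archive_dir: Path) -> Path:
    """Hypothetical hook: archive a fully flushed segment instead of deleting it."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / segment.name
    if target.exists():
        # Never overwrite an already archived segment.
        raise FileExistsError(str(target))
    return Path(shutil.move(str(segment), str(target)))


# Demo in a scratch directory (the file name is made up).
with tempfile.TemporaryDirectory() as scratch:
    seg = Path(scratch) / "CommitLog-example.log"
    seg.write_bytes(b"mutations")
    archived = retire_segment(seg, Path(scratch) / "archive")
    print(seg.exists(), archived.read_bytes())  # False b'mutations'
```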

Re: Re: Sorting in Cassandra

2010-10-06 Thread Matthew Dennis
The SCs are stored on disk in the order defined by the compareWith setting so if you want them back in a different order either someone is sorting them (C*, which doesn't sort them right now, or the client; which doesn't make much of a difference, it's just moving the load around) or you're

Re: Read Latency

2010-10-06 Thread Aaron Morton
That's a lot of questions, I'll try to answer some... Read/Write latency as reported for a CF is the time taken to perform a local read on that node. Read/Write latency reported on the o.a.c.service.StorageProxy is the time taken to process a complete request, including local and remote reads when CL
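To compare against the CF and StorageProxy numbers, it helps to measure what the client actually waits for. Below is a minimal Python sketch. `timed` is a hypothetical helper:

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def timed(call: Callable[[], T]) -> Tuple[T, float]:
    """Run call() and return (result, elapsed milliseconds) as the client sees it.

    This end-to-end figure includes coordinator work, any remote reads the
    consistency level requires, serialization and network time, none of
    which is part of a CF's local read latency.
    """
    start = time.perf_counter()
    result = call()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms
```

With a pycassa ColumnFamily object `cf`, for example, this would be `timed(lambda: cf.get(key))`. The gap between that figure and the CF read latency is time spent outside the local read.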

Re: Newbie Question about restarting Cassandra

2010-10-06 Thread Rob Coli
On 10/6/10 1:13 PM, Aaron Morton wrote: To shutdown cleanly, say in a production system, use nodetool drain first. This will flush the memtables and put the node into a read only mode, AFAIK this also gives the other nodes a faster way of detecting the node is down via the drained node gossiping

Re: atomic test-or-set

2010-10-06 Thread Simon Reavely
Ryan, independent of this ambiguous requirement, what were you thinking about? What I am trying to ask is: can you be more specific/concrete about when you can Simon Reavely On Oct 5, 2010, at 11:30 AM, Ryan King r...@twitter.com wrote: On Tue, Oct 5, 2010 at 8:23 AM, Ian Rogers

Re: Query on sstable2json - possible bug

2010-10-06 Thread Jonathan Ellis
can you tar.gz the filter/index/data files for this sstable and attach it to a ticket so we can debug? if you can't make the data public you can send it to me off list and I can have a look. On Wed, Oct 6, 2010 at 11:37 AM, Narendra Sharma narendra.sha...@gmail.com wrote: Has any one used

Does the secondary index in 0.7 cost extra space like an extra ColumnFamily?

2010-10-06 Thread Alvin UW
Hello, Before 0.7, we could actually create an extra ColumnFamily as a secondary index if we needed one. I was wondering whether the secondary index mechanism in 0.7 is just like creating an extra ColumnFamily as an index. The only difference would be that users don't take care of the maintenance of the

get_range_slices problem with super columns

2010-10-06 Thread Jianing Hu
I'm seeing cases where the count in the SliceRange predicate is not respected. This is only happening for super columns. I'm running Cassandra 0.6.4 on a single node. Steps to reproduce, using the Keyspace1.Super1 CF: * insert three super columns, bar1, bar2, and bar3, under the same key * delete

Re: Newbie Question about restarting Cassandra

2010-10-06 Thread Matthew Dennis
Rob is correct. drain is really there for when you need the commit log to be empty (some upgrades, or a complete backup of a shut-down cluster). There really is no point in using it to shut down C* normally, just kill it... On Wed, Oct 6, 2010 at 4:18 PM, Rob Coli rc...@digg.com wrote: On

Re: Does the secondary index in 0.7 cost extra space like an extra ColumnFamily?

2010-10-06 Thread Matthew Dennis
Creating indexes takes extra space (does in MySQL, PGSQL, etc too). https://issues.apache.org/jira/browse/CASSANDRA-749 has quite a bit of detail about how the secondary indexes currently work. On Wed, Oct 6, 2010 at 7:17 PM, Alvin UW alvi...@gmail.com wrote: Hello, Before 0.7, actually we

How is data propagated

2010-10-06 Thread MK
Say I have a cluster of N nodes and I have started all the nodes with a replication factor of N. So effectively all data is being mirrored everywhere. Now, when I write to a node, how does this data get propagated to the remaining N-1 nodes. 1) Does this one origin node do N-1 network operations

Re: How is data propagated

2010-10-06 Thread Jonathan Ellis
the former, but also see http://issues.apache.org/jira/browse/CASSANDRA-1530 On Wed, Oct 6, 2010 at 9:22 PM, MK stardust...@gmail.com wrote: Say I have a cluster of N nodes and I have started all the nodes with a replication factor of N. So effectively all data is being mirrored everywhere.