commitlog replay missing data

2011-07-11 Thread Jeffrey Wang
Hey all, Recently upgraded to 0.8.1 and noticed what seems to be missing data after a commitlog replay on a single-node cluster. I start the node, insert a bunch of stuff (~600MB), stop it, and restart it. There are log messages pertaining to the commitlog replay and no errors, but some of the

hinted handoff sleeping

2011-06-23 Thread Jeffrey Wang
Hey all, We're running a slightly patched version of 0.7.3 on a cluster of 5 nodes. I've been noticing a number of messages in our logs which look like this (after a node goes down and comes back up, usually just due to a GC): 2011-06-23 14:46:35,381 INFO [HintedHandoff:1]

RE: hinted handoff sleeping

2011-06-23 Thread Jeffrey Wang
: Re: hinted handoff sleeping On Thu, Jun 23, 2011 at 2:55 PM, Jeffrey Wang jw...@palantir.com wrote: Hey all, We’re running a slightly patched version of 0.7.3 on a cluster of 5 nodes. I’ve been noticing a number of messages in our logs which look like this (after a node goes “down

multiple clusters communicating

2011-06-06 Thread Jeffrey Wang
Hey all, We're seeing a strange issue involving two completely separate clusters (0.7.3) on the same subnet (X.X.X.146 through X.X.X.150): one with 3 machines (146-148) and one with 2 machines (149-150). Both of them are seeded with the respective machines in their cluster, yet when we run them they end up

RE: pig + hadoop

2011-04-19 Thread Jeffrey Wang
Did you set PIG_RPC_PORT in your hadoop-env.sh? I was seeing this error for a while before I added that. -Jeffrey From: pob [mailto:peterob...@gmail.com] Sent: Tuesday, April 19, 2011 6:42 PM To: user@cassandra.apache.org Subject: Re: pig + hadoop Hey Aaron, I read it, and all of 3 env

DatabaseDescriptor.defsVersion

2011-04-15 Thread Jeffrey Wang
Hey all, I've been seeing a very rare issue with schema change conflicts on 0.7.3 (I am serializing all schema changes to a single Cassandra node and waiting for them to finish before continuing). Occasionally a node in the cluster will never report the correct schema, and I think it may have

RE: DatabaseDescriptor.defsVersion

2011-04-15 Thread Jeffrey Wang
Done: https://issues.apache.org/jira/browse/CASSANDRA-2490 -Jeffrey -----Original Message----- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Friday, April 15, 2011 7:39 PM To: user@cassandra.apache.org Cc: Jeffrey Wang Subject: Re: DatabaseDescriptor.defsVersion I think you found a bug

RE: pig counting question

2011-03-25 Thread Jeffrey Wang
. The pig model is that you can have huge bags that don't kill you on memory but they are just slower because they spill to disk. What is the schema that you impose when you load the data? On Mar 24, 2011, at 3:57 PM, Jeffrey Wang wrote: It looks like this functionality is not in the 0.7.3 version

RE: pig counting question

2011-03-25 Thread Jeffrey Wang
- From: Jeffrey Wang [mailto:jw...@palantir.com] Sent: Friday, March 25, 2011 11:42 AM To: user@cassandra.apache.org Subject: RE: pig counting question I don't think it's Pig running out of memory, but rather Cassandra itself (the data doesn't even make it to Pig). get_range_slices() is called

RE: pig counting question

2011-03-24 Thread Jeffrey Wang
, like so: rows = LOAD 'cassandra://Keyspace/ColumnFamily' USING CassandraStorage(4096); or whatever value you wish. Give that a try and see if it gives you more of what you're looking for. On Mar 24, 2011, at 1:16 PM, Jeffrey Wang wrote: Hey all, I'm trying to run a very simple Pig script

RE: running all unit tests

2011-03-15 Thread Jeffrey Wang
[mailto:aa...@thelastpickle.com] Sent: Tuesday, March 15, 2011 1:26 AM To: user@cassandra.apache.org Subject: Re: running all unit tests There is a test target in the build script. Aaron On 15 Mar 2011, at 17:29, Jeffrey Wang wrote: Hey all, We're applying some patches to our own branch of Cassandra

running all unit tests

2011-03-14 Thread Jeffrey Wang
Hey all, We're applying some patches to our own branch of Cassandra, and we are wondering if there is a good way to run all the unit tests. Just having JUnit run all the test classes seems to result in a lot of errors that are hard to fix, so I'm hoping there's an easy way to do this. Thanks!

get_range_slices perf

2011-03-13 Thread Jeffrey Wang
Hey all, I'm trying to get a list of all the rows from a column family using get_range_slices retrieving no actual columns. I expected this operation to be pretty quick, but it seems to take a while (5-node 0.7.0 cluster takes 20 min to page through 60k keys 1000 at a time). It's not
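The paging pattern described above (walking every key, 1000 at a time) can be sketched in Python against an in-memory stand-in for the column family. This is a hypothetical model, not the Thrift client API: `ROW_KEYS`, `fetch_page`, and `iterate_all_keys` are illustrative names, and `fetch_page` assumes an inclusive start key, so every page after the first begins with the previous page's last key and that duplicate has to be skipped.

```python
import bisect
from typing import Iterator, List

# Hypothetical stand-in for a column family's row keys, in return order.
ROW_KEYS: List[str] = sorted(f"key{i:05d}" for i in range(2500))


def fetch_page(start_key: str, count: int) -> List[str]:
    """Return up to `count` keys >= start_key (inclusive start, assumed)."""
    i = bisect.bisect_left(ROW_KEYS, start_key)
    return ROW_KEYS[i:i + count]


def iterate_all_keys(page_size: int = 1000) -> Iterator[str]:
    """Yield every key exactly once, restarting each page at the last key seen."""
    if page_size < 2:
        # With an inclusive start key, a 1-row page would never advance.
        raise ValueError("page_size must be at least 2")
    start = ""  # empty start key: begin at the first row
    first_page = True
    while True:
        page = fetch_page(start, page_size)
        # Pages after the first repeat the previous page's last key; drop it.
        yield from (page if first_page else page[1:])
        if len(page) < page_size:
            break  # a short page means the range is exhausted
        start = page[-1]
        first_page = False


keys = list(iterate_all_keys())
print(len(keys))  # 2500
```

Reusing the last key as the next start key is why a page size of 1 is rejected: the loop would fetch the same single row forever.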

understanding tombstones

2011-03-09 Thread Jeffrey Wang
Hey all, I was wondering if this is the expected behavior of deletes (0.7.0). Let's say I have a 1-node cluster with a single CF which has gc_grace_seconds = 0. The following sequence of operations happens (in the given order): insert row X with timestamp T delete row X with timestamp T+1
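The insert/delete sequence above depends on timestamp reconciliation. Below is a simplified single-node model in Python, assuming plain last-write-wins on whole rows. `ToyRowStore` and its methods are hypothetical names, `purge_tombstones` stands in for a compaction that runs once gc_grace_seconds has elapsed, and letting a delete win a timestamp tie is an assumption of this sketch. It shows that a tombstone shadows any write with an older timestamp, and that the shadowing disappears once the tombstone is purged.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Cell:
    timestamp: int
    value: Optional[str]  # None marks a tombstone (a delete)


class ToyRowStore:
    """Hypothetical last-write-wins model of one node; not Cassandra code."""

    def __init__(self) -> None:
        self.rows: Dict[str, Cell] = {}

    def _apply(self, key: str, incoming: Cell) -> None:
        current = self.rows.get(key)
        if current is None or incoming.timestamp > current.timestamp:
            self.rows[key] = incoming
        elif incoming.timestamp == current.timestamp and incoming.value is None:
            # Assumption of this sketch: a delete wins a timestamp tie.
            self.rows[key] = incoming
        # Otherwise the incoming write is older and is discarded.

    def insert(self, key: str, value: str, ts: int) -> None:
        self._apply(key, Cell(ts, value))

    def delete(self, key: str, ts: int) -> None:
        self._apply(key, Cell(ts, None))

    def get(self, key: str) -> Optional[str]:
        cell = self.rows.get(key)
        return None if cell is None else cell.value

    def purge_tombstones(self) -> None:
        # Stands in for a compaction after gc_grace_seconds (0 here) has passed.
        self.rows = {k: c for k, c in self.rows.items() if c.value is not None}


T = 100
store = ToyRowStore()
store.insert("X", "v1", T)
store.delete("X", T + 1)
store.insert("X", "v0", T)  # older than the tombstone: shadowed
print(store.get("X"))       # None
store.purge_tombstones()
store.insert("X", "v0", T)  # same write once the tombstone is gone
print(store.get("X"))       # v0
```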

RE: understanding tombstones

2011-03-09 Thread Jeffrey Wang
Yup. https://issues.apache.org/jira/browse/CASSANDRA-2305 -Jeffrey -----Original Message----- From: Jonathan Ellis [mailto:jbel...@gmail.com] Sent: Wednesday, March 09, 2011 6:19 PM To: user@cassandra.apache.org Subject: Re: understanding tombstones On Wed, Mar 9, 2011 at 4:54 PM, Jeffrey Wang

when do snapshots go away?

2011-03-07 Thread Jeffrey Wang
Hi all, When I drop a column family, it creates a snapshot. When does the snapshot go away and free up the disk space? I was able to run nodetool clearsnapshot to get rid of them, but will they go away themselves? (Also, is there a purpose to keeping a snapshot around?) -Jeffrey

RE: memtable_flush_after_mins setting not working

2011-02-25 Thread Jeffrey Wang
I just noticed this thread. Does this mean that (assuming the same setup of an empty keyspace and CFs added later) if I have a CF that I write to for some time, but not enough to hit the flush limits, it will never get flushed until the server is restarted? I believe this is causing commit logs

dropped mutations, UnavailableException, and long GC

2011-02-24 Thread Jeffrey Wang
Hey all, Our setup is 5 machines running Cassandra 0.7.0 with 24GB of heap and 1.5TB disk each collocated in a DC. We're doing bulk imports from each of the nodes with RF = 2 and write consistency ANY (write perf is very important). The behavior we're seeing is this: - Nodes often

RE: rolling window of data

2011-02-03 Thread Jeffrey Wang
Thanks for the response, but unfortunately a TTL is not enough for us. We would like to be able to dynamically control the window in case there is an unusually large amount of data or something so we don't run out of disk space. One question I have in particular is: if I use the timestamp of my

RE: rolling window of data

2011-02-03 Thread Jeffrey Wang
exactly half of it or is there stuff that might go on under the covers that makes this not work as you might expect? -Jeffrey -----Original Message----- From: Jeffrey Wang [mailto:jw...@palantir.com] Sent: Thursday, February 03, 2011 3:03 PM To: user@cassandra.apache.org Subject: RE: rolling

rolling window of data

2011-02-02 Thread Jeffrey Wang
Hi, We're trying to use Cassandra 0.7 to store a rolling window of log data (e.g. last 90 days). We use the timestamp of the log entries as the column names so we can do time range queries. Everything seems to be working fine, but it's not clear if there is an efficient way to delete data that
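The layout described above (log timestamps as column names, time-range queries, trimming data older than the window) can be modeled in Python with a sorted list of column names. This is an illustrative sketch, not Cassandra code: `LogRow`, `slice`, and `trim_before` are hypothetical names, and the window length is passed on each call so it can be changed at run time. In a real cluster each deleted column is written as a tombstone, so disk space is reclaimed only after compaction (and gc_grace_seconds), which this in-memory model ignores.

```python
import bisect
from typing import Dict, List, Tuple


class LogRow:
    """Hypothetical in-memory row whose column names are log timestamps."""

    def __init__(self) -> None:
        self._names: List[int] = []  # sorted column names (epoch seconds)
        self._values: Dict[int, str] = {}

    def insert(self, ts: int, entry: str) -> None:
        if ts not in self._values:
            bisect.insort(self._names, ts)
        self._values[ts] = entry

    def slice(self, start: int, end: int) -> List[Tuple[int, str]]:
        """Columns with start <= name <= end, like a time-range query."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, end)
        return [(ts, self._values[ts]) for ts in self._names[lo:hi]]

    def trim_before(self, now: int, window_seconds: int) -> int:
        """Delete every column older than now - window_seconds; return the count."""
        cutoff = now - window_seconds
        idx = bisect.bisect_left(self._names, cutoff)
        for ts in self._names[:idx]:
            del self._values[ts]
        del self._names[:idx]
        return idx


DAY = 86_400
row = LogRow()
for day in range(100):
    row.insert(day * DAY, f"entry for day {day}")

now = 99 * DAY
removed_90 = row.trim_before(now, 90 * DAY)  # 90-day window
removed_30 = row.trim_before(now, 30 * DAY)  # window shrunk at run time
print(removed_90, removed_30)                # 9 60
print(len(row.slice(0, now)))                # 31
```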

RE: rolling window of data

2011-02-02 Thread Jeffrey Wang
@cassandra.apache.org Subject: Re: rolling window of data This project may provide some inspiration for you https://github.com/thobbs/logsandra Not sure if it has a rolling window, if you find out let me know :) Aaron On 03 Feb, 2011, at 06:08 PM, Jeffrey Wang jw...@palantir.com wrote: Hi, We're trying to use