Re: Java heap space on Cassandra start up version 1.0.10

2012-07-10 Thread Jonathan Ellis
You may have a corrupt metadata/statistics sstable component.  You can
try deleting those and restarting.  Cassandra can rebuild that
component if it is missing.
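
A minimal sketch of that deletion step (the directory and the 1.0.x
"-Statistics.db" component naming are assumptions; stop Cassandra and
double-check against your own data directory before deleting anything):

    import java.io.File;

    public class DropStatisticsComponents {
        public static void main(String[] args) {
            // Hypothetical keyspace data directory, as in the log below.
            File ksDir = new File("/var/lib/cassandra/data/system");
            File[] files = ksDir.listFiles();
            if (files == null) return;
            for (File f : files) {
                // Remove only the statistics component; per the advice
                // above, Cassandra rebuilds it if it is missing.
                if (f.getName().endsWith("-Statistics.db")) {
                    System.out.println("deleting " + f);
                    f.delete();
                }
            }
        }
    }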

On Fri, Jul 6, 2012 at 6:00 PM, Jason Hill jasonhill...@gmail.com wrote:
 Hello friends,

 I'm getting a:

 ERROR 22:50:29,695 Fatal exception in thread Thread[SSTableBatchOpen:2,5,main]
 java.lang.OutOfMemoryError: Java heap space

 error when I start Cassandra. This node was running fine and after
 some server work/upgrades it started throwing this error when I start
 the Cassandra service. I was on 0.8.? and have upgraded to 1.0.10 to
 see if it would help, but I get the same error. I've removed some of
 the column families from my keyspace directory to see if I can get it
 to start without the heap space error and with some combinations it
 will run. However, I'd like to get it running with all my colFams and
 wonder if someone could give me some advice on what might be causing
 my error. It doesn't seem to be related to compaction, if I am reading
 the log correctly, and most of the help I've found on this topic deals
 with compaction. I'm thinking that my 2 column families should not be
 enough to fill my heap, but I am at a loss as to what I should try
 next.

 Thanks for your consideration.

 output.log:

  INFO 22:50:26,319 JVM vendor/version: Java HotSpot(TM) 64-Bit Server
 VM/1.6.0_26
  INFO 22:50:26,322 Heap size: 5905580032/5905580032
  INFO 22:50:26,322 Classpath:
 /usr/share/cassandra/lib/antlr-3.2.jar:/usr/share/cassandra/lib/avro-1.4.0-fixes.jar:/usr/share/cassandra/lib/avro-1.4.0-sources-fixes.jar:/usr/share/cassandra/lib/commons-cli-1.1.jar:/usr/share/cassandra/lib/commons-codec-1.2.jar:/usr/share/cassandra/lib/commons-lang-2.4.jar:/usr/share/cassandra/lib/compress-lzf-0.8.4.jar:/usr/share/cassandra/lib/concurrentlinkedhashmap-lru-1.2.jar:/usr/share/cassandra/lib/guava-r08.jar:/usr/share/cassandra/lib/high-scale-lib-1.1.2.jar:/usr/share/cassandra/lib/jackson-core-asl-1.4.0.jar:/usr/share/cassandra/lib/jackson-mapper-asl-1.4.0.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar:/usr/share/cassandra/lib/jline-0.9.94.jar:/usr/share/cassandra/lib/json-simple-1.1.jar:/usr/share/cassandra/lib/libthrift-0.6.jar:/usr/share/cassandra/lib/log4j-1.2.16.jar:/usr/share/cassandra/lib/servlet-api-2.5-20081211.jar:/usr/share/cassandra/lib/slf4j-api-1.6.1.jar:/usr/share/cassandra/lib/slf4j-log4j12-1.6.1.jar:/usr/share/cassandra/lib/snakeyaml-1.6.jar:/usr/share/cassandra/lib/snappy-java-1.0.4.1.jar:/usr/share/cassandra/apache-cassandra-1.0.10.jar:/usr/share/cassandra/apache-cassandra-thrift-1.0.10.jar:/usr/share/cassandra/apache-cassandra.jar:/usr/share/java/jna.jar:/etc/cassandra:/usr/share/java/commons-daemon.jar:/usr/share/cassandra/lib/jamm-0.2.5.jar
  INFO 22:50:28,586 JNA mlockall successful
  INFO 22:50:28,593 Loading settings from file:/etc/cassandra/cassandra.yaml
 DEBUG 22:50:28,677 Syncing log with a period of 1
  INFO 22:50:28,677 DiskAccessMode 'auto' determined to be mmap,
 indexAccessMode is mmap
  INFO 22:50:28,686 Global memtable threshold is enabled at 1877MB
 DEBUG 22:50:28,761 setting auto_bootstrap to true
 snip
 DEBUG 22:50:28,797 Checking directory /var/lib/cassandra/data
 DEBUG 22:50:28,798 Checking directory /var/lib/cassandra/commitlog
 DEBUG 22:50:28,798 Checking directory /var/lib/cassandra/saved_caches
 DEBUG 22:50:28,806 Removing compacted SSTable files from NodeIdInfo
 (see http://wiki.apache.org/cassandra/MemtableSSTable)
 DEBUG 22:50:28,808 Removing compacted SSTable files from Versions (see
 http://wiki.apache.org/cassandra/MemtableSSTable)
 DEBUG 22:50:28,818 Removing compacted SSTable files from
 Versions.76657273696f6e (see
 http://wiki.apache.org/cassandra/MemtableSSTable)
 DEBUG 22:50:28,819 Removing compacted SSTable files from IndexInfo
 (see http://wiki.apache.org/cassandra/MemtableSSTable)
 DEBUG 22:50:28,821 Removing compacted SSTable files from Schema (see
 http://wiki.apache.org/cassandra/MemtableSSTable)
 DEBUG 22:50:28,823 Removing compacted SSTable files from Migrations
 (see http://wiki.apache.org/cassandra/MemtableSSTable)
 DEBUG 22:50:28,825 Removing compacted SSTable files from LocationInfo
 (see http://wiki.apache.org/cassandra/MemtableSSTable)
 DEBUG 22:50:28,827 Removing compacted SSTable files from
 HintsColumnFamily (see
 http://wiki.apache.org/cassandra/MemtableSSTable)
 DEBUG 22:50:28,833 Initializing system.NodeIdInfo
 DEBUG 22:50:28,839 Starting CFS NodeIdInfo
 DEBUG 22:50:28,868 Creating IntervalNode from []
 DEBUG 22:50:28,869 KeyCache capacity for NodeIdInfo is 1
 DEBUG 22:50:28,871 Initializing system.Versions
 DEBUG 22:50:28,873 Starting CFS Versions
  INFO 22:50:28,877 Opening
 /var/lib/cassandra/data/system/Versions-hd-5 (248 bytes)
 DEBUG 22:50:28,879 Load metadata for
 /var/lib/cassandra/data/system/Versions-hd-5
  INFO 22:50:28,880 Opening
 /var/lib/cassandra/data/system/Versions-hd-6 (248 bytes)
 DEBUG 22:50:28,880 Load metadata for
 

Re: BulkLoading sstables from v1.0.3 to v1.1.1

2012-07-10 Thread rubbish me
Thanks Ivo. 

We are quite close to releasing, so we'd hope to understand what's causing the 
error and try to avoid it where possible. As said, it seems to work ok the 
first time round. 

The problem you referred to in your last mail, was it restricted to bulk loading 
or something more general?

Thanks

-A

Ivo Meißner i...@overtronic.com wrote on 10 Jul 2012, at 07:20:

 Hi,
 
 there are some problems in version 1.1.1 with secondary indexes and key 
 caches that are fixed in 1.1.2. 
 I would try to upgrade to 1.1.2 and see if the error still occurs. 
 
 Ivo
 
 
 
 
 
 Hi 
 
 As part of the continuous development of a system migration, we have a test 
 build that takes a snapshot of a keyspace from cassandra v1.0.3 and bulk loads 
 it into a cluster of 1.1.1 using sstableloader.sh.  Not sure if relevant, 
 but one of the CFs contains a secondary index. 
 
 The build basically does: 
 Drop the destination keyspace if it exists 
 Add the destination keyspace, wait for schemas to agree 
 Run sstableloader 
 Do some validation of the streamed data 
 
 The keyspace / column family schemas are basically the same, apart from the 
 v1.1.1 one, where we had compression and key cache switched on. 
 
 On a clean cluster, (empty data, commit log, saved-cache dirs) the sstables 
 loaded beautifully. 
 
 But subsequent builds failed with 
 -- 
 [21:02:02][exec] progress: [snip ip_addresses]... [total: 0 - 0MB/s (avg: 
 0MB/s)]ERROR 21:02:02,811 Error in 
 ThreadPoolExecutorjava.lang.RuntimeException: java.net.SocketException: 
 Connection reset 


Re: Serious issue updating Cassandra version and topology

2012-07-10 Thread aaron morton
To be clear, this happened on a 1.1.2 node and it happened again *after* you 
had run a scrub ? 

Has this cluster been around for a while or was the data created with 1.1 ?

Can you confirm that all sstables were re-written for the CF? Check the 
timestamps on the files. Also, all files should have the same version, the -h?- 
part of the name.

Can you repair the other CFs? 

If this cannot be repaired by scrub or upgradesstables, you may need to cut the 
row out of the sstables, using sstable2json and json2sstable. 

 
Cheers
 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 8/07/2012, at 4:05 PM, Michael Theroux wrote:

 Hello,
 
 We're in the process of trying to move a 6-node cluster from RF=1 to RF=3. 
 Once our replication factor was upped to 3, we ran nodetool repair, and 
 immediately hit an issue on the first node we ran repair on:
 
  INFO 03:08:51,536 Starting repair command #1, repairing 2 ranges.
  INFO 03:08:51,552 [repair #3e724fe0-c8aa-11e1--4f728ab9d6ff] new 
 session: will sync xxx-xx-xx-xxx-132.compute-1.amazonaws.com/10.202.99.101, 
 /10.29.187.61 on range 
 (Token(bytes[d558]),Token(bytes[])]
  for x.[a, b, c, d, e, f, g, h, i, 
 j, k, l, m, n, o, p, q, r, s]
  INFO 03:08:51,555 [repair #3e724fe0-c8aa-11e1--4f728ab9d6ff] requesting 
 merkle trees for a (to [/10.29.187.61, 
 xxx-xx-xx-xxx-compute-1.amazonaws.com/10.202.99.101])
  INFO 03:08:52,719 [repair #3e724fe0-c8aa-11e1--4f728ab9d6ff] Received 
 merkle tree for a from /10.29.187.61
  INFO 03:08:53,518 [repair #3e724fe0-c8aa-11e1--4f728ab9d6ff] Received 
 merkle tree for a from 
 xxx-xx-xx-xxx-.compute-1.amazonaws.com/10.202.99.101
  INFO 03:08:53,519 [repair #3e724fe0-c8aa-11e1--4f728ab9d6ff] requesting 
 merkle trees for b (to [/10.29.187.61, 
 xxx-xx-xx-xxx-132.compute-1.amazonaws.com/10.202.99.101])
  INFO 03:08:53,639 [repair #3e724fe0-c8aa-11e1--4f728ab9d6ff] Endpoints 
 /10.29.187.61 and xxx-xx-xx-xxx-132.compute-1.amazonaws.com/10.202.99.101 are 
 consistent for a
  INFO 03:08:53,640 [repair #3e724fe0-c8aa-11e1--4f728ab9d6ff] a is 
 fully synced (18 remaining column family to sync for this session)
  INFO 03:08:54,049 [repair #3e724fe0-c8aa-11e1--4f728ab9d6ff] Received 
 merkle tree for b from /10.29.187.61
 ERROR 03:09:09,440 Exception in thread Thread[ValidationExecutor:1,1,main]
 java.lang.AssertionError: row 
 DecoratedKey(Token(bytes[efd5654ce92a705b14244e2f5f73ab98c3de2f66c7adbd71e0e893997e198c47]),
  efd5654ce92a705b14244e2f5f73ab98c3de2f66c7adbd71e0e893997e198c47) received 
 out of order wrt 
 DecoratedKey(Token(bytes[f33a5ad4a45e8cac7987737db246ddfe9294c95bea40f411485055f5dbecbadb]),
  f33a5ad4a45e8cac7987737db246ddfe9294c95bea40f411485055f5dbecbadb)
   at 
 org.apache.cassandra.service.AntiEntropyService$Validator.add(AntiEntropyService.java:349)
   at 
 org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:712)
   at 
 org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:68)
   at 
 org.apache.cassandra.db.compaction.CompactionManager$8.call(CompactionManager.java:438)
   at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
   at java.util.concurrent.FutureTask.run(Unknown Source)
   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown 
 Source)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   at java.lang.Thread.run(Unknown Source)
 
 It looks, from the log above, like the sync of the a column family was 
 successful.  However, the b column family resulted in this error.  In 
 addition, the repair hung after this error.  We ran nodetool scrub on all 
 nodes and invalidated the key and row caches and tried again (with RF=2), and 
 it didn't help alleviate the problem.
 
 Some other important pieces of information:
 We use ByteOrderedPartitioner (we MD5 hash the keys ourselves)
 We're using Leveled Compaction
 As we're in the middle of a transition, one node is on 1.1.2 (the one we 
 tried repair on), the other 5 are on 1.1.1
 
 Thanks,
 -Mike
 



Re: Effect of rangequeries with RandomPartitioner

2012-07-10 Thread aaron morton
Index files map keys (not tokens) to offsets in the data file.

A range scan uses the index file to seek to the start position in the data file 
and then does a partial scan of the data file. 

Cheers
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 9/07/2012, at 7:24 PM, prasenjit mukherjee wrote:

 Thanks for the response. Further questions inline..
 
 On Mon, Jul 9, 2012 at 11:50 AM, samal samalgo...@gmail.com wrote:
 1. With RandomPartitioner, on a given node, are the keys  sorted by
 their hash_values or original/unhashed keys  ?
 
 hash value,
 
 1. Based on the second answer in
 http://stackoverflow.com/questions/2359175/cassandra-file-structure-how-are-the-files-used
 it seems that the index file (for a given SSTable) contains the
 row key (and not the hashed keys). Or maybe I am missing something.
 
 2. Do the keys in the index file (ref
 http://hi.csdn.net/attachment/20/28/0_1322461982l3D8.gif )
 actually contain hash(row_key)+row_key or something like that?
 Otherwise you need a separate mapping from hash_bucket -> rows
 for reading.
 
 -Thanks,
 Prasenjit



Re: Setting the Memtable allocator on a per CF basis

2012-07-10 Thread aaron morton
 Would you guys consider adding this option to a future release?
All improvements are considered :) Please create a ticket on
https://issues.apache.org/jira/browse/CASSANDRA and reference CASSANDRA-3073

 If you want I can try to create a patch myself and submit it to you?
Sounds like a plan.

Thanks
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 10/07/2012, at 1:47 AM, Joost van de Wijgerd wrote:

 Hello Cassandra Devs,
 
 We are currently trying to optimize our Cassandra system with
 different workloads. One of our workload is update heavy (very).
 
 Currently we are running with a patch that allows the Live Ratio to go
 below 1.0 (lower bound set to 0.1 now) which gives us a
 bit better performance in terms of flushes on this particular CF. We
 then experienced unexpected memory issues which on further
 inspection seem to be related to the SlabAllocator. What happens is
 that we allocate a Region of 1MB every couple of seconds (the columns
 we write in this CF contain serialized session data, can be 100K
 each), so overwrites are actually done into another Region and these
 regions are only freed (most of the time) when the Memtable is
 flushed. We actually added some debug logs and to write about 300MB to
 disk we created roughly 3000 regions. (3GB of data, some of them might
 be collected before the flush but probably not much)
 
 It would be really great if we could use the HeapAllocator only for
 this CF, since the SlabAllocator gives us very good results on our other
 CFs. (We tried running a patched version with the HeapAllocator set
 but went OOM almost immediately.)
 
 I have found this issue in which Jonathan mentions he is ok with
 adding a configuration option:
 
 https://issues.apache.org/jira/browse/CASSANDRA-3073
 
 Unfortunately it seems the issue was closed and nothing was implemented.
 
 Would you guys consider adding this option to a future release?
 SlabAllocator should be the default but in the CF properties the
 HeapAllocator
 can be set.
 
 If you want I can try to create a patch myself and submit it to you?
 
 Kind Regards
 
 Joost
 
 -- 
 Joost van de Wijgerd
 Visseringstraat 21B
 1051KH Amsterdam
 +31624111401
 joost.van.de.wijgerd@Skype
 http://www.linkedin.com/in/jwijgerd



Re: Composite Slice Query returning non-sliced data

2012-07-10 Thread aaron morton
Ah, it's a Hector query question. 

You may have better luck on the Hector email list. Or, if you can turn on debug 
logging on the server and grab the query, that would be handy. 

The first thing that stands out is that (in cassandra) comparison operations 
are not used in a slice range. 

Cheers
 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 10/07/2012, at 12:36 PM, Sunit Randhawa wrote:

 Aaron,
 
 Let me start from the beginning.
 
 1- I have a ColumnFamily called Rollup15 with below definition:
 
 create column family Rollup15
  with comparator =
 'CompositeType(org.apache.cassandra.db.marshal.Int32Type,org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type)'
and key_validation_class = UTF8Type
and default_validation_class = UTF8Type;
 
 
 2- Once created, it is empty. Below is the output of CLI:
 
 [default@Schema] list Rollup15;
 Using default limit of 100
 
 0 Row Returned.
 Elapsed time: 16 msec(s).
 
 3- I use the Code below to insert the Composite Data into Cassandra:
 
 public void insertData(String columnFamilyName, String key,
         String value, int rollupInterval, String... columnSlice) {
 
     Composite colKey = new Composite();
     colKey.addComponent(rollupInterval, IntegerSerializer.get());
     if (columnSlice != null) {
         for (String colName : columnSlice) {
             colKey.addComponent(colName, serializer);
         }
     }
     createMutator(keyspace, serializer).addInsertion(key, columnFamilyName,
             createColumn(colKey, value, new CompositeSerializer(), serializer))
             .execute();
 }
 
 4- After insertion, below is the CLI Output:
 
 [default@Schema] list Rollup15;
 Using default limit of 100
 ---
 RowKey: query1_1337295600
 = (column=15:Composite1:Composite2, value=value123, timestamp=134187983347)
 
 1 Row Returned.
 Elapsed time: 9 msec(s).
 
 So, there is a record with 3 composite components (15, Composite1 and Composite2).
 
 
 5- Now I am doing a fetch based on the code below, for
 column 15:Composite3, which I know is not there:
 
 Composite start = new Composite();
 start.addComponent(0, 15, Composite.ComponentEquality.EQUAL);
 start.addComponent(1, "Composite3", Composite.ComponentEquality.EQUAL);
 
 Composite finish = new Composite();
 finish.addComponent(0, 15, Composite.ComponentEquality.EQUAL);
 finish.addComponent(1, "Composite3" + Character.MAX_VALUE,
         Composite.ComponentEquality.GREATER_THAN_EQUAL);
 
 SliceQuery<String, Composite, String> sq =
         HFactory.createSliceQuery(keyspace, StringSerializer.get(),
                 new CompositeSerializer(), StringSerializer.get());
 sq.setColumnFamily("Rollup15");
 sq.setKey("query1_1337295600");
 sq.setRange(start, finish, false, 1);
 
 QueryResult<ColumnSlice<Composite, String>> result = sq.execute();
 ColumnSlice<Composite, String> orderedRows = result.get();
 
 6- And I get output for RowKey: query1_1337295600 as
 (column=15:Composite1:Composite2, value=value123, timestamp=134187983347),
 which should not be the case since it does not
 belong to the 'Composite3' slice.
 
 Sunit.
 
 
 On Sun, Jul 8, 2012 at 11:45 AM, aaron morton aa...@thelastpickle.com wrote:
 Something like:
 
 This is how I did the write in CLI and this is what it printed.
 
 and then
 
 This is how I did the read in the CLI and this is what it printed.
 
 It's hard to imagine what data is in cassandra based on code.
 
 cheers
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 7/07/2012, at 1:28 PM, Sunit Randhawa wrote:
 
 Aaron,
 
 For writing, i am using cli.
 Below is the piece of code that is reading column names of different types.
 
 
 Composite start = new Composite();
 start.addComponent(0, beginTime, Composite.ComponentEquality.EQUAL);
 
 if (columns != null) {
     int colCount = 1;
     for (String colName : columns) {
         start.addComponent(colCount, colName, Composite.ComponentEquality.EQUAL);
         colCount++;
     }
 }
 
 Composite finish = new Composite();
 finish.addComponent(0, endTime, Composite.ComponentEquality.EQUAL);
 
 if (columns != null) {
     int colCount = 1;
     for (String colName : columns) {
         if (colCount == columns.size())
             // GREATER_THAN_EQUAL is meant for any subslices to A:B:C if searched on A:B
             finish.addComponent(colCount, colName + Character.MAX_VALUE,
                     Composite.ComponentEquality.GREATER_THAN_EQUAL);
         else

Re: Dynamic CF

2012-07-10 Thread Sylvain Lebresne
On Fri, Jul 6, 2012 at 10:49 PM, Leonid Ilyevsky
lilyev...@mooncapital.com wrote:
 At this point I am really confused about what direction Cassandra is going. 
 CQL 3 has the benefit of composite keys, but no dynamic columns.
 I thought, the whole point of Cassandra was to provide dynamic tables.

CQL3 absolutely provides dynamic tables/wide rows, the syntax is just
different. The typical example for wide rows is a time series, for
instance keeping all the events for a given event_kind in the same C*
row ordered by time. You declare that in CQL3 using:
  CREATE TABLE events (
event_kind text,
time timestamp,
event_name text,
event_details text,
PRIMARY KEY (event_kind, time)
  )

The important part in such a definition is that one CQL row (i.e. a given
(event_kind, time, event_name, event_details) tuple) does not map to an internal
Cassandra row. More precisely, all events sharing the same event_kind will be
in the same internal row. This is a wide row/dynamic table in the sense of
thrift.
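
To make that concrete, a time-range read over such a table might look like the
following minimal sketch, assuming the Thrift client with CQL3 enabled and
made-up event_kind/timestamp values:

    import java.nio.ByteBuffer;
    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.Compression;
    import org.apache.cassandra.thrift.CqlResult;

    static CqlResult readEventRange(Cassandra.Client client) throws Exception {
        client.set_cql_version("3.0.0");
        // Slices a single internal wide row: one event_kind, bounded by time.
        String cql = "SELECT time, event_name, event_details FROM events " +
                     "WHERE event_kind = 'login' " +
                     "AND time >= '2012-07-01' AND time <= '2012-07-10'";
        return client.execute_cql_query(ByteBuffer.wrap(cql.getBytes("UTF-8")),
                                        Compression.NONE);
    }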


 I need to have a huge table to store market quotes, and be able to query it 
 by name and timestamp (t1 <= t <= t2), therefore I wanted the composite key.
 Loading data to such a table using prepared statements (CQL 3-based) was very 
 slow, because it makes a server call for each row.

You should use a BATCH statement which is the equivalent to batch_mutate.
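
A minimal sketch of such a BATCH, again through the Thrift client (the quote
values are invented, and the table is the events example above):

    import java.nio.ByteBuffer;
    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.Compression;

    static void loadQuotes(Cassandra.Client client) throws Exception {
        client.set_cql_version("3.0.0");
        // One round trip inserts many CQL rows into the same wide row.
        String cql = "BEGIN BATCH " +
                     "INSERT INTO events (event_kind, time, event_name, event_details) " +
                     "VALUES ('quote', '2012-07-10 12:00:00', 'bid', '101.5'); " +
                     "INSERT INTO events (event_kind, time, event_name, event_details) " +
                     "VALUES ('quote', '2012-07-10 12:00:10', 'ask', '101.7'); " +
                     "APPLY BATCH";
        client.execute_cql_query(ByteBuffer.wrap(cql.getBytes("UTF-8")),
                                 Compression.NONE);
    }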

--
Sylvain


Re: cannot build 1.1.2 from source

2012-07-10 Thread Sylvain Lebresne
I would check whether you have a version of antlr installed on your
system that takes
precedence over the one distributed with C* and happens to not be compatible.

Because I don't remember there having been much change to the CLI grammar
between 1.1.1 and 1.1.2, and nobody else has had that problem so far.
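
A quick way to check is to ask the JVM where it loads the ANTLR 3 entry point
(org.antlr.Tool) from, running with the same classpath ant uses:

    public class WhichAntlr {
        public static void main(String[] args) throws Exception {
            // Prints the jar the antlr Tool class was actually loaded from.
            System.out.println(Class.forName("org.antlr.Tool")
                    .getProtectionDomain().getCodeSource().getLocation());
        }
    }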

--
Sylvain

On Mon, Jul 9, 2012 at 8:07 PM, Arya Goudarzi gouda...@gmail.com wrote:
 Thanks for your response. Yes. I do that every time before I build.

 On Sun, Jul 8, 2012 at 11:51 AM, aaron morton aa...@thelastpickle.com wrote:
 Did you try running ant clean first ?

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 8/07/2012, at 1:57 PM, Arya Goudarzi wrote:

 Hi Fellows,

 I used to be able to build cassandra 1.1 up to 1.1.1 with the same set
 of procedures by running ant on the same machine, but now the stuff
 associated with gen-cli-grammar breaks the build. Any advice will be
 greatly appreciated.

 -Arya

 Source:
 source tarball for 1.1.2 downloaded from one of the mirrors in
 cassandra.apache.org
 OS:
 Ubuntu 10.04 Precise 64bit
 Ant:
 Apache Ant(TM) version 1.8.2 compiled on December 3 2011
 Maven:
 Apache Maven 3.0.3 (r1075438; 2011-02-28 17:31:09+)
 Java:
 java version 1.6.0_32
 Java(TM) SE Runtime Environment (build 1.6.0_32-b05)
 Java HotSpot(TM) 64-Bit Server VM (build 20.7-b02, mixed mode)



 Buildfile: /home/arya/workspace/cassandra-1.1.2/build.xml

 maven-ant-tasks-localrepo:

 maven-ant-tasks-download:

 maven-ant-tasks-init:

 maven-declare-dependencies:

 maven-ant-tasks-retrieve-build:

 init-dependencies:
 [echo] Loading dependency paths from file:
 /home/arya/workspace/cassandra-1.1.2/build/build-dependencies.xml

 init:
[mkdir] Created dir:
 /home/arya/workspace/cassandra-1.1.2/build/classes/main
[mkdir] Created dir:
 /home/arya/workspace/cassandra-1.1.2/build/classes/thrift
[mkdir] Created dir: /home/arya/workspace/cassandra-1.1.2/build/test/lib
[mkdir] Created dir:
 /home/arya/workspace/cassandra-1.1.2/build/test/classes
[mkdir] Created dir: /home/arya/workspace/cassandra-1.1.2/src/gen-java

 check-avro-generate:

 avro-interface-generate-internode:
 [echo] Generating Avro internode code...

 avro-generate:

 build-subprojects:

 check-gen-cli-grammar:

 gen-cli-grammar:
 [echo] Building Grammar
 /home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g
 
 [java] warning(209):
 /home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:697:1:
 Multiple token rules can match input such as '-':
 IntegerNegativeLiteral, COMMENT
 [java]
 [java] As a result, token(s) COMMENT were disabled for that input
 [java] warning(209):
 /home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:628:1:
 Multiple token rules can match input such as 'I': INCR, INDEX,
 Identifier
 [java]
 [java] As a result, token(s) INDEX,Identifier were disabled for that
 input
 [java] warning(209):
 /home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:628:1:
 Multiple token rules can match input such as '0'..'9': IP_ADDRESS,
 IntegerPositiveLiteral, DoubleLiteral, Identifier
 [java]
 [java] As a result, token(s)
 IntegerPositiveLiteral,DoubleLiteral,Identifier were disabled for that
 input
 [java] warning(209):
 /home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:628:1:
 Multiple token rules can match input such as 'T': TRUNCATE, TTL,
 Identifier
 [java]
 [java] As a result, token(s) TTL,Identifier were disabled for that input
 [java] warning(209):
 /home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:628:1:
 Multiple token rules can match input such as 'A': T__109,
 API_VERSION, AND, ASSUME, Identifier
 [java]
 [java] As a result, token(s) API_VERSION,AND,ASSUME,Identifier
 were disabled for that input
 [java] warning(209):
 /home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:628:1:
 Multiple token rules can match input such as 'E': EXIT, Identifier
 [java]
 [java] As a result, token(s) Identifier were disabled for that input
 [java] warning(209):
 /home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:628:1:
 Multiple token rules can match input such as 'L': LIST, LIMIT,
 Identifier
 [java]
 [java] As a result, token(s) LIMIT,Identifier were disabled for that
 input
 [java] warning(209):
 /home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:628:1:
 Multiple token rules can match input such as 'B': BY, Identifier
 [java]
 [java] As a result, token(s) Identifier were disabled for that input
 [java] warning(209):
 /home/arya/workspace/cassandra-1.1.2/src/java/org/apache/cassandra/cli/Cli.g:628:1:
 Multiple token rules can match input such as 'O': ON, Identifier
 [java]
 [java] As a result, token(s) 

Trigger and customized filter

2012-07-10 Thread Felipe Schmidt
Does anyone know something about the following questions?

1. Does Cassandra support customized filters? By customized filter I mean
that the programmer can define his desired filter to select the data.
2. Does Cassandra support triggers? Trigger has the same meaning as in an
RDBMS.

Thanks in advance.

Regards,
Felipe Mathias Schmidt
*(Computer Science UFRGS, RS, Brazil)*


RE: Dynamic CF

2012-07-10 Thread Leonid Ilyevsky
Thanks Sylvain, this is useful.
So I guess, in the batch_mutate call, in the map that I pass to it, only the 
first element of the composite key should be used as a key (because it is the 
real key), and the other parts of the key should be passed as regular columns? 
Is this correct? While I am waiting for your confirmation, I am going to try it.

-Original Message-
From: Sylvain Lebresne [mailto:sylv...@datastax.com]
Sent: Tuesday, July 10, 2012 8:24 AM
To: user@cassandra.apache.org
Subject: Re: Dynamic CF

On Fri, Jul 6, 2012 at 10:49 PM, Leonid Ilyevsky
lilyev...@mooncapital.com wrote:
 At this point I am really confused about what direction Cassandra is going. 
 CQL 3 has the benefit of composite keys, but no dynamic columns.
 I thought, the whole point of Cassandra was to provide dynamic tables.

CQL3 absolutely provides dynamic tables/wide rows, the syntax is just
different. The typical example for wide rows is a time series, for
instance keeping all the events for a given event_kind in the same C*
row ordered by time. You declare that in CQL3 using:
  CREATE TABLE events (
event_kind text,
time timestamp,
event_name text,
event_details text,
PRIMARY KEY (event_kind, time)
  )

The important part in such a definition is that one CQL row (i.e. a given
(event_kind, time, event_name, event_details) tuple) does not map to an internal
Cassandra row. More precisely, all events sharing the same event_kind will be
in the same internal row. This is a wide row/dynamic table in the sense of
thrift.


 I need to have a huge table to store market quotes, and be able to query it 
 by name and timestamp (t1 <= t <= t2), therefore I wanted the composite key.
 Loading data to such a table using prepared statements (CQL 3-based) was very 
 slow, because it makes a server call for each row.

You should use a BATCH statement which is the equivalent to batch_mutate.

--
Sylvain



Re: Dynamic CF

2012-07-10 Thread Carlos Carrasco
I think he means something like having a fixed set of columns in the table
definition, then in the actual rows having other columns not specified in
the definition, independent of the composited part of the PK. When I
reviewed CQL3 for use in Gossie [1] I realized I couldn't have this, and
that it would complicate things like migrations or optional columns. For
this reason I didn't use CQL3 and instead wrote a row unmapper that detects
the discontinuities in the composited part and uses those as the boundaries
for the individual concrete rows stored in a wide row [2]. For example:

Given a Timeline table defined as key validation UTF8Type, column name
validation CompositeType(LongType, AsciiType), value validation BytesType:

Timeline: {
user1: {
134193302100: {
Author: Tom,
Body: Hey!
},
134193302200: {
Author: Paul,
Body: Nice,
Lat: 40.0,
Lon: 20.0
},
134193302300: {
Author: Lana,
Body: Cool
}
},
...
}

Both of the following structs are valid and will be able to be unmapped from
the wide row user1:

type Tweet struct {
    UserID  string `cf:"Timeline" key:"UserID" cols:"When"`
    When    int64
    Author  string
    Body    string
}

type GeoTweet struct {
    UserID  string `cf:"Timeline" key:"UserID" cols:"When"`
    When    int64
    Author  string
    Body    string
    Lat     float32
    Lon     float32
}

Granted I lose database-side validation over the individual column values
(BytesType) but in exchange I get very flexible rows and much nicer
behaviour for model changes and migrations.

1: https://github.com/carloscm/gossie
2: https://github.com/carloscm/gossie/blob/master/src/gossie/mapping.go#L339

On 10 July 2012 14:23, Sylvain Lebresne sylv...@datastax.com wrote:

 On Fri, Jul 6, 2012 at 10:49 PM, Leonid Ilyevsky
 lilyev...@mooncapital.com wrote:
  At this point I am really confused about what direction Cassandra is
 going. CQL 3 has the benefit of composite keys, but no dynamic columns.
  I thought, the whole point of Cassandra was to provide dynamic tables.

 CQL3 absolutely provides dynamic tables/wide rows, the syntax is just
 different. The typical example for wide rows is a time series, for
 instance keeping all the events for a given event_kind in the same C*
 row ordered by time. You declare that in CQL3 using:
   CREATE TABLE events (
 event_kind text,
 time timestamp,
 event_name text,
 event_details text,
 PRIMARY KEY (event_kind, time)
   )

 The important part in such a definition is that one CQL row (i.e. a given
 (event_kind, time, event_name, event_details) tuple) does not map to an internal
 Cassandra row. More precisely, all events sharing the same event_kind will be
 in the same internal row. This is a wide row/dynamic table in the sense of
 thrift.


  I need to have a huge table to store market quotes, and be able to query
 it by name and timestamp (t1 <= t <= t2), therefore I wanted the composite
 key.
  Loading data to such a table using prepared statements (CQL 3-based) was
 very slow, because it makes a server call for each row.

 You should use a BATCH statement which is the equivalent to batch_mutate.

 --
 Sylvain




-- 
Carlos Carrasco
IT - Software Architect
Groupalia, http://www.groupalia.com/
Llull, 95-97, 2º planta, 08005 Barcelona
Skype: carlos.carrasco.groupalia
carlos.carra...@groupalia.com


Re: Dynamic CF

2012-07-10 Thread Sylvain Lebresne
On Tue, Jul 10, 2012 at 4:19 PM, Carlos Carrasco 
carlos.carra...@groupalia.com wrote:

 I think he means something like having a fixed set of columns in the table
 definition, then in the actual rows having other columns not specified in
 the definition, independent of the composited part of the PK. When I
 reviewed CQL3 for use in Gossie [1] I realized I couldn't have this, and
 that it would complicate things like migrations or optional columns. For
 this reason I didn't use CQL3 and instead wrote a row unmapper that detects
 the discontinuities in the composited part and uses those as the boundaries
 for the individual concrete rows stored in a wide row [2]. For example:

 Given a Timeline table defined as key validation UTF8Type, column name
 validation CompositeType(LongType, AsciiType), value validation BytesType:

 Timeline: {
 user1: {
 134193302100: {
 Author: Tom,
 Body: Hey!
 },
 134193302200: {
 Author: Paul,
 Body: Nice,
 Lat: 40.0,
 Lon: 20.0
 },
 134193302300: {
 Author: Lana,
 Body: Cool
 }
 },
 ...
 }

 Both of the following structs are valid and will be able to be unmapped
 from the wide row user1:

 type Tweet struct {
     UserID  string `cf:"Timeline" key:"UserID" cols:"When"`
     When    int64
     Author  string
     Body    string
 }
 
 type GeoTweet struct {
     UserID  string `cf:"Timeline" key:"UserID" cols:"When"`
     When    int64
     Author  string
     Body    string
     Lat     float32
     Lon     float32
 }


That's exactly how CQL3 works. In that example, you would declare:
CREATE TABLE tweet (
UserID text,
When int,
Author text,
Body text,
Lat float,
Long float,
PRIMARY KEY (UserId, When)
)
and that would lay out things *exactly* like your Timeline above, but with
validation.

The fact that you have to declare Lat and Long does not mean that every CQL
row must have them.


 much nicer behaviour for model changes and migrations.


Not sure what you mean by that since adding new columns to a CQL3
definition is basically free.

--
Sylvain


Re: Dynamic CF

2012-07-10 Thread Sylvain Lebresne
On Tue, Jul 10, 2012 at 4:17 PM, Leonid Ilyevsky
lilyev...@mooncapital.com wrote:
 So I guess, in the batch_mutate call, in the map that I pass to it, only the 
 first element of the composite key should be used as a key (because it is the 
 real key), and the other parts of the key should be passed as regular 
 columns? Is this correct? While I am waiting for your confirmation, I am 
 going to try it.

I would really advise you to use the BATCH statement of CQL3 rather
than the thrift batch_mutate call. If only because until
https://issues.apache.org/jira/browse/CASSANDRA-4377 is resolved it
won't work at all, but also because the whole point of CQL3 is to hide
that kind of complexity.

--
Sylvain


 -Original Message-
 From: Sylvain Lebresne [mailto:sylv...@datastax.com]
 Sent: Tuesday, July 10, 2012 8:24 AM
 To: user@cassandra.apache.org
 Subject: Re: Dynamic CF

 On Fri, Jul 6, 2012 at 10:49 PM, Leonid Ilyevsky
 lilyev...@mooncapital.com wrote:
 At this point I am really confused about what direction Cassandra is going. 
 CQL 3 has the benefit of composite keys, but no dynamic columns.
 I thought, the whole point of Cassandra was to provide dynamic tables.

 CQL3 absolutely provides dynamic tables/wide rows, the syntax is just
 different. The typical example for wide rows is a time series, for
 instance keeping all the events for a given event_kind in the same C*
 row ordered by time. You declare that in CQL3 using:
   CREATE TABLE events (
 event_kind text,
 time timestamp,
 event_name text,
 event_details text,
 PRIMARY KEY (event_kind, time)
   )

 The important part in such a definition is that one CQL row (i.e. a given
 (event_kind, time, event_name, event_details) tuple) does not map to an internal
 Cassandra row. More precisely, all events sharing the same event_kind will be
 in the same internal row. This is a wide row/dynamic table in the sense of
 thrift.


 I need to have a huge table to store market quotes, and be able to query it 
 by name and timestamp (t1 <= t <= t2), therefore I wanted the composite key.
 Loading data to such a table using prepared statements (CQL 3-based) was very 
 slow, because it makes a server call for each row.

 You should use a BATCH statement which is the equivalent to batch_mutate.

 --
 Sylvain



Re: Dynamic CF

2012-07-10 Thread Carlos Carrasco
I am confused then. I remember reviewing the source for CQL3 and finding
that the row reader used the column count in the CF definition in order to
determine how many columns it needed to read for a single row. I guess I missed
a filter over the composited part, or I reviewed an old version.

On 10 July 2012 16:34, Sylvain Lebresne sylv...@datastax.com wrote:

 On Tue, Jul 10, 2012 at 4:19 PM, Carlos Carrasco 
 carlos.carra...@groupalia.com wrote:

 I think he means something like having a fixed set of columns in the
 table definition, then in the actual rows having other columns not
 specified in the definition, independent of the composited part of the PK.
 When I reviewed CQL3 for use in Gossie [1] I realized I couldn't have
 this, and that it would complicate things like migrations or optional
 columns. For this reason I didn't use CQL3 and instead wrote a row unmapper
 that detects the discontinuities in the composited part and uses those as
 the boundaries for the individual concrete rows stored in a wide row [2].
 For example:

 Given a Timeline table defined as key validation UTF8Type, column name
 validation CompositeType(LongType, AsciiType), value validation BytesType:

 Timeline: {
 user1: {
 134193302100: {
 Author: Tom,
 Body: Hey!
 },
 134193302200: {
 Author: Paul,
 Body: Nice,
 Lat: 40.0,
 Lon: 20.0
 },
 134193302300: {
 Author: Lana,
 Body: Cool
 }
 },
 ...
 }

 Both of the following structs are valid and will be able to be unmapped
 from the wide row user1:

 type Tweet struct {
     UserID  string `cf:"Timeline" key:"UserID" cols:"When"`
     When    int64
     Author  string
     Body    string
 }
 
 type GeoTweet struct {
     UserID  string `cf:"Timeline" key:"UserID" cols:"When"`
     When    int64
     Author  string
     Body    string
     Lat     float32
     Lon     float32
 }


 That's exactly how CQL3 works. In that example, you would declare:
 CREATE TABLE tweet (
 UserID text,
 When int,
 Author text,
 Body text,
 Lat float,
 Long float,
 PRIMARY KEY (UserId, When)
 )
 and that would lay out things *exactly* like your Timeline above, but with
 validation.

 The fact that you have to declare Lat and Long does not mean that every
 CQL row must have them.


 much nicer behaviour for model changes and migrations.


 Not sure what you mean by that since adding new columns to a CQL3
 definition is basically free.

 --
 Sylvain




-- 
Carlos Carrasco
IT - Software Architect
Groupalia, http://www.groupalia.com/
Llull, 95-97, 2º planta, 08005 Barcelona
Skype: carlos.carrasco.groupalia
carlos.carra...@groupalia.com


Re: Trigger and customized filter

2012-07-10 Thread Brian O'Neill
While Jonathan and crew work on the infrastructure to support triggers:
https://issues.apache.org/jira/browse/CASSANDRA-4285

We have a project going over here that provides a trigger-like capability:
https://github.com/hmsonline/cassandra-triggers/
https://github.com/hmsonline/cassandra-triggers/wiki/GettingStarted

We are working on enhancements that would support synchronous triggers w/
javascript.
For now, they are processed asynchronously, and you implement a Java interface.

-brian

On Tue, Jul 10, 2012 at 9:24 AM, Felipe Schmidt felipef...@gmail.com wrote:
 Does anyone know something about the following questions?

 1. Does Cassandra support customized filters? By customized filter I mean
 that the programmer can define his desired filter to select the data.
 2. Does Cassandra support triggers? Trigger has the same meaning as in an
 RDBMS.

 Thanks in advance.

 Regards,
 Felipe Mathias Schmidt
 (Computer Science UFRGS, RS, Brazil)






-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


Re: Serious issue updating Cassandra version and topology

2012-07-10 Thread Michael Theroux
Hello Aaron,

Thank you for responding.  Since the time of my original email, we noticed that 
in the process of performing this upgrade, data was lost.  We have restored 
from backup and are now trying this again with two changes:

1) We will be using 1.1.2 throughout the cluster
2) We have switched back to Tiered compaction

In the process I've hit another very interesting issue that I will write a 
separate email about.

However, to answer your questions: this happened on the 1.1.2 node, and it 
happened again after we ran the scrub.  The data has been around for a 
while.  We upgraded from 1.0.7 to 1.1.2.

Unfortunately, I can't check the sstables as we've restarted the migration from 
the beginning.  If it happens again, I'll respond with more information.  

Thanks again,
-Mike

On Jul 10, 2012, at 5:05 AM, aaron morton wrote:

 To be clear, this happened on a 1.1.2 node and it happened again *after* you 
 had run a scrub ? 
 
 Has this cluster been around for a while or was the data created with 1.1 ?
 
 Can you confirm that all sstables were re-written for the CF? Check the 
 timestamps on the files. Also, all files should have the same version, the 
 -h?- part of the name.
 
 Can you repair the other CFs? 
 
 If this cannot be repaired by scrub or upgradesstables, you may need to cut the 
 row out of the sstables, using sstable2json and json2sstable. 
 
 
 Cheers
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 8/07/2012, at 4:05 PM, Michael Theroux wrote:
 
 Hello,
 
 We're in the process of trying to move a 6-node cluster from RF=1 to RF=3. 
 Once our replication factor was upped to 3, we ran nodetool repair, and 
 immediately hit an issue on the first node we ran repair on:
 
 INFO 03:08:51,536 Starting repair command #1, repairing 2 ranges.
 INFO 03:08:51,552 [repair #3e724fe0-c8aa-11e1--4f728ab9d6ff] new 
 session: will sync xxx-xx-xx-xxx-132.compute-1.amazonaws.com/10.202.99.101, 
 /10.29.187.61 on range 
 (Token(bytes[d558]),Token(bytes[])]
  for x.[a, b, c, d, e, f, g, h, i, 
 j, k, l, m, n, o, p, q, r, s]
 INFO 03:08:51,555 [repair #3e724fe0-c8aa-11e1--4f728ab9d6ff] requesting 
 merkle trees for a (to [/10.29.187.61, 
 xxx-xx-xx-xxx-compute-1.amazonaws.com/10.202.99.101])
 INFO 03:08:52,719 [repair #3e724fe0-c8aa-11e1--4f728ab9d6ff] Received 
 merkle tree for a from /10.29.187.61
 INFO 03:08:53,518 [repair #3e724fe0-c8aa-11e1--4f728ab9d6ff] Received 
 merkle tree for a from 
 xxx-xx-xx-xxx-.compute-1.amazonaws.com/10.202.99.101
 INFO 03:08:53,519 [repair #3e724fe0-c8aa-11e1--4f728ab9d6ff] requesting 
 merkle trees for b (to [/10.29.187.61, 
 xxx-xx-xx-xxx-132.compute-1.amazonaws.com/10.202.99.101])
 INFO 03:08:53,639 [repair #3e724fe0-c8aa-11e1--4f728ab9d6ff] Endpoints 
 /10.29.187.61 and xxx-xx-xx-xxx-132.compute-1.amazonaws.com/10.202.99.101 
 are consistent for a
 INFO 03:08:53,640 [repair #3e724fe0-c8aa-11e1--4f728ab9d6ff] a is 
 fully synced (18 remaining column family to sync for this session)
 INFO 03:08:54,049 [repair #3e724fe0-c8aa-11e1--4f728ab9d6ff] Received 
 merkle tree for b from /10.29.187.61
 ERROR 03:09:09,440 Exception in thread Thread[ValidationExecutor:1,1,main]
 java.lang.AssertionError: row 
 DecoratedKey(Token(bytes[efd5654ce92a705b14244e2f5f73ab98c3de2f66c7adbd71e0e893997e198c47]),
  efd5654ce92a705b14244e2f5f73ab98c3de2f66c7adbd71e0e893997e198c47) received 
 out of order wrt 
 DecoratedKey(Token(bytes[f33a5ad4a45e8cac7987737db246ddfe9294c95bea40f411485055f5dbecbadb]),
  f33a5ad4a45e8cac7987737db246ddfe9294c95bea40f411485055f5dbecbadb)
  at 
 org.apache.cassandra.service.AntiEntropyService$Validator.add(AntiEntropyService.java:349)
  at 
 org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:712)
  at 
 org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:68)
  at 
 org.apache.cassandra.db.compaction.CompactionManager$8.call(CompactionManager.java:438)
  at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
  at java.util.concurrent.FutureTask.run(Unknown Source)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown 
 Source)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
  at java.lang.Thread.run(Unknown Source)
 
 It looks, from the log above, like the sync of the a column family was 
 successful.  However, the b column family resulted in this error.  In 
 addition, the repair hung after this error.  We ran nodetool scrub on all 
 nodes and invalidated the key and row caches and tried again (with RF=2), 
 and it didn't help alleviate the problem.
 
 Some other important pieces of information:
 We use ByteOrderedPartitioner (we MD5 hash the 

RE: Dynamic CF

2012-07-10 Thread Leonid Ilyevsky
I see. I actually tried it, and it consistently throws an exception. Below is 
my test code. I have two tests; test1 is for the composite key case, and test2 
is for the simple key. test2 works fine, while test1 gives me:

Exception in thread "main" InvalidRequestException(why:Not enough bytes to read 
value of component 0)
at 
org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:20253)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at 
org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:922)
at 
org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:908)
at com.moon.cql.BatchTest.test1(BatchTest.java:99)
at com.moon.cql.BatchTest.main(BatchTest.java:45)


So you suggest using a BATCH statement. Since I do it from Java, that means 
creating a huge string (I may need to update thousands of records at once) and 
executing it. Does that even make sense? Why is this going to be any better than 
simply executing a prepared statement multiple times? The only thing it does is 
reduce the number of calls to the server, but I have to figure out whether that 
is the bottleneck I need to optimize.
Or maybe I need to break all my updates into a number of batches.
By the way, can a batch statement be prepared? With thousands of question marks 
in it?
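
The likely cause of the error above: with a CQL3 composite comparator, a
Thrift column name must be CompositeType-encoded (per component: a 2-byte
big-endian length, the raw bytes, and a one-byte end-of-component marker),
whereas the code below sets plain names like "key1". A minimal packing
sketch, with hypothetical usage:

    import java.nio.ByteBuffer;

    public class CompositePacker {
        // Packs component buffers into a CompositeType-encoded column name:
        // 2-byte big-endian length, component bytes, then a one-byte
        // end-of-component marker (0 for an exact match).
        public static ByteBuffer pack(ByteBuffer... components) {
            int size = 0;
            for (ByteBuffer c : components)
                size += 2 + c.remaining() + 1;
            ByteBuffer name = ByteBuffer.allocate(size);
            for (ByteBuffer c : components) {
                name.putShort((short) c.remaining()); // component length
                name.put(c.duplicate());              // component value
                name.put((byte) 0);                   // end-of-component byte
            }
            name.flip();
            return name;
        }
    }

    // Hypothetical usage in test1 below, instead of value.setName("value".getBytes()):
    //   value.setName(CompositePacker.pack(
    //       LongType.instance.decompose(key1Value),
    //       UTF8Type.instance.decompose("value")));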




public class BatchTest {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws TTransportException,
            InvalidRequestException, TException, UnavailableException,
            TimedOutException {

        String host = args[0];
        int port = Integer.parseInt(args[1]);

        test1(host, port);
        //test2(host, port);
    }

    private static void test1(String host, int port) throws TTransportException,
            InvalidRequestException, TException, UnavailableException,
            TimedOutException {
        TTransport transport =
                new TFramedTransport(new org.apache.thrift.transport.TSocket(
                        host, port));
        transport.open();
        TProtocol protocol = new TBinaryProtocol(transport);
        Cassandra.Client client = new Cassandra.Client(protocol);
        client.set_cql_version("3.0.0");
        client.set_keyspace("test");

        Map<ByteBuffer, Map<String, List<Mutation>>> mutationMap =
                new HashMap<ByteBuffer, Map<String, List<Mutation>>>();

        Map<String, List<Mutation>> mutations =
                new HashMap<String, List<Mutation>>();
        List<Mutation> columnsMutations = new ArrayList<Mutation>();

        // key
        ByteBuffer keyBuffer = AsciiType.instance.decompose("KEY1");

        // key1 as column
        Column key1 = new Column();
        key1.setName("key1".getBytes());
        key1.setValue(LongType.instance.decompose(System.nanoTime()));
        key1.setTimestamp(System.currentTimeMillis());
        ColumnOrSuperColumn cc = new ColumnOrSuperColumn();
        cc.setColumn(key1);
        Mutation m = new Mutation();
        m.setColumn_or_supercolumn(cc);
        columnsMutations.add(m);

        // value column
        Column value = new Column();
        value.setName("value".getBytes());
        value.setValue(DoubleType.instance.decompose(5.3));
        value.setTimestamp(System.currentTimeMillis());
        cc = new ColumnOrSuperColumn();
        cc.setColumn(value);
        m = new Mutation();
        m.setColumn_or_supercolumn(cc);
        columnsMutations.add(m);

        // Inner mutation map
        mutations.put("testtable1", columnsMutations);

        // outer map : use the partition key
        mutationMap.put(keyBuffer, mutations);

        // Execute
        client.batch_mutate(mutationMap, ConsistencyLevel.ANY);
    }

    private static void test2(String host, int port) throws TTransportException,
            InvalidRequestException, TException, UnavailableException,
            TimedOutException {
        TTransport transport =
                new TFramedTransport(new org.apache.thrift.transport.TSocket(
                        host, port));
        transport.open();
        TProtocol protocol = new TBinaryProtocol(transport);
        Cassandra.Client client = new Cassandra.Client(protocol);
        client.set_cql_version("3.0.0");
        client.set_keyspace("test");

        Map<ByteBuffer, Map<String, List<Mutation>>> mutationMap =
                new HashMap<ByteBuffer, Map<String, List<Mutation>>>();

        Map<String, List<Mutation>> mutations =
                new HashMap<String, List<Mutation>>();
        List<Mutation> columnsMutations = new ArrayList<Mutation>();

        // key
        ByteBuffer keyBuffer = AsciiType.instance.decompose("KEY1");

        // value column
        Column value = new Column();
        value.setName("value".getBytes());
        value.setValue(DoubleType.instance.decompose(5.3));
        value.setTimestamp(System.currentTimeMillis());
        ColumnOrSuperColumn cc = new ColumnOrSuperColumn();
        cc.setColumn(value);
        Mutation m = new Mutation();

reading deleted rows is super-slow

2012-07-10 Thread Thorsten von Eicken
We're finding that reading deleted columns can be very slow and I'm
trying to get confirmation for our analysis of what happens. We wrote
lots of data eons ago into fairly large rows (up to 1MB). We recently
read those rows and then deleted them. After this, we ran a
verification-type pass that attempts to re-read these rows and verifies
that they are indeed deleted. The interval between the deletion and
verification pass was far less than gc_grace. We noticed that the
verification pass took as much time as the read&delete pass(!), while
verifying the non-existence of rows that never existed is blindingly
fast in comparison. So it seems that cassandra is reading the old data,
reading the new tombstones, and then returning that there is no data.
Functionally correct, but rather unexpected performance
characteristics... Am I missing something or is this expected?
Thanks!
Thorsten


Re: Composite Slice Query returning non-sliced data

2012-07-10 Thread Tyler Hobbs
I think in this case that's just Hector's way of setting the EOC byte for a
component.  My guess is that the composite isn't being structured correctly
through Hector, as well.

On Tue, Jul 10, 2012 at 4:40 AM, aaron morton aa...@thelastpickle.com wrote:


 The first thing that stands out is that (in cassandra) comparison
 operations are not used in a slice range.




-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: reading deleted rows is super-slow

2012-07-10 Thread Tyler Hobbs
This is expected due to tombstones, which this explains pretty well:
http://wiki.apache.org/cassandra/DistributedDeletes

If you don't have any tombstones for the row, the bloom filter will let
Cassandra avoid doing any disk reads at all 99% of the time.

On Tue, Jul 10, 2012 at 10:50 AM, Thorsten von Eicken
t...@rightscale.com wrote:

 We're finding that reading deleted columns can be very slow and I'm
 trying to get confirmation for our analysis of what happens. We wrote
 lots of data eons ago into fairly large rows (up to 1MB). We recently
 read those rows and then deleted them. After this, we ran a
 verification-type pass that attempts to re-read these rows and verifies
 that they are indeed deleted. The interval between the deletion and
 verification pass was far less than gc_grace. We noticed that the
 verification pass took as much time as the read&delete pass(!), while
 verifying the non-existence of rows that never existed is blindingly
 fast in comparison. So it seems that cassandra is reading the old data,
 reading the new tombstones, and then returning that there is no data.
 Functionally correct, but rather unexpected performance
 characteristics... Am I missing something or is this expected?
 Thanks!
 Thorsten




-- 
Tyler Hobbs
DataStax http://datastax.com/


what is the best data model for time series of small data chunks...

2012-07-10 Thread Roland Hänel
Hi,

I have an application that consists of multiple (possibly 1000's) of
measurement series, and each measurement series generates a small amount of
data output (only about 500 bytes) every 10 seconds. This time series of
data should be stored in Cassandra in a fashion such that read access is
possible for a given time range.

What I do today is
   - assign a timeuuid to each data output
   - write in two CF:
 - first CF has key = measurement series ID, column name =
timeuuid_of_output
 - second CF has key = timeuuid_of_output, column value = data
output (~ 500 bytes)

When someone requests a time range of data, I read the first CF, get a
series of timeuuid's, and then do a row-multiget on the second CF.

This works great, but tends to be slow for big series of data (let's say for
10 days, nearly 100,000 records will be requested from the second CF). This
load of 100,000 reads will be distributed through the cluster (because the
second CF scales very nicely with a RandomPartitioner), but more or less
one ends up with 100,000 individual read requests, at least that's what I
suspect.

Can anyone say if there is a better data model for this type of queries?
Would it be a reasonable improvement to put all data to a single CF with

   - single CF, key = measurement series ID, column name =
timeuuid_of_output, column value = data output

When I request a series of 100,000 columns from this row (now it's a single
row), can the performance really be better? Is there any chance that
Cassandra will be able to read this data en bloc from the hard drive?

Any advice is appreciated...

Greetings,
Roland


Re: what is the best data model for time series of small data chunks...

2012-07-10 Thread Tyler Hobbs
On Tue, Jul 10, 2012 at 12:14 PM, Roland Hänel rol...@haenel.me wrote:

 Hi,

 I have an application that consists of multiple (possibly 1000's) of
 measurement series, and each measurement series generates a small amount of
 data output (only about 500 bytes) every 10 seconds. This time series of
 data should be stored in Cassandra in a fashion such that read access is
 possible for a given time range.

 What I do today is
- assign a timeuuid to each data output
- write in two CF:
  - first CF has key = measurement series ID, column name =
 timeuuid_of_output
  - second CF has key = timeuuid_of_output, column value = data
 output (~ 500 bytes)

 When someone requests a time range of data, I read the first CF, get a
 series of timeuuid's, and then do a row-multiget on the second CF.

 This works great, but tends to be slow for big series of data (let's say
 for 10 days, nearly 100,000 records will be requested from the second CF).
 This load of 100,000 reads will be distributed through the cluster (because
 the second CF scales very nicely with a RandomPartitioner), but more or
 less one ends up with 100,000 individual read requests, at least that's
 what I suspect.

 Can anyone say if there is a better data model for this type of queries?
 Would it be a reasonable improvement to put all data to a single CF with

- single CF, key = measurement series ID, column name =
 timeuuid_of_output, column value = data output

 When I request a series of 100,000 columns from this row (now it's a
 single row), can the performance really be better? Is there any chance that
 Cassandra will be able to read this data en bloc from the hard drive?


This is definitely the approach I would take.  Reading a single row is
nearly sequential, so you'll get very good performance.

I recommend you check these out:

- http://rubyscale.com/blog/2011/03/06/basic-time-series-with-cassandra/
- http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra
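
A minimal sketch of that single-row slice read, assuming Hector (as used
elsewhere in this digest); the CF name and time bounds are hypothetical:

    import java.util.UUID;
    import me.prettyprint.cassandra.serializers.BytesArraySerializer;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.cassandra.serializers.UUIDSerializer;
    import me.prettyprint.cassandra.utils.TimeUUIDUtils;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.query.SliceQuery;

    static void readRange(Keyspace keyspace, String seriesId, long t1, long t2) {
        UUID start = TimeUUIDUtils.getTimeUUID(t1);   // lower time bound
        UUID finish = TimeUUIDUtils.getTimeUUID(t2);  // upper time bound
        SliceQuery<String, UUID, byte[]> q = HFactory.createSliceQuery(
                keyspace, StringSerializer.get(), UUIDSerializer.get(),
                BytesArraySerializer.get());
        q.setColumnFamily("MeasurementSeries");       // hypothetical CF name
        q.setKey(seriesId);
        q.setRange(start, finish, false, 100000);     // one sequential slice
        q.execute();
    }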

-- 
Tyler Hobbs
DataStax http://datastax.com/


Using a node in separate cluster without decommissioning.

2012-07-10 Thread rohit bhatia
Hi

I want to take out 2 nodes from an 8-node cluster and use them in another
cluster, but can't afford the overhead of streaming the data and
rebalancing the cluster. Since the replication factor is 2 in the first
cluster, I won't lose any data.

I'm planning to save my commit_log and data directories and
bootstrap the nodes in the second cluster. Afterwards I'll just
replace both directories and join the nodes back to the original
cluster.  This should work since cassandra saves all the cluster and
schema info in the system keyspace.

Is it advisable and safe to go ahead?

Thanks
Rohit


RE: Dynamic CF

2012-07-10 Thread Leonid Ilyevsky
I see now there is a package org.apache.cassandra.cql3.statements, with 
BatchStatement class. Is this what I should use?

-Original Message-
From: Leonid Ilyevsky [mailto:lilyev...@mooncapital.com]
Sent: Tuesday, July 10, 2012 11:45 AM
To: user@cassandra.apache.org
Subject: RE: Dynamic CF

I see. I actually tried it, and it consistently throws an exception. Below is
my test code. I have two tests; test1 is for the composite key case, and test2
is for the simple key. test2 works fine, while test1 gives me:

Exception in thread "main" InvalidRequestException(why:Not enough bytes to read
value of component 0)
at 
org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:20253)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
at 
org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:922)
at 
org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:908)
at com.moon.cql.BatchTest.test1(BatchTest.java:99)
at com.moon.cql.BatchTest.main(BatchTest.java:45)


So you suggest using a BATCH statement. Since I do it from Java, that means
building a huge string (I may need to update thousands of records at once) and
executing it. Does that even make sense? Why is this going to be any better than
simply executing a prepared statement multiple times? The only thing it does is
reduce the number of calls to the server, but I have to figure out whether that
is the bottleneck I need to optimize.
Or maybe I need to break all my updates into a number of batches.
By the way, can a batch statement be prepared? With thousands of question marks
in it?
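
As a rough, hedged sketch of the batch-as-one-string idea (the table and
column names are hypothetical; this assumes the same kind of open
Cassandra.Client, with CQL version 3.0.0 set, that the test code below
creates):

    // Build one CQL BATCH string and send it in a single server call.
    StringBuilder batch = new StringBuilder("BEGIN BATCH ");
    for (int i = 0; i < 1000; i++) {  // hypothetical number of updates
        batch.append("INSERT INTO testtable1 (key, key1, value) VALUES ('")
             .append("KEY").append(i).append("', ")
             .append(System.nanoTime()).append(", 5.3); ");
    }
    batch.append("APPLY BATCH;");
    client.execute_cql_query(
            ByteBuffer.wrap(batch.toString().getBytes(Charset.forName("UTF-8"))),
            Compression.NONE);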




public class BatchTest {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws TTransportException,
            InvalidRequestException, TException, UnavailableException,
            TimedOutException {

        String host = args[0];
        int port = Integer.parseInt(args[1]);

        test1(host, port);
        //test2(host, port);
    }

    private static void test1(String host, int port) throws TTransportException,
            InvalidRequestException, TException, UnavailableException,
            TimedOutException {
        TTransport transport =
                new TFramedTransport(new org.apache.thrift.transport.TSocket(
                        host, port));
        transport.open();
        TProtocol protocol = new TBinaryProtocol(transport);
        Cassandra.Client client = new Cassandra.Client(protocol);
        client.set_cql_version("3.0.0");
        client.set_keyspace("test");

        Map<ByteBuffer, Map<String, List<Mutation>>> mutationMap =
                new HashMap<ByteBuffer, Map<String, List<Mutation>>>();

        Map<String, List<Mutation>> mutations =
                new HashMap<String, List<Mutation>>();
        List<Mutation> columnsMutations = new ArrayList<Mutation>();

        // key
        ByteBuffer keyBuffer = AsciiType.instance.decompose("KEY1");

        // key1 as column
        Column key1 = new Column();
        key1.setName("key1".getBytes());
        key1.setValue(LongType.instance.decompose(System.nanoTime()));
        key1.setTimestamp(System.currentTimeMillis());
        ColumnOrSuperColumn cc = new ColumnOrSuperColumn();
        cc.setColumn(key1);
        Mutation m = new Mutation();
        m.setColumn_or_supercolumn(cc);
        columnsMutations.add(m);

        // value column
        Column value = new Column();
        value.setName("value".getBytes());
        value.setValue(DoubleType.instance.decompose(5.3));
        value.setTimestamp(System.currentTimeMillis());
        cc = new ColumnOrSuperColumn();
        cc.setColumn(value);
        m = new Mutation();
        m.setColumn_or_supercolumn(cc);
        columnsMutations.add(m);

        // Inner mutation map
        mutations.put("testtable1", columnsMutations);

        // outer map : use the partition key
        mutationMap.put(keyBuffer, mutations);

        // Execute
        client.batch_mutate(mutationMap, ConsistencyLevel.ANY);
    }

    private static void test2(String host, int port) throws TTransportException,
            InvalidRequestException, TException, UnavailableException,
            TimedOutException {
        TTransport transport =
                new TFramedTransport(new org.apache.thrift.transport.TSocket(
                        host, port));
        transport.open();
        TProtocol protocol = new TBinaryProtocol(transport);
        Cassandra.Client client = new Cassandra.Client(protocol);
        client.set_cql_version("3.0.0");
        client.set_keyspace("test");

        Map<ByteBuffer, Map<String, List<Mutation>>> mutationMap =
                new HashMap<ByteBuffer, Map<String, List<Mutation>>>();

        Map<String, List<Mutation>> mutations =
                new HashMap<String, List<Mutation>>();
        List<Mutation> columnsMutations = new ArrayList<Mutation>();

        // key
        ByteBuffer keyBuffer = AsciiType.instance.decompose("KEY1");

        // value column
        Column value = new Column();

Re: Composite Slice Query returning non-sliced data

2012-07-10 Thread Sunit Randhawa
I have tested this extensively, and the EOC byte is a huge issue for the
usability of CompositeTypes in Cassandra.

As an example: suppose you have 2 composite columns, A:B:C and A:D:C.

If you do a search with A:B as the start and end composite components, it
will return D as well, because the search returns all the remaining columns
from your start range.

Similarly, if you do a search with A:D as the start and end composite
components, it will not return B, because D comes after B.

Sadly, the information given here in the intro to composite types:
http://www.datastax.com/dev/blog/introduction-to-composite-columns-part-1
also does not work.

On Tue, Jul 10, 2012 at 9:24 AM, Tyler Hobbs ty...@datastax.com wrote:
 I think in this case that's just Hector's way of setting the EOC byte for a
 component.  My guess is that the composite isn't being structured correctly
 through Hector, as well.


 On Tue, Jul 10, 2012 at 4:40 AM, aaron morton aa...@thelastpickle.com
 wrote:


 The first thing that stands out is that (in cassandra) comparison
 operations are not used in a slice range.




 --
 Tyler Hobbs
 DataStax



Re: Composite Slice Query returning non-sliced data

2012-07-10 Thread Tyler Hobbs
On Tue, Jul 10, 2012 at 2:20 PM, Sunit Randhawa sunit.randh...@gmail.com wrote:

 I have tested this extensively, and the EOC byte is a huge issue for the
 usability of CompositeTypes in Cassandra.

 As an example: suppose you have 2 composite columns, A:B:C and A:D:C.

 If you do a search with A:B as the start and end composite components, it
 will return D as well, because the search returns all the remaining columns
 from your start range.


That shouldn't be happening, and I can test that it works correctly using
pycassa.  So I suspect a problem with Hector.



 Similarly, if you do a search with A:D as the start and end composite
 components, it will not return B, because D comes after B.


This is expected behavior.



 Sadly, the information given here in the intro to composite types:
 http://www.datastax.com/dev/blog/introduction-to-composite-columns-part-1
 also does not work.

 On Tue, Jul 10, 2012 at 9:24 AM, Tyler Hobbs ty...@datastax.com wrote:
  I think in this case that's just Hector's way of setting the EOC byte
 for a
  component.  My guess is that the composite isn't being structured
 correctly
  through Hector, as well.
 
 
  On Tue, Jul 10, 2012 at 4:40 AM, aaron morton aa...@thelastpickle.com
  wrote:
 
 
  The first thing that stands out is that (in cassandra) comparison
  operations are not used in a slice range.
 
 
 
 
  --
  Tyler Hobbs
  DataStax
 




-- 
Tyler Hobbs
DataStax http://datastax.com/
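
The EOC discussion above is easier to follow with the raw encoding in front
of you. Below is a minimal, hedged sketch (assuming the documented
CompositeType layout: per component, a 2-byte big-endian length, the value
bytes, and one trailing end-of-component byte; the helper name is made up)
of building slice bounds that match exactly the columns whose first two
components are A and B:

    // Sketch: hand-encode a composite slice bound.
    // EOC on the last component: 0 = exact match, -1 = sorts before the
    // prefix, 1 = sorts after everything sharing the prefix.
    private static ByteBuffer compositeBound(byte eoc, byte[]... components) {
        int size = 0;
        for (byte[] c : components) {
            size += 2 + c.length + 1;  // length + value + EOC byte
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        for (int i = 0; i < components.length; i++) {
            out.putShort((short) components[i].length);
            out.put(components[i]);
            out.put(i == components.length - 1 ? eoc : (byte) 0);
        }
        out.flip();
        return out;
    }

    // e.g. inside the query-building code:
    ByteBuffer start  = compositeBound((byte) -1, "A".getBytes(), "B".getBytes());
    ByteBuffer finish = compositeBound((byte)  1, "A".getBytes(), "B".getBytes());

With bounds built this way, A:B:C falls inside the slice and A:D:C falls
outside it, which matches the behavior Tyler describes; a client library
that sets the EOC bytes differently could produce the extra columns Sunit
reports.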


help using org.apache.cassandra.cql3

2012-07-10 Thread Leonid Ilyevsky
I am trying to use the org.apache.cassandra.cql3 package, but I am having a
problem connecting to the server using ClientState.
I was not sure what to put in the credentials map (I did not set up any
users/passwords on my server), so I tried setting empty strings for "username"
and "password", setting them to bogus values, and passing null to the login
method - there was no difference.
It does not complain at login(), but then it complains about
setKeyspace("my keyspace"), saying that the specified keyspace does not exist
(it obviously does exist).
The configuration was loaded from cassandra.yaml used by the server.

I did not have any problem like this when I used 
org.apache.cassandra.thrift.Cassandra.Client .

What am I doing wrong?

Appreciate your help,

Leonid





failed to delete commitlog, cassandra can't accept writes

2012-07-10 Thread Frank Hsueh
after reading the JIRA, I decided to use Java 6.

with Cassandra 1.1.2 on Java 6 x64 on Win7 SP1 x64 (all latest versions),
after several minutes of sustained writes, I see:

from system.log:

java.io.IOError: java.io.IOException: Failed to delete
C:\var\lib\cassandra\commitlog\CommitLog-948695923996466.log
at
org.apache.cassandra.db.commitlog.CommitLogSegment.discard(CommitLogSegment.java:176)
at
org.apache.cassandra.db.commitlog.CommitLogAllocator$4.run(CommitLogAllocator.java:223)
at
org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:95)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Failed to delete
C:\var\lib\cassandra\commitlog\CommitLog-948695923996466.log
at
org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:54)
at
org.apache.cassandra.db.commitlog.CommitLogSegment.discard(CommitLogSegment.java:172)
... 4 more


Anybody seen this before? Is this related to CASSANDRA-4337?




On Sat, Jul 7, 2012 at 6:36 PM, Frank Hsueh frank.hs...@gmail.com wrote:

 bug already reported:

 https://issues.apache.org/jira/browse/CASSANDRA-4337



 On Sat, Jul 7, 2012 at 6:26 PM, Frank Hsueh frank.hs...@gmail.com wrote:

 Hi,

 I'm running Cassandra 1.1.2 on Java 7 x64 on Win7 SP1 x64 (all latest
 versions).  If it matters, I'm using a recent version of Astyanax as my
 client.

 I'm using 4 threads to write a lot of data into a single CF.

 After several minutes of load (~ 30m at last incident), Cassandra stops
 accepting writes (client reports an OperationTimeoutException).  I looked
 at the logs and I see on the Cassandra server:

 
 ERROR 18:00:42,807 Exception in thread Thread[COMMIT-LOG-ALLOCATOR,5,main]
 java.io.IOError: java.io.IOException: Rename from
 \var\lib\cassandra\commitlog\CommitLog-701533048437587.log to
 703272597990002 failed
 at
 org.apache.cassandra.db.commitlog.CommitLogSegment.init(CommitLogSegment.java:127)
 at
 org.apache.cassandra.db.commitlog.CommitLogSegment.recycle(CommitLogSegment.java:204)
 at
 org.apache.cassandra.db.commitlog.CommitLogAllocator$2.run(CommitLogAllocator.java:166)
 at
 org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:95)
  at
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
 at java.lang.Thread.run(Thread.java:722)
 Caused by: java.io.IOException: Rename from
 \var\lib\cassandra\commitlog\CommitLog-701533048437587.log to
 703272597990002 failed
 at
 org.apache.cassandra.db.commitlog.CommitLogSegment.init(CommitLogSegment.java:105)
 ... 5 more
 

 Anybody else seen this before ?


 --
 Frank Hsueh | frank.hs...@gmail.com




 --
 Frank Hsueh | frank.hs...@gmail.com




-- 
Frank Hsueh | frank.hs...@gmail.com


Re: failed to delete commitlog, cassandra can't accept writes

2012-07-10 Thread Frank Hsueh
oops; I missed a log line:


ERROR [COMMIT-LOG-ALLOCATOR] 2012-07-10 14:19:39,776
AbstractCassandraDaemon.java (line 134) Exception in thread
Thread[COMMIT-LOG-ALLOCATOR,5,main]
java.io.IOError: java.io.IOException: Failed to delete
C:\var\lib\cassandra\commitlog\CommitLog-948695923996466.log
at
org.apache.cassandra.db.commitlog.CommitLogSegment.discard(CommitLogSegment.java:176)
at
org.apache.cassandra.db.commitlog.CommitLogAllocator$4.run(CommitLogAllocator.java:223)
at
org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:95)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Failed to delete
C:\var\lib\cassandra\commitlog\CommitLog-948695923996466.log
at
org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:54)
at
org.apache.cassandra.db.commitlog.CommitLogSegment.discard(CommitLogSegment.java:172)
... 4 more



On Tue, Jul 10, 2012 at 2:35 PM, Frank Hsueh frank.hs...@gmail.com wrote:

 after reading the JIRA, I decided to use Java 6.

 with Cassandra 1.1.2 on Java 6 x64 on Win7 SP1 x64 (all latest versions),
 after several minutes of sustained writes, I see:

 from system.log:
 
 java.io.IOError: java.io.IOException: Failed to delete
 C:\var\lib\cassandra\commitlog\CommitLog-948695923996466.log
 at
 org.apache.cassandra.db.commitlog.CommitLogSegment.discard(CommitLogSegment.java:176)
  at
 org.apache.cassandra.db.commitlog.CommitLogAllocator$4.run(CommitLogAllocator.java:223)
 at
 org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:95)
  at
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
 at java.lang.Thread.run(Thread.java:662)
 Caused by: java.io.IOException: Failed to delete
 C:\var\lib\cassandra\commitlog\CommitLog-948695923996466.log
 at
 org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:54)
  at
 org.apache.cassandra.db.commitlog.CommitLogSegment.discard(CommitLogSegment.java:172)
 ... 4 more
 

 Anybody seen this before? Is this related to CASSANDRA-4337?




 On Sat, Jul 7, 2012 at 6:36 PM, Frank Hsueh frank.hs...@gmail.com wrote:

 bug already reported:

 https://issues.apache.org/jira/browse/CASSANDRA-4337



 On Sat, Jul 7, 2012 at 6:26 PM, Frank Hsueh frank.hs...@gmail.com wrote:

 Hi,

 I'm running Cassandra 1.1.2 on Java 7 x64 on Win7 SP1 x64 (all latest
 versions).  If it matters, I'm using a recent version of Astyanax as my
 client.

 I'm using 4 threads to write a lot of data into a single CF.

 After several minutes of load (~ 30m at last incident), Cassandra stops
 accepting writes (client reports an OperationTimeoutException).  I looked
 at the logs and I see on the Cassandra server:

 
 ERROR 18:00:42,807 Exception in thread
 Thread[COMMIT-LOG-ALLOCATOR,5,main]
 java.io.IOError: java.io.IOException: Rename from
 \var\lib\cassandra\commitlog\CommitLog-701533048437587.log to
 703272597990002 failed
 at
 org.apache.cassandra.db.commitlog.CommitLogSegment.init(CommitLogSegment.java:127)
 at
 org.apache.cassandra.db.commitlog.CommitLogSegment.recycle(CommitLogSegment.java:204)
 at
 org.apache.cassandra.db.commitlog.CommitLogAllocator$2.run(CommitLogAllocator.java:166)
 at
 org.apache.cassandra.db.commitlog.CommitLogAllocator$1.runMayThrow(CommitLogAllocator.java:95)
  at
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
 at java.lang.Thread.run(Thread.java:722)
 Caused by: java.io.IOException: Rename from
 \var\lib\cassandra\commitlog\CommitLog-701533048437587.log to
 703272597990002 failed
 at
 org.apache.cassandra.db.commitlog.CommitLogSegment.init(CommitLogSegment.java:105)
 ... 5 more
 

 Anybody else seen this before ?


 --
 Frank Hsueh | frank.hs...@gmail.com




 --
 Frank Hsueh | frank.hs...@gmail.com




 --
 Frank Hsueh | frank.hs...@gmail.com




-- 
Frank Hsueh | frank.hs...@gmail.com


Re: help using org.apache.cassandra.cql3

2012-07-10 Thread Derek Williams
On Tue, Jul 10, 2012 at 3:04 PM, Leonid Ilyevsky
lilyev...@mooncapital.com wrote:

 I am trying to use the org.apache.cassandra.cql3 package, but I am having
 a problem connecting to the server using ClientState.

 I was not sure what to put in the credentials map (I did not set up any
 users/passwords on my server), so I tried setting empty strings for
 “username” and “password”, setting them to bogus values, and passing null
 to the login method – there was no difference.

 It does not complain at login(), but then it complains about
 setKeyspace(“my keyspace”), saying that the specified keyspace does not
 exist (it obviously does exist).

 The configuration was loaded from cassandra.yaml used by the server.

 I did not have any problem like this when I used
 org.apache.cassandra.thrift.Cassandra.Client .

 What am I doing wrong?


I think that package just contains server classes. Everything you need
should be in org.apache.cassandra.thrift.

To use cql3 I just use the client methods 'execute_cql_query',
'prepare_cql_query' and 'execute_prepared_cql_query', after setting cql
version to '3.0.0'.
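
To make that concrete, here is a minimal, hedged sketch of the
prepare/execute pair over Thrift (assuming an open, keyspace-bound
Cassandra.Client; the table name and bind value are hypothetical):

    // Prepare once, then execute repeatedly with bound values.
    client.set_cql_version("3.0.0");
    CqlPreparedResult prepared = client.prepare_cql_query(
            ByteBuffer.wrap("SELECT * FROM testtable1 WHERE key = ?"
                    .getBytes(Charset.forName("UTF-8"))),
            Compression.NONE);

    List<ByteBuffer> values = new ArrayList<ByteBuffer>();
    values.add(ByteBuffer.wrap("KEY1".getBytes(Charset.forName("UTF-8"))));

    CqlResult result = client.execute_prepared_cql_query(
            prepared.getItemId(), values);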


-- 
Derek Williams


Re: Multiple keyspace question

2012-07-10 Thread Edward Capriolo
A problem with many keyspaces is that clients are bound to a single
keyspace, so connection pooling across multiple keyspaces is an issue. CQL
has support for some limited cross-keyspace operations.
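
A minimal illustration of that binding (hedged: the keyspace names are
hypothetical, and this assumes the raw Thrift client used elsewhere in this
digest): every open connection carries exactly one active keyspace, so a
pool must either segregate connections per keyspace or rebind on checkout:

    // Each Thrift connection is bound to one keyspace at a time.
    client.set_keyspace("app_ks_1");  // queries now target app_ks_1
    // ... reads and writes against app_ks_1 ...
    client.set_keyspace("app_ks_2");  // rebinding costs a server round trip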

On Sunday, July 8, 2012, aaron morton aa...@thelastpickle.com wrote:
 I would do a test to see the latency difference under load between having
1 KS with 5 CF's and 50 KS with 5 CF's.
 Your test will need to read and write to all the CF's. Having many CF's
may result in more frequent memtable flushes.
 (Personally it's not an approach I would take.)
 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 On 7/07/2012, at 8:15 AM, Shahryar Sedghi wrote:

 Aaron

 I am going to have many (over 50 eventually) keyspaces with a limited
number of CFs (5-6); do you think this can cause a problem too?

 Thanks

 On Fri, Jul 6, 2012 at 2:28 PM, aaron morton aa...@thelastpickle.com
wrote:

 Also, all CF's in the same KS share one commit log. So all writes for the
same row key, across all CF's, are committed at the same time.
 Some other settings, such as caches in 1.1, are machine wide.
 If you have a small KS for something like app config, I'd say go with
whatever feels right. If you are talking about two full application KS's
I would think about their prospective workloads and growth patterns. Will
you always want to manage the two together?
 Cheers
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 On 6/07/2012, at 9:47 PM, Robin Verlangen wrote:

 Hi Ben,
 The number of keyspaces is not the problem: the number of column families
is. Each column family adds a certain amount of memory usage to the system.
You can cope with this by adding memory or using generic column families
that store different types of data.
 With kind regards,
 Robin Verlangen
 Software engineer
 W http://www.robinverlangen.nl
 E ro...@us2.nl
 2012/7/6 Ben Kaehne ben.kae...@sirca.org.au

 Good evening,
 I have read multiple keyspaces are bad before in a few discussions, but
to what extent?
 We have some reasonably powerful machines and looking to host
an additional (currently we have 1) 2 keyspaces within our cassandra
cluster (of 3 nodes, using RF3).
 At what point does adding extra keyspaces start becoming an issue? Is
there anything special we should be considering or watching out for as we
implement this?
 I could not imagine that all cassandra users out there are running one
massive keyspace, and at the same time can not imaging that all cassandra
users have multiple clusters just to host different keyspaces.
 Regards.
 --
 -Ben





Cassandra take 100% CPU for 2~3 minutes every half an hour and mutation lost

2012-07-10 Thread Jason Tang
Hi

I am encountering a high CPU problem with Cassandra 1.0.3; it happens with
both size-tiered and leveled compaction, a 6G heap, and 64-bit Oracle Java.
Under normal traffic, Cassandra will use about 15% CPU.

But every half an hour, Cassandra will use almost 100% of total CPU (SUSE,
12 cores).

And here is the top information for that moment.

#top -H -p 12451

top - 12:30:14 up 15 days, 12:49,  6 users,  load average: 10.52, 8.92, 8.14
Tasks: 706 total,  21 running, 685 sleeping,   0 stopped,   0 zombie
Cpu(s): 25.7%us, 14.0%sy, 48.9%ni,  6.5%id,  0.0%wa,  0.0%hi,  4.9%si,  0.0%st
Mem:  24150M total, 12218M used, 11932M free,   142M buffers
Swap:     0M total,     0M used,     0M free,  3714M cached

  PID USER    PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
20291 casadm  24   4 8003m 5.4g 167m R   92 22.7   0:42.46 java
20276 casadm  24   4 8003m 5.4g 167m R   88 22.7   0:43.88 java
20181 casadm  24   4 8003m 5.4g 167m R   86 22.7   0:52.97 java
20213 casadm  24   4 8003m 5.4g 167m R   85 22.7   0:49.21 java
20188 casadm  24   4 8003m 5.4g 167m R   82 22.7   0:54.34 java
20268 casadm  24   4 8003m 5.4g 167m R   81 22.7   0:46.25 java
20269 casadm  24   4 8003m 5.4g 167m R   41 22.7   0:15.11 java
20316 casadm  24   4 8003m 5.4g 167m S   20 22.7   0:02.35 java
20191 casadm  24   4 8003m 5.4g 167m R   15 22.7   0:16.85 java
12500 casadm  20   0 8003m 5.4g 167m R    6 22.7   1:07.86 java
15245 casadm  20   0 8003m 5.4g 167m D    5 22.7   0:36.45 java

Jstack cannot print the stack:
Thread 20291: (state = IN_JAVA)
Error occurred during stack walking:
...
Thread 20276: (state = IN_JAVA)
Error occurred during stack walking:

After it comes back, the stack shows:
Thread 20291: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information
may be imprecise)
 - java.util.concurrent.locks.LockSupport.parkNanos(java.lang.Object, long)
@bci=20, line=196 (Compiled frame)
 -
java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(java.util.concurrent.SynchronousQueue$TransferStack$SNode,
boolean, long) @bci=174, line=424 (Compiled frame)
 -
java.util.concurrent.SynchronousQueue$TransferStack.transfer(java.lang.Object,
boolean, long) @bci=102, line=323 (Compiled frame)
 - java.util.concurrent.SynchronousQueue.poll(long,
java.util.concurrent.TimeUnit) @bci=11, line=874 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor.getTask() @bci=62, line=945
(Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=18, line=907
(Compiled frame)
 - java.lang.Thread.run() @bci=11, line=662 (Interpreted frame

And after this happened, the data is not correct; some large columns which
were supposed to be deleted come back again.
Here is the suspect thread when it uses up 100%:
Thread 20191: (state = IN_VM)
 - sun.misc.Unsafe.unpark(java.lang.Object) @bci=0 (Compiled frame;
information may be imprecise)
 - java.util.concurrent.locks.LockSupport.unpark(java.lang.Thread) @bci=8,
line=122 (Compiled frame)
 -
java.util.concurrent.SynchronousQueue$TransferStack$SNode.tryMatch(java.util.concurrent.SynchronousQueue$TransferStack$SNode)
@bci=34, line=242 (Compiled frame)
 -
java.util.concurrent.SynchronousQueue$TransferStack.transfer(java.lang.Object,
boolean, long) @bci=268, line=344 (Compiled frame)
 - java.util.concurrent.SynchronousQueue.offer(java.lang.Object) @bci=19,
line=846 (Compiled frame)
 - java.util.concurrent.ThreadPoolExecutor.execute(java.lang.Runnable)
@bci=43, line=653 (Compiled frame)
 -
java.util.concurrent.AbstractExecutorService.submit(java.util.concurrent.Callable)
@bci=20, line=92 (Compiled frame)
 -
org.apache.cassandra.db.compaction.ParallelCompactionIterable$Reducer.getCompactedRow(java.util.List)
@bci=86, line=190 (Compiled frame)
 -
org.apache.cassandra.db.compaction.ParallelCompactionIterable$Reducer.getReduced()
@bci=31, line=164 (Compiled frame)
 -
org.apache.cassandra.db.compaction.ParallelCompactionIterable$Reducer.getReduced()
@bci=1, line=144 (Compiled frame)
 - org.apache.cassandra.utils.MergeIterator$ManyToOne.consume() @bci=88,
line=116 (Compiled frame)
 - org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext() @bci=5,
line=99 (Compiled frame)
 - com.google.common.collect.AbstractIterator.tryToComputeNext() @bci=9,
line=140 (Compiled frame)
 - com.google.common.collect.AbstractIterator.hasNext() @bci=61, line=135
(Compiled frame)
 -
org.apache.cassandra.db.compaction.ParallelCompactionIterable$Unwrapper.computeNext()
@bci=4, line=103 (Compiled frame)
 -
org.apache.cassandra.db.compaction.ParallelCompactionIterable$Unwrapper.computeNext()
@bci=1, line=90 (Compiled frame)
 - com.google.common.collect.AbstractIterator.tryToComputeNext() @bci=9,
line=140 (Compiled frame)
 - com.google.common.collect.AbstractIterator.hasNext() @bci=61, line=135
(Compiled frame)
 - com.google.common.collect.Iterators$7.computeNext() @bci=4, line=614
(Compiled frame)
 -