GCInspector works every 10 seconds!

2012-06-17 Thread Jason Tang
Hi

   After running load testing for 24 hours (insert, update and delete), there
is now no new traffic to Cassandra, but Cassandra still shows high load (CPU
usage), and the system.log shows it is constantly performing GC. I don't know
why it behaves like this; memory does not seem to be low.

Here is some configuration and log output. Where can I find a clue as to why
Cassandra behaves this way?

cassandra.yaml
disk_access_mode: mmap_index_only

#  /opt/cassandra/bin/nodetool -h 127.0.0.1 -p 6080 tpstats
Pool Name                Active   Pending   Completed   Blocked  All time blocked
ReadStage                     0         0    45387558         0                 0
RequestResponseStage          0         0    96568347         0                 0
MutationStage                 0         0    60215102         0                 0
ReadRepairStage               0         0           0         0                 0
ReplicateOnWriteStage         0         0           0         0                 0
GossipStage                   0         0      399012         0                 0
AntiEntropyStage              0         0           0         0                 0
MigrationStage                0         0          30         0                 0
MemtablePostFlusher           0         0         279         0                 0
StreamStage                   0         0           0         0                 0
FlushWriter                   0         0        1846         0              1052
MiscStage                     0         0           0         0                 0
InternalResponseStage         0         0           0         0                 0
HintedHandoff                 0         0           5         0                 0

Message type   Dropped
RANGE_SLICE  0
READ_REPAIR  0
BINARY   0
READ 1
MUTATION  1390
REQUEST_RESPONSE 0


 # /opt/cassandra/bin/nodetool -h 127.0.0.1 -p 6080 info
Token: 56713727820156410577229101238628035242
Gossip active: true
Load : 37.57 GB
Generation No: 1339813956
Uptime (seconds) : 120556
Heap Memory (MB) : 3261.14 / 5984.00
Data Center  : datacenter1
Rack : rack1
Exceptions   : 0


 INFO [ScheduledTasks:1] 2012-06-17 19:47:36,633 GCInspector.java (line
123) GC for ParNew: 222 ms for 1 collections, 2046077640 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:48:41,714 GCInspector.java (line
123) GC for ParNew: 262 ms for 1 collections, 2228128408 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:48:49,717 GCInspector.java (line
123) GC for ParNew: 237 ms for 1 collections, 2390412728 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:48:57,719 GCInspector.java (line
123) GC for ParNew: 223 ms for 1 collections, 2508702896 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:50:01,988 GCInspector.java (line
123) GC for ParNew: 232 ms for 1 collections, 2864574832 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:50:10,075 GCInspector.java (line
123) GC for ParNew: 208 ms for 1 collections, 2964629856 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:50:21,078 GCInspector.java (line
123) GC for ParNew: 258 ms for 1 collections, 3149127368 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:51:26,095 GCInspector.java (line
123) GC for ParNew: 213 ms for 1 collections, 3421495400 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:51:34,097 GCInspector.java (line
123) GC for ParNew: 218 ms for 1 collections, 3543978312 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:52:37,229 GCInspector.java (line
123) GC for ParNew: 221 ms for 1 collections, 375229 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:52:37,230 GCInspector.java (line
123) GC for ConcurrentMarkSweep: 206 ms for 1 collections, 3752313400 used;
max is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:52:46,507 GCInspector.java (line
123) GC for ParNew: 243 ms for 1 collections, 3663162192 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:52:54,510 GCInspector.java (line
123) GC for ParNew: 283 ms for 1 collections, 1582282248 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:54:01,704 GCInspector.java (line
123) GC for ParNew: 235 ms for 1 collections, 1935534800 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:55:13,747 GCInspector.java (line
123) GC for ParNew: 233 ms for 1 collections, 2356975504 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:55:21,749 GCInspector.java (line
123) GC for ParNew: 264 ms for 1 collections, 2530976328 used; max
is 6274678784
 INFO [ScheduledTasks:1] 2012-06-17 19:55:29,794 GCInspector.java (line
123) GC for ParNew: 224 ms for 1 collections, 2592311336 used; max
is 6274678784


BRs
//Ares


Re: GCInspector works every 10 seconds!

2012-06-17 Thread Jason Tang
Hi

   After I changed the log level to DEBUG, I found some interesting entries.

  Although we don't have traffic to Cassandra, we have a scheduled task that
performs a slice query.

  We use a time-stamp as the index, and we run the query every second to
check whether we have tasks to do.

  After 24 hours we have 40G of data in Cassandra, and we configure Cassandra
with a max JVM heap of 6G, memtable 1G, and disk_access_mode:
mmap_index_only.

  It is also strange that although no data in Cassandra matches the query
conditions, the query takes more time the more data we have in Cassandra.

  We have 20 million records in total in Cassandra, indexed by time stamp,
and we query with MultigetSubSliceQuery, setting the range to values that do
not match any data in Cassandra. So it should return fast, but with 20
million records it takes 2 seconds to get the query result.
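
  For reference, the polling code is roughly equivalent to this Hector
sketch (a simplified illustration - the CF name, super column, keys and
timestamps are placeholders, not our real schema):

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.Rows;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.MultigetSubSliceQuery;
import me.prettyprint.hector.api.query.QueryResult;

// keyspace is an already-connected Keyspace instance
StringSerializer ss = StringSerializer.get();
MultigetSubSliceQuery<String, String, String, String> query =
    HFactory.createMultigetSubSliceQuery(keyspace, ss, ss, ss, ss);
query.setColumnFamily("TaskQueue");        // placeholder CF name
query.setKeys("3331", "3233");             // the row keys we poll
query.setSuperColumn("pending");           // placeholder super column
// a time-stamp range that matches no columns; 5000 is the count limit
// that shows up as "collecting 0 of 5000" in the DEBUG log below
query.setRange("1339815300000000", "1339815310000000", false, 5000);
QueryResult<Rows<String, String, String>> result = query.execute();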

  Is the GC caused by the scheduled query operation, and why does it take so
much memory? Can we improve it?

System.log:
 INFO [ScheduledTasks:1] 2012-06-17 20:17:13,574 GCInspector.java (line
123) GC for ParNew: 559 ms for 1 collections, 3258240912 used; max
is 6274678784
DEBUG [ReadStage:99] 2012-06-17 20:17:25,563 SliceQueryFilter.java (line
123) collecting 0 of 5000:
0138ad1035880137f3372f3e0e28e3b6:false:36@1339815309124015
DEBUG [ReadStage:99] 2012-06-17 20:17:25,565 ReadVerbHandler.java (line 60)
Read key 3331; sending response to 158060445@/192.168.0.3
DEBUG [ReadStage:96] 2012-06-17 20:17:25,845 SliceQueryFilter.java (line
123) collecting 0 of 5000:
0138ad1035880137f33a80cf6cb5d383:false:36@1339815526613007
DEBUG [ReadStage:96] 2012-06-17 20:17:25,847 ReadVerbHandler.java (line 60)
Read key 3233; sending response to 158060447@/192.168.0.3
DEBUG [ReadStage:105] 2012-06-17 20:17:25,952 SliceQueryFilter.java (line
123) collecting 0 of 5000:
0138ad1035880137f330cd70c86690cd:false:36@1339814890872015
DEBUG [ReadStage:105] 2012-06-17 20:17:25,953 ReadVerbHandler.java (line
75) digest is d41d8cd98f00b204e9800998ecf8427e
DEBUG [ReadStage:105] 2012-06-17 20:17:25,953 ReadVerbHandler.java (line
60) Read key 3139; sending response to 158060448@/192.168.0.3
DEBUG [ReadStage:89] 2012-06-17 20:17:25,959 CollationController.java (line
191) collectAllData
DEBUG [ReadStage:108] 2012-06-17 20:17:25,959 CollationController.java
(line 191) collectAllData
DEBUG [ReadStage:107] 2012-06-17 20:17:25,959 CollationController.java
(line 191) collectAllData
DEBUG [ReadStage:89] 2012-06-17 20:17:26,958 SliceQueryFilter.java (line
123) collecting 0 of 5000:
7fff0137f63408920e049c22:true:4@1339865451865018
DEBUG [ReadStage:89] 2012-06-17 20:17:26,958 SliceQueryFilter.java (line
123) collecting 0 of 5000:
7fff0137f63408a0eeab052a:true:4@1339865451866000
DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line
123) collecting 0 of 5000:
7fff0137f63408b1319577c9:true:4@1339865451867003
DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line
123) collecting 0 of 5000:
7fff0137f63408c081e0b8a3:true:4@1339865451867004
DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line
123) collecting 0 of 5000:
7fff0137f6340deefb8a0627:true:4@1339865451920001
DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line
123) collecting 0 of 5000:
7fff0137f6340df9c21e9979:true:4@1339865451923002
DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line
123) collecting 0 of 5000:
7fff0137f6340e095ead1498:true:4@1339865451928000
DEBUG [ReadStage:89] 2012-06-17 20:17:26,960 SliceQueryFilter.java (line
123) collecting 0 of 5000:
7fff0137f6340e1af16cf151:true:4@1339865451935000
DEBUG [ReadStage:89] 2012-06-17 20:17:26,960 SliceQueryFilter.java (line
123) collecting 0 of 5000:
7fff0137f6340e396cfdc9fa:true:4@133986545195


BRs
//Ares

2012/6/17 Jason Tang ares.t...@gmail.com

 Hi

After running load testing for 24 hours (insert, update and delete), there
 is now no new traffic to Cassandra, but Cassandra still shows high load (CPU
 usage), and the system.log shows it is constantly performing GC. I don't know
 why it behaves like this; memory does not seem to be low.

 Here is some configuration and log output. Where can I find a clue as to why
 Cassandra behaves this way?

 cassandra.yaml
 disk_access_mode: mmap_index_only

  #  /opt/cassandra/bin/nodetool -h 127.0.0.1 -p 6080 tpstats
 Pool Name                Active   Pending   Completed   Blocked  All time blocked
 ReadStage                     0         0    45387558         0                 0
 RequestResponseStage          0         0    96568347         0                 0
 MutationStage                 0         0    60215102         0                 0
 ReadRepairStage               0         0           0         0                 0
 ReplicateOnWriteStage         0         0           0

Re: cassandra as a client in zookeeper

2012-06-17 Thread aaron morton
Why do you want to do that ?

Cheers
 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/06/2012, at 7:43 PM, Ambes Hagos wrote:

 Dear all,
 I am using Cassandra as a distributed database.
 I want to register my cluster in ZooKeeper.
 Do you have a quick suggestion (something already done) for how to do it?
 
 greetings
 Ambes



Re: Help with configuring replication

2012-06-17 Thread aaron morton
Some docs here http://www.datastax.com/docs/1.0/initialize/index
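
For the simplest case - two nodes that fully replicate each other - use 
SimpleStrategy with a replication factor equal to the number of nodes. From 
cassandra-cli it looks roughly like this (untested, keyspace name is just an 
example):

create keyspace MyKeyspace
  with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
  and strategy_options = {replication_factor : 2};

With RF = 2 on a 2 node cluster, every node holds a full copy of the data.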

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/06/2012, at 8:14 AM, Leonid Ilyevsky wrote:

 Before going into complex clustering topologies, I would like to try the 
 simplest configuration: just set up two nodes that completely replicate 
 each other.
 Could somebody tell me how to configure that?
  
 Thanks!
 



Re: Cassandra out of Heap memory

2012-06-17 Thread aaron morton
Not commenting on the GC advice, but Cassandra memory usage has improved a lot 
since that was written. I would take a look at what was happening and see if 
tweaking the Cassandra config helped before modifying GC settings.

 GCInspector.java(line 88): Heap is .9934 full. Is this expected? or
 should I adjust my flush_largest_memtable_at variable.
flush_largest_memtables_at is a safety valve only. Reducing it may help avoid 
OOM, but it will not treat the cause. 
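
For reference, the relevant pressure valves in a 1.0.x cassandra.yaml look 
something like this (the values shown are the usual defaults - check your own 
yaml):

# emergency valves, not tuning knobs
flush_largest_memtables_at: 0.75
reduce_cache_sizes_at: 0.85
reduce_cache_capacity_to: 0.6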

What version are you using ? 

1.0.0 had an issue where deletes were not taken into consideration 
(https://github.com/apache/cassandra/blob/trunk/CHANGES.txt#L33) but this does 
not sound like the same problem. 

Take a look in the logs on the machine and see if it was associated with a 
compaction or repair operation. 

I would also consider experimenting on one node with 8GB / 800MB heap sizes. 
More is not always better. 


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/06/2012, at 8:05 PM, rohit bhatia wrote:

 Looking at http://blog.mikiobraun.de/2010/08/cassandra-gc-tuning.html
 and server logs, I think my situation is this
 
 The default cassandra settings has the highest peak heap usage. The
 problem with this is that it raises the possibility that during the
 CMS cycle, a collection of the young generation runs out of memory to
 migrate objects to the old generation (a so-called concurrent mode
 failure), leading to stop-the-world full garbage collection. However,
 with a slightly lower setting of the CMS threshold, we get a bit more
 headroom, and more stable overall performance.
 
 I see concurrentMarkSweep system.log Entries trying to gc 2-4 collections.
 
 Any suggestions for preemptive measure for this would be welcome.



Question on SSTable Configuration Parameters......

2012-06-17 Thread Jayesh Thakrar
Hi All,

I am getting started with Cassandra and have been reading the O'Reilly book 
and some other documentation.
I understand that data is persisted in SSTable files.

Where can I find the parameters that control the SSTable files - e.g. their 
min/max sizes, etc.?
I looked up http://wiki.apache.org/cassandra/StorageConfiguration and some 
other places but did not find any such parameters.

Also, when reading the book and some other examples on backups, it seems that 
when a column family is backed up, it is all contained in a single data file.
Is that because the examples did not have much data, or is that the case even 
when you have hundreds of GB of data for a column family on a node in a cluster?

Also, are incremental backups possible ? Where can I find examples of that?

Thanks a lot in advance,

Jayesh Thakrar

Re: Composite as row key

2012-06-17 Thread aaron morton
 Row key is a combo of 2 UUIDs, the first being the user's UUID. If I want to 
 select all the watchdog entries of a user, how can I do it? Is it 
 possible? I only know the user UUID; the other part of the key is an unknown UUID.
No. 
You would be doing a range scan (from key foo to key blah) and would run into 
this http://wiki.apache.org/cassandra/FAQ#range_rp

An alternative is to push the watchdog ID down into the column names thusly…

CREATE COLUMN FAMILY UserWatchdogs
WITH key_validation_class = 'LexicalUUIDType'
AND comparator ='CompositeType(LexicalUUIDType, UTF8Type)'
;

But that means you cannot create the secondary index, because you won't know the 
names of the columns beforehand. 

wrt the searching, do you want the search to be global across all Watchdog 
entries or scoped to a user?

You can make a custom secondary index such as 

CREATE COLUMN FAMILY UserWatchdogIndex
WITH key_validation_class = 'CompositeType(LexicalUUIDType, UTF8Type)'  -- (user_id, column_name)
AND comparator = 'CompositeType(UTF8Type, LexicalUUIDType)'             -- (column_value, watchdog_id)
;
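
Your code then maintains the index by writing one column per indexed value, 
along these lines with Hector (a rough sketch - the variable names and the 
empty column value are my assumptions):

import me.prettyprint.cassandra.serializers.BytesArraySerializer;
import me.prettyprint.cassandra.serializers.CompositeSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.serializers.UUIDSerializer;
import me.prettyprint.hector.api.beans.Composite;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

// row key for the index CF: (user_id, column_name)
Composite indexKey = new Composite();
indexKey.addComponent(userId, UUIDSerializer.get());
indexKey.addComponent("error_code", StringSerializer.get());

// column name: (column_value, watchdog_id); the column value is empty
Composite indexColumn = new Composite();
indexColumn.addComponent(errorCode, StringSerializer.get());
indexColumn.addComponent(watchdogId, UUIDSerializer.get());

Mutator<Composite> mutator =
    HFactory.createMutator(keyspace, CompositeSerializer.get());
mutator.addInsertion(indexKey, "UserWatchdogIndex",
    HFactory.createColumn(indexColumn, new byte[0],
        CompositeSerializer.get(), BytesArraySerializer.get()));
mutator.execute();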


 If I do it with a supercolumn I can use secondary indexes; if the key is composite 
 there is no way to select all the data related to a user...

Secondary indexes are not supported on Super Columns. 

Hope that helps. 


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/06/2012, at 10:28 PM, Juan Ezquerro wrote:

 I have a columnfamily like:
 
 CREATE COLUMN FAMILY Watchdog
 WITH key_validation_class = 
 'CompositeType(LexicalUUIDType,LexicalUUIDType)'
 AND comparator = UTF8Type
 AND column_metadata = [
 {column_name: error_code, validation_class: UTF8Type, index_type: 
 KEYS}
 {column_name: line, validation_class: IntegerType}
 {column_name: file_path, validation_class: UTF8Type}
 {column_name: function, validation_class: UTF8Type}
 {column_name: content, validation_class: UTF8Type}
 {column_name: additional_data, validation_class: UTF8Type}
 {column_name: date_created, validation_class: DateType, index_type: 
 KEYS}
 {column_name: priority, validation_class: IntegerType, index_type: 
 KEYS}
 ];
 
 Row key is a combo of 2 UUIDs, the first being the user's UUID. If I want to 
 select all the watchdog entries of a user, how can I do it? Is it 
 possible? I only know the user UUID; the other part of the key is an unknown UUID.
 
 The idea is simple: I have a user and I want all the records in watchdog, and 
 I want a secondary index to do the search. Very simple with MySQL, but here I 
 can't find the way.
 
 If I do it with a supercolumn I can use secondary indexes; if the key is composite 
 there is no way to select all the data related to a user...
 
 The ugly way:
 
 CREATE COLUMN FAMILY Watchdog
 WITH key_validation_class = LexicalUUIDType
 AND comparator = UTF8Type
 AND column_metadata = [
   {column_name: user_uuid, validation_class: LexicalUUIDType, index_type: 
 KEYS}
 {column_name: error_code, validation_class: UTF8Type, index_type: 
 KEYS}
 {column_name: line, validation_class: IntegerType}
 {column_name: file_path, validation_class: UTF8Type}
 {column_name: function, validation_class: UTF8Type}
 {column_name: content, validation_class: UTF8Type}
 {column_name: additional_data, validation_class: UTF8Type}
 {column_name: date_created, validation_class: DateType, index_type: 
 KEYS}
 {column_name: priority, validation_class: IntegerType, index_type: 
 KEYS}
 ];
 
 But I think that is not a nice solution, because I would always need to search 
 all the rows of very big tables to get all of a user's data...
 
 Please, can you help?
 
 Thanks.
 
 -- 
 Juan Ezquerro LLanes Sofistic Team
 
 Telf: 618349107/964051479
 



Re: MurmurHash NPE during compaction

2012-06-17 Thread aaron morton
Can you please create a ticket on 
https://issues.apache.org/jira/browse/CASSANDRA

Please include:
* CF definition including the bloom_filter_fp_chance
* If the data was upgraded from a previous version of cassandra. 
* The names of the files that were being compacted. 

As a workaround you can try using nodetool upgradesstables to re-write the files 
- this may also fail, but it could be worth trying. 

The next step would be to determine which files were causing the issue 
(looking at the logs) and remove them from the data directory. Then run repair 
to restore consistency. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 14/06/2012, at 11:38 PM, Ravikumar Govindarajan wrote:

 We received the following NPE during compaction of a large row. We are on 
 cassandra-1.0.7. Need some help here to find the root cause of the issue
 
  ERROR [CompactionExecutor:595] 2012-06-13 09:44:46,718 
 AbstractCassandraDaemon.java (line 139) Fatal exception in thread 
 Thread[CompactionExecutor:595,1,main]
 java.lang.NullPointerException
 at org.apache.cassandra.utils.MurmurHash.hash64(MurmurHash.java:102)
 at 
 org.apache.cassandra.utils.BloomFilter.getHashBuckets(BloomFilter.java:103)
 at 
 org.apache.cassandra.utils.BloomFilter.getHashBuckets(BloomFilter.java:92)
 at org.apache.cassandra.utils.BloomFilter.add(BloomFilter.java:114)
 at 
 org.apache.cassandra.db.ColumnIndexer.serialize(ColumnIndexer.java:96)
 at 
 org.apache.cassandra.db.ColumnIndexer.serialize(ColumnIndexer.java:51)
 at 
 org.apache.cassandra.db.compaction.PrecompactedRow.write(PrecompactedRow.java:135)
 at 
 org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:160)
 at 
 org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:159)
 at 
 org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:134)
 at 
 org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:114)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
 at java.lang.Thread.run(Thread.java:619)
 
 Thanks and Regards,
 Ravi



Re: is this something to be concerned about - MUTATION message dropped

2012-06-17 Thread aaron morton
http://wiki.apache.org/cassandra/FAQ#dropped_messages

https://www.google.com/#q=cassandra+dropped+messages

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 15/06/2012, at 12:54 AM, Poziombka, Wade L wrote:

 INFO [ScheduledTasks:1] 2012-06-14 07:49:54,355 MessagingService.java (line 
 615) 15 MUTATION message dropped in last 5000ms
  
 It is at INFO level so I’m inclined to think not, but it seems like whenever 
 messages are dropped there may be some issue?



Re: Supercolumn behavior on writes

2012-06-17 Thread aaron morton
Writing to a super column family does not involve deserialisation; other than 
writing to the commit log, it is an in-memory operation.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 15/06/2012, at 3:36 AM, Greg Fausak wrote:

 Derek,
 
 Thanks for that!
 
 Yes, I am aware of that technique.  I am currently using something
 very similar on an sql database.  I think one of the great benefits with
 Cassandra is that you can invent these on the fly.  I also think there
 is great benefit to keep all of the columns in the same row.
 
 Anyway, I didn't mean to hijack Oleg's thread.  I am interested
 in the original question about the serialization/deserialization on write.
 Does anybody know?
 
 -g
 
 
 On Wed, Jun 13, 2012 at 11:45 PM, Derek Williams de...@fyrie.net wrote:
 On Wed, Jun 13, 2012 at 9:08 PM, Greg Fausak g...@named.com wrote:
 
 Interesting.
 
 How do you do it?
 
 I have a version 2 CF, that works fine.
 A version 3 table won't let me invent columns that
 don't exist yet. (for composite tables).  What's the trick?
 
 
 You are able to get the same behaviour as non-CQL by doing something like
 this:
 
 CREATE TABLE mytable (
   id bigint,
   name text,
   value text,
   PRIMARY KEY (id, name)
 ) WITH COMPACT STORAGE;
 
 This table will work exactly like a standard column family with no defined
 columns. For example:
 
 cqlsh:testing> INSERT INTO mytable (id, name, value) VALUES (1, 'firstname',
 'Alice');
 cqlsh:testing> INSERT INTO mytable (id, name, value) VALUES (1, 'email',
 'al...@example.org');
 cqlsh:testing> INSERT INTO mytable (id, name, value) VALUES (2, 'firstname',
 'Bob');
 cqlsh:testing> INSERT INTO mytable (id, name, value) VALUES (2, 'webpage',
 'http://bob.example.org');
 cqlsh:testing> INSERT INTO mytable (id, name, value) VALUES (2, 'email',
 'b...@example.org');
 
 cqlsh:testing> SELECT name, value FROM mytable WHERE id = 2;
  name      | value
 -----------+------------------------
  email     | b...@example.org
  firstname | Bob
  webpage   | http://bob.example.org
 
 Not very exciting, but when you take a look with cassandra-cli:
 
 [default@testing] get mytable[2];
 => (column=email, value=b...@example.org, timestamp=1339648270284000)
 => (column=firstname, value=Bob, timestamp=1339648270275000)
 => (column=webpage, value=http://bob.example.org, timestamp=133964827028)
 Returned 3 results.
 Elapsed time: 11 msec(s).
 Elapsed time: 11 msec(s).
 
 which is exactly what you would expect from a normal cassandra column
 family.
 
 So the trick is to separate your static columns and your dynamic columns
 into separate column families. Column names and types can of course be
 something different than my example, and inserts can be done within a
 'BATCH' to avoid multiple round trips.
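 
 A batched version of the inserts above would look something like this (my
 sketch, untested):
 
 BEGIN BATCH
   INSERT INTO mytable (id, name, value) VALUES (3, 'firstname', 'Carol');
   INSERT INTO mytable (id, name, value) VALUES (3, 'email', 'carol@example.org');
 APPLY BATCH;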
 
 Also, I'm not trying to advocate this as being a better solution than just
 using the old thrift interface, I'm just showing an example of how to do it.
 I personally do prefer this way as it is more predictable, but of course
 others will have a different opinion.
 
 --
 Derek Williams
 



Re: Random slow connects.

2012-06-17 Thread aaron morton
You could also try adding some logging in the client to track down the exactly 
where the delay is. If it is in waiting for the socket to open on the server or 
say managing the connection client side.

Cheers
 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 15/06/2012, at 4:51 AM, Tyler Hobbs wrote:

 As a random guess, you might want to check your open file descriptor limit on 
 the C* servers.  Use cat /proc/<pid>/limits, where <pid> is the pid of the 
 Cassandra process; it's the most reliable way to check this.
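 
 For example, something like this (assuming the process can be found by its
 main class name):
 
 pid=$(pgrep -f CassandraDaemon)
 grep 'open files' /proc/$pid/limits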
 
 On Thu, Jun 14, 2012 at 10:43 AM, Henrik Schröder skro...@gmail.com wrote:
 Hi Mina,
 
 The delay is not constant, in the absolute majority of cases, connecting is 
 almost instant, but occasionally, connecting to a server takes a few seconds.
 
 We can't even reproduce it reliably, we can see in our server logs that 
 sometimes, maybe a few times a day, maybe once every few days, a cassandra 
 server will be slow in accepting connections, and after a little while 
 everything will be ok again. It's not a network saturation error, it's not a 
 CPU saturation error. Not even GC pauses.
 
 Has anyone else noticed something similar? Or is this simply a result of us 
 running a tight connection pool which recycles connections every few hours 
 and only waits a few seconds for a connection before timing out?
 
 
 /Henrik
 
 
 On Thu, Jun 14, 2012 at 4:54 PM, Mina Naguib mina.nag...@bloomdigital.com 
 wrote:
 
 On 2012-06-14, at 10:38 AM, Henrik Schröder wrote:
 
  Hi everyone,
 
  We have problem with our Cassandra cluster, and that is that sometimes it 
  takes several seconds to open a new Thrift connection to the server. We've 
  had this issue when we ran on windows, and we have this issue now that we 
  run on Ubuntu. We've had it with our old networking setup, and we have it 
  with our new networking setup where we're running it over a dedicated 
  gigabit network. Normally establishing a new connection is instant, but once 
  in a while it seems like it's not accepting any new connections until three 
  seconds have passed.
 
  We're of course running a connection-pooling client which mitigates this, 
  since once a connection is established, it's rock solid.
 
  We tried switching the rpc_server_type to hsha, but that seems to have made 
  the problem worse, we're seeing more connection timeouts because of this.
 
  For what it's worth, we're running Cassandra version 1.0.10 on Ubuntu, and 
  our connection pool is configured to abort a connection attempt after two 
  seconds, and each connection lives for six hours and then it's recycled. 
  Under current load we do about 500 writes/s and 100 reads/s, we have 20 
  clients, but each has a very small connection pool of maybe up to 5 
  simultaneous connections against each Cassandra server. We see these 
  connection issues maybe once a day, but always at random intervals.
 
  We've tried to get more information through Datastax Opscenter, the JMX 
  console, and our own application monitoring and logging, but we can't see 
  anything out of the ordinary. Sometimes, seemingly by random, it's just 
  really slow to connect. We're all out of ideas. Does anyone here have 
  suggestions on where to look and what to do next?
 
 Have you ironed out non-cassandra potential causes ?
 
 A constant 3 seconds sounds like it could be a timeout/retry somewhere.  Do you 
 contact cassandra via a hostname or an IP address?  If via hostname, iron out 
 DNS.
 
 Either way, I'd fire up tcpdump on both the client and the server, and 
 observe the TCP handshake.  Specifically, see if the SYN packet is sent and 
 received, whether the SYN-ACK is sent back right away and received, and the final 
 ACK.
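 
 For example, something like the following on each side should capture just
 the handshake packets on the default Thrift port (9160 - adjust the port and
 interface to your setup):
 
 tcpdump -i eth0 -nn 'tcp port 9160 and tcp[tcpflags] & (tcp-syn|tcp-ack) != 0'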
 
 If that looks good, then TCP-wise you're in good shape and the problem is in 
 a higher layer (thrift).  If not, see where the delay/drop/retry happens.  If 
 it's in the first packet, it may be a networking/routing issue.  If in the 
 second, it may be capacity at the server (investigate with lsof/netstat/JMX), 
 etc..
 
 
 
 
 
 
 -- 
 Tyler Hobbs
 DataStax
 



Re: cql 3 qualification failing?

2012-06-17 Thread aaron morton
 So, my main primary key is on the ac_c column, text, and
 the secondary composite key is on ac_creation, which is a date.  These
 queries perform correctly:
In your CF there is only one primary key (aka row key), it is a composite of 
ac_c and ac_creation.

 select * from at_event_ac_c where ac_c = '1234' and ac_creation > 
 '2012-07-15' and ac_creation < '2012-07-18'
   and ev_sev = 2;

See 
WHERE clauses can include greater-than and less-than comparisons on columns 
other than the first. As long as all previous key-component columns have 
already been identified with strict = comparisons, the last given key component 
column can be any sort of comparison. 
http://www.datastax.com/docs/1.1/references/cql/SELECT

You need to specify both the ac_c and ac_creation values using equality, or 
specify neither. 
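
So, for example, these forms should both be accepted (restating your working 
queries - ev_sev can only appear here once it has a secondary index):

select * from at_event_ac_c
  where ac_c = '1234'
  and ac_creation > '2012-07-15' and ac_creation < '2012-07-18';

select * from at_event_ac_c
  where ac_c = '1234' and ac_creation = '2012-07-16';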

Hope that helps. 


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 15/06/2012, at 5:04 AM, Greg Fausak wrote:

 I have been playing around with composite CFs; I have one declared:
 
 create columnfamily
at_event_ac_c
 (
ac_event_id int,
ac_creation timestamp,
ac_action text,
ac_addr text,
ac_advisory_id text,
ac_c text,
 ...
ev_sev text,
 ...
ev_total text,
ev_url text,
ev_used text,
toast text,
fw text,
name text,
resp text,
size text,
PRIMARY KEY (ac_c, ac_creation)
 ) with compression_parameters:sstable_compression = '';
 
 So, my main primary key is on the ac_c column, text, and
 the secondary composite key is on ac_creation, which is a date.  These
 queries perform correctly:
 
 select * from at_event_ac_c where ac_c = '1234';
 
 select * from at_event_ac_c where ac_c = '1234' and ac_creation > 
 '2012-07-15' and ac_creation < '2012-07-18';
 
 What's weird is I can't qualify on a non-indexed column, like:
 
 select * from at_event_ac_c where ac_c = '1234' and ac_creation > 
 '2012-07-15' and ac_creation < '2012-07-18'
   and ev_sev = 2;
 
 I get an error:
 Bad Request: No indexed columns present in by-columns clause with Equal 
 operator
 
 But, I just attended a class on this.  I thought that once I used my
 indices the remaining qualifications would be satisfied via a filtering
 method.  Obviously this is incorrect.  Is there a way to 'filter' results?
 
 -g



Re: 48 character cap on Keyspace + CF name length?

2012-06-17 Thread aaron morton
It has to do with the file name length limit on Windows: 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/config/Schema.java#L49


Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 15/06/2012, at 6:32 AM, Tharindu Mathew wrote:

 Hi,
 
 Wonder why this cap is in place? We were experimenting with some CF names 
 containing UUIDs and hit this issue.
 
 -- 
 Regards,
 
 Tharindu
 
 blog: http://mackiemathew.com/
 



Re: Limited row cache size

2012-06-17 Thread aaron morton
cassandra 1.1.1 ships with concurrentlinkedhashmap-lru-1.3.jar

row_cache_size_in_mb starts life as an int but the byte size is stored as a 
long 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/CacheService.java#L143

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 15/06/2012, at 7:13 PM, Noble Paul നോബിള്‍ नोब्ळ् wrote:

 hi,
 I configured my server with row_cache_size_in_mb : 1920
 
 When I started the server and checked JMX, it showed the capacity was
 set to 1024MB.
 
 I investigated further and found that the version of
 concurrentlinkedhashmap used is 1.2, which caps the capacity at 1GB.
 
 So, in Cassandra 1.1 the max cache size I can use is 1GB.
 
 
 Digging deeper, I realized that throughout the API chain the cache
 size is passed around as an int, so even if I write my own
 CacheProvider the max size would be Integer.MAX_VALUE = 2GB.
 
 Unless Cassandra changes the version of concurrentlinkedhashmap to 1.3
 and changes the signature to use a long for the size, we can't have a big
 cache. In my opinion 1GB is a really small size.
 
 So, even if I have bigger machines I can't really use them.
 
 
 
 -- 
 -
 Noble Paul



Re: Problem with streaming with sstableloader into ubuntu node

2012-06-17 Thread aaron morton
Cross-platform clusters are not really supported. 

That said, it sounds like a bug. If you can create some steps to reproduce it, 
please create a ticket here https://issues.apache.org/jira/browse/CASSANDRA and it 
may get looked at. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 16/06/2012, at 12:41 AM, Nury Redjepow wrote:

 Good day, everyone
 
 We are using sstableloader to bulk insert data into cassandra. 
 
 Script is executed on developers machine with Windows to Single Node 
 Cassandra. 
 
 %JAVA_HOME%\bin\java -ea -cp %CASSANDRA_CLASSPATH% -Xmx256M 
 -Dlog4j.configuration=log4j-tools.properties 
 org.apache.cassandra.tools.BulkLoader -d 10.0.3.37 --debug -v 
 DestinationPrices/PricesByHotel 
 
 This works fine if the destination Cassandra is running under Windows, but 
 it doesn't work with the Ubuntu instance. The CLI is able to connect, but 
 sstableloader seems to have a problem with the keyspace name. Logs on the 
 Ubuntu instance show error messages like:
 
 ERROR [Thread-41] 2012-06-15 16:05:47,620 AbstractCassandraDaemon.java (line 
 134) Exception in thread Thread[Thread-41,5,main]
 java.lang.AssertionError: Unknown keyspace 
 DestinationPrices\PricesByHotel\DestinationPrices
 
 
 In our schema we have keyspace DestinationPrices, and column family 
 PricesByHotel. Somehow it's not accepted properly.
 
 So my question is: how should I specify the keyspace name in the command to 
 make it work correctly with Ubuntu?
 



Re: Cassandra error while processing message

2012-06-17 Thread aaron morton
Check you are using framed transport on the client. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 16/06/2012, at 2:40 AM, Jim Ancona wrote:

 It's hard to tell exactly what happened - are there other messages in your 
 client log before the All host pools marked down? Also, how many nodes are 
 there in your cluster? I suspect that the Thrift protocol error was 
 (incorrectly) retried by Hector, leading to the All host pools marked down, 
 but without more info that's just a guess.
 
 Jim
 
 On Thu, Jun 14, 2012 at 4:48 AM, Tiwari, Dushyant 
 dushyant.tiw...@morganstanley.com wrote:
 Hector : 1.0.0.1
 
 Cassandra: 1.0.3
 
  
 

 
 From: Tiwari, Dushyant (ISGT) 
 Sent: Thursday, June 14, 2012 2:16 PM
 To: user@cassandra.apache.org
 Subject: Cassandra error while processing message
 
  
 
 Hi All,
 
  
 
 Help needed on the following front.
 
  
 
 In my Cassandra node logs I can see the following error:
 
  
 
 CustomTThreadPoolServer.java (line 201) Thrift error occurred during 
 processing of message.
 
 org.apache.thrift.protocol.TProtocolException: Missing version in 
 readMessageBegin, old client?
 
 at 
 org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:213)
 
 at 
 org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2877)
 
 at 
 org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
 
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 
 at java.lang.Thread.run(Thread.java:619)
 
  
 
 In Hector client :
 
  
 
 Caused by: me.prettyprint.hector.api.exceptions.HectorException: All host 
 pools marked down. Retry burden pushed out to client.
 
 [gsc][5/8454]   at 
 me.prettyprint.cassandra.connection.HConnectionManager.getClientFromLBPolicy(HConnectionManager.java:343)
 
 [gsc][5/8454]   at 
 me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:225)
 
 [gsc][5/8454]   at 
 me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:131)
 
 [gsc][5/8454]   at 
 me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:102)
 
 [gsc][5/8454]   at 
 me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:108)
 
 [gsc][5/8454]   at 
 me.prettyprint.cassandra.model.MutatorImpl$3.doInKeyspace(MutatorImpl.java:248)
 
 [gsc][5/8454]   at 
 me.prettyprint.cassandra.model.MutatorImpl$3.doInKeyspace(MutatorImpl.java:245)
 
 [gsc][5/8454]   at 
 me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
 
 [gsc][5/8454]   at 
 me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85)
 
 [gsc][5/8454]   at 
 me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:245)
 
  
 
  
 
 After some time, there is a null pointer exception:
 
 Caused by: java.lang.NullPointerException
 
 [gsc][5/8454]   at 
 me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:243)
 
  
 
 Can someone please explain what is happening and how I can rectify it?
 
  
 
  
 
 Dushyant
 
 



Re: Cassandra atomicity/isolation/transaction in multithread counter updates

2012-06-17 Thread aaron morton
 I'm in a pseudo-deadlock 
BOOM BOOM ! :)

  (N.B. The update requires a read of the current value before the update write. 
 Otherwise a counter column could be used, but in my opinion the problem would 
 still remain.)
Writes in the cassandra server do not require a read. 

 My simple question is: what happens when two (or more) threads try to update 
 (increment) the same integer column value of the same row in a column family?
Multiple values for the same column are deterministically resolved, so the actual 
order of the interleaving on the server side does not matter. 

Either thread in your example will compare the column it's trying to write 
with what is in the memtable. The columns are then resolved as follows:
* a delete with a higher timestamp wins
* next, the column instance with the highest timestamp wins 
* finally, the column instance with the greater byte value wins

In 1.1 the threads then try to put their shadow copy of the data that was in 
the memtable back. If it has changed, they get it again and retry the write.

If two write threads start at the same time and try to apply their changes to 
the memtable at (roughly) the same time, one will win and the other will redo 
the write in memory. The order this occurs in is irrelevant. 
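
In pseudo-Java the reconcile step is roughly this (my sketch, not the actual 
internal API):

final class Column {
    final byte[] value; final long timestamp; final boolean isTombstone;
    Column(byte[] v, long ts, boolean del) {
        value = v; timestamp = ts; isTombstone = del;
    }

    static Column reconcile(Column a, Column b) {
        if (a.isTombstone != b.isTombstone) {   // delete vs live column:
            Column del  = a.isTombstone ? a : b;
            Column live = a.isTombstone ? b : a;
            // the delete wins with the higher (or equal) timestamp
            return del.timestamp >= live.timestamp ? del : live;
        }
        if (a.timestamp != b.timestamp)         // highest timestamp wins
            return a.timestamp > b.timestamp ? a : b;
        // tie: greater byte value wins, so the outcome does not depend
        // on which thread applied its write first
        return compare(a.value, b.value) >= 0 ? a : b;
    }

    private static int compare(byte[] x, byte[] y) { // unsigned lexicographic
        for (int i = 0; i < Math.min(x.length, y.length); i++) {
            int c = (x[i] & 0xff) - (y[i] & 0xff);
            if (c != 0) return c;
        }
        return x.length - y.length;
    }
}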

Cheers
  
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 16/06/2012, at 7:37 PM, Manuel Peli wrote:

 I'm in a pseudo-deadlock about Cassandra and atomicity/isolation/transaction 
 arguments. My simple question is: what happens when two (or more) threads try 
 to update (increment) the same integer column value of the same row in a 
 column family? I've read something about row-level isolation, but I'm not sure 
 it is managed properly. Any suggestions? (N.B. The update requires a read of 
 the current value before the update write. Otherwise a counter column could 
 be used, but in my opinion the problem would still remain.)
 
 My personal idea is described next. Because it's a real-time analytics 
 application, the counter updates concern only the current hour, while 
 previous hours remain the same. So I think that one way to avoid the 
 problem would be to use an RDBMS layer for the current updates (which supports 
 ACID properties) and, when the hour expires, consolidate the data into 
 Cassandra. Is that right?
 
 Also with an RDBMS layer the transaction problem remains: some updates on 
 different column families are correlated, and if even one fails a rollback is 
 needed. I know that Cassandra doesn't support transactions, but I think that, 
 by playing with the replication factor and write/read levels, the problem can 
 be mitigated, possibly by implementing an application-level commit/rollback. 
 I read something about Zookeeper, but I guess that adds complexity and latency.



Re: Unbalanced ring in Cassandra 0.8.4

2012-06-17 Thread aaron morton
Assuming you have been running repair, it can't hurt. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 17/06/2012, at 4:06 AM, Raj N wrote:

 Nick, do you think I should still run cleanup on the first node?
 
 -Rajesh
 
 On Fri, Jun 15, 2012 at 3:47 PM, Raj N raj.cassan...@gmail.com wrote:
 I did run nodetool move. But that was when I was setting up the cluster which 
 means I didn't have any data at that time.
 
 -Raj
 
 
 On Fri, Jun 15, 2012 at 1:29 PM, Nick Bailey n...@datastax.com wrote:
 Did you start all your nodes at the correct tokens or did you balance
 by moving them? Moving nodes around won't delete unneeded data after
 the move is done.
 
 Try running 'nodetool cleanup' on all of your nodes.
 
 On Fri, Jun 15, 2012 at 12:24 PM, Raj N raj.cassan...@gmail.com wrote:
  Actually I am not worried about the percentage. It's the data I am concerned
  about. Look at the first node: it has 102.07 GB of data, while the other nodes
  have around 60 GB (one has 69, but let's ignore that one). I don't
  understand why the first node has almost double the data.
 
  Thanks
  -Raj
 
 
  On Fri, Jun 15, 2012 at 11:06 AM, Nick Bailey n...@datastax.com wrote:
 
  This is just a known problem with the nodetool output and multiple
  DCs. Your configuration is correct. The problem with nodetool is fixed
  in 1.1.1
 
  https://issues.apache.org/jira/browse/CASSANDRA-3412
 
  On Fri, Jun 15, 2012 at 9:59 AM, Raj N raj.cassan...@gmail.com wrote:
   Hi experts,
   I have a 6 node cluster across 2 DCs(DC1:3, DC2:3). I have assigned
   tokens using the first strategy(adding 1) mentioned here -
  
   http://wiki.apache.org/cassandra/Operations?#Token_selection
  
   But when I run nodetool ring on my cluster, this is the result I get -
  
    Address         DC   Rack   Status  State    Load       Owns     Token
                                                                     113427455640312814857969558651062452225
    172.17.72.91    DC1  RAC13  Up      Normal   102.07 GB  33.33%   0
    45.10.80.144    DC2  RAC5   Up      Normal   59.1 GB    0.00%    1
    172.17.72.93    DC1  RAC18  Up      Normal   59.57 GB   33.33%   56713727820156407428984779325531226112
    45.10.80.146    DC2  RAC7   Up      Normal   59.64 GB   0.00%    56713727820156407428984779325531226113
    172.17.72.95    DC1  RAC19  Up      Normal   69.58 GB   33.33%   113427455640312814857969558651062452224
    45.10.80.148    DC2  RAC9   Up      Normal   59.31 GB   0.00%    113427455640312814857969558651062452225
  
  
    As you can see, the first node has considerably more load than the
    others (almost double), which is surprising since all of these are
    replicas of each other. I am running Cassandra 0.8.4. Is there an
    explanation for this behaviour? Could
    https://issues.apache.org/jira/browse/CASSANDRA-2433 be the cause?
  
   Thanks
   -Raj
 
 
 
 



Re: Cassandra out of Heap memory

2012-06-17 Thread rohit bhatia
I am using 1.0.5. The logs suggest that it was one single instance of
failure, and I'm unable to reproduce it.
From the logs, in a span of 30 seconds, heap usage went from 4.8 GB to
8.8 GB, with stop-the-world GC running 20 times. I believe that ParNew
was unable to clean up memory due to some problem. I will report back if I
am able to reproduce this failure.

On Mon, Jun 18, 2012 at 6:14 AM, aaron morton aa...@thelastpickle.com wrote:
 Not commenting on the GC advice, but Cassandra memory usage has improved a
 lot since that was written. I would take a look at what was happening and
 see if tweaking the Cassandra config helped before modifying GC settings.

 GCInspector.java(line 88): Heap is .9934 full. Is this expected? or
 should I adjust my flush_largest_memtable_at variable.

 flush_largest_memtables_at is a safety valve only. Reducing it may help avoid
 OOM, but it will not treat the cause.

 What version are you using ?

 1.0.0 had an issue where deletes were not taken into consideration
 (https://github.com/apache/cassandra/blob/trunk/CHANGES.txt#L33) but this
 does not sound like the same problem.

 Take a look in the logs on the machine and see if it was associated with a
 compaction or repair operation.

 I would also consider experimenting on one node with 8GB / 800MB heap sizes.
 More is not always better.


 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 14/06/2012, at 8:05 PM, rohit bhatia wrote:

 Looking at http://blog.mikiobraun.de/2010/08/cassandra-gc-tuning.html
 and server logs, I think my situation is this

 The default cassandra settings has the highest peak heap usage. The
 problem with this is that it raises the possibility that during the
 CMS cycle, a collection of the young generation runs out of memory to
 migrate objects to the old generation (a so-called concurrent mode
 failure), leading to stop-the-world full garbage collection. However,
 with a slightly lower setting of the CMS threshold, we get a bit more
 headroom, and more stable overall performance.

 I see concurrentMarkSweep system.log Entries trying to gc 2-4 collections.

 Any suggestions for preemptive measure for this would be welcome.




Re: GCInspector works every 10 seconds!

2012-06-17 Thread aaron morton
  It is also strange that although no data in Cassandra matches the query 
 conditions, the query takes more time the more data we have in Cassandra.

These log messages:
 DEBUG [ReadStage:89] 2012-06-17 20:17:26,958 SliceQueryFilter.java (line 123) 
 collecting 0 of 5000: 
 7fff0137f63408920e049c22:true:4@1339865451865018
 DEBUG [ReadStage:89] 2012-06-17 20:17:26,958 SliceQueryFilter.java (line 123) 
 collecting 0 of 5000: 
 7fff0137f63408a0eeab052a:true:4@1339865451866000
say that the slice query read columns from disk that had been deleted. 

Have you tried your test with a clean (no files on disk) database?


Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 18/06/2012, at 12:36 AM, Jason Tang wrote:

 Hi
 
After I changed the log level to DEBUG, I found some interesting entries.
 
   Although we don't have traffic to Cassandra, we have a scheduled task that 
 performs a slice query. 
 
   We use a time-stamp as the index, and we run the query every second to 
 check whether we have tasks to do.
 
   After 24 hours we have 40G of data in Cassandra, and we configure Cassandra 
 with a max JVM heap of 6G, memtable 1G, and disk_access_mode: mmap_index_only.
 
   It is also strange that although no data in Cassandra matches the query 
 conditions, the query takes more time the more data we have in Cassandra.
 
   We have 20 million records in total in Cassandra, indexed by time stamp, 
 and we query with MultigetSubSliceQuery, setting the range to values that 
 do not match any data in Cassandra. So it should return fast, but with 20 
 million records it takes 2 seconds to get the query result.
 
   Is the GC caused by the scheduled query operation, and why does it take so 
 much memory? Can we improve it?
 
 System.log:
  INFO [ScheduledTasks:1] 2012-06-17 20:17:13,574 GCInspector.java (line 123) 
 GC for ParNew: 559 ms for 1 collections, 3258240912 used; max
 is 6274678784
 DEBUG [ReadStage:99] 2012-06-17 20:17:25,563 SliceQueryFilter.java (line 123) 
 collecting 0 of 5000: 
 0138ad1035880137f3372f3e0e28e3b6:false:36@1339815309124015
 DEBUG [ReadStage:99] 2012-06-17 20:17:25,565 ReadVerbHandler.java (line 60) 
 Read key 3331; sending response to 158060445@/192.168.0.3
 DEBUG [ReadStage:96] 2012-06-17 20:17:25,845 SliceQueryFilter.java (line 123) 
 collecting 0 of 5000: 
 0138ad1035880137f33a80cf6cb5d383:false:36@1339815526613007
 DEBUG [ReadStage:96] 2012-06-17 20:17:25,847 ReadVerbHandler.java (line 60) 
 Read key 3233; sending response to 158060447@/192.168.0.3
 DEBUG [ReadStage:105] 2012-06-17 20:17:25,952 SliceQueryFilter.java (line 
 123) collecting 0 of 5000: 
 0138ad1035880137f330cd70c86690cd:false:36@1339814890872015
 DEBUG [ReadStage:105] 2012-06-17 20:17:25,953 ReadVerbHandler.java (line 75) 
 digest is d41d8cd98f00b204e9800998ecf8427e
 DEBUG [ReadStage:105] 2012-06-17 20:17:25,953 ReadVerbHandler.java (line 60) 
 Read key 3139; sending response to 158060448@/192.168.0.3
 DEBUG [ReadStage:89] 2012-06-17 20:17:25,959 CollationController.java (line 
 191) collectAllData
 DEBUG [ReadStage:108] 2012-06-17 20:17:25,959 CollationController.java (line 
 191) collectAllData
 DEBUG [ReadStage:107] 2012-06-17 20:17:25,959 CollationController.java (line 
 191) collectAllData
 DEBUG [ReadStage:89] 2012-06-17 20:17:26,958 SliceQueryFilter.java (line 123) 
 collecting 0 of 5000: 
 7fff0137f63408920e049c22:true:4@1339865451865018
 DEBUG [ReadStage:89] 2012-06-17 20:17:26,958 SliceQueryFilter.java (line 123) 
 collecting 0 of 5000: 
 7fff0137f63408a0eeab052a:true:4@1339865451866000
 DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line 123) 
 collecting 0 of 5000: 
 7fff0137f63408b1319577c9:true:4@1339865451867003
 DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line 123) 
 collecting 0 of 5000: 
 7fff0137f63408c081e0b8a3:true:4@1339865451867004
 DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line 123) 
 collecting 0 of 5000: 
 7fff0137f6340deefb8a0627:true:4@1339865451920001
 DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line 123) 
 collecting 0 of 5000: 
 7fff0137f6340df9c21e9979:true:4@1339865451923002
 DEBUG [ReadStage:89] 2012-06-17 20:17:26,959 SliceQueryFilter.java (line 123) 
 collecting 0 of 5000: 
 7fff0137f6340e095ead1498:true:4@1339865451928000
 DEBUG [ReadStage:89] 2012-06-17 20:17:26,960 SliceQueryFilter.java (line 123) 
 collecting 0 of 5000: 
 7fff0137f6340e1af16cf151:true:4@1339865451935000
 DEBUG [ReadStage:89] 2012-06-17 20:17:26,960 SliceQueryFilter.java (line 123) 
 collecting 0 of 5000: 
 7fff0137f6340e396cfdc9fa:true:4@133986545195
 
 
 BRs
 //Ares
 
 2012/6/17 Jason Tang ares.t...@gmail.com
 Hi
 
After running load testing for 24 hours (insert, update 

Re: Question on SSTable Configuration Parameters......

2012-06-17 Thread aaron morton
 Where can I find parameters to control the SSTable files - e.g. their min/max 
 sizes, etc.
It's not normally something you need to worry about. 

The initial size of the files is not really controlled by settings. The data is 
flushed to disk when either the commit log reaches a certain size, or when a 
certain amount of memory is used by the memtables. 

After that it's up to the compaction strategy. The default size-tiered strategy 
just compacts together files of similar sizes. The Levelled Compaction strategy 
has some more settings: 
http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra
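
For example, switching a CF to levelled compaction from cassandra-cli looks 
roughly like this (the CF name and sstable size are placeholders):

update column family MyCF
  with compaction_strategy = 'LeveledCompactionStrategy'
  and compaction_strategy_options = {sstable_size_in_mb : 10};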

 Is that because the examples did not have much data or is that the case even 
 when you have hundreds of GB of data for a column family on a node in a 
 cluster?
The former.

 Also, are incremental backups possible ? Where can I find examples of that?

http://www.datastax.com/docs/1.1/operations/index
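
In short: incremental backups are a yaml switch, and snapshots are taken with 
nodetool. Roughly (host and keyspace name are examples):

# cassandra.yaml - hard-links each flushed sstable into a backups/ directory
incremental_backups: true

# take a full snapshot as the baseline
nodetool -h 127.0.0.1 snapshot MyKeyspace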

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 18/06/2012, at 12:54 PM, Jayesh Thakrar wrote:

 Hi All,
 
 I am getting started with Cassandra and have been reading the O'Reilly book 
 and some other documentation.
 I understand that data is persisted in SSTable files.
 
 Where can I find the parameters that control the SSTable files - e.g. their 
 min/max sizes, etc.?
 I looked up http://wiki.apache.org/cassandra/StorageConfiguration and some 
 other places but did not find any such parameters.
 
 Also, when reading the book and some other examples on backups, it seems that 
 when a column family is backed up, it is all contained in a single data file.
 Is that because the examples did not have much data, or is that the case even 
 when you have hundreds of GB of data for a column family on a node in a 
 cluster?
 
 Also, are incremental backups possible ? Where can I find examples of that?
 
 Thanks a lot in advance,
 
 Jayesh Thakrar



Re: 48 character cap on Keyspace + CF name length?

2012-06-17 Thread Tharindu Mathew
Oh, the world of windows... sigh.

Thanks Aaron for the pointer.

On Mon, Jun 18, 2012 at 8:02 AM, aaron morton aa...@thelastpickle.com wrote:

 Has to do with the file name length on windows
 https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/config/Schema.java#L49


 Cheers


 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 15/06/2012, at 6:32 AM, Tharindu Mathew wrote:

 Hi,

 Wonder why this cap is in place? We are experimenting on some CF names
 with UUIDs and hit this issue.

 --
 Regards,

 Tharindu

 blog: http://mackiemathew.com/





-- 
Regards,

Tharindu

blog: http://mackiemathew.com/