Re: Correct way to set strategy options in cqlsh?

2012-05-23 Thread Romain HARDOUIN
You *must* remove the hyphen.
According to the CQL 2.0 documentation, here is the correct syntax to 
create a keyspace:

createKeyspaceStatement ::= CREATE KEYSPACE name
 WITH optionName = optionVal
 ( AND optionName = optionVal )*
;
optionName ::= identifier
   | optionName : identifier
   | optionName : integer
   ;
optionVal ::= stringLiteral
  | identifier
  | integer
  ;

The string strategy_options:us-west=1; would have to match the following 
production:

optionName : identifier = integer

Thus, us-west would have to be an *identifier*, and according to the 
documentation:
An identifier is a letter followed by any sequence of letters, digits, 
or the underscore (_).
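For illustration, here is a CREATE KEYSPACE statement that satisfies this grammar. The keyspace and data center names are made up; the point is the underscore in the option name:

```sql
-- Hypothetical names; the part after the colon must be an identifier,
-- so us_west is legal where us-west is not (the hyphen parses as minus).
CREATE KEYSPACE demo
  WITH strategy_class = 'NetworkTopologyStrategy'
  AND strategy_options:us_west = 1;
```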

RE: supercolumns with TTL columns not being compacted correctly

2012-05-23 Thread Pieter Callewaert
Hi,

This means I have a serious flaw in my column family design.
At this moment I am storing sensor data into the database, rowkey is the sensor 
ID, supercolumn is the timestamp, and the different columns in the supercolumn 
are sensor readings.

This means that with my current design it is almost impossible to ‘delete’ data 
from disk unless I do a major compaction on all the sstables? (All sstables 
will contain the same row key.)
And at this moment new data is being loaded every 5 minutes, so doing a major 
compaction would be a big problem.

Is my thinking correct?

Kind regards,
Pieter Callewaert

From: Yuki Morishita [mailto:mor.y...@gmail.com]
Sent: dinsdag 22 mei 2012 16:21
To: user@cassandra.apache.org
Subject: Re: supercolumns with TTL columns not being compacted correctly

Data will not be deleted when those keys appear in other sstables outside of 
the compaction. This is to prevent obsolete data from appearing again.

yuki
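The rule described above can be sketched as a single condition. This is an illustration only, not Cassandra's actual implementation; the method and parameter names are made up:

```java
public class PurgeCheck {
    // Illustrative sketch: an expired/deleted column can only be purged
    // during compaction if its grace period has passed AND the row key does
    // not appear in any sstable outside the compaction set. Otherwise the
    // obsolete copies in those other sstables could "come back to life".
    static boolean canPurge(long deletionTimeSec, int gcGraceSec, long nowSec,
                            boolean keyInSSTablesOutsideCompaction) {
        boolean gracePassed = deletionTimeSec + gcGraceSec <= nowSec;
        return gracePassed && !keyInSSTablesOutsideCompaction;
    }

    public static void main(String[] args) {
        // Grace passed, but the key still lives in another sstable: keep it.
        System.out.println(canPurge(1000, 0, 2000, true));
        // Grace passed and the key appears nowhere else: safe to purge.
        System.out.println(canPurge(1000, 0, 2000, false));
    }
}
```

This is why Pieter's data survives a user-defined compaction of a single sstable: the same row key exists in the other sstables.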


On Tuesday, May 22, 2012 at 7:37 AM, Pieter Callewaert wrote:

Hi Samal,



Thanks for your time looking into this.



I force the compaction by using forceUserDefinedCompaction on only that 
particular sstable. This guarantees that the new sstable being written only 
contains the data from the old sstable.

The data in the sstable is more than 31 days old and gc_grace is 0, but still 
the data from the sstable is being written to the new one, while I am 100% sure 
all the data is invalid.



Kind regards,

Pieter Callewaert



From: samal [mailto:samalgo...@gmail.com]
Sent: dinsdag 22 mei 2012 14:33
To: user@cassandra.apache.org
Subject: Re: supercolumns with TTL columns not being compacted correctly



Data will remain till next compaction but won't be available. Compaction will 
delete old sstable create new one.

On 22-May-2012 5:47 PM, Pieter Callewaert 
pieter.callewa...@be-mobile.be wrote:

Hi,



I’ve had my suspicions for some months, but now I think I am sure about it.

Data is being written by the SSTableSimpleUnsortedWriter and loaded by the 
sstableloader.

The data should be alive for 31 days, so I use the following logic:



int ttl = 2678400; // 31 days, in seconds

long timestamp = System.currentTimeMillis() * 1000; // microseconds, for the column timestamp

long expirationTimestampMS = (long) ((timestamp / 1000) + ((long) ttl * 1000)); // wall-clock expiry, in milliseconds



And using this to write it:



sstableWriter.newRow(bytes(entry.id));

sstableWriter.newSuperColumn(bytes(superColumn));

sstableWriter.addExpiringColumn(nameTT, bytes(entry.aggregatedTTMs), timestamp, 
ttl, expirationTimestampMS);

sstableWriter.addExpiringColumn(nameCov, bytes(entry.observationCoverage), 
timestamp, ttl, expirationTimestampMS);

sstableWriter.addExpiringColumn(nameSpd, bytes(entry.speed), timestamp, ttl, 
expirationTimestampMS);



This works perfectly: data can be queried until 31 days have passed, then no 
results are returned, as expected.

But the data is still on disk until the sstables are recompacted:



One of our nodes (we have 6 in total) has the following sstables:

[cassandra@bemobile-cass3 ~]$ ls -hal /data/MapData007/HOS-* | grep G

-rw-rw-r--. 1 cassandra cassandra 103G May  3 03:19 
/data/MapData007/HOS-hc-125620-Data.db

-rw-rw-r--. 1 cassandra cassandra 103G May 12 21:17 
/data/MapData007/HOS-hc-163141-Data.db

-rw-rw-r--. 1 cassandra cassandra  25G May 15 06:17 
/data/MapData007/HOS-hc-172106-Data.db

-rw-rw-r--. 1 cassandra cassandra  25G May 17 19:50 
/data/MapData007/HOS-hc-181902-Data.db

-rw-rw-r--. 1 cassandra cassandra  21G May 21 07:37 
/data/MapData007/HOS-hc-191448-Data.db

-rw-rw-r--. 1 cassandra cassandra 6.5G May 21 17:41 
/data/MapData007/HOS-hc-193842-Data.db

-rw-rw-r--. 1 cassandra cassandra 5.8G May 22 11:03 
/data/MapData007/HOS-hc-196210-Data.db

-rw-rw-r--. 1 cassandra cassandra 1.4G May 22 13:20 
/data/MapData007/HOS-hc-196779-Data.db

-rw-rw-r--. 1 cassandra cassandra 401G Apr 16 08:33 
/data/MapData007/HOS-hc-58572-Data.db

-rw-rw-r--. 1 cassandra cassandra 169G Apr 16 17:59 
/data/MapData007/HOS-hc-61630-Data.db

-rw-rw-r--. 1 cassandra cassandra 173G Apr 17 03:46 
/data/MapData007/HOS-hc-63857-Data.db

-rw-rw-r--. 1 cassandra cassandra 105G Apr 23 06:41 
/data/MapData007/HOS-hc-87900-Data.db



As you can see, the following files should be invalid:

/data/MapData007/HOS-hc-58572-Data.db

/data/MapData007/HOS-hc-61630-Data.db

/data/MapData007/HOS-hc-63857-Data.db



Because they were all written more than a month ago. gc_grace is 0, so this 
should also not be a problem.



As a test, I use forceUserDefinedCompaction on the HOS-hc-61630-Data.db.

The expected behavior is that an empty file is written, because all data in 
the sstable should be invalid:



Compactionstats is giving:

compaction type  keyspace    column family  bytes compacted  bytes total   progress
     Compaction  MapData007  HOS            11518215662      532355279724  2.16%



And when I ls the directory I find this:

-rw-rw-r--. 1 

Re: Exception when truncate

2012-05-23 Thread ruslan usifov
It looks very strange, but yes. Right now I can't reproduce it.

2012/5/22 aaron morton aa...@thelastpickle.com:
 The first part of the name is the current system time in milliseconds.

 If you run it twice do you get log messages about failing to create the same
 directory twice ?

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 21/05/2012, at 5:09 AM, ruslan usifov wrote:

 I think as you do, but this is not true: there is no permissions
 issue. And as I said before, Cassandra tries to create a snapshot
 directory that already exists
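The collision described here is plausible if two snapshots land in the same millisecond, since the directory name starts with the current system time. A sketch, not Cassandra's exact code:

```java
public class SnapshotName {
    // Sketch of a timestamp-based snapshot directory name, e.g.
    // "1337242754356-purchase_history". Two snapshots requested within the
    // same millisecond produce the same name, so the second mkdirs fails.
    static String snapshotDir(long nowMillis, String columnFamily) {
        return nowMillis + "-" + columnFamily;
    }

    public static void main(String[] args) {
        String a = snapshotDir(1337242754356L, "purchase_history");
        String b = snapshotDir(1337242754356L, "purchase_history");
        System.out.println(a.equals(b)); // same millisecond -> same name
    }
}
```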

 2012/5/19 Jonathan Ellis jbel...@gmail.com:

 Sounds like you have a permissions problem.  Cassandra creates a

 subdirectory for each snapshot.


 On Thu, May 17, 2012 at 4:57 AM, ruslan usifov ruslan.usi...@gmail.com
 wrote:

 Hello


 I have the following situation on our test server:


 From cassandra-cli I try to run:


 truncate purchase_history;


 Three times I got:


 [default@township_6waves] truncate purchase_history;

 null

 UnavailableException()

        at
 org.apache.cassandra.thrift.Cassandra$truncate_result.read(Cassandra.java:20212)

        at
 org.apache.cassandra.thrift.Cassandra$Client.recv_truncate(Cassandra.java:1077)

        at
 org.apache.cassandra.thrift.Cassandra$Client.truncate(Cassandra.java:1052)

        at
 org.apache.cassandra.cli.CliClient.executeTruncate(CliClient.java:1445)

        at
 org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:272)

        at
 org.apache.cassandra.cli.CliMain.processStatementInteractive(CliMain.java:220)

        at org.apache.cassandra.cli.CliMain.main(CliMain.java:348)



 So it looks like truncate runs very slowly, taking longer than

 rpc_timeout_in_ms: 1 (this can happen because we have very slow

 disks on the test machine)


 But in the Cassandra system log I see the following exception:



 ERROR [MutationStage:7022] 2012-05-17 12:19:14,356

 AbstractCassandraDaemon.java (line 139) Fatal exception in thread

 Thread[MutationStage:7022,5,main]

 java.io.IOError: java.io.IOException: unable to mkdirs

 /home/cassandra/1.0.0/data/township_6waves/snapshots/1337242754356-purchase_history

        at
 org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1433)

        at
 org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1462)

        at
 org.apache.cassandra.db.ColumnFamilyStore.truncate(ColumnFamilyStore.java:1657)

        at
 org.apache.cassandra.db.TruncateVerbHandler.doVerb(TruncateVerbHandler.java:50)

        at
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)

        at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

        at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

        at java.lang.Thread.run(Thread.java:662)

 Caused by: java.io.IOException: unable to mkdirs

 /home/cassandra/1.0.0/data/township_6waves/snapshots/1337242754356-purchase_history

        at
 org.apache.cassandra.io.util.FileUtils.createDirectory(FileUtils.java:140)

        at
 org.apache.cassandra.io.util.FileUtils.createDirectory(FileUtils.java:131)

        at
 org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(ColumnFamilyStore.java:1409)

        ... 7 more



 Also I see that the 1337242754356-purchase_history directory already

 exists in the snapshot dir, so I think that the snapshot

 names that Cassandra generates are not unique.


 PS: We use Cassandra 1.0.10 on Ubuntu 10.04 LTS




 --

 Jonathan Ellis

 Project Chair, Apache Cassandra

 co-founder of DataStax, the source for professional Cassandra support

 http://www.datastax.com




Re: Cassandra 0.8.5: Column name mystery in create column family command

2012-05-23 Thread aaron morton
When you say

  comparator=BytesType

You are telling Cassandra that the column names in the CFs are just bytes. But 
when you create the column metadata you are specifying the column names as 
strings. 

Use UTF8Type as the comparator. 
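With a BytesType comparator, describe keyspace can only render column names as the hex of their raw bytes. A sketch of what that rendering looks like for a UTF-8 name (display details may differ between versions):

```java
import java.nio.charset.StandardCharsets;

public class HexName {
    // Render a column name the way a BytesType comparator shows it:
    // as the lowercase hex of its raw UTF-8 bytes.
    static String toHex(String name) {
        StringBuilder sb = new StringBuilder();
        for (byte b : name.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHex("userid")); // 757365726964
    }
}
```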

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 22/05/2012, at 11:09 PM, Roshan Dawrani wrote:

 Hi,
 
 I use Cassandra 0.8.5 and am suddenly noticing some strange behavior. I run a 
 create column family command with some column meta-data and it runs fine, 
 but when I do describe keyspace, it shows me different column names for 
 those index columns.
 
 a) Here is what I run: 
 create column family UserTemplate with comparator=BytesType and 
 column_metadata=[{column_name: userid, validation_class: UTF8Type, 
 index_type: KEYS, index_name: TemplateUserIdIdx}, {column_name: type, 
 validation_class: UTF8Type, index_type: KEYS, index_name: TemplateTypeIdx}];
 
 b) This is what describe keyspace shows:
 ColumnFamily: UserTemplate
   Key Validation Class: org.apache.cassandra.db.marshal.BytesType
   ...
   ...
   Built indexes: [UserTemplate.TemplateTypeIdx, 
 UserTemplate.TemplateUserIdIdx]
   Column Metadata:
 Column Name: ff
   Validation Class: org.apache.cassandra.db.marshal.UTF8Type
   Index Name: TemplateUserIdIdx
   Index Type: KEYS
 Column Name: 0dfffaff
   Validation Class: org.apache.cassandra.db.marshal.UTF8Type
   Index Name: TemplateTypeIdx
   Index Type: KEYS
 
 Does anyone see why this might be happening? I have created many such column 
 families before and never run into this issue.
 
 -- 
 Roshan
 http://roshandawrani.wordpress.com/
 



Re: RE Ordering counters in Cassandra

2012-05-23 Thread aaron morton
 Just out of curiosity, is there any underlying architectural reason why it's 
 not possible to order a row based on its counters values? or is it something 
 that might be in the roadmap in the future?
it wouldn't work well with the consistency level. 
Also, sorting a list of values while multiple clients are modifying them 
would not work very well. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 23/05/2012, at 12:25 AM, samal wrote:

 Secondary index is not supported for counters plus you must know column name 
 to support secondary index on regular column.
 
 On 22-May-2012 5:34 PM, Filippo Diotalevi fili...@ntoklo.com wrote:
 Thanks for all the answers, they definitely helped.
 
 Just out of curiosity, is there any underlying architectural reason why it's 
 not possible to order a row based on its counters values? or is it something 
 that might be in the roadmap in the future?
 
 -- 
 Filippo Diotalevi
 
 On Tuesday, 22 May 2012 at 08:48, Romain HARDOUIN wrote:
 
 
I mean iterate over each column -- more precisely: *bunches of columns* using 
slices -- and write new columns into the inverted index. 
Tamar's data model is made for real-time analysis. It's maybe overdesigned 
for a daily ranking. 
I agree with Samal: you should split your data across the space of tokens. 
Only feeding the Ranking CF would be affected, not the top N queries. 
 
 Filippo Diotalevi fili...@ntoklo.com a écrit sur 21/05/2012 19:05:28 :
 
  Hi Romain, 
  thanks for your suggestion. 
  
  When you say  build every day a ranking in a dedicated CF by 
  iterating over events: do you mean 
  - load all the columns for the specified row key 
   - iterate over each column, and write a new column in the inverted index 
  ? 
  
  That's my current approach, but since I have many of these wide rows
  (1 per day), the process is extremely slow as it involves moving an 
   entire row from Cassandra to the client, inverting every column, and 
   sending the data back to create the inverted index. 
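The inversion step being described (counts keyed by item, flipped into a ranking keyed by count) can be sketched in plain Java. The Hector/slice plumbing is omitted and all names here are made up:

```java
import java.util.*;

public class InvertedRanking {
    // Turn (item -> count) pairs into a descending (count, item) list:
    // the shape a ranking column family would be fed with, so that a
    // top-N query becomes a slice of the first N columns.
    static List<Map.Entry<Long, String>> invert(Map<String, Long> counts) {
        List<Map.Entry<Long, String>> inverted = new ArrayList<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            inverted.add(new AbstractMap.SimpleEntry<>(e.getValue(), e.getKey()));
        }
        inverted.sort((a, b) -> {
            int byCount = Long.compare(b.getKey(), a.getKey()); // highest count first
            return byCount != 0 ? byCount : a.getValue().compareTo(b.getValue());
        });
        return inverted;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new HashMap<>();
        counts.put("itemA", 5L);
        counts.put("itemB", 9L);
        counts.put("itemC", 5L);
        System.out.println(invert(counts).get(0).getValue()); // itemB
    }
}
```

In the mailing-list approach this would run over slices of a wide row rather than an in-memory map, which is exactly the round trip Filippo finds slow.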
 



Re: Cassandra 0.8.5: Column name mystery in create column family command

2012-05-23 Thread Roshan Dawrani
On Wed, May 23, 2012 at 3:07 PM, aaron morton aa...@thelastpickle.com wrote:

 When you say

  comparator=BytesType


 You are telling cassandra that the column names in the CF's are just
 bytes. But when you create the column meta data you are specifying the
 column names as strings.

 use UTF8Type as the comparator.


Hi,

I get that now (after Samal's reply in this thread). My question is that I
have at least 4-5 similar CFs where I earlier used a BytesType comparator and
similar column names in the column metadata, and they all got the column names
right in the schema changes Cassandra made.

Why am I seeing the error only now with this particular CF?

Cheers,
Roshan


Re: Tuning cassandra (compactions overall)

2012-05-23 Thread aaron morton
I've not heard of anything like that in the recent versions. There were some 
issues in the early 0.8 
https://github.com/apache/cassandra/blob/trunk/NEWS.txt#L383

If you are on a recent version can you please create a jira ticket 
https://issues.apache.org/jira/browse/CASSANDRA describing what you think 
happened. 

If you have kept the logs from the startup and can make them available please 
do. 

Thanks

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 23/05/2012, at 12:42 AM, Alain RODRIGUEZ wrote:

 not sure what you mean by
 And after restarting the second one I have lost all the consistency of
 my data. All my statistics since September are totally false now in
 production
 
 Can you give some examples?
 
 After restarting my 2 nodes (one after the other), all my counters
 have become wrong. The counter values were modified by the restart.
 Let's say I had a counter column called 20120101#click whose value was
 569; after the restart the value became 751. I think that all the
 values increased (I'm not sure), but the counters increased
 in different ways: some values increased a lot, others just a bit.
 
 Counters are not idempotent, so if the client app retries TimedOut
 requests you can get an overcount. That should not result in lost
 data.
 
 Some of these counters haven't been written to since September and were
 still modified by the restart.
 
 Have you been running repair ?
 
 Yes. Repair didn't help. I have the feeling that repair doesn't
 work on counters.
 
 I have restored the data now, but I am afraid of restarting any node.
 I can't remain in this position for too long...



Re: unknown exception with hector

2012-05-23 Thread aaron morton
Not sure, but:

at 
 org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)

Looks like the client is not using framed transport. The server defaults to 
framed.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 23/05/2012, at 5:35 AM, Deno Vichas wrote:

 Could somebody clue me in to the cause of this exception? I see these 
 randomly.
 
 AnalyzerService-2 2012-05-22 13:28:00,385 :: WARN  
 cassandra.connection.HConnectionManager  - Exception:
 me.prettyprint.hector.api.exceptions.HectorTransportException: 
 org.apache.thrift.transport.TTransportException
at 
 me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:39)
at 
 me.prettyprint.cassandra.service.KeyspaceServiceImpl$23.execute(KeyspaceServiceImpl.java:851)
at 
 me.prettyprint.cassandra.service.KeyspaceServiceImpl$23.execute(KeyspaceServiceImpl.java:840)
at 
 me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:99)
at 
 me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:243)
at 
 me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:131)
at 
 me.prettyprint.cassandra.service.KeyspaceServiceImpl.getColumn(KeyspaceServiceImpl.java:857)
at 
 me.prettyprint.cassandra.model.thrift.ThriftColumnQuery$1.doInKeyspace(ThriftColumnQuery.java:57)
at 
 me.prettyprint.cassandra.model.thrift.ThriftColumnQuery$1.doInKeyspace(ThriftColumnQuery.java:52)
at 
 me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
at 
 me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85)
at 
 me.prettyprint.cassandra.model.thrift.ThriftColumnQuery.execute(ThriftColumnQuery.java:51)
at com.stocktouch.dao.StockDaoImpl.getHistorical(StockDaoImpl.java:365)
at 
 com.stocktouch.dao.StockDaoImpl.getHistoricalQuote(StockDaoImpl.java:433)
at 
 com.stocktouch.service.StockHistoryServiceImpl.getHistoricalQuote(StockHistoryServiceImpl.java:480)
at 
 com.stocktouch.service.AnalyzerServiceImpl.getClose(AnalyzerServiceImpl.java:180)
at 
 com.stocktouch.service.AnalyzerServiceImpl.calcClosingPrices(AnalyzerServiceImpl.java:90)
at 
 com.stocktouch.service.AnalyzerServiceImpl.nightlyRollup(AnalyzerServiceImpl.java:66)
at 
 com.stocktouch.service.AnalyzerServiceImpl$2.run(AnalyzerServiceImpl.java:55)
at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
 Caused by: org.apache.thrift.transport.TTransportException
at 
 org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at 
 org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
at 
 org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at 
 org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
at 
 org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
at 
 org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
at 
 org.apache.cassandra.thrift.Cassandra$Client.recv_get(Cassandra.java:509)
at org.apache.cassandra.thrift.Cassandra$Client.get(Cassandra.java:492)
at 
 me.prettyprint.cassandra.service.KeyspaceServiceImpl$23.execute(KeyspaceServiceImpl.java:846)
... 20 more
 
 
 thanks,
 deno
 
 



Re: Replication factor

2012-05-23 Thread aaron morton
RF is normally adjusted to modify availability (see 
http://thelastpickle.com/2011/06/13/Down-For-Me/)

 for example, if I have 4 nodes cluster in one data center, how can RF=2 vs 
 RF=4 affect read performance? If consistency level is ONE, looks reading does 
 not need to go to another hop to get data if RF=4, but it would do more work 
 on read repair in the background.
Read Repair does not run at CL ONE.
When RF == number of nodes, and you read at CL ONE you will always be reading 
locally. But with a low consistency.
If you read with QUORUM when RF == number of nodes you will still get some 
performance benefit from the data being read locally.

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 23/05/2012, at 9:34 AM, Daning Wang wrote:

 Hello,
 
 What is the pros and cons to choose different number of replication factor in 
 term of performance? if space is not a concern.
 
 for example, if I have 4 nodes cluster in one data center, how can RF=2 vs 
 RF=4 affect read performance? If consistency level is ONE, looks reading does 
 not need to go to another hop to get data if RF=4, but it would do more work 
 on read repair in the background.
 
 Can you share some insights about this?
 
 Thanks in advance,
 
 Daning 
 



Re: Number of keyspaces

2012-05-23 Thread aaron morton
 We were thinking of doing a major compaction after each year is 'closed off'. 
Not a terrible idea. Years tend to happen annually, so their growth pattern is 
well understood. 

 This would mean that compactions for the current year were dealing with a 
 smaller amount of data and hence be faster and have less impact on a 
 day-to-day basis.
Older data is compacted into higher tiers / generations, so it will not be included 
when compacting new data (background: 
http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra). That 
said, there is a chance that at some point the big older files get 
compacted, i.e. if you get (by default) 4 x 100GB files they will get compacted 
into 1. 

It feels a bit like a premature optimisation. 
 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 23/05/2012, at 1:52 PM, Franc Carter wrote:

 On Wed, May 23, 2012 at 7:42 AM, aaron morton aa...@thelastpickle.com wrote:
 1 KS with 24 CF's will use roughly the same resources as 24 KS's with 1 CF. 
 Each CF:
 
 * loads the bloom filter for each SSTable
 * samples the index for each sstable
 * uses row and key cache
 * has a current memtable and potentially memtables waiting to flush.
 * had secondary index CF's
 
 I would generally avoid a data model that calls for CFs to be added in 
 response to new entities or new data. Older data will be moved to larger 
 files, and not included in compaction for newer data.
 
 We were thinking of doing a major compaction after each year is 'closed off'. 
 This would mean that compactions for the current year were dealing with a 
 smaller amount of data and hence be faster and have less impact on a 
 day-to-day basis. Our query patterns will only infrequently cross year 
 boundaries.
 
 Are we being naive ?
 
 cheers
  
 
 Hope that helps. 
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 23/05/2012, at 3:31 AM, Luís Ferreira wrote:
 
 I have 24 keyspaces, each with a columns family and am considering changing 
 it to 1 keyspace with 24 CFs. Would this be beneficial?
 On May 22, 2012, at 12:56 PM, samal wrote:
 
 Not ideal; Cassandra now has global memtable tuning. Each CF corresponds to 
 memory in RAM. A year-wise CF means it will be in a read-only state next 
 year, but its memtable will still consume RAM.
 
 On 22-May-2012 5:01 PM, Franc Carter franc.car...@sirca.org.au wrote:
 On Tue, May 22, 2012 at 9:19 PM, aaron morton aa...@thelastpickle.com 
 wrote:
 It's more the number of CF's than keyspaces.
 
 Oh - does increasing the number of Column Families affect performance ?
 
 The design we are working on at the moment is considering using a Column 
 Family per year. We were thinking this would isolate compactions to a more 
 manageable size as we don't update previous years.
 
 cheers
  
 
 Cheers
 
 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com
 
 On 22/05/2012, at 6:58 PM, R. Verlangen wrote:
 
 Yes, it does. However, there's no real answer as to what the limit is: it 
 depends on your hardware and cluster configuration. 
 
 You might even want to search the archives of this mailinglist, I remember 
 this has been asked before.
 
 Cheers!
 
 2012/5/21 Luís Ferreira zamith...@gmail.com
 Hi,
 
 Does the number of keyspaces affect the overall cassandra performance?
 
 
 Cumprimentos,
 Luís Ferreira
 
 
 
 
 
 
 -- 
 With kind regards,
 
 Robin Verlangen
 www.robinverlangen.nl
 
 
 
 
 
 -- 
 Franc Carter | Systems architect | Sirca Ltd
 franc.car...@sirca.org.au | www.sirca.org.au
 Tel: +61 2 9236 9118 
 Level 9, 80 Clarence St, Sydney NSW 2000
 PO Box H58, Australia Square, Sydney NSW 1215
 
 
 Cumprimentos,
 Luís Ferreira
 
 
 
 
 
 
 
 -- 
 Franc Carter | Systems architect | Sirca Ltd
 franc.car...@sirca.org.au | www.sirca.org.au
 Tel: +61 2 9236 9118 
 Level 9, 80 Clarence St, Sydney NSW 2000
 PO Box H58, Australia Square, Sydney NSW 1215
 



Re: Confusion regarding the terms replica and replication factor

2012-05-23 Thread aaron morton
 Now if a row key hash is mapped to a range owned by a node in DC3,
 will the Node in DC3 still store the key as determined by the
 partitioner and then walk the ring and store 2 replicas each in DC1
 and DC2 ?
No, only nodes in the DC's specified in the NTS configuration will be replicas. 

 Or will the co-ordinator node be aware of the
 replica placement strategy,
 and override the partitioner's decision and walk the ring until it
 first encounters a node in DC1 or DC2 ? and then place the remaining
 replicas ?
The NTS considers each DC to have its own ring. This can make token selection 
in a multi-DC environment confusing at times. There is something in the DataStax 
docs about it. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 23/05/2012, at 3:16 PM, java jalwa wrote:

 Hi all,
  I am a bit confused regarding the terms replica and
 replication factor. Assume that I am using RandomPartitioner and
 NetworkTopologyStrategy for replica placement.
 From what I understand, with a RandomPartitioner, a row key will
 always be hashed and be stored on the node that owns the range to
 which the key is mapped.
 http://www.datastax.com/docs/1.0/cluster_architecture/replication#networktopologystrategy.
 The example here, talks about having 2 data centers and a replication
 factor of 4 with 2 replicas in each datacenter, so the strategy is
 configured as DC1:2 and DC2:2. Now suppose I add another datacenter
 DC3, and do not change the NetworkTopologyStrategy.
 Now if a row key hash is mapped to a range owned by a node in DC3,
 will the Node in DC3 still store the key as determined by the
 partitioner and then walk the ring and store 2 replicas each in DC1
 and DC2 ? Will that mean that I will then have 5 replicas in the
 cluster and not 4 ? Or will the co-ordinator node be aware of the
 replica placement strategy,
 and override the partitioner's decision and walk the ring until it
 first encounters a node in DC1 or DC2 ? and then place the remaining
 replicas ?
 
 Thanks.



Re: how to get list of snapshots

2012-05-23 Thread aaron morton
 1) is there any good guide for scheduling backups ?
this http://www.datastax.com/docs/1.0/operations/backup_restore ?

 2) is there a way to get a list of snapshots ? (without ls in directory)
No. 

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 23/05/2012, at 5:06 PM, Илья Шипицин wrote:

 Hello!
 
 I'm about to schedule backups in the following way
 
 a) snapshots are done daily
 b) increment backups are enabled
 
 so backups will be consistent; very old snapshots must be removed (I guess a 
 week of depth should be enough).
 
 couple of questions:
 
 1) is there any good guide for scheduling backups ?
 2) is there a way to get a list of snapshots ? (without ls in directory)
 
 Cheers,
 Ilya Shipitsin



Re: how to get list of snapshots

2012-05-23 Thread Илья Шипицин
I've seen that guide. It's missing several important things:

1) OK, I can schedule snapshots using cron (the snapshot's name will be
generated from the current date).
 how can I remove snapshots older than a week ?

2) OK, I can enable incremental backups. How can I remove incremental
SSTables older than 1 week?
 It's trickier than with snapshots.


This will lead me to several find/cron/bash scripts. A single mistake and I could
delete Cassandra data entirely.
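Since this Cassandra version has no built-in snapshot expiry, the "older than a week" check can be sketched in plain Java. The path layout and retention window are assumptions; it only lists candidates by last-modified time rather than deleting anything, which avoids the "one mistake deletes my data" risk:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class SnapshotCleanup {
    // Sketch: report snapshot directories under a snapshots/ folder that
    // are older than a retention window. Deletion is left to the operator.
    static List<String> expired(File snapshotsDir, long retentionMillis, long nowMillis) {
        List<String> result = new ArrayList<>();
        File[] dirs = snapshotsDir.listFiles(File::isDirectory);
        if (dirs == null) {
            return result; // not a directory, or unreadable
        }
        long cutoff = nowMillis - retentionMillis;
        for (File d : dirs) {
            if (d.lastModified() < cutoff) {
                result.add(d.getName());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : "snapshots");
        long week = 7L * 24 * 60 * 60 * 1000;
        for (String name : expired(dir, week, System.currentTimeMillis())) {
            System.out.println("expired snapshot: " + name);
        }
    }
}
```

Review the printed list before wiring any `rm`-style step into cron.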

2012/5/23 aaron morton aa...@thelastpickle.com

 1) is there any good guide for scheduling backups ?

 this http://www.datastax.com/docs/1.0/operations/backup_restore ?

 2) is there way to get list of snapshots ? (without ls in directory)

 No.

 Cheers


 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 23/05/2012, at 5:06 PM, Илья Шипицин wrote:

 Hello!

 I'm about to schedule backups in the following way

 a) snapshots are done daily
 b) increment backups are enabled

 so, backup will be consistent, very old snapshots must be removed (I
 guess, a week depth should be enough).

 couple of questions:

 1) is there any good guide for scheduling backups ?
 2) is there way to get list of snapshots ? (without ls in directory)

 Cheers,
 Ilya Shipitsin





RE: Replication factor

2012-05-23 Thread Viktor Jevdokimov
 When RF == number of nodes, and you read at CL ONE you will always be reading 
 locally.
always be reading locally - only if the dynamic snitch is off. With the dynamic 
snitch on, a request may be redirected to another node, which may introduce 
latency spikes.




Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer

Email: viktor.jevdoki...@adform.commailto:viktor.jevdoki...@adform.com
Phone: +370 5 212 3063, Fax +370 5 261 0453
J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
Follow us on Twitter: @adforminsiderhttp://twitter.com/#!/adforminsider
What is Adform: watch this short videohttp://vimeo.com/adform/display



Disclaimer: The information contained in this message and attachments is 
intended solely for the attention and use of the named addressee and may be 
confidential. If you are not the intended recipient, you are reminded that the 
information remains the property of the sender. You must not use, disclose, 
distribute, copy, print or rely on this e-mail. If you have received this 
message in error, please contact the sender immediately and irrevocably delete 
this message and any copies.

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Wednesday, May 23, 2012 13:00
To: user@cassandra.apache.org
Subject: Re: Replication factor

RF is normally adjusted to modify availability (see 
http://thelastpickle.com/2011/06/13/Down-For-Me/)

for example, if I have 4 nodes cluster in one data center, how can RF=2 vs RF=4 
affect read performance? If consistency level is ONE, looks reading does not 
need to go to another hop to get data if RF=4, but it would do more work on 
read repair in the background.
Read Repair does not run at CL ONE.
When RF == number of nodes, and you read at CL ONE you will always be reading 
locally. But with a low consistency.
If you read with QUORUM when RF == number of nodes you will still get some 
performance benefit from the data being read locally.

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 23/05/2012, at 9:34 AM, Daning Wang wrote:


Hello,

What is the pros and cons to choose different number of replication factor in 
term of performance? if space is not a concern.

for example, if I have 4 nodes cluster in one data center, how can RF=2 vs RF=4 
affect read performance? If consistency level is ONE, looks reading does not 
need to go to another hop to get data if RF=4, but it would do more work on 
read repair in the background.

Can you share some insights about this?

Thanks in advance,

Daning


Re: Confusion regarding the terms replica and replication factor

2012-05-23 Thread java jalwa
Thanks Aaron. That makes things clear.
So I guess the 0 - 2^127 range for tokens corresponds to a cluster-level
top-level ring, and then NTS adds some logic on top of that to logically
segment that range into sub-rings, as per the notion of data centers
defined in NTS. What's the advantage of having a single top-level ring?
Intuitively it seems like each replication group could have a separate
ring, so that the same tokens could be assigned to nodes in different DCs.
If the hierarchy is Cluster - DataCenter - Node, why exactly do we need
globally unique node tokens even though nodes are at the lowest level in
the hierarchy?
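For illustration, the mapping from row key to a point on the single top-level ring works roughly like this (a simplified sketch; the real RandomPartitioner implementation differs in detail):

```python
import hashlib

def random_partitioner_token(key: bytes) -> int:
    # Rough sketch: RandomPartitioner hashes the row key with MD5 and
    # maps the result into the cluster-wide token range [0, 2**127).
    return int.from_bytes(hashlib.md5(key).digest(), "big") % (2 ** 127)

t = random_partitioner_token(b"row-key-1")
assert 0 <= t < 2 ** 127
# Deterministic: the same key always maps to the same point on the ring.
assert t == random_partitioner_token(b"row-key-1")
```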

Thanks again.


On Wed, May 23, 2012 at 3:14 AM, aaron morton aa...@thelastpickle.com wrote:
 Now if a row key hash is mapped to a range owned by a node in DC3,
 will the Node in DC3 still store the key as determined by the
 partitioner and then walk the ring and store 2 replicas each in DC1
 and DC2 ?
 No, only nodes in the DCs specified in the NTS configuration will be 
 replicas.

 Or will the co-ordinator node be aware of the
 replica placement strategy,
 and override the partitioner's decision and walk the ring until it
 first encounters a node in DC1 or DC2 ? and then place the remaining
 replicas ?
 The NTS considers each DC to have its own ring. This can make token 
 selection in a multi-DC environment confusing at times. There is something in 
 the DataStax docs about it.
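A minimal sketch of that per-DC behaviour (illustrative only; it ignores racks and other details of the real NetworkTopologyStrategy):

```python
from bisect import bisect_right

def nts_replicas(key_token, ring, rf_per_dc):
    # Walk the ring clockwise from the key's token and, for each DC named
    # in the strategy, take the first rf nodes found in that DC. DCs not
    # named in rf_per_dc (e.g. a new DC3) get no replicas at all.
    ring = sorted(ring)                      # (token, node, dc) tuples
    tokens = [t for t, _, _ in ring]
    start = bisect_right(tokens, key_token) % len(ring)
    chosen = {dc: [] for dc in rf_per_dc}
    for _, node, dc in ring[start:] + ring[:start]:
        if dc in chosen and len(chosen[dc]) < rf_per_dc[dc]:
            chosen[dc].append(node)
    return chosen

ring = [(0, "a1", "DC1"), (10, "b1", "DC2"), (20, "c1", "DC3"),
        (30, "a2", "DC1"), (40, "b2", "DC2"), (50, "c2", "DC3"),
        (60, "a3", "DC1"), (70, "b3", "DC2")]
# The key's token falls in a range owned by a DC3 node, but DC3 is not in
# the strategy, so it stores nothing: 4 replicas land in DC1 and DC2 only.
placement = nts_replicas(15, ring, {"DC1": 2, "DC2": 2})
assert placement == {"DC1": ["a2", "a3"], "DC2": ["b2", "b3"]}
```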

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 23/05/2012, at 3:16 PM, java jalwa wrote:

 Hi all,
 I am a bit confused regarding the terms replica and
 replication factor. Assume that I am using RandomPartitioner and
 NetworkTopologyStrategy for replica placement.
 From what I understand, with a RandomPartitioner, a row key will
 always be hashed and be stored on the node that owns the range to
 which the key is mapped.
 http://www.datastax.com/docs/1.0/cluster_architecture/replication#networktopologystrategy.
 The example here, talks about having 2 data centers and a replication
 factor of 4 with 2 replicas in each datacenter, so the strategy is
 configured as DC1:2 and DC2:2. Now suppose I add another datacenter
 DC3, and do not change the NetworkTopologyStrategy.
 Now if a row key hash is mapped to a range owned by a node in DC3,
 will the Node in DC3 still store the key as determined by the
 partitioner and then walk the ring and store 2 replicas each in DC1
 and DC2 ? Will that mean that I will then have 5 replicas in the
 cluster and not 4 ? Or will the co-ordinator node be aware of the
 replica placement strategy,
 and override the partitioner's decision and walk the ring until it
 first encounters a node in DC1 or DC2 ? and then place the remaining
 replicas ?

 Thanks.



Re: Number of keyspaces

2012-05-23 Thread Franc Carter
On Wed, May 23, 2012 at 8:09 PM, aaron morton aa...@thelastpickle.comwrote:

 We were thinking of doing a major compaction after each year is 'closed
 off'.

 Not a terrible idea. Years tend to happen annually, so their growth
 pattern is well understood.

 This would mean that compactions for the current year were dealing with a
 smaller amount of data and hence be faster and have less impact on a
 day-to-day basis.

 Older data is compacted into higher tiers / generations so will not be
 included when compacting new data (background:
 http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra). That
 said, there is a chance that at some point the big older files get
 compacted, i.e. if you get (by default) 4 x 100GB files they will get
 compacted into 1.
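The 4-files-into-1 behaviour comes from size-tiered compaction's bucketing. A rough sketch (the real thresholds are configurable; the ratios here are illustrative, and buckets are keyed by the size of their first member rather than a running average):

```python
def eligible_tiers(sstable_sizes, min_threshold=4):
    # Group SSTables of similar size into buckets; a bucket with at least
    # min_threshold members is eligible to be compacted into one file.
    buckets = {}
    for size in sorted(sstable_sizes):
        for key in list(buckets):
            if 0.5 * key <= size <= 1.5 * key:   # "similar size" band
                buckets[key].append(size)
                break
        else:
            buckets[size] = [size]
    return [b for b in buckets.values() if len(b) >= min_threshold]

# Four ~100GB files form an eligible tier; the lone 5GB file does not.
assert eligible_tiers([100, 100, 100, 100, 5]) == [[100, 100, 100, 100]]
```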


I'm a bit nervous about leveled compaction as it's new(ish)



 It feels a bit like a premature optimisation.


Yep, that's certainly possible - it's a habit I tend towards ;-(

cheers



   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 23/05/2012, at 1:52 PM, Franc Carter wrote:

 On Wed, May 23, 2012 at 7:42 AM, aaron morton aa...@thelastpickle.comwrote:

 1 KS with 24 CFs will use roughly the same resources as 24 KSs with 1
 CF. Each CF:

 * loads the bloom filter for each SSTable
 * samples the index for each sstable
 * uses row and key cache
 * has a current memtable and potentially memtables waiting to flush.
 * had secondary index CF's

 I would generally avoid a data model that calls for CFs to be added in
 response to new entities or new data. Older data will be moved to larger
 files, and not included in compaction for newer data.


 We were thinking of doing a major compaction after each year is 'closed
 off'. This would mean that compactions for the current year were dealing
 with a smaller amount of data and hence be faster and have less impact on a
 day-to-day basis. Our query patterns will only infrequently cross year
 boundaries.

 Are we being naive ?

 cheers



 Hope that helps.

   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 23/05/2012, at 3:31 AM, Luís Ferreira wrote:

 I have 24 keyspaces, each with a column family, and am considering
 changing to 1 keyspace with 24 CFs. Would this be beneficial?
 On May 22, 2012, at 12:56 PM, samal wrote:

 Not ideally; Cassandra now has global memtable tuning. Each CF corresponds
 to memory in RAM. A year-wise CF will be in a read-only state the following
 year, but its memtable will still consume RAM.
 On 22-May-2012 5:01 PM, Franc Carter franc.car...@sirca.org.au wrote:

 On Tue, May 22, 2012 at 9:19 PM, aaron morton 
 aa...@thelastpickle.comwrote:

 It's more the number of CF's than keyspaces.


 Oh - does increasing the number of Column Families affect performance ?

 The design we are working on at the moment is considering using a Column
 Family per year. We were thinking this would isolate compactions to a more
 manageable size as we don't update previous years.

 cheers



 Cheers

   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 22/05/2012, at 6:58 PM, R. Verlangen wrote:

 Yes, it does. However, there's no real answer as to what the limit is: it
 depends on your hardware and cluster configuration.

 You might even want to search the archives of this mailing list; I
 remember this has been asked before.

 Cheers!

 2012/5/21 Luís Ferreira zamith...@gmail.com

 Hi,

 Does the number of keyspaces affect the overall cassandra performance?


 Cumprimentos,
 Luís Ferreira






 --
 With kind regards,

 Robin Verlangen
 www.robinverlangen.nl





 --
 *Franc Carter* | Systems architect | Sirca Ltd
  marc.zianideferra...@sirca.org.au
 franc.car...@sirca.org.au | www.sirca.org.au
 Tel: +61 2 9236 9118
  Level 9, 80 Clarence St, Sydney NSW 2000
 PO Box H58, Australia Square, Sydney NSW 1215
















Re: Correct way to set strategy options in cqlsh?

2012-05-23 Thread Damick, Jeffrey
Since this is the EC2MultiRegionSnitch, how do you suggest I change the name? 
It needs to match the datacenter name that is bound to AWS region names, so it 
seems like a bug to me..




On 5/23/12 2:33 AM, Romain HARDOUIN romain.hardo...@urssaf.fr wrote:


You *must* remove the hyphen.
According to the CQL 2.0 documentation, here is the correct syntax to create a 
keyspace:

createKeyspaceStatement ::= CREATE KEYSPACE name
                            WITH optionName = optionVal
                            ( AND optionName = optionVal )*
                            ;
optionName ::= identifier
             | optionName : identifier
             | optionName : integer
             ;
optionVal ::= stringLiteral
            | identifier
            | integer
            ;

The string strategy_options:us-west=1; matches the following syntax:

optionName : identifier = integer

Thus, us-west must be an *identifier*, and again according to the documentation:
"An identifier is a letter followed by any sequence of letters, digits, or the 
underscore (_)." A hyphen is therefore not allowed.
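The quoted identifier rule can be checked with a quick sketch (the regex is a direct transcription of the rule above, not taken from the CQL parser):

```python
import re

# "A letter followed by any sequence of letters, digits, or the underscore"
IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")

assert IDENTIFIER.match("us_west")          # underscore is fine
assert not IDENTIFIER.match("us-west")      # hyphen is rejected
assert not IDENTIFIER.match("1west")        # must start with a letter
```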


Re: Replication factor

2012-05-23 Thread Daning Wang
Thanks guys.

Aaron, I am confused about this. From the wiki
(http://wiki.apache.org/cassandra/ReadRepair), it looks like, for any
consistency level, Read Repair is done either before or after responding with data.

  Read Repair does not run at CL ONE

Daning

On Wed, May 23, 2012 at 3:51 AM, Viktor Jevdokimov 
viktor.jevdoki...@adform.com wrote:

   When RF == number of nodes, and you read at CL ONE you will always be
 reading locally.

  “always be reading locally” – only if the dynamic snitch is “off”. With the
  dynamic snitch “on”, a request may be redirected to another node, which may
  introduce latency spikes.



Best regards / Pagarbiai
 *Viktor Jevdokimov*
 Senior Developer

 Email: viktor.jevdoki...@adform.com
 Phone: +370 5 212 3063, Fax +370 5 261 0453
 J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
 Follow us on Twitter: @adforminsider http://twitter.com/#!/adforminsider
 What is Adform: watch this short video http://vimeo.com/adform/display




Re: Number of keyspaces

2012-05-23 Thread Rob Coli
On Tue, May 22, 2012 at 4:56 AM, samal samalgo...@gmail.com wrote:
 Not ideally; Cassandra now has global memtable tuning. Each CF corresponds to
 memory in RAM. A year-wise CF will be in a read-only state the following
 year, but its memtable will still consume RAM.

An empty memtable seems unlikely to consume a meaningful amount of
RAM. I'm sure by reading the code I could estimate how little memory
is involved, but I'd be surprised if it is over a few megabytes. This
is independent from the other overhead associated with a CF being
defined, of course.

=Rob

-- 
=Robert Coli
AIMGTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


Re: Correct way to set strategy options in cqlsh?

2012-05-23 Thread paul cannon
I agree, this is a bug.  I opened
https://issues.apache.org/jira/browse/CASSANDRA-4278 to track it.

The workaround for now is to use the CLI or the thrift interface to create
your keyspace.

p





Re: Replication factor

2012-05-23 Thread Brandon Williams
On Wed, May 23, 2012 at 5:51 AM, Viktor Jevdokimov 
viktor.jevdoki...@adform.com wrote:

   When RF == number of nodes, and you read at CL ONE you will always be
 reading locally.

 “always be reading locally” – only if Dynamic Snitch is “off”. With
 dynamic snitch “on” request may be redirected to other node, which may
 introduce latency spikes.


Actually, it's preventing spikes: if it won't read locally, that means the
local replica is in worse shape than the rest (compacting, repairing, etc.).

-Brandon


Error loading data: Internal error processing get_range_slices / Unavailable Exception

2012-05-23 Thread Abhijit Chanda
Hi All,
I am facing a problem while setting up my database. The errors below appear
every time I try to set up the DB. I am unable to understand why they occur;
it previously worked fine, so I guess it is some connection-related issue.

UnknownException: [host=192.168.2.13(192.168.2.13):9160, latency=11(31),
attempts=1] SchemaDisagreementException()
NoAvailableHostsException: [host=None(0.0.0.0):0, latency=0(0), attempts=0]
No hosts to borrow from
Describe Keyspace: vCRIME
OperationTimeoutException: [host=192.168.2.13(192.168.2.13):9160,
latency=10002(10002), attempts=1] TimedOutException()
OperationTimeoutException: [host=192.168.2.13(192.168.2.13):9160,
latency=10001(10001), attempts=1] TimedOutException()
OperationTimeoutException: [host=192.168.2.13(192.168.2.13):9160,
latency=1(1), attempts=1] TimedOutException()
OperationTimeoutException: [host=192.168.2.13(192.168.2.13):9160,
latency=10001(10001), attempts=1] TimedOutException()
OperationTimeoutException: [host=192.168.2.13(192.168.2.13):9160,
latency=10001(10001), attempts=1] TimedOutException()
OperationTimeoutException: [host=192.168.2.13(192.168.2.13):9160,
latency=10001(10001), attempts=1] TimedOutException()
TokenRangeOfflineException: [host=192.168.2.13(192.168.2.13):9160,
latency=2(2), attempts=1] UnavailableException()
TokenRangeOfflineException: [host=192.168.2.13(192.168.2.13):9160,
latency=2(2), attempts=1] UnavailableException()
TokenRangeOfflineException: [host=192.168.2.13(192.168.2.13):9160,
latency=2(2), attempts=1] UnavailableException()
TokenRangeOfflineException: [host=192.168.2.13(192.168.2.13):9160,
latency=2(2), attempts=1] UnavailableException()
TokenRangeOfflineException: [host=192.168.2.13(192.168.2.13):9160,
latency=2(2), attempts=1] UnavailableException()

Regards,
Abhijit


RE: Replication factor

2012-05-23 Thread Viktor Jevdokimov
It depends on the use case. For ours, our experience and statistics are 
different: turning the dynamic snitch off makes overall latency and spikes much, 
much lower.







From: Brandon Williams [mailto:dri...@gmail.com]
Sent: Thursday, May 24, 2012 02:35
To: user@cassandra.apache.org
Subject: Re: Replication factor

On Wed, May 23, 2012 at 5:51 AM, Viktor Jevdokimov 
viktor.jevdoki...@adform.commailto:viktor.jevdoki...@adform.com wrote:
 When RF == number of nodes, and you read at CL ONE you will always be reading 
 locally.
"always be reading locally" - only if the dynamic snitch is off. With the 
dynamic snitch on, a request may be redirected to another node, which may 
introduce latency spikes.

Actually it's preventing spikes, since if it won't read locally that means the 
local replica is in worse shape than the rest (compacting, repairing, etc.)

-Brandon

Re: unknown exception with hector

2012-05-23 Thread Deno Vichas
I've noticed my nodes seem to have a large dropped-read count in tpstats 
(not really sure what acceptable numbers are). Could the two be related?


On 5/23/2012 2:55 AM, aaron morton wrote:

Not sure, but:

   at 
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)


Looks like the client is not using framed transport. The server 
defaults to framed.
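For context, Thrift's framed transport prefixes every message with a 4-byte big-endian length, so an unframed client and a framed server misinterpret each other's first bytes. A minimal sketch of the framing itself (not Hector's or Thrift's actual code):

```python
import struct

def frame(payload: bytes) -> bytes:
    # Framed transport: 4-byte big-endian length, then the payload.
    return struct.pack(">i", len(payload)) + payload

def unframe(data: bytes) -> bytes:
    # A framed server reads the length first; raw unframed bytes here
    # would be misread as a (usually nonsensical) frame length.
    (length,) = struct.unpack(">i", data[:4])
    return data[4:4 + length]

msg = b"get_range_slices"
assert unframe(frame(msg)) == msg
```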


Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 23/05/2012, at 5:35 AM, Deno Vichas wrote:

Could somebody clue me in on the cause of this exception? I see 
these randomly.


AnalyzerService-2 2012-05-22 13:28:00,385 :: WARN 
 cassandra.connection.HConnectionManager  - Exception:
me.prettyprint.hector.api.exceptions.HectorTransportException: 
org.apache.thrift.transport.TTransportException
   at 
me.prettyprint.cassandra.service.ExceptionsTranslatorImpl.translate(ExceptionsTranslatorImpl.java:39)
   at 
me.prettyprint.cassandra.service.KeyspaceServiceImpl$23.execute(KeyspaceServiceImpl.java:851)
   at 
me.prettyprint.cassandra.service.KeyspaceServiceImpl$23.execute(KeyspaceServiceImpl.java:840)
   at 
me.prettyprint.cassandra.service.Operation.executeAndSetResult(Operation.java:99)
   at 
me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:243)
   at 
me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:131)
   at 
me.prettyprint.cassandra.service.KeyspaceServiceImpl.getColumn(KeyspaceServiceImpl.java:857)
   at 
me.prettyprint.cassandra.model.thrift.ThriftColumnQuery$1.doInKeyspace(ThriftColumnQuery.java:57)
   at 
me.prettyprint.cassandra.model.thrift.ThriftColumnQuery$1.doInKeyspace(ThriftColumnQuery.java:52)
   at 
me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
   at 
me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85)
   at 
me.prettyprint.cassandra.model.thrift.ThriftColumnQuery.execute(ThriftColumnQuery.java:51)
   at 
com.stocktouch.dao.StockDaoImpl.getHistorical(StockDaoImpl.java:365)
   at 
com.stocktouch.dao.StockDaoImpl.getHistoricalQuote(StockDaoImpl.java:433)
   at 
com.stocktouch.service.StockHistoryServiceImpl.getHistoricalQuote(StockHistoryServiceImpl.java:480)
   at 
com.stocktouch.service.AnalyzerServiceImpl.getClose(AnalyzerServiceImpl.java:180)
   at 
com.stocktouch.service.AnalyzerServiceImpl.calcClosingPrices(AnalyzerServiceImpl.java:90)
   at 
com.stocktouch.service.AnalyzerServiceImpl.nightlyRollup(AnalyzerServiceImpl.java:66)
   at 
com.stocktouch.service.AnalyzerServiceImpl$2.run(AnalyzerServiceImpl.java:55)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

   at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.thrift.transport.TTransportException
   at 
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
   at 
org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
   at 
org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
   at 
org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
   at 
org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
   at 
org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
   at 
org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
   at 
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
   at 
org.apache.cassandra.thrift.Cassandra$Client.recv_get(Cassandra.java:509)
   at 
org.apache.cassandra.thrift.Cassandra$Client.get(Cassandra.java:492)
   at 
me.prettyprint.cassandra.service.KeyspaceServiceImpl$23.execute(KeyspaceServiceImpl.java:846)

   ... 20 more


thanks,
deno