RE: Is the updating compaction strategy from 'sized tiered' to 'leveled' automatic or need to be done manually? [heur]

2014-05-06 Thread Viktor Jevdokimov
I mean insert/write data. When data fills the memtable, the memtable is flushed to disk as an sstable; when a new sstable is created, Cassandra checks whether compaction is needed and triggers one.
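The flush-then-check flow described above can be sketched as a toy model. The names, thresholds, and merge logic here are illustrative stand-ins, not Cassandra's actual implementation:

```python
# Toy model: writes fill a memtable; when it flushes, a new "sstable" appears
# and a compaction check runs. Thresholds are made up for illustration.

MEMTABLE_LIMIT = 4      # flush after this many writes (illustrative)
MIN_COMPACT = 4         # size-tiered style: compact once enough sstables pile up

memtable = {}
sstables = []           # each flushed memtable becomes one "sstable" (a dict)
compactions_triggered = 0

def write(key, value):
    global compactions_triggered
    memtable[key] = value
    if len(memtable) >= MEMTABLE_LIMIT:
        # Flush: the memtable becomes an immutable sstable on "disk".
        sstables.append(dict(memtable))
        memtable.clear()
        # Creating a new sstable is what prompts the compaction check.
        if len(sstables) >= MIN_COMPACT:
            merged = {}
            for t in sstables:
                merged.update(t)
            sstables[:] = [merged]
            compactions_triggered += 1

for i in range(20):
    write(f"k{i}", i)

print(len(sstables), compactions_triggered)
```

The point the sketch makes is the same as the email: nothing happens until a write causes a flush, and only the flush-created sstable triggers the compaction check.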

From: Yatong Zhang [mailto:bluefl...@gmail.com]
Sent: Monday, May 5, 2014 9:54 AM
To: user@cassandra.apache.org
Subject: Re: Is the updating compaction strategy from 'sized tiered' to 
'leveled' automatic or need to be done manually? [heur]

What do you mean by 'you need to write to this CF'? I've changed the schema using a CQL3 'ALTER TABLE' statement.

On Mon, May 5, 2014 at 2:28 PM, Viktor Jevdokimov viktor.jevdoki...@adform.com wrote:
To trigger LCS you need to write to this CF and wait until a new sstable flushes. I can't find any other way to start LCS.

Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer

Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063, Fax: +370 5 261 0453
J. Jasinskio 16C, LT-03163 Vilnius, Lithuania
Follow us on Twitter: @adforminsider (http://twitter.com/#!/adforminsider)
Experience Adform DNA (http://vimeo.com/76421547)

[Adform News] http://www.adform.com
[Adform awarded the Best Employer 2012] http://www.adform.com/site/blog/adform/adform-takes-top-spot-in-best-employer-survey/


Disclaimer: The information contained in this message and attachments is 
intended solely for the attention and use of the named addressee and may be 
confidential. If you are not the intended recipient, you are reminded that the 
information remains the property of the sender. You must not use, disclose, 
distribute, copy, print or rely on this e-mail. If you have received this 
message in error, please contact the sender immediately and irrevocably delete 
this message and any copies.

From: Yatong Zhang [mailto:bluefl...@gmail.com]
Sent: Sunday, May 4, 2014 5:22 AM
To: user@cassandra.apache.org
Subject: Is the updating compaction strategy from 'sized tiered' to 'leveled' automatic or need to be done manually? [heur]

Hi there,

I am updating the compaction strategy from 'sized tiered' to 'leveled', and http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra says:
When updating an existing column family, reads and writes can continue as usual while leveling of existing sstables is performed in the background.

But I still see many old sstables with a very large size and an old file date. So I am wondering: is the compaction update done automatically? If yes, is there an estimate of how long it will take? If not, what are the steps to do it manually?
I've googled a lot but can't find anything helpful. Thanks for any feedback; any help is greatly appreciated.



RE: Is the updating compaction strategy from 'sized tiered' to 'leveled' automatic or need to be done manually? [heur]

2014-05-06 Thread Viktor Jevdokimov
It's enough to write 1 column and run nodetool flush.

From: Viktor Jevdokimov [mailto:viktor.jevdoki...@adform.com]
Sent: Tuesday, May 6, 2014 12:00 PM
To: user@cassandra.apache.org
Subject: RE: Is the updating compaction strategy from 'sized tiered' to 
'leveled' automatic or need to be done manually? [heur]

I mean insert/write data. When data fills the memtable, the memtable is flushed to disk as an sstable; when a new sstable is created, Cassandra checks whether compaction is needed and triggers one.




RE: Is the updating compaction strategy from 'sized tiered' to 'leveled' automatic or need to be done manually?

2014-05-05 Thread Viktor Jevdokimov
To trigger LCS you need to write to this CF and wait until a new sstable flushes. I can't find any other way to start LCS.

Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer





RE: Exporting column family data to csv

2014-04-02 Thread Viktor Jevdokimov
http://mail-archives.apache.org/mod_mbox/cassandra-user/201309.mbox/%3C9AF3ADEDDFED4DDEA840B8F5C6286BBA@vig.local%3E

http://stackoverflow.com/questions/18872422/rpc-timeout-error-while-exporting-data-from-cql

Google for more.


Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer



From: ng [mailto:pipeli...@gmail.com]
Sent: Wednesday, April 2, 2014 6:04 PM
To: user@cassandra.apache.org
Subject: Exporting column family data to csv


I want to export all the data of a particular column family to a text file from the Cassandra cluster.

I tried

copy keyspace.mycolumnfamily to '/root/ddd/xx.csv';

It gave me a timeout error.

I tried the below in cassandra.yaml:

request_timeout_in_ms: 1000
read_request_timeout_in_ms: 1000
range_request_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 1000

I still have no luck. Any advice on how to achieve this? I am NOT limited to the COPY command. What is the best way to achieve this? Thanks in advance for the help.
ng

RE: Upserting the same values multiple times

2014-01-21 Thread Viktor Jevdokimov
It's not about tombstones. Tombstones are markers for deleted columns (via DELETE or TTL), kept in the new sstables after compaction for the gc_grace period.

Updates do not create tombstones for previous records; the latest version by timestamp is saved, either from the memtable or when sstables are merged during compaction.

While data is in the memtable, the latest timestamp wins and only the latest version is flushed to disk. Then everything depends on how fast you flush memtables and how compaction runs thereafter. Do not expect any tombstones with updates, only with deleted columns.
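The last-write-wins reconciliation described above can be sketched as a toy model. Cells are (value, timestamp) pairs, the highest timestamp wins, and only a DELETE introduces a tombstone marker; this is illustrative, not Cassandra's storage code:

```python
# Toy model of timestamp reconciliation: repeated upserts of the same value
# only produce versions that merge down to the latest one -- no tombstones.
TOMBSTONE = object()

def reconcile(cells):
    """Merge versions of one column from memtable/sstables: latest timestamp wins."""
    return max(cells, key=lambda c: c[1])

# Millions of identical upserts just produce versions, never tombstones:
versions = [("same body text", ts) for ts in range(5)]
winner = reconcile(versions)
assert winner == ("same body text", 4)
tombstones = [v for v in versions if v[0] is TOMBSTONE]
print(len(tombstones))           # 0 -- updates created no tombstones

# A delete, by contrast, is itself a (tombstone, timestamp) cell that wins:
versions.append((TOMBSTONE, 10))
assert reconcile(versions)[0] is TOMBSTONE
```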


Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer



From: Sanjeeth Kumar [mailto:sanje...@exotel.in]
Sent: Wednesday, January 22, 2014 5:37 AM
To: user@cassandra.apache.org
Subject: Upserting the same values multiple times

Hi,
   I have a table A, one of whose fields is a text column called body.
This text's length varies somewhere between 120 and 400 characters. The contents of this column can be the same for millions of rows.
To avoid repeating the same data, I thought I would add another table B, which stores (MD5Hash(body), body).
Table A {
some fields;

digest text,
.
}


TABLE B (
  digest text,
  body text,
  PRIMARY KEY (digest)
)
Whenever I insert into table A, I calculate the digest of body and blindly insert into table B as well; I'm not doing any read on B. This could result in the same (digest, body) being inserted millions of times in a short span of time.
A couple of questions:
1) Would this cause an issue due to the number of tombstones created in a short span of time? I'm assuming that for every insert there would be a tombstone created for the previous record.
2) Or should I just replicate the same data in table A itself multiple times (with compression, space isn't that big an issue)?
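The table-B scheme described above can be sketched with plain dicts standing in for the two Cassandra tables; the names are illustrative:

```python
# Sketch of the dedup scheme: store each distinct body once in "table B",
# keyed by its MD5 digest, and keep only the digest in each "table A" row.
import hashlib

table_b = {}   # digest -> body (the dedup table)
table_a = []   # rows holding just the digest plus other fields

def insert(body, other_fields):
    digest = hashlib.md5(body.encode("utf-8")).hexdigest()
    # Blind upsert, as in the question: writing the same (digest, body)
    # repeatedly just overwrites with identical data; no extra rows in B.
    table_b[digest] = body
    table_a.append({"digest": digest, **other_fields})

for i in range(1000):
    insert("identical 200-char message body", {"id": i})

print(len(table_a), len(table_b))   # 1000 rows in A, but only 1 body stored in B
```

As the reply above notes, the repeated upserts into B are overwrites reconciled by timestamp, not tombstone-producing deletes.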

- Sanjeeth

RE: Cassandra mad GC

2014-01-15 Thread Viktor Jevdokimov
Simply don't use G1 GC; it will not be better than CMS on Cassandra, and it could be worse.


Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer



-Original Message-
From: Dimetrio [mailto:dimet...@flysoft.ru]
Sent: Tuesday, January 14, 2014 3:16 PM
To: cassandra-u...@incubator.apache.org
Subject: Cassandra mad GC

Hi all.
I have many GC freezes on my Cassandra cluster.

I'm using G1 GC, and CMS gives similar freezes:
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=1"
JVM_OPTS="$JVM_OPTS -XX:NewRatio=1"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=15"
JVM_OPTS="$JVM_OPTS -XX:-UseAdaptiveSizePolicy"
JVM_OPTS="$JVM_OPTS -XX:G1HeapRegionSize=32m"

Heap 8GB
10 nodes aws c3.4xlarge
60Gb per node

Cassandra logs


GC logs


Sometimes a node freezes and reports that one or two nodes are down; CPU load > 1000%, LA = 6-15.

pending tasks: 4
   compaction type   keyspace   table           completed    total        unit    progress
        Compaction   Social     home_timeline   4097097092   4920701908   bytes   83.26%
        Compaction   Social     home_timeline   2713279974   6272012039   bytes   43.26%
Active compaction remaining time :   0h00m32s

200-300 requests per second on each node, with many inserts and deletes (batches smaller than 50).

How can I reduce GC freezes?

--regards





--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Cassandra-mad-GC-tp7592248.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


RE: Cassandra mad GC

2014-01-15 Thread Viktor Jevdokimov
Forgot to ask: what do you want to achieve by changing the default GC settings?


-Original Message-
From: Viktor Jevdokimov [mailto:viktor.jevdoki...@adform.com] 
Sent: Wednesday, January 15, 2014 10:18 AM
To: user@cassandra.apache.org
Subject: RE: Cassandra mad GC

Simply don't use G1 GC; it will not be better than CMS on Cassandra, and it could be worse.




RE: Sorting keys for batch reads to minimize seeks

2013-10-18 Thread Viktor Jevdokimov
 Sorting a random set of keys will not help.
False

 If you know that your set of keys is on a particular node, then sorting might help.
True


Two different answers to the same question.


 But I doubt that it is a sound practice, given that sets of keys can be moved as nodes are added or removed from the cluster.
Just be aware: get token ranges from Cassandra.
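The idea above — group keys by the token range (and thus replica) they fall in, then sort within each group — can be sketched as follows. The hash is a stand-in for Cassandra's partitioner, and the ring is a made-up four-range example; in practice you would fetch real token ranges from the cluster:

```python
# Toy token-aware batch planner: keys grouped by owning "node", sorted by token.
import hashlib

def token(key):
    # Illustrative token function, NOT Murmur3Partitioner.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % 400

# (range_start, range_end, replica) -- fabricated ring for the example.
ring = [(0, 100, "node1"), (100, 200, "node2"),
        (200, 300, "node3"), (300, 400, "node4")]

def replica_for(key):
    t = token(key)
    for start, end, node in ring:
        if start <= t < end:
            return node

def plan_batches(keys):
    batches = {}
    for k in keys:
        batches.setdefault(replica_for(k), []).append(k)
    for ks in batches.values():
        ks.sort(key=token)          # sorted per node, as suggested above
    return batches

batches = plan_batches([f"user:{i}" for i in range(50)])
assert sum(len(v) for v in batches.values()) == 50
```

Sorting only helps within one replica's batch; a sort across the whole random key set (spanning nodes) buys nothing, which is the distinction drawn above.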



Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer





RE: Sorting keys for batch reads to minimize seeks

2013-10-18 Thread Viktor Jevdokimov
Read latency depends on many factors; don't forget physics. If it meets your requirements, it is good.


-Original Message-
From: Artur Kronenberg [mailto:artur.kronenb...@openmarket.com] 
Sent: Friday, October 18, 2013 1:03 PM
To: user@cassandra.apache.org
Subject: Re: Sorting keys for batch reads to minimize seeks

Hi,

Thanks for your reply. Our latency currently is 23.618 ms. However, I simply read that off one node just now while it wasn't under a load test; I will be able to get a better number after the next test run.

What is a good value for read latency?


On 18/10/13 08:31, Viktor Jevdokimov wrote:
 The only thing you may win is avoiding unnecessary network hops if:
 - you request sorted keys (by token) from the appropriate replica with ConsistencyLevel.ONE and dynamic_snitch: false;
 - the nodes have the same load;
 - the replica is not doing GC, and GC pauses are much higher than internode communication.

 For a multiple-key request C* will do multiple single-key reads, except for range scan requests, where only the starting key and batch size are used in the request.

 Consider a multiple-key request slow by design; try to model your data for low-latency single-key requests.

 So, what latencies do you want to achieve?



-Original Message-
 From: Artur Kronenberg [mailto:artur.kronenb...@openmarket.com]
 Sent: Thursday, October 17, 2013 7:40 PM
 To: user@cassandra.apache.org
 Subject: Sorting keys for batch reads to minimize seeks

 Hi,

 I am looking to somehow increase read performance on Cassandra. We are still playing with configurations, but I was wondering whether there are things we could do in software to help speed up our read performance.

 E.g. one idea (not sure how sane it is) was to sort read batches by row key before submitting them to Cassandra. The idea is that the row keys should then be closer together on the physical disk, which may minimize the number of random seeks we have to do when querying, say, 1000 entries from Cassandra. Does that make any sense?

 Is there anything else we can do in software to improve performance, like specific batch sizes for reads? We are using the astyanax library to access Cassandra.

 Thanks!





RE: Problems in the cassandra bulk loader

2013-10-10 Thread Viktor Jevdokimov
You overwrite your columns by writing a new row/supercolumn on each iteration.

Move the newRow/newSuperColumn calls out of the for statement, which should be only for the columns:


int rowKey = 10;
int superColumnKey = 20;
usersWriter.newRow(ByteBufferUtil.bytes(rowKey));
usersWriter.newSuperColumn(ByteBufferUtil.bytes(superColumnKey));
for (int i = 0; i < 10; i++) {
    usersWriter.addColumn(
        ByteBufferUtil.bytes(i + 1),
        ByteBufferUtil.bytes(i + 1),
        System.currentTimeMillis());
}
usersWriter.close();

Next time, please ask such questions on the user mailing list, not the C* dev list, which is for Cassandra development, not usage or your own code built around Cassandra.





Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer



-Original Message-
From: José Elias Queiroga da Costa Araújo [mailto:je...@cesar.org.br]
Sent: Wednesday, October 9, 2013 11:22 PM
To: dev
Subject: Problems in the cassandra bulk loader

Hi all,

I'm trying to use bulk insertion with the SSTableSimpleUnsortedWriter class from the Cassandra API, and I'm facing some problems. After generating and uploading the .db files using the ./sstableloader command, I noticed the data didn't match what was inserted.

I've put the code I used below to try to explain the behaviour.

I'm trying to generate the data files using only one row key and one supercolumn, where the supercolumn has 10 columns.

IPartitioner p = new Murmur3Partitioner();
CFMetaData scf = new CFMetaData("myKeySpace", "Column", ColumnFamilyType.Super, BytesType.instance, BytesType.instance);

SSTableSimpleUnsortedWriter usersWriter = new SSTableSimpleUnsortedWriter(new File("./"), scf, p, 64);

int rowKey = 10;
int superColumnKey = 20;
for (int i = 0; i < 10; i++) {
    usersWriter.newRow(ByteBufferUtil.bytes(rowKey));
    usersWriter.newSuperColumn(ByteBufferUtil.bytes(superColumnKey));
    usersWriter.addColumn(ByteBufferUtil.bytes(i + 1), ByteBufferUtil.bytes(i + 1), System.currentTimeMillis());
}
usersWriter.close();

After uploading, the result is:

RowKey: 000a
=> (super_column=0014,
     (name=0001, value=0001, timestamp=1381348293144))

1 Row Returned.

In this case, shouldn't my supercolumn have 10 columns, with values between 0001 and 0011, since I'm using the same supercolumn? The documentation says the newRow method can be invoked many times and that this impacts only performance.

The second question is: if this is the correct behavior, shouldn't the column value be 0011, since that is the last value passed as an argument to the addColumn(...) method in the loop?

Thanks in advance,

Elias.


RE: Problems in the cassandra bulk loader

2013-10-10 Thread Viktor Jevdokimov
SSTableSimpleUnsortedWriter is an sstable writer, not Cassandra itself; it writes whatever you give it to the file as-is, so you need to ensure consistency yourself.

You can check the file before running sstableloader: all the data is within the sstable, but instead of 1 row it will have 10 rows with the same key. The same will probably arrive in Cassandra upon import.

But when Cassandra scans an sstable sequentially while searching for a key, only the first row will be returned (with the first column): once the key is found there is no reason to scan further, and Cassandra does not expect more rows with the same key in one sstable.
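The read path described above can be sketched as a toy model: an sstable written with duplicate row keys, and a reader that scans sequentially and stops at the first match, so later duplicates are never returned. Illustrative only, not Cassandra code:

```python
# What the loop in the question effectively wrote: 10 rows, same key each time.
sstable = [(10, {"super": 20, "col": i + 1}) for i in range(10)]

def read_row(table, key):
    for row_key, row in table:          # sequential scan
        if row_key == key:
            return row                  # key found -> no reason to scan further
    return None

row = read_row(sstable, 10)
print(row["col"])    # 1 -- only the first duplicate row is ever visible
```

This is why the observed result shows the first value rather than the last: the data is all in the file, but a reader that assumes unique keys stops at the first hit.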


Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer

From: José Elias Queiroga da Costa Araújo [mailto:je...@cesar.org.br]
Sent: Thursday, October 10, 2013 4:33 PM
To: user@cassandra.apache.org
Subject: Re: Problems in the cassandra bulk loader


Hi, I thought the bulk API could handle this, merging all columns for the same supercolumn. I did something like this in the Java client (Hector), where it is able to resolve this conflict by simply appending the columns.

Regarding the column value: if the code is overwriting the columns, I expected the column to have the last value of my collection, but it is keeping the first one.

Regards,

Elias.

2013/10/10 Viktor Jevdokimov viktor.jevdoki...@adform.com:
You overwrite your columns by writing a new row/supercolumn on each iteration.


RE: Error during startup - java.lang.OutOfMemoryError: unable to create new native thread

2013-09-09 Thread Viktor Jevdokimov
For a start:
- check the -Xss size (cassandra-env.sh); you may need to increase it for your JVM;
- check the -Xms and -Xmx sizes (cassandra-env.sh); you may need to increase them for your data load / bloom filter / index sizes.



Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer



From: srmore [mailto:comom...@gmail.com]
Sent: Tuesday, September 10, 2013 6:16 AM
To: user@cassandra.apache.org
Subject: Error during startup - java.lang.OutOfMemoryError: unable to create 
new native thread [heur]


I have a 5-node cluster with a load of around 300 GB each. A node went down and does not come back up. I can see the following exception in the logs.

ERROR [main] 2013-09-09 21:50:56,117 AbstractCassandraDaemon.java (line 139) 
Fatal exception in thread Thread[main,5,main]
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:640)
at 
java.util.concurrent.ThreadPoolExecutor.addIfUnderCorePoolSize(ThreadPoolExecutor.java:703)
at 
java.util.concurrent.ThreadPoolExecutor.prestartAllCoreThreads(ThreadPoolExecutor.java:1392)
at 
org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor.init(JMXEnabledThreadPoolExecutor.java:77)
at 
org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor.init(JMXEnabledThreadPoolExecutor.java:65)
at 
org.apache.cassandra.concurrent.JMXConfigurableThreadPoolExecutor.init(JMXConfigurableThreadPoolExecutor.java:34)
at 
org.apache.cassandra.concurrent.StageManager.multiThreadedConfigurableStage(StageManager.java:68)
at 
org.apache.cassandra.concurrent.StageManager.clinit(StageManager.java:42)
at 
org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:344)
at 
org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:173)

The ulimit -u output is
515042
which is far more than the recommended value [1] (10240), and I am hesitant to
set it to unlimited as recommended here [2].
Any pointers as to what could be the issue and how to get the node up?



[1] 
http://www.datastax.com/documentation/cassandra/1.2/webhelp/index.html?pagename=docsversion=1.2file=install/recommended_settings#cassandra/install/installRecommendSettings.html

[2] 
http://mail-archives.apache.org/mod_mbox/cassandra-user/201303.mbox/%3CCAPqEvGE474Omea1BFLJ6U_pbAkOwWxk=dwo35_pc-atwb4_...@mail.gmail.com%3E
Thanks !

RE: What is the effect of reducing the thrift message sizes on GC

2013-06-18 Thread Viktor Jevdokimov
Our experience shows that write load (memtables) impacts ParNew GC most. More
writes, more frequent ParNew GC. The ParNew GC time depends on how many writes
were made during the cycle between ParNew GCs and on the size of NEW_HEAP (young gen).

Basically, ParNew GC itself takes longer when more objects have to be copied from
the young to the old space. Reads and compactions will not promote objects to the
old space (they create short-lived objects), so you can see that increased reads
and compactions during the same write load will increase GC frequency but decrease
GC pause time.

Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer


From: Ananth Gundabattula [mailto:agundabatt...@threatmetrix.com]
Sent: Tuesday, June 18, 2013 10:31 AM
To: user@cassandra.apache.org
Subject: What is the effect of reducing the thrift message sizes on GC

We are currently running on 1.1.10 and planning to migrate to a higher
version 1.2.4.

The question pertains to tweaking all the knobs to reduce GC related issues
( we have been fighting a lot of really bad GC issues on 1.1.10 and met with 
little
success all the way using 1.1.10)

Taking into consideration GC tuning is a black art, I was wondering if we
can have some good effect on the GC by tweaking the following settings:
thrift_framed_transport_size_in_mb and thrift_max_message_length_in_mb
Our system has very short columns (both in number of columns and in data size
per column) but millions/billions of rows in each column family. The typical
number of columns in each column family is 4. The typical lookup involves
specifying the row key and fetching one column most of the time. The
writes are similar, except for one keyspace where the number of columns
is 50 but the data size per column is very small.

Assuming we can tweak the config values:

  thrift_framed_transport_size_in_mb
  thrift_max_message_length_in_mb

to lower values in the above context, I was wondering if it helps in the GC
being invoked less if the thrift settings reflect our data model reads and 
writes ?

For example: what is the impact on GC of reducing the above config values to,
say, 1 MB rather than 15 or 16?

Thanks a lot for your inputs and thoughts.


Regards,
Ananth

RE: consistency level for create keyspace?

2013-06-04 Thread Viktor Jevdokimov
We use describe_schema_versions thrift request:

  /**
   * for each schema version present in the cluster, returns a list of nodes at
   * that version.
   * hosts that do not respond will be under the key
   * DatabaseDescriptor.INITIAL_VERSION.
   * the cluster is all on the same version if the size of the map is 1.
   */
  map<string, list<string>> describe_schema_versions()
    throws (1: InvalidRequestException ire),
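The way we use it can be sketched as a generic poll-until-agreement loop (Python sketch; `fetch_versions` is a stand-in for whatever client call returns the map above, e.g. the thrift `describe_schema_versions`):

```python
import time

def wait_for_schema_agreement(fetch_versions, timeout=30.0, interval=0.5):
    """Poll fetch_versions() until the whole cluster reports one schema
    version, mirroring the describe_schema_versions contract: the map has
    exactly one key when every responding node agrees."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        versions = fetch_versions()  # {schema_version: [node, ...]}
        if len(versions) == 1:
            return True
        time.sleep(interval)
    return False

# Example with a stub that "agrees" on the second poll:
responses = iter([{"v1": ["10.0.0.1"], "v2": ["10.0.0.2"]},
                  {"v2": ["10.0.0.1", "10.0.0.2"]}])
print(wait_for_schema_agreement(lambda: next(responses), interval=0))
# prints: True
```

Running this after every schema change (create keyspace, create/alter column family) avoids the try/except-and-sleep pattern discussed below.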




Best regards / Pagarbiai

Viktor Jevdokimov
Senior Developer

-----Original Message-----
From: John R. Frank [mailto:j...@mit.edu]
Sent: Wednesday, June 5, 2013 4:28 AM
To: user@cassandra.apache.org
Subject: Re: consistency level for create keyspace?



Further the question below, the same thing seems to happen with
ColumnFamily:  If I make a ColumnFamily, and then don't wait long enough, an 
attempt to query it can fail if the particular node that gets queried does not 
know about it yet.  Is there something smarter to do than just try/except all 
such failures and sleep it off?

This is particularly cumbersome for writing tests that setup/teardown keyspaces 
repeatedly.

jrf



On Mon, 3 Jun 2013, John R. Frank wrote:

 C*

 When I create a keyspace with pycassa on a multi-node cluster, it
 takes some time before all the nodes know about the keyspace.

 So, if I do this:

sm = SystemManager(random.choice(server_list))
sm.create_keyspace(keyspace, SIMPLE_STRATEGY, {'replication_factor': '1'})
sm.close()

 and then immediately pick a different node, it often will raise
 InvalidRequestException(why="Keyspace 'foo' does not exist")


 Is there a better way to handle this than just avoiding immediately
 asking other nodes for the keyspace?


 John



RE: (unofficial) Community Poll for Production Operators : Repair

2013-05-14 Thread Viktor Jevdokimov
> 1) What version of Cassandra do you run, on what hardware?
1.0.12 (upgrade to 1.2.x is planned)
Blade servers with:
  1x6 CPU cores with HT (12 vcores) (upgradable to 2x CPUs)
  96GB RAM (upgrade planned to 128GB, 256GB max)
  1x300GB 15k Data and 1x300GB 10k CommitLog/System SAS HDDs

> 2) What consistency level do you write at? Do you do DELETEs?
Write/Delete failover policy (where needed): try QUORUM, then ONE, finally ANY.

> 3) Do you run a regularly scheduled repair?
NO, read repair is enough (where needed).

> 4) If you answered yes to 3, what is the frequency of the repair?
If we do it, we'll do it once a day.

> 5) What has been your subjective experience with the performance of
> repair? (Does it work as you would expect? Does its overhead have a
> significant impact on the performance of your cluster?)
For our use case it has too much impact on the performance of the
cluster without real value.




Best regards / Pagarbiai

Viktor Jevdokimov
Senior Developer



RE: Cassandra Experts

2013-05-09 Thread Viktor Jevdokimov
A consulting company is a body shop that looks for job candidates for its
clients; in short, recruiters. They are not interested in support or learning,
just in selling bodies with some brains.


Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer


From: Steven Siebert [mailto:smsi...@gmail.com]
Sent: Friday, May 10, 2013 00:02
To: user@cassandra.apache.org
Subject: Re: Cassandra Experts

Hi Liz,

Are you looking for a reference to professional cassandra services/support...or 
looking to learn cassandra to provide said support?  If the former, I highly 
recommend DataStax 
(http://www.datastax.com/what-we-offer/products-services/consulting).  I'm a 
non-affiliated future customer (adoption delay is on our side), and have thus 
far received great support from their sales and technical teams - they have 
spent a lot of time out of hide to capture my needs and answer my questions.

...not to mention their direct ties to apache cassandra 
(http://www.datastax.com/what-we-offer/products-services/datastax-enterprise/apache-cassandra),
 they obviously have the technical capabilities current and future.

From one cassandra user -- if you're looking for paid support, that's where I 
would go.

Regards,

Steve
On Thu, May 9, 2013 at 4:25 PM, Liz Lee liz@redoaktech.com wrote:
Hello,

My name is Liz and I just subscribed to the mailing list for Cassandra.  I work 
for a consulting company by the name of Red Oak Technologies, and we have a 
world-class client who is in need of Cassandra professional services expertise. 
 If anyone has any tips or leads for me, I surely would be grateful! :)

Liz

Liz Lee
Red Oak Technologies
liz@redoaktech.com


RE: compaction throughput rate not even close to 16MB

2013-04-25 Thread Viktor Jevdokimov
Our experience with compactions shows that the more columns there are to merge
for the same row, the more CPU it takes.

For example, testing and choosing between 2 data models with supercolumns (we
still need supercolumns since composite columns lack some functionality):
  1. supercolumns with many columns
  2. supercolumns with one column (columns from model 1 merged into one blob
value)
We found that model 2 compaction performs 4 times faster.

The same holds for regular column families.
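Model 2 above amounts to serializing what would otherwise be separate subcolumns into a single value. A minimal sketch (Python; using JSON as the blob encoding is an assumption, the post does not say which encoding was used, and the field names are made up):

```python
import json

# Hypothetical per-supercolumn fields (not from the original post).
subcolumns = {"impressions": 1042, "clicks": 7, "cost": 3.14}

# Model 1 would store each key above as its own subcolumn.
# Model 2 stores a single column whose value is the merged blob:
blob = json.dumps(subcolumns, sort_keys=True).encode("utf-8")

# Reading it back is one column fetch plus one decode:
restored = json.loads(blob.decode("utf-8"))
assert restored == subcolumns
print("1 column of %d bytes instead of %d subcolumns" % (len(blob), len(subcolumns)))
```

The trade-off is that any update rewrites the whole blob, so this only pays off when the fields are always read and written together.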





Best regards / Pagarbiai

Viktor Jevdokimov
Senior Developer

-----Original Message-----
 From: Hiller, Dean [mailto:dean.hil...@nrel.gov]
 Sent: Wednesday, April 24, 2013 23:38
 To: user@cassandra.apache.org
 Subject: Re: compaction throughput rate not even close to 16MB

 Thanks much!!!  Better to hear at least one other person sees the same thing
 ;).  Sometimes these posts just go silent.

 Dean

 From: Edward Capriolo edlinuxg...@gmail.com
 Reply-To: user@cassandra.apache.org
 Date: Wednesday, April 24, 2013 2:33 PM
 To: user@cassandra.apache.org
 Subject: Re: compaction throughput rate not even close to 16MB

 I have noticed the same. I think in the real world your compaction
 throughput is limited by other things. If I had to speculate I would say that
 compaction can remove expired tombstones, however doing this requires
 bloom filter checks, etc.

 I think that setting is more important with multi threaded compaction and/or
 more compaction slots. In those cases it may actually throttle something.


 On Wed, Apr 24, 2013 at 3:54 PM, Hiller, Dean
 dean.hil...@nrel.govmailto:dean.hil...@nrel.gov wrote:
 I was wondering about the compaction throughput.  I never see ours get
 even close to 16MB/s and I thought this is supposed to throttle compaction,
 right?  Ours is constantly less than 3MB/s from looking at our logs, or do I
 have this totally wrong?  How can I see the real throughput so that I can
 understand how to throttle it when I need to?

 94,940,780 bytes to 95,346,024 (~100% of original) in 38,438ms =
 2.365603MB/s.  2,350,114 total rows, 2,350,022 unique.  Row merge counts
 were {1:2349930, 2:92, }
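The rate in that log line is just the compacted output size over wall-clock time; the arithmetic can be reproduced like this (sketch; that the rate is computed from the post-compaction size is inferred from the numbers, not from the Cassandra source):

```python
end_bytes = 95346024   # "94,940,780 bytes to 95,346,024" (output size)
elapsed_ms = 38438     # "in 38,438ms"

# MB/s as Cassandra logs it: binary megabytes over seconds.
mb_per_s = (end_bytes / (1024.0 * 1024.0)) / (elapsed_ms / 1000.0)
print(round(mb_per_s, 6))  # ~2.3656, matching the logged 2.365603MB/s
```

So the 16MB/s setting is only a ceiling: the observed 2.4MB/s means this compaction never reached it, which is consistent with the replies above.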

 Thanks,
 Dean






RE: 'sstableloader' is not recognized as an internal or external command,

2013-04-23 Thread Viktor Jevdokimov
If your Cassandra cluster is on Linux, I believe that streaming is not
supported in a mixed environment, i.e. Cassandra nodes can't stream between
Windows and Linux, and sstableloader can't stream from Windows to Linux.

If your Cassandra is also on Windows, just try to create a bat file for
sstableloader, using the other bat files as examples.
I don't know if sstableloader will support the Windows directory structure.



Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer


From: Techy Teck [mailto:comptechge...@gmail.com]
Sent: Tuesday, April 23, 2013 09:10
To: user
Subject: 'sstableloader' is not recognized as an internal or external command,

I have a bunch of `SSTables` with me that I got from somebody within my team. Now
I was trying to push those `SSTables` into the `Cassandra database`.

I created the corresponding keyspace and column family successfully.

Now as soon as I execute the `sstableloader` command, I always get the error below:


S:\Apache Cassandra\apache-cassandra-1.2.3\bin>sstableloader C:\CassandraClient-LnP\20130405\profileks\PROFILECF
'sstableloader' is not recognized as an internal or external command,
operable program or batch file.


Can anyone tell me what I am doing wrong here? I am running Cassandra 1.2.3.
And this is my first time working with `sstableloader`. I am working in a Windows
environment.

Is sstableloader supported on Windows? Looking at the source, it seems to be a
Unix shell file.

RE: Reduce Cassandra GC

2013-04-16 Thread Viktor Jevdokimov
How could one provide any help without any knowledge of your cluster, node
and environment settings?

40GB was calculated from 2 nodes with RF=2 (each has 100% data range), 2.4-2.5M 
rows * 6 cols * 3kB as a minimum without compression and any overhead (sstable, 
bloom filters and indexes).
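That 40GB figure is straight multiplication of the numbers quoted from the original post (Python sketch; reading the 3kB as per-column rather than per-row is the assumption made in the calculation above):

```python
rows = 2.4e6              # "2.4-2.5M rows"
cols_per_row = 6          # "6 cols"
bytes_per_col = 3 * 1024  # "3kB"

# Raw data volume, before compression and sstable/bloom-filter/index overhead.
total_gib = rows * cols_per_row * bytes_per_col / 1024.0**3
print(round(total_gib, 1))  # -> 41.2, i.e. roughly the 40GB minimum quoted
```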

With ParNew GC times such as yours, even if it is a swapping issue, I could say
only that the heap size is too small.

Check the Heap and New Heap sizes, and the memtable and cache sizes. Are you on
Linux? Is JNA installed and used? What is the total amount of RAM?

Just for a DEV environment we use 3 virtual machines with 4GB RAM and a 2GB
heap, without any GC issues, with data amounts from 0 to 16GB compressed on each
node. Memtable space is sized to 100MB, New Heap to 400MB.

Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer


From: Joel Samuelsson [mailto:samuelsson.j...@gmail.com]
Sent: Tuesday, April 16, 2013 12:52
To: user@cassandra.apache.org
Subject: Re: Reduce Cassandra GC

How do you calculate the heap / data size ratio? Is this a linear ratio?

Each node has slightly more than 12 GB right now though.

2013/4/16 Viktor Jevdokimov viktor.jevdoki...@adform.com
For 40GB of data, 1GB of heap is too low.

Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer


From: Joel Samuelsson [mailto:samuelsson.j...@gmail.com]
Sent: Tuesday, April 16, 2013 10:47
To: user@cassandra.apache.org
Subject: Reduce Cassandra GC

Hi,

We have a small production cluster with two nodes. The load on the nodes is 
very small, around 20 reads / sec and about the same for writes. There are 
around 2.5 million keys in the cluster and a RF of 2.

About 2.4 million of the rows are skinny (6 columns) and around 3kB in size
(each). Currently, scripts are running, accessing all of the keys in time order
to do some calculations.

While running the scripts, the nodes go down and then come back up 6-7 minutes 
later. This seems to be due to GC. I get lines like this in the log:
INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java (line 122) GC 
for ParNew: 338798 ms for 1 collections, 592212416 used; max is 1046937600

However, the heap is not full. The heap usage has a jagged pattern, going from
60% up to 70% during 5 minutes and then back down to 60% the next 5 minutes, and
so on. I get no "Heap is X full..." messages. Every once in a while at one of
these peaks, I get these stop-the-world GCs for 6-7 minutes. Why does GC take up
so much time even though the heap isn't full?
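Those GCInspector lines are easy to check in bulk; a small parser sketch (Python, with the regex tailored to the 1.x log format shown above):

```python
import re

line = ("INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java "
        "(line 122) GC for ParNew: 338798 ms for 1 collections, "
        "592212416 used; max is 1046937600")

# Pull the collector name, pause length and heap figures out of the line.
m = re.search(r"GC for (\w+): (\d+) ms for \d+ collections, (\d+) used; max is (\d+)", line)
collector = m.group(1)
pause_s = int(m.group(2)) / 1000.0
heap_pct = 100.0 * int(m.group(3)) / int(m.group(4))

print("%s paused %.1fs with heap only %.0f%% used" % (collector, pause_s, heap_pct))
```

For this particular line it shows a 338.8s pause (matching the 6-7 minute outages) at roughly 57% heap usage, which is exactly the puzzle raised in the question.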

I am aware that my access patterns make a high key cache hit ratio very
unlikely. And indeed, my average key cache hit ratio during the run of the
scripts is around 0.5%. I tried disabling key caching on the accessed column
family (UPDATE COLUMN FAMILY cf WITH caching=none;) through the cassandra-cli
but I get the same behaviour. Is turning the key cache off effective immediately?

Stop-the-world GC is fine

RE: Does Memtable resides in Heap?

2013-04-11 Thread Viktor Jevdokimov
Memtables reside in the heap; the write rate impacts GC: more writes, more
frequent and longer ParNew GC pauses.


From: Jay Svc [mailto:jaytechg...@gmail.com]
Sent: Friday, April 12, 2013 01:03
To: user@cassandra.apache.org
Subject: Does Memtable resides in Heap?

Hi Team,

I have 8GB of RAM, of which 4GB is allocated to the Java heap. My question is:
does the size of the Memtables contribute to heap usage, or are they off-heap?

Would a bigger Memtable have an impact on GC and overall memory management?

I am using DSE 3.0 / Cassandra 1.1.9.

Thanks,
Jay

Best regards / Pagarbiai

Viktor Jevdokimov
Senior Developer



sstableloader throughput

2013-04-10 Thread Viktor Jevdokimov
Hi,

We're using the Cassandra 1.0.12 sstableloader to import data from a dedicated
machine located in DC1 into a cluster of 32 nodes (RF=4), 16 nodes in DC1 and 16
nodes in DC2.

To disable throttling for sstableloader we set in cassandra.yaml:
stream_throughput_outbound_megabits_per_sec: 0

The outgoing network throughput is about 1Gbit; a plain file copy from the
dedicated sstableloader machine runs at 90MB/s to a DC1 node and 50MB/s to a
DC2 node.

But with sstableloader the outgoing network traffic is only 9MB/s; importing 1
sstable of 480MB into the cluster (~60MB/node) takes 8-9 minutes. Even at 50MB/s
the import should take less than a minute.

Why sstableloader throughput is so low/slow?
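The expectation above works out as follows (Python sketch; that the loader streams RF copies of the sstable is an assumption, but it is consistent with the ~60MB/node figure):

```python
sstable_mb = 480
replication_factor = 4
nodes = 32

total_streamed_mb = sstable_mb * replication_factor  # 1920 MB across the cluster
per_node_mb = total_streamed_mb / nodes              # 60.0, matching "~60MB/node"

print("at 50MB/s: %.0fs" % (total_streamed_mb / 50.0))  # ~38s, under a minute
print("at  9MB/s: %.0fs" % (total_streamed_mb / 9.0))   # ~213s, about 3.5 min
```

Note that even the observed 9MB/s would predict only about 3.5 minutes; the 8-9 minutes actually seen means the effective streaming rate was lower still.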



Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer



RE: sstableloader throughput

2013-04-10 Thread Viktor Jevdokimov
Found https://issues.apache.org/jira/browse/CASSANDRA-3668
Weird.



Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer


From: Viktor Jevdokimov [mailto:viktor.jevdoki...@adform.com]
Sent: Wednesday, April 10, 2013 11:12
To: user@cassandra.apache.org
Subject: sstableloader throughput



RE: sstableloader throughput

2013-04-10 Thread Viktor Jevdokimov
Rsync is not for our case.

Is sstableloader for 1.2.x faster?



From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Wednesday, April 10, 2013 15:52
To: user@cassandra.apache.org
Subject: Re: sstableloader throughput

sstableloader was slow in 1.0.x; I had better luck with rsync. It was not fixed
in the 1.0.x series.

On Wednesday, April 10, 2013, Viktor Jevdokimov viktor.jevdoki...@adform.com 
wrote:
 Found https://issues.apache.org/jira/browse/CASSANDRA-3668

 Weird.





Best regards / Pagarbiai

Viktor Jevdokimov
Senior Developer



RE: sstableloader throughput

2013-04-10 Thread Viktor Jevdokimov
Thanks, that could help us consider migrating the cluster to a newer version.
We'll check the difference.


From: Edward Capriolo [mailto:edlinuxg...@gmail.com] 
Sent: Wednesday, April 10, 2013 18:03
To: user@cassandra.apache.org
Subject: Re: sstableloader throughput

Yes. I did confirm that the 1.1 sstableloader works much better than the 1.0
version. The changes were not easy to backport to the 1.0.x branch, so it did
not happen. It is likely that 1.2 is even better :)

On Wed, Apr 10, 2013 at 10:38 AM, Viktor Jevdokimov 
viktor.jevdoki...@adform.com wrote:
Rsync is not for our case.

Is sstableloader for 1.2.x faster?



From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Wednesday, April 10, 2013 15:52
To: user@cassandra.apache.org
Subject: Re: sstableloader throughput

Sstableloader was slow in 1.0.x; I had better luck with rsync. It was not fixed 
in the 1.0.x series.

On Wednesday, April 10, 2013, Viktor Jevdokimov viktor.jevdoki...@adform.com 
wrote:
 Found https://issues.apache.org/jira/browse/CASSANDRA-3668

 Weird.



 Hi,



 We're using Cassandra 1.0.12 sstableloader to import data from a dedicated 
 machine located in DC1 into a cluster of 32 nodes (RF=4): 16 nodes in DC1 
 and 16 in DC2.



 To disable throttling for sstableloader we set, in cassandra.yaml:

 stream_throughput_outbound_megabits_per_sec: 0



 Outgoing network throughput is about 1Gbit, file copy from dedicated 
 sstableloader machine throughput is 90MB/s into DC1 node and 50MB/s into 
 DC2 node.



 But with sstableloader, outgoing network traffic is only 9MB/s; importing one 
 480MB sstable into the cluster (~60MB/node) takes 8-9 minutes. Even at 50MB/s 
 the import should take less than a minute.



 Why is sstableloader throughput so low?





RE: cannot start Cassandra on Windows7

2013-03-22 Thread Viktor Jevdokimov
You need to edit the paths in cassandra.yaml and log4j-server.properties before 
starting on Windows.

There is a lot for newcomers to learn; search for "Cassandra on Windows".
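For example, the stock cassandra.yaml ships with Unix-style paths; a minimal set of entries to review before first start on Windows might look like this (the drive-letter paths below are hypothetical, adjust to your unpack location):

```
# cassandra.yaml -- directories that default to Unix-style paths
data_file_directories:
    - C:/cassandra/data
commitlog_directory: C:/cassandra/commitlog
saved_caches_directory: C:/cassandra/saved_caches
```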



-Original Message-
 From: Marina [mailto:ppi...@yahoo.com]
 Sent: Friday, March 22, 2013 17:21
 To: user@cassandra.apache.org
 Subject: cannot start Cassandra on Windows7

 Hi,
 I have downloaded apache-cassandra-1.2.3-bin.tar.gz and un-zipped it on my
 Windows7 machine (I did not find a Windows-specific distributable...). Then, I
 tried to start Cassandra as following and got an error:

 C:\Marina\Tools\apache-cassandra-1.2.3\bin>cassandra.bat -f
 Starting Cassandra Server
 Exception in thread "main" java.lang.ExceptionInInitializerError
 Caused by: java.lang.RuntimeException: Couldn't figure out log4j configuration:
 log4j-server.properties
 at org.apache.cassandra.service.CassandraDaemon.initLog4j(CassandraDaemon.java:81)
 at org.apache.cassandra.service.CassandraDaemon.<clinit>(CassandraDaemon.java:57)
 Could not find the main class: org.apache.cassandra.service.CassandraDaemon. Program will exit.

 C:\Marina\Tools\apache-cassandra-1.2.3\bin>

 It looks similar to the Cassandra issue that was already fixed:
 https://issues.apache.org/jira/browse/CASSANDRA-2383

 However, I am still getting this error.
 I am an Administrator on my machine, and have access to all files in the
 apache-cassandra-1.2.3\conf dir, including the log4j ones.

 Do I need to configure anything else on Windows? I did not find any
 Windows-specific installation/setup/startup instructions - if there are such
 documents somewhere, please let me know!

 Thanks,
 Marina





RE: Using Cassandra for read operations

2013-02-21 Thread Viktor Jevdokimov
Bill de hÓra already answered, I'd like to add:

To achieve ~4ms reads (from client standpoint):
1. You can't use multi-key slices, since different keys may live on different 
nodes, which requires internode communication. Design your data and reads to 
use a single key/row.
2. Use ConsistencyLevel.ONE to avoid waiting for other nodes.
3. Use a smart client that selects the endpoint by token (key) so the request 
lands on the appropriate node: Astyanax (Java), or write such a client yourself.
4. Turn off the dynamic snitch. While the coordinator node may read locally, 
the dynamic snitch may redirect the read to another replica.
5. Use SSDs to avoid the re-caching penalty when sstables are compacted.
6. If you do writes, the remaining issue is GC. Unless you're on the Azul Zing 
JVM (which I can't confirm to be better than Oracle HotSpot or JRockit; both 
have GC issues), you can't tune the JVM to keep young-gen GC pauses as low as 
you need; you will trade pause frequency against pause duration.
So if you can afford Zing, also check Aerospike (ex-CitrusLeaf) as an 
alternative to Cassandra; it is written in C and has no GC issues.
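The token-aware routing in point 3 can be sketched with a sorted token ring. This is a toy model, not Astyanax's actual implementation; the node IPs and token positions are made up, and real clients also apply replica-placement rules on top of this:

```python
import bisect
from hashlib import md5

# Toy three-node ring over a RandomPartitioner-style token space [0, 2**127).
ring = [
    (0, "10.0.0.1"),
    (2**127 // 3, "10.0.0.2"),
    (2 * 2**127 // 3, "10.0.0.3"),
]
tokens = [t for t, _ in ring]

def token_for(key: bytes) -> int:
    # RandomPartitioner hashes the row key with MD5 to place it on the ring.
    return int.from_bytes(md5(key).digest(), "big") % 2**127

def endpoint_for(key: bytes) -> str:
    # The primary replica is the first node whose token is >= the key's
    # token, wrapping around the end of the ring.
    i = bisect.bisect_left(tokens, token_for(key)) % len(ring)
    return ring[i][1]

print(endpoint_for(b"some-row-key"))
```

A client built this way sends each read straight to a node that owns the data, so with CL.ONE the coordinator can answer from its local replica.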


 From: Bill de hÓra [mailto:b...@dehora.net]
 Sent: Thursday, February 21, 2013 22:07
 To: user@cassandra.apache.org
 Subject: Re: Using Cassandra for read operations

 In a nutshell -

 - Start with defaults and tune based on small discrete adjustments and leave
 time to see the effect of each change. No-one will know your workload
 better than you and the questions you are asking are workload sensitive.

 - Allow time for tuning and spending time understanding the memory model
 and JVM GC.

 - Be very careful with caches. Leave enough room in the OS for its own disk
 cache.

 - Get an SSD


 Bill


 On 21 Feb 2013, at 19:03, amulya rattan talk2amu...@gmail.com wrote:

  Dear All,
 
 We are currently evaluating Cassandra for an application involving strict
 SLAs (service level agreements). We just need one column family with a long
 key and approximately 70-80 byte rows. We are not concerned about write
 performance but are primarily concerned about reads. For our SLAs, a read of
 at most 15-20 rows at once (using multi-slice) should not take more than 4 ms.
 So far, on a single-node setup using Cassandra's stress tool, the numbers are
 promising. But I am guessing that's because there is no network latency
 involved, and since we set the memtable around 2 GB (4 GB heap), we never
 had to hit disk I/O.
 
  Assuming our nodes having 32GB RAM, a couple of questions regarding
 read:
 
  * To avoid disk I/Os, the best option we thought is to have data in memory.
 Is it a good idea to have memtable setup around 1/2 or 3/4 of heap size?
 Obviously flushing will take a lot of time but would that hurt that node's
 performance big time?
 
  * Cassandra stress tool only gives out average read latency. Is there a way
 to figure out max read-latency for a bunch of read operations?
 
  * How big a row cache can one have? Given that cassandra provides off-
 heap row caching, in a machine 32 gb RAM, would it be wise to have a 10
 gb row cache with 8 gb java heap? And how big should the corresponding key
 cache be then?
 
  Any response is appreciated.
 
  ~Amulya
 




RE: Testing compaction strategies on a single production server?

2013-02-19 Thread Viktor Jevdokimov
Just turn off the dynamic snitch on the survey node, make read requests to it 
directly with CL.ONE, watch the histograms, and compare.

Regarding switching compaction strategy, there's a lot of information out 
there already.
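On the survey node that could look like this in cassandra.yaml (a 1.x-era setting; verify the exact knob against your version's shipped config):

```
# cassandra.yaml on the survey node: disable score-based replica selection
dynamic_snitch: false
```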



From: Henrik Schröder [mailto:skro...@gmail.com]
Sent: Tuesday, February 19, 2013 15:57
To: user
Subject: Testing compaction strategies on a single production server?

Hey,

Version 1.1 of Cassandra introduced live traffic sampling, which allows you to 
measure the performance of a node without it really joining the cluster: 
http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1-live-traffic-sampling
That page mentions that you can change the compaction strategy through jmx if 
you want to test out a different strategy on your survey node.

That's great, but it doesn't give you a complete view of how your performance 
would change, since you're not doing reads from the survey node. But what would 
happen if you used jmx to change the compaction strategy of a column family on 
a single *production* node? Would that be a safe way to test it out or are 
there side-effects of doing that live?

And if you do that, would running a major compaction transform the entire 
column family to the new format?
Finally, if the test was a success, how do you proceed from there? Just change 
the schema?

/Henrik

RE: Why do Datastax docs recommend Java 6?

2013-02-05 Thread Viktor Jevdokimov
I would prefer Oracle to own Azul's Zing JVM (over any other GC) and to 
provide it for free to everyone :)


From: jef...@gmail.com [mailto:jef...@gmail.com]
Sent: Wednesday, February 06, 2013 02:23
To: user@cassandra.apache.org
Subject: Re: Why do Datastax docs recommend Java 6?

Oracle now owns the Sun HotSpot team, which is inarguably the highest-powered 
Java VM team in the world. It's still the epicenter of all Java VM development.
Sent from my Verizon Wireless BlackBerry

From: Ilya Grebnov i...@metricshub.commailto:i...@metricshub.com
Date: Tue, 5 Feb 2013 14:09:33 -0800
To: user@cassandra.apache.orgmailto:user@cassandra.apache.org
ReplyTo: user@cassandra.apache.orgmailto:user@cassandra.apache.org
Subject: RE: Why do Datastax docs recommend Java 6?

Also, what is particular reason to use Oracle JDK over Open JDK? Sorry, I could 
not find this information online.

Thanks,
Ilya
From: Michael Kjellman [mailto:mkjell...@barracuda.com]
Sent: Tuesday, February 05, 2013 7:29 AM
To: user@cassandra.apache.orgmailto:user@cassandra.apache.org
Subject: Re: Why do Datastax docs recommend Java 6?

There have been tons of threads/convos on this.

In the early days of Java 7 it was pretty unstable and there was pretty much no 
convincing reason to use Java 7 over Java 6.

Now that Java 7 has stabilized and Java 6 is EOL it's a reasonable decision to 
use Java 7 and we do it in production with no issues to speak of.

That being said, there was one potential issue we've seen as a community where 
bootstrapping a new node used 3x more CPU and achieved significantly less 
throughput. However, it was never consistently reproduced, AFAIK.

I think Datastax will update their docs once more people use Java 7 in 
production and prove it doesn't cause additional bugs or performance issues. 
For now I'd say it's a safe bet to use Java 7 with vanilla C* 1.2.1. I hope 
this helps!

Best,
Michael

From: Baron Schwartz ba...@xaprb.commailto:ba...@xaprb.com
Reply-To: user@cassandra.apache.orgmailto:user@cassandra.apache.org 
user@cassandra.apache.orgmailto:user@cassandra.apache.org
Date: Tuesday, February 5, 2013 7:21 AM
To: user@cassandra.apache.orgmailto:user@cassandra.apache.org 
user@cassandra.apache.orgmailto:user@cassandra.apache.org
Subject: Why do Datastax docs recommend Java 6?

The Datastax docs repeatedly say (e.g. 
http://www.datastax.com/docs/1.2/install/install_jre) that Java 7 is not 
recommended, but they don't say why. It would be helpful to know this. Does 
anyone know?

The same documentation is referenced from the Cassandra wiki, for example, 
http://wiki.apache.org/cassandra/GettingStarted

- Baron

RE: data not shown up after some time

2013-01-28 Thread Viktor Jevdokimov
Are you sure your app is setting the TTL correctly?
TTL is in seconds; for 90 days it has to be 90*24*60*60 = 7776000.
What if you accidentally set 777600 (10 times less)? That would be 9 days, 
almost exactly what you see.
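The arithmetic is easy to check (plain Python, just illustrating the units):

```python
DAY = 24 * 60 * 60            # 86400 seconds

ttl_90_days = 90 * DAY        # the intended TTL
ttl_typo = 777600             # the same figure with one digit dropped

print(ttl_90_days)            # 7776000
print(ttl_typo / DAY)         # 9.0 -- expires in 9 days, not 90
```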


From: Matthias Zeilinger [mailto:matthias.zeilin...@bwinparty.com]
Sent: Monday, January 28, 2013 15:57
To: user@cassandra.apache.org
Subject: data not shown up after some time

Hi,

I'm a simple operations guy and new to Cassandra.
I have a problem: one of our applications is writing data into Cassandra (but 
not deleting it, because we should have a 90-day TTL).
The application operates in 1 KS with 5 CFs. My current setup:

3-node cluster, and the KS has an RF of 3 (I know it's not the best setup).

I can now see the problem that after 10 days most (nearly all) data is no 
longer showing up in the cli, and our application cannot see the data either.
I assume it has something to do with gc_grace_seconds, which is set to 10 
days.

I have read a lot of documentation about tombstones, but our application 
doesn't perform deletes.
How can I see in the cli whether a row key has any tombstones or not?

Could it be that there are some ghost tombstones?

Thx for your help

Br,
Matthias Zeilinger
Production Operation - Shared Services

P: +43 (0) 50 858-31185
M: +43 (0) 664 85-34459
E: matthias.zeilin...@bwinparty.commailto:matthias.zeilin...@bwinparty.com

bwin.party services (Austria) GmbH
Marxergasse 1B
A-1030 Vienna

www.bwinparty.comhttp://www.bwinparty.com


RE: JMX CF Beans

2013-01-25 Thread Viktor Jevdokimov
src/java/org/apache/cassandra/db/DataTracker.java:
public double getBloomFilterFalseRatio()
{
…
return (double) falseCount / (trueCount + falseCount);
…
}


ReadCount/WriteCount on a CF are for that CF on this node only, i.e. the 
local/internal reads/writes for the node's range.
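Both answers are easy to illustrate numerically (the counts below are made up, just showing the relationships):

```python
# BloomFilterFalseRatio = false positives / (true + false positives),
# per the DataTracker source quoted above.
false_count, true_count = 30, 970
bloom_false_ratio = false_count / (true_count + false_count)

# Per-CF ReadCount is counted at each replica, so summed across the cluster
# it comes to roughly RF times the client-facing StorageProxy read count.
rf, proxy_reads = 3, 10_000
replica_reads_total = rf * proxy_reads

print(bloom_false_ratio)      # 0.03
print(replica_reads_total)    # 30000
```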



From: Nicolas Lalevée [mailto:nicolas.lale...@hibnet.org]
Sent: Friday, January 25, 2013 16:08
To: user@cassandra.apache.org
Subject: JMX CF Beans

Just a quick question about the attributes exposed via JMX. I have some docs 
[1], but they don't cover the CF beans.

The BloomFilterFalseRatio, is that the ratio of found vs missed, or the ratio 
of false positive vs the number of tests, or something else ?

The ReadCount and WriteCount: how do they count with regard to the replication 
factor? As far as I understand, the reads and writes on the StorageProxy are 
the actual number of requests coming from clients. Since the sum of reads and 
writes over all CFs is nearly equal to the replication factor multiplied by 
the number of reads and writes on the StorageProxy, I am guessing that the 
per-CF reads and writes are the replica-level ones. Am I right?

Nicolas

[1] http://wiki.apache.org/cassandra/JmxInterface


RE: Concurrent write performance

2013-01-21 Thread Viktor Jevdokimov
Do you experience any performance problems?

This will be the last thing to look at.



From: Jay Svc [mailto:jaytechg...@gmail.com]
Sent: Monday, January 21, 2013 17:28
To: user@cassandra.apache.org
Subject: Concurrent write performance


Folks,



I would like to write (insert or update) to a single row in a column family. I 
have concurrent requests that will write to a single row. Do we see any 
performance implications from concurrent writes to a single row, where the 
comparator has to sort the columns at the same time?



Please share your thoughts.



Thanks,

Jay

RE: High Read and write through put

2013-01-21 Thread Viktor Jevdokimov
For such a generic question without technical details of the requirements, the 
answer is: use the defaults.



From: Jay Svc [mailto:jaytechg...@gmail.com]
Sent: Monday, January 21, 2013 17:31
To: user@cassandra.apache.org
Subject: High Read and write through put


Folks,



For the given situation I am expecting many read and write requests to the 
same cluster. What primary design or configuration considerations should we 
make?



Any thoughts or links to such documentation is appreciated.



Thanks,

Jay


RE: LCS not removing rows with all TTL expired columns

2013-01-17 Thread Viktor Jevdokimov
@Bryan,

To keep data size as low as possible with TTL columns we still use STCS and 
nightly major compactions.

Our experience with LCS was not successful: data size stays too high, along 
with the amount of compaction.

IMO, before 1.2, LCS was only good for CFs without TTLs or a high delete rate. 
I have not tested 1.2's LCS behavior; we're still on 1.0.x.



From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Thursday, January 17, 2013 06:24
To: user@cassandra.apache.org
Subject: Re: LCS not removing rows with all TTL expired columns

Minor compaction (with Size Tiered) will only purge tombstones if all fragments 
of a row are contained in the SSTables being compacted. So if you have a long 
lived row, that is present in many size tiers, the columns will not be purged.
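That rule can be stated as a one-liner (a simplification of pre-1.2 size-tiered behavior, and it ignores gc_grace_seconds):

```python
def can_purge(row_sstables: set, compacting: set) -> bool:
    # A row's tombstones can be dropped only if every sstable holding a
    # fragment of that row takes part in the compaction.
    return row_sstables <= compacting

# A long-lived row spread across a tier that is not being compacted survives:
print(can_purge({"a", "b"}, {"a", "b", "c"}))  # True
print(can_purge({"a", "d"}, {"a", "b", "c"}))  # False: "d" is not compacting
```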

 (thus compacted) 3 days after all columns for that row had expired
Tombstones have to get on disk, even if you set the gc_grace_seconds to 0. If 
not they do not get a chance to delete previous versions of the column which 
already exist on disk. So when the compaction ran your ExpiringColumn was 
turned into a DeletedColumn and placed on disk.

I would expect the next round of compaction to remove these columns.

There is a new feature in 1.2 that may help you here. It will do a special 
compaction of individual sstables when they have a certain proportion of dead 
columns https://issues.apache.org/jira/browse/CASSANDRA-3442

Also interested to know if LCS helps.

Cheers


-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/01/2013, at 2:55 PM, Bryan Talbot 
btal...@aeriagames.commailto:btal...@aeriagames.com wrote:


According to the timestamps (see original post) the SSTable was written (thus 
compacted) 3 days after all columns for that row had expired and 6 days after 
the row was created; yet all columns are still showing up in the SSTable.  
Note that a get for that key returns no rows, so that part works correctly, 
but the data is lugged around far longer than it should be -- maybe forever.


-Bryan

On Wed, Jan 16, 2013 at 5:44 PM, Andrey Ilinykh 
ailin...@gmail.commailto:ailin...@gmail.com wrote:
To get column removed you have to meet two requirements
1. column should be expired
2. after that CF gets compacted

I guess your expired columns have been propagated to high-tier sstables, which 
get compacted rarely.
So you have to wait until the high tier gets compacted.

Andrey


On Wed, Jan 16, 2013 at 11:39 AM, Bryan Talbot 
btal...@aeriagames.commailto:btal...@aeriagames.com wrote:
On cassandra 1.1.5 with a write heavy workload, we're having problems getting 
rows to be compacted away (removed) even though all columns have expired TTL.  
We've tried size tiered and now leveled and are seeing the same symptom: the 
data stays around essentially forever.

Currently we write all columns with a TTL of 72 hours (259200 seconds) and 
expect to add 10 GB of data to this CF per day per node.  Each node currently 
has 73 GB for the affected CF and shows no indications that old rows will be 
removed on their own.

Why aren't rows being removed?  Below is some data from a sample row which 
should have been removed several days ago but is still around even though it 
has been involved in numerous compactions since being expired.

$ ./bin/nodetool -h localhost getsstables metrics request_summary 
459fb460-5ace-11e2-9b92-11d67b6163b4
/virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db

$ ls -alF 
/virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
-rw-rw-r-- 1 sandra sandra 5252320 Jan 16 08:42 
/virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db

$ ./bin/sstable2json 
/virtual/cassandra/data/data/metrics/request_summary/metrics-request_summary-he-386179-Data.db
 -k $(echo -n 459fb460-5ace-11e2-9b92

RE: Partition maintenance

2012-12-18 Thread Viktor Jevdokimov
No way to read the taped data later if it was written with a TTL - it will 
have disappeared from the tapes :)



From: Keith Wright [mailto:kwri...@nanigans.com]
Sent: Tuesday, December 18, 2012 18:33
To: user@cassandra.apache.org
Subject: Re: Partition maintenance

My understanding was that TTLs only apply to columns and not on a per row 
basis.  This means that for each column insert you would need to set that TTL.  
Does this mean that the amount of data space used in such a case would be the 
TTL * the number of columns?  I was hoping there was a way to set a row TTL.  
See older post:  http://comments.gmane.org/gmane.comp.db.cassandra.user/12701

From: Christopher Keller cnkel...@gmail.commailto:cnkel...@gmail.com
Reply-To: user@cassandra.apache.orgmailto:user@cassandra.apache.org 
user@cassandra.apache.orgmailto:user@cassandra.apache.org
Date: Tuesday, December 18, 2012 11:16 AM
To: user@cassandra.apache.orgmailto:user@cassandra.apache.org 
user@cassandra.apache.orgmailto:user@cassandra.apache.org
Subject: Re: Partition maintenance

If I'm understanding you correctly, you can write TTL's on each insert.

18 months would be roughly 540 days which would be 46656000 seconds. I've not 
tried that number, but I use smaller TTL's all the time and they work fine. 
Once they are expired they get tombstones and are no longer searchable. Space 
is reclaimed as with any tombstone.
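The figure checks out (TTL values are plain seconds):

```python
# 540 days expressed as a TTL in seconds
ttl_18_months = 540 * 24 * 60 * 60
print(ttl_18_months)  # 46656000
```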

--Chris


On Dec 18, 2012, at 11:08 AM, 
stephen.m.thomp...@wellsfargo.commailto:stephen.m.thomp...@wellsfargo.com 
wrote:


Hi folks.  Still working through the details of building out a Cassandra 
solution and I have an interesting requirement that I'm not sure how to 
implement in Cassandra:

In our current Oracle world, we have the data for this system partitioned by 
month, and each month the data that are now 18-months old are archived to 
tape/cold storage and then the partition for that month is dropped.  Is there a 
way to do something similar with Cassandra without destroying our overall 
performance?

Thanks in advance,
Steve

--
The downside of being better than everyone else is that people tend to assume 
you're pretentious.


RE: cassandra vs couchbase benchmark

2012-12-12 Thread Viktor Jevdokimov
Pure marketing, comparing apples to oranges.

Was the Cassandra usage optimized?
- What consistency level was used? (fastest reads with ONE)
- Was the Cassandra client token-aware? (routes the request to the appropriate node)
- Was the dynamic snitch turned off? (prevents forwarding a request to another 
replica when it can be processed locally)
- Was the Cassandra data model designed to mimic the Couchbase data model? 
(Couchbase has only one value per row)
- What caching was used on Cassandra? (Couchbase uses built-in memcached)

For our use case we've seen much better results when testing from a single node.
Throughput grows almost linearly when adding nodes while growing the amount of 
data to the same per-node level as for a single node.
Single node stats:
- A column family with 30 GiB compressed data per node (100 GiB uncompressed)
- Rows with 10-30 columns weighing 0.5-2 KiB uncompressed
- 1 node with 6 cores, 24 GiB RAM, 8 GiB heap, 1600 MiB new heap
- Key cache only, 2M keys
- Random reads 70%, random writes 30%
- Read latencies 10ms AVE100 at 95% CPU, 50k reads/s
- Read latencies 5ms AVE100 at 60% CPU, 20k reads/s

In reality, with many column families with different amounts of data and 
read/write rates, performance results may vary significantly.
You just need to know what, and how, to optimize in Cassandra to get the best 
results.


Couchbase is not for our use case because of its data model (it requires reads for 
updates/inserts), so we can't compare it to Cassandra.



Best regards / Pagarbiai

Viktor Jevdokimov
Senior Developer

Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063
Fax: +370 5 261 0453

J. Jasinskio 16C,
LT-01112 Vilnius,
Lithuania



Disclaimer: The information contained in this message and attachments is 
intended solely for the attention and use of the named addressee and may be 
confidential. If you are not the intended recipient, you are reminded that the 
information remains the property of the sender. You must not use, disclose, 
distribute, copy, print or rely on this e-mail. If you have received this 
message in error, please contact the sender immediately and irrevocably delete 
this message and any copies.

-Original Message-
 From: Radim Kolar [mailto:h...@filez.com]
 Sent: Tuesday, December 11, 2012 17:42
 To: user@cassandra.apache.org
 Subject: cassandra vs couchbase benchmark

 http://www.slideshare.net/Couchbase/benchmarking-couchbase#btnNext


RE: Cassandra nodes failing with OOM

2012-11-19 Thread Viktor Jevdokimov
We've seen OOM in situations where the OS was not properly prepared for production.

http://www.datastax.com/docs/1.1/install/recommended_settings




From: some.unique.lo...@gmail.com [mailto:some.unique.lo...@gmail.com] On 
Behalf Of Ивaн Cобoлeв
Sent: Saturday, November 17, 2012 08:08
To: user@cassandra.apache.org
Subject: Cassandra nodes failing with OOM

Dear Community,

advice from you needed.

We have a cluster in which 1/6 of the nodes died for various reasons (3 had OOM 
messages).
Nodes died in groups of 3, 1, and 2. No adjacent nodes died, though we use SimpleSnitch.

Version: 1.1.6
Hardware:  12Gb RAM / 8 cores(virtual)
Data:  40Gb/node
Nodes:   36 nodes

Keyspaces:2(RF=3, R=W=2) + 1(OpsCenter)
CFs:36, 2 indexes
Partitioner:  Random
Compaction:   Leveled(we don't want 2x space for housekeeping)
Caching:  Keys only

All is pretty much standard apart from the one CF receiving writes in 64K 
chunks and having sstable_size_in_mb=100.
No JNA installed - this is to be fixed soon.

Checking sysstat/sar I can see 80-90% CPU idle, no anomalies in I/O, and the only 
change is network activity spiking.
All the nodes had the following in their logs before dying:
 INFO [ScheduledTasks:1] 2012-11-15 21:35:05,512 StatusLogger.java (line 72) 
 MemtablePostFlusher   1 4 0
 INFO [ScheduledTasks:1] 2012-11-15 21:35:13,540 StatusLogger.java (line 72) 
 FlushWriter   1 3 0
 INFO [ScheduledTasks:1] 2012-11-15 21:36:32,162 StatusLogger.java (line 72) 
 HintedHandoff 1 6 0
 INFO [ScheduledTasks:1] 2012-11-15 21:36:32,162 StatusLogger.java (line 77) 
 CompactionManager 5 9

GCInspector warnings were there too; heap usage went from ~0.8 GB to 3 GB in 
5-10 minutes.

So, could you please give me a hint on:
1. How many GCInspector warnings per hour are considered 'normal'?
2. What should be the next thing to check?
3. What are the possible failure reasons, and how can we prevent them?

Thank you very much in advance,
Ivan

RE: what is more important (RAM vs Cores)

2012-10-12 Thread Viktor Jevdokimov
IMO, in most cases you'll be limited by RAM first.
Take into account the size of your sstables: you will need to keep bloom filters and 
indexes in RAM, and if they don't fit, 4 cores or 24 cores doesn't matter, 
unless you're on SSD.

You need to design first, stress test second, and conclude last.
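To check how much RAM those per-sstable structures are actually taking, nodetool cfstats reports it per column family (a sketch; the host is a placeholder):

```text
nodetool -h <host> cfstats
# per column family, look at:
#   Bloom Filter Space Used: <bytes>
#   SSTable count / Space used, to gauge index overhead
```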


-Original Message-
 From: Hagos, A.S. [mailto:a.s.ha...@tue.nl]
 Sent: Friday, October 12, 2012 12:17
 To: user@cassandra.apache.org
 Subject: RE: what is more important (RAM vs Cores)

 Hi there,
 My application uses Cassandra to store abstracted sensor data from a
 sensor network in a large building (up to 3000 sensors).
 For now I am starting with one node on one floor of the building; in the future it
 will definitely be a cluster. Some of the sensors have up to a 16 Hz sampling rate.

 And now I want to decide whether I should focus on big RAM or a large
 number of cores.
 greetings
 Ambes
 
 From: Romain HARDOUIN [romain.hardo...@urssaf.fr]
 Sent: Friday, October 12, 2012 10:57 AM
 To: user@cassandra.apache.org
 Subject: Re: what is more important (RAM vs Cores)

 Hi,

 Sure, it depends... but IMHO 6 GB is suboptimal for big data because it means
 only 1.5 GB or 2 GB for Cassandra.
 Maybe you could elaborate on your use case. Do you really want a one-node
 cluster?

 cheers,
 Romain

 wang liang wla...@gmail.com a écrit sur 12/10/2012 10:36:15 :

  Hi, Hagos,
 
  I think it depends on your business case. Big RAM reduces latency and
  improves responsiveness; a high number of cores increases the concurrency of
  your app. Thanks.

  On Fri, Oct 12, 2012 at 4:23 PM, Hagos, A.S. a.s.ha...@tue.nl wrote:
  Hi All,
  For one of my projects I want to buy a machine to host a Cassandra database.
  The options I am offered are machines with 16 GB RAM and a quad-core
  processor, or 6 GB RAM with a hexa-core processor.
  Which one do you recommend, big RAM or a high number of cores?
 
  greetings
  Ambes
 

 
  --
  Best wishes,
  Helping others is to help myself.



RE: Super columns and arrays

2012-10-12 Thread Viktor Jevdokimov
struct SuperColumn {
   1: required binary name,
   2: required list<Column> columns,
}


-Original Message-
 From: Thierry Templier [mailto:thierry.templ...@restlet.com]
 Sent: Friday, October 12, 2012 13:44
 To: user@cassandra.apache.org
 Subject: Super columns and arrays

 Hello,

 I wonder if it's possible to specify an array of values as a value of a super
 column... If it's not possible, is there another way to do that?
 Thanks very much for your help.

 Thierry


RE: unbalanced ring

2012-10-11 Thread Viktor Jevdokimov
To run, or not to run? It all depends on the use case. There are no problems 
running major compactions in one case (we do them nightly); there could be 
problems in another. You just need to understand how everything works.



From: Alain RODRIGUEZ [mailto:arodr...@gmail.com]
Sent: Thursday, October 11, 2012 09:17
To: user@cassandra.apache.org
Subject: Re: unbalanced ring

Tamar, be careful: Datastax doesn't recommend major compactions in production 
environments.

If I got it right, performing a major compaction will convert all your SSTables 
into one big one, substantially improving your read performance, at least for a 
while... The problem is that it will effectively disable minor compactions too 
(because of the size difference between this big SSTable and the new ones, if I 
remember well). So your read performance will decrease until your other SSTables 
reach the size of the big one you've created, or until you run another major 
compaction, turning major compaction into a normal maintenance process, like repair.

But, knowing that, I still don't know if we both (Tamar and I) shouldn't run it 
anyway (in my case it would greatly decrease the size of my data, 133 GB -> 35 GB, 
and maybe load the cluster evenly...)
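For reference, both operations discussed in this thread are triggered with nodetool (a sketch; the host, keyspace and CF names are placeholders):

```text
nodetool -h <host> compact <keyspace> [cfnames]   # major compaction
nodetool -h <host> cleanup <keyspace>             # discard data this node no longer owns
```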

Alain

2012/10/10 B. Todd Burruss bto...@gmail.commailto:bto...@gmail.com
it should not have any other impact except increased usage of system resources.

and I suppose cleanup would not have an effect (over normal compaction) if all 
nodes contain the same data

On Wed, Oct 10, 2012 at 12:12 PM, Tamar Fraenkel 
ta...@tok-media.commailto:ta...@tok-media.com wrote:
Hi!
Apart from the compaction being a heavy load, will it have other effects?
Also, will cleanup help if I have replication factor = number of nodes?
Thanks

Tamar Fraenkel
Senior Software Engineer, TOK Media

ta...@tok-media.commailto:ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956




On Wed, Oct 10, 2012 at 6:12 PM, B. Todd Burruss 
bto...@gmail.commailto:bto...@gmail.com wrote:
major compaction in production is fine; however, it is a heavy operation on the 
node and will take I/O and some CPU.

the only time I have seen this happen is when I have changed the tokens in the 
ring, e.g. with nodetool movetoken. Cassandra does not auto-delete data that it 
no longer uses, just in case you want to move the tokens again or otherwise 
undo the move.

try nodetool cleanup

On Wed, Oct 10, 2012 at 2:01 AM, Alain RODRIGUEZ 
arodr...@gmail.commailto:arodr...@gmail.com wrote:
Hi,

Same thing here:

2 nodes, RF = 2. RCL = 1, WCL = 1.
Like Tamar, I never ran a major compaction; I repair each node once a week.

10.59.21.241eu-west 1b  Up Normal  133.02 GB   50.00%   
   0
10.58.83.109eu-west 1b  Up Normal  98.12 GB50.00%   
   85070591730234615865843651857942052864

What phenomenon could explain the result above?

By the way, I have copied the data and imported it into a one-node dev cluster. There 
I ran a major compaction and the size of my data was significantly 
reduced (to about 32 GB instead of 133 GB).

How is that possible?
Do you think that if I run a major compaction on both nodes it will balance the 
load evenly?
Should I run major compactions in production?

2012/10/10 Tamar Fraenkel ta...@tok-media.commailto:ta...@tok-media.com
Hi!
I am re-posting this, now that I have more data and still an unbalanced ring:

3 nodes,
RF=3, RCL=WCL=QUORUM


Address DC  RackStatus State   LoadOwns
Token
   
113427455640312821154458202477256070485
x.x.x.xus-east 1c  Up Normal  24.02 GB33.33%  0
y.y.y.y us-east 1c  Up Normal  33.45 GB33.33%  
56713727820156410577229101238628035242
z.z.z.zus-east 1c  Up Normal  29.85 GB33.33

RE: Persistent connection among nodes to communicate and redirect request

2012-10-02 Thread Viktor Jevdokimov
9160 is the client port. Nodes use the messaging service on storage_port (7000) 
for intra-node communication.
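The two ports come from cassandra.yaml (a sketch showing the defaults):

```yaml
storage_port: 7000   # inter-node messaging service
rpc_port: 9160       # Thrift client (RPC) connections
```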


-Original Message-
 From: Niteesh kumar [mailto:nitees...@directi.com]
 Sent: Tuesday, October 02, 2012 12:32
 To: user@cassandra.apache.org
 Subject: Persistent connection among nodes to communicate and redirect
 request

 while looking at the netstat table I observed that my cluster nodes are not using
 persistent connections to talk among themselves on port 9160 to redirect
 requests. I also observed that local write latency is around
 30-40 microseconds, while it takes around 0.5 milliseconds at 50K QPS if the chosen
 node is not the node responsible for the key. I think this is attributable to
 connection setup time among servers, as my servers are on the same rack.

 how can I configure my servers to use persistent connections on port 9160,
 thus excluding connection setup time for each request that is redirected...


RE: Persistent connection among nodes to communicate and redirect request

2012-10-02 Thread Viktor Jevdokimov
I've never seen connections between nodes on port 9160, only 7000.

From the source code, for example: a thrift request goes to RPC port 9160 
(org.apache.cassandra.thrift.CassandraDaemon, 
org.apache.cassandra.thrift.CassandraServer), then to StorageProxy 
(org.apache.cassandra.service.StorageProxy), which forwards the request (if needed) 
to other endpoints via MessagingService 
(org.apache.cassandra.net.MessagingService), which uses storage_port from the 
yaml, not the thrift port (rpc_port in yaml). So what could be wrong? The wiki or 
the source code?




-Original Message-
 From: rohit bhatia [mailto:rohit2...@gmail.com]
 Sent: Tuesday, October 02, 2012 14:35
 To: user@cassandra.apache.org
 Subject: Re: Persistent connection among nodes to communicate and
 redirect request

 I guess 7000 is only for the gossip protocol. Cassandra still uses 9160 for RPC 
 even among nodes.
 Also, I see connections over port 9160 among various Cassandra nodes in my
 cluster.
 Please correct me if I am wrong...

 PS: mentioned Here http://wiki.apache.org/cassandra/CloudConfig

 On Tue, Oct 2, 2012 at 4:56 PM, Viktor Jevdokimov
 viktor.jevdoki...@adform.com wrote:
  9160 is a client port. Nodes are using messaging service on storage_port
 (7000) for intra-node communication.
 
 
  -Original Message-
  From: Niteesh kumar [mailto:nitees...@directi.com]
  Sent: Tuesday, October 02, 2012 12:32
  To: user@cassandra.apache.org
  Subject: Persistent connection among nodes to communicate and
  redirect request
 
  while looking at netstat table i observed that my cluster nodes not
  using persistent connection  to talk among themselves on port 9160 to
  redirect request. I also observed that local write latency is around
  30-40 microsecond, while its takes around .5 miliseconds if the
  chosen node is not the node responsible for the key for 50K QPS. I
  think this attributes to connection making time among servers as my
 servers are on same rack.
 
  how can i configure my servers to use persistent connection on port
  9160 thus exclude connection making time for each request that is
 redirected...



RE: High commit size

2012-09-10 Thread Viktor Jevdokimov
Do not use mmap/auto on Windows; use standard access mode only. In cassandra.yaml:
disk_access_mode: standard



From: Rene Kochen [mailto:rene.koc...@emea.schange.com]
Sent: Monday, September 10, 2012 14:47
To: user@cassandra.apache.org
Subject: Re: High commit size

The problem is that the system just freezes and nodes die. The system becomes 
very unresponsive, and it always happens when the shareable amount of RAM 
reaches the total physical memory of the system.

Is there something in Windows that I can tune in order to avoid this behavior? 
I cannot easily migrate to Linux right now.

Thanks,

Rene

2012/9/10 Oleg Dulin oleg.du...@gmail.commailto:oleg.du...@gmail.com
It is memory-mapped I/O. I wouldn't worry about it.

BTW, Windows might not be the best choice to run Cassandra on. My experience 
running Cassandra on Windows has not been a positive one. We no longer support 
Windows as our production platform.

Regards,
Oleg


On 2012-09-10 09:00:02 +, Rene Kochen said:
Hi all,

On my test cluster I have three Windows Server 2008 R2 machines running 
Cassandra 1.0.11.

If I use memory-mapped IO (the default), the nodes freeze after a while. 
Paging is disabled.

The private bytes are OK (8 GB); that is the amount I use in the -Xms and -Xmx 
arguments. The virtual size is big, as expected because of the memory-mapped IO. 
However, the working set size (size in RAM) is 24 GB (my total RAM). If I 
look at the physical memory section in Process Explorer, I see a very high 
value in the WS Shareable section.

Does anyone have a clue what is going on here?

Many thanks!

Rene


--
Regards,
Oleg Dulin
NYC Java Big Data Engineer
http://www.olegdulin.com/



RE: High commit size

2012-09-10 Thread Viktor Jevdokimov
We used Cassandra on Windows in production for more than a year, for RTB and 
other stuff that requires the lowest possible latency. We used mmap before hitting 
issues like yours, switched to mmap for indexes only, and finally to standard. No big 
difference in performance; standard was the most stable. The huge difference is 
running C* on Linux instead of Windows. Migration was pretty easy.



From: Rene Kochen [mailto:rene.koc...@emea.schange.com]
Sent: Monday, September 10, 2012 17:03
To: user@cassandra.apache.org
Subject: Re: High commit size

For performance reasons I switched to memory mapped IO. Is there a way to make 
memory-mapped IO work in Windows?

Thanks!

2012/9/10 Viktor Jevdokimov 
viktor.jevdoki...@adform.commailto:viktor.jevdoki...@adform.com
Do not use mmap/auto on Windows, standard access mode only. In cassandra.yaml:
disk_access_mode: standard



From: Rene Kochen 
[mailto:rene.koc...@emea.schange.commailto:rene.koc...@emea.schange.com]
Sent: Monday, September 10, 2012 14:47
To: user@cassandra.apache.orgmailto:user@cassandra.apache.org
Subject: Re: High commit size

The problem is that the system just freezes and nodes are dying. The system 
becomes very unresponsive and it always happens when the shareable amount of 
RAM reaches the total number of bytes in the system.

Is there something in Windows that I can tune in order to avoid this behavior? 
I cannot easily migrate to Linux right now.

Thanks,

Rene
2012/9/10 Oleg Dulin oleg.du...@gmail.commailto:oleg.du...@gmail.com
It is memory-mapped I/O. I wouldn't worry about it.

BTW, Windows might not be the best choice to run Cassandra on. My experience 
running Cassandra on Windows has not been positive one. We no longer support 
Windows as our production platform.

Regards,
Oleg


On 2012-09-10 09:00:02 +, Rene Kochen said:
Hi all,

On my test cluster I have three Windows Server 2008 R2 machines running 
Cassandra 1.0.11

If i use memory mapped IO (the default), then the nodes freeze after a while. 
Paging is disabled.

The private bytes are OK (8GB). That is the amount I use in the -Xms and -Xmx 
arguments. The virtual size is big as expected because of the memory mapped IO. 
However, the working set size (size in RAM) is 24 GB (my total RAM usage). If I 
look with Process Explorer to the physical memory section I see a very high 
value in the WS Sharable section.

Anyone has a clue what is going om here?

Many thanks!

Rene


--
Regards,
Oleg Dulin
NYC Java Big Data Engineer
http://www.olegdulin.com/



RE: Read latency skyrockets after 95-98th percentile

2012-07-31 Thread Viktor Jevdokimov
What is the data load? Does it fit in RAM?

I bet it's due to GC. Since this is only one node, the dynamic snitch is out of 
scope.





From: Andras Szerdahelyi [mailto:andras.szerdahe...@ignitionone.com]
Sent: Tuesday, July 31, 2012 15:53
To: user@cassandra.apache.org
Subject: Read latency skyrockets after 95-98th percentile

hey list,

I've been trying to understand why we are seeing rather nasty read latency 
peaks in as much as 2% of our total read requests (not sure what the underlying 
thrift call is; it should be get_slice), peaks that we have been **unable to tie to 
garbage collection or blocking I/O.**
This is what I mean by nasty:
Mean: 1.186341, Max: 41.912, stDev: 2.2490502737495333, 75th: 1.02424999, 
95th: 2.11034977, 98th: 5.21026034, 99th: 13.248351, 
999th: 41.8945400041

This is measured around the Astyanax / Hector wrappers for running column slice 
queries and iterating over their results. FWIW they perform very similarly.

Now, the mean is awesome: 1.2 ms for a read is really, really fast; I think 
thrift overhead alone makes up for probably 1 ms. What I can't live with is 
pretty much everything beyond the 98th percentile. Perhaps my 
expectations are misaligned to begin with and this is great, but the dataset 
under test that yielded this result (see below) is extremely small: only 
1000 items per test, made up of a few small dynamic composite columns, and the 
reads are done sequentially, from a single-threaded client. I just don't 
understand what's causing the peaks.

Here's the complete run-down on the setup:
The data is dynamic composite columns with string or integer values. The 
DynamicComposites themselves are made up of 3 Components each: 
String,String,Integer and String,Integer,Integer

The query gets all columns (in this particular test, a single column) of one of 
the composite types (String, String, Integer or String, Integer, Integer) for a 
particular string key.
(How this is achieved is more or less at https://gist.github.com/3207894)


The production cluster is set up as follows:
30-40 read/sec clients from multiple hosts, NetworkTopologyStrategy, 2 replicas 
per DC, read consistency: ONE, 2DCs, 2 nodes per DC )
8GB multi-core physical hosts
most settings on default in cassandra.yaml


In an attempt to isolate the problem, I've done the following (this is where 
the results above come from, but we see roughly the same numbers in prod):

- created a short unit test around the ( two ) classes that wrap the Astyanax 
and Hector libraries in our application, in which i'm only testing the read 
performance
- in setUp() i start Cassandra with

CassandraStoreKeeperTest.cassandra = new CassandraDaemon();
CassandraStoreKeeperTest.cassandra.init(null);
CassandraStoreKeeperTest.cassandra.start();

- same cassandra yaml as in prod, file at https://gist.github.com/3207894 ( 
mostly defaults )
- i pass the following flags to the JVM running my unit test:  -server -Xmx2G 
and all the GC flags from cassandra-env.sh, plus the jamm java agent
- i warm up the embedded cassandra with 1k reads and 1k writes
- the test is 1000 single threaded, sequential reads via Hector and Astyanax ( 
no significant differences between the two ) via thrift to the embedded 
single-node Cassandra instance
- i read back the 1000 items saved in the warm-up

The results are
Mean: 1.186341, Max: 41.912 stDev: 2.2490502737495333, 75th:1.02424999, 
95th: 2.11034977, 98th: 5.21026034, 99th: 13.248351, 
999th: 41.8945400041

here's the GC output during the read

[GC [ParNew
Desired survivor size 1081344 bytes, new threshold 1 (max 1)
- age   1: 281480 bytes, 281480 total
: 17322K->308K(19136K), 0.0037375 secs] 28742K->11885K(83008K), 0.0038170 secs] 
[Times: user=0.01 sys=0.00, real=0.01 secs]
[GC [ParNew
Desired survivor size 1081344 bytes, new threshold 1 (max 1)
- age   1: 261360 bytes, 261360 total
: 17332K->386K(19136K), 0.0034069 secs] 28909K->12098K(83008K), 0.0034849 secs] 
[Times: user=0.00 sys=0.01, real=0.00

RE: how to disable compression ?

2012-07-20 Thread Viktor Jevdokimov
First you update the schema for the CF, then you run nodetool upgradesstables on 
each node:

nodetool -h [HOST] -p [JMXPORT] upgradesstables [keyspace] [cfnames]

For me it sometimes works only after a node restart (otherwise the upgrade leaves 
the previous format, compressed or uncompressed).






From: Илья Шипицин [mailto:chipits...@gmail.com]
Sent: Friday, July 20, 2012 10:16
To: user@cassandra.apache.org
Subject: how to disable compression ?

Hello!

how can I run update command on column family to disable compression (without 
re-creating CF) ?

Cheers,
Ilya Shipitsin

RE: how to disable compression ?

2012-07-20 Thread Viktor Jevdokimov
From cassandra-cli help:

To disable compression just set compression_options to null like this
compression_options = null

so

[default@XXXKeyspace] update column family YYY with compression_options = null;





From: Илья Шипицин [mailto:chipits...@gmail.com]
Sent: Friday, July 20, 2012 10:37
To: user@cassandra.apache.org
Subject: Re: how to disable compression ?

[default@XXXKeyspace] update column family YYY with compression_options =[{}];
Command not found: `update column family YYY with compression_options =[{}];`. 
Type 'help;' or '?' for help.
[default@XXXKeyspace]
2012/7/20 Viktor Jevdokimov 
viktor.jevdoki...@adform.commailto:viktor.jevdoki...@adform.com
First you update schema for CF, then you run nodetool upgradesstables on each 
node:

nodetool -h [HOST] -p [JMXPORT] upgradesstables [keyspace] [cfnames]

For me sometimes it works only after node restart (upgrade leaves previous 
format, compressed or uncompressed).





From: Илья Шипицин [mailto:chipits...@gmail.commailto:chipits...@gmail.com]
Sent: Friday, July 20, 2012 10:16
To: user@cassandra.apache.orgmailto:user@cassandra.apache.org
Subject: how to disable compression ?

Hello!

how can I run update command on column family to disable compression (without 
re-creating CF) ?

Cheers,
Ilya Shipitsin


RE: Tools for analize cassandra gc.log

2012-07-04 Thread Viktor Jevdokimov
Everything below reflects my personal experience and the goals I wanted to achieve.

We use these parameters in PROD for GC logs:
JVM_OPTS=$JVM_OPTS -XX:+PrintGCTimeStamps
JVM_OPTS=$JVM_OPTS -XX:+PrintGCDetails
JVM_OPTS=$JVM_OPTS -Xloggc:/var/log/cassandra/gc-`date +%s`.log

Log rotation works from Oracle Java 1.7.0_02:
JVM_OPTS=$JVM_OPTS -XX:+UseGCLogFileRotation
JVM_OPTS=$JVM_OPTS -XX:NumberOfGCLogFiles=10
JVM_OPTS=$JVM_OPTS -XX:GCLogFileSize=10M

Adding other parameters may make the log unreadable for many (or any) analyzers.
Adding JVM_OPTS=$JVM_OPTS -XX:+PrintHeapAtGC will not break the analyzers below, 
but that output is only useful for manual review.


1. GCHisto - http://java.net/projects/gchisto/ - loads multiple logs and compares 
numbers (pauses and GC counts). Very useful for comparison.
2. GCViewer - http://www.tagtraum.com/gcviewer.html - shows memory consumption, 
pause stats and graphs.
3. PrintGCStats - http://java.net/projects/printgcstats/ - text only, Linux only.
4. garbagecat - poor: very little info, text only, while GCHisto and GCViewer 
offer much more information, graphically.

Considering what to play with, understand Java GC first:
http://www.scribd.com/doc/37127094/GCTuningPresentationFISL10
http://developers.sun.com/mobility/midp/articles/garbagecollection2/
http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html

The objects that take the most time in a GC pause are memtables and caches; other 
objects die fast and are cleaned instead of being copied/promoted. So also play 
with memtable size and cache invalidation.

GC frequency will grow while compaction is running, but pause times will not 
(compaction allocates very short-lived objects).

Testing a lot of scenarios, we found the standard Cassandra GC settings fine for 
most cases; you may need to tune the max heap and young generation sizes. Beyond 
that you may play with:
-XX:+UseNUMA
-XX:SurvivorRatio=8 (4-16)

Not worth changing:
-XX:MaxTenuringThreshold=1 (setting it to 0 or above 1 is not a good idea)

In short, whether to change the standard settings depends on what you want to 
achieve. Tuning GC is a last resort.





-Original Message-
 From: ruslan usifov [mailto:ruslan.usi...@gmail.com]
 Sent: Wednesday, July 04, 2012 14:08
 To: user@cassandra.apache.org
 Subject: Tools for analize cassandra gc.log

 Hello

 Please, dear community, share some tools for gc.log analysis. I found
 http://code.google.com/a/eclipselabs.org/p/garbagecat/, but it doesn't
 work


RE: upgrade issue

2012-06-29 Thread Viktor Jevdokimov
Replace tabs with spaces in cassandra.yaml.




From: Adeel Akbar [mailto:adeel.ak...@panasiangroup.com]
Sent: Friday, June 29, 2012 12:53
To: user@cassandra.apache.org
Subject: upgrade issue

Hi, I have upgraded Cassandra from 0.8.6 to 1.0.10 and got the following errors 
once I started the service:

INFO 05:11:50,948 Logging initialized
INFO 05:11:50,953 JVM vendor/version: OpenJDK 64-Bit Server VM/1.6.0_24
INFO 05:11:50,954 Heap size: 511705088/511705088
INFO 05:11:50,955 Classpath: 
/opt/apache-cassandra-1.0.10/bin/../conf:/opt/apache-cassandra-1.0.10/bin/../build/classes/main:/opt/apache-cassandra-1.0.10/bin/../build/classes/thrift:/opt/apache-cassandra-1.0.10/bin/../lib/antlr-3.2.jar:/opt/apache-cassandra-1.0.10/bin/../lib/apache-cassandra-1.0.10.jar:/opt/apache-cassandra-1.0.10/bin/../lib/apache-cassandra-clientutil-1.0.10.jar:/opt/apache-cassandra-1.0.10/bin/../lib/apache-cassandra-thrift-1.0.10.jar:/opt/apache-cassandra-1.0.10/bin/../lib/avro-1.4.0-fixes.jar:/opt/apache-cassandra-1.0.10/bin/../lib/avro-1.4.0-sources-fixes.jar:/opt/apache-cassandra-1.0.10/bin/../lib/commons-cli-1.1.jar:/opt/apache-cassandra-1.0.10/bin/../lib/commons-codec-1.2.jar:/opt/apache-cassandra-1.0.10/bin/../lib/commons-lang-2.4.jar:/opt/apache-cassandra-1.0.10/bin/../lib/compress-lzf-0.8.4.jar:/opt/apache-cassandra-1.0.10/bin/../lib/concurrentlinkedhashmap-lru-1.2.jar:/opt/apache-cassandra-1.0.10/bin/../lib/guava-r08.jar:/opt/apache-cassandra-1.0.10/bin/../lib/high-scale-lib-1.1.2.j
 
ar:/opt/apache-cassandra-1.0.10/bin/../lib/jackson-core-asl-1.4.0.jar:/opt/apache-cassandra-1.0.10/bin/../lib/jackson-mapper-asl-1.4.0.jar:/opt/apache-cassandra-1.0.10/bin/../lib/jamm-0.2.5.jar:/opt/apache-cassandra-1.0.10/bin/../lib/jline-0.9.94.jar:/opt/apache-cassandra-1.0.10/bin/../lib/json-simple-1.1.jar:/opt/apache-cassandra-1.0.10/bin/../lib/libthrift-0.6.jar:/opt/apache-cassandra-1.0.10/bin/../lib/log4j-1.2.16.jar:/opt/apache-cassandra-1.0.10/bin/../lib/servlet-api-2.5-20081211.jar:/opt/apache-cassandra-1.0.10/bin/../lib/slf4j-api-1.6.1.jar:/opt/apache-cassandra-1.0.10/bin/../lib/slf4j-log4j12-1.6.1.jar:/opt/apache-cassandra-1.0.10/bin/../lib/snakeyaml-1.6.jar:/opt/apache-cassandra-1.0.10/bin/../lib/snappy-java-1.0.4.1.jar
INFO 05:11:50,957 JNA not found. Native methods will be disabled.
INFO 05:11:50,966 Loading settings from 
file:/opt/apache-cassandra-1.0.10/conf/cassandra.yaml
ERROR 05:11:51,057 Fatal configuration error error
while scanning for the next token
found character '\t' that cannot start any token
 in reader, line 102, column 1:
  - seeds: 172.16.100.244,172. ...
^

at 
org.yaml.snakeyaml.scanner.ScannerImpl.fetchMoreTokens(ScannerImpl.java:357)
at 
org.yaml.snakeyaml.scanner.ScannerImpl.checkToken(ScannerImpl.java:180)
at 
org.yaml.snakeyaml.parser.ParserImpl$ParseBlockMappingValue.produce(ParserImpl.java:591)
at org.yaml.snakeyaml.parser.ParserImpl.peekEvent(ParserImpl.java:162)
at org.yaml.snakeyaml.parser.ParserImpl.checkEvent(ParserImpl.java:147)
at org.yaml.snakeyaml.composer.Composer.composeNode(Composer.java:131)
at 
org.yaml.snakeyaml.composer.Composer.composeMappingNode(Composer.java:229)
at org.yaml.snakeyaml.composer.Composer.composeNode(Composer.java:159)
at 
org.yaml.snakeyaml.composer.Composer.composeSequenceNode(Composer.java:203)
at org.yaml.snakeyaml.composer.Composer.composeNode(Composer.java:157)
at 
org.yaml.snakeyaml.composer.Composer.composeMappingNode(Composer.java:229)
at org.yaml.snakeyaml.composer.Composer.composeNode(Composer.java:159)
at 
org.yaml.snakeyaml.composer.Composer.composeDocument(Composer.java:121)
at org.yaml.snakeyaml.composer.Composer.getSingleNode(Composer.java:104)
at 
org.yaml.snakeyaml.constructor.BaseConstructor.getSingleData(BaseConstructor.java:117)
at org.yaml.snakeyaml.Loader.load(Loader.java:52)
at org.yaml.snakeyaml.Yaml.load(Yaml.java:166
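The snakeyaml error above is YAML's rule that indentation must use spaces, never tabs. A small sketch of the fix suggested in the reply, i.e. replacing leading tabs with spaces before the file is parsed (pure stdlib, and the sample config line is only an illustration of the `- seeds:` line from the trace):

```python
# YAML forbids tab indentation; snakeyaml aborts with "found character '\t'
# that cannot start any token". Replace leading tabs with spaces.
def fix_tabs(text, spaces_per_tab=4):
    fixed_lines = []
    for line in text.splitlines():
        stripped = line.lstrip("\t")
        n_tabs = len(line) - len(stripped)          # number of leading tabs
        fixed_lines.append(" " * (spaces_per_tab * n_tabs) + stripped)
    return "\n".join(fixed_lines)

bad = "seed_provider:\n\t- seeds: 172.16.100.244"
print(fix_tabs(bad))
```

This only handles tabs at the start of a line, which covers the indentation case that breaks the parser.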

RE: cassandra 1.0.x and java 1.7

2012-06-18 Thread Viktor Jevdokimov
We have used 7u3 in production long enough with no problems.

7u4 requires a larger minimum stack size, 160KB vs 128KB, but 160KB is still not 
enough for Cassandra; 192KB is better, but needs more testing.

https://issues.apache.org/jira/browse/CASSANDRA-4275 suggests 256KB





-Original Message-
 From: ruslan usifov [mailto:ruslan.usi...@gmail.com]
 Sent: Monday, June 18, 2012 15:13
 To: user@cassandra.apache.org
 Subject: cassandra 1.0.x and java 1.7

 Hello!

 Is it safe to use Java 1.7 with Cassandra 1.0.x? The reason I want to do that is
 that Java 1.7 adds options to rotate the GC log:

 http://bugs.sun.com/bugdatabase/view_bug.do;jsessionid=ff824681055961f
 fffe1f62393b68deb5?bug_id=6941923



RE: portability between enterprise and community version

2012-06-13 Thread Viktor Jevdokimov
Do not mix Linux and Windows nodes.





From: Abhijit Chanda [mailto:abhijit.chan...@gmail.com]
Sent: Wednesday, June 13, 2012 09:21
To: user@cassandra.apache.org
Subject: portability between enterprise and community version

Hi All,

Is it possible for a DataStax Enterprise edition node to communicate with a 
DataStax Community edition node?
Actually I want to set up one of my nodes on a Linux box and the other on 
Windows. Please advise.


With Regards,
--
Abhijit Chanda
VeHere Interactive Pvt. Ltd.
+91-974395

RE: portability between enterprise and community version

2012-06-13 Thread Viktor Jevdokimov
Repair (streaming) will not work.

Schema updates will probably not work either; it was a long time ago, I don't remember.

Migrating a cluster between Windows and Linux is also not an easy task; a lot 
of manual work.

Finally, mixed Cassandra environments are not supported, neither by DataStax 
nor by anyone else.





From: Abhijit Chanda [mailto:abhijit.chan...@gmail.com]
Sent: Wednesday, June 13, 2012 10:54
To: user@cassandra.apache.org
Subject: Re: portability between enterprise and community version

Hi Viktor Jevdokimov,

May I know what issues I may face if I mix Windows nodes with Linux nodes in a 
cluster?


Regards,
--
Abhijit Chanda
VeHere Interactive Pvt. Ltd.
+91-974395

RE: portability between enterprise and community version

2012-06-13 Thread Viktor Jevdokimov
I remember that join and decommission didn't work, since they use streaming. All 
the problems were due to path-style differences between Windows and Linux.

So how do you move keyspaces: using streaming (join/decommission), or manually?





From: Sasha Dolgy [mailto:sdo...@gmail.com]
Sent: Wednesday, June 13, 2012 12:04
To: user@cassandra.apache.org
Subject: Re: portability between enterprise and community version

I consistently move keyspaces from linux machines onto windows machines for 
development purposes.  I've had no issues ... but would probably be hesitant in 
rolling this out into a productive instance.  Depends on the level of risk you 
want to take.  : )  Run some tests ... mix things up and share your experiences 
... Personally, I could see some value in not really caring what OS my 
cassandra instances are running on ... just that the JVM's are consistent and 
the available hardware resources are sufficient

I don't speak for the vendors mentioned in this thread, but traditionally, the 
first step towards supportability is finding the problems / identifying the 
risks and see if they can be resolved ...

-sd
On Wed, Jun 13, 2012 at 10:26 AM, Viktor Jevdokimov 
viktor.jevdoki...@adform.commailto:viktor.jevdoki...@adform.com wrote:
Repair (streaming) will not work.

Probably schema update will not work also, it was long time ago, don’t remember.

Migration of the cluster between Windows and Linux also not an easy task, a lot 
of manual work.

Finally, mixed Cassandra environments are not supported as by DataStax as by 
anyone else.




From: Abhijit Chanda 
[mailto:abhijit.chan...@gmail.commailto:abhijit.chan...@gmail.com]
Sent: Wednesday, June 13, 2012 10:54
To: user@cassandra.apache.orgmailto:user@cassandra.apache.org
Subject: Re: portability between enterprise and community version

Hi Viktor Jevadokimov,

May i know what are the issues i may face if i mix windows cluster along with 
linux cluster.


RE: portability between enterprise and community version

2012-06-13 Thread Viktor Jevdokimov
Clients are clients, servers are servers. Why do you need a mixed-environment 
Cassandra cluster? Aren't mixed clients enough?




From: Abhijit Chanda [mailto:abhijit.chan...@gmail.com]
Sent: Wednesday, June 13, 2012 12:41
To: user@cassandra.apache.org
Subject: Re: portability between enterprise and community version

Hi Sasha, Viktor,

In my case I have a project with both Java and .NET modules. For Java I'm using 
the Astyanax API, and for .NET, FluentCassandra. I am using DSE 2.1 for 
development because I need partial-match searching in some queries, which I do 
with the help of the Solr integrated into recent DSE versions. This works fine 
for all the Java modules, but I want the same for the .NET modules too. That's 
why I was looking for a mixed environment.
Finally, I want to ask whether I am moving in the right direction or not. 
Please advise.

Thanks,
--
Abhijit Chanda
VeHere Interactive Pvt. Ltd.
+91-974395

RE: nodetool repair -pr enough in this scenario?

2012-06-05 Thread Viktor Jevdokimov
Understand simple mechanics first, decide how to act later.

Without -PR there's no difference from which host to run repair, it runs for 
the whole 100% range, from start to end, the whole cluster, all nodes, at once.

With -PR, it runs only for the primary range of the node on which you run the repair.
Let's say you have a simple ring of 3 nodes with RF=2 and ranges (per node) N1=C-A, 
N2=A-B, N3=B-C (node tokens are N1=A, N2=B, N3=C). No rack or DC awareness.
So running repair with -PR on node N2 will only repair a range A-B, for which 
node N2 is a primary and N3 is a backup. N2 and N3 will synchronize A-B range 
one with other. For other ranges you need to run on other nodes.

Without -PR running on any node will repair all ranges, A-B, B-C, C-A. A node 
you run a repair without -PR is just a repair coordinator, so no difference, 
which one will be next time.
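The 3-node ring described above can be sketched in a few lines; this is an illustration of the token/range bookkeeping only (the helper names are mine, and it assumes RF=2 with no rack or DC awareness, as in the example):

```python
# 3-node ring: node tokens N1=A, N2=B, N3=C; node N2 is primary for range (A, B].
ring = {"N1": "A", "N2": "B", "N3": "C"}   # node -> token

def primary_range(ring, node):
    tokens = sorted(ring.values())
    t = ring[node]
    prev = tokens[tokens.index(t) - 1]      # predecessor wraps around the ring
    return (prev, t)

def replicas(ring, rng, rf=2):
    # Replicas of a range are its primary node plus the next rf-1 nodes clockwise.
    tokens = sorted(ring.values())
    nodes_by_token = {t: n for n, t in ring.items()}
    i = tokens.index(rng[1])
    return [nodes_by_token[tokens[(i + k) % len(tokens)]] for k in range(rf)]

rng = primary_range(ring, "N2")
print(rng, replicas(ring, rng))   # ('A', 'B') ['N2', 'N3']
```

So `repair -pr` on N2 touches only the A-B range, synchronized between N2 (primary) and N3 (replica); covering the whole ring requires running it on every node.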





From: David Daeschler [mailto:david.daesch...@gmail.com]
Sent: Tuesday, June 05, 2012 08:59
To: user@cassandra.apache.org
Subject: nodetool repair -pr enough in this scenario?

Hello,

Currently I have a 4 node cassandra cluster on CentOS64. I have been running 
nodetool repair (no -pr option) on a weekly schedule like:

Host1: Tue, Host2: Wed, Host3: Thu, Host4: Fri

In this scenario, if I were to add the -pr option, would this still be 
sufficient to prevent forgotten deletes and properly maintain consistency?

Thank you,
- David
inline: signature-logo29.png

RE: nodetool repair -pr enough in this scenario?

2012-06-05 Thread Viktor Jevdokimov
But in any case, repair is a two-way process?
I mean that repair without -PR on node N1 will repair N1, N2 and N3, because 
N2 is a replica of N1's range and N1 is a replica of N3's range?
And if there are more ranges that do not belong to N1, those ranges and nodes 
will not be repaired?


Do I understand correctly that repair, with or without -PR, is not a "repair the 
selected node" process, but a "synchronize data range(s) between replicas" 
process?
Single-DC scenario:
With -PR: synchronize data only for the primary data range of the selected node, 
between all nodes holding that range (max number of nodes for the range = RF).
Without -PR: synchronize data for all data ranges of the selected node (primary 
and replica) between all nodes holding those ranges (max number of nodes for the 
ranges = RF*RF). Not efficient, since the ranges overlap; the same ranges will be 
synchronized more than once (max = RF times).
Multiple DCs with a 100% data range in each DC: the same, only RF = the sum of 
the RFs from all DCs.
Is that correct?

Finally: is this process for SSTables only, excluding memtables and hints?






From: Sylvain Lebresne [mailto:sylv...@datastax.com]
Sent: Tuesday, June 05, 2012 11:02
To: user@cassandra.apache.org
Subject: Re: nodetool repair -pr enough in this scenario?

On Tue, Jun 5, 2012 at 8:44 AM, Viktor Jevdokimov 
viktor.jevdoki...@adform.commailto:viktor.jevdoki...@adform.com wrote:
Understand simple mechanics first, decide how to act later.

Without -PR there's no difference from which host to run repair, it runs for 
the whole 100% range, from start to end, the whole cluster, all nodes, at once.

That's not exactly true. A repair without -pr will repair all the ranges of the 
node on which repair is run. So it will only repair the ranges that the node is 
a replica for. It will *not* repair the whole cluster (unless the replication 
factor is equal to the number of nodes in the cluster, but that's a degenerate 
case). And hence it does matter on which host repair is run (it always matters, 
whether you use -pr or not).

In general you want to use repair without -pr when you want to repair a 
specific node. Typically, if a node was dead for a reasonably long time, you 
may want to run a repair (without -pr) on that specific node to have it catch 
up faster (faster than if you were only relying on read-repair and 
hinted handoff).

For repairing a whole cluster, as is the case for the weekly scheduled repairs 
in the initial question, you want to use -pr. You *do not* want to use repair 
without -pr in that case, because for that task using -pr is more efficient 
(and to be clear, not using -pr won't cause problems, it is just less 
efficient).

--
Sylvain



With -PR it runs only for a primary range of a node you are running a repair.
Let say you have simple ring of 3 nodes with RF=2 and ranges (per node) N1=C-A, 
N2=A-B, N3=B-C (node tokens are N1=A, N2=B, N3=C). No rack, no DC aware.
So running repair with -PR on node N2 will only repair a range A-B, for which 
node N2 is a primary and N3 is a backup. N2 and N3 will synchronize A-B range 
one with other. For other ranges you need to run on other nodes.

Without -PR running on any node will repair all ranges, A-B, B-C, C-A. A node 
you run a repair without -PR is just a repair coordinator, so no difference, 
which one will be next time.




RE: repair

2012-06-04 Thread Viktor Jevdokimov
Why without -PR when recovering from crash?

Repair without -PR runs a full repair of the node's ranges; the node that receives 
the command is the repair coordinator, and ALL replica nodes synchronize at the 
same time, streaming data between each other.
The problems that may arise:

· When streaming hangs (it tends to hang even on a stable network), the 
repair session hangs (does any version re-stream?)

· The network will be highly saturated

· In case of high inconsistency, some nodes may receive a lot of data; 
disk usage can be much more than 2x (depending on RF)

· A lot of compactions will be pending

IMO the best way to run repair is from a script, with -PR, for a single CF on a 
single node at a time, monitoring progress, like:
repair -pr node1 ks1 cf1
repair -pr node2 ks1 cf1
repair -pr node3 ks1 cf1
repair -pr node1 ks1 cf2
repair -pr node2 ks1 cf2
repair -pr node3 ks1 cf2
With some progress or other control in between, your choice.

Use repair with care, do not let your cluster go down.
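A schedule like the one above can be generated rather than typed by hand; a minimal sketch (node names, keyspace and CF names are placeholders, and it only builds the command strings rather than running them):

```python
# Generate the per-node, per-CF `nodetool repair -pr` sequence described above:
# one CF on one node at a time, iterating CFs in the outer loop.
def repair_schedule(nodes, keyspace, cfs):
    return [
        f"nodetool -h {node} repair -pr {keyspace} {cf}"
        for cf in cfs
        for node in nodes
    ]

for cmd in repair_schedule(["node1", "node2", "node3"], "ks1", ["cf1", "cf2"]):
    print(cmd)
```

A real script would run each command (e.g. via `subprocess.run`) and check its result before moving on, which is the "progress or other control in between" mentioned above.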






From: R. Verlangen [mailto:ro...@us2.nl]
Sent: Monday, June 04, 2012 15:17
To: user@cassandra.apache.org
Subject: Re: repair

repair -pr only repairs the node's primary range, so it is only useful for 
day-to-day maintenance. When you're recovering from a crash, use it without -pr.
2012/6/4 Romain HARDOUIN romain.hardo...@urssaf.fr

Run repair -pr in your cron.

Tamar Fraenkel ta...@tok-media.com wrote on 04/06/2012 13:44:32:

 Thanks.

 I actually did just that with cron jobs running on different hours.

 I asked the question because I saw that when one of the nodes was
 running the repair, all nodes logged some repair-related entries in
 /var/log/cassandra/system.log

 Thanks again,
 Tamar Fraenkel
 Senior Software Engineer, TOK Media



--
With kind regards,

Robin Verlangen
Software engineer

W www.robinverlangen.nl
E ro...@us2.nl


FYI: Java 7u4 on Linux requires higher stack size

2012-05-25 Thread Viktor Jevdokimov
Hello all,

We've started to test Oracle Java 7u4 (currently we're on 7u3) on Linux to try 
G1 GC.

Cassandra can't start on 7u4, failing with this exception:

The stack size specified is too small, Specify at least 160k
Cannot create Java VM

Changing -Xss128k to -Xss160k in cassandra-env.sh allowed Cassandra to start, 
but when a Thrift client disconnects, the Cassandra log fills with exceptions:

ERROR 17:08:56,300 Fatal exception in thread Thread[Thrift:13,5,main]
java.lang.StackOverflowError
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.BufferedInputStream.fill(Unknown Source)
at java.io.BufferedInputStream.read1(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
at 
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at 
org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
at 
org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at 
org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
at 
org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
at 
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
at 
org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2877)
at 
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

Increasing the stack size from 160k to 192k eliminated such exceptions.
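A minimal sketch of that change as a script, assuming the stock cassandra-env.sh layout where JVM_OPTS carries a `-XssNNNk` option; the helper name is my own, not part of Cassandra's tooling:

```shell
# set_stack_size: rewrite the -XssNNNk option in a cassandra-env.sh-style
# file in place, keeping a .bak backup. Hypothetical helper for illustration.
set_stack_size() {
  file="$1"   # path to cassandra-env.sh
  size="$2"   # e.g. 192k
  sed -i.bak "s/-Xss[0-9]*k/-Xss${size}/" "$file"
}

# Example: set_stack_size /etc/cassandra/cassandra-env.sh 192k
```

The `.bak` backup makes it easy to roll back if the JVM rejects the new value.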


Just wanted to let you know, in case someone tries to migrate to Java 7u4.





RE: Replication factor

2012-05-24 Thread Viktor Jevdokimov
All data is in the page cache. No repairs. Compactions not hitting disk for 
reads. CPU 50%. ParNew GC 100 ms on average.

After a compaction completes, the new sstable is not in the page cache, and there 
may be a disk usage spike before the data is cached, so local reads get slower for 
a moment compared with other nodes. Redirecting almost all requests to other 
nodes finally ends up with a huge latency spike on almost all nodes, especially 
when ParNew GC spikes on one node (200ms). We call it a cluster hiccup: 
incoming and outgoing network traffic drops for a moment.

Such hiccups happen several times an hour, a few seconds long each. Playing with 
the badness threshold did not give much better results, but turning DS off 
completely fixed all problems with latencies, node spikes, cluster hiccups and 
network traffic drops.

In our case, our client is selecting endpoints for a key by calculating a 
token, so we always hit a replica.





From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Thursday, May 24, 2012 13:00
To: user@cassandra.apache.org
Subject: Re: Replication factor

Your experience is that, when using CL ONE, the Dynamic Snitch is moving local 
reads off to other nodes and this is causing spikes in read latency?

Did you notice what was happening on the node for the DS to think it was so 
slow ? Was compaction or repair going on ?

Have you played with the badness threshold 
https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L472 ?

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 24/05/2012, at 5:28 PM, Viktor Jevdokimov wrote:


Depends on use case. For ours we have another experience and statistics, when 
turning dynamic snitch off makes overall latency and spikes much, much lower.




From: Brandon Williams [mailto:dri...@gmail.com]
Sent: Thursday, May 24, 2012 02:35
To: user@cassandra.apache.org
Subject: Re: Replication factor

On Wed, May 23, 2012 at 5:51 AM, Viktor Jevdokimov 
viktor.jevdoki...@adform.com wrote:
>> When RF == number of nodes, and you read at CL ONE you will always be reading
>> locally.
> "always be reading locally" - only if Dynamic Snitch is off. With dynamic
> snitch on, a request may be redirected to another node, which may introduce
> latency spikes.

Actually it's preventing spikes, since if it won't read locally that means the 
local replica is in worse shape than the rest (compacting, repairing, etc.)

-Brandon


RE: Replication factor

2012-05-23 Thread Viktor Jevdokimov
> When RF == number of nodes, and you read at CL ONE you will always be reading
> locally.
"always be reading locally" - only if Dynamic Snitch is off. With dynamic 
snitch on, a request may be redirected to another node, which may introduce 
latency spikes.





From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Wednesday, May 23, 2012 13:00
To: user@cassandra.apache.org
Subject: Re: Replication factor

RF is normally adjusted to modify availability (see 
http://thelastpickle.com/2011/06/13/Down-For-Me/)

for example, if I have a 4-node cluster in one data center, how can RF=2 vs RF=4 
affect read performance? If the consistency level is ONE, it looks like reading 
does not need to go another hop to get data if RF=4, but it would do more work 
on read repair in the background.
Read Repair does not run at CL ONE.
When RF == number of nodes, and you read at CL ONE, you will always be reading 
locally, but with low consistency.
If you read with QUORUM when RF == number of nodes you will still get some 
performance benefit from the data being read locally.

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 23/05/2012, at 9:34 AM, Daning Wang wrote:


Hello,

What are the pros and cons of choosing different replication factors in 
terms of performance, if space is not a concern?

for example, if I have a 4-node cluster in one data center, how can RF=2 vs RF=4 
affect read performance? If the consistency level is ONE, it looks like reading 
does not need to go another hop to get data if RF=4, but it would do more work 
on read repair in the background.

Can you share some insights about this?

Thanks in advance,

Daning


RE: Replication factor

2012-05-23 Thread Viktor Jevdokimov
Depends on use case. For ours we have another experience and statistics, when 
turning dynamic snitch off makes overall latency and spikes much, much lower.





From: Brandon Williams [mailto:dri...@gmail.com]
Sent: Thursday, May 24, 2012 02:35
To: user@cassandra.apache.org
Subject: Re: Replication factor

On Wed, May 23, 2012 at 5:51 AM, Viktor Jevdokimov 
viktor.jevdoki...@adform.com wrote:
>> When RF == number of nodes, and you read at CL ONE you will always be reading
>> locally.
> "always be reading locally" - only if Dynamic Snitch is off. With dynamic
> snitch on, a request may be redirected to another node, which may introduce
> latency spikes.

Actually it's preventing spikes, since if it won't read locally that means the 
local replica is in worse shape than the rest (compacting, repairing, etc.)

-Brandon

RE: Safely Disabling Compaction

2012-05-20 Thread Viktor Jevdokimov
To temporarily turn off compactions without a schema update, use

nodetool -h node_ip -p port setcompactionthreshold keyspace cfname minthreshold maxthreshold

for every node and every column family you need.

If nodetool throws the same exception, do it in 2 steps:

1. nodetool setcompactionthreshold keyspace cfname 0 32 (32 - use yours instead)
2. nodetool setcompactionthreshold keyspace cfname 0 0

To restore, set your normal values back.
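The two-step dance can be scripted; here is a sketch with a dry-run wrapper so it can be read without a live cluster. Host, keyspace and CF names are placeholders, and 4/32 for the restore step assumes Cassandra's usual default thresholds:

```shell
#!/bin/sh
# run(): dry-run wrapper -- prints the command instead of executing it.
# Replace the body with "$@" to actually execute on a real cluster.
run() { echo "$@"; }

NODE=127.0.0.1; KS=ks1; CF=cf1

# Step 1: drop min to 0 while keeping the current max (use yours instead of 32):
run nodetool -h "$NODE" setcompactionthreshold "$KS" "$CF" 0 32
# Step 2: now drop max to 0 as well, disabling compaction:
run nodetool -h "$NODE" setcompactionthreshold "$KS" "$CF" 0 0

# Restore later (assuming the usual 4/32 defaults -- use your normal values):
run nodetool -h "$NODE" setcompactionthreshold "$KS" "$CF" 4 32
```

Remember this must be repeated for every node and every column family involved, as noted above.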





From: Vijay [mailto:vijay2...@gmail.com]
Sent: Friday, May 18, 2012 06:09
To: user@cassandra.apache.org
Cc: cassandra-u...@incubator.apache.org
Subject: Re: Safely Disabling Compaction

I would rather set the keyspace settings min_compaction_threshold and 
max_compaction_threshold to a higher number and, once I am ready, put the 
values back... This way I don't need to restart.
Having said that, why not set the compaction throughput to 1 (low enough to not 
have contention) and complete the stream?

Regards,
/VJ


On Wed, May 16, 2012 at 2:43 PM, sj.climber 
sj.clim...@gmail.com wrote:
Hi,

In an effort to minimize IO contention, I'd like to disable compactions
while I'm streaming SSTables to the cluster.  When done streaming, I intend
on forcing a major compaction through nodetool.

Elsewhere in the forums, various folks suggest setting
max_compaction_threshold = 0 to disable compaction.  While this works
sometimes (via 'update column family family with
max_compaction_threshold=0'), I've observed a number of serious issues with
this approach:

1) You can't create a column family with max_compaction_threshold = 0.  The
CLI reports that min_compaction_threshold must have a value >= 2, and
max_compaction_threshold can't be lower than it.  Worse yet, trying to
create a column family with max_compaction_threshold = 0 gets the cluster
into a Schema Disagreement Exception (since the node on which you issue the
migration command fails with a fatal error).

2) Cassandra will allow me to update an existing column family with
max_compaction_threshold = 0.  But if I restart the node, it will crash on
startup.
java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at
org.apache.commons.daemon.support.DaemonLoader.load(DaemonLoader.java:160)
Caused by: java.lang.RuntimeException:
java.lang.reflect.InvocationTargetException
...
org.apache.cassandra.config.CFMetaData.createCompactionStrategyInstance(CFMetaData.java:839)
   ... 14 more
Caused by: java.lang.RuntimeException: The max_compaction_threshold cannot
be smaller than the min.
   at
org.apache.cassandra.db.ColumnFamilyStore.setMaximumCompactionThreshold(ColumnFamilyStore.java:1740)
   at org.apache.


Is there another solution for more safely enabling/disabling compaction?

Thanks!

--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Safely-Disabling-Compaction-tp7562777.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.


RE: cassandra read latency help

2012-05-17 Thread Viktor Jevdokimov
Gurpreet Singh wrote:
> Any ideas on what could help here bring down the read latency even more?

Avoid Cassandra forwarding requests to other nodes:
- Use consistency level ONE;
- Create a data model that does a single request with a single key, since different 
keys may belong to different nodes and require forwarding requests to them;
- Use a smart client to calculate the token for a key and select the appropriate 
node (primary or replica) by token range;
- Turn off the Dynamic Snitch (it may forward a request to another replica even if 
the local node has the data);
- Have all or hot data in the page cache (no HDD disk IO), or use SSDs;
- If you do regular updates to a key, do not use the row cache; otherwise you may try it.
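For the Dynamic Snitch item, the knobs live in cassandra.yaml rather than nodetool; a fragment, assuming a 1.x-era config file:

```yaml
# cassandra.yaml fragment: turn the dynamic snitch off entirely...
dynamic_snitch: false
# ...or keep it on but prefer the local replica unless it scores markedly worse
# (0 = always route by score, higher values favor the "natural" replica):
# dynamic_snitch_badness_threshold: 0.1
```

Changing either setting requires a node restart to take effect.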






RE: Exception when truncate

2012-05-17 Thread Viktor Jevdokimov
Truncate flushes all memtables to free up commit logs, and it does that on all 
nodes, so it takes time. This was discussed on this list not long ago.

Watch for:
https://issues.apache.org/jira/browse/CASSANDRA-3651
https://issues.apache.org/jira/browse/CASSANDRA-4006



-Original Message-
 From: ruslan usifov [mailto:ruslan.usi...@gmail.com]
 Sent: Thursday, May 17, 2012 13:06
 To: user@cassandra.apache.org
 Subject: Re: Exception when truncate

 Also, I don't understand why, on an empty CF (no SSTables at all), truncate
 heavily loads the disk?

 2012/5/17 ruslan usifov ruslan.usi...@gmail.com:
  Hello
 
  I have follow situation on our test server:
 
  from cassandra-cli i try to use
 
  truncate purchase_history;
 
  3 times i got:
 
  [default@township_6waves] truncate purchase_history; null
  UnavailableException()
 at
  org.apache.cassandra.thrift.Cassandra$truncate_result.read(Cassandra.j
  ava:20212)
 at
  org.apache.cassandra.thrift.Cassandra$Client.recv_truncate(Cassandra.j
  ava:1077)
 at
  org.apache.cassandra.thrift.Cassandra$Client.truncate(Cassandra.java:1
  052)
 at
  org.apache.cassandra.cli.CliClient.executeTruncate(CliClient.java:1445
  )
 at
  org.apache.cassandra.cli.CliClient.executeCLIStatement(CliClient.java:
  272)
 at
  org.apache.cassandra.cli.CliMain.processStatementInteractive(CliMain.j
  ava:220)
 at org.apache.cassandra.cli.CliMain.main(CliMain.java:348)
 
 
  So it looks like truncate is very slow and takes longer than
  rpc_timeout_in_ms: 1 (this can happen because we have a very slow
  disk on the test machine)
 
  But in in cassandra system log i see follow exception:
 
 
  ERROR [MutationStage:7022] 2012-05-17 12:19:14,356
  AbstractCassandraDaemon.java (line 139) Fatal exception in thread
  Thread[MutationStage:7022,5,main]
  java.io.IOError: java.io.IOException: unable to mkdirs
 
 /home/cassandra/1.0.0/data/township_6waves/snapshots/1337242754356-
 pur
  chase_history
 at
 
 org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(Column
 F
  amilyStore.java:1433)
 at
 
 org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.j
  ava:1462)
 at
 
 org.apache.cassandra.db.ColumnFamilyStore.truncate(ColumnFamilyStore.j
  ava:1657)
 at
 
 org.apache.cassandra.db.TruncateVerbHandler.doVerb(TruncateVerbHandle
 r
  .java:50)
 at
 
 org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.j
  ava:59)
 at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecu
  tor.java:886)
 at
  java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.
  java:908)
 at java.lang.Thread.run(Thread.java:662)
  Caused by: java.io.IOException: unable to mkdirs
 
 /home/cassandra/1.0.0/data/township_6waves/snapshots/1337242754356-
 pur
  chase_history
 at
  org.apache.cassandra.io.util.FileUtils.createDirectory(FileUtils.java:
  140)
 at
  org.apache.cassandra.io.util.FileUtils.createDirectory(FileUtils.java:
  131)
 at
 
 org.apache.cassandra.db.ColumnFamilyStore.snapshotWithoutFlush(Column
 F
  amilyStore.java:1409)
 ... 7 more
 
 
  Also I see that the 1337242754356-purchase_history directory already exists
  in the snapshot dir, so I think the snapshot names that Cassandra generates
  are not unique.
 
  PS: We use cassandra 1.0.10 on Ubuntu 10.0.4-LTS


RE: cassandra read latency help

2012-05-17 Thread Viktor Jevdokimov
Row cache is OK as long as keys are not heavily updated; otherwise it frequently 
invalidates and pressures GC.

The high latency comes from your batch of 100 keys. Review your data model to 
avoid such reads if you need low latency.

500M rows on one node, or on the cluster? Reading 100 random rows, at a total of 
40KB of data, from a data set of 180GB uncompressed in under 30ms is not an easy task.





From: Gurpreet Singh [mailto:gurpreet.si...@gmail.com]
Sent: Thursday, May 17, 2012 20:24
To: user@cassandra.apache.org
Subject: Re: cassandra read latency help

Thanks Viktor for the advice.
Right now I just have 1 node that I am testing against, and I am using CL ONE.
Are you suggesting that the page cache might be doing better than the row cache?
I am getting a row cache hit rate of 0.66 right now.

/G

On Thu, May 17, 2012 at 12:26 AM, Viktor Jevdokimov 
viktor.jevdoki...@adform.com wrote:
Gurpreet Singh wrote:
> Any ideas on what could help here bring down the read latency even more?
Avoid Cassandra forwarding requests to other nodes:
- Use consistency level ONE;
- Create a data model that does a single request with a single key, since different 
keys may belong to different nodes and require forwarding requests to them;
- Use a smart client to calculate the token for a key and select the appropriate 
node (primary or replica) by token range;
- Turn off the Dynamic Snitch (it may forward a request to another replica even if 
the local node has the data);
- Have all or hot data in the page cache (no HDD disk IO), or use SSDs;
- If you do regular updates to a key, do not use the row cache; otherwise you may try it.






RE: zabbix templates

2012-05-14 Thread Viktor Jevdokimov
Here is, for example, a Zabbix agent config for a Linux-based Cassandra node; just 
get cmdline-jmxclient-0.10.3.jar. Not all items are there; add any you need if 
missing. Start from JMX to understand what parameters to use with the keys, for 
example:
cassandra.db.Caches[KEYSPACE,CACHE_NAME,COUNTER]


### CASSANDRA USER-DEFINED MONITORED PARAMETERS
UserParameter=cassandra.db.Caches[*],java -jar /etc/zabbix/cmdline-jmxclient-0.10.3.jar - 127.0.0.1:7199 org.apache.cassandra.db:type=Caches,keyspace=$1,cache=$2 $3 2>&1 | awk '{print $$6;}'
UserParameter=cassandra.db.ColumnFamilies[*],java -jar /etc/zabbix/cmdline-jmxclient-0.10.3.jar - 127.0.0.1:7199 org.apache.cassandra.db:type=ColumnFamilies,keyspace=$1,columnfamily=$2 $3 2>&1 | awk '{print $$6;}'
UserParameter=cassandra.db.CompactionManager[*],java -jar /etc/zabbix/cmdline-jmxclient-0.10.3.jar - 127.0.0.1:7199 org.apache.cassandra.db:type=CompactionManager $1 2>&1 | awk '{print $$6;}'
UserParameter=cassandra.db.StorageProxy[*],java -jar /etc/zabbix/cmdline-jmxclient-0.10.3.jar - 127.0.0.1:7199 org.apache.cassandra.db:type=StorageProxy $1 2>&1 | awk '{print $$6;}'
UserParameter=cassandra.db.StorageService[*],java -jar /etc/zabbix/cmdline-jmxclient-0.10.3.jar - 127.0.0.1:7199 org.apache.cassandra.db:type=StorageService $1 2>&1 | awk '{split($$6,a,"E");if(a[2]!=""){split(a[1],c,".");b=c[1]c[2];for(i=1;i<=a[2]-length(c[2]);i++)b=b"0";} else b=a[1];print b;}'
UserParameter=cassandra.memory.Heap[*],java -jar /etc/zabbix/cmdline-jmxclient-0.10.3.jar - 127.0.0.1:7199 java.lang:type=Memory HeapMemoryUsage 2>&1 | awk '/$1/ {print $$2;}'
UserParameter=cassandra.memory.NonHeap[*],java -jar /etc/zabbix/cmdline-jmxclient-0.10.3.jar - 127.0.0.1:7199 java.lang:type=Memory NonHeapMemoryUsage 2>&1 | awk '/$1/ {print $$2;}'
UserParameter=cassandra.request.MutationStage[*],java -jar /etc/zabbix/cmdline-jmxclient-0.10.3.jar - 127.0.0.1:7199 org.apache.cassandra.request:type=MutationStage $1 2>&1 | awk '{print $$6;}'
UserParameter=cassandra.request.ReadRepairStage[*],java -jar /etc/zabbix/cmdline-jmxclient-0.10.3.jar - 127.0.0.1:7199 org.apache.cassandra.request:type=ReadRepairStage $1 2>&1 | awk '{print $$6;}'
UserParameter=cassandra.request.ReadStage[*],java -jar /etc/zabbix/cmdline-jmxclient-0.10.3.jar - 127.0.0.1:7199 org.apache.cassandra.request:type=ReadStage $1 2>&1 | awk '{print $$6;}'
UserParameter=cassandra.request.RequestResponseStage[*],java -jar /etc/zabbix/cmdline-jmxclient-0.10.3.jar - 127.0.0.1:7199 org.apache.cassandra.request:type=RequestResponseStage $1 2>&1 | awk '{print $$6;}'
UserParameter=cassandra.runtime[*],java -jar /etc/zabbix/cmdline-jmxclient-0.10.3.jar - 127.0.0.1:7199 java.lang:type=Runtime $1 2>&1 | awk '{print $$6;}'
UserParameter=cassandra.threading[*],java -jar /etc/zabbix/cmdline-jmxclient-0.10.3.jar - 127.0.0.1:7199 java.lang:type=Threading $1 2>&1 | awk '{print $$6;}'
UserParameter=cassandra.os[*],java -jar /etc/zabbix/cmdline-jmxclient-0.10.3.jar - 127.0.0.1:7199 java.lang:type=OperatingSystem $1 2>&1 | awk '{print $$6;}'
UserParameter=cassandra.gc.parnew[*],java -jar /etc/zabbix/cmdline-jmxclient-0.10.3.jar - 127.0.0.1:7199 java.lang:type=GarbageCollector,name=ParNew LastGcInfo 2>&1 | awk '/$1/ {print $$2;}'
UserParameter=cassandra.gc.cms[*],java -jar /etc/zabbix/cmdline-jmxclient-0.10.3.jar - 127.0.0.1:7199 java.lang:type=GarbageCollector,name=ConcurrentMarkSweep LastGcInfo 2>&1 | awk '/$1/ {print $$2;}'
UserParameter=cassandra.db.DynamicSnitchScores[*],java -jar /etc/zabbix/cmdline-jmxclient-0.10.3.jar - 127.0.0.1:7199 org.apache.cassandra.db:type=DynamicEndpointSnitch,instance=* Scores 2>&1 | awk '{split($$0,a," ");for(i in a){if(match(a[i],/$1/)!=0){split(a[i],b,"=");sub(/,|}/,"",b[2]);print b[2];break;}}}'



-Original Message-
From: Cord MacLeod [mailto:cordmacl...@gmail.com]
Sent: Saturday, May 12, 2012 06:42
To: user@cassandra.apache.org
Subject: zabbix templates

I've seen some Cacti templates for Cassandra and a JMX bridge called zap cat, 
but has anyone created Zabbix templates for Cassandra?




RE: how to upgrade my cassadra from SizeTieredCompaction to LeveledCompactiom

2012-05-14 Thread Viktor Jevdokimov
>> There is 2T data on each server. Can someone give me some advice?
> do not do it
Best advice!






RE: counter CF and TTL

2012-05-14 Thread Viktor Jevdokimov
There's no TTL on counter columns and no ready-to-use solution I know about.
https://issues.apache.org/jira/browse/CASSANDRA-2774
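Since counter columns have no TTL, expiry has to be handled at the application level. One common workaround (a sketch only; the key scheme and function names below are hypothetical, not from this thread) is to bucket counter rows by time period, so whole buckets can be deleted once they age out:

```python
from datetime import datetime, timedelta

def bucket_key(base_key: str, when: datetime) -> str:
    """Derive a time-bucketed row key, e.g. 'pageviews:2012-W20'."""
    year, week, _ = when.isocalendar()
    return f"{base_key}:{year}-W{week:02d}"

def expired_buckets(base_key: str, now: datetime,
                    keep_weeks: int = 2, scan_weeks: int = 8):
    """Row keys older than the retention window; the app deletes
    these whole rows instead of relying on a per-column TTL."""
    return [bucket_key(base_key, now - timedelta(weeks=i))
            for i in range(keep_weeks + 1, scan_weeks + 1)]

now = datetime(2012, 5, 14)
print(bucket_key("pageviews", now))       # the bucket currently being counted
print(expired_buckets("pageviews", now))  # buckets old enough to delete
```

The app increments counters only in the current bucket and periodically issues row deletes for the expired ones, which sidesteps the missing TTL entirely.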






From: Tamar Fraenkel [mailto:ta...@tok-media.com]
Sent: Sunday, May 13, 2012 18:30
To: cassandra-u...@incubator.apache.org
Subject: counter CF and TTL

Hi!
I saw that when counter CFs were first introduced there was no support for TTL.
But I see that Hector does have TTL for HCounterColumn.
So do counter columns have a TTL or not?

I don't actually have an issue with big rows, but I don't need the data after 
two weeks or so, so it seems a shame to clutter the DB with it.
Thanks,

Tamar Fraenkel
Senior Software Engineer, TOK Media

ta...@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956




RE: get dinamicsnith info from php

2012-05-14 Thread Viktor Jevdokimov
I'm not sure that selecting a node based on the dynamic snitch (DS) is a good 
idea. First of all, every node keeps DS values for every node, including 
itself, and a node's own DS values are always better than the others'.

For example, 3 nodes, RF=2:

        N1      N2      N3
N1    0.5ms    2ms     2ms
N2     2ms    0.5ms    2ms
N3     2ms     2ms    0.5ms


We have monitored many Cassandra counters, including DS values for every node, 
and the graphs show that latency is not simply a function of load.

So the strategy should be based on the use case, node count, RF, replica 
placement strategy, read repair chance, and more.

What do you want to achieve?





From: ruslan usifov [mailto:ruslan.usi...@gmail.com]
Sent: Monday, May 14, 2012 16:58
To: user@cassandra.apache.org
Subject: get dinamicsnith info from php

Hello

I want to route requests from a PHP client to the least-loaded node, so I need 
dynamic snitch and gossip info. How can I get this info from PHP? Perhaps I 
need a daemon that can communicate with Cassandra gossip and expose this info 
to PHP (over a socket, for example)?

RE: get dinamicsnith info from php

2012-05-14 Thread Viktor Jevdokimov
Let's say you have an 8-node cluster with replication factor 3. If one node is 
down, for its token range you have only 2 nodes left, not 7, that can process 
your requests; other nodes will forward requests to the nearest (depending on 
the snitch) or lowest-latency (depending on the dynamic snitch) of the 2 
remaining replicas.

I have no idea about PHP and its multithreading capabilities; if it's 
impossible to run a background thread that returns a dead endpoint to the 
list, instead of checking it on the HTTP request thread, you're stuck. For 
lower latencies the dynamic snitch already does the job for you, selecting the 
node with the lowest latency.

If you'd like Cassandra to avoid forwarding requests to the appropriate node, 
and instead make a direct request to the node where the data is, you need a 
smarter client, capable of selecting a node by key, among other things.
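The "smarter client" idea can be sketched in a few lines: hash the key to a token and walk the ring to find the replicas. This is only an illustration (MD5-derived tokens in the spirit of RandomPartitioner, SimpleStrategy-style placement); the ring tokens and node addresses are made up, and the modulo is an approximation of the real token computation:

```python
import bisect
import hashlib

# Hypothetical 3-node ring: (token, node address), sorted by token.
RING = sorted([
    (0, "10.0.0.1"),
    (2**127 // 3, "10.0.0.2"),
    (2 * 2**127 // 3, "10.0.0.3"),
])
TOKENS = [t for t, _ in RING]

def token_for(key: bytes) -> int:
    # RandomPartitioner derives tokens from the MD5 hash of the key;
    # the mod here is a simplification to keep tokens in [0, 2**127).
    return int.from_bytes(hashlib.md5(key).digest(), "big") % 2**127

def replicas(key: bytes, rf: int = 2):
    """First rf nodes clockwise from the key's token (SimpleStrategy-style)."""
    i = bisect.bisect_left(TOKENS, token_for(key)) % len(RING)
    return [RING[(i + j) % len(RING)][1] for j in range(rf)]

print(replicas(b"user:42"))  # the nodes that own this key's data
```

A client holding the ring (obtainable via describe_ring in Thrift) can then send the request straight to a replica instead of letting a coordinator forward it.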





From: ruslan usifov [mailto:ruslan.usi...@gmail.com]
Sent: Monday, May 14, 2012 17:41
To: user@cassandra.apache.org
Subject: Re: get dinamicsnith info from php

Sorry for my bad English.

I want to solve the following problem. Suppose we take one node down for 
maintenance for a long time (30 min). We currently use TSocketPool for pooling 
connections to Cassandra, but this pool implementation is, I think, not very 
good. It has a setRetryInterval parameter that allows a broken node to be 
disabled (we set it to 10 s), but this means that every 10 s the pool will try 
to reconnect to the down node (again, we shut the node down for maintenance), 
because it doesn't know whether the node is dead or not, while the Cassandra 
cluster does know; these connection attempts are pointless. Also, when a node 
is compacting it can be heavily loaded and can't serve client requests very 
well (at that moment we see a small increase in average backend response time).
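The background-thread approach discussed above can be sketched as a minimal pool: dead nodes are retried by a background thread on its own schedule, never on the request path. All class and method names here are hypothetical, and a real pool would probe with an actual connection attempt rather than the stub used below:

```python
import threading
import time

class NodePool:
    """Minimal sketch of a client-side pool with background node revival."""

    def __init__(self, nodes, retry_interval=10.0, probe=None):
        self.live = list(nodes)
        self.dead = []
        self.lock = threading.Lock()
        # Real implementation: probe = try opening a TCP/Thrift connection.
        self.probe = probe or (lambda node: True)
        self.retry_interval = retry_interval
        threading.Thread(target=self._revive_loop, daemon=True).start()

    def get(self):
        """Round-robin over live nodes only; never touches dead ones."""
        with self.lock:
            if not self.live:
                raise RuntimeError("no live nodes")
            self.live.append(self.live.pop(0))
            return self.live[-1]

    def mark_dead(self, node):
        with self.lock:
            if node in self.live:
                self.live.remove(node)
                self.dead.append(node)

    def _revive_loop(self):
        # Request threads are never blocked by reconnection attempts.
        while True:
            time.sleep(self.retry_interval)
            with self.lock:
                candidates = list(self.dead)
            for node in candidates:
                if self.probe(node):
                    with self.lock:
                        self.dead.remove(node)
                        self.live.append(node)
```

Since PHP request workers are short-lived, such a loop would have to live in a separate long-running daemon, which is exactly the design Ruslan is asking about.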

RE: size tiered compaction - improvement

2012-04-18 Thread Viktor Jevdokimov
Our use case requires Column TTL, not CF TTL, because it is variable, not 
constant.


-----Original Message-----
From: Radim Kolar [mailto:h...@filez.com]
Sent: Wednesday, April 18, 2012 12:57
To: user@cassandra.apache.org
Subject: Re: size tiered compaction - improvement


> Any compaction pass over A will first convert the TTL data into tombstones.
> Then, any subsequent pass that includes A *and all other sstables
> containing rows with the same key* will drop the tombstones.

That's why I proposed attaching a TTL to the entire CF. Tombstones would not 
be needed.


RE: tombstones problem with 1.0.8

2012-03-25 Thread Viktor Jevdokimov
Upon read, rows from S1 and S6 are merged, and the T3 timestamp wins.
T1 will be deleted when S1 is compacted with S6, or by a manual cleanup.
We run major compactions nightly, with a lot of inserts per day with TTL and 
some deletes from the app; no problems with tombstones.


-----Original Message-----
From: Radim Kolar [mailto:h...@filez.com]
Sent: Sunday, March 25, 2012 13:20
To: user@cassandra.apache.org
Subject: Re: tombstones problem with 1.0.8

Scenario 4
T1 write column
T2 Flush memtable to S1
T3 del row
T4 flush memtable to S5
T5 tomstone S5 expires
T6 S5 is compacted but not with S1

Result?


RE: tombstones problem with 1.0.8

2012-03-23 Thread Viktor Jevdokimov
Yes, continued deletion of the same columns/rows will prevent them from being 
removed from the final sstable upon compaction, because of the new timestamp.
You get a sliding tombstone gc grace period in that case.

During compaction of the selected sstables, Cassandra checks the whole column 
family for the latest timestamp of the column/row, including other sstables 
and the memtable.

You need to review your application logic.
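The "sliding gc grace" effect is easy to see with concrete numbers, using the 3600 s gc_grace_seconds from this thread (the helper below is purely illustrative):

```python
GC_GRACE = 3600  # seconds, as in the schema discussed in this thread

def purgeable(tombstone_ts: int, now: int, gc_grace: int = GC_GRACE) -> bool:
    """A tombstone may be dropped only once gc_grace has passed since
    its (latest) deletion timestamp."""
    return now >= tombstone_ts + gc_grace

t_first_delete = 0     # row first deleted at t=0
t_redelete = 3000      # app deletes the same row again at t=3000
now = 4000             # compaction runs at t=4000

print(purgeable(t_first_delete, now))  # the original tombstone is past grace
print(purgeable(t_redelete, now))      # but the merge keeps the newer timestamp
```

Each re-delete rewrites the tombstone with a newer timestamp, so as long as the application keeps deleting the same row inside the grace window, the tombstone can never be purged.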





From: Ross Black [mailto:ross.w.bl...@gmail.com]
Sent: Friday, March 23, 2012 07:16
To: user@cassandra.apache.org
Subject: Re: tombstones problem with 1.0.8

Hi Victor,

Thanks for your response.

Is there a possibility that continual deletions during compaction could be 
blocking removal of the tombstones?  The full manual compaction takes about 4 
hours per node for our data, so a large number of deletes occur during that 
time.

This is the description from cassandra-cli

  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
Options: [replication_factor:3]
  Column Families:
ColumnFamily: weekly
  Key Validation Class: org.apache.cassandra.db.marshal.BytesType
  Default column value validator: org.apache.cassandra.db.marshal.BytesType
  Columns sorted by: org.apache.cassandra.db.marshal.BytesType
  Row cache size / save period in seconds / keys to save : 0.0/0/all
  Row Cache Provider: org.apache.cassandra.cache.SerializingCacheProvider
  Key cache size / save period in seconds: 20.0/14400
  GC grace seconds: 3600
  Compaction min/max thresholds: 3/8
  Read repair chance: 1.0
  Replicate on write: true
  Bloom Filter FP chance: default
  Built indexes: []
  Compaction Strategy: 
org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy


Ross


On 23 March 2012 02:55, Viktor Jevdokimov 
viktor.jevdoki...@adform.com wrote:
Just tested 1.0.8 before upgrading from 1.0.7: tombstones created by TTL or by 
a delete operation are perfectly deleted after either compaction or cleanup.
I have no idea about any other settings than gc_grace_seconds; check your 
schema from cassandra-cli.







From: Ross Black [mailto:ross.w.bl...@gmail.com]
Sent: Thursday, March 22, 2012 03:38
To: user@cassandra.apache.org
Subject: tombstones problem with 1.0.8

Hi,

We recently moved from 0.8.2 to 1.0.8 and the behaviour seems to have changed 
so that tombstones are now not being deleted.

Our application continually adds and removes columns from Cassandra.  We have 
set a short gc_grace time (3600) since our application would automatically 
delete zombies if they appear.
Under 0.8.2, the tombstones remained at a relatively constant number.
Under 1.0.8, the tombstones have been continually increasing so that they 
exceed the size of our real data (at this stage we have over 100G of 
tombstones).
Even after running a full compact the new compacted SSTable contains a massive 
number of tombstones, many that are several weeks old.

Have I missed some new configuration option to allow deletion of tombstones?

I also noticed that one of the changes between 0.8.2

RE: tombstones problem with 1.0.8

2012-03-23 Thread Viktor Jevdokimov
> You are explaining that if I have an expired row tombstone and there exists a
> later timestamp on this row, the tombstone is not deleted? If it works that
> way, it will never be deleted.

Exactly. It is merged with the new one.

Example 1: a row with 1 column in an sstable. Delete the row, not the column. 
After compaction or cleanup the sstable will contain an empty row key with a 
tombstone.
Example 2: a row with 1 column in an sstable. Delete the column. After 
compaction or cleanup the sstable will contain a row with 1 column with a 
tombstone.

Question: why does the application request a delete operation for a row/column 
that is already deleted (and can't be returned by get)?




RE: tombstones problem with 1.0.8

2012-03-23 Thread Viktor Jevdokimov
Should not.

Scenario 1, write & delete in one memtable:
T1 write column
T2 delete row
T3 flush memtable, sstable 1 contains an empty row tombstone
T4 row tombstone expires
T5 compaction/cleanup, the row does not appear in the compacted sstable 2

Scenario 2, write & delete in different sstables:
T1 write column
T2 flush memtable, sstable 1 contains the row with the column
T3 delete row
T4 flush memtable, sstable 2 contains an empty row tombstone
T5 row tombstone expires
T6 compaction, rows from sstables 1 & 2 are merged and not written to sstable 3

Scenario 3, tombstone kept alive:
T1 write column
T2 flush memtable, sstable 1 contains the row with the column
T3 delete row
T4 flush memtable, sstable 2 contains an empty row tombstone
T5 delete row (present in memtable)
T6 row tombstone from T3 would be expected to expire
T7 compaction, a row tombstone appears in sstable 3 because of T5
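Scenario 3 is just timestamp-based (last-write-wins) reconciliation at work; a toy merge makes it visible. The tuple encoding below is purely illustrative, not Cassandra's on-disk format:

```python
def merge(*versions):
    """Last-write-wins merge of row versions. Each version is a
    (timestamp, kind) tuple where kind is 'column' or 'tombstone';
    the version with the highest timestamp wins."""
    return max(versions, key=lambda v: v[0])

# Scenario 3 from above: write at T1, delete at T3, delete again at T5.
sstable1 = (1, "column")     # flushed write
sstable2 = (3, "tombstone")  # first delete
memtable = (5, "tombstone")  # second delete, still live at compaction time

# Even if the T3 tombstone is past gc_grace, merging against the T5
# delete produces a fresh tombstone in the compacted output.
print(merge(sstable1, sstable2, memtable))
```

This is why the compacted sstable 3 in Scenario 3 still carries a row tombstone: the merge result inherits the newest deletion timestamp, restarting the grace period.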




Best regards/ Pagarbiai

Viktor Jevdokimov
Senior Developer

Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063
Fax: +370 5 261 0453

J. Jasinskio 16C,
LT-01112 Vilnius,
Lithuania



Disclaimer: The information contained in this message and attachments is 
intended solely for the attention and use of the named addressee and may be 
confidential. If you are not the intended recipient, you are reminded that the 
information remains the property of the sender. You must not use, disclose, 
distribute, copy, print or rely on this e-mail. If you have received this 
message in error, please contact the sender immediately and irrevocably delete 
this message and any copies.-Original Message-
From: Radim Kolar [mailto:h...@filez.com]
Sent: Friday, March 23, 2012 13:28
To: user@cassandra.apache.org
Subject: Re: tombstones problem with 1.0.8

Example:

T1 < T2 < T3

at T1: write column
at T2: delete row

at T3 (after tombstone expiration): compact (T1 + T2) and drop the expired 
tombstone

Will the column from T1 be alive again?


RE: tombstones problem with 1.0.8

2012-03-22 Thread Viktor Jevdokimov
Just tested 1.0.8 before upgrading from 1.0.7: tombstones created by TTL or by 
a delete operation are perfectly deleted after either compaction or cleanup.
I have no idea about any other settings than gc_grace_seconds; check your 
schema from cassandra-cli.







From: Ross Black [mailto:ross.w.bl...@gmail.com]
Sent: Thursday, March 22, 2012 03:38
To: user@cassandra.apache.org
Subject: tombstones problem with 1.0.8

Hi,

We recently moved from 0.8.2 to 1.0.8 and the behaviour seems to have changed 
so that tombstones are now not being deleted.

Our application continually adds and removes columns from Cassandra.  We have 
set a short gc_grace time (3600) since our application would automatically 
delete zombies if they appear.
Under 0.8.2, the tombstones remained at a relatively constant number.
Under 1.0.8, the tombstones have been continually increasing so that they 
exceed the size of our real data (at this stage we have over 100G of 
tombstones).
Even after running a full compact the new compacted SSTable contains a massive 
number of tombstones, many that are several weeks old.

Have I missed some new configuration option to allow deletion of tombstones?

I also noticed that one of the changes between 0.8.2 and 1.0.8 was 
https://issues.apache.org/jira/browse/CASSANDRA-2786 which changed code to 
avoid dropping tombstones when they might still be needed to shadow data in 
another sstable.
Could this be having an impact since we continually add and remove columns even 
while a major compact is executing?


Thanks,
Ross

Re: how to increase compaction rate?

2012-03-13 Thread Viktor Jevdokimov
After losing one node we had to repair; the CFs were on leveled compaction.
For one CF each node had about 7GB of data.
Running a repair without the primary range switch left some nodes exhausted,
with about 60-100GB of 5MB sstables for that CF (a lot of files).
After switching back from leveled to tiered, compactions ended up completely
blocked on all nodes, since this CF was compacting forever.
On one node a major compaction for that CF is CPU bound and may run at
unlimited compaction speed for 4-7 days at a maximum 1MB/s rate, finally
compacting down to 3GB of data (some data is deleted by TTL, some merged).

What we did to speed up this process and return all exhausted nodes to a
normal state faster:
We created 6 temporary single-instance virtual Cassandra nodes with 2 CPU
cores and 8GB RAM each.
We completely stopped compaction for the CF on a production node.
The leveled sstables from this production node were divided into 6 ranges and
copied to the 6 temporary empty nodes.
On each node we ran a major compaction to compact just 1/6 of the data, about
10-14GB. It took 1-2 hours to compact it into 1GB of data.
Then all 6 resulting sstables were copied to one of the 6 nodes for a final
major compaction, producing the expected 3GB sstable.
Stop the production node, delete the files that were copied, return the
compacted sstable (it may need renaming), and the node is back to normal.

Using separate nodes saved the production nodes from compacting the exhausted
CF forever and blocking compactions for other CFs. With 6 separate nodes we
compacted 2 production nodes a day, so overall it may have taken the same
time, but the production nodes were free for regular compactions of other CFs.

After getting back to normal, for our use case we stick to tiered compaction
with a nightly major compaction.
With our insertion/TTL-deletion rates, leveled compaction is a nightmare, even
if the amount of data is not huge, just a few GB per node.
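Dividing the sstables into 6 roughly equal ranges, as described above, is a small bin-packing problem; a greedy sketch (file sizes and node count below are illustrative, roughly matching the numbers in this post):

```python
import heapq

def split_sstables(sizes, n=6):
    """Greedy partition of sstable sizes (MB) into n groups with roughly
    equal totals, one group per temporary compaction node: always add
    the next-largest file to the currently smallest group."""
    heap = [(0, i) for i in range(n)]   # (running total, bucket index)
    heapq.heapify(heap)
    buckets = [[] for _ in range(n)]
    for size in sorted(sizes, reverse=True):
        total, i = heapq.heappop(heap)
        buckets[i].append(size)
        heapq.heappush(heap, (total + size, i))
    return buckets

# ~70GB of 5MB leveled sstables, split across 6 temporary nodes:
buckets = split_sstables([5] * 14000, n=6)
print([sum(b) for b in buckets])  # per-node totals in MB, nearly equal
```

With near-equal totals per node, the six 1-2 hour major compactions finish at about the same time, which is what makes the parallel offload worthwhile.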

2012/3/13 Thorsten von Eicken t...@rightscale.com

 On 3/12/2012 6:52 AM, Brandon Williams wrote:
  On Mon, Mar 12, 2012 at 4:44 AM, aaron morton aa...@thelastpickle.com
 wrote:
  I don't understand why I
  don't get multiple concurrent compactions running, that's what would
  make the biggest performance difference.
 
  concurrent_compactors
  Controls how many concurrent compactions to run, by default it's the
 number
  of cores on the machine.
 I'm on a quad-core machine so not setting concurrent_compactors should
 not be a limiting factor...
  With leveled compaction, I don't think you get any concurrency because
  it has to compact an entire level, and it can't proceed to the next
  level without completing the one before it.
 
  In short, if you want maximum throughput, stick with size tiered.
 I switched the CFs to tiered compaction and I still get no concurrency
 for the same CF. I now have two compactions running concurrently but
 always for different CFs. I've briefly seen a third for one of the small
 CFs, so it's willing to run more than two concurrently. Looks like I
 have to wait for a few days for all the compactions to complete. Talk
 about compaction hell!

 
  -Brandon
 



Truncate flushes memtables for all CFs causing timeouts

2012-03-06 Thread Viktor Jevdokimov
Hello,

Truncate uses the RPC timeout, which in my case is set to 10 seconds (I want
even less), and it's not enough. I've seen a TODO in the sources for this case.

What I found is that truncate starts a flush of all memtables for all CFs,
not only for the CF being truncated. When there are a lot of CFs to be
flushed, it takes time.

Is it possible to flush only the required CF for truncate, not all of them?
This could improve truncate time.


Best regards,
Viktor


Re: Truncate flushes memtables for all CFs causing timeouts

2012-03-06 Thread Viktor Jevdokimov
Thank you. To sum up: to free up and discard a commit log, flush everything.
So a higher timeout for truncate will/should work.

2012/3/6 aaron morton aa...@thelastpickle.com

 Truncate uses RPC timeout, which is in my case set to 10 seconds (I want
 even less) and it's not enough. I've seen in sources TODO for this case.

 created
 https://issues.apache.org/jira/browse/CASSANDRA-4006

 Is it possible to flush only required CF for truncate, not all? This could
 improve truncate time.

 see code comments here
 https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/ColumnFamilyStore.java#L1681

 AFAIK truncate is not considered a regular operation. (All nodes must be
 online for example)

 Cheers


 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com







Re: how stable is 1.0 these days?

2012-03-05 Thread Viktor Jevdokimov
1.0.7 is very stable: weeks in a high-load production environment without a
single exception. 1.0.8 should be even more stable; check CHANGES.txt for what
was fixed.


2012/3/2 Marcus Eriksson krum...@gmail.com

 beware of https://issues.apache.org/jira/browse/CASSANDRA-3820 though if
 you have many keys per node

 other than that, yep, it seems solid

 /Marcus


 On Wed, Feb 29, 2012 at 6:20 PM, Thibaut Britz 
 thibaut.br...@trendiction.com wrote:

 Thanks!

 We will test it on our test cluster in the coming weeks and hopefully put
 it into production on our 200 node main cluster. :)

 Thibaut

 On Wed, Feb 29, 2012 at 5:52 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

 On Wed, Feb 29, 2012 at 10:35 AM, Thibaut Britz
 thibaut.br...@trendiction.com wrote:
  Any more feedback on larger deployments of 1.0.*?
 
  We are eager to try out the new features in production, but don't want
 to
  run into bugs as on former 0.7 and 0.8 versions.
 
  Thanks,
  Thibaut
 
 
 
  On Tue, Jan 31, 2012 at 6:59 AM, Ben Coverston 
 ben.covers...@datastax.com
  wrote:
 
  I'm not sure what Carlo is referring to, but generally if you have
 done,
  thousands of migrations you can end up in a situation where the
 migrations
  take a long time to replay, and there are some race conditions that
 can be
  problematic in the case where there are thousands of migrations that
 may
  need to be replayed while a node is bootstrapped. If you get into this
  situation it can be fixed by copying migrations from a known good
 schema to
  the node that you are trying to bootstrap.
 
  Generally I would advise against frequent schema updates. Unlike rows
 in
  column families the schema itself is designed to be relatively static.
 
  On Mon, Jan 30, 2012 at 2:14 PM, Jim Newsham jnews...@referentia.com
 
  wrote:
 
 
  Could you also elaborate for creating/dropping column families?
  We're
  currently working on moving to 1.0 and using dynamically created
 tables, so
  I'm very interested in what issues we might encounter.
 
  So far the only thing I've encountered (with 1.0.7 + hector 1.0-2) is
  that dropping a cf may sometimes fail with UnavailableException.  I
 think
  this happens when the cf is busy being compacted.  When I
 sleep/retry within
  a loop it eventually succeeds.
 
  Thanks,
  Jim
 
 
  On 1/26/2012 7:32 AM, Pierre-Yves Ritschard wrote:
 
   Can you elaborate on the composite type instabilities? Is this
   specific to Hector, as Radim's posts suggest?
   These one-liner answers are quite stressful :)
 
  On Thu, Jan 26, 2012 at 1:28 PM, Carlo Pirescarlopi...@gmail.com
   wrote:
 
  If you need to use composite types and create/drop column families
 on
  the
  fly you must be prepared to instabilities.
 
 
 
 
 
  --
  Ben Coverston
  DataStax -- The Apache Cassandra Company
 
 

 I would call 1.0.7 rock fricken solid. Incredibly stable. It has been
 that way since I updated to 0.8.8, really. TBs of data, billions of
 requests a day, and thanks to JAMM, memtable auto-tuning, and
 other enhancements I rarely, if ever, find a node in a state where it
 requires a restart. My clusters are beast-ing.

 There are always bugs in software, but coming from a guy who ran
 Cassandra 0.6.1, administration on my Cassandra cluster is like a
 vacation now.






Re: Huge amount of empty files in data directory.

2012-03-05 Thread Viktor Jevdokimov
After running Cassandra in production on Windows servers for 2 years, starting
from 0.7 beta2 up to 1.0.7, we moved to Linux and forgot all the hell we had
on Windows. With JNA, the off-heap row cache, and properly working mmap on
Linux, you get much better performance and stability than on Windows, and less
maintenance.

2012/3/1 Henrik Schröder skro...@gmail.com

 Great, thanks!


 /Henrik


 On Thu, Mar 1, 2012 at 13:08, Sylvain Lebresne sylv...@datastax.com wrote:

 It's a bug, namely: https://issues.apache.org/jira/browse/CASSANDRA-3616
 You'd want to upgrade.

 --
 Sylvain

 On Thu, Mar 1, 2012 at 1:01 PM, Henrik Schröder skro...@gmail.com
 wrote:
  Hi,
 
  We're running Cassandra 1.0.6 on Windows, and noticed that the number of
  files in the data directory just keeps growing. We have about 60GB of data
  per node, and we do a major compaction about once a week, but after
  compaction there are a lot of 0-byte temp files and old files that are
  kept for some reason. After 50 days of uptime there were around 5 files in
  each data directory, but when we restarted a server it deleted all the
  unnecessary files and it shrunk down to about 200 files.
 
  We're running without compression, and with the regular compaction
  strategy, not leveldb. I don't remember seeing this behaviour in older
  versions of Cassandra; shouldn't it delete temp files while running? Is it
  possible to force it to delete temp files while running? Is this fixed in
  a later version? Or do we have to periodically restart servers to clean up
  the data directories?
 
 
  /Henrik Schröder





Re: Cassandra cache patterns with thiny and wide rows

2012-03-05 Thread Viktor Jevdokimov
Depends on how large the data set is (specifically the hot data) compared to
available RAM, what the heavy read load is, and what the latency
requirements are.


2012/3/6 Maciej Miklas mac.mik...@googlemail.com

 I've asked this question already on stackoverflow but without an answer, so
 I will try again:


 My use case expects a heavy read load. There are two possible model design
 strategies:

 1. Tiny rows with row cache: in this case a row is small enough to fit into
    RAM and all columns are cached. Read access should be fast.

 2. Wide rows with key cache: wide rows with a large number of columns are
    too big for the row cache. Access to a column subset requires an HDD
    seek.

 As I understand it, using wide rows is a good design pattern. But we would
 need to disable the row cache, so what is the benefit of such a wide row
 (at least for read access)?

 Which approach is better, 1 or 2?



RE: Internal error processing batch_mutate java.util.ConcurrentModificationException

2012-02-07 Thread Viktor Jevdokimov
Yes, the exception is for CounterColumn on Standard column family.

Created https://issues.apache.org/jira/browse/CASSANDRA-3870





Best regards/ Pagarbiai



Viktor Jevdokimov

Senior Developer



Email:  viktor.jevdoki...@adform.com

Phone: +370 5 212 3063. Fax: +370 5 261 0453

J. Jasinskio 16C, LT-01112 Vilnius, Lithuania








Disclaimer: The information contained in this message and attachments is 
intended solely for the attention and use of the named addressee and may be 
confidential. If you are not the intended recipient, you are reminded that the 
information remains the property of the sender. You must not use, disclose, 
distribute, copy, print or rely on this e-mail. If you have received this 
message in error, please contact the sender immediately and irrevocably delete 
this message and any copies.

From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Monday, February 06, 2012 21:03
To: user@cassandra.apache.org
Subject: Re: Internal error processing batch_mutate 
java.util.ConcurrentModificationException

That looks like a bug. Were you writing counters ?


Can you please add it here https://issues.apache.org/jira/browse/CASSANDRA , 
include some information on the request that caused it and email the bug report 
back to the list.

(note to self) I *think* the problem is that the counter WritePerformer 
implementations are put into the REPLICATE_ON_WRITE TP and then update the 
hints on the AbstractWriteResponseHandler asynchronously. This could happen 
after the write thread has moved on to wait on the handlers, which involves 
waiting on the hints futures.

thanks

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 4/02/2012, at 4:49 AM, Viktor Jevdokimov wrote:


What may be cause of the following exception in 1.0.7 Cassandra:

ERROR [Thrift:134] 2012-02-03 15:51:02,800 Cassandra.java (line 3462) Internal 
error processing batch_mutate
java.util.ConcurrentModificationException
at 
java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
at java.util.AbstractList$Itr.next(AbstractList.java:343)
at 
org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:532)
at 
org.apache.cassandra.service.AbstractWriteResponseHandler.waitForHints(AbstractWriteResponseHandler.java:89)
at 
org.apache.cassandra.service.AbstractWriteResponseHandler.get(AbstractWriteResponseHandler.java:58)
at 
org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:201)
at 
org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:639)
at 
org.apache.cassandra.thrift.CassandraServer.internal_batch_mutate(CassandraServer.java:590)
at 
org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:598)
at 
org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:3454)
at 
org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
at 
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)




RE: Concurrent compactors

2012-02-03 Thread Viktor Jevdokimov
My concern is not about cleanup, but about the supposed tendency of small 
sstables to accumulate during a single long-running compaction. When the next 
task is for the same column family as the currently long-running compaction, 
compactions for other column families are frozen and a concurrent_compactors 
> 1 setting just isn't working.





From: aaron morton [mailto:aa...@thelastpickle.com]
Sent: Wednesday, February 01, 2012 21:51
To: user@cassandra.apache.org
Subject: Re: Concurrent compactors

(Assuming 1.0* release)
From the comments in cassandra.yaml

# Number of simultaneous compactions to allow, NOT including
# validation compactions for anti-entropy repair.  Simultaneous
# compactions can help preserve read performance in a mixed read/write
# workload, by mitigating the tendency of small sstables to accumulate
# during a single long running compactions. The default is usually
# fine and if you experience problems with compaction running too
# slowly or too fast, you should look at
# compaction_throughput_mb_per_sec first.
#
# This setting has no effect on LeveledCompactionStrategy.
#
# concurrent_compactors defaults to the number of cores.
# Uncomment to make compaction mono-threaded, the pre-0.8 default.
#concurrent_compactors: 1

If you set it to 1 then only 1 compaction should run at a time, excluding 
validation.

How often do you run a cleanup compaction ? They are only necessary when you 
perform a token move.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 1/02/2012, at 9:48 PM, Viktor Jevdokimov wrote:


Hi,

When concurrent compactors are set to more than 1, it is rare to see more 
than one compaction running in parallel.

I didn't check the source code, but it looks like when the next compaction 
task (any of minor, major, or cleanup) is for the same CF, it will not start 
in parallel, and the tasks after it are not checked.

Would it be possible to check all pending tasks, not only the next one, to 
find which of them can be started?

This matters especially when nightly cleanup is running: a lot of cleanup 
tasks are pending, and regular minor compactions wait until all cleanup 
compactions are finished.




Internal error processing batch_mutate java.util.ConcurrentModificationException

2012-02-03 Thread Viktor Jevdokimov
What may be cause of the following exception in 1.0.7 Cassandra:

ERROR [Thrift:134] 2012-02-03 15:51:02,800 Cassandra.java (line 3462) Internal 
error processing batch_mutate
java.util.ConcurrentModificationException
at 
java.util.AbstractList$Itr.checkForComodification(AbstractList.java:372)
at java.util.AbstractList$Itr.next(AbstractList.java:343)
at 
org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:532)
at 
org.apache.cassandra.service.AbstractWriteResponseHandler.waitForHints(AbstractWriteResponseHandler.java:89)
at 
org.apache.cassandra.service.AbstractWriteResponseHandler.get(AbstractWriteResponseHandler.java:58)
at 
org.apache.cassandra.service.StorageProxy.mutate(StorageProxy.java:201)
at 
org.apache.cassandra.thrift.CassandraServer.doInsert(CassandraServer.java:639)
at 
org.apache.cassandra.thrift.CassandraServer.internal_batch_mutate(CassandraServer.java:590)
at 
org.apache.cassandra.thrift.CassandraServer.batch_mutate(CassandraServer.java:598)
at 
org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:3454)
at 
org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
at 
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)






Concurrent compactors

2012-02-01 Thread Viktor Jevdokimov
Hi,

When concurrent compactors are set to more than 1, it is rare to see more 
than one compaction running in parallel.

I didn't check the source code, but it looks like when the next compaction 
task (any of minor, major, or cleanup) is for the same CF, it will not start 
in parallel, and the tasks after it are not checked.

Would it be possible to check all pending tasks, not only the next one, to 
find which of them can be started?

This matters especially when nightly cleanup is running: a lot of cleanup 
tasks are pending, and regular minor compactions wait until all cleanup 
compactions are finished.
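For illustration, the scheduling change suggested here could look roughly like
this. This is a hypothetical Python sketch of the proposed queue scan, not
Cassandra's actual compaction executor; the task shape (cf_name, kind) is an
assumption made for the example:

```python
from collections import deque

def next_runnable(pending: deque, running_cfs: set):
    """Sketch of the proposal: instead of blocking on the head-of-queue
    task, scan the whole pending queue for the first task whose column
    family is not already being compacted. A task is a (cf_name, kind)
    tuple, e.g. ("Users", "cleanup")."""
    for i, (cf, kind) in enumerate(pending):
        if cf not in running_cfs:
            del pending[i]  # remove the chosen task from the queue
            return (cf, kind)
    return None  # every pending task conflicts with a running compaction
```

With this scan, a backlog of cleanup tasks for one CF would no longer block a
minor compaction for a different CF sitting further back in the queue.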






RE: SQL DB Integration

2012-01-27 Thread Viktor Jevdokimov
Hello Krassimir,

From a typical programmer you would receive the answer that this is 
possible, but whether it is easy or difficult depends.
From a typical consultant you would receive a question: why?

 I am working on a project, for which I have to evaluate and recommend the 
 implementation of a new database system, with the following major 
 characteristics:

 * Operational scalability
Not advisable to do this automatically, do it manually.

 * Low cost
Compared to what?

 * Ability to serve both as a data storage facility and an advanced data 
 manipulation tool
Cassandra is not a data manipulation tool.

 * Speed of execution
Execution of what?

 * Real-time writing capability, with potential to record millions of client 
 data records in real time
Millions per second/minute/hour/day? Isn't any DB capable of this?

 * Flexibility: ability to support all client data types and formats, 
 structured and unstructured
The supported data types are limited; everything else is stored as binary arrays.

 * Capability to support multiple data centers and geographies
Capable.

 * Ability to provide data infrastructure solutions for clients with small and 
 Big Data needs
The same solution for all? Will it be cost/performance/maintenance/support 
effective for all?

 * Full and flawless integration with the following 3 infrastructures:
   (1) A data mining application (IBM SPSS Modeler) that imports/exports data 
 from/to an SQL database
   (2) A partner platform, based on an Oracle Database (CSV data import/export)
   (3) Various client SQL databases, whose data elements will be uploaded and 
 replicated in the recommended database system
Cassandra (like almost any storage) does not provide any integration; 
integration is built on top of the storage APIs.

 As a result to my research, I am planning to recommend the implementation of 
 Apache Cassandra NoSQL DB, hosted on Amazon Elastic Compute Cloud (Amazon 
 EC2). I realize that the biggest challenge from the above 3 points is 
 probably the last one, since for each client we need to custom-build and 
 replicate their database, changing the data model from SQL to NoSQL. The 
 reason being that (1) and (2) relate only to transferring data up and down 
 between SQL and NoSQL environments.

 My question is: how easy/difficult is it to build a GUI/API that will be 
 able to do the integration in the above 3 points with respect to 
 transferring data (upstream/downstream) between the SQL and Cassandra NoSQL 
 environments? Do you have any other comments or suggestions that I should 
 consider?
In my opinion you should research Cassandra against specific questions, not 
global ones. First define the storage requirements from an 
application/functionality perspective, then look for a solution.






Re: How to control location of data?

2012-01-11 Thread Viktor Jevdokimov
The idea behind a client that controls the location of data is performance:
avoiding unnecessary network round-trips between nodes and unnecessary
caching of backup ranges. All of this is mostly true for reads at CL.ONE and
RF 1.

How it works (in our case):

Our client uses describe_ring, which returns the ring for the specified
keyspace with token ranges and replica endpoints for each range. The first
node in the list for a token range is a kind of primary; the others are
backup replicas.

For most single-key requests the client calculates a token and connects to
the node that is primary for this token. If the primary is down, the next
endpoint from the list of endpoints for that token range is used.

This way the network load between nodes is much lower. In our case, when
load balancing just rotated all nodes we saw a 100 Mbps load on a node,
while with the approach above only 25 Mbps.

The caches on a single node are filled with data of that node's primary
range, avoiding the caching of replica ranges that also belong to this node
by RF.

The downside is that when the primary node is not accessible, the backup
node we switch to has no cache for that range.
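The routing described above can be sketched as follows. This is a hypothetical
illustration, not the actual client code from the thread: the `token` function
assumes a RandomPartitioner-style MD5 token in a 127-bit space (other
partitioners compute tokens differently), and the ring shape mirrors what
describe_ring reports — (end_token, [endpoints]) per range, primary first.

```python
import hashlib
from bisect import bisect_left

def token(key: bytes) -> int:
    # RandomPartitioner-style token: MD5 of the key in a 127-bit space.
    return int.from_bytes(hashlib.md5(key).digest(), "big") % (2 ** 127)

class TokenAwareRouter:
    """Route a request to the primary replica for its key.

    ring: list of (end_token, [endpoints]) pairs, as describe_ring would
    report them; the first endpoint of each range is treated as primary."""

    def __init__(self, ring):
        self.ring = sorted(ring)
        self.end_tokens = [t for t, _ in self.ring]

    def endpoints_for(self, key: bytes):
        # First range whose (inclusive) end token is >= the key's token;
        # tokens past the last range wrap around to the first one.
        i = bisect_left(self.end_tokens, token(key)) % len(self.ring)
        return self.ring[i][1]

    def pick(self, key: bytes, down=frozenset()):
        # Prefer the primary; fall back to backup replicas if it is down.
        for endpoint in self.endpoints_for(key):
            if endpoint not in down:
                return endpoint
        raise RuntimeError("no live replica for this key's range")
```

The fallback in `pick` is what produces the cold-cache downside mentioned
above: the backup endpoint only sees that range's traffic when the primary is
down.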


2012/1/11 Andreas Rudolph andreas.rudo...@spontech-spine.com

 Hi!

 ...
 Again, it's probably a bad idea.

 I agree on that, now.

 Thank you.


 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 11/01/2012, at 4:56 AM, Roland Gude wrote:

 Each node in the cluster is assigned a token (can be done automatically,
 but usually should not be).
 The token of a node is the start token of the partition it is responsible
 for (and the token of the next node is the end token of the current token's
 partition).

 Assume you have the following nodes/tokens (which are usually numbers, but
 for the example I will use letters):

 N1/A
 N2/D
 N3/M
 N4/X

 This means that N1 is responsible (primary) for [A-D), N2 for [D-M), N3 for
 [M-X), and N4 for [X-A).

 If you have a replication factor of 1, data will go on the nodes like this:

 B - N1
 E - N2
 X - N4

 And so on.
 If you have a higher replication factor, the placement strategy decides
 which node will take replicas of which partition (becoming the secondary
 node for that partition).
 The simple strategy will just put the replica on the next node in the ring.
 So, the same example as above, but with an RF of 2 and the simple strategy:

 B - N1 and N2
 E - N2 and N3
 X - N4 and N1

 Other strategies can factor in things like "put data in another datacenter"
 or "put data in another rack".

 Even though the terms primary and secondary imply some means of quality or
 consistency, this is not the case. If a node is responsible for a piece of
 data, it will store it.

 Placement of the replicas is usually only relevant for availability reasons
 (i.e. disaster recovery etc.).
 The actual location should mean nothing to most applications, as you can
 ask any node for the data you want and it will provide it to you (fetching
 it from the responsible nodes).
 This should be sufficient in almost all cases.

 So, in the above example again, you can ask N3 "what data is available" and
 it will tell you: B, E and X; or you could ask it "give me X" and it will
 fetch it from N4 or N1 or both of them, depending on the consistency
 configuration, and return the data to you.

 So actually, if you use Cassandra, the actual storage location of the data
 should not matter to the application. It will be available anywhere in the
 cluster as long as it is stored on any reachable node.
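The ring walk in that example can be sketched in a few lines. This is an
illustrative sketch of the simple strategy exactly as described in the
example (a node's token is the START of the range it owns), not Cassandra's
internal implementation:

```python
from bisect import bisect_right

# Ring from the example: (node_token, node). A node's token is the start
# of the range it owns, and the last range wraps around to the first token.
RING = [("A", "N1"), ("D", "N2"), ("M", "N3"), ("X", "N4")]

def replicas(ring, key_token, rf):
    """Primary = node owning the range the key's token falls in (the
    largest node token <= key token, wrapping around the ring); the
    remaining rf-1 replicas are the next nodes clockwise, which is what
    the simple strategy does."""
    tokens = [t for t, _ in ring]
    primary = (bisect_right(tokens, key_token) - 1) % len(ring)
    return [ring[(primary + i) % len(ring)][1] for i in range(rf)]
```

Running it against the example reproduces the placements given above:
`replicas(RING, "B", 2)` yields N1 and N2, `replicas(RING, "E", 2)` yields N2
and N3, and `replicas(RING, "X", 2)` yields N4 and N1.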
 From: Andreas Rudolph [mailto:andreas.rudo...@spontech-spine.com]
 Sent: Tuesday, January 10, 2012 15:06
 To: user@cassandra.apache.org
 Subject: Re: AW: How to control location of data?

 Hi!

 Thank you for your last reply. I'm still wondering if I got you right...

  ...
  A partitioner decides into which partition a piece of data belongs

 Does your statement imply that the partitioner does not take any decisions
 at all on the (physical) storage location? Or, put another way: what do you
 mean by partition?

 To quote http://wiki.apache.org/cassandra/ArchitectureInternals: ...
 AbstractReplicationStrategy controls what nodes get secondary, tertiary,
 etc. replicas of each key range. Primary replica is always determined by
 the token ring (...)

  ...
  You can select different placement strategies and partitioners for
  different keyspaces, thereby choosing known data to be stored on known
  hosts.
  This is however discouraged for various reasons, i.e. you need a lot of
  knowledge about your data to keep the cluster balanced. What is your use
  case for this requirement? There is probably a more suitable solution.

 What we want is to
Leveled compaction strategy and expiring columns

2011-12-19 Thread Viktor Jevdokimov
Hi,

We're trying to understand how leveled compaction works.

The documentation covers new/updated columns only.

What about expiring columns and TTL? When will higher-level sstables be 
compacted and expired columns removed?









RE: [RELEASE] Apache Cassandra 1.0.6 released

2011-12-16 Thread Viktor Jevdokimov
Created https://issues.apache.org/jira/browse/CASSANDRA-3642

-Original Message-
From: Viktor Jevdokimov [mailto:viktor.jevdoki...@adform.com] 
Sent: Thursday, December 15, 2011 18:26
To: user@cassandra.apache.org
Subject: RE: [RELEASE] Apache Cassandra 1.0.6 released

Cassandra 1.0.6 under Windows Server 2008 R2 64-bit with disk access mode 
mmap_index_only fails to delete any *-Index.db files after compaction or 
scrub:

ERROR 13:43:17,490 Fatal exception in thread Thread[NonPeriodicTasks:1,5,main]
java.lang.RuntimeException: java.io.IOException: Failed to delete 
D:\cassandra\data\data\system\LocationInfo-g-29-Index.db
at 
org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:689)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown
 Source)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown
 Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source) Caused by: java.io.IOException: 
Failed to delete D:\cassandra\data\data\system\LocationInfo-g-29-Index.db
at 
org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:54)
at 
org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:44)
at org.apache.cassandra.io.sstable.SSTable.delete(SSTable.java:141)
at 
org.apache.cassandra.io.sstable.SSTableDeletingTask.runMayThrow(SSTableDeletingTask.java:81)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
... 8 more

ERROR 17:20:09,701 Fatal exception in thread Thread[NonPeriodicTasks:1,5,main]
java.lang.RuntimeException: java.io.IOException: Failed to delete 
D:\cassandra\data\data\Keyspace1\ColumnFamily1-hc-840-Index.db
at 
org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:689)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown
 Source)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown
 Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source) Caused by: java.io.IOException: 
Failed to delete D:\cassandra\data\data\ Keyspace1\ColumnFamily1-hc-840-Index.db
at 
org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:54)
at 
org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:44)
at org.apache.cassandra.io.sstable.SSTable.delete(SSTable.java:141)
at 
org.apache.cassandra.io.sstable.SSTableDeletingTask.runMayThrow(SSTableDeletingTask.java:81)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
... 8 more




-Original Message-

From: Sylvain Lebresne [mailto:sylv...@datastax.com]
Sent: Wednesday, December 14, 2011 20:23
To: user@cassandra.apache.org
Subject: [RELEASE] Apache Cassandra 1.0.6 released

The Cassandra team is pleased to announce the release of Apache Cassandra 
version 1.0.6.

Cassandra is a highly scalable second-generation distributed database, bringing 
together Dynamo's fully distributed design and Bigtable's ColumnFamily-based 
data model. You can read more here:

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a maintenance/bug fix release[1]. As always, please pay attention

RE: [RELEASE] Apache Cassandra 1.0.6 released

2011-12-15 Thread Viktor Jevdokimov
Cassandra 1.0.6 under Windows Server 2008 R2 64-bit with disk access mode 
mmap_index_only fails to delete any *-Index.db files after compaction or 
scrub:

ERROR 13:43:17,490 Fatal exception in thread Thread[NonPeriodicTasks:1,5,main]
java.lang.RuntimeException: java.io.IOException: Failed to delete 
D:\cassandra\data\data\system\LocationInfo-g-29-Index.db
at 
org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:689)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown
 Source)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown
 Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: Failed to delete 
D:\cassandra\data\data\system\LocationInfo-g-29-Index.db
at 
org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:54)
at 
org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:44)
at org.apache.cassandra.io.sstable.SSTable.delete(SSTable.java:141)
at 
org.apache.cassandra.io.sstable.SSTableDeletingTask.runMayThrow(SSTableDeletingTask.java:81)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
... 8 more

ERROR 17:20:09,701 Fatal exception in thread Thread[NonPeriodicTasks:1,5,main]
java.lang.RuntimeException: java.io.IOException: Failed to delete 
D:\cassandra\data\data\Keyspace1\ColumnFamily1-hc-840-Index.db
at 
org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:689)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown
 Source)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown
 Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.io.IOException: Failed to delete D:\cassandra\data\data\ 
Keyspace1\ColumnFamily1-hc-840-Index.db
at 
org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:54)
at 
org.apache.cassandra.io.util.FileUtils.deleteWithConfirm(FileUtils.java:44)
at org.apache.cassandra.io.sstable.SSTable.delete(SSTable.java:141)
at 
org.apache.cassandra.io.sstable.SSTableDeletingTask.runMayThrow(SSTableDeletingTask.java:81)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
... 8 more




Best regards/ Pagarbiai

Viktor Jevdokimov
Senior Developer

Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063
Fax: +370 5 261 0453

J. Jasinskio 16C,
LT-01112 Vilnius,
Lithuania



Disclaimer: The information contained in this message and attachments is 
intended solely for the attention and use of the named addressee and may be 
confidential. If you are not the intended recipient, you are reminded that the 
information remains the property of the sender. You must not use, disclose, 
distribute, copy, print or rely on this e-mail. If you have received this 
message in error, please contact the sender immediately and irrevocably delete 
this message and any copies.

-----Original Message-----
From: Sylvain Lebresne [mailto:sylv...@datastax.com]
Sent: Wednesday, December 14, 2011 20:23
To: user@cassandra.apache.org
Subject: [RELEASE] Apache Cassandra 1.0.6 released

The Cassandra team is pleased to announce the release of Apache Cassandra 
version 1.0.6.

Cassandra is a highly scalable second-generation distributed database, bringing 
together Dynamo's fully distributed design and Bigtable's ColumnFamily-based 
data model. You can read more here:

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a maintenance/bug-fix release[1]. As always, please pay attention 
to the release notes[2] and let us know[3] if you encounter any problems.

Have fun!

[1]: http://goo.gl/Pl1TE (CHANGES.txt)
[2]: http://goo.gl/9xHEC (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA



RE: cassandra most stable version ?

2011-12-07 Thread Viktor Jevdokimov
0.8.7






From: Pierre Chalamet [mailto:pie...@chalamet.net]
Sent: Wednesday, December 07, 2011 00:05
To: user@cassandra.apache.org
Subject: cassandra most stable version ?

Hello,

Recent problems with the Cassandra 1.0.x versions seem to suggest it is still not 
ready for prime time.

We are currently using version 0.8.5 on our development cluster. Although we 
have not seen many problems with it, maybe a more recent 0.8.x version would 
be safer to use.

So what version are you running in production? What kinds of problems do you 
encounter, if any?

Thanks,
- Pierre

NetworkTopologyStrategy bug?

2011-12-01 Thread Viktor Jevdokimov
Assume for now we have 1 DC and 1 rack with 3 nodes. The ring looks like this 
(we use our own snitch, which returns DC=0, Rack=0 in this case):

Address    DC  Rack  Token
                     113427455640312821154458202477256070484
10.0.0.1   0   0     0
10.0.0.2   0   0     56713727820156410577229101238628035242
10.0.0.3   0   0     113427455640312821154458202477256070484

Schema: ReplicaPlacementStrategy=NetworkTopologyStrategy, options: [0:2] (2 
replicas in DC 0).
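As a side note, the initial tokens in the ring above are the standard evenly spaced RandomPartitioner tokens for 3 nodes. A minimal sketch of the arithmetic (my own illustration, not code from Cassandra):

```java
import java.math.BigInteger;

public class InitialTokens {
    public static void main(String[] args) {
        // RandomPartitioner's token space is [0, 2^127); the usual
        // initial_token recipe assigns node i the token i * (2^127 / N).
        BigInteger step = BigInteger.valueOf(2).pow(127)
                                    .divide(BigInteger.valueOf(3));
        for (int i = 0; i < 3; i++) {
            System.out.println("10.0.0." + (i + 1) + " -> "
                    + step.multiply(BigInteger.valueOf(i)));
        }
        // Prints 0, 56713727820156410577229101238628035242 and
        // 113427455640312821154458202477256070484, matching the ring above.
    }
}
```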

When trying to run cleanup (same problem with repair), Cassandra reports:

From 10.0.0.1:
DEBUG [time] 10.0.0.2,10.0.0.3 endpoints in datacenter 0 for token 0
DEBUG [time] 10.0.0.2,10.0.0.3 endpoints in datacenter 0 for token 56713727820156410577229101238628035242
DEBUG [time] 10.0.0.3,10.0.0.2 endpoints in datacenter 0 for token 113427455640312821154458202477256070484
INFO [time] Cleanup cannot run before a node has joined the ring

From 10.0.0.2:
DEBUG [time] 10.0.0.1,10.0.0.3 endpoints in datacenter 0 for token 0
DEBUG [time] 10.0.0.1,10.0.0.3 endpoints in datacenter 0 for token 56713727820156410577229101238628035242
DEBUG [time] 10.0.0.3,10.0.0.1 endpoints in datacenter 0 for token 113427455640312821154458202477256070484
INFO [time] Cleanup cannot run before a node has joined the ring

From 10.0.0.3:
DEBUG [time] 10.0.0.1,10.0.0.2 endpoints in datacenter 0 for token 0
DEBUG [time] 10.0.0.1,10.0.0.2 endpoints in datacenter 0 for token 56713727820156410577229101238628035242
DEBUG [time] 10.0.0.2,10.0.0.1 endpoints in datacenter 0 for token 113427455640312821154458202477256070484
INFO [time] Cleanup cannot run before a node has joined the ring

To me this means that each node thinks the whole data range belongs to the other 
two nodes.

As a result:

A WRITE request for any key/token sent to 10.0.0.1 as coordinator will be 
forwarded to and saved on 10.0.0.2 and 10.0.0.3.
A READ request at CL.ONE for any key/token sent to 10.0.0.2 as coordinator 
will be forwarded to 10.0.0.1 or 10.0.0.3; since 10.0.0.1 cannot have the data 
from the write above, some requests fail and some don't (when 10.0.0.3 answers).
Moreover, every READ request to any node will be forwarded to another node.

This is what we see right now with 0.8.6 through 1.0.5, both with 3 nodes in 1 DC 
and with 8x2 nodes.







RE: NetworkTopologyStrategy bug?

2011-12-01 Thread Viktor Jevdokimov
Sorry, the bug was in our snitch. We were using getHostName() instead of 
getCanonicalHostName() to determine the DC & rack, and since for the local node 
it returns an alias instead of the reverse-DNS name, the DC & rack numbers were 
not as expected.
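For anyone else writing a custom snitch, a minimal sketch of the difference (my own illustration; the printed names depend on your DNS and hosts-file setup, so no particular output is guaranteed):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostNameCheck {
    public static void main(String[] args) throws UnknownHostException {
        InetAddress local = InetAddress.getLocalHost();
        // getHostName() can return a locally configured alias
        // (e.g. from /etc/hosts), while getCanonicalHostName()
        // performs a reverse-DNS lookup for the fully qualified name,
        // which is what a naming-convention-based snitch should parse.
        System.out.println("getHostName():          " + local.getHostName());
        System.out.println("getCanonicalHostName(): " + local.getCanonicalHostName());
    }
}
```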





Best regards/ Pagarbiai



Viktor Jevdokimov

Senior Developer



Email:  viktor.jevdoki...@adform.com

Phone: +370 5 212 3063. Fax: +370 5 261 0453

J. Jasinskio 16C, LT-01112 Vilnius, Lithuania






[Adform news]http://www.adform.com/

[Visit us!]

Follow:


[twitter]http://twitter.com/#!/adforminsider

Visit our bloghttp://www.adform.com/site/blog



Disclaimer: The information contained in this message and attachments is 
intended solely for the attention and use of the named addressee and may be 
confidential. If you are not the intended recipient, you are reminded that the 
information remains the property of the sender. You must not use, disclose, 
distribute, copy, print or rely on this e-mail. If you have received this 
message in error, please contact the sender immediately and irrevocably delete 
this message and any copies.

From: Viktor Jevdokimov [mailto:viktor.jevdoki...@adform.com]
Sent: Thursday, December 01, 2011 14:05
To: user@cassandra.apache.org
Subject: NetworkTopologyStrategy bug?



Can't run cleanup

2011-11-30 Thread Viktor Jevdokimov
Cassandra version 0.8.7: after adding new nodes we can't run cleanup on any 
node.

The log reports: Cleanup cannot run before a node has joined the ring

The new nodes have joined (one by one); all nodes are up and running, reading and writing.
No streams have been sent or received on any node for more than 12 hours.
Nodetool's info/ring/tpstats/netstats output looks fine on all nodes.

A restart doesn't help.







RE: Can't run cleanup

2011-11-30 Thread Viktor Jevdokimov
Nodetool repair also doesn't start on any node; the log reports:
INFO 15:57:51,070 Starting repair command #2, repairing 0 ranges.
INFO 15:57:51,070 Repair command #2 completed successfully
Regular read repairs are working, as are reads and writes.





From: Viktor Jevdokimov [mailto:viktor.jevdoki...@adform.com]
Sent: Wednesday, November 30, 2011 15:14
To: user@cassandra.apache.org
Subject: Can't run cleanup



RE: Cassandra 1.x and proper JNA setup

2011-11-02 Thread Viktor Jevdokimov
Up, also interested in answers to questions below.


-----Original Message-----
From: Maciej Miklas [mailto:mac.mik...@googlemail.com]
Sent: Tuesday, November 01, 2011 11:15
To: user@cassandra.apache.org
Subject: Cassandra 1.x and proper JNA setup

Hi all,

is there any documentation about proper JNA configuration?

I do not understand a few things:

1) Does JNA use JVM heap settings?

2) Do I need to decrease max heap size while using JNA?

3) How do I limit RAM allocated by JNA?

4) Where can I see / monitor row cache size?

5) I've configured JNA just for a test on my dev computer and so far I've noticed 
serious performance issues (high CPU usage under heavy write load), so I must be 
doing something wrong. I've just copied the JNA jars into Cassandra/lib without 
installing any native libs; this should not work at all, right?

Thanks,
Maciej



RE: [RELEASE] Apache Cassandra 1.0 released

2011-10-18 Thread Viktor Jevdokimov
Congrats!!!


-----Original Message-----
From: Sylvain Lebresne [mailto:sylv...@datastax.com]
Sent: Tuesday, October 18, 2011 15:02
To: user@cassandra.apache.org
Subject: [RELEASE] Apache Cassandra 1.0 released

The Cassandra team is very pleased to announce the release of Apache Cassandra 
version 1.0.0. Cassandra 1.0.0 is a new major release that builds upon the 
awesomeness of previous versions and adds numerous improvements[1,2], amongst
which:
  - Compression of on-disk data files (SSTables), with checksummed blocks to
protect against bitrot[4].
  - Improvements to memory management through off-heap caches, arena
allocation and automatic self-tuning, for fewer GC pauses and more
predictable performance[5].
  - Better disk-space management: better control of the space taken by commit
logs and immediate deletion of obsolete data files.
  - New optional leveled compaction strategy with more predictable performance
and fixed sstable size[6].
  - Improved hinted handoffs, leading to less need for read repair and
better read performance.
  - Lots of improvements to performance[7], CQL, repair, easier operation,
etc[8]...

And as has been the rule for some time now, rolling upgrades from previous versions 
are supported, so there is nothing stopping you from getting all those goodies right 
now!

Both source and binary distributions of Cassandra 1.0.0 can be downloaded at:

 http://cassandra.apache.org/download/

Or you can use the debian package available from the project APT repository[3] 
(you will need to use the 10x series).

The download page also links to the CQL drivers, which, from this release on, are 
maintained out of tree[9].


That's all folks!

[1]: http://goo.gl/t3qpw (CHANGES.txt)
[2]: http://goo.gl/6t0qN (NEWS.txt)
[3]: http://wiki.apache.org/cassandra/DebianPackaging
[4]: http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression
[5]: 
http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-improved-memory-and-disk-space-management
[6]: http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra
[7]: http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-performance
[8]: 
http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-windows-service-new-cql-clients-and-more
[9]: http://acunu.com/blogs/eric-evans/cassandra-drivers-released/




Build Cassandra under Windows

2011-09-23 Thread Viktor Jevdokimov
Hello,

I'm trying to build the Cassandra 0.8 and 1.0.0 branches on Windows, with no
success; I'm getting errors:

...
maven-ant-tasks-retrieve-build:
[artifact:dependencies] Downloading: asm/asm/3.2/asm-3.2-sources.jar from
repository central at http://repo1.maven.org/maven2
[artifact:dependencies] Unable to locate resource in repository
[artifact:dependencies] [INFO] Unable to find resource
'asm:asm:java-source:sources:3.2' in repository central (
http://repo1.maven.org/maven2)
[artifact:dependencies] Downloading: asm/asm/3.2/asm-3.2-sources.jar from
repository apache at
https://repository.apache.org/content/repositories/releases
[artifact:dependencies] Unable to locate resource in repository
...
and so on.

I have checked build/build-dependencies.xml and all files referenced are
downloaded to local maven repository (${user.home}/.m2/repository)
successfully.

Environment:
Windows 7 Professional x64
Ant 1.8.2
JDK 1.6.0 b27

I'm a .NET developer with no experience building Java projects with Ant.

What have I missed?


Thanks,
Viktor


RE: Build Cassandra under Windows

2011-09-23 Thread Viktor Jevdokimov
Solved. I just used the appropriate Ant targets to get the jars built.
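For the archives, a sketch of the kind of invocation that sidesteps the failing maven-ant-tasks source-jar resolution; the `jar` and `clean` target names come from Cassandra's build.xml of that era, so verify them against your checkout:

```shell
# Build only the Cassandra jars instead of the default target,
# avoiding the optional sources-jar downloads that failed above:
ant jar

# If a previous partial build gets in the way, clean first:
ant clean jar
```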





From: Viktor Jevdokimov [mailto:vjevdoki...@gmail.com]
Sent: Friday, September 23, 2011 10:02
To: user@cassandra.apache.org
Subject: Build Cassandra under Windows


RE: How to enable JNA for Cassandra on Windows?

2011-09-23 Thread Viktor Jevdokimov
I found that there's no C library under Windows, and msvcrt does not provide the 
mlockall function, so currently there's no way to use JNA under Windows. In that 
case, is mmap not a good idea?






From: Viktor Jevdokimov [mailto:vjevdoki...@gmail.com]
Sent: Thursday, September 22, 2011 15:01
To: user@cassandra.apache.org
Subject: How to enable JNA for Cassandra on Windows?

Hi,

I'm trying without success to enable JNA for Cassandra on Windows.

I tried placing the JNA 3.3.0 libs jna.jar and platform.jar into the Cassandra 0.8.6 lib 
dir, but I'm getting this in the log:
Unable to link C library. Native methods will be disabled.

What is missing, or what is wrong?

One thing I've found on the internet about JNA and Windows is this sample:



// Library is "c" for unix and "msvcrt" for windows
String libName = "c";
if (System.getProperty("os.name").contains("Windows"))
{
    libName = "msvcrt";
}

// Loading the library dynamically
CInterface demo = (CInterface) Native.loadLibrary(libName, CInterface.class);

from http://www.scriptol.com/programming/jna.php

while in Cassandra:



try
{
    Native.register("c");
}
catch (NoClassDefFoundError e)
{
    logger.info("JNA not found. Native methods will be disabled.");
}
catch (UnsatisfiedLinkError e)
{
    logger.info("Unable to link C library. Native methods will be disabled.");
}
catch (NoSuchMethodError e)
{
    logger.warn("Obsolete version of JNA present; unable to register C library. Upgrade to JNA 3.2.7 or later");
}

Is it true that for Windows Cassandra should do something like:



if (System.getProperty("os.name").contains("Windows"))
{
    Native.register("msvcrt");
}
else
{
    Native.register("c");
}


Thanks
Viktor
