Re: COPY command to export a table to CSV file
hi, what happens if the unloader meets a blob field?

2015-04-20 23:43 GMT+02:00 Sebastian Estevez sebastian.este...@datastax.com:

Try Brian's cassandra-unloader:
https://github.com/brianmhess/cassandra-loader#cassandra-unloader

All the best,
Sebastián Estévez
Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
https://www.linkedin.com/company/datastax | https://www.facebook.com/datastax | https://twitter.com/datastax | https://plus.google.com/+Datastax/about
http://cassandrasummit-datastax.com/

DataStax is the fastest, most scalable distributed database technology, delivering Apache Cassandra to the world's most innovative enterprises. DataStax is built to be agile, always-on, and predictably scalable to any size. With more than 500 customers in 45 countries, DataStax is the database technology and transactional backbone of choice for the world's most innovative companies such as Netflix, Adobe, Intuit, and eBay.

On Mon, Apr 20, 2015 at 12:31 PM, Neha Trivedi nehajtriv...@gmail.com wrote:

"Are the nproc, nofile, memlock settings in /etc/security/limits.d/cassandra.conf set to optimum values?" It's all default.
"What is the consistency level?" CL = QUORUM.
Is there any other way to export a table to CSV?
regards, Neha

On Mon, Apr 20, 2015 at 12:21 PM, Kiran mk coolkiran2...@gmail.com wrote:

Hi, thanks for the info. Are the nproc, nofile, and memlock settings in /etc/security/limits.d/cassandra.conf set to optimum values? What is the consistency level?
Best Regards, Kiran.M.K.

On Mon, Apr 20, 2015 at 11:55 AM, Neha Trivedi nehajtriv...@gmail.com wrote:

"What is the count of records in the column-family?" We have about 38,000 rows in the column family we are trying to export.
"What is the Cassandra version?" We are using Cassandra 2.0.11. MAX_HEAP_SIZE and HEAP_NEWSIZE are the defaults. The server has 8 GB.
regards, Neha

On Mon, Apr 20, 2015 at 11:39 AM, Kiran mk coolkiran2...@gmail.com wrote:

Hi, check the MAX_HEAP_SIZE configuration in the cassandra-env.sh environment file, and also HEAP_NEWSIZE. What consistency level are you using?
Best Regards, Kiran.M.K.

On Mon, Apr 20, 2015 at 11:13 AM, Kiran mk coolkiran2...@gmail.com wrote:

Seems like this is related to Java heap memory. What is the count of records in the column-family? What is the Cassandra version?
Best Regards, Kiran.M.K.

On Mon, Apr 20, 2015 at 11:08 AM, Neha Trivedi nehajtriv...@gmail.com wrote:

Hello all, we are getting an OutOfMemoryError on one of the nodes, and the node goes down, when we run the export command to get all the data from a table.
Regards, Neha

ERROR [ReadStage:532074] 2015-04-09 01:04:00,603 CassandraDaemon.java (line 199) Exception in thread Thread[ReadStage:532074,5,main]
java.lang.OutOfMemoryError: Java heap space
    at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:347)
    at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:392)
    at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:355)
    at org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:124)
    at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:85)
    at org.apache.cassandra.db.Column$1.computeNext(Column.java:75)
    at org.apache.cassandra.db.Column$1.computeNext(Column.java:64)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:88)
    at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:37)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:82)
    at org.apache.cassandra.db.columniterator.LazyColumnIterator.computeNext(LazyColumnIterator.java:82)
    at org.apache.cassandra.db.columniterator.LazyColumnIterator.computeNext(LazyColumnIterator.java:59)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:157)
    at org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:140)
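The question "Is there any other way to export a table to CSV?" can be answered in principle by any exporter that streams rows in small pages instead of materializing the whole table, which is what the OOM suggests is happening. Below is a minimal, hedged sketch of that pattern in Python: the row source here is a simulated generator standing in for a paged Cassandra result set (a real exporter would use a driver with a bounded fetch size); `export_to_csv` and `batch_size` are names invented for this illustration.

```python
import csv
import io

def export_to_csv(rows, out, batch_size=1000):
    """Stream rows to CSV in small batches so memory stays bounded.

    `rows` is any iterable of dicts; at most one batch is held in
    memory at a time, regardless of table size.
    """
    writer = None
    batch = []
    for row in rows:
        if writer is None:
            # Derive the header from the first row's columns.
            writer = csv.DictWriter(out, fieldnames=list(row.keys()))
            writer.writeheader()
        batch.append(row)
        if len(batch) >= batch_size:
            writer.writerows(batch)
            batch.clear()
    if writer is not None and batch:
        writer.writerows(batch)  # flush the final partial batch

# Simulated result set: 38,000 rows, matching the table size in the thread.
rows = ({"id": i, "name": "row%d" % i} for i in range(38000))
buf = io.StringIO()
export_to_csv(rows, buf)
```

The key point is that the generator is consumed lazily; nothing forces all 38,000 rows into the heap at once, which is the failure mode the COPY command apparently hit.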
Re: COPY command to export a table to CSV file
Try Brian's cassandra-unloader:
https://github.com/brianmhess/cassandra-loader#cassandra-unloader

All the best,
Sebastián Estévez
Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

On Mon, Apr 20, 2015 at 12:31 PM, Neha Trivedi nehajtriv...@gmail.com wrote:
[quoted thread as above: default limits.d settings, CL = QUORUM, 38,000 rows, Cassandra 2.0.11, default heap on an 8 GB server, and the OutOfMemoryError stack trace]
Bootstrap performance.
Hi guys,

We have a 100+ node cluster; each node has about 400 GB of data and runs on flash disk. We are running 2.1.2.

When I bring a new node into the cluster, it introduces significant load to the cluster. On the new node, CPU usage is 100%, but disk write I/O is only around 50 MB/s, while we have a 10G network. Does that sound normal to you?

Here are some iostat and vmstat metrics:

iostat:
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          88.52   3.99     4.11     0.00    0.00   3.38

Device:   tps     MB_read/s  MB_wrtn/s  MB_read  MB_wrtn
sda       1.00    0.00       0.04       0        0
sdb       156.50  0.00       55.62      0        1

vmstat:
138 0 0 86781912 438780 101523368 0 0 0 31893 264496 247316 95 4 1 0 0  2015-04-21 01:04:01 UTC
147 0 0 86562400 438780 101607248 0 0 0 90510 456635 245849 91 5 4 0 0  2015-04-21 01:04:03 UTC
143 0 0 86341168 438780 101692224 0 0 0 32392 284495 273656 92 4 4 0 0  2015-04-21 01:04:05 UTC

Thanks.
-- Dikang
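A quick back-of-envelope check on the numbers reported above (not part of the original message, just arithmetic on its figures): streaming ~400 GB at the observed 50 MB/s disk write rate takes a couple of hours, and 50 MB/s is far below what a 10G network can carry, which is consistent with the bottleneck being CPU rather than disk or network.

```python
data_gb = 400      # per-node data size from the message
rate_mb_s = 50     # observed disk write throughput on the new node

seconds = data_gb * 1024 / rate_mb_s
hours = seconds / 3600

# A 10G network can sustain roughly 1.25 GB/s of raw throughput,
# so 50 MB/s uses only about 4% of the link.
network_capacity_mb_s = 10_000 / 8  # 10 Gb/s expressed in MB/s
link_utilization = rate_mb_s / network_capacity_mb_s
```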
Re: timeout creating table
Can you grep for GCInspector in your system.log? Maybe you have long GC pauses.

All the best,
Sebastián Estévez
Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

On Mon, Apr 20, 2015 at 12:19 PM, Jimmy Lin y2klyf+w...@gmail.com wrote:

Yes, sometimes it is create table and sometimes it is create index. It doesn't happen all the time, but it feels like if multiple tests try to do schema changes (create or drop), Cassandra has a long delay on the schema change statements. I also just read about auto_snapshot, and I turned it off, but still no luck.

On Mon, Apr 20, 2015 at 6:42 AM, Jim Witschey jim.witsc...@datastax.com wrote:

Jimmy, what's the exact command that produced this trace? Are you saying that the 16-second wait in your trace is what times out in your CREATE TABLE statements?
Jim Witschey
Software Engineer in Test | jim.witsc...@datastax.com

On Sun, Apr 19, 2015 at 7:13 PM, Jimmy Lin y2klyf+w...@gmail.com wrote:

hi, we have some unit tests that run in parallel and create temporary keyspaces and tables, then drop them after the tests are done. From time to time, our CREATE TABLE statements run into "All host(s) for query failed... Timeout during read" errors (from the DataStax driver).
We later turned on tracing and recorded the following. See below between the === markers: between the Native-Transport-Requests thread and the MigrationStage thread there was a gap of about 16 seconds. Any idea what Cassandra was doing for those 16 seconds? We can work around it by increasing our DataStax driver timeout value, but is there actually a better way to solve this?

thanks

tracing (session_id | event_id | activity | source | source_elapsed | thread):

5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d97-e6e2-11e4-823d-93572f3db015 | Key cache hit for sstable 95588 | 127.0.0.1 | 1592 | Native-Transport-Requests:102
5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d98-e6e2-11e4-823d-93572f3db015 | Seeking to partition beginning in data file | 127.0.0.1 | 1593 | Native-Transport-Requests:102
5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d99-e6e2-11e4-823d-93572f3db015 | Merging data from memtables and 3 sstables | 127.0.0.1 | 1595 | Native-Transport-Requests:102
=====
5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d9a-e6e2-11e4-823d-93572f3db015 | Read 3 live and 0 tombstoned cells | 127.0.0.1 | 1610 | Native-Transport-Requests:102
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a40-e6e2-11e4-823d-93572f3db015 | Executing seq scan across 1 sstables for (min(-9223372036854775808), min(-9223372036854775808)] | 127.0.0.1 | 16381594 | MigrationStage:1
=====
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a41-e6e2-11e4-823d-93572f3db015 | Seeking to partition beginning in data file | 127.0.0.1 | 16381782 | MigrationStage:1
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a42-e6e2-11e4-823d-93572f3db015 | Read 0 live and 0 tombstoned cells | 127.0.0.1 | 16381787 | MigrationStage:1
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a43-e6e2-11e4-823d-93572f3db015 | Seeking to partition beginning in data file | 127.0.0.1 | 16381789 | MigrationStage:1
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a44-e6e2-11e4-823d-93572f3db015 | Read 0 live and 0 tombstoned cells | 127.0.0.1 | 16381791 | MigrationStage:1
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a45-e6e2-11e4-823d-93572f3db015 | Seeking to partition beginning in data file | 127.0.0.1 | 16381792 | MigrationStage:1
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a46-e6e2-11e4-823d-93572f3db015 | Read 0 live and 0 tombstoned cells | 127.0.0.1 | 16381794 | MigrationStage:1
. . .
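For readers following along, the "16 seconds" figure in the trace comes straight from the source_elapsed column, which is microseconds since the request started. A tiny calculation using the two values that bracket the gap:

```python
# source_elapsed values (microseconds) from the trace rows around the gap
elapsed_last_read_us = 1610          # last Native-Transport-Requests:102 event
elapsed_migration_us = 16381594      # first MigrationStage:1 event

gap_seconds = (elapsed_migration_us - elapsed_last_read_us) / 1_000_000
# The request sat for roughly 16.4 seconds before MigrationStage:1 ran,
# which matches the ~16-second wait described in the message.
```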
Re: COPY command to export a table to CSV file
Blobs are ByteBuffers; it calls getBytes().toString():
https://github.com/brianmhess/cassandra-loader/blob/master/src/main/java/com/datastax/loader/parser/ByteBufferParser.java#L35

All the best,
Sebastián Estévez
Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

On Mon, Apr 20, 2015 at 5:47 PM, Serega Sheypak serega.shey...@gmail.com wrote:

hi, what happens if the unloader meets a blob field?

2015-04-20 23:43 GMT+02:00 Sebastian Estevez sebastian.este...@datastax.com:

Try Brian's cassandra-unloader:
https://github.com/brianmhess/cassandra-loader#cassandra-unloader
[rest of quoted thread as above]
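Whatever a given unloader does internally with ByteBuffers, the safe way to put a blob column into CSV is a hex encoding; cqlsh renders blobs as 0x-prefixed hex, for example. A small sketch of that round trip (the function names here are invented for illustration, not part of any tool in this thread):

```python
def blob_to_csv_field(b: bytes) -> str:
    """Render a blob as a 0x-prefixed hex string, the form cqlsh
    uses, so the value stays text-safe and round-trippable in CSV."""
    return "0x" + b.hex()

def csv_field_to_blob(s: str) -> bytes:
    """Parse a hex field back into raw bytes, with or without 0x."""
    return bytes.fromhex(s[2:] if s.startswith("0x") else s)

payload = bytes([0, 255, 16, 32])
field = blob_to_csv_field(payload)
restored = csv_field_to_blob(field)
```

The contrast with a naive toString()-style conversion is the point: stringifying a buffer object loses the bytes, while hex encoding preserves them exactly.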
Re: CQL 3.x Update ...USING TIMESTAMP...
Tyler, I can consider trying out lightweight transactions, but here are my concerns:

1. We have two data centers located close by, with plans to expand to more data centers that are even further away geographically.
2. How will lightweight transactions be impacted when there is a high level of network contention for cross-data-center traffic?
3. Do you know of any real examples where companies have used lightweight transactions with multi-data-center traffic?

Regards, Sachin

On Tue, Mar 24, 2015 at 10:56 AM, Tyler Hobbs ty...@datastax.com wrote:

"do you just mean that it's easy to forget to always set your timestamp correctly, and if you goof it up, it makes it difficult to recover from (i.e. you issue a delete with system timestamp instead of document version, and that's way larger than your document version would ever be, so you can never write that document again)?"

Yes, that's basically what I meant. Plus, if you need to make a manual correction to a document, you'll need to increment the version, which would presumably cause problems for your application. It's possible to handle all of this correctly if you take care, but I wouldn't trust myself to always get it right.

"With your recommendation, won't I end up saving all the version(s) of the document? In my case the document is pretty huge (~5 MB) and each document has up to 10 versions. And you already highlighted that lightweight transactions are very expensive."

You can always delete older versions to free up space. Using lightweight transactions may be a decent option if you don't have really high write throughput and aren't expecting high contention (which I don't think you are). I recommend testing this out with your application to see how it performs for you.

On Sun, Mar 22, 2015 at 7:02 PM, Sachin Nikam skni...@gmail.com wrote:

@Eric Stevens: Thanks for representing my position while I came back to this thread.
@Tyler: With your recommendation, won't I end up saving all the version(s) of the document? In my case the document is pretty huge (~5 MB) and each document has up to 10 versions. And you already highlighted that lightweight transactions are very expensive. Also, as Eric mentions, can you elaborate on what kind of problems could happen when we try to overwrite or delete data?
Regards, Sachin

On Fri, Mar 13, 2015 at 4:23 AM, Brice Dutheil brice.duth...@gmail.com wrote:

I agree with Tyler: in the normal run of a live application I would not recommend using the timestamp; use other ways to *version* *inserts*. Otherwise you may fall into the *upsert* pitfalls that Tyler mentions.
However, I find there is a legitimate use of the USING TIMESTAMP trick when migrating data from another datastore. The trick is, at some point, to let the application start writing to Cassandra *without* any timestamp set on the statements (for fresh data), and then start a migration batch that uses a write time with an older date, i.e. one where there is *no* possible *collision* with other data (for older data). *This trick has been used in prod with billions of records.*
-- Brice

On Thu, Mar 12, 2015 at 10:42 PM, Eric Stevens migh...@gmail.com wrote:

Ok, but if you're using a system of time that isn't server-clock oriented (Sachin's document revision ID, and my fixed and necessarily consistent base timestamp [B's always know their parent A's exact recorded timestamp]), isn't the principle of using timestamps to force a particular update out of several to win still sound? As for "using the clocks is only valid if clocks are perfectly sync'ed, which they are not": clock skew is a problem which doesn't seem to be a factor in either use case, given that both have a consistent external source of truth for the timestamp.

On Thu, Mar 12, 2015 at 12:58 PM, Jonathan Haddad j...@jonhaddad.com wrote:

In most datacenters you're going to see significant variance in your server times, likely 20 ms between servers in the same rack. Even Google, using atomic clocks, has 1-7 ms variance. [1] I would +1 Tyler's advice here, as using the clocks is only valid if clocks are perfectly sync'ed, which they are not, and likely never will be in our lifetime.
[1] http://queue.acm.org/detail.cfm?id=2745385

On Thu, Mar 12, 2015 at 7:04 AM, Eric Stevens migh...@gmail.com wrote:

"It's possible, but you'll end up with problems when attempting to overwrite or delete entries": I'm wondering if you can elucidate on that a little. Do you just mean that it's easy to forget to always set your timestamp correctly, and if you goof it up, it makes it difficult to recover from (i.e. you issue a delete with the system timestamp instead of the document version, and that's way larger than your document version would ever be, so you can never write that document again)? Or is there some bug in write timestamps that can cause the wrong entry to win the write contention? We're looking at doing
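The pitfall the thread keeps circling (a delete issued with a microsecond system-clock timestamp permanently shadowing version-numbered writes) can be shown with a toy model of Cassandra's last-write-wins cell reconciliation. This is a deliberately simplified sketch, not Cassandra's actual storage code: for a given cell, the write or tombstone with the highest timestamp wins.

```python
class Cell:
    """Toy model of last-write-wins reconciliation for one cell."""

    def __init__(self):
        self.value = None
        self.ts = -1

    def write(self, value, ts):
        # A write (value) or tombstone (None) only takes effect if its
        # timestamp exceeds the highest timestamp seen so far.
        if ts > self.ts:
            self.value, self.ts = value, ts

doc = Cell()
doc.write("v1 body", ts=1)   # document version 1 used as the timestamp
doc.write("v2 body", ts=2)   # version 2 supersedes it, as intended

# The mistake discussed above: a delete issued with the system clock
# in microseconds, vastly larger than any document version number.
doc.write(None, ts=1_429_500_000_000_000)

doc.write("v3 body", ts=3)   # version 3 silently loses, forever
```

Running this shows why recovery is hard: every future version-numbered write compares lower than the tombstone's timestamp, so the document can never be written again without abandoning the versioning scheme.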
Re: COPY command to export a table to CSV file
Thanks Sebastian, I will try it out. But I am also curious why the COPY command is failing with an OutOfMemoryError.

regards, Neha

On Tue, Apr 21, 2015 at 4:35 AM, Sebastian Estevez sebastian.este...@datastax.com wrote:

Blobs are ByteBuffers; it calls getBytes().toString():
https://github.com/brianmhess/cassandra-loader/blob/master/src/main/java/com/datastax/loader/parser/ByteBufferParser.java#L35
[rest of quoted thread as above]
Re: timeout creating table
hi, there were only a few (4 of them across 4 minutes with around 200ms), so shouldn't be the reason The system log has tons of INFO [MigrationStage:1] 2015-04-20 11:03:21,880 ColumnFamilyStore.java (line 633) Enqueuing flush of Memtable-schema_keyspaces@2079381036(138/1215 serialized/live bytes, 3 ops) INFO [MigrationStage:1] 2015-04-20 11:03:21,900 ColumnFamilyStore.java (line 633) Enqueuing flush of Memtable-schema_columnfamilies@1283263314(1036/3946 serialized/live bytes, 24 ops) INFO [MigrationStage:1] 2015-04-20 11:03:21,921 ColumnFamilyStore.java (line 633) Enqueuing flush of Memtable-schema_columns But that could be just normal given that our unit tests are doing lot of droping keyspace and creating keyspace/tables. I read the MigrationStage thread pool is default to one, so wondering if that could be a reason it may be doing something that block others? On Mon, Apr 20, 2015 at 2:40 PM, Sebastian Estevez sebastian.este...@datastax.com wrote: Can you grep for GCInspector in your system.log? Maybe you have long GC pauses. All the best, [image: datastax_logo.png] http://www.datastax.com/ Sebastián Estévez Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com [image: linkedin.png] https://www.linkedin.com/company/datastax [image: facebook.png] https://www.facebook.com/datastax [image: twitter.png] https://twitter.com/datastax [image: g+.png] https://plus.google.com/+Datastax/about http://feeds.feedburner.com/datastax http://cassandrasummit-datastax.com/ DataStax is the fastest, most scalable distributed database technology, delivering Apache Cassandra to the world’s most innovative enterprises. Datastax is built to be agile, always-on, and predictably scalable to any size. With more than 500 customers in 45 countries, DataStax is the database technology and transactional backbone of choice for the worlds most innovative companies such as Netflix, Adobe, Intuit, and eBay. 
On Mon, Apr 20, 2015 at 12:19 PM, Jimmy Lin y2klyf+w...@gmail.com wrote: Yes, sometimes it is create table and sometime it is create index. It doesn't happen all the time, but feel like if multiple tests trying to do schema change(create or drop), Cassandra has a long delay on the schema change statements. I also just read about auto_snapshot, and I turn it off but still no luck. On Mon, Apr 20, 2015 at 6:42 AM, Jim Witschey jim.witsc...@datastax.com wrote: Jimmy, What's the exact command that produced this trace? Are you saying that the 16-second wait in your trace what times out in your CREATE TABLE statements? Jim Witschey Software Engineer in Test | jim.witsc...@datastax.com On Sun, Apr 19, 2015 at 7:13 PM, Jimmy Lin y2klyf+w...@gmail.com wrote: hi, we have some unit tests that run parallel that will create tmp keyspace, and tables and then drop them after tests are done. From time to time, our create table statement run into All hosts(s) for query failed... Timeout during read (from datastax driver) error. We later turn on tracing, and record something in the following. See below between === , Native_Transport-Request thread and MigrationStage thread, there was like 16 seconds doing something. Any idea what that 16 seconds Cassandra was doing? We can work around that but increasing our datastax driver timeout value, but wondering if there is actually better way to solve this? 
thanks

tracing --
5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d97-e6e2-11e4-823d-93572f3db015 | Key cache hit for sstable 95588 | 127.0.0.1 | 1592 | Native-Transport-Requests:102
5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d98-e6e2-11e4-823d-93572f3db015 | Seeking to partition beginning in data file | 127.0.0.1 | 1593 | Native-Transport-Requests:102
5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d99-e6e2-11e4-823d-93572f3db015 | Merging data from memtables and 3 sstables | 127.0.0.1 | 1595 | Native-Transport-Requests:102
=
5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d9a-e6e2-11e4-823d-93572f3db015 | Read 3 live and 0 tombstoned cells | 127.0.0.1 | 1610 | Native-Transport-Requests:102
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a40-e6e2-11e4-823d-93572f3db015 | Executing seq scan across 1 sstables for (min(-9223372036854775808), min(-9223372036854775808)] | 127.0.0.1 | 16381594 | MigrationStage:1
=
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a41-e6e2-11e4-823d-93572f3db015 | Seeking to partition beginning in data file | 127.0.0.1 | 16381782 | MigrationStage:1
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a42-e6e2-11e4-823d-93572f3db015 | Read 0 live and 0 tombstoned cells | 127.0.0.1 | 16381787 | MigrationStage:1
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a43-e6e2-11e4-823d-93572f3db015 | Seeking to partition
Re: Bootstrap performance.
On Mon, Apr 20, 2015 at 6:08 PM, Dikang Gu dikan...@gmail.com wrote: When I bring in a new node into the cluster, it introduces significant load to the cluster. For the new node, the cpu usage is 100%, but disk write io is only around 50MB/s, while we have 10G network. Does it sound normal to you? Have you unthrottled both compaction and streaming via JMX/nodetool? Streaming is single threaded and can (?) be CPU bound, I would not be surprised if JIRA contains a ticket on the upper bounds of streaming performance in current implementation. =Rob
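Rob's suggestion maps to two nodetool knobs, where a value of 0 means unthrottled. Shown here as a dry run (the commands are only echoed, since they need a live node; drop the echo to run them for real):

```shell
# Dry run: print the nodetool commands that unthrottle compaction and
# streaming (0 disables the throttle in both cases).
cmds=$(for c in "setcompactionthroughput 0" "setstreamthroughput 0"; do
  echo "nodetool $c"
done)
echo "$cmds"
```

Remember these settings are per-node and revert to the cassandra.yaml values on restart.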
Re: Bootstrap performance.
Hi Rob, Why do you say streaming is single threaded? I see a lot of background streaming threads running, for example:
STREAM-IN-/10.210.165.49 daemon prio=10 tid=0x7f81fc001000 nid=0x107075 runnable [0x7f836b256000]
STREAM-IN-/10.213.51.57 daemon prio=10 tid=0x7f81f0002000 nid=0x107073 runnable [0x7f836b1d4000]
STREAM-IN-/10.213.51.61 daemon prio=10 tid=0x7f81e8001000 nid=0x107070 runnable [0x7f836b11]
STREAM-IN-/10.213.51.63 daemon prio=10 tid=0x7f81dc001800 nid=0x10706f runnable [0x7f836b0cf000]
Thanks, Dikang. On Mon, Apr 20, 2015 at 6:48 PM, Robert Coli rc...@eventbrite.com wrote: On Mon, Apr 20, 2015 at 6:08 PM, Dikang Gu dikan...@gmail.com wrote: When I bring a new node into the cluster, it introduces significant load to the cluster. For the new node, the CPU usage is 100%, but disk write IO is only around 50MB/s, while we have a 10G network. Does that sound normal to you? Have you unthrottled both compaction and streaming via JMX/nodetool? Streaming is single threaded and can (?) be CPU bound; I would not be surprised if JIRA contains a ticket on the upper bounds of streaming performance in the current implementation. =Rob -- Dikang
Re: COPY command to export a table to CSV file
Values in /etc/security/limits.d/cassandra.conf # Provided by the cassandra package cassandra - memlock unlimited cassandra - nofile 10 On Mon, Apr 20, 2015 at 12:21 PM, Kiran mk coolkiran2...@gmail.com wrote: Hi, Thanks for the info. Are the nproc, nofile, and memlock settings in /etc/security/limits.d/cassandra.conf set to optimum values? What is the consistency level? Best Regards, Kiran.M.K. On Mon, Apr 20, 2015 at 11:55 AM, Neha Trivedi nehajtriv...@gmail.com wrote: hi, What is the count of records in the column family? We have about 38,000 rows in the column family that we are trying to export. What is the Cassandra version? We are using Cassandra 2.0.11. MAX_HEAP_SIZE and HEAP_NEWSIZE are the defaults. The server has 8 GB. regards Neha On Mon, Apr 20, 2015 at 11:39 AM, Kiran mk coolkiran2...@gmail.com wrote: Hi, Check the MAX_HEAP_SIZE configuration in the cassandra-env.sh environment file. Also HEAP_NEWSIZE? What is the consistency level you are using? Best Regards, Kiran.M.K. On Mon, Apr 20, 2015 at 11:13 AM, Kiran mk coolkiran2...@gmail.com wrote: Seems like this is related to Java heap memory. What is the count of records in the column family? What is the Cassandra version? Best Regards, Kiran.M.K. On Mon, Apr 20, 2015 at 11:08 AM, Neha Trivedi nehajtriv...@gmail.com wrote: Hello all, We are getting an OutOfMemoryError on one of the nodes, and that node goes down, when we run the export command to get all the data from a table.
Regards Neha

ERROR [ReadStage:532074] 2015-04-09 01:04:00,603 CassandraDaemon.java (line 199) Exception in thread Thread[ReadStage:532074,5,main]
java.lang.OutOfMemoryError: Java heap space
    at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:347)
    at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:392)
    at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:355)
    at org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:124)
    at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:85)
    at org.apache.cassandra.db.Column$1.computeNext(Column.java:75)
    at org.apache.cassandra.db.Column$1.computeNext(Column.java:64)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:88)
    at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:37)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:82)
    at org.apache.cassandra.db.columniterator.LazyColumnIterator.computeNext(LazyColumnIterator.java:82)
    at org.apache.cassandra.db.columniterator.LazyColumnIterator.computeNext(LazyColumnIterator.java:59)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:157)
    at org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:140)
    at org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:200)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:185)
    at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:122)
    at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80)
    at org.apache.cassandra.db.RowIteratorFactory$2.getReduced(RowIteratorFactory.java:101)
    at org.apache.cassandra.db.RowIteratorFactory$2.getReduced(RowIteratorFactory.java:75)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:115)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:98)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)

-- Best Regards, Kiran.M.K.
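When a single COPY runs a node out of heap, a common workaround is to export in slices by token range instead of one big scan. A sketch that just prints the per-slice queries for the full Murmur3 range (ks.table and pk are hypothetical placeholders for your schema; each result set can then be fetched, e.g. via cqlsh, and appended to a CSV):

```shell
# Split the full Murmur3 token range into N slices and emit one
# SELECT per slice. The first slice misses the exact minimum token;
# use >= on the first slice if that matters for your data.
N=4
min=-9223372036854775808
max=9223372036854775807
step=$(( max / N - min / N ))   # ~2^64/N without overflowing signed 64-bit
queries=$(
  start=$min
  i=1
  while [ "$i" -le "$N" ]; do
    if [ "$i" -eq "$N" ]; then end=$max; else end=$(( start + step )); fi
    echo "SELECT * FROM ks.table WHERE token(pk) > $start AND token(pk) <= $end;"
    start=$end
    i=$(( i + 1 ))
  done
)
echo "$queries"
```

Smaller slices keep each read well under the heap limit; Brian's cassandra-unloader, mentioned earlier in the thread, automates the same idea.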
Re: Adding nodes to existing cluster
Start one node at a time. Wait 2 minutes before starting each node. How much data and how many nodes do you have already? Depending on that, the streaming of data can stress the resources you have. I would recommend starting one and monitoring; if things are OK, add another one, and so on. Regards, Carlos Juzarte Rolo Cassandra Consultant Pythian - Love your data rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649 www.pythian.com On Mon, Apr 20, 2015 at 11:02 AM, Or Sher or.sh...@gmail.com wrote: Hi all, In the near future I'll need to add more than 10 nodes to a 2.0.9 cluster (using vnodes). I read this documentation on the DataStax website: http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html In one point it says: "If you are using racks, you can safely bootstrap two nodes at a time when both nodes are on the same rack." And in another it says: "Start Cassandra on each new node. Allow two minutes between node initializations. You can monitor the startup and data streaming process using nodetool netstats." We're not using a rack configuration, and from reading this documentation I'm not really sure whether it is safe for us to bootstrap all the nodes together (with two minutes between each). I really hate the thought of doing it one by one; I assume it will take more than 6 hours per node. What do you say? -- Or Sher
Re: Adding nodes to existing cluster
unsubscribe
RE: Adding nodes to existing cluster
Hi Colin, To remove your address from the list, send a message to: user-unsubscr...@cassandra.apache.org Cheers, Matt *From:* Colin Clark [mailto:co...@clark.ws] *Sent:* 20 April 2015 14:10 *To:* user@cassandra.apache.org *Subject:* Re: Adding nodes to existing cluster unsubscribe
Re: Adding nodes to existing cluster
Thanks for the response. Sure, we'll monitor as we're adding nodes. We're now using 6 nodes in each DC (we have 2 DCs), and each node contains ~800 GB. Do you know how rack configurations are relevant here? Do you see any reason to bootstrap them one by one if we're not using rack awareness? -- Or Sher
Re: Adding nodes to existing cluster
Independent of the snitch, data needs to travel to the new nodes (plus all the keyspace information that goes via gossip), so I wouldn't bootstrap them all at once, even if only for the network traffic generated. Don't forget to run cleanup on the old nodes once all nodes are in place to reclaim disk space. Regards, Carlos Juzarte Rolo Cassandra Consultant Pythian - Love your data rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649 www.pythian.com
Re: timeout creating table
Jimmy, What's the exact command that produced this trace? Are you saying that the 16-second wait in your trace is what times out in your CREATE TABLE statements? Jim Witschey Software Engineer in Test | jim.witsc...@datastax.com On Sun, Apr 19, 2015 at 7:13 PM, Jimmy Lin y2klyf+w...@gmail.com wrote: hi, we have some unit tests that run in parallel and create temporary keyspaces and tables, then drop them after the tests are done. From time to time, our CREATE TABLE statement runs into an All host(s) for query failed... Timeout during read error (from the DataStax driver). We later turned tracing on and recorded the following. See below between the === markers: between the Native-Transport-Requests thread and the MigrationStage thread there was a gap of about 16 seconds. Any idea what Cassandra was doing for those 16 seconds? We can work around it by increasing our DataStax driver timeout value, but we're wondering if there is a better way to solve this?

thanks

tracing --
5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d97-e6e2-11e4-823d-93572f3db015 | Key cache hit for sstable 95588 | 127.0.0.1 | 1592 | Native-Transport-Requests:102
5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d98-e6e2-11e4-823d-93572f3db015 | Seeking to partition beginning in data file | 127.0.0.1 | 1593 | Native-Transport-Requests:102
5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d99-e6e2-11e4-823d-93572f3db015 | Merging data from memtables and 3 sstables | 127.0.0.1 | 1595 | Native-Transport-Requests:102
=
5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d9a-e6e2-11e4-823d-93572f3db015 | Read 3 live and 0 tombstoned cells | 127.0.0.1 | 1610 | Native-Transport-Requests:102
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a40-e6e2-11e4-823d-93572f3db015 | Executing seq scan across 1 sstables for (min(-9223372036854775808), min(-9223372036854775808)] | 127.0.0.1 | 16381594 | MigrationStage:1
=
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a41-e6e2-11e4-823d-93572f3db015 | Seeking to partition beginning in data file | 127.0.0.1 | 16381782 | MigrationStage:1
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a42-e6e2-11e4-823d-93572f3db015 | Read 0 live and 0 tombstoned cells | 127.0.0.1 | 16381787 | MigrationStage:1
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a43-e6e2-11e4-823d-93572f3db015 | Seeking to partition beginning in data file | 127.0.0.1 | 16381789 | MigrationStage:1
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a44-e6e2-11e4-823d-93572f3db015 | Read 0 live and 0 tombstoned cells | 127.0.0.1 | 16381791 | MigrationStage:1
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a45-e6e2-11e4-823d-93572f3db015 | Seeking to partition beginning in data file | 127.0.0.1 | 16381792 | MigrationStage:1
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a46-e6e2-11e4-823d-93572f3db015 | Read 0 live and 0 tombstoned cells | 127.0.0.1 | 16381794 | MigrationStage:1
. . .
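To spot where the time went in a trace like this, you can diff consecutive source_elapsed values; the largest jump points at the stalled step. A sketch over inlined sample rows (activity | elapsed-microseconds) standing in for your saved cqlsh tracing output:

```shell
# Print the largest jump between consecutive source_elapsed values.
# The sample rows mimic the trace above; replace the heredoc with
# your own exported trace data.
maxgap=$(awk -F'|' 'NR > 1 {
  e = $2 + 0
  if (seen) { g = e - prev; if (g > max) max = g }
  seen = 1; prev = e
} END { print max }' <<'EOF'
activity | elapsed_us
Merging data from memtables and 3 sstables | 1595
Read 3 live and 0 tombstoned cells | 1610
Executing seq scan across 1 sstables | 16381594
EOF
)
echo "$maxgap"   # the ~16.4 s gap before the MigrationStage seq scan
```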
Adding nodes to existing cluster
Hi all, In the near future I'll need to add more than 10 nodes to a 2.0.9 cluster (using vnodes). I read this documentation on the DataStax website: http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html In one point it says: "If you are using racks, you can safely bootstrap two nodes at a time when both nodes are on the same rack." And in another it says: "Start Cassandra on each new node. Allow two minutes between node initializations. You can monitor the startup and data streaming process using nodetool netstats." We're not using a rack configuration, and from reading this documentation I'm not really sure whether it is safe for us to bootstrap all the nodes together (with two minutes between each). I really hate the thought of doing it one by one; I assume it will take more than 6 hours per node. What do you say? -- Or Sher
Handle Write Heavy Loads in Cassandra 2.0.3
Hi, Recently, we discovered that millions of mutations were getting dropped on our cluster. Eventually, we solved this problem by increasing the value of memtable_flush_writers from 1 to 3. We usually write 3 CFs simultaneously, and one of them has 4 secondary indexes.

New changes also include:
concurrent_compactors: 12 (earlier it was the default)
compaction_throughput_mb_per_sec: 32 (earlier it was the default)
in_memory_compaction_limit_in_mb: 400 (earlier it was the default 64)
memtable_flush_writers: 3 (earlier 1)

After making the above changes, our write-heavy workload scenarios started giving promotion failed exceptions in the GC logs. We have done JVM tuning and Cassandra config changes to solve this:

MAX_HEAP_SIZE=12G (increased heap from 8G to reduce fragmentation)
HEAP_NEWSIZE=3G
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=2" (we observed that even at SurvivorRatio=4, our survivor space was getting 100% utilized under heavy write load, and we thought that minor collections were directly promoting objects to the tenured generation)
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=20" (lots of objects were moving from eden to tenured on each minor collection; may be related to medium-lived objects tied to memtables and compactions, as suggested by a heap dump)
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=20"
JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768"
JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=3"
JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=2000" (though it's the default value)
JVM_OPTS="$JVM_OPTS -XX:+CMSEdenChunksRecordAlways"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelInitialMarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=70" (reduced the value to avoid concurrent mode failures)

Cassandra config:
compaction_throughput_mb_per_sec: 24
memtable_total_space_in_mb: 1000 (to make memtable flushes frequent; the default is 1/4 of the heap, which creates more long-lived objects)

Questions:
1. Why did increasing memtable_flush_writers cause promotion failures in the JVM? Do more memtable_flush_writers mean more memtables in memory?
2. Still, objects are getting promoted at high speed to the tenured space. CMS runs on the old gen every 4-5 minutes under heavy write load, and around 750+ minor collections of up to 300 ms happened in 45 minutes. Do you see any problems with the new JVM tuning and Cassandra config? Does the justification given for those changes sound logical? Any suggestions?
3. What is the best practice for reducing heap fragmentation/promotion failure when allocation and promotion rates are high?

Thanks
Anuj
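For comparison with the hand-tuned 12G/3G values above, this is a simplified sketch of the heuristic the stock cassandra-env.sh uses when MAX_HEAP_SIZE is unset (check your own cassandra-env.sh for the exact logic; the example machine numbers are made up):

```shell
# Stock heuristic, roughly: heap = max(min(ram/2, 1G), min(ram/4, 8G));
# new gen = min(100 MB per core, heap/4).
ram_mb=16384; cores=8   # hypothetical 16 GB, 8-core machine
a=$(( ram_mb / 2 )); if [ "$a" -gt 1024 ]; then a=1024; fi
b=$(( ram_mb / 4 )); if [ "$b" -gt 8192 ]; then b=8192; fi
heap=$a; if [ "$b" -gt "$heap" ]; then heap=$b; fi
yg=$(( 100 * cores )); q=$(( heap / 4 )); if [ "$q" -lt "$yg" ]; then yg=$q; fi
sizes="MAX_HEAP_SIZE=${heap}M HEAP_NEWSIZE=${yg}M"
echo "$sizes"
```

The point of the comparison: a 12G CMS heap with a 3G new gen is well beyond what the defaults would pick, which is consistent with the longer promotion and fragmentation behavior described above.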
Re: Adding nodes to existing cluster
The documentation is referring to consistent range movements. There is a change in 2.1 that won't allow you to bootstrap multiple nodes at the same time unless you explicitly turn off consistent range movements. Check out the JIRA: https://issues.apache.org/jira/browse/CASSANDRA-2434 All the best, Sebastián Estévez Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
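For reference, the 2.1 startup option that opts out of consistent range movements (per CASSANDRA-2434) can be passed via cassandra-env.sh. A sketch only; use with care, since it trades bootstrap safety for parallelism:

```shell
# cassandra-env.sh fragment (2.1+): allow multiple nodes to bootstrap
# simultaneously by disabling consistent range movements.
# Trade-off: range ownership may move inconsistently; see CASSANDRA-2434.
JVM_OPTS="$JVM_OPTS -Dcassandra.consistent.rangemovement=false"
```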
Re: Adding nodes to existing cluster
OK, thanks. I'll monitor the resource status (network, memory, CPU, IO) as I go, and try to bootstrap them in chunks that don't seem to have a bad impact. Will do regarding the cleanup. Thanks! -- Or Sher
Re: Getting ParNew GC in ... CMS Old Gen ... in logs
I think this is just saying that a young gen collection using the ParNew collector took 248 seconds. This is quite normal with CMS unless it happens too frequently, several times in a second. I think query time has more to do with the read timeout in the yaml. Try increasing it. If it's a range query then please increase the range timeout in the yaml. Thanks Anuj Wadehra Sent from Yahoo Mail on Android From:shahab shahab.mok...@gmail.com Date:Mon, 20 Apr, 2015 at 9:59 pm Subject:Getting ParNew GC in ... CMS Old Gen ... in logs Hi, I keep getting the following line in the Cassandra logs, apparently something related to garbage collection. I guess this is one of the signs of why I do not get any response (I get a time-out) when I query a large volume of data: ParNew GC in 248ms. CMS Old Gen: 453244264 - 570471312; Par Eden Space: 167712624 - 0; Par Survivor Space: 0 - 20970080 Is the above line an indication of something that needs to be fixed in the system? How can I resolve this? best, /Shahab
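The timeouts referred to above are settings in cassandra.yaml. A sketch of the relevant knobs (the values shown are illustrative, roughly double the 2.0 defaults, not recommendations):

```yaml
# cassandra.yaml
read_request_timeout_in_ms: 10000     # single-partition reads (default 5000)
range_request_timeout_in_ms: 20000    # range scans (default 10000)
```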
Re: Connecting to Cassandra cluster in AWS from local network
You'll have to configure your nodes to: 1. use AWS internal IPs for inter-node connection (check listen_address) and 2. use the AWS public IP for client-to-node connections (check rpc_address) Depending on the setup, there might be other interesting conf options in cassandra.yaml (broadcast_address, listen_interface, rpc_interface). [1]: http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html On Mon, Apr 20, 2015 at 9:50 AM, Jonathan Haddad j...@jonhaddad.com wrote: Ideally you'll be on the same network, but if you can't be, you'll need to use the public ip in listen_address. On Mon, Apr 20, 2015 at 9:47 AM Matthew Johnson matt.john...@algomi.com wrote: Hi all, I have set up a Cassandra cluster with 2.1.4 on some existing AWS boxes, just as a POC. Cassandra servers connect to each other over their internal AWS IP addresses (172.x.x.x) aliased in /etc/hosts as sales1, sales2 and sales3. I connect to it from my local dev environment using the seed’s external NAT address (54.x.x.x) aliases in my Windows hosts file as sales3 (my seed). 
When I try to connect, it connects fine, and can retrieve some data (I have very limited amounts of data in there, but it seems to retrieve ok), but I also get lots of stacktraces in my log where my dev environment is trying to connect to Cassandra on the internal IP (presumably the Cassandra seed node tells my dev env where to look): INFO 2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host sales3/54.x.x.142:9042 added INFO 2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host /172.x.x.237:9042 added INFO 2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host /172.x.x.170:9042 added Connected to cluster: Test Cluster Datatacenter: datacenter1; Host: /172.x.x.170; Rack: rack1 Datatacenter: datacenter1; Host: sales3/54.x.x.142; Rack: rack1 Datatacenter: datacenter1; Host: /172.x.x.237; Rack: rack1 DEBUG 2015-04-20 16:34:14,901 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-0} Connection - Connection[sales3/54.x.x.142:9042-2, inFlight=0, closed=false] Transport initialized and ready DEBUG 2015-04-20 16:34:14,901 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-0} Session - Added connection pool for sales3/54.x.x.142:9042 DEBUG 2015-04-20 16:34:19,850 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-1} Connection - Connection[/172.x.x.237:9042-1, inFlight=0, closed=false] Error connecting to /172.x.x.237:9042 (connection timed out: /172.x.x.237:9042) DEBUG 2015-04-20 16:34:19,850 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-1} Connection - Defuncting connection to /172.x.x.237:9042 com.datastax.driver.core.TransportException: [/172.x.x.237:9042] Cannot connect Does anyone have any experience with connecting to AWS clusters from dev machines? How have you set up your aliases to get around this issue?
Current setup in sales3 (seed node) cassandra.yaml: - seeds: sales3 listen_address: sales3 rpc_address: sales3 Current setup in other nodes (eg sales2) cassandra.yaml: - seeds: sales3 listen_address: sales2 rpc_address: sales2 Thanks! Matt -- Bests, Alex Popescu | @al3xandru Sen. Product Manager @ DataStax
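A hedged sketch of the cassandra.yaml split described above, with placeholder IPs: private addresses for inter-node traffic, the public NAT address advertised outward. On EC2 the public IP is usually NATed rather than bound on the interface, so broadcast_address (not listen_address) would carry it; exact settings depend on the setup:

```yaml
# cassandra.yaml sketch -- placeholder IPs, adjust per node
listen_address: 172.31.0.237       # private IP: gossip and streaming
broadcast_address: 54.10.10.142    # public NAT IP advertised to peers, if needed
rpc_address: 0.0.0.0               # accept client connections on all interfaces
```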
Re: Connecting to Cassandra cluster in AWS from local network
I would like to note that this will require all clients to connect over the external IP address. If you have clients within Amazon that need to connect over the private IP address, this would not be possible. If you have a mix of clients that need to connect over private and public IP addresses, then one of the solutions outlined in https://datastax-oss.atlassian.net/browse/JAVA-145 may be more appropriate. -Russ From: Alex Popescu Reply-To: user@cassandra.apache.org Date: Monday, April 20, 2015 at 2:00 PM To: user Subject: Re: Connecting to Cassandra cluster in AWS from local network You'll have to configure your nodes to: 1. use AWS internal IPs for inter-node connection (check listen_address) and 2. use the AWS public IP for client-to-node connections (check rpc_address) Depending on the setup, there might be other interesting conf options in cassandra.yaml (broadcast_address, listen_interface, rpc_interface). [1]: http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html On Mon, Apr 20, 2015 at 9:50 AM, Jonathan Haddad j...@jonhaddad.com wrote: Ideally you'll be on the same network, but if you can't be, you'll need to use the public ip in listen_address. On Mon, Apr 20, 2015 at 9:47 AM Matthew Johnson matt.john...@algomi.com wrote: Hi all, I have set up a Cassandra cluster with 2.1.4 on some existing AWS boxes, just as a POC. Cassandra servers connect to each other over their internal AWS IP addresses (172.x.x.x) aliased in /etc/hosts as sales1, sales2 and sales3. I connect to it from my local dev environment using the seed’s external NAT address (54.x.x.x) aliased in my Windows hosts file as sales3 (my seed). 
When I try to connect, it connects fine, and can retrieve some data (I have very limited amounts of data in there, but it seems to retrieve ok), but I also get lots of stacktraces in my log where my dev environment is trying to connect to Cassandra on the internal IP (presumably the Cassandra seed node tells my dev env where to look): INFO 2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host sales3/54.x.x.142:9042 added INFO 2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host /172.x.x.237:9042 added INFO 2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host /172.x.x.170:9042 added Connected to cluster: Test Cluster Datatacenter: datacenter1; Host: /172.x.x.170; Rack: rack1 Datatacenter: datacenter1; Host: sales3/54.x.x.142; Rack: rack1 Datatacenter: datacenter1; Host: /172.x.x.237; Rack: rack1 DEBUG 2015-04-20 16:34:14,901 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-0} Connection - Connection[sales3/54.x.x.142:9042-2, inFlight=0, closed=false] Transport initialized and ready DEBUG 2015-04-20 16:34:14,901 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-0} Session - Added connection pool for sales3/54.x.x.142:9042 DEBUG 2015-04-20 16:34:19,850 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-1} Connection - Connection[/172.x.x.237:9042-1, inFlight=0, closed=false] Error connecting to /172.x.x.237:9042 (connection timed out: /172.x.x.237:9042) DEBUG 2015-04-20 16:34:19,850 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-1} Connection - Defuncting connection to /172.x.x.237:9042 com.datastax.driver.core.TransportException: [/172.x.x.237:9042] Cannot connect Does anyone have any experience with connecting to AWS clusters from dev machines? How have you set up your aliases to get around this issue? 
Current setup in sales3 (seed node) cassandra.yaml: - seeds: sales3 listen_address: sales3 rpc_address: sales3 Current setup in other nodes (eg sales2) cassandra.yaml: - seeds: sales3 listen_address: sales2 rpc_address: sales2 Thanks! Matt -- Bests, Alex Popescu | @al3xandru Sen. Product Manager @ DataStax
Re: Connecting to Cassandra cluster in AWS from local network
Ideally you'll be on the same network, but if you can't be, you'll need to use the public ip in listen_address. On Mon, Apr 20, 2015 at 9:47 AM Matthew Johnson matt.john...@algomi.com wrote: Hi all, I have set up a Cassandra cluster with 2.1.4 on some existing AWS boxes, just as a POC. Cassandra servers connect to each other over their internal AWS IP addresses (172.x.x.x) aliased in /etc/hosts as sales1, sales2 and sales3. I connect to it from my local dev environment using the seed’s external NAT address (54.x.x.x) aliases in my Windows hosts file as sales3 (my seed). When I try to connect, it connects fine, and can retrieve some data (I have very limited amounts of data in there, but it seems to retrieve ok), but I also get lots of stacktraces in my log where my dev environment is trying to connect to Cassandra on the internal IP (presumably the Cassandra seed node tells my dev env where to look): INFO 2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host sales3/54.x.x.142:9042 added INFO 2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host /172.x.x.237:9042 added INFO 2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host /172.x.x.170:9042 added Connected to cluster: Test Cluster Datatacenter: datacenter1; Host: /172.x.x.170; Rack: rack1 Datatacenter: datacenter1; Host: sales3/54.x.x.142; Rack: rack1 Datatacenter: datacenter1; Host: /172.x.x.237; Rack: rack1 DEBUG 2015-04-20 16:34:14,901 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-0} Connection - Connection[sales3/54.x.x.142:9042-2, inFlight=0, closed=false] Transport initialized and ready DEBUG 2015-04-20 16:34:14,901 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-0} Session - Added connection pool for sales3/54.x.x.142:9042 DEBUG 2015-04-20 16:34:19,850 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-1} Connection - Connection[/172.x.x.237:9042-1, inFlight=0, closed=false] Error connecting to /172.x.x.237:9042 (connection timed out: /172.x.x.237:9042) DEBUG 2015-04-20 16:34:19,850 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-1} Connection - Defuncting connection to /172.x.x.237:9042 com.datastax.driver.core.TransportException: [/172.x.x.237:9042] Cannot connect Does anyone have any experience with connecting to AWS clusters from dev machines? How have you set up your aliases to get around this issue? Current setup in sales3 (seed node) cassandra.yaml: - seeds: sales3 listen_address: sales3 rpc_address: sales3 Current setup in other nodes (eg sales2) cassandra.yaml: - seeds: sales3 listen_address: sales2 rpc_address: sales2 Thanks! Matt
Connecting to Cassandra cluster in AWS from local network
Hi all, I have set up a Cassandra cluster with 2.1.4 on some existing AWS boxes, just as a POC. Cassandra servers connect to each other over their internal AWS IP addresses (172.x.x.x) aliased in /etc/hosts as sales1, sales2 and sales3. I connect to it from my local dev environment using the seed’s external NAT address (54.x.x.x) aliases in my Windows hosts file as sales3 (my seed). When I try to connect, it connects fine, and can retrieve some data (I have very limited amounts of data in there, but it seems to retrieve ok), but I also get lots of stacktraces in my log where my dev environment is trying to connect to Cassandra on the internal IP (presumably the Cassandra seed node tells my dev env where to look): INFO 2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host sales3/54.x.x.142:9042 added INFO 2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host /172.x.x.237:9042 added INFO 2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host /172.x.x.170:9042 added Connected to cluster: Test Cluster Datatacenter: datacenter1; Host: /172.x.x.170; Rack: rack1 Datatacenter: datacenter1; Host: sales3/54.x.x.142; Rack: rack1 Datatacenter: datacenter1; Host: /172.x.x.237; Rack: rack1 DEBUG 2015-04-20 16:34:14,901 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-0} Connection - Connection[sales3/54.x.x.142:9042-2, inFlight=0, closed=false] Transport initialized and ready DEBUG 2015-04-20 16:34:14,901 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-0} Session - Added connection pool for sales3/54.x.x.142:9042 DEBUG 2015-04-20 16:34:19,850 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-1} Connection - Connection[/172.x.x.237:9042-1, inFlight=0, closed=false] Error connecting to /172.x.x.237:9042 (connection timed out: /172.x.x.237:9042) DEBUG 2015-04-20 16:34:19,850 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-1} Connection - Defuncting connection to /172.x.x.237:9042 com.datastax.driver.core.TransportException: [/172.x.x.237:9042] Cannot connect Does anyone have any experience with connecting to AWS clusters from dev machines? How have you set up your aliases to get around this issue? Current setup in sales3 (seed node) cassandra.yaml: - seeds: sales3 listen_address: sales3 rpc_address: sales3 Current setup in other nodes (eg sales2) cassandra.yaml: - seeds: sales3 listen_address: sales2 rpc_address: sales2 Thanks! Matt
Re: Connecting to Cassandra cluster in AWS from local network
There are a couple options here. You can use the built in address translator, or, write a new load balancing policy. See https://datastax-oss.atlassian.net/browse/JAVA-145 for more information. From: Jonathan Haddad Reply-To: user@cassandra.apache.org Date: Monday, April 20, 2015 at 12:50 PM To: user@cassandra.apache.org Subject: Re: Connecting to Cassandra cluster in AWS from local network Ideally you'll be on the same network, but if you can't be, you'll need to use the public ip in listen_address. On Mon, Apr 20, 2015 at 9:47 AM Matthew Johnson matt.john...@algomi.com wrote: Hi all, I have set up a Cassandra cluster with 2.1.4 on some existing AWS boxes, just as a POC. Cassandra servers connect to each other over their internal AWS IP addresses (172.x.x.x) aliased in /etc/hosts as sales1, sales2 and sales3. I connect to it from my local dev environment using the seed’s external NAT address (54.x.x.x) aliases in my Windows hosts file as sales3 (my seed). When I try to connect, it connects fine, and can retrieve some data (I have very limited amounts of data in there, but it seems to retrieve ok), but I also get lots of stacktraces in my log where my dev environment is trying to connect to Cassandra on the internal IP (presumably the Cassandra seed node tells my dev env where to look): INFO 2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host sales3/54.x.x.142:9042 added INFO 2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host /172.x.x.237:9042 added INFO 2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host /172.x.x.170:9042 added Connected to cluster: Test Cluster Datatacenter: datacenter1; Host: /172.x.x.170; Rack: rack1 Datatacenter: datacenter1; Host: sales3/54.x.x.142; Rack: rack1 Datatacenter: datacenter1; Host: /172.x.x.237; Rack: rack1 DEBUG 2015-04-20 16:34:14,901 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-0} Connection - Connection[sales3/54.x.x.142:9042-2, inFlight=0, 
closed=false] Transport initialized and ready DEBUG 2015-04-20 16:34:14,901 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-0} Session - Added connection pool for sales3/54.x.x.142:9042 DEBUG 2015-04-20 16:34:19,850 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-1} Connection - Connection[/172.x.x.237:9042-1, inFlight=0, closed=false] Error connecting to /172.x.x.237:9042 (connection timed out: /172.x.x.237:9042) DEBUG 2015-04-20 16:34:19,850 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-1} Connection - Defuncting connection to /172.x.x.237:9042 com.datastax.driver.core.TransportException: [/172.x.x.237:9042] Cannot connect Does anyone have any experience with connecting to AWS clusters from dev machines? How have you set up your aliases to get around this issue? Current setup in sales3 (seed node) cassandra.yaml: - seeds: sales3 listen_address: sales3 rpc_address: sales3 Current setup in other nodes (eg sales2) cassandra.yaml: - seeds: sales3 listen_address: sales2 rpc_address: sales2 Thanks! Matt
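JAVA-145 (linked above) is about giving the Java driver an address-translation hook: in driver 2.1 this is the AddressTranslater interface, registered on the Cluster builder. The standalone sketch below shows only the private-to-public mapping logic; the class name and IPs are made up for illustration, and it deliberately does not depend on the driver jar:

```java
import java.net.InetSocketAddress;
import java.util.HashMap;
import java.util.Map;

// Standalone sketch of the private->public address mapping behind the
// driver's AddressTranslater hook (JAVA-145). Class name and IPs are
// illustrative; this does not use the driver API itself.
public class PublicIpTranslater {
    private final Map<String, String> privateToPublic = new HashMap<>();

    void register(String privateIp, String publicIp) {
        privateToPublic.put(privateIp, publicIp);
    }

    // Rewrite a gossip-advertised private address to the public address an
    // external client must dial; unknown addresses pass through untouched.
    InetSocketAddress translate(InetSocketAddress addr) {
        String pub = privateToPublic.get(addr.getHostString());
        return pub == null
                ? addr
                : InetSocketAddress.createUnresolved(pub, addr.getPort());
    }

    public static void main(String[] args) {
        PublicIpTranslater t = new PublicIpTranslater();
        t.register("172.31.0.237", "54.10.10.142");
        InetSocketAddress out =
                t.translate(InetSocketAddress.createUnresolved("172.31.0.237", 9042));
        System.out.println(out.getHostString() + ":" + out.getPort());
    }
}
```

In the real driver, the same logic would live inside a translater implementation supplied when building the Cluster, so contact points learned from gossip get rewritten before the client dials them.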
Cassandra based web app benchmark
Hi, TechEmpower Web Framework Benchmarks ( https://www.techempower.com/benchmarks/) is a collaborative effort for measuring the performance of a large number of contemporary web development platforms. The benchmarking and test implementation code is published as open source. I've contributed a test implementation that uses Apache Cassandra for data storage and is based on the following technology stack: * Java * Resin app server + Servlet 3 with asynchronous processing * Apache Cassandra database (v2.0.12) TFB Round 10 results are expected to be released in the near future, with results from the Cassandra based test implementation included. Now that the initial test implementation has been merged as part of the project codebase, I'd like to solicit feedback from the Cassandra user and developer community on best practices, especially with regard to performance, in the hope that the test implementation can get the best performance out of Cassandra in future benchmark rounds. Any review comments and pull requests would be welcome. The code can be found on GitHub: https://github.com/TechEmpower/FrameworkBenchmarks https://github.com/TechEmpower/FrameworkBenchmarks/tree/master/frameworks/Java/servlet3-cass https://github.com/TechEmpower/FrameworkBenchmarks/tree/master/config/cassandra More info on the benchmark project, as well as the Cassandra based test implementation, can be found here: http://practicingtechie.com/2014/09/10/web-application-framework-benchmarks/ thanks, marko
Getting ParNew GC in ... CMS Old Gen ... in logs
Hi, I keep getting the following line in the Cassandra logs, apparently something related to garbage collection. I guess this is one of the signs of why I do not get any response (I get a time-out) when I query a large volume of data: ParNew GC in 248ms. CMS Old Gen: 453244264 - 570471312; Par Eden Space: 167712624 - 0; Par Survivor Space: 0 - 20970080 Is the above line an indication of something that needs to be fixed in the system? How can I resolve this? best, /Shahab
Re: timeout creating table
Yes, sometimes it is create table and sometimes it is create index. It doesn't happen all the time, but it feels like when multiple tests try to do schema changes (create or drop), Cassandra has a long delay on the schema change statements. I also just read about auto_snapshot, and I turned it off, but still no luck. On Mon, Apr 20, 2015 at 6:42 AM, Jim Witschey jim.witsc...@datastax.com wrote: Jimmy, What's the exact command that produced this trace? Are you saying that the 16-second wait in your trace is what times out in your CREATE TABLE statements? Jim Witschey Software Engineer in Test | jim.witsc...@datastax.com On Sun, Apr 19, 2015 at 7:13 PM, Jimmy Lin y2klyf+w...@gmail.com wrote: hi, we have some unit tests that run in parallel and that create tmp keyspaces and tables, then drop them after the tests are done. From time to time, our create table statement runs into an All host(s) for query failed... Timeout during read (from the datastax driver) error. We later turned on tracing and recorded the following. See below between ===: between the Native-Transport-Requests thread and the MigrationStage thread, there was a gap of about 16 seconds. Any idea what Cassandra was doing for those 16 seconds? We can work around it by increasing our datastax driver timeout value, but wondering if there is actually a better way to solve this? 
thanks tracing -- 5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d97-e6e2-11e4-823d-93572f3db015 | Key cache hit for sstable 95588 | 127.0.0.1 | 1592 | Native-Transport-Requests:102 5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d98-e6e2-11e4-823d-93572f3db015 | Seeking to partition beginning in data file | 127.0.0.1 | 1593 | Native-Transport-Requests:102 5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d99-e6e2-11e4-823d-93572f3db015 | Merging data from memtables and 3 sstables | 127.0.0.1 | 1595 | Native-Transport-Requests:102 = 5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d9a-e6e2-11e4-823d-93572f3db015 | Read 3 live and 0 tombstoned cells | 127.0.0.1 | 1610 | Native-Transport-Requests:102 5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a40-e6e2-11e4-823d-93572f3db015 | Executing seq scan across 1 sstables for (min(-9223372036854775808), min(-9223372036854775808)] | 127.0.0.1 | 16381594 | MigrationStage:1 = 5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a41-e6e2-11e4-823d-93572f3db015 | Seeking to partition beginning in data file | 127.0.0.1 | 16381782 | MigrationStage:1 5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a42-e6e2-11e4-823d-93572f3db015 | Read 0 live and 0 tombstoned cells | 127.0.0.1 | 16381787 | MigrationStage:1 5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a43-e6e2-11e4-823d-93572f3db015 | Seeking to partition beginning in data file | 127.0.0.1 | 16381789 | MigrationStage:1 5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a44-e6e2-11e4-823d-93572f3db015 | Read 0 live and 0 tombstoned cells | 127.0.0.1 | 16381791 | MigrationStage:1 5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a45-e6e2-11e4-823d-93572f3db015 | Seeking to partition beginning in data file | 127.0.0.1 | 16381792 | MigrationStage:1 5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a46-e6e2-11e4-823d-93572f3db015 | Read 0 live and 0 tombstoned cells | 127.0.0.1 | 16381794 | MigrationStage:1 . . .
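Schema changes propagate through the MigrationStage (visible in the trace above), and concurrent DDL from parallel test suites can queue up behind it. One common client-side workaround is to funnel all DDL through a single thread so Cassandra never sees concurrent schema changes. A minimal sketch (the execute body is a stand-in for session.execute(ddl) plus waiting for schema agreement with the real driver; statement strings are placeholders):

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Funnel every DDL statement from parallel test suites through a single
// thread, so schema changes are applied strictly one at a time.
public class DdlSerializer {
    private static final ExecutorService DDL =
            Executors.newSingleThreadExecutor();

    // Stand-in for: session.execute(ddl) + wait for schema agreement.
    static Future<String> execute(String ddl) {
        return DDL.submit(() -> "applied: " + ddl);
    }

    public static void main(String[] args) throws Exception {
        List<String> ddls = Arrays.asList(
                "CREATE KEYSPACE tmp1 ...",
                "CREATE TABLE tmp1.t ...",
                "DROP KEYSPACE tmp1");
        int applied = 0;
        for (String ddl : ddls) {
            // get() forces each statement to finish before the next is sent
            if (execute(ddl).get().startsWith("applied")) applied++;
        }
        System.out.println(applied);
        DDL.shutdown();
    }
}
```

This complements, rather than replaces, raising the driver timeout: serializing DDL avoids the contention, while the timeout covers the occasional slow migration.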
Re: COPY command to export a table to CSV file
Are the nproc, nofile, memlock settings in /etc/security/limits.d/cassandra.conf set to optimum values? It's all default. What is the consistency level? CL = Quorum. Is there any other way to export a table to CSV? regards Neha On Mon, Apr 20, 2015 at 12:21 PM, Kiran mk coolkiran2...@gmail.com wrote: Hi, Thanks for the info. Are the nproc, nofile, memlock settings in /etc/security/limits.d/cassandra.conf set to optimum values? What is the consistency level? Best Regards, Kiran.M.K. On Mon, Apr 20, 2015 at 11:55 AM, Neha Trivedi nehajtriv...@gmail.com wrote: hi, What is the count of records in the column-family? We have about 38,000 rows in the column-family for which we are trying to export. What is the Cassandra version? We are using Cassandra 2.0.11. MAX_HEAP_SIZE and HEAP_NEWSIZE are the defaults. The server is 8 GB. regards Neha On Mon, Apr 20, 2015 at 11:39 AM, Kiran mk coolkiran2...@gmail.com wrote: Hi, check the MAX_HEAP_SIZE configuration in the cassandra-env.sh environment file. Also HEAP_NEWSIZE? What is the consistency level you are using? Best Regards, Kiran.M.K. On Mon, Apr 20, 2015 at 11:13 AM, Kiran mk coolkiran2...@gmail.com wrote: Seems like this is related to Java heap memory. What is the count of records in the column-family? What is the Cassandra version? Best Regards, Kiran.M.K. On Mon, Apr 20, 2015 at 11:08 AM, Neha Trivedi nehajtriv...@gmail.com wrote: Hello all, We are getting an OutOfMemoryError on one of the nodes and the node is down, when we run the export command to get all the data from a table. 
Regards Neha ERROR [ReadStage:532074] 2015-04-09 01:04:00,603 CassandraDaemon.java (line 199) Exception in thread Thread[ReadStage:532074,5,main] java.lang.OutOfMemoryError: Java heap space at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:347) at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:392) at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:355) at org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:124) at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:85) at org.apache.cassandra.db.Column$1.computeNext(Column.java:75) at org.apache.cassandra.db.Column$1.computeNext(Column.java:64) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:88) at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:37) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:82) at org.apache.cassandra.db.columniterator.LazyColumnIterator.computeNext(LazyColumnIterator.java:82) at org.apache.cassandra.db.columniterator.LazyColumnIterator.computeNext(LazyColumnIterator.java:59) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:157) at org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:140) at 
org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:200) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:185) at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:122) at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80) at org.apache.cassandra.db.RowIteratorFactory$2.getReduced(RowIteratorFactory.java:101) at org.apache.cassandra.db.RowIteratorFactory$2.getReduced(RowIteratorFactory.java:75) at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:115) at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:98) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) -- Best Regards, Kiran.M.K. -- Best Regards,
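On the question of other ways to export: the OOM above was triggered by cqlsh's COPY, so the usual alternatives are Brian Hess's cassandra-unloader (suggested elsewhere in the thread) or a client-side paged SELECT via a driver. For completeness, the COPY syntax under discussion (keyspace, table, column and file names are placeholders):

```
-- in cqlsh
COPY mykeyspace.mytable TO '/tmp/mytable.csv';
-- or only selected columns:
COPY mykeyspace.mytable (id, name, created_at) TO '/tmp/mytable.csv';
```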
Re: Handle Write Heavy Loads in Cassandra 2.0.3
Small correction: we are making writes to 5 CFs and reading from one at high speeds. Thanks Anuj Wadehra Sent from Yahoo Mail on Android From:Anuj Wadehra anujw_2...@yahoo.co.in Date:Mon, 20 Apr, 2015 at 7:53 pm Subject:Handle Write Heavy Loads in Cassandra 2.0.3 Hi, Recently, we discovered that millions of mutations were getting dropped on our cluster. Eventually, we solved this problem by increasing the value of memtable_flush_writers from 1 to 3. We usually write 3 CFs simultaneously and one of them has 4 secondary indexes. New changes also include: concurrent_compactors: 12 (earlier it was the default) compaction_throughput_mb_per_sec: 32 (earlier it was the default) in_memory_compaction_limit_in_mb: 400 (earlier it was the default 64) memtable_flush_writers: 3 (earlier 1) After making the above changes, our write-heavy workload scenarios started giving promotion failed exceptions in the GC logs. We have done JVM tuning and Cassandra config changes to solve this: MAX_HEAP_SIZE=12G (increased heap from 8G to reduce fragmentation) HEAP_NEWSIZE=3G JVM_OPTS=$JVM_OPTS -XX:SurvivorRatio=2 (we observed that even at SurvivorRatio=4, our survivor space was getting 100% utilized under heavy write load, and we thought that minor collections were directly promoting objects to the tenured generation) JVM_OPTS=$JVM_OPTS -XX:MaxTenuringThreshold=20 (lots of objects were moving from Eden to Tenured on each minor collection; may be related to medium-lived objects from memtables and compactions, as suggested by the heap dump) JVM_OPTS=$JVM_OPTS -XX:ConcGCThreads=20 JVM_OPTS=$JVM_OPTS -XX:+UnlockDiagnosticVMOptions JVM_OPTS=$JVM_OPTS -XX:+UseGCTaskAffinity JVM_OPTS=$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs JVM_OPTS=$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768 JVM_OPTS=$JVM_OPTS -XX:+CMSScavengeBeforeRemark JVM_OPTS=$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=3 JVM_OPTS=$JVM_OPTS -XX:CMSWaitDuration=2000 //though it's the default value JVM_OPTS=$JVM_OPTS -XX:+CMSEdenChunksRecordAlways JVM_OPTS=$JVM_OPTS 
-XX:+CMSParallelInitialMarkEnabled JVM_OPTS=$JVM_OPTS -XX:-UseBiasedLocking JVM_OPTS=$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=70 (we reduced the value to avoid concurrent mode failures) Cassandra config: compaction_throughput_mb_per_sec: 24 memtable_total_space_in_mb: 1000 (to make memtable flushes frequent; the default is 1/4 of the heap, which creates more long-lived objects) Questions: 1. Why did increasing memtable_flush_writers and in_memory_compaction_limit_in_mb cause promotion failures in the JVM? Does more memtable_flush_writers mean more memtables in memory? 2. Still, objects are getting promoted at high speed to the tenured space. CMS is running on the old gen every 4-5 minutes under heavy write load. Around 750+ minor collections of up to 300ms happened in 45 mins. Do you see any problems with the new JVM tuning and Cassandra config? Does the justification given for those changes sound logical? Any suggestions? 3. What is the best practice for reducing heap fragmentation/promotion failure when allocation and promotion rates are high? Thanks Anuj
Re: Getting ParNew GC in ... CMS Old Gen ... in logs
I meant 248 milliseconds. Sent from Yahoo Mail on Android From:Anuj Wadehra anujw_2...@yahoo.co.in Date:Mon, 20 Apr, 2015 at 11:41 pm Subject:Re: Getting ParNew GC in ... CMS Old Gen ... in logs I think this is just saying that a young gen collection using the ParNew collector took 248 seconds. This is quite normal with CMS unless it happens too frequently, several times in a second. I think query time has more to do with the read timeout in the yaml. Try increasing it. If it's a range query then please increase the range timeout in the yaml. Thanks Anuj Wadehra Sent from Yahoo Mail on Android From:shahab shahab.mok...@gmail.com Date:Mon, 20 Apr, 2015 at 9:59 pm Subject:Getting ParNew GC in ... CMS Old Gen ... in logs Hi, I keep getting the following line in the Cassandra logs, apparently something related to garbage collection. I guess this is one of the signs of why I do not get any response (I get a time-out) when I query a large volume of data: ParNew GC in 248ms. CMS Old Gen: 453244264 - 570471312; Par Eden Space: 167712624 - 0; Par Survivor Space: 0 - 20970080 Is the above line an indication of something that needs to be fixed in the system? How can I resolve this? best, /Shahab
Re: COPY command to export a table to CSV file
Hi, check the MAX_HEAP_SIZE configuration in the cassandra-env.sh environment file. Also HEAP_NEWSIZE? What is the consistency level you are using? Best Regards, Kiran.M.K. On Mon, Apr 20, 2015 at 11:13 AM, Kiran mk coolkiran2...@gmail.com wrote: Seems like this is related to Java heap memory. What is the count of records in the column-family? What is the Cassandra version? Best Regards, Kiran.M.K. On Mon, Apr 20, 2015 at 11:08 AM, Neha Trivedi nehajtriv...@gmail.com wrote: Hello all, We are getting an OutOfMemoryError on one of the nodes and the node is down, when we run the export command to get all the data from a table. Regards Neha ERROR [ReadStage:532074] 2015-04-09 01:04:00,603 CassandraDaemon.java (line 199) Exception in thread Thread[ReadStage:532074,5,main] java.lang.OutOfMemoryError: Java heap space at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:347) at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:392) at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:355) at org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:124) at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:85) at org.apache.cassandra.db.Column$1.computeNext(Column.java:75) at org.apache.cassandra.db.Column$1.computeNext(Column.java:64) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:88) at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:37) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at 
org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:82) at org.apache.cassandra.db.columniterator.LazyColumnIterator.computeNext(LazyColumnIterator.java:82) at org.apache.cassandra.db.columniterator.LazyColumnIterator.computeNext(LazyColumnIterator.java:59) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:157) at org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:140) at org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:200) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:185) at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:122) at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80) at org.apache.cassandra.db.RowIteratorFactory$2.getReduced(RowIteratorFactory.java:101) at org.apache.cassandra.db.RowIteratorFactory$2.getReduced(RowIteratorFactory.java:75) at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:115) at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:98) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) -- Best Regards, Kiran.M.K. -- Best Regards, Kiran.M.K.
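Since the thread establishes that MAX_HEAP_SIZE and HEAP_NEWSIZE are at their defaults, it helps to know what those defaults actually work out to. The sketch below paraphrases the heap-sizing logic in Cassandra 2.0's cassandra-env.sh (the `calculate_heap_sizes` function) in Python; the constants are recalled from that script, so treat them as an assumption and verify against your own copy of cassandra-env.sh.

```python
# Sketch of the default heap sizing in Cassandra 2.0's cassandra-env.sh,
# paraphrased from the shell script's calculate_heap_sizes function.
# The constants (1024 MB, 8192 MB, 100 MB/core) are an assumption to
# check against your actual cassandra-env.sh.
def default_heap_sizes(system_memory_mb, cpu_cores):
    half = min(system_memory_mb // 2, 1024)     # half of RAM, capped at 1 GB
    quarter = min(system_memory_mb // 4, 8192)  # quarter of RAM, capped at 8 GB
    max_heap_mb = max(half, quarter)
    # Young generation: 100 MB per core, but never more than 1/4 of the heap.
    heap_newsize_mb = min(100 * cpu_cores, max_heap_mb // 4)
    return max_heap_mb, heap_newsize_mb

# An 8 GB box like the one in this thread, assuming 4 cores:
print(default_heap_sizes(8192, 4))
```

Under these assumptions an 8 GB server defaults to a 2 GB heap, so a COPY that pulls wide rows through one coordinator does not have much headroom before an OutOfMemoryError.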
Re: COPY command to export a table to CSV file
hi,
What is the count of records in the column-family? We have about 38,000 rows in the column-family that we are trying to export.
What is the Cassandra version? We are using Cassandra 2.0.11.
MAX_HEAP_SIZE and HEAP_NEWSIZE are the defaults. The server has 8 GB of RAM.

regards
Neha

On Mon, Apr 20, 2015 at 11:39 AM, Kiran mk coolkiran2...@gmail.com wrote:

Hi, check the MAX_HEAP_SIZE configuration in the cassandra-env.sh environment file. Also HEAP_NEWSIZE? What is the consistency level you are using?

Best Regards,
Kiran.M.K.

On Mon, Apr 20, 2015 at 11:13 AM, Kiran mk coolkiran2...@gmail.com wrote:

Seems like this is related to Java heap memory. What is the count of records in the column-family? What is the Cassandra version?

Best Regards,
Kiran.M.K.

On Mon, Apr 20, 2015 at 11:08 AM, Neha Trivedi nehajtriv...@gmail.com wrote:

Hello all,
We are getting an OutOfMemoryError on one of the nodes, and the node goes down, when we run the export command to get all the data from a table.

Regards
Neha

ERROR [ReadStage:532074] 2015-04-09 01:04:00,603 CassandraDaemon.java (line 199) Exception in thread Thread[ReadStage:532074,5,main]
java.lang.OutOfMemoryError: Java heap space
[same stack trace as quoted in the previous message]

--
Best Regards,
Kiran.M.K.

--
Best Regards,
Kiran.M.K.
Re: COPY command to export a table to CSV file
Hi, Thanks for the info.
Are the nproc, nofile and memlock settings in /etc/security/limits.d/cassandra.conf set to optimum values? What is the consistency level?

Best Regards,
Kiran.M.K.

On Mon, Apr 20, 2015 at 11:55 AM, Neha Trivedi nehajtriv...@gmail.com wrote:

hi,
What is the count of records in the column-family? We have about 38,000 rows in the column-family that we are trying to export.
What is the Cassandra version? We are using Cassandra 2.0.11.
MAX_HEAP_SIZE and HEAP_NEWSIZE are the defaults. The server has 8 GB of RAM.

regards
Neha

On Mon, Apr 20, 2015 at 11:39 AM, Kiran mk coolkiran2...@gmail.com wrote:

Hi, check the MAX_HEAP_SIZE configuration in the cassandra-env.sh environment file. Also HEAP_NEWSIZE? What is the consistency level you are using?

Best Regards,
Kiran.M.K.

On Mon, Apr 20, 2015 at 11:13 AM, Kiran mk coolkiran2...@gmail.com wrote:

Seems like this is related to Java heap memory. What is the count of records in the column-family? What is the Cassandra version?

Best Regards,
Kiran.M.K.

On Mon, Apr 20, 2015 at 11:08 AM, Neha Trivedi nehajtriv...@gmail.com wrote:

Hello all,
We are getting an OutOfMemoryError on one of the nodes, and the node goes down, when we run the export command to get all the data from a table.

Regards
Neha

ERROR [ReadStage:532074] 2015-04-09 01:04:00,603 CassandraDaemon.java (line 199) Exception in thread Thread[ReadStage:532074,5,main]
java.lang.OutOfMemoryError: Java heap space
[same stack trace as quoted in the earlier messages]

--
Best Regards,
Kiran.M.K.

--
Best Regards,
Kiran.M.K.

--
Best Regards,
Kiran.M.K.
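If cqlsh's COPY keeps pushing a node over the heap limit, one alternative (besides Brian's cassandra-unloader, suggested later in the thread) is to stream the table out in pages and write the CSV incrementally, so only one page of rows is ever in memory. The sketch below is driver-agnostic: `export_rows_to_csv` is a hypothetical helper that only assumes an iterable of rows. With the DataStax Python driver you would feed it the result of something like `session.execute(SimpleStatement(query, fetch_size=500))`; treat that call shape as an assumption to check against the driver documentation.

```python
import csv

# Hypothetical helper: stream rows to CSV one at a time instead of
# materializing the whole table, which is the failure mode behind the
# OutOfMemoryError quoted above. "rows" can be any iterator, including
# a lazily-paging driver result set.
def export_rows_to_csv(rows, header, path):
    written = 0
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for row in rows:          # each page is fetched as it is consumed
            writer.writerow(row)  # only one row held in memory here
            written += 1
    return written

# Stand-in for a paging result set: a generator of 1000 rows.
n = export_rows_to_csv(([i, "blob%d" % i] for i in range(1000)),
                       ["id", "payload"], "export.csv")
print(n)
```

The key point is that the row source is a generator, not a list: memory use stays bounded by the page size regardless of how many rows the table holds.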