Re: COPY command to export a table to CSV file
hi, what happens if the unloader meets a blob field?

2015-04-20 23:43 GMT+02:00 Sebastian Estevez sebastian.este...@datastax.com:

Try Brian's cassandra-unloader:
https://github.com/brianmhess/cassandra-loader#cassandra-unloader

All the best,
Sebastián Estévez
Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
https://www.linkedin.com/company/datastax | https://www.facebook.com/datastax | https://twitter.com/datastax | https://plus.google.com/+Datastax/about
http://cassandrasummit-datastax.com/

DataStax is the fastest, most scalable distributed database technology, delivering Apache Cassandra to the world's most innovative enterprises. DataStax is built to be agile, always-on, and predictably scalable to any size. With more than 500 customers in 45 countries, DataStax is the database technology and transactional backbone of choice for the world's most innovative companies such as Netflix, Adobe, Intuit, and eBay.

On Mon, Apr 20, 2015 at 12:31 PM, Neha Trivedi nehajtriv...@gmail.com wrote:

"Are the nproc, nofile, memlock settings in /etc/security/limits.d/cassandra.conf set to optimum values?" It's all default.
"What is the consistency level?" CL = QUORUM.
Is there any other way to export a table to CSV?
regards, Neha

On Mon, Apr 20, 2015 at 12:21 PM, Kiran mk coolkiran2...@gmail.com wrote:

Hi, thanks for the info. Are the nproc, nofile, and memlock settings in /etc/security/limits.d/cassandra.conf set to optimum values? What is the consistency level?
Best Regards, Kiran.M.K.

On Mon, Apr 20, 2015 at 11:55 AM, Neha Trivedi nehajtriv...@gmail.com wrote:

"What is the count of records in the column-family?" We have about 38,000 rows in the column family we are trying to export.
"What is the Cassandra version?" We are using Cassandra 2.0.11. MAX_HEAP_SIZE and HEAP_NEWSIZE are the defaults. The server has 8 GB.
regards, Neha

On Mon, Apr 20, 2015 at 11:39 AM, Kiran mk coolkiran2...@gmail.com wrote:

Hi, check the MAX_HEAP_SIZE configuration in the cassandra-env.sh environment file, and also HEAP_NEWSIZE. What consistency level are you using?
Best Regards, Kiran.M.K.

On Mon, Apr 20, 2015 at 11:13 AM, Kiran mk coolkiran2...@gmail.com wrote:

Seems like this is related to Java heap memory. What is the count of records in the column-family? What is the Cassandra version?
Best Regards, Kiran.M.K.

On Mon, Apr 20, 2015 at 11:08 AM, Neha Trivedi nehajtriv...@gmail.com wrote:

Hello all, we are getting an OutOfMemoryError on one of the nodes, and the node goes down, when we run the export command to get all the data from a table.
Regards, Neha

ERROR [ReadStage:532074] 2015-04-09 01:04:00,603 CassandraDaemon.java (line 199) Exception in thread Thread[ReadStage:532074,5,main]
java.lang.OutOfMemoryError: Java heap space
    at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:347)
    at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:392)
    at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:355)
    at org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:124)
    at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:85)
    at org.apache.cassandra.db.Column$1.computeNext(Column.java:75)
    at org.apache.cassandra.db.Column$1.computeNext(Column.java:64)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:88)
    at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:37)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:82)
    at org.apache.cassandra.db.columniterator.LazyColumnIterator.computeNext(LazyColumnIterator.java:82)
    at org.apache.cassandra.db.columniterator.LazyColumnIterator.computeNext(LazyColumnIterator.java:59)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:157)
    at org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:140)
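The question "Is there any other way to export a table to CSV?" can be answered in principle by any exporter that streams rows in small pages instead of materializing the whole table, which is what the OOM suggests is happening. Below is a minimal, hedged sketch of that pattern in Python: the row source here is a simulated generator standing in for a paged Cassandra result set (a real exporter would use a driver with a bounded fetch size); `export_to_csv` and `batch_size` are names invented for this illustration.

```python
import csv
import io

def export_to_csv(rows, out, batch_size=1000):
    """Stream rows to CSV in small batches so memory stays bounded.

    `rows` is any iterable of dicts; at most one batch is held in
    memory at a time, regardless of table size.
    """
    writer = None
    batch = []
    for row in rows:
        if writer is None:
            # Derive the header from the first row's columns.
            writer = csv.DictWriter(out, fieldnames=list(row.keys()))
            writer.writeheader()
        batch.append(row)
        if len(batch) >= batch_size:
            writer.writerows(batch)
            batch.clear()
    if writer is not None and batch:
        writer.writerows(batch)  # flush the final partial batch

# Simulated result set: 38,000 rows, matching the table size in the thread.
rows = ({"id": i, "name": "row%d" % i} for i in range(38000))
buf = io.StringIO()
export_to_csv(rows, buf)
```

The key point is that the generator is consumed lazily; nothing forces all 38,000 rows into the heap at once, which is the failure mode the COPY command apparently hit.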
Re: COPY command to export a table to CSV file
Try Brian's cassandra-unloader:
https://github.com/brianmhess/cassandra-loader#cassandra-unloader

All the best,
Sebastián Estévez
Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

On Mon, Apr 20, 2015 at 12:31 PM, Neha Trivedi nehajtriv...@gmail.com wrote:
[quoted thread as above: default limits.d settings, CL = QUORUM, 38,000 rows, Cassandra 2.0.11, default heap on an 8 GB server, and the OutOfMemoryError stack trace]
Bootstrap performance.
Hi guys,

We have a 100+ node cluster; each node has about 400 GB of data and runs on flash disk. We are running 2.1.2.

When I bring a new node into the cluster, it introduces significant load to the cluster. On the new node, CPU usage is 100%, but disk write I/O is only around 50 MB/s, while we have a 10G network. Does that sound normal to you?

Here are some iostat and vmstat metrics:

iostat:
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          88.52   3.99     4.11     0.00    0.00   3.38

Device:   tps     MB_read/s  MB_wrtn/s  MB_read  MB_wrtn
sda       1.00    0.00       0.04       0        0
sdb       156.50  0.00       55.62      0        1

vmstat:
138 0 0 86781912 438780 101523368 0 0 0 31893 264496 247316 95 4 1 0 0  2015-04-21 01:04:01 UTC
147 0 0 86562400 438780 101607248 0 0 0 90510 456635 245849 91 5 4 0 0  2015-04-21 01:04:03 UTC
143 0 0 86341168 438780 101692224 0 0 0 32392 284495 273656 92 4 4 0 0  2015-04-21 01:04:05 UTC

Thanks.
-- Dikang
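A quick back-of-envelope check on the numbers reported above (not part of the original message, just arithmetic on its figures): streaming ~400 GB at the observed 50 MB/s disk write rate takes a couple of hours, and 50 MB/s is far below what a 10G network can carry, which is consistent with the bottleneck being CPU rather than disk or network.

```python
data_gb = 400      # per-node data size from the message
rate_mb_s = 50     # observed disk write throughput on the new node

seconds = data_gb * 1024 / rate_mb_s
hours = seconds / 3600

# A 10G network can sustain roughly 1.25 GB/s of raw throughput,
# so 50 MB/s uses only about 4% of the link.
network_capacity_mb_s = 10_000 / 8  # 10 Gb/s expressed in MB/s
link_utilization = rate_mb_s / network_capacity_mb_s
```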
Re: timeout creating table
Can you grep for GCInspector in your system.log? Maybe you have long GC pauses.

All the best,
Sebastián Estévez
Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

On Mon, Apr 20, 2015 at 12:19 PM, Jimmy Lin y2klyf+w...@gmail.com wrote:

Yes, sometimes it is create table and sometimes it is create index. It doesn't happen all the time, but it feels like if multiple tests try to do schema changes (create or drop), Cassandra has a long delay on the schema change statements. I also just read about auto_snapshot, and I turned it off, but still no luck.

On Mon, Apr 20, 2015 at 6:42 AM, Jim Witschey jim.witsc...@datastax.com wrote:

Jimmy, what's the exact command that produced this trace? Are you saying that the 16-second wait in your trace is what times out in your CREATE TABLE statements?
Jim Witschey
Software Engineer in Test | jim.witsc...@datastax.com

On Sun, Apr 19, 2015 at 7:13 PM, Jimmy Lin y2klyf+w...@gmail.com wrote:

hi, we have some unit tests that run in parallel and create temporary keyspaces and tables, then drop them after the tests are done. From time to time, our CREATE TABLE statements run into "All host(s) for query failed... Timeout during read" errors (from the DataStax driver).
We later turned on tracing and recorded the following. See below between the === markers: between the Native-Transport-Requests thread and the MigrationStage thread there was a gap of about 16 seconds. Any idea what Cassandra was doing for those 16 seconds? We can work around it by increasing our DataStax driver timeout value, but is there actually a better way to solve this?

thanks

tracing (session_id | event_id | activity | source | source_elapsed | thread):

5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d97-e6e2-11e4-823d-93572f3db015 | Key cache hit for sstable 95588 | 127.0.0.1 | 1592 | Native-Transport-Requests:102
5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d98-e6e2-11e4-823d-93572f3db015 | Seeking to partition beginning in data file | 127.0.0.1 | 1593 | Native-Transport-Requests:102
5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d99-e6e2-11e4-823d-93572f3db015 | Merging data from memtables and 3 sstables | 127.0.0.1 | 1595 | Native-Transport-Requests:102
=====
5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d9a-e6e2-11e4-823d-93572f3db015 | Read 3 live and 0 tombstoned cells | 127.0.0.1 | 1610 | Native-Transport-Requests:102
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a40-e6e2-11e4-823d-93572f3db015 | Executing seq scan across 1 sstables for (min(-9223372036854775808), min(-9223372036854775808)] | 127.0.0.1 | 16381594 | MigrationStage:1
=====
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a41-e6e2-11e4-823d-93572f3db015 | Seeking to partition beginning in data file | 127.0.0.1 | 16381782 | MigrationStage:1
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a42-e6e2-11e4-823d-93572f3db015 | Read 0 live and 0 tombstoned cells | 127.0.0.1 | 16381787 | MigrationStage:1
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a43-e6e2-11e4-823d-93572f3db015 | Seeking to partition beginning in data file | 127.0.0.1 | 16381789 | MigrationStage:1
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a44-e6e2-11e4-823d-93572f3db015 | Read 0 live and 0 tombstoned cells | 127.0.0.1 | 16381791 | MigrationStage:1
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a45-e6e2-11e4-823d-93572f3db015 | Seeking to partition beginning in data file | 127.0.0.1 | 16381792 | MigrationStage:1
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a46-e6e2-11e4-823d-93572f3db015 | Read 0 live and 0 tombstoned cells | 127.0.0.1 | 16381794 | MigrationStage:1
. . .
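For readers following along, the "16 seconds" figure in the trace comes straight from the source_elapsed column, which is microseconds since the request started. A tiny calculation using the two values that bracket the gap:

```python
# source_elapsed values (microseconds) from the trace rows around the gap
elapsed_last_read_us = 1610          # last Native-Transport-Requests:102 event
elapsed_migration_us = 16381594      # first MigrationStage:1 event

gap_seconds = (elapsed_migration_us - elapsed_last_read_us) / 1_000_000
# The request sat for roughly 16.4 seconds before MigrationStage:1 ran,
# which matches the ~16-second wait described in the message.
```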
Re: COPY command to export a table to CSV file
Blobs are ByteBuffers; it calls getBytes().toString():
https://github.com/brianmhess/cassandra-loader/blob/master/src/main/java/com/datastax/loader/parser/ByteBufferParser.java#L35

All the best,
Sebastián Estévez
Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

On Mon, Apr 20, 2015 at 5:47 PM, Serega Sheypak serega.shey...@gmail.com wrote:

hi, what happens if the unloader meets a blob field?

2015-04-20 23:43 GMT+02:00 Sebastian Estevez sebastian.este...@datastax.com:

Try Brian's cassandra-unloader:
https://github.com/brianmhess/cassandra-loader#cassandra-unloader
[rest of quoted thread as above]
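Whatever a given unloader does internally with ByteBuffers, the safe way to put a blob column into CSV is a hex encoding; cqlsh renders blobs as 0x-prefixed hex, for example. A small sketch of that round trip (the function names here are invented for illustration, not part of any tool in this thread):

```python
def blob_to_csv_field(b: bytes) -> str:
    """Render a blob as a 0x-prefixed hex string, the form cqlsh
    uses, so the value stays text-safe and round-trippable in CSV."""
    return "0x" + b.hex()

def csv_field_to_blob(s: str) -> bytes:
    """Parse a hex field back into raw bytes, with or without 0x."""
    return bytes.fromhex(s[2:] if s.startswith("0x") else s)

payload = bytes([0, 255, 16, 32])
field = blob_to_csv_field(payload)
restored = csv_field_to_blob(field)
```

The contrast with a naive toString()-style conversion is the point: stringifying a buffer object loses the bytes, while hex encoding preserves them exactly.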
Re: CQL 3.x Update ...USING TIMESTAMP...
Tyler, I can consider trying out lightweight transactions, but here are my concerns:

1. We have two data centers located close by, with plans to expand to more data centers that are even further away geographically.
2. How will lightweight transactions be impacted when there is a high level of network contention for cross-data-center traffic?
3. Do you know of any real examples where companies have used lightweight transactions with multi-data-center traffic?

Regards, Sachin

On Tue, Mar 24, 2015 at 10:56 AM, Tyler Hobbs ty...@datastax.com wrote:

"do you just mean that it's easy to forget to always set your timestamp correctly, and if you goof it up, it makes it difficult to recover from (i.e. you issue a delete with system timestamp instead of document version, and that's way larger than your document version would ever be, so you can never write that document again)?"

Yes, that's basically what I meant. Plus, if you need to make a manual correction to a document, you'll need to increment the version, which would presumably cause problems for your application. It's possible to handle all of this correctly if you take care, but I wouldn't trust myself to always get it right.

"With your recommendation, won't I end up saving all the version(s) of the document? In my case the document is pretty huge (~5 MB) and each document has up to 10 versions. And you already highlighted that lightweight transactions are very expensive."

You can always delete older versions to free up space. Using lightweight transactions may be a decent option if you don't have really high write throughput and aren't expecting high contention (which I don't think you are). I recommend testing this out with your application to see how it performs for you.

On Sun, Mar 22, 2015 at 7:02 PM, Sachin Nikam skni...@gmail.com wrote:

@Eric Stevens: Thanks for representing my position while I came back to this thread.
@Tyler: With your recommendation, won't I end up saving all the version(s) of the document? In my case the document is pretty huge (~5 MB) and each document has up to 10 versions. And you already highlighted that lightweight transactions are very expensive. Also, as Eric mentions, can you elaborate on what kind of problems could happen when we try to overwrite or delete data?
Regards, Sachin

On Fri, Mar 13, 2015 at 4:23 AM, Brice Dutheil brice.duth...@gmail.com wrote:

I agree with Tyler: in the normal run of a live application I would not recommend using the timestamp; use other ways to *version* *inserts*. Otherwise you may fall into the *upsert* pitfalls that Tyler mentions.
However, I find there is a legitimate use of the USING TIMESTAMP trick when migrating data from another datastore. The trick is, at some point, to let the application start writing to Cassandra *without* any timestamp set on the statements (for fresh data), and then start a migration batch that uses a write time with an older date, i.e. one where there is *no* possible *collision* with other data (for older data). *This trick has been used in prod with billions of records.*
-- Brice

On Thu, Mar 12, 2015 at 10:42 PM, Eric Stevens migh...@gmail.com wrote:

Ok, but if you're using a system of time that isn't server-clock oriented (Sachin's document revision ID, and my fixed and necessarily consistent base timestamp [B's always know their parent A's exact recorded timestamp]), isn't the principle of using timestamps to force a particular update out of several to win still sound? As for "using the clocks is only valid if clocks are perfectly sync'ed, which they are not": clock skew is a problem which doesn't seem to be a factor in either use case, given that both have a consistent external source of truth for the timestamp.

On Thu, Mar 12, 2015 at 12:58 PM, Jonathan Haddad j...@jonhaddad.com wrote:

In most datacenters you're going to see significant variance in your server times, likely 20 ms between servers in the same rack. Even Google, using atomic clocks, has 1-7 ms variance. [1] I would +1 Tyler's advice here, as using the clocks is only valid if clocks are perfectly sync'ed, which they are not, and likely never will be in our lifetime.
[1] http://queue.acm.org/detail.cfm?id=2745385

On Thu, Mar 12, 2015 at 7:04 AM, Eric Stevens migh...@gmail.com wrote:

"It's possible, but you'll end up with problems when attempting to overwrite or delete entries": I'm wondering if you can elucidate on that a little. Do you just mean that it's easy to forget to always set your timestamp correctly, and if you goof it up, it makes it difficult to recover from (i.e. you issue a delete with the system timestamp instead of the document version, and that's way larger than your document version would ever be, so you can never write that document again)? Or is there some bug in write timestamps that can cause the wrong entry to win the write contention? We're looking at doing
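The pitfall the thread keeps circling (a delete issued with a microsecond system-clock timestamp permanently shadowing version-numbered writes) can be shown with a toy model of Cassandra's last-write-wins cell reconciliation. This is a deliberately simplified sketch, not Cassandra's actual storage code: for a given cell, the write or tombstone with the highest timestamp wins.

```python
class Cell:
    """Toy model of last-write-wins reconciliation for one cell."""

    def __init__(self):
        self.value = None
        self.ts = -1

    def write(self, value, ts):
        # A write (value) or tombstone (None) only takes effect if its
        # timestamp exceeds the highest timestamp seen so far.
        if ts > self.ts:
            self.value, self.ts = value, ts

doc = Cell()
doc.write("v1 body", ts=1)   # document version 1 used as the timestamp
doc.write("v2 body", ts=2)   # version 2 supersedes it, as intended

# The mistake discussed above: a delete issued with the system clock
# in microseconds, vastly larger than any document version number.
doc.write(None, ts=1_429_500_000_000_000)

doc.write("v3 body", ts=3)   # version 3 silently loses, forever
```

Running this shows why recovery is hard: every future version-numbered write compares lower than the tombstone's timestamp, so the document can never be written again without abandoning the versioning scheme.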
Re: COPY command to export a table to CSV file
Thanks Sebastian, I will try it out. But I am also curious why the COPY command is failing with an OutOfMemoryError.

regards, Neha

On Tue, Apr 21, 2015 at 4:35 AM, Sebastian Estevez sebastian.este...@datastax.com wrote:

Blobs are ByteBuffers; it calls getBytes().toString():
https://github.com/brianmhess/cassandra-loader/blob/master/src/main/java/com/datastax/loader/parser/ByteBufferParser.java#L35
[rest of quoted thread as above]
Re: timeout creating table
hi, there were only a few (4 of them across 4 minutes with around 200ms), so shouldn't be the reason The system log has tons of INFO [MigrationStage:1] 2015-04-20 11:03:21,880 ColumnFamilyStore.java (line 633) Enqueuing flush of Memtable-schema_keyspaces@2079381036(138/1215 serialized/live bytes, 3 ops) INFO [MigrationStage:1] 2015-04-20 11:03:21,900 ColumnFamilyStore.java (line 633) Enqueuing flush of Memtable-schema_columnfamilies@1283263314(1036/3946 serialized/live bytes, 24 ops) INFO [MigrationStage:1] 2015-04-20 11:03:21,921 ColumnFamilyStore.java (line 633) Enqueuing flush of Memtable-schema_columns But that could be just normal given that our unit tests are doing lot of droping keyspace and creating keyspace/tables. I read the MigrationStage thread pool is default to one, so wondering if that could be a reason it may be doing something that block others? On Mon, Apr 20, 2015 at 2:40 PM, Sebastian Estevez sebastian.este...@datastax.com wrote: Can you grep for GCInspector in your system.log? Maybe you have long GC pauses. All the best, [image: datastax_logo.png] http://www.datastax.com/ Sebastián Estévez Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com [image: linkedin.png] https://www.linkedin.com/company/datastax [image: facebook.png] https://www.facebook.com/datastax [image: twitter.png] https://twitter.com/datastax [image: g+.png] https://plus.google.com/+Datastax/about http://feeds.feedburner.com/datastax http://cassandrasummit-datastax.com/ DataStax is the fastest, most scalable distributed database technology, delivering Apache Cassandra to the world’s most innovative enterprises. Datastax is built to be agile, always-on, and predictably scalable to any size. With more than 500 customers in 45 countries, DataStax is the database technology and transactional backbone of choice for the worlds most innovative companies such as Netflix, Adobe, Intuit, and eBay. 
On Mon, Apr 20, 2015 at 12:19 PM, Jimmy Lin y2klyf+w...@gmail.com wrote: Yes, sometimes it is create table and sometime it is create index. It doesn't happen all the time, but feel like if multiple tests trying to do schema change(create or drop), Cassandra has a long delay on the schema change statements. I also just read about auto_snapshot, and I turn it off but still no luck. On Mon, Apr 20, 2015 at 6:42 AM, Jim Witschey jim.witsc...@datastax.com wrote: Jimmy, What's the exact command that produced this trace? Are you saying that the 16-second wait in your trace what times out in your CREATE TABLE statements? Jim Witschey Software Engineer in Test | jim.witsc...@datastax.com On Sun, Apr 19, 2015 at 7:13 PM, Jimmy Lin y2klyf+w...@gmail.com wrote: hi, we have some unit tests that run parallel that will create tmp keyspace, and tables and then drop them after tests are done. From time to time, our create table statement run into All hosts(s) for query failed... Timeout during read (from datastax driver) error. We later turn on tracing, and record something in the following. See below between === , Native_Transport-Request thread and MigrationStage thread, there was like 16 seconds doing something. Any idea what that 16 seconds Cassandra was doing? We can work around that but increasing our datastax driver timeout value, but wondering if there is actually better way to solve this? 
thanks

tracing --
5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d97-e6e2-11e4-823d-93572f3db015 | Key cache hit for sstable 95588 | 127.0.0.1 | 1592 | Native-Transport-Requests:102
5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d98-e6e2-11e4-823d-93572f3db015 | Seeking to partition beginning in data file | 127.0.0.1 | 1593 | Native-Transport-Requests:102
5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d99-e6e2-11e4-823d-93572f3db015 | Merging data from memtables and 3 sstables | 127.0.0.1 | 1595 | Native-Transport-Requests:102
=
5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d9a-e6e2-11e4-823d-93572f3db015 | Read 3 live and 0 tombstoned cells | 127.0.0.1 | 1610 | Native-Transport-Requests:102
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a40-e6e2-11e4-823d-93572f3db015 | Executing seq scan across 1 sstables for (min(-9223372036854775808), min(-9223372036854775808)] | 127.0.0.1 | 16381594 | MigrationStage:1
=
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a41-e6e2-11e4-823d-93572f3db015 | Seeking to partition beginning in data file | 127.0.0.1 | 16381782 | MigrationStage:1
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a42-e6e2-11e4-823d-93572f3db015 | Read 0 live and 0 tombstoned cells | 127.0.0.1 | 16381787 | MigrationStage:1
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a43-e6e2-11e4-823d-93572f3db015 | Seeking to partition
Re: Bootstrap performance.
On Mon, Apr 20, 2015 at 6:08 PM, Dikang Gu dikan...@gmail.com wrote: When I bring in a new node into the cluster, it introduces significant load to the cluster. For the new node, the cpu usage is 100%, but disk write io is only around 50MB/s, while we have 10G network. Does it sound normal to you? Have you unthrottled both compaction and streaming via JMX/nodetool? Streaming is single threaded and can (?) be CPU bound, I would not be surprised if JIRA contains a ticket on the upper bounds of streaming performance in current implementation. =Rob
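Rob's suggestion maps to two nodetool knobs, where a value of 0 means unthrottled. Shown here as a dry run (the commands are only echoed, since they need a live node; drop the echo to run them for real):

```shell
# Dry run: print the nodetool commands that unthrottle compaction and
# streaming (0 disables the throttle in both cases).
cmds=$(for c in "setcompactionthroughput 0" "setstreamthroughput 0"; do
  echo "nodetool $c"
done)
echo "$cmds"
```

Remember these settings are per-node and revert to the cassandra.yaml values on restart.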
Re: Bootstrap performance.
Hi Rob, Why do you say streaming is single threaded? I see a lot of background streaming threads running, for example:
STREAM-IN-/10.210.165.49 daemon prio=10 tid=0x7f81fc001000 nid=0x107075 runnable [0x7f836b256000]
STREAM-IN-/10.213.51.57 daemon prio=10 tid=0x7f81f0002000 nid=0x107073 runnable [0x7f836b1d4000]
STREAM-IN-/10.213.51.61 daemon prio=10 tid=0x7f81e8001000 nid=0x107070 runnable [0x7f836b11]
STREAM-IN-/10.213.51.63 daemon prio=10 tid=0x7f81dc001800 nid=0x10706f runnable [0x7f836b0cf000]
Thanks, Dikang. On Mon, Apr 20, 2015 at 6:48 PM, Robert Coli rc...@eventbrite.com wrote: On Mon, Apr 20, 2015 at 6:08 PM, Dikang Gu dikan...@gmail.com wrote: When I bring a new node into the cluster, it introduces significant load to the cluster. For the new node, the CPU usage is 100%, but disk write IO is only around 50MB/s, while we have a 10G network. Does that sound normal to you? Have you unthrottled both compaction and streaming via JMX/nodetool? Streaming is single threaded and can (?) be CPU bound; I would not be surprised if JIRA contains a ticket on the upper bounds of streaming performance in the current implementation. =Rob -- Dikang
Re: COPY command to export a table to CSV file
Values in /etc/security/limits.d/cassandra.conf # Provided by the cassandra package cassandra - memlock unlimited cassandra - nofile 10 On Mon, Apr 20, 2015 at 12:21 PM, Kiran mk coolkiran2...@gmail.com wrote: Hi, Thanks for the info. Are the nproc, nofile, and memlock settings in /etc/security/limits.d/cassandra.conf set to optimum values? What is the consistency level? Best Regards, Kiran.M.K. On Mon, Apr 20, 2015 at 11:55 AM, Neha Trivedi nehajtriv...@gmail.com wrote: hi, What is the count of records in the column family? We have about 38,000 rows in the column family that we are trying to export. What is the Cassandra version? We are using Cassandra 2.0.11. MAX_HEAP_SIZE and HEAP_NEWSIZE are the defaults. The server has 8 GB. regards Neha On Mon, Apr 20, 2015 at 11:39 AM, Kiran mk coolkiran2...@gmail.com wrote: Hi, Check the MAX_HEAP_SIZE configuration in the cassandra-env.sh environment file. Also HEAP_NEWSIZE? What is the consistency level you are using? Best Regards, Kiran.M.K. On Mon, Apr 20, 2015 at 11:13 AM, Kiran mk coolkiran2...@gmail.com wrote: Seems like this is related to Java heap memory. What is the count of records in the column family? What is the Cassandra version? Best Regards, Kiran.M.K. On Mon, Apr 20, 2015 at 11:08 AM, Neha Trivedi nehajtriv...@gmail.com wrote: Hello all, We are getting an OutOfMemoryError on one of the nodes, and that node goes down, when we run the export command to get all the data from a table.
Regards Neha

ERROR [ReadStage:532074] 2015-04-09 01:04:00,603 CassandraDaemon.java (line 199) Exception in thread Thread[ReadStage:532074,5,main]
java.lang.OutOfMemoryError: Java heap space
    at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:347)
    at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:392)
    at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:355)
    at org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:124)
    at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:85)
    at org.apache.cassandra.db.Column$1.computeNext(Column.java:75)
    at org.apache.cassandra.db.Column$1.computeNext(Column.java:64)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:88)
    at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:37)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:82)
    at org.apache.cassandra.db.columniterator.LazyColumnIterator.computeNext(LazyColumnIterator.java:82)
    at org.apache.cassandra.db.columniterator.LazyColumnIterator.computeNext(LazyColumnIterator.java:59)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:157)
    at org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:140)
    at org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:200)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:185)
    at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:122)
    at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80)
    at org.apache.cassandra.db.RowIteratorFactory$2.getReduced(RowIteratorFactory.java:101)
    at org.apache.cassandra.db.RowIteratorFactory$2.getReduced(RowIteratorFactory.java:75)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:115)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:98)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)

-- Best Regards, Kiran.M.K.
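When a single COPY runs a node out of heap, a common workaround is to export in slices by token range instead of one big scan. A sketch that just prints the per-slice queries for the full Murmur3 range (ks.table and pk are hypothetical placeholders for your schema; each result set can then be fetched, e.g. via cqlsh, and appended to a CSV):

```shell
# Split the full Murmur3 token range into N slices and emit one
# SELECT per slice. The first slice misses the exact minimum token;
# use >= on the first slice if that matters for your data.
N=4
min=-9223372036854775808
max=9223372036854775807
step=$(( max / N - min / N ))   # ~2^64/N without overflowing signed 64-bit
queries=$(
  start=$min
  i=1
  while [ "$i" -le "$N" ]; do
    if [ "$i" -eq "$N" ]; then end=$max; else end=$(( start + step )); fi
    echo "SELECT * FROM ks.table WHERE token(pk) > $start AND token(pk) <= $end;"
    start=$end
    i=$(( i + 1 ))
  done
)
echo "$queries"
```

Smaller slices keep each read well under the heap limit; Brian's cassandra-unloader, mentioned earlier in the thread, automates the same idea.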
Re: Adding nodes to existing cluster
Start one node at a time. Wait 2 minutes before starting each node. How much data and how many nodes do you have already? Depending on that, the streaming of data can stress the resources you have. I would recommend starting one and monitoring; if things are OK, add another one, and so on. Regards, Carlos Juzarte Rolo Cassandra Consultant Pythian - Love your data rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649 www.pythian.com On Mon, Apr 20, 2015 at 11:02 AM, Or Sher or.sh...@gmail.com wrote: Hi all, In the near future I'll need to add more than 10 nodes to a 2.0.9 cluster (using vnodes). I read this documentation on the DataStax website: http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html In one point it says: "If you are using racks, you can safely bootstrap two nodes at a time when both nodes are on the same rack." And in another it says: "Start Cassandra on each new node. Allow two minutes between node initializations. You can monitor the startup and data streaming process using nodetool netstats." We're not using a rack configuration, and from reading this documentation I'm not really sure whether it is safe for us to bootstrap all the nodes together (with two minutes between each). I really hate the thought of doing it one by one; I assume it will take more than 6 hours per node. What do you say? -- Or Sher
Re: Adding nodes to existing cluster
unsubscribe
RE: Adding nodes to existing cluster
Hi Colin, To remove your address from the list, send a message to: user-unsubscr...@cassandra.apache.org Cheers, Matt *From:* Colin Clark [mailto:co...@clark.ws] *Sent:* 20 April 2015 14:10 *To:* user@cassandra.apache.org *Subject:* Re: Adding nodes to existing cluster unsubscribe
Re: Adding nodes to existing cluster
Thanks for the response. Sure, we'll monitor as we're adding nodes. We're now using 6 nodes in each DC (we have 2 DCs), and each node contains ~800 GB. Do you know how rack configurations are relevant here? Do you see any reason to bootstrap them one by one if we're not using rack awareness? -- Or Sher
Re: Adding nodes to existing cluster
Independent of the snitch, data needs to travel to the new nodes (plus all the keyspace information that goes via gossip), so I wouldn't bootstrap them all at once, even if only for the network traffic generated. Don't forget to run cleanup on the old nodes once all nodes are in place to reclaim disk space. Regards, Carlos Juzarte Rolo Cassandra Consultant Pythian - Love your data rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo Mobile: +31 6 159 61 814 | Tel: +1 613 565 8696 x1649 www.pythian.com
Re: timeout creating table
Jimmy, What's the exact command that produced this trace? Are you saying that the 16-second wait in your trace is what times out in your CREATE TABLE statements? Jim Witschey Software Engineer in Test | jim.witsc...@datastax.com On Sun, Apr 19, 2015 at 7:13 PM, Jimmy Lin y2klyf+w...@gmail.com wrote: hi, we have some unit tests that run in parallel and create temporary keyspaces and tables, then drop them after the tests are done. From time to time, our CREATE TABLE statement runs into an All host(s) for query failed... Timeout during read error (from the DataStax driver). We later turned tracing on and recorded the following. See below between the === markers: between the Native-Transport-Requests thread and the MigrationStage thread there was a gap of about 16 seconds. Any idea what Cassandra was doing for those 16 seconds? We can work around it by increasing our DataStax driver timeout value, but we're wondering if there is a better way to solve this?

thanks

tracing --
5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d97-e6e2-11e4-823d-93572f3db015 | Key cache hit for sstable 95588 | 127.0.0.1 | 1592 | Native-Transport-Requests:102
5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d98-e6e2-11e4-823d-93572f3db015 | Seeking to partition beginning in data file | 127.0.0.1 | 1593 | Native-Transport-Requests:102
5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d99-e6e2-11e4-823d-93572f3db015 | Merging data from memtables and 3 sstables | 127.0.0.1 | 1595 | Native-Transport-Requests:102
=
5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d9a-e6e2-11e4-823d-93572f3db015 | Read 3 live and 0 tombstoned cells | 127.0.0.1 | 1610 | Native-Transport-Requests:102
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a40-e6e2-11e4-823d-93572f3db015 | Executing seq scan across 1 sstables for (min(-9223372036854775808), min(-9223372036854775808)] | 127.0.0.1 | 16381594 | MigrationStage:1
=
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a41-e6e2-11e4-823d-93572f3db015 | Seeking to partition beginning in data file | 127.0.0.1 | 16381782 | MigrationStage:1
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a42-e6e2-11e4-823d-93572f3db015 | Read 0 live and 0 tombstoned cells | 127.0.0.1 | 16381787 | MigrationStage:1
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a43-e6e2-11e4-823d-93572f3db015 | Seeking to partition beginning in data file | 127.0.0.1 | 16381789 | MigrationStage:1
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a44-e6e2-11e4-823d-93572f3db015 | Read 0 live and 0 tombstoned cells | 127.0.0.1 | 16381791 | MigrationStage:1
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a45-e6e2-11e4-823d-93572f3db015 | Seeking to partition beginning in data file | 127.0.0.1 | 16381792 | MigrationStage:1
5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a46-e6e2-11e4-823d-93572f3db015 | Read 0 live and 0 tombstoned cells | 127.0.0.1 | 16381794 | MigrationStage:1
. . .
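To spot where the time went in a trace like this, you can diff consecutive source_elapsed values; the largest jump points at the stalled step. A sketch over inlined sample rows (activity | elapsed-microseconds) standing in for your saved cqlsh tracing output:

```shell
# Print the largest jump between consecutive source_elapsed values.
# The sample rows mimic the trace above; replace the heredoc with
# your own exported trace data.
maxgap=$(awk -F'|' 'NR > 1 {
  e = $2 + 0
  if (seen) { g = e - prev; if (g > max) max = g }
  seen = 1; prev = e
} END { print max }' <<'EOF'
activity | elapsed_us
Merging data from memtables and 3 sstables | 1595
Read 3 live and 0 tombstoned cells | 1610
Executing seq scan across 1 sstables | 16381594
EOF
)
echo "$maxgap"   # the ~16.4 s gap before the MigrationStage seq scan
```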
Adding nodes to existing cluster
Hi all, In the near future I'll need to add more than 10 nodes to a 2.0.9 cluster (using vnodes). I read this documentation on the DataStax website: http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html In one point it says: "If you are using racks, you can safely bootstrap two nodes at a time when both nodes are on the same rack." And in another it says: "Start Cassandra on each new node. Allow two minutes between node initializations. You can monitor the startup and data streaming process using nodetool netstats." We're not using a rack configuration, and from reading this documentation I'm not really sure whether it is safe for us to bootstrap all the nodes together (with two minutes between each). I really hate the thought of doing it one by one; I assume it will take more than 6 hours per node. What do you say? -- Or Sher
Handle Write Heavy Loads in Cassandra 2.0.3
Hi, Recently, we discovered that millions of mutations were getting dropped on our cluster. Eventually, we solved this problem by increasing the value of memtable_flush_writers from 1 to 3. We usually write 3 CFs simultaneously, and one of them has 4 secondary indexes.

New changes also include:
concurrent_compactors: 12 (earlier it was the default)
compaction_throughput_mb_per_sec: 32 (earlier it was the default)
in_memory_compaction_limit_in_mb: 400 (earlier it was the default 64)
memtable_flush_writers: 3 (earlier 1)

After making the above changes, our write-heavy workload scenarios started giving promotion failed exceptions in the GC logs. We have done JVM tuning and Cassandra config changes to solve this:

MAX_HEAP_SIZE=12G (increased heap from 8G to reduce fragmentation)
HEAP_NEWSIZE=3G
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=2" (we observed that even at SurvivorRatio=4, our survivor space was getting 100% utilized under heavy write load, and we thought that minor collections were directly promoting objects to the tenured generation)
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=20" (lots of objects were moving from eden to tenured on each minor collection; may be related to medium-lived objects tied to memtables and compactions, as suggested by a heap dump)
JVM_OPTS="$JVM_OPTS -XX:ConcGCThreads=20"
JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
JVM_OPTS="$JVM_OPTS -XX:+UseGCTaskAffinity"
JVM_OPTS="$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs"
JVM_OPTS="$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768"
JVM_OPTS="$JVM_OPTS -XX:+CMSScavengeBeforeRemark"
JVM_OPTS="$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=3"
JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=2000" (though it's the default value)
JVM_OPTS="$JVM_OPTS -XX:+CMSEdenChunksRecordAlways"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelInitialMarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=70" (reduced the value to avoid concurrent mode failures)

Cassandra config:
compaction_throughput_mb_per_sec: 24
memtable_total_space_in_mb: 1000 (to make memtable flushes frequent; the default is 1/4 of the heap, which creates more long-lived objects)

Questions:
1. Why did increasing memtable_flush_writers cause promotion failures in the JVM? Do more memtable_flush_writers mean more memtables in memory?
2. Still, objects are getting promoted at high speed to the tenured space. CMS runs on the old gen every 4-5 minutes under heavy write load, and around 750+ minor collections of up to 300 ms happened in 45 minutes. Do you see any problems with the new JVM tuning and Cassandra config? Does the justification given for those changes sound logical? Any suggestions?
3. What is the best practice for reducing heap fragmentation/promotion failure when allocation and promotion rates are high?

Thanks
Anuj
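For comparison with the hand-tuned 12G/3G values above, this is a simplified sketch of the heuristic the stock cassandra-env.sh uses when MAX_HEAP_SIZE is unset (check your own cassandra-env.sh for the exact logic; the example machine numbers are made up):

```shell
# Stock heuristic, roughly: heap = max(min(ram/2, 1G), min(ram/4, 8G));
# new gen = min(100 MB per core, heap/4).
ram_mb=16384; cores=8   # hypothetical 16 GB, 8-core machine
a=$(( ram_mb / 2 )); if [ "$a" -gt 1024 ]; then a=1024; fi
b=$(( ram_mb / 4 )); if [ "$b" -gt 8192 ]; then b=8192; fi
heap=$a; if [ "$b" -gt "$heap" ]; then heap=$b; fi
yg=$(( 100 * cores )); q=$(( heap / 4 )); if [ "$q" -lt "$yg" ]; then yg=$q; fi
sizes="MAX_HEAP_SIZE=${heap}M HEAP_NEWSIZE=${yg}M"
echo "$sizes"
```

The point of the comparison: a 12G CMS heap with a 3G new gen is well beyond what the defaults would pick, which is consistent with the longer promotion and fragmentation behavior described above.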
Re: Adding nodes to existing cluster
The documentation is referring to consistent range movements. There is a change in 2.1 that won't allow you to bootstrap multiple nodes at the same time unless you explicitly turn off consistent range movements. Check out the JIRA: https://issues.apache.org/jira/browse/CASSANDRA-2434 All the best, Sebastián Estévez Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com
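For reference, the 2.1 startup option that opts out of consistent range movements (per CASSANDRA-2434) can be passed via cassandra-env.sh. A sketch only; use with care, since it trades bootstrap safety for parallelism:

```shell
# cassandra-env.sh fragment (2.1+): allow multiple nodes to bootstrap
# simultaneously by disabling consistent range movements.
# Trade-off: range ownership may move inconsistently; see CASSANDRA-2434.
JVM_OPTS="$JVM_OPTS -Dcassandra.consistent.rangemovement=false"
```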
Re: Adding nodes to existing cluster
OK, thanks. I'll monitor the resource status (network, memory, CPU, IO) as I go, and try to bootstrap them in chunks that don't seem to have a bad impact. Will do regarding the cleanup. Thanks! -- Or Sher
Re: Getting ParNew GC in ... CMS Old Gen ... in logs
I think this is just saying that a young gen collection using the ParNew collector took 248 seconds. This is quite normal with CMS unless it happens too frequently, several times in a second. I think query time has more to do with the read timeout in the yaml. Try increasing it. If it's a range query then please increase the range timeout in the yaml. Thanks Anuj Wadehra Sent from Yahoo Mail on Android From:shahab shahab.mok...@gmail.com Date:Mon, 20 Apr, 2015 at 9:59 pm Subject:Getting ParNew GC in ... CMS Old Gen ... in logs Hi, I keep getting the following line in the Cassandra logs, apparently something related to garbage collection. I guess this is one of the signs of why I do not get any response (I get a time-out) when I query a large volume of data: ParNew GC in 248ms. CMS Old Gen: 453244264 - 570471312; Par Eden Space: 167712624 - 0; Par Survivor Space: 0 - 20970080 Is the above line an indication of something that needs to be fixed in the system? How can I resolve this? best, /Shahab
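The timeouts referred to above are settings in cassandra.yaml. A sketch of the relevant knobs (the values shown are illustrative, roughly double the 2.0 defaults, not recommendations):

```yaml
# cassandra.yaml
read_request_timeout_in_ms: 10000     # single-partition reads (default 5000)
range_request_timeout_in_ms: 20000    # range scans (default 10000)
```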
Re: Connecting to Cassandra cluster in AWS from local network
You'll have to configure your nodes to: 1. use AWS internal IPs for inter-node connection (check listen_address) and 2. use the AWS public IP for client-to-node connections (check rpc_address) Depending on the setup, there might be other interesting conf options in cassandra.yaml (broadcast_address, listen_interface, rpc_interface). [1]: http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html On Mon, Apr 20, 2015 at 9:50 AM, Jonathan Haddad j...@jonhaddad.com wrote: Ideally you'll be on the same network, but if you can't be, you'll need to use the public ip in listen_address. On Mon, Apr 20, 2015 at 9:47 AM Matthew Johnson matt.john...@algomi.com wrote: Hi all, I have set up a Cassandra cluster with 2.1.4 on some existing AWS boxes, just as a POC. Cassandra servers connect to each other over their internal AWS IP addresses (172.x.x.x) aliased in /etc/hosts as sales1, sales2 and sales3. I connect to it from my local dev environment using the seed’s external NAT address (54.x.x.x) aliases in my Windows hosts file as sales3 (my seed). 
When I try to connect, it connects fine, and can retrieve some data (I have very limited amounts of data in there, but it seems to retrieve ok), but I also get lots of stacktraces in my log where my dev environment is trying to connect to Cassandra on the internal IP (presumably the Cassandra seed node tells my dev env where to look): INFO 2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host sales3/54.x.x.142:9042 added INFO 2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host /172.x.x.237:9042 added INFO 2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host /172.x.x.170:9042 added Connected to cluster: Test Cluster Datatacenter: datacenter1; Host: /172.x.x.170; Rack: rack1 Datatacenter: datacenter1; Host: sales3/54.x.x.142; Rack: rack1 Datatacenter: datacenter1; Host: /172.x.x.237; Rack: rack1 DEBUG 2015-04-20 16:34:14,901 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-0} Connection - Connection[sales3/54.x.x.142:9042-2, inFlight=0, closed=false] Transport initialized and ready DEBUG 2015-04-20 16:34:14,901 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-0} Session - Added connection pool for sales3/54.x.x.142:9042 DEBUG 2015-04-20 16:34:19,850 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-1} Connection - Connection[/172.x.x.237:9042-1, inFlight=0, closed=false] Error connecting to /172.x.x.237:9042 (connection timed out: /172.x.x.237:9042) DEBUG 2015-04-20 16:34:19,850 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-1} Connection - Defuncting connection to /172.x.x.237:9042 com.datastax.driver.core.TransportException: [/172.x.x.237:9042] Cannot connect Does anyone have any experience with connecting to AWS clusters from dev machines? How have you set up your aliases to get around this issue?
Current setup in sales3 (seed node) cassandra.yaml: - seeds: sales3 listen_address: sales3 rpc_address: sales3 Current setup in other nodes (eg sales2) cassandra.yaml: - seeds: sales3 listen_address: sales2 rpc_address: sales2 Thanks! Matt -- Bests, Alex Popescu | @al3xandru Sen. Product Manager @ DataStax
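A hedged sketch of the cassandra.yaml split described above, with placeholder IPs: private addresses for inter-node traffic, the public NAT address advertised outward. On EC2 the public IP is usually NATed rather than bound on the interface, so broadcast_address (not listen_address) would carry it; exact settings depend on the setup:

```yaml
# cassandra.yaml sketch -- placeholder IPs, adjust per node
listen_address: 172.31.0.237       # private IP: gossip and streaming
broadcast_address: 54.10.10.142    # public NAT IP advertised to peers, if needed
rpc_address: 0.0.0.0               # accept client connections on all interfaces
```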
Re: Connecting to Cassandra cluster in AWS from local network
I would like to note that this will require all clients to connect over the external IP address. If you have clients within Amazon that need to connect over the private IP address, this would not be possible. If you have a mix of clients that need to connect over private and public IP addresses, then one of the solutions outlined in https://datastax-oss.atlassian.net/browse/JAVA-145 may be more appropriate. -Russ From: Alex Popescu Reply-To: user@cassandra.apache.org Date: Monday, April 20, 2015 at 2:00 PM To: user Subject: Re: Connecting to Cassandra cluster in AWS from local network You'll have to configure your nodes to: 1. use AWS internal IPs for inter-node connection (check listen_address) and 2. use the AWS public IP for client-to-node connections (check rpc_address) Depending on the setup, there might be other interesting conf options in cassandra.yaml (broadcast_address, listen_interface, rpc_interface). [1]: http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html On Mon, Apr 20, 2015 at 9:50 AM, Jonathan Haddad j...@jonhaddad.com wrote: Ideally you'll be on the same network, but if you can't be, you'll need to use the public ip in listen_address. On Mon, Apr 20, 2015 at 9:47 AM Matthew Johnson matt.john...@algomi.com wrote: Hi all, I have set up a Cassandra cluster with 2.1.4 on some existing AWS boxes, just as a POC. Cassandra servers connect to each other over their internal AWS IP addresses (172.x.x.x) aliased in /etc/hosts as sales1, sales2 and sales3. I connect to it from my local dev environment using the seed’s external NAT address (54.x.x.x) aliased in my Windows hosts file as sales3 (my seed). 
When I try to connect, it connects fine, and can retrieve some data (I have very limited amounts of data in there, but it seems to retrieve ok), but I also get lots of stacktraces in my log where my dev environment is trying to connect to Cassandra on the internal IP (presumably the Cassandra seed node tells my dev env where to look): INFO 2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host sales3/54.x.x.142:9042 added INFO 2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host /172.x.x.237:9042 added INFO 2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host /172.x.x.170:9042 added Connected to cluster: Test Cluster Datatacenter: datacenter1; Host: /172.x.x.170; Rack: rack1 Datatacenter: datacenter1; Host: sales3/54.x.x.142; Rack: rack1 Datatacenter: datacenter1; Host: /172.x.x.237; Rack: rack1 DEBUG 2015-04-20 16:34:14,901 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-0} Connection - Connection[sales3/54.x.x.142:9042-2, inFlight=0, closed=false] Transport initialized and ready DEBUG 2015-04-20 16:34:14,901 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-0} Session - Added connection pool for sales3/54.x.x.142:9042 DEBUG 2015-04-20 16:34:19,850 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-1} Connection - Connection[/172.x.x.237:9042-1, inFlight=0, closed=false] Error connecting to /172.x.x.237:9042 (connection timed out: /172.x.x.237:9042) DEBUG 2015-04-20 16:34:19,850 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-1} Connection - Defuncting connection to /172.x.x.237:9042 com.datastax.driver.core.TransportException: [/172.x.x.237:9042] Cannot connect Does anyone have any experience with connecting to AWS clusters from dev machines? How have you set up your aliases to get around this issue? 
Current setup in sales3 (seed node) cassandra.yaml: - seeds: sales3 listen_address: sales3 rpc_address: sales3 Current setup in other nodes (eg sales2) cassandra.yaml: - seeds: sales3 listen_address: sales2 rpc_address: sales2 Thanks! Matt -- Bests, Alex Popescu | @al3xandru Sen. Product Manager @ DataStax
Re: Connecting to Cassandra cluster in AWS from local network
Ideally you'll be on the same network, but if you can't be, you'll need to use the public ip in listen_address. On Mon, Apr 20, 2015 at 9:47 AM Matthew Johnson matt.john...@algomi.com wrote: Hi all, I have set up a Cassandra cluster with 2.1.4 on some existing AWS boxes, just as a POC. Cassandra servers connect to each other over their internal AWS IP addresses (172.x.x.x) aliased in /etc/hosts as sales1, sales2 and sales3. I connect to it from my local dev environment using the seed’s external NAT address (54.x.x.x) aliases in my Windows hosts file as sales3 (my seed). When I try to connect, it connects fine, and can retrieve some data (I have very limited amounts of data in there, but it seems to retrieve ok), but I also get lots of stacktraces in my log where my dev environment is trying to connect to Cassandra on the internal IP (presumably the Cassandra seed node tells my dev env where to look): INFO 2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host sales3/54.x.x.142:9042 added INFO 2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host /172.x.x.237:9042 added INFO 2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host /172.x.x.170:9042 added Connected to cluster: Test Cluster Datatacenter: datacenter1; Host: /172.x.x.170; Rack: rack1 Datatacenter: datacenter1; Host: sales3/54.x.x.142; Rack: rack1 Datatacenter: datacenter1; Host: /172.x.x.237; Rack: rack1 DEBUG 2015-04-20 16:34:14,901 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-0} Connection - Connection[sales3/54.x.x.142:9042-2, inFlight=0, closed=false] Transport initialized and ready DEBUG 2015-04-20 16:34:14,901 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-0} Session - Added connection pool for sales3/54.x.x.142:9042 DEBUG 2015-04-20 16:34:19,850 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-1} Connection - Connection[/172.x.x.237:9042-1, inFlight=0, closed=false] Error connecting to /172.x.x.237:9042 (connection timed out: /172.x.x.237:9042) DEBUG 2015-04-20 16:34:19,850 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-1} Connection - Defuncting connection to /172.x.x.237:9042 com.datastax.driver.core.TransportException: [/172.x.x.237:9042] Cannot connect Does anyone have any experience with connecting to AWS clusters from dev machines? How have you set up your aliases to get around this issue? Current setup in sales3 (seed node) cassandra.yaml: - seeds: sales3 listen_address: sales3 rpc_address: sales3 Current setup in other nodes (eg sales2) cassandra.yaml: - seeds: sales3 listen_address: sales2 rpc_address: sales2 Thanks! Matt
Connecting to Cassandra cluster in AWS from local network
Hi all, I have set up a Cassandra cluster with 2.1.4 on some existing AWS boxes, just as a POC. Cassandra servers connect to each other over their internal AWS IP addresses (172.x.x.x) aliased in /etc/hosts as sales1, sales2 and sales3. I connect to it from my local dev environment using the seed’s external NAT address (54.x.x.x) aliases in my Windows hosts file as sales3 (my seed). When I try to connect, it connects fine, and can retrieve some data (I have very limited amounts of data in there, but it seems to retrieve ok), but I also get lots of stacktraces in my log where my dev environment is trying to connect to Cassandra on the internal IP (presumably the Cassandra seed node tells my dev env where to look): INFO 2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host sales3/54.x.x.142:9042 added INFO 2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host /172.x.x.237:9042 added INFO 2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host /172.x.x.170:9042 added Connected to cluster: Test Cluster Datatacenter: datacenter1; Host: /172.x.x.170; Rack: rack1 Datatacenter: datacenter1; Host: sales3/54.x.x.142; Rack: rack1 Datatacenter: datacenter1; Host: /172.x.x.237; Rack: rack1 DEBUG 2015-04-20 16:34:14,901 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-0} Connection - Connection[sales3/54.x.x.142:9042-2, inFlight=0, closed=false] Transport initialized and ready DEBUG 2015-04-20 16:34:14,901 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-0} Session - Added connection pool for sales3/54.x.x.142:9042 DEBUG 2015-04-20 16:34:19,850 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-1} Connection - Connection[/172.x.x.237:9042-1, inFlight=0, closed=false] Error connecting to /172.x.x.237:9042 (connection timed out: /172.x.x.237:9042) DEBUG 2015-04-20 16:34:19,850 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-1} Connection - Defuncting connection to /172.x.x.237:9042 com.datastax.driver.core.TransportException: [/172.x.x.237:9042] Cannot connect Does anyone have any experience with connecting to AWS clusters from dev machines? How have you set up your aliases to get around this issue? Current setup in sales3 (seed node) cassandra.yaml: - seeds: sales3 listen_address: sales3 rpc_address: sales3 Current setup in other nodes (eg sales2) cassandra.yaml: - seeds: sales3 listen_address: sales2 rpc_address: sales2 Thanks! Matt
Re: Connecting to Cassandra cluster in AWS from local network
There are a couple options here. You can use the built in address translator, or, write a new load balancing policy. See https://datastax-oss.atlassian.net/browse/JAVA-145 for more information. From: Jonathan Haddad Reply-To: user@cassandra.apache.org Date: Monday, April 20, 2015 at 12:50 PM To: user@cassandra.apache.org Subject: Re: Connecting to Cassandra cluster in AWS from local network Ideally you'll be on the same network, but if you can't be, you'll need to use the public ip in listen_address. On Mon, Apr 20, 2015 at 9:47 AM Matthew Johnson matt.john...@algomi.com wrote: Hi all, I have set up a Cassandra cluster with 2.1.4 on some existing AWS boxes, just as a POC. Cassandra servers connect to each other over their internal AWS IP addresses (172.x.x.x) aliased in /etc/hosts as sales1, sales2 and sales3. I connect to it from my local dev environment using the seed’s external NAT address (54.x.x.x) aliases in my Windows hosts file as sales3 (my seed). When I try to connect, it connects fine, and can retrieve some data (I have very limited amounts of data in there, but it seems to retrieve ok), but I also get lots of stacktraces in my log where my dev environment is trying to connect to Cassandra on the internal IP (presumably the Cassandra seed node tells my dev env where to look): INFO 2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host sales3/54.x.x.142:9042 added INFO 2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host /172.x.x.237:9042 added INFO 2015-04-20 16:34:14,808 [CASSANDRA-CLIENT] {main} Cluster - New Cassandra host /172.x.x.170:9042 added Connected to cluster: Test Cluster Datatacenter: datacenter1; Host: /172.x.x.170; Rack: rack1 Datatacenter: datacenter1; Host: sales3/54.x.x.142; Rack: rack1 Datatacenter: datacenter1; Host: /172.x.x.237; Rack: rack1 DEBUG 2015-04-20 16:34:14,901 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-0} Connection - Connection[sales3/54.x.x.142:9042-2, inFlight=0, 
closed=false] Transport initialized and ready DEBUG 2015-04-20 16:34:14,901 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-0} Session - Added connection pool for sales3/54.x.x.142:9042 DEBUG 2015-04-20 16:34:19,850 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-1} Connection - Connection[/172.x.x.237:9042-1, inFlight=0, closed=false] Error connecting to /172.x.x.237:9042 (connection timed out: /172.x.x.237:9042) DEBUG 2015-04-20 16:34:19,850 [CASSANDRA-CLIENT] {Cassandra Java Driver worker-1} Connection - Defuncting connection to /172.x.x.237:9042 com.datastax.driver.core.TransportException: [/172.x.x.237:9042] Cannot connect Does anyone have any experience with connecting to AWS clusters from dev machines? How have you set up your aliases to get around this issue? Current setup in sales3 (seed node) cassandra.yaml: - seeds: sales3 listen_address: sales3 rpc_address: sales3 Current setup in other nodes (eg sales2) cassandra.yaml: - seeds: sales3 listen_address: sales2 rpc_address: sales2 Thanks! Matt
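JAVA-145 (linked above) is about giving the Java driver an address-translation hook: in driver 2.1 this is the AddressTranslater interface, registered on the Cluster builder. The standalone sketch below shows only the private-to-public mapping logic; the class name and IPs are made up for illustration, and it deliberately does not depend on the driver jar:

```java
import java.net.InetSocketAddress;
import java.util.HashMap;
import java.util.Map;

// Standalone sketch of the private->public address mapping behind the
// driver's AddressTranslater hook (JAVA-145). Class name and IPs are
// illustrative; this does not use the driver API itself.
public class PublicIpTranslater {
    private final Map<String, String> privateToPublic = new HashMap<>();

    void register(String privateIp, String publicIp) {
        privateToPublic.put(privateIp, publicIp);
    }

    // Rewrite a gossip-advertised private address to the public address an
    // external client must dial; unknown addresses pass through untouched.
    InetSocketAddress translate(InetSocketAddress addr) {
        String pub = privateToPublic.get(addr.getHostString());
        return pub == null
                ? addr
                : InetSocketAddress.createUnresolved(pub, addr.getPort());
    }

    public static void main(String[] args) {
        PublicIpTranslater t = new PublicIpTranslater();
        t.register("172.31.0.237", "54.10.10.142");
        InetSocketAddress out =
                t.translate(InetSocketAddress.createUnresolved("172.31.0.237", 9042));
        System.out.println(out.getHostString() + ":" + out.getPort());
    }
}
```

In the real driver, the same logic would live inside a translater implementation supplied when building the Cluster, so contact points learned from gossip get rewritten before the client dials them.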
Cassandra based web app benchmark
Hi, TechEmpower Web Framework Benchmarks ( https://www.techempower.com/benchmarks/) is a collaborative effort for measuring the performance of a large number of contemporary web development platforms. The benchmarking and test implementation code is published as open source. I've contributed a test implementation that uses Apache Cassandra for data storage and is based on the following technology stack: * Java * Resin app server + Servlet 3 with asynchronous processing * Apache Cassandra database (v2.0.12) TFB Round 10 results are expected to be released in the near future, with results from the Cassandra based test implementation included. Now that the initial test implementation has been merged as part of the project codebase, I'd like to solicit feedback from the Cassandra user and developer community on best practices, especially with regard to performance, in the hope that the test implementation can get the best performance out of Cassandra in future benchmark rounds. Any review comments and pull requests would be welcome. The code can be found on GitHub: https://github.com/TechEmpower/FrameworkBenchmarks https://github.com/TechEmpower/FrameworkBenchmarks/tree/master/frameworks/Java/servlet3-cass https://github.com/TechEmpower/FrameworkBenchmarks/tree/master/config/cassandra More info on the benchmark project, as well as the Cassandra based test implementation, can be found here: http://practicingtechie.com/2014/09/10/web-application-framework-benchmarks/ thanks, marko
Getting ParNew GC in ... CMS Old Gen ... in logs
Hi, I keep getting the following line in the Cassandra logs, apparently something related to garbage collection. I guess this is one of the signs of why I do not get any response (I get a time-out) when I query a large volume of data: ParNew GC in 248ms. CMS Old Gen: 453244264 - 570471312; Par Eden Space: 167712624 - 0; Par Survivor Space: 0 - 20970080 Is the above line an indication of something that needs to be fixed in the system? How can I resolve this? best, /Shahab
Re: timeout creating table
Yes, sometimes it is create table and sometimes it is create index. It doesn't happen all the time, but it feels like when multiple tests try to do schema changes (create or drop), Cassandra has a long delay on the schema change statements. I also just read about auto_snapshot, and I turned it off, but still no luck. On Mon, Apr 20, 2015 at 6:42 AM, Jim Witschey jim.witsc...@datastax.com wrote: Jimmy, What's the exact command that produced this trace? Are you saying that the 16-second wait in your trace is what times out in your CREATE TABLE statements? Jim Witschey Software Engineer in Test | jim.witsc...@datastax.com On Sun, Apr 19, 2015 at 7:13 PM, Jimmy Lin y2klyf+w...@gmail.com wrote: hi, we have some unit tests that run in parallel and that create tmp keyspaces and tables, then drop them after the tests are done. From time to time, our create table statement runs into an All host(s) for query failed... Timeout during read (from the datastax driver) error. We later turned on tracing and recorded the following. See below between ===: between the Native-Transport-Requests thread and the MigrationStage thread, there was a gap of about 16 seconds. Any idea what Cassandra was doing for those 16 seconds? We can work around it by increasing our datastax driver timeout value, but wondering if there is actually a better way to solve this? 
thanks tracing -- 5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d97-e6e2-11e4-823d-93572f3db015 | Key cache hit for sstable 95588 | 127.0.0.1 | 1592 | Native-Transport-Requests:102 5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d98-e6e2-11e4-823d-93572f3db015 | Seeking to partition beginning in data file | 127.0.0.1 | 1593 | Native-Transport-Requests:102 5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d99-e6e2-11e4-823d-93572f3db015 | Merging data from memtables and 3 sstables | 127.0.0.1 | 1595 | Native-Transport-Requests:102 = 5872bf70-e6e2-11e4-823d-93572f3db015 | 58730d9a-e6e2-11e4-823d-93572f3db015 | Read 3 live and 0 tombstoned cells | 127.0.0.1 | 1610 | Native-Transport-Requests:102 5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a40-e6e2-11e4-823d-93572f3db015 | Executing seq scan across 1 sstables for (min(-9223372036854775808), min(-9223372036854775808)] | 127.0.0.1 | 16381594 | MigrationStage:1 = 5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a41-e6e2-11e4-823d-93572f3db015 | Seeking to partition beginning in data file | 127.0.0.1 | 16381782 | MigrationStage:1 5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a42-e6e2-11e4-823d-93572f3db015 | Read 0 live and 0 tombstoned cells | 127.0.0.1 | 16381787 | MigrationStage:1 5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a43-e6e2-11e4-823d-93572f3db015 | Seeking to partition beginning in data file | 127.0.0.1 | 16381789 | MigrationStage:1 5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a44-e6e2-11e4-823d-93572f3db015 | Read 0 live and 0 tombstoned cells | 127.0.0.1 | 16381791 | MigrationStage:1 5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a45-e6e2-11e4-823d-93572f3db015 | Seeking to partition beginning in data file | 127.0.0.1 | 16381792 | MigrationStage:1 5872bf70-e6e2-11e4-823d-93572f3db015 | 62364a46-e6e2-11e4-823d-93572f3db015 | Read 0 live and 0 tombstoned cells | 127.0.0.1 | 16381794 | MigrationStage:1 . . .
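Schema changes propagate through the MigrationStage (visible in the trace above), and concurrent DDL from parallel test suites can queue up behind it. One common client-side workaround is to funnel all DDL through a single thread so Cassandra never sees concurrent schema changes. A minimal sketch (the execute body is a stand-in for session.execute(ddl) plus waiting for schema agreement with the real driver; statement strings are placeholders):

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Funnel every DDL statement from parallel test suites through a single
// thread, so schema changes are applied strictly one at a time.
public class DdlSerializer {
    private static final ExecutorService DDL =
            Executors.newSingleThreadExecutor();

    // Stand-in for: session.execute(ddl) + wait for schema agreement.
    static Future<String> execute(String ddl) {
        return DDL.submit(() -> "applied: " + ddl);
    }

    public static void main(String[] args) throws Exception {
        List<String> ddls = Arrays.asList(
                "CREATE KEYSPACE tmp1 ...",
                "CREATE TABLE tmp1.t ...",
                "DROP KEYSPACE tmp1");
        int applied = 0;
        for (String ddl : ddls) {
            // get() forces each statement to finish before the next is sent
            if (execute(ddl).get().startsWith("applied")) applied++;
        }
        System.out.println(applied);
        DDL.shutdown();
    }
}
```

This complements, rather than replaces, raising the driver timeout: serializing DDL avoids the contention, while the timeout covers the occasional slow migration.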
Re: COPY command to export a table to CSV file
Are the nproc, nofile, memlock settings in /etc/security/limits.d/cassandra.conf set to optimum values? It's all default. What is the consistency level? CL = Quorum. Is there any other way to export a table to CSV? regards Neha On Mon, Apr 20, 2015 at 12:21 PM, Kiran mk coolkiran2...@gmail.com wrote: Hi, Thanks for the info. Are the nproc, nofile, memlock settings in /etc/security/limits.d/cassandra.conf set to optimum values? What is the consistency level? Best Regards, Kiran.M.K. On Mon, Apr 20, 2015 at 11:55 AM, Neha Trivedi nehajtriv...@gmail.com wrote: hi, What is the count of records in the column-family? We have about 38,000 rows in the column-family for which we are trying to export. What is the Cassandra version? We are using Cassandra 2.0.11. MAX_HEAP_SIZE and HEAP_NEWSIZE are the defaults. The server is 8 GB. regards Neha On Mon, Apr 20, 2015 at 11:39 AM, Kiran mk coolkiran2...@gmail.com wrote: Hi, check the MAX_HEAP_SIZE configuration in the cassandra-env.sh environment file. Also HEAP_NEWSIZE? What is the consistency level you are using? Best Regards, Kiran.M.K. On Mon, Apr 20, 2015 at 11:13 AM, Kiran mk coolkiran2...@gmail.com wrote: Seems like this is related to Java heap memory. What is the count of records in the column-family? What is the Cassandra version? Best Regards, Kiran.M.K. On Mon, Apr 20, 2015 at 11:08 AM, Neha Trivedi nehajtriv...@gmail.com wrote: Hello all, We are getting an OutOfMemoryError on one of the nodes and the node is down, when we run the export command to get all the data from a table. 
Regards Neha ERROR [ReadStage:532074] 2015-04-09 01:04:00,603 CassandraDaemon.java (line 199) Exception in thread Thread[ReadStage:532074,5,main] java.lang.OutOfMemoryError: Java heap space at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:347) at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:392) at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:355) at org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:124) at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:85) at org.apache.cassandra.db.Column$1.computeNext(Column.java:75) at org.apache.cassandra.db.Column$1.computeNext(Column.java:64) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:88) at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:37) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:82) at org.apache.cassandra.db.columniterator.LazyColumnIterator.computeNext(LazyColumnIterator.java:82) at org.apache.cassandra.db.columniterator.LazyColumnIterator.computeNext(LazyColumnIterator.java:59) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:157) at org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:140) at 
org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:200) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:185) at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:122) at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80) at org.apache.cassandra.db.RowIteratorFactory$2.getReduced(RowIteratorFactory.java:101) at org.apache.cassandra.db.RowIteratorFactory$2.getReduced(RowIteratorFactory.java:75) at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:115) at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:98) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) -- Best Regards, Kiran.M.K. -- Best Regards,
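On the question of other ways to export: the OOM above was triggered by cqlsh's COPY, so the usual alternatives are Brian Hess's cassandra-unloader (suggested elsewhere in the thread) or a client-side paged SELECT via a driver. For completeness, the COPY syntax under discussion (keyspace, table, column and file names are placeholders):

```
-- in cqlsh
COPY mykeyspace.mytable TO '/tmp/mytable.csv';
-- or only selected columns:
COPY mykeyspace.mytable (id, name, created_at) TO '/tmp/mytable.csv';
```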
Re: Handle Write Heavy Loads in Cassandra 2.0.3
Small correction: we are making writes to 5 CFs and reading from one at high speeds. Thanks Anuj Wadehra Sent from Yahoo Mail on Android From:Anuj Wadehra anujw_2...@yahoo.co.in Date:Mon, 20 Apr, 2015 at 7:53 pm Subject:Handle Write Heavy Loads in Cassandra 2.0.3 Hi, Recently, we discovered that millions of mutations were getting dropped on our cluster. Eventually, we solved this problem by increasing the value of memtable_flush_writers from 1 to 3. We usually write 3 CFs simultaneously and one of them has 4 secondary indexes. New changes also include: concurrent_compactors: 12 (earlier it was the default) compaction_throughput_mb_per_sec: 32 (earlier it was the default) in_memory_compaction_limit_in_mb: 400 (earlier it was the default 64) memtable_flush_writers: 3 (earlier 1) After making the above changes, our write-heavy workload scenarios started giving promotion failed exceptions in the GC logs. We have done JVM tuning and Cassandra config changes to solve this: MAX_HEAP_SIZE=12G (increased heap from 8G to reduce fragmentation) HEAP_NEWSIZE=3G JVM_OPTS=$JVM_OPTS -XX:SurvivorRatio=2 (we observed that even at SurvivorRatio=4, our survivor space was getting 100% utilized under heavy write load, and we thought that minor collections were directly promoting objects to the tenured generation) JVM_OPTS=$JVM_OPTS -XX:MaxTenuringThreshold=20 (lots of objects were moving from Eden to Tenured on each minor collection; may be related to medium-lived objects from memtables and compactions, as suggested by the heap dump) JVM_OPTS=$JVM_OPTS -XX:ConcGCThreads=20 JVM_OPTS=$JVM_OPTS -XX:+UnlockDiagnosticVMOptions JVM_OPTS=$JVM_OPTS -XX:+UseGCTaskAffinity JVM_OPTS=$JVM_OPTS -XX:+BindGCTaskThreadsToCPUs JVM_OPTS=$JVM_OPTS -XX:ParGCCardsPerStrideChunk=32768 JVM_OPTS=$JVM_OPTS -XX:+CMSScavengeBeforeRemark JVM_OPTS=$JVM_OPTS -XX:CMSMaxAbortablePrecleanTime=3 JVM_OPTS=$JVM_OPTS -XX:CMSWaitDuration=2000 //though it's the default value JVM_OPTS=$JVM_OPTS -XX:+CMSEdenChunksRecordAlways JVM_OPTS=$JVM_OPTS 
-XX:+CMSParallelInitialMarkEnabled JVM_OPTS=$JVM_OPTS -XX:-UseBiasedLocking JVM_OPTS=$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=70 (we reduced the value to avoid concurrent mode failures) Cassandra config: compaction_throughput_mb_per_sec: 24 memtable_total_space_in_mb: 1000 (to make memtable flushes frequent; the default is 1/4 of the heap, which creates more long-lived objects) Questions: 1. Why did increasing memtable_flush_writers and in_memory_compaction_limit_in_mb cause promotion failures in the JVM? Does more memtable_flush_writers mean more memtables in memory? 2. Still, objects are getting promoted at high speed to the tenured space. CMS is running on the old gen every 4-5 minutes under heavy write load. Around 750+ minor collections of up to 300ms happened in 45 mins. Do you see any problems with the new JVM tuning and Cassandra config? Does the justification given for those changes sound logical? Any suggestions? 3. What is the best practice for reducing heap fragmentation/promotion failure when allocation and promotion rates are high? Thanks Anuj
Re: Getting ParNew GC in ... CMS Old Gen ... in logs
I meant 248 milliseconds. Sent from Yahoo Mail on Android From:Anuj Wadehra anujw_2...@yahoo.co.in Date:Mon, 20 Apr, 2015 at 11:41 pm Subject:Re: Getting ParNew GC in ... CMS Old Gen ... in logs I think this is just saying that a young gen collection using the ParNew collector took 248 seconds. This is quite normal with CMS unless it happens too frequently, several times in a second. I think query time has more to do with the read timeout in the yaml. Try increasing it. If it's a range query then please increase the range timeout in the yaml. Thanks Anuj Wadehra Sent from Yahoo Mail on Android From:shahab shahab.mok...@gmail.com Date:Mon, 20 Apr, 2015 at 9:59 pm Subject:Getting ParNew GC in ... CMS Old Gen ... in logs Hi, I keep getting the following line in the Cassandra logs, apparently something related to garbage collection. I guess this is one of the signs of why I do not get any response (I get a time-out) when I query a large volume of data: ParNew GC in 248ms. CMS Old Gen: 453244264 - 570471312; Par Eden Space: 167712624 - 0; Par Survivor Space: 0 - 20970080 Is the above line an indication of something that needs to be fixed in the system? How can I resolve this? best, /Shahab
Re: COPY command to export a table to CSV file
Hi, check the MAX_HEAP_SIZE configuration in the cassandra-env.sh environment file. Also HEAP_NEWSIZE? What is the consistency level you are using? Best Regards, Kiran.M.K. On Mon, Apr 20, 2015 at 11:13 AM, Kiran mk coolkiran2...@gmail.com wrote: Seems like this is related to Java heap memory. What is the count of records in the column-family? What is the Cassandra version? Best Regards, Kiran.M.K. On Mon, Apr 20, 2015 at 11:08 AM, Neha Trivedi nehajtriv...@gmail.com wrote: Hello all, We are getting an OutOfMemoryError on one of the nodes and the node is down, when we run the export command to get all the data from a table. Regards Neha ERROR [ReadStage:532074] 2015-04-09 01:04:00,603 CassandraDaemon.java (line 199) Exception in thread Thread[ReadStage:532074,5,main] java.lang.OutOfMemoryError: Java heap space at org.apache.cassandra.io.util.RandomAccessReader.readBytes(RandomAccessReader.java:347) at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:392) at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:355) at org.apache.cassandra.db.ColumnSerializer.deserializeColumnBody(ColumnSerializer.java:124) at org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:85) at org.apache.cassandra.db.Column$1.computeNext(Column.java:75) at org.apache.cassandra.db.Column$1.computeNext(Column.java:64) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:88) at org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:37) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at 
org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:82) at org.apache.cassandra.db.columniterator.LazyColumnIterator.computeNext(LazyColumnIterator.java:82) at org.apache.cassandra.db.columniterator.LazyColumnIterator.computeNext(LazyColumnIterator.java:59) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:157) at org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:140) at org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:200) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:185) at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:122) at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80) at org.apache.cassandra.db.RowIteratorFactory$2.getReduced(RowIteratorFactory.java:101) at org.apache.cassandra.db.RowIteratorFactory$2.getReduced(RowIteratorFactory.java:75) at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:115) at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:98) at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) -- Best Regards, Kiran.M.K. -- Best Regards, Kiran.M.K.
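Since the thread establishes that MAX_HEAP_SIZE and HEAP_NEWSIZE are at their defaults, it helps to know what those defaults actually work out to. The sketch below paraphrases the heap-sizing logic in Cassandra 2.0's cassandra-env.sh (the `calculate_heap_sizes` function) in Python; the constants are recalled from that script, so treat them as an assumption and verify against your own copy of cassandra-env.sh.

```python
# Sketch of the default heap sizing in Cassandra 2.0's cassandra-env.sh,
# paraphrased from the shell script's calculate_heap_sizes function.
# The constants (1024 MB, 8192 MB, 100 MB/core) are an assumption to
# check against your actual cassandra-env.sh.
def default_heap_sizes(system_memory_mb, cpu_cores):
    half = min(system_memory_mb // 2, 1024)     # half of RAM, capped at 1 GB
    quarter = min(system_memory_mb // 4, 8192)  # quarter of RAM, capped at 8 GB
    max_heap_mb = max(half, quarter)
    # Young generation: 100 MB per core, but never more than 1/4 of the heap.
    heap_newsize_mb = min(100 * cpu_cores, max_heap_mb // 4)
    return max_heap_mb, heap_newsize_mb

# An 8 GB box like the one in this thread, assuming 4 cores:
print(default_heap_sizes(8192, 4))
```

Under these assumptions an 8 GB server defaults to a 2 GB heap, so a COPY that pulls wide rows through one coordinator does not have much headroom before an OutOfMemoryError.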
Re: COPY command to export a table to CSV file
hi,
What is the count of records in the column-family? We have about 38,000 rows in the column-family that we are trying to export.
What is the Cassandra version? We are using Cassandra 2.0.11.
MAX_HEAP_SIZE and HEAP_NEWSIZE are the defaults. The server has 8 GB of RAM.

regards
Neha

On Mon, Apr 20, 2015 at 11:39 AM, Kiran mk coolkiran2...@gmail.com wrote:

Hi, check the MAX_HEAP_SIZE configuration in the cassandra-env.sh environment file. Also HEAP_NEWSIZE? What is the consistency level you are using?

Best Regards,
Kiran.M.K.

On Mon, Apr 20, 2015 at 11:13 AM, Kiran mk coolkiran2...@gmail.com wrote:

Seems like this is related to Java heap memory. What is the count of records in the column-family? What is the Cassandra version?

Best Regards,
Kiran.M.K.

On Mon, Apr 20, 2015 at 11:08 AM, Neha Trivedi nehajtriv...@gmail.com wrote:

Hello all,
We are getting an OutOfMemoryError on one of the nodes, and the node goes down, when we run the export command to get all the data from a table.

Regards
Neha

ERROR [ReadStage:532074] 2015-04-09 01:04:00,603 CassandraDaemon.java (line 199) Exception in thread Thread[ReadStage:532074,5,main]
java.lang.OutOfMemoryError: Java heap space
[same stack trace as quoted in the previous message]

--
Best Regards,
Kiran.M.K.

--
Best Regards,
Kiran.M.K.
Re: COPY command to export a table to CSV file
Hi, Thanks for the info.
Are the nproc, nofile and memlock settings in /etc/security/limits.d/cassandra.conf set to optimum values? What is the consistency level?

Best Regards,
Kiran.M.K.

On Mon, Apr 20, 2015 at 11:55 AM, Neha Trivedi nehajtriv...@gmail.com wrote:

hi,
What is the count of records in the column-family? We have about 38,000 rows in the column-family that we are trying to export.
What is the Cassandra version? We are using Cassandra 2.0.11.
MAX_HEAP_SIZE and HEAP_NEWSIZE are the defaults. The server has 8 GB of RAM.

regards
Neha

On Mon, Apr 20, 2015 at 11:39 AM, Kiran mk coolkiran2...@gmail.com wrote:

Hi, check the MAX_HEAP_SIZE configuration in the cassandra-env.sh environment file. Also HEAP_NEWSIZE? What is the consistency level you are using?

Best Regards,
Kiran.M.K.

On Mon, Apr 20, 2015 at 11:13 AM, Kiran mk coolkiran2...@gmail.com wrote:

Seems like this is related to Java heap memory. What is the count of records in the column-family? What is the Cassandra version?

Best Regards,
Kiran.M.K.

On Mon, Apr 20, 2015 at 11:08 AM, Neha Trivedi nehajtriv...@gmail.com wrote:

Hello all,
We are getting an OutOfMemoryError on one of the nodes, and the node goes down, when we run the export command to get all the data from a table.

Regards
Neha

ERROR [ReadStage:532074] 2015-04-09 01:04:00,603 CassandraDaemon.java (line 199) Exception in thread Thread[ReadStage:532074,5,main]
java.lang.OutOfMemoryError: Java heap space
[same stack trace as quoted in the earlier messages]

--
Best Regards,
Kiran.M.K.

--
Best Regards,
Kiran.M.K.

--
Best Regards,
Kiran.M.K.
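If cqlsh's COPY keeps pushing a node over the heap limit, one alternative (besides Brian's cassandra-unloader, suggested later in the thread) is to stream the table out in pages and write the CSV incrementally, so only one page of rows is ever in memory. The sketch below is driver-agnostic: `export_rows_to_csv` is a hypothetical helper that only assumes an iterable of rows. With the DataStax Python driver you would feed it the result of something like `session.execute(SimpleStatement(query, fetch_size=500))`; treat that call shape as an assumption to check against the driver documentation.

```python
import csv

# Hypothetical helper: stream rows to CSV one at a time instead of
# materializing the whole table, which is the failure mode behind the
# OutOfMemoryError quoted above. "rows" can be any iterator, including
# a lazily-paging driver result set.
def export_rows_to_csv(rows, header, path):
    written = 0
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for row in rows:          # each page is fetched as it is consumed
            writer.writerow(row)  # only one row held in memory here
            written += 1
    return written

# Stand-in for a paging result set: a generator of 1000 rows.
n = export_rows_to_csv(([i, "blob%d" % i] for i in range(1000)),
                       ["id", "payload"], "export.csv")
print(n)
```

The key point is that the row source is a generator, not a list: memory use stays bounded by the page size regardless of how many rows the table holds.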