Re: Garbage collector launched on all nodes at once

2015-06-17 Thread Jonathan Haddad
How much memory do you have?  Recently people have been seeing really great
performance using G1GC with heaps > 8GB and offheap memtable objects.
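
A rough sketch of what that combination can look like on Cassandra 2.1; the
heap size and pause target below are illustrative placeholders, not tuned
recommendations:

    # cassandra-env.sh: swap the default CMS flags for G1
    MAX_HEAP_SIZE="16G"                          # illustrative; "> 8GB" per the advice above
    JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
    JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"

    # cassandra.yaml: keep memtable cell data off the Java heap (2.1+)
    memtable_allocation_type: offheap_objects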

On Thu, Jun 18, 2015 at 1:31 AM Jason Wee  wrote:

> Okay, IIRC memtables have been moved off heap. I googled and found this:
> http://www.datastax.com/dev/blog/off-heap-memtables-in-Cassandra-2-1
> Apparently, there are still some references on heap.
>
> On Thu, Jun 18, 2015 at 1:11 PM, Marcus Eriksson 
> wrote:
>
>> It is probably this: https://issues.apache.org/jira/browse/CASSANDRA-9549
>>
>> On Wed, Jun 17, 2015 at 7:37 PM, Michał Łowicki 
>> wrote:
>>
>>> It looks like memtable heap size is growing rapidly on some nodes (
>>> https://www.dropbox.com/s/3brloiy3fqang1r/Screenshot%202015-06-17%2019.21.49.png?dl=0).
>>> The drops are where nodes were restarted.
>>>
>>> On Wed, Jun 17, 2015 at 6:53 PM, Michał Łowicki 
>>> wrote:
>>>
 Hi,

 Two datacenters with 6 nodes (2.1.6) each. In each DC garbage
 collection is launched at the same time on each node (See [1] for total GC
 duration per 5 seconds). RF is set to 3. Any ideas?

 [1]
 https://www.dropbox.com/s/bsbyew1jxbe3dgo/Screenshot%202015-06-17%2018.49.48.png?dl=0

 --
 BR,
 Michał Łowicki

>>>
>>>
>>>
>>> --
>>> BR,
>>> Michał Łowicki
>>>
>>
>>
>


Re: Garbage collector launched on all nodes at once

2015-06-17 Thread Jason Wee
Okay, IIRC memtables have been moved off heap. I googled and found this:
http://www.datastax.com/dev/blog/off-heap-memtables-in-Cassandra-2-1
Apparently, there are still some references on heap.

On Thu, Jun 18, 2015 at 1:11 PM, Marcus Eriksson  wrote:

> It is probably this: https://issues.apache.org/jira/browse/CASSANDRA-9549
>
> On Wed, Jun 17, 2015 at 7:37 PM, Michał Łowicki 
> wrote:
>
>> It looks like memtable heap size is growing rapidly on some nodes (
>> https://www.dropbox.com/s/3brloiy3fqang1r/Screenshot%202015-06-17%2019.21.49.png?dl=0).
>> The drops are where nodes were restarted.
>>
>> On Wed, Jun 17, 2015 at 6:53 PM, Michał Łowicki 
>> wrote:
>>
>>> Hi,
>>>
>>> Two datacenters with 6 nodes (2.1.6) each. In each DC garbage collection
>>> is launched at the same time on each node (See [1] for total GC duration
>>> per 5 seconds). RF is set to 3. Any ideas?
>>>
>>> [1]
>>> https://www.dropbox.com/s/bsbyew1jxbe3dgo/Screenshot%202015-06-17%2018.49.48.png?dl=0
>>>
>>> --
>>> BR,
>>> Michał Łowicki
>>>
>>
>>
>>
>> --
>> BR,
>> Michał Łowicki
>>
>
>


Re: Garbage collector launched on all nodes at once

2015-06-17 Thread Marcus Eriksson
It is probably this: https://issues.apache.org/jira/browse/CASSANDRA-9549

On Wed, Jun 17, 2015 at 7:37 PM, Michał Łowicki  wrote:

> It looks like memtable heap size is growing rapidly on some nodes (
> https://www.dropbox.com/s/3brloiy3fqang1r/Screenshot%202015-06-17%2019.21.49.png?dl=0).
> The drops are where nodes were restarted.
>
> On Wed, Jun 17, 2015 at 6:53 PM, Michał Łowicki 
> wrote:
>
>> Hi,
>>
>> Two datacenters with 6 nodes (2.1.6) each. In each DC garbage collection
>> is launched at the same time on each node (See [1] for total GC duration
>> per 5 seconds). RF is set to 3. Any ideas?
>>
>> [1]
>> https://www.dropbox.com/s/bsbyew1jxbe3dgo/Screenshot%202015-06-17%2018.49.48.png?dl=0
>>
>> --
>> BR,
>> Michał Łowicki
>>
>
>
>
> --
> BR,
> Michał Łowicki
>


Re: Garbage collector launched on all nodes at once

2015-06-17 Thread Michał Łowicki
It looks like memtable heap size is growing rapidly on some nodes (
https://www.dropbox.com/s/3brloiy3fqang1r/Screenshot%202015-06-17%2019.21.49.png?dl=0).
The drops are where nodes were restarted.
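
A quick way to watch that per node, assuming 2.1's nodetool cfstats reports
memtable sizes (the keyspace/table name below is a placeholder):

    # sample memtable sizes on each node and watch for steady growth
    nodetool cfstats my_keyspace.my_table | grep -i memtable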

On Wed, Jun 17, 2015 at 6:53 PM, Michał Łowicki  wrote:

> Hi,
>
> Two datacenters with 6 nodes (2.1.6) each. In each DC garbage collection
> is launched at the same time on each node (See [1] for total GC duration
> per 5 seconds). RF is set to 3. Any ideas?
>
> [1]
> https://www.dropbox.com/s/bsbyew1jxbe3dgo/Screenshot%202015-06-17%2018.49.48.png?dl=0
>
> --
> BR,
> Michał Łowicki
>



-- 
BR,
Michał Łowicki


Garbage collector launched on all nodes at once

2015-06-17 Thread Michał Łowicki
Hi,

Two datacenters with 6 nodes (2.1.6) each. In each DC garbage collection is
launched at the same time on each node (See [1] for total GC duration per 5
seconds). RF is set to 3. Any ideas?

[1]
https://www.dropbox.com/s/bsbyew1jxbe3dgo/Screenshot%202015-06-17%2018.49.48.png?dl=0
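
One way to line the GC events up across nodes (a sketch; it assumes the
default GCInspector logging and the stock package log path, and the hostnames
are placeholders):

    # pull recent GC pause lines from every node and compare timestamps
    for h in node1 node2 node3; do
      ssh "$h" 'grep GCInspector /var/log/cassandra/system.log | tail -20'
    done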

-- 
BR,
Michał Łowicki


Re: Connection reset during repair service

2015-06-17 Thread Alain RODRIGUEZ
Regarding the DataStax repair service, I saw the same error over here.

Here is the DataStax answer, FWIW:

"The repair service timeout message is telling you that the service has not
received a response from the nodetool repair process running on Cassandra
within the configured (default) 3600 seconds. When this happens, the
OpsCenter repair service stops monitoring the progress and places the
sub-range repair request at the back of a queue to be re-run at a later time.
It is not necessarily indicative of a repair failure, but it does suggest that
the repair process is taking longer than expected for some reason,
typically due to a hang, network issues, or wide rows on the table being
repaired.

As a possible workaround you can increase the timeout value in OpsCenter by
raising the timeout period in opscenterd.conf or the cluster-specific .conf
(the cluster file takes precedence), but if there is an underlying issue with
repairs completing on Cassandra this will not help.

single_repair_timeout = 3600

(see:
http://docs.datastax.com/en/opscenter/4.1/opsc/online_help/services/repairServiceAdvancedConfiguration.html
)."
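
If memory serves, that option lives in a [repair_service] section; a sketch of
the workaround (check the section name against the OpsCenter docs linked above
for your version):

    # opscenterd.conf, or the per-cluster conf file, which takes precedence
    [repair_service]
    single_repair_timeout = 7200   # raise from the 3600-second default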




2015-06-17 15:21 GMT+02:00 Sebastian Estevez :

> Do you do a ton of random updates and deletes? That would not be a good
> workload for DTCS.
>
> Where are all your tombstones coming from?
>  On Jun 17, 2015 3:43 AM, "Alain RODRIGUEZ"  wrote:
>
>> Hi David, Edouard,
>>
>> Depending on your data model on event_data, you might want to consider
>> upgrading to use DTCS (C* 2.0.11+).
>>
>> Basically, if those tombstones are due to a constant TTL and this is a
>> time series, it could be a real improvement.
>>
>> See:
>> https://labs.spotify.com/2014/12/18/date-tiered-compaction/
>> http://www.datastax.com/dev/blog/datetieredcompactionstrategy
>>
>> I am not sure this is related to your problem but having 8904 tombstones
>> read at once is pretty bad. Also you might want to paginate queries a bit
>> since it looks like you retrieve a lot of data at once.
>>
>> Meanwhile, if you are using STCS you can consider performing major
>> compaction on a regular basis (taking into consideration major compaction
>> downsides)
>>
>> C*heers,
>>
>> Alain
>>
>>
>>
>>
>>
>> 2015-06-12 15:08 GMT+02:00 David CHARBONNIER <
>> david.charbonn...@rgsystem.com>:
>>
>>>  Hi,
>>>
>>>
>>>
>>> We’re using Cassandra 2.0.8.39 through Datastax Enterprise 4.5.1 and
>>> we’re experiencing issues with OPSCenter (version 5.1.3) Repair Service.
>>>
>>> When Repair Service is running, we can see repair timing out on a few
>>> ranges in OPSCenter’s event log viewer. See screenshot attached.
>>>
>>>
>>>
>>> On our Cassandra nodes, we can see a lot of these messages in the
>>> cassandra/system.log log file while a timeout shows up in OPSCenter:
>>>
>>>
>>>
>>> ERROR [Native-Transport-Requests:3372] 2015-06-12
>>> 02:22:33,231 ErrorMessage.java (line 222) Unexpected exception during
>>> request
>>>
>>> java.io.IOException: Connection reset by peer
>>>
>>> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>>>
>>> at sun.nio.ch.SocketDispatcher.read(Unknown Source)
>>>
>>> at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
>>>
>>> at sun.nio.ch.IOUtil.read(Unknown Source)
>>>
>>> at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
>>>
>>> at
>>> org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
>>>
>>> at
>>> org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
>>>
>>> at
>>> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
>>>
>>> at
>>> org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
>>>
>>> at
>>> org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>>>
>>> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
>>> Source)
>>>
>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
>>> Source)
>>>
>>> at java.lang.Thread.run(Unknown Source)
>>>
>>>
>>>
>>> You’ll find attached an extract of the system.log file with some more
>>> information.
>>>
>>>
>>>
>>> Do you have any idea what’s happening?
>>>
>>>
>>>
>>> We suspect the timeouts happen because we have some tables with many
>>> tombstones, and a warning is sometimes triggered. We have edited the
>>> configuration to still allow the warning but keep reading until 1,000,000
>>> tombstones are encountered.
>>>
>>>
>>>
>>> During compaction, we also get warning messages telling us that we have a
>>> lot of tombstones:
>>>
>>>
>>>
>>> WARN [CompactionExecutor:1584] 2015-06-11 19:22:24,904
>>> SliceQueryFilter.java (line 225) Read 8640 live and 8904 tombstoned cells
>>> in rgsupv.event_data (see tombstone_warn_threshold). 1 columns was
>>> requested, slices=[-], delInfo={deletedAt=-9223372036854775808,
>>> localDeletion=2147483647}
>>>
>>>
>>>
>>> Do you think it’s related to our first problem?
>>>
>>>
>>>
>>

Re: Connection reset during repair service

2015-06-17 Thread Sebastian Estevez
Do you do a ton of random updates and deletes? That would not be a good
workload for DTCS.

Where are all your tombstones coming from?
 On Jun 17, 2015 3:43 AM, "Alain RODRIGUEZ"  wrote:

> Hi David, Edouard,
>
> Depending on your data model on event_data, you might want to consider
> upgrading to use DTCS (C* 2.0.11+).
>
> Basically, if those tombstones are due to a constant TTL and this is a
> time series, it could be a real improvement.
>
> See:
> https://labs.spotify.com/2014/12/18/date-tiered-compaction/
> http://www.datastax.com/dev/blog/datetieredcompactionstrategy
>
> I am not sure this is related to your problem but having 8904 tombstones
> read at once is pretty bad. Also you might want to paginate queries a bit
> since it looks like you retrieve a lot of data at once.
>
> Meanwhile, if you are using STCS you can consider performing major
> compaction on a regular basis (taking into consideration major compaction
> downsides)
>
> C*heers,
>
> Alain
>
>
>
>
>
> 2015-06-12 15:08 GMT+02:00 David CHARBONNIER <
> david.charbonn...@rgsystem.com>:
>
>>  Hi,
>>
>>
>>
>> We’re using Cassandra 2.0.8.39 through Datastax Enterprise 4.5.1 and
>> we’re experiencing issues with OPSCenter (version 5.1.3) Repair Service.
>>
>> When Repair Service is running, we can see repair timing out on a few
>> ranges in OPSCenter’s event log viewer. See screenshot attached.
>>
>>
>>
>> On our Cassandra nodes, we can see a lot of these messages in the
>> cassandra/system.log log file while a timeout shows up in OPSCenter:
>>
>>
>>
>> ERROR [Native-Transport-Requests:3372] 2015-06-12
>> 02:22:33,231 ErrorMessage.java (line 222) Unexpected exception during
>> request
>>
>> java.io.IOException: Connection reset by peer
>>
>> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>>
>> at sun.nio.ch.SocketDispatcher.read(Unknown Source)
>>
>> at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
>>
>> at sun.nio.ch.IOUtil.read(Unknown Source)
>>
>> at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
>>
>> at
>> org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
>>
>> at
>> org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
>>
>> at
>> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
>>
>> at
>> org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
>>
>> at
>> org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>>
>> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
>> Source)
>>
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
>> Source)
>>
>> at java.lang.Thread.run(Unknown Source)
>>
>>
>>
>> You’ll find attached an extract of the system.log file with some more
>> information.
>>
>>
>>
>> Do you have any idea what’s happening?
>>
>>
>>
>> We suspect the timeouts happen because we have some tables with many
>> tombstones, and a warning is sometimes triggered. We have edited the
>> configuration to still allow the warning but keep reading until 1,000,000
>> tombstones are encountered.
>>
>>
>>
>> During compaction, we also get warning messages telling us that we have a
>> lot of tombstones:
>>
>>
>>
>> WARN [CompactionExecutor:1584] 2015-06-11 19:22:24,904
>> SliceQueryFilter.java (line 225) Read 8640 live and 8904 tombstoned cells
>> in rgsupv.event_data (see tombstone_warn_threshold). 1 columns was
>> requested, slices=[-], delInfo={deletedAt=-9223372036854775808,
>> localDeletion=2147483647}
>>
>>
>>
>> Do you think it’s related to our first problem?
>>
>>
>>
>> Our cluster is configured as follows:
>>
>> -  8 nodes with Debian 7.8 x64
>>
>> -  16 GB of memory and 4 CPUs
>>
>> -  2 HDDs: one for the system and the other for the data directory
>>
>>
>>
>> Best regards,
>>
>>
>>
>> David CHARBONNIER
>>
>> Sysadmin
>>
>> T : +33 411 934 200
>>
>> david.charbonn...@rgsystem.com
>>
>> ZAC Aéroport
>>
>> 125 Impasse Adam Smith
>>
>> 34470 Pérols - France
>>
>> www.rgsystem.com
>>
>>
>>
>>
>>
>>
>>
>
>


Minor compaction not triggered

2015-06-17 Thread Jayapandian Ponraj
Hi

I have a Cassandra cluster of 6 nodes, with DateTiered compaction for
the tables/CFs.
For some reason, minor compaction never happens.
I have enabled debug logging and I don't see any debug logs related to
compaction, like the following:

https://github.com/apache/cassandra/blob/cassandra-2.0/src/java/org/apache/cassandra/db/compaction/CompactionManager.java#L150
https://github.com/apache/cassandra/blob/cassandra-2.0/src/java/org/apache/cassandra/db/compaction/DateTieredCompactionStrategy.java#L127

As a result of no compactions, the cluster now has more than 50K
SSTables per node.
How do I debug this issue further?
Appreciate any help..
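
A few first-pass checks that might narrow it down (a sketch, not a verified
procedure; the keyspace/table names are placeholders, and enableautocompaction
may not exist on every release):

    nodetool compactionstats                       # any pending or active compaction tasks?
    nodetool getcompactionthroughput               # current throttle (0 means unthrottled)
    nodetool cfstats my_ks.my_table                # per-table SSTable count
    nodetool enableautocompaction my_ks my_table   # re-enable in case it was disabled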


spark-sql estimates Cassandra table with 3 rows as 8 TB of data, Cassandra 2.1, DSE 4.7

2015-06-17 Thread Serega Sheypak
Hi, spark-sql estimated the input for a Cassandra table with 3 rows as 8 TB;
sometimes it is estimated as -167 B.
I run it on a laptop, and I don't have 8 TB of space for the data.

We use DSE 4.7 with bundled spark and spark-sql-thriftserver

Here are the stats for a dummy "select foo from bar", where bar has three rows
and several columns:


   - Total task time across all tasks: 7.6 min
   - Input: 8388608.0 TB

I don't have that many TB on my MacBook Pro. I would like to, but I don't :(
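
If it helps with debugging: my assumption is that DSE 4.7's Spark connector
derives these input-size figures from Cassandra's system.size_estimates table,
so inspecting what the node reports may show where the bogus number comes from
(the keyspace name below is a placeholder):

    -- raw per-token-range estimates for the table
    SELECT range_start, range_end, partitions_count, mean_partition_size
    FROM system.size_estimates
    WHERE keyspace_name = 'my_ks' AND table_name = 'bar';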


Re: Using Cassandra and Twisted (Python)

2015-06-17 Thread Jonathan Ballet

Hello Alex,

thanks for your answer! I'll try posting there as well then!

Best,

 Jonathan


On 06/16/2015 07:05 PM, Alex Popescu wrote:

Jonathan,

I'm pretty sure you'll have a better chance of getting this answered on the
Python driver mailing list:
https://groups.google.com/a/lists.datastax.com/forum/#!forum/python-driver-user

On Tue, Jun 16, 2015 at 1:01 AM, Jonathan Ballet <jbal...@gfproducts.ch> wrote:

Hi,

I'd like to write some Python applications using Twisted to talk to
a Cassandra cluster.

It seems like the DataStax Python library from
https://github.com/datastax/python-driver does support Twisted, but
it's not exactly clear how I would use this library along with
Twisted. The documentation for the async API is very sparse and
there's no mention of how to plug this into the Twisted event loop.

Does anyone have a small working example on how to use both of these?

Thanks!

Jonathan
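
For what it's worth, a minimal sketch of how this might be wired up, assuming
the driver's cassandra.io.twistedreactor.TwistedConnection class (which, as far
as I know, runs its own Twisted event loop internally); the keyspace and table
names are hypothetical:

    from __future__ import print_function

    from cassandra.cluster import Cluster
    from cassandra.io.twistedreactor import TwistedConnection

    # Tell the driver to do its network I/O on a Twisted reactor instead of
    # the default event-loop implementation.
    cluster = Cluster(['127.0.0.1'], connection_class=TwistedConnection)
    session = cluster.connect('demo')            # hypothetical keyspace

    # execute_async() returns a ResponseFuture; its callbacks fire on the
    # driver's event-loop thread, so a real Twisted application would marshal
    # results back onto its own reactor (e.g. via reactor.callFromThread).
    future = session.execute_async("SELECT id, name FROM users LIMIT 10")
    future.add_callbacks(
        callback=lambda rows: print(list(rows)),
        errback=lambda exc: print("query failed:", exc),
    )

    future.result()       # for the sketch only: block until the query finishes
    cluster.shutdown()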




--
Bests,

Alex Popescu | @al3xandru
Sen. Product Manager @ DataStax



Re: Connection reset during repair service

2015-06-17 Thread Alain RODRIGUEZ
Hi David, Edouard,

Depending on your data model on event_data, you might want to consider
upgrading to use DTCS (C* 2.0.11+).

Basically, if those tombstones are due to a constant TTL and this is a
time series, it could be a real improvement.

See:
https://labs.spotify.com/2014/12/18/date-tiered-compaction/
http://www.datastax.com/dev/blog/datetieredcompactionstrategy
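
For illustration, switching the table from the warning quoted below over to
DTCS could look roughly like this (the option values are placeholders to tune,
not recommendations):

    ALTER TABLE rgsupv.event_data
      WITH compaction = {'class': 'DateTieredCompactionStrategy',
                         'base_time_seconds': '3600',
                         'max_sstable_age_days': '10'};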

I am not sure this is related to your problem but having 8904 tombstones
read at once is pretty bad. Also you might want to paginate queries a bit
since it looks like you retrieve a lot of data at once.
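
For reference, the warning David quotes further down is driven by the tombstone
thresholds in cassandra.yaml; if I remember the 2.0/2.1 defaults correctly they
are:

    tombstone_warn_threshold: 1000        # log a warning past this many tombstones per query
    tombstone_failure_threshold: 100000   # abort the read past this many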

Meanwhile, if you are using STCS you can consider performing major
compaction on a regular basis (taking into consideration major compaction
downsides)

C*heers,

Alain





2015-06-12 15:08 GMT+02:00 David CHARBONNIER :

>  Hi,
>
>
>
> We’re using Cassandra 2.0.8.39 through Datastax Enterprise 4.5.1 and we’re
> experiencing issues with OPSCenter (version 5.1.3) Repair Service.
>
> When Repair Service is running, we can see repair timing out on a few
> ranges in OPSCenter’s event log viewer. See screenshot attached.
>
>
>
> On our Cassandra nodes, we can see a lot of these messages in the
> cassandra/system.log log file while a timeout shows up in OPSCenter:
>
>
>
> ERROR [Native-Transport-Requests:3372] 2015-06-12
> 02:22:33,231 ErrorMessage.java (line 222) Unexpected exception during
> request
>
> java.io.IOException: Connection reset by peer
>
> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>
> at sun.nio.ch.SocketDispatcher.read(Unknown Source)
>
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
>
> at sun.nio.ch.IOUtil.read(Unknown Source)
>
> at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
>
> at
> org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
>
> at
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
>
> at
> org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
>
> at
> org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
>
> at
> org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
>
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown
> Source)
>
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown
> Source)
>
> at java.lang.Thread.run(Unknown Source)
>
>
>
> You’ll find attached an extract of the system.log file with some more
> information.
>
>
>
> Do you have any idea what’s happening?
>
>
>
> We suspect the timeouts happen because we have some tables with many
> tombstones, and a warning is sometimes triggered. We have edited the
> configuration to still allow the warning but keep reading until 1,000,000
> tombstones are encountered.
>
>
>
> During compaction, we also get warning messages telling us that we have a
> lot of tombstones:
>
>
>
> WARN [CompactionExecutor:1584] 2015-06-11 19:22:24,904
> SliceQueryFilter.java (line 225) Read 8640 live and 8904 tombstoned cells
> in rgsupv.event_data (see tombstone_warn_threshold). 1 columns was
> requested, slices=[-], delInfo={deletedAt=-9223372036854775808,
> localDeletion=2147483647}
>
>
>
> Do you think it’s related to our first problem?
>
>
>
> Our cluster is configured as follows:
>
> -  8 nodes with Debian 7.8 x64
>
> -  16 GB of memory and 4 CPUs
>
> -  2 HDDs: one for the system and the other for the data directory
>
>
>
> Best regards,
>
>
>
> David CHARBONNIER
>
> Sysadmin
>
> T : +33 411 934 200
>
> david.charbonn...@rgsystem.com
>
> ZAC Aéroport
>
> 125 Impasse Adam Smith
>
> 34470 Pérols - France
>
> www.rgsystem.com
>
>
>
>
>
>
>