[jira] [Commented] (CASSANDRA-10765) add RangeIterator interface and QueryPlan for SI

2018-02-27 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16378193#comment-16378193
 ] 

Corentin Chary commented on CASSANDRA-10765:


Note: after having troubles like this with SASI, we ended up moving to 
https://github.com/Stratio/stratio-cassandra. IMHO, leveraging Lucene instead 
of building yet another index makes much more sense. It would be great to see 
SASI use Lucene internally (even if that's somewhat against the current design).

Before using Stratio we started experimenting with a SASI-like Lucene-backed 
index, see https://github.com/criteo/biggraphite/tree/master/tools/graphiteindex

> add RangeIterator interface and QueryPlan for SI
> 
>
> Key: CASSANDRA-10765
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10765
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local Write-Read Paths
>Reporter: Pavel Yaskevich
>Assignee: Pavel Yaskevich
>Priority: Major
>  Labels: 2i, sasi
> Fix For: 4.x
>
> Attachments: server-load.png
>
>
> Currently built-in indexes have only one way of handling 
> intersections/unions: pick the highest-selectivity predicate and filter on 
> the other index expressions. This is not always the most efficient approach. 
> Dynamic query planning based on the different index characteristics would be 
> better. The Query Plan should be able to choose how to perform intersections 
> and unions based on the metadata provided by indexes (returned by 
> RangeIterator), and RangeIterator would become the base for cross-index 
> interactions, carrying information such as min/max token, estimated number of 
> wrapped tokens, etc.
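As a rough illustration of the idea (a hypothetical Python sketch, not SASI's actual Java API: the class and function names here are invented), an iterator that carries min/max token and count metadata lets a planner short-circuit disjoint intersections and drive the merge from the most selective index:

```python
# Hypothetical sketch of a RangeIterator-style abstraction: each iterator
# exposes min/max token and count metadata so a query plan can pick a strategy.
class RangeIterator:
    def __init__(self, tokens):
        self.tokens = sorted(tokens)   # tokens produced by one index
        self.min = self.tokens[0]
        self.max = self.tokens[-1]
        self.count = len(self.tokens)  # estimated number of wrapped tokens

def intersect(iterators):
    # Metadata short-circuit: if the min/max ranges are disjoint,
    # the intersection is empty without touching any tokens.
    lo = max(it.min for it in iterators)
    hi = min(it.max for it in iterators)
    if lo > hi:
        return []
    # Drive the merge from the most selective (smallest) iterator and probe
    # the others, mimicking "pick highest selectivity, filter the rest".
    iterators = sorted(iterators, key=lambda it: it.count)
    base, rest = iterators[0], iterators[1:]
    sets = [set(it.tokens) for it in rest]
    return [t for t in base.tokens if lo <= t <= hi and all(t in s for s in sets)]

a = RangeIterator([1, 5, 9, 12])
b = RangeIterator([5, 7, 9, 20, 30])
print(intersect([a, b]))  # -> [5, 9]
```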



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13189) Use prompt_toolkit in cqlsh

2017-12-01 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16274190#comment-16274190
 ] 

Corentin Chary commented on CASSANDRA-13189:


I'm trying to move this forward:
* What is missing from the current patch?
* What's the recommended way to run the cqlsh unit tests?

Thanks!

> Use prompt_toolkit in cqlsh
> ---
>
> Key: CASSANDRA-13189
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13189
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Tools
>Reporter: Corentin Chary
>Assignee: Corentin Chary
>Priority: Minor
> Attachments: cqlsh-prompt-tookit.png
>
>
> prompt_toolkit is an alternative to readline 
> (https://github.com/jonathanslenders/python-prompt-toolkit) and is used in a 
> lot of software, including the upcoming version of IPython.
> I'm working on an initial version that keeps compatibility with readline, 
> which is available here: 
> https://github.com/iksaif/cassandra/tree/prompt_toolkit
> It's still missing tests and a few things, but I'm opening this for tracking 
> and feedback.
> !cqlsh-prompt-tookit.png|thumbnail!






[jira] [Commented] (CASSANDRA-13215) Cassandra nodes startup time 20x more after upgrading to 3.x

2017-11-28 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16268362#comment-16268362
 ] 

Corentin Chary commented on CASSANDRA-13215:


[~krummas] I tried it on our test cluster and it seems to work great: startup 
time was divided by 3-4. I expect an even greater impact in prod (more nodes, 
more sstables).

> Cassandra nodes startup time 20x more after upgrading to 3.x
> 
>
> Key: CASSANDRA-13215
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13215
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
> Environment: Cluster setup: two datacenters (dc-main, dc-backup).
> dc-main - 9 servers, no vnodes
> dc-backup - 6 servers, vnodes
>Reporter: Viktor Kuzmin
>Assignee: Marcus Eriksson
> Fix For: 3.11.2, 4.0
>
> Attachments: simple-cache.patch
>
>
> CompactionStrategyManager.getCompactionStrategyIndex is called on each sstable 
> at startup, and this function calls StorageService.getDiskBoundaries, which in 
> turn calls AbstractReplicationStrategy.getAddressRanges.
> It appears that this last function can be really slow. In our environment we 
> have 1545 tokens, and with NetworkTopologyStrategy it can perform 1545*1545 
> computations in the worst case (maybe I'm wrong, but it really takes a lot of 
> CPU).
> This function can also affect runtime later, because it is called not only 
> during startup.
> I've tried to implement a simple cache for the getDiskBoundaries results, and 
> startup time is now about one minute instead of 25, but I'm not sure if it's a 
> good solution.
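The caching idea can be sketched as memoizing the expensive range computation on a key that only changes when the ring changes (a hypothetical Python sketch; the function name, key, and costs are illustrative, not Cassandra's actual code):

```python
from functools import lru_cache

calls = [0]  # counts how many times the expensive computation actually runs

# Hypothetical sketch of the simple-cache.patch idea: key the expensive range
# computation by (ring version, keyspace) so repeated calls during startup
# (one per sstable) hit the cache instead of recomputing.
@lru_cache(maxsize=None)
def get_address_ranges(ring_version, keyspace, num_tokens):
    # Stand-in for AbstractReplicationStrategy.getAddressRanges: with
    # NetworkTopologyStrategy this is roughly O(tokens^2) in the worst case.
    calls[0] += 1
    return [(t, (t + 1) % num_tokens) for t in range(num_tokens)]

for _ in range(1545):  # one lookup per sstable at startup
    get_address_ranges(ring_version=1, keyspace="ks", num_tokens=1545)
print(calls[0])  # -> 1: computed once, the remaining 1544 calls are cache hits
```

The open question the reporter raises is invalidation: a real cache would need to be cleared when the token ring or replication settings change.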






[jira] [Commented] (CASSANDRA-13677) Make SASI timeouts easier to debug

2017-11-23 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263997#comment-16263997
 ] 

Corentin Chary commented on CASSANDRA-13677:


Thanks! :)

> Make SASI timeouts easier to debug
> --
>
> Key: CASSANDRA-13677
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13677
> Project: Cassandra
>  Issue Type: Improvement
>  Components: sasi
>Reporter: Corentin Chary
>Assignee: Corentin Chary
>Priority: Minor
> Fix For: 4.0
>
> Attachments: 0001-SASI-Make-timeouts-easier-to-debug.patch
>
>
> This would now give something like:
> {code}
> WARN  [ReadStage-15] 2017-06-08 12:47:57,799 
> AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread 
> Thread[ReadStage-15,5,main]: {}
> java.lang.RuntimeException: 
> org.apache.cassandra.index.sasi.exceptions.TimeQuotaExceededException: 
> Command 'Read(biggraphite_metadata.directories columns=* 
> rowfilter=component_0 = criteo limits=LIMIT 5000 
> range=(min(-9223372036854775808), min(-9223372036854775808)] 
> pfilter=names(EMPTY))' took too long (100 > 100ms).
> at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2591)
>  ~[main/:na]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_131]
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
>  ~[main/:na]
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
>  [main/:na]
> at 
> org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) [main/:na]
> at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
> Caused by: 
> org.apache.cassandra.index.sasi.exceptions.TimeQuotaExceededException: 
> Command 'Read(biggraphite_metadata.directories columns=* 
> rowfilter=component_0 = criteo limits=LIMIT 5000 
> range=(min(-9223372036854775808), min(-9223372036854775808)] 
> pfilter=names(EMPTY))' took too long (100 > 100ms).
> at 
> org.apache.cassandra.index.sasi.plan.QueryController.checkpoint(QueryController.java:163)
>  ~[main/:na]
> at 
> org.apache.cassandra.index.sasi.plan.QueryController.getPartition(QueryController.java:117)
>  ~[main/:na]
> at 
> org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:116)
>  ~[main/:na]
> at 
> org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:71)
>  ~[main/:na]
> at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[main/:na]
> at 
> org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:92)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:310)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:145)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:138)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:134)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:76) 
> ~[main/:na]
> at 
> org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:333) 
> ~[main/:na]
> at 
> org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1884)
>  ~[main/:na]
> at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2587)
>  ~[main/:na]
> ... 5 common frames omitted
> {code}
> Not having the query in the message makes this super hard to debug. Even 
> worse, because the query potentially aborts before the slow_query threshold, 
> it won't be reported as a slow query.
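The shape of the fix can be sketched as a time-quota checkpoint that carries the command's string form, so the timeout exception names the offending query (a hypothetical Python sketch; the real change is in SASI's Java QueryController):

```python
import time

class TimeQuotaExceededException(Exception):
    pass

# Hypothetical sketch: the controller keeps both the deadline and the
# command's toString(), so the timeout message identifies the query.
class QueryController:
    def __init__(self, command, quota_ms):
        self.command = command
        self.quota_ms = quota_ms
        self.deadline = time.monotonic() + quota_ms / 1000.0

    def checkpoint(self):
        # Called periodically while executing the query plan.
        now = time.monotonic()
        if now > self.deadline:
            elapsed_ms = (now - self.deadline) * 1000 + self.quota_ms
            raise TimeQuotaExceededException(
                f"Command '{self.command}' took too long "
                f"({elapsed_ms:.0f} > {self.quota_ms}ms).")

qc = QueryController("Read(biggraphite_metadata.directories ...)", quota_ms=1)
time.sleep(0.01)
try:
    qc.checkpoint()
except TimeQuotaExceededException as e:
    print(e)  # the message names the query, so timeouts become debuggable
```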






[jira] [Commented] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)

2017-11-23 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16263992#comment-16263992
 ] 

Corentin Chary commented on CASSANDRA-13651:


Ping?

> Large amount of CPU used by epoll_wait(.., .., .., 0)
> -
>
> Key: CASSANDRA-13651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13651
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Corentin Chary
>Assignee: Corentin Chary
> Fix For: 4.x
>
> Attachments: cpu-usage.png
>
>
> I was trying to profile Cassandra under my workload and I kept seeing this 
> backtrace:
> {code}
> epollEventLoopGroup-2-3 State: RUNNABLE CPU usage on sample: 240ms
> io.netty.channel.epoll.Native.epollWait0(int, long, int, int) Native.java 
> (native)
> io.netty.channel.epoll.Native.epollWait(int, EpollEventArray, int) 
> Native.java:111
> io.netty.channel.epoll.EpollEventLoop.epollWait(boolean) 
> EpollEventLoop.java:230
> io.netty.channel.epoll.EpollEventLoop.run() EpollEventLoop.java:254
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run() 
> SingleThreadEventExecutor.java:858
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run() 
> DefaultThreadFactory.java:138
> java.lang.Thread.run() Thread.java:745
> {code}
> At first I thought that the profiler might not be able to profile native code 
> properly, but I went further and realized that most of the CPU was used by 
> {{epoll_wait()}} calls with a timeout of zero.
> Here is the output of perf on this system, which confirms that most of the 
> overhead was with timeout == 0.
> {code}
> Samples: 11M of event 'syscalls:sys_enter_epoll_wait', Event count (approx.): 
> 11594448
> Overhead  Trace output
>   
>  ◆
>   90.06%  epfd: 0x0047, events: 0x7f5588c0c000, maxevents: 0x2000, 
> timeout: 0x   
> ▒
>5.77%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, 
> timeout: 0x   
> ▒
>1.98%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, 
> timeout: 0x03e8   
> ▒
>0.04%  epfd: 0x0003, events: 0x2f6af77b9c00, maxevents: 0x0020, 
> timeout: 0x   
> ▒
>0.04%  epfd: 0x002b, events: 0x121ebf63ac00, maxevents: 0x0040, 
> timeout: 0x   
> ▒
>0.03%  epfd: 0x0026, events: 0x7f51f80019c0, maxevents: 0x0020, 
> timeout: 0x   
> ▒
>0.02%  epfd: 0x0003, events: 0x7fe4d80019d0, maxevents: 0x0020, 
> timeout: 0x
> {code}
> Running this time with perf record -ag for call traces:
> {code}
> # Children  Self   sys   usr  Trace output
> 
> #         
> 
> #
>  8.61% 8.61% 0.00% 8.61%  epfd: 0x00a7, events: 
> 0x7fca452d6000, maxevents: 0x1000, timeout: 0x
> |
> ---0x1000200af313
>|  
> --8.61%--0x7fca6117bdac
>   0x7fca60459804
>   epoll_wait
>  2.98% 2.98% 0.00% 2.98%  epfd: 0x00a7, events: 
> 0x7fca452d6000, maxevents: 0x1000, timeout: 0x03e8
> |
> ---0x1000200af313
>0x7fca6117b830
>0x7fca60459804
>epoll_wait
> {code}
> That looks like a lot of CPU used to wait for nothing. I'm not sure if perf 
> reports a per-CPU or a per-system percentage, but that would still be at 
> least 10% of the total CPU usage of Cassandra.
> I went further and found the cause: we schedule a lot of 
> {{Message::Flusher}} tasks with a deadline of 10 usec (5 per message, I 
> think), but netty+epoll only supports timeouts of a millisecond or more and 
> converts everything below that to 0.
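The truncation the report describes can be shown in a few lines (an illustration of the general ns-to-ms conversion, not netty's exact code): epoll_wait() takes its timeout in whole milliseconds, so any sub-millisecond deadline collapses to 0, i.e. a busy poll.

```python
# epoll_wait() accepts a timeout in whole milliseconds; a scheduler working
# in nanoseconds must convert, and integer division truncates sub-millisecond
# delays to 0 - which makes epoll_wait() return immediately (busy polling).
def epoll_timeout_ms(delay_ns):
    return delay_ns // 1_000_000

print(epoll_timeout_ms(10_000))     # 10 usec flush deadline -> 0 (busy wait)
print(epoll_timeout_ms(999_999))    # just under 1 ms -> still 0
print(epoll_timeout_ms(1_000_000))  # exactly 1 ms -> 1
```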
> I added some traces to netty (4.1):

[jira] [Commented] (CASSANDRA-13215) Cassandra nodes startup time 20x more after upgrading to 3.x

2017-10-12 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16201558#comment-16201558
 ] 

Corentin Chary commented on CASSANDRA-13215:


I'll try to test that in our test env in the next days :)

> Cassandra nodes startup time 20x more after upgrading to 3.x
> 
>
> Key: CASSANDRA-13215
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13215
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
> Environment: Cluster setup: two datacenters (dc-main, dc-backup).
> dc-main - 9 servers, no vnodes
> dc-backup - 6 servers, vnodes
>Reporter: Viktor Kuzmin
>Assignee: Marcus Eriksson
> Attachments: simple-cache.patch
>
>






[jira] [Commented] (CASSANDRA-13215) Cassandra nodes startup time 20x more after upgrading to 3.x

2017-09-20 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173029#comment-16173029
 ] 

Corentin Chary commented on CASSANDRA-13215:


Cool, will be happy to test it and report performance improvements (mostly 
during startup)

> Cassandra nodes startup time 20x more after upgrading to 3.x
> 
>
> Key: CASSANDRA-13215
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13215
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
> Environment: Cluster setup: two datacenters (dc-main, dc-backup).
> dc-main - 9 servers, no vnodes
> dc-backup - 6 servers, vnodes
>Reporter: Viktor Kuzmin
>Assignee: Marcus Eriksson
> Attachments: simple-cache.patch
>
>






[jira] [Commented] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)

2017-09-19 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16171291#comment-16171291
 ] 

Corentin Chary commented on CASSANDRA-13651:


Thanks for double-checking that, I pushed the wrong version. This should now be 
fixed.

> Large amount of CPU used by epoll_wait(.., .., .., 0)
> -
>
> Key: CASSANDRA-13651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13651
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Corentin Chary
>Assignee: Corentin Chary
> Fix For: 4.x
>
> Attachments: cpu-usage.png
>
>

[jira] [Commented] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)

2017-09-18 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16169988#comment-16169988
 ] 

Corentin Chary commented on CASSANDRA-13651:


Results: the calls to timerfd end up costing almost as much as the epoll_wait() 
calls did before. It is still more efficient to execute directly instead of 
scheduling.

I added a patch bumping netty to 4.1.15, and a much simpler version of my 
previous patch that lets one configure the task delay.

> Large amount of CPU used by epoll_wait(.., .., .., 0)
> -
>
> Key: CASSANDRA-13651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13651
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Corentin Chary
>Assignee: Corentin Chary
> Fix For: 4.x
>
> Attachments: cpu-usage.png
>
>

[jira] [Commented] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)

2017-09-15 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16167399#comment-16167399
 ] 

Corentin Chary commented on CASSANDRA-13651:


OK, here is the plan (for myself):
* Separate the "use only one worker group" patch. It's useful because it 
creates fewer threads, but isn't directly related to this issue.
* Update netty to 4.1.15 on our setup (without -Dnetty_flush_delay_nanoseconds) 
and observe the effects.
* Set -Dnetty_flush_delay_nanoseconds and observe the effects. Depending on the 
results, propose a simpler version of the patch.


> Large amount of CPU used by epoll_wait(.., .., .., 0)
> -
>
> Key: CASSANDRA-13651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13651
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Corentin Chary
>Assignee: Corentin Chary
> Fix For: 4.x
>
> Attachments: cpu-usage.png
>
>

[jira] [Commented] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)

2017-09-14 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16166200#comment-16166200
 ] 

Corentin Chary commented on CASSANDRA-13651:


Ping, anything against 
https://github.com/iksaif/cassandra/commits/cassandra-13651-trunk? If not, I'll 
send a proper pull request.

> Large amount of CPU used by epoll_wait(.., .., .., 0)
> -
>
> Key: CASSANDRA-13651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13651
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Corentin Chary
>Assignee: Corentin Chary
> Fix For: 4.x
>
> Attachments: cpu-usage.png
>
>

[jira] [Commented] (CASSANDRA-10496) Make DTCS/TWCS split partitions based on time during compaction

2017-09-05 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16153196#comment-16153196
 ] 

Corentin Chary commented on CASSANDRA-10496:


* Splitting partitions is non-trivial, so I left that for later.
* What do you mean by "changing locations isn't supported"? 
https://github.com/apache/cassandra/pull/147/files#diff-be1bfb81c770dcfb7042335b699c4cc3R112
 seems to work.
* Currently it will create up to "minThreshold" sstables: 
https://github.com/apache/cassandra/pull/147/files#diff-e83635b2fb3079d9b91b039c605c15daR303
* Yes, getBuckets() currently uses maxTimestamp, which isn't available 
(currently) in the compaction task. Hence my question: what about using 
minTimestamp (makes sense for reads) or minLocalDeletionTime (makes sense for 
deletes/TTL)? (Not thinking about upgrades ATM.)
* Are you talking about sstables generated before this patch?

> Make DTCS/TWCS split partitions based on time during compaction
> ---
>
> Key: CASSANDRA-10496
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10496
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>  Labels: dtcs
> Fix For: 4.x
>
>
> To avoid getting old data in new time windows with DTCS (or related, like 
> [TWCS|CASSANDRA-9666]), we need to split out old data into its own sstable 
> during compaction.
> My initial idea is to just create two sstables, when we create the compaction 
> task we state the start and end times for the window, and any data older than 
> the window will be put in its own sstable.
> By creating a single sstable with old data, we will incrementally get the 
> windows correct - say we have an sstable with these timestamps:
> {{[100, 99, 98, 97, 75, 50, 10]}}
> and we are compacting in window {{[100, 80]}} - we would create two sstables:
> {{[100, 99, 98, 97]}}, {{[75, 50, 10]}}, and the first window is now 
> 'correct'. The next compaction would compact in window {{[80, 60]}} and 
> create sstables {{[75]}}, {{[50, 10]}} etc.
> We will probably also want to base the windows on the newest data in the 
> sstables so that we actually have older data than the window.
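The two-sstable split described above can be sketched as a simple partition by the window's lower bound (illustrative names, not Cassandra's compaction-writer API):

```java
import java.util.ArrayList;
import java.util.List;

public class WindowSplit {
    /** Partition timestamps into [inWindow, older] using the window's lower bound. */
    static List<List<Long>> split(long[] timestamps, long windowLowerBound) {
        List<Long> inWindow = new ArrayList<>();
        List<Long> older = new ArrayList<>();
        for (long ts : timestamps)
            (ts >= windowLowerBound ? inWindow : older).add(ts);
        return List.of(inWindow, older);
    }

    public static void main(String[] args) {
        long[] data = {100, 99, 98, 97, 75, 50, 10};
        // Compacting in window [100, 80]: the first sstable gets [100, 99, 98, 97],
        // the second gets the older data [75, 50, 10].
        System.out.println(split(data, 80)); // [[100, 99, 98, 97], [75, 50, 10]]
    }
}
```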



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-10496) Make DTCS/TWCS split partitions based on time during compaction

2017-08-30 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16147182#comment-16147182
 ] 

Corentin Chary commented on CASSANDRA-10496:


So, I added https://github.com/apache/cassandra/pull/147, which seems to work 
for a few values.

I need some feedback to go forward. In SplittingTimeWindowCompactionWriter I 
use minTimestamp to group values:
* minLocalDeletionTime would make more sense if we want to optimize for 
deletions
* getBuckets() uses maxTimestamp, which is not available in the metadata stats 
(I'm unsure of the effects of changing to minTimestamp or minLocalDeletionTime 
in getBuckets).

With this value, running nodetool compact --split-output also works :)

> Make DTCS/TWCS split partitions based on time during compaction
> ---
>
> Key: CASSANDRA-10496
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10496
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>  Labels: dtcs
> Fix For: 4.x
>
>






[jira] [Commented] (CASSANDRA-13215) Cassandra nodes startup time 20x more after upgrading to 3.x

2017-08-29 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16145453#comment-16145453
 ] 

Corentin Chary commented on CASSANDRA-13215:


I can confirm that this is affecting us too (startup and repairs). [~krummas], 
did you end up doing something for this issue? Otherwise I might give it a shot.

> Cassandra nodes startup time 20x more after upgrading to 3.x
> 
>
> Key: CASSANDRA-13215
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13215
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
> Environment: Cluster setup: two datacenters (dc-main, dc-backup).
> dc-main - 9 servers, no vnodes
> dc-backup - 6 servers, vnodes
>Reporter: Viktor Kuzmin
>Assignee: Marcus Eriksson
> Attachments: simple-cache.patch
>
>
> CompactionStrategyManager.getCompactionStrategyIndex is called on each sstable 
> at startup, and this function calls StorageService.getDiskBoundaries, which in 
> turn calls AbstractReplicationStrategy.getAddressRanges.
> It appears that the last function can be really slow. In our environment we 
> have 1545 tokens, and with NetworkTopologyStrategy it can make 1545*1545 
> computations in the worst case (maybe I'm wrong, but it really takes lots of 
> CPU).
> This function can also affect runtime later, because it is called not only 
> during startup.
> I've tried to implement a simple cache for getDiskBoundaries results and now 
> startup time is about one minute instead of 25m, but I'm not sure if it's a 
> good solution.
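The caching approach described above can be sketched as memoizing the expensive per-key computation and invalidating on ring changes (hypothetical names; the attached simple-cache.patch may differ):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class BoundaryCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> compute;

    BoundaryCache(Function<K, V> compute) { this.compute = compute; }

    /** Runs the expensive computation once per key; later calls hit the cache. */
    V get(K key) { return cache.computeIfAbsent(key, compute); }

    /** Must be called whenever ring/token state changes, or results go stale. */
    void invalidate() { cache.clear(); }

    public static void main(String[] args) {
        int[] calls = {0};
        BoundaryCache<String, Integer> c =
            new BoundaryCache<>(k -> { calls[0]++; return k.length(); });
        c.get("ks1"); c.get("ks1"); c.get("ks1");
        System.out.println(calls[0]); // the computation ran only once
    }
}
```

The hard part, as the reporter hints, is not the memoization itself but knowing when to invalidate: disk boundaries depend on ring topology, so any token or membership change must clear the cache.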






[jira] [Commented] (CASSANDRA-13743) CAPTURE not easily usable with PAGING

2017-08-28 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143471#comment-16143471
 ] 

Corentin Chary commented on CASSANDRA-13743:


Thanks for merging it and fixing it :)

> CAPTURE not easily usable with PAGING
> --
>
> Key: CASSANDRA-13743
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13743
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Corentin Chary
>Assignee: Corentin Chary
> Fix For: 4.0
>
>
> See 
> https://github.com/iksaif/cassandra/commit/7ed56966a7150ced44c375af307685517d7e09a3
>  for a patch fixing that.






[jira] [Commented] (CASSANDRA-10496) Make DTCS/TWCS split partitions based on time during compaction

2017-08-28 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143470#comment-16143470
 ] 

Corentin Chary commented on CASSANDRA-10496:


I had a patch that would minimize the amount of compactions while trying to 
strictly respect the time windows (and would also make major compaction split 
correctly the sstables). I need to finish it and will try to find time this 
month.

> Make DTCS/TWCS split partitions based on time during compaction
> ---
>
> Key: CASSANDRA-10496
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10496
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>  Labels: dtcs
> Fix For: 4.x
>
>






[jira] [Commented] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)

2017-08-28 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16143466#comment-16143466
 ] 

Corentin Chary commented on CASSANDRA-13651:


Great! I'm good with either bumping netty to this version or merging my patch. 
[~jjirsa], what do you think?

> Large amount of CPU used by epoll_wait(.., .., .., 0)
> -
>
> Key: CASSANDRA-13651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13651
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Corentin Chary
>Assignee: Corentin Chary
> Fix For: 4.x
>
> Attachments: cpu-usage.png
>
>
> I was trying to profile Cassandra under my workload and I kept seeing this 
> backtrace:
> {code}
> epollEventLoopGroup-2-3 State: RUNNABLE CPU usage on sample: 240ms
> io.netty.channel.epoll.Native.epollWait0(int, long, int, int) Native.java 
> (native)
> io.netty.channel.epoll.Native.epollWait(int, EpollEventArray, int) 
> Native.java:111
> io.netty.channel.epoll.EpollEventLoop.epollWait(boolean) 
> EpollEventLoop.java:230
> io.netty.channel.epoll.EpollEventLoop.run() EpollEventLoop.java:254
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run() 
> SingleThreadEventExecutor.java:858
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run() 
> DefaultThreadFactory.java:138
> java.lang.Thread.run() Thread.java:745
> {code}
> At first I thought that the profiler might not be able to profile native code 
> properly, but I went further and realized that most of the CPU was used by 
> {{epoll_wait()}} calls with a timeout of zero.
> Here is the output of perf on this system, which confirms that most of the 
> overhead was with timeout == 0.
> {code}
> Samples: 11M of event 'syscalls:sys_enter_epoll_wait', Event count (approx.): 11594448
> Overhead  Trace output
>   90.06%  epfd: 0x0047, events: 0x7f5588c0c000, maxevents: 0x2000, timeout: 0x
>    5.77%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, timeout: 0x
>    1.98%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, timeout: 0x03e8
>    0.04%  epfd: 0x0003, events: 0x2f6af77b9c00, maxevents: 0x0020, timeout: 0x
>    0.04%  epfd: 0x002b, events: 0x121ebf63ac00, maxevents: 0x0040, timeout: 0x
>    0.03%  epfd: 0x0026, events: 0x7f51f80019c0, maxevents: 0x0020, timeout: 0x
>    0.02%  epfd: 0x0003, events: 0x7fe4d80019d0, maxevents: 0x0020, timeout: 0x
> Running this time with perf record -ag for call traces:
> {code}
> # Children  Self   sys   usr  Trace output
> # ........  ....  ....  ....  ............
>      8.61%  8.61%  0.00%  8.61%  epfd: 0x00a7, events: 0x7fca452d6000, maxevents: 0x1000, timeout: 0x
> |
> ---0x1000200af313
>    |
>     --8.61%--0x7fca6117bdac
>              0x7fca60459804
>              epoll_wait
>      2.98%  2.98%  0.00%  2.98%  epfd: 0x00a7, events: 0x7fca452d6000, maxevents: 0x1000, timeout: 0x03e8
> |
> ---0x1000200af313
>    0x7fca6117b830
>    0x7fca60459804
>    epoll_wait
> {code}
> That looks like a lot of CPU used to wait for nothing. I'm not sure if perf 
> reports a per-CPU percentage or a per-system percentage, but that would 
> still be 10% of the total CPU usage of Cassandra at the minimum.
> I went further and found the code of all that: We schedule a lot of 
> {{Message::Flusher}} with a deadline of 10 usec (5 per messages I think) but 
> netty+epoll only support timeouts above the 

[jira] [Created] (CASSANDRA-13743) CAPTURE not easilly usable with PAGING

2017-08-04 Thread Corentin Chary (JIRA)
Corentin Chary created CASSANDRA-13743:
--

 Summary: CAPTURE not easily usable with PAGING
 Key: CASSANDRA-13743
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13743
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Reporter: Corentin Chary
 Fix For: 4.x


See 
https://github.com/iksaif/cassandra/commit/7ed56966a7150ced44c375af307685517d7e09a3
 for a patch fixing that.






[jira] [Updated] (CASSANDRA-13743) CAPTURE not easily usable with PAGING

2017-08-04 Thread Corentin Chary (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corentin Chary updated CASSANDRA-13743:
---
Status: Patch Available  (was: Open)

> CAPTURE not easily usable with PAGING
> --
>
> Key: CASSANDRA-13743
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13743
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Corentin Chary
> Fix For: 4.x
>
>
> See 
> https://github.com/iksaif/cassandra/commit/7ed56966a7150ced44c375af307685517d7e09a3
>  for a patch fixing that.






[jira] [Commented] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)

2017-08-03 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16112292#comment-16112292
 ] 

Corentin Chary commented on CASSANDRA-13651:


Using timerfd is something that I looked at, but I thought that it would be 
easier to just change the code in Cassandra for now. I'll be out for the next 
three weeks, but I'll definitely try a patched version of netty when I'm back.

> Large amount of CPU used by epoll_wait(.., .., .., 0)
> -
>
> Key: CASSANDRA-13651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13651
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Corentin Chary
>Assignee: Corentin Chary
> Fix For: 4.x
>
> Attachments: cpu-usage.png
>
>

[jira] [Commented] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)

2017-08-02 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16111060#comment-16111060
 ] 

Corentin Chary commented on CASSANDRA-13651:


https://github.com/iksaif/cassandra/commit/c05f2eef6abc8066b69e50dc5025f17e17871f0c
 should fix the test.

I'm running with 4.0.44. The production test uses 3.11 as a base, but I'm 
able to start trunk on my dev machine.


> Large amount of CPU used by epoll_wait(.., .., .., 0)
> -
>
> Key: CASSANDRA-13651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13651
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Corentin Chary
> Fix For: 4.x
>
> Attachments: cpu-usage.png
>
>

[jira] [Commented] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)

2017-08-02 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16110416#comment-16110416
 ] 

Corentin Chary commented on CASSANDRA-13651:


[~jjirsa]: The cassandra-stress report is for 
https://github.com/criteo/biggraphite/blob/master/tools/stress/biggraphite.yaml.
 The bottleneck there was the lack of parallelization on the client, I guess.
The screenshot is the actual workload of BigGraphite: 3 nodes, 100 connected 
clients.

[~norman]: Both a timeout of 1ms and no timeout would achieve the same thing, 
I guess. I tested with no timeout as it implied fewer changes to the actual 
logic (see https://github.com/iksaif/cassandra/tree/cassandra-13651-trunk). 
Currently (I believe) the message itself is written after the delay, so 
increasing the timeout would increase the latency of every operation. In my 
test I simply disable the timeout and schedule the flush task as soon as I 
can; this doesn't reduce the opportunities for batching that much if you keep 
the number of epoll threads low.
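The busy-wait being discussed comes down to unit truncation: epoll_wait takes a millisecond timeout, so a sub-millisecond flush deadline rounds down to 0 and the event loop polls instead of sleeping. A minimal sketch of the arithmetic (plain Java, not netty's actual code):

```java
import java.util.concurrent.TimeUnit;

public class EpollTimeout {
    /** epoll_wait takes milliseconds; sub-millisecond deadlines truncate to 0. */
    static int epollTimeoutMillis(long deadlineNanos) {
        return (int) TimeUnit.NANOSECONDS.toMillis(deadlineNanos);
    }

    public static void main(String[] args) {
        long tenMicros = TimeUnit.MICROSECONDS.toNanos(10);
        // A 10 usec deadline becomes timeout == 0: epoll_wait returns
        // immediately and the event loop spins until the deadline passes.
        System.out.println(epollTimeoutMillis(tenMicros)); // 0
        System.out.println(epollTimeoutMillis(TimeUnit.MILLISECONDS.toNanos(1))); // 1
    }
}
```

This is why both proposed fixes work: either raise the deadline to at least 1 ms so the timeout is non-zero, or drop the deadline entirely and flush eagerly so the loop blocks on epoll_wait with no pending timer.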

> Large amount of CPU used by epoll_wait(.., .., .., 0)
> -
>
> Key: CASSANDRA-13651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13651
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Corentin Chary
> Fix For: 4.x
>
> Attachments: cpu-usage.png
>
>

[jira] [Comment Edited] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)

2017-08-01 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16109278#comment-16109278
 ] 

Corentin Chary edited comment on CASSANDRA-13651 at 8/1/17 4:53 PM:



!cpu-usage.png|thumbnail!

Almost 8% CPU saved after updating all three nodes.


was (Author: iksaif):
!cpu-usage.png|thumbnail!

> Large amount of CPU used by epoll_wait(.., .., .., 0)
> -
>
> Key: CASSANDRA-13651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13651
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Corentin Chary
> Fix For: 4.x
>
> Attachments: cpu-usage.png
>
>

[jira] [Comment Edited] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)

2017-08-01 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16109278#comment-16109278
 ] 

Corentin Chary edited comment on CASSANDRA-13651 at 8/1/17 4:53 PM:


!cpu-usage.png!

Almost 8% CPU saved after updating all three nodes.


was (Author: iksaif):

!cpu-usage.png|thumbnail!

Almost ~8% of CPU saving after updating all three nodes.

> Large amount of CPU used by epoll_wait(.., .., .., 0)
> -
>
> Key: CASSANDRA-13651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13651
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Corentin Chary
> Fix For: 4.x
>
> Attachments: cpu-usage.png
>
>

[jira] [Updated] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)

2017-08-01 Thread Corentin Chary (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corentin Chary updated CASSANDRA-13651:
---
Attachment: cpu-usage.png

!cpu-usage.png|thumbnail!

> Large amount of CPU used by epoll_wait(.., .., .., 0)
> -
>
> Key: CASSANDRA-13651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13651
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Corentin Chary
> Fix For: 4.x
>
> Attachments: cpu-usage.png
>
>
> I was trying to profile Cassandra under my workload and I kept seeing this 
> backtrace:
> {code}
> epollEventLoopGroup-2-3 State: RUNNABLE CPU usage on sample: 240ms
> io.netty.channel.epoll.Native.epollWait0(int, long, int, int) Native.java 
> (native)
> io.netty.channel.epoll.Native.epollWait(int, EpollEventArray, int) 
> Native.java:111
> io.netty.channel.epoll.EpollEventLoop.epollWait(boolean) 
> EpollEventLoop.java:230
> io.netty.channel.epoll.EpollEventLoop.run() EpollEventLoop.java:254
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run() 
> SingleThreadEventExecutor.java:858
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run() 
> DefaultThreadFactory.java:138
> java.lang.Thread.run() Thread.java:745
> {code}
> At first I thought that the profiler might not be able to profile native code 
> properly, but I went further and realized that most of the CPU was used by 
> {{epoll_wait()}} calls with a timeout of zero.
> Here is the output of perf on this system, which confirms that most of the 
> overhead was with timeout == 0.
> {code}
> Samples: 11M of event 'syscalls:sys_enter_epoll_wait', Event count (approx.): 
> 11594448
> Overhead  Trace output
>   
>  ◆
>   90.06%  epfd: 0x0047, events: 0x7f5588c0c000, maxevents: 0x2000, 
> timeout: 0x   
> ▒
>5.77%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, 
> timeout: 0x   
> ▒
>1.98%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, 
> timeout: 0x03e8   
> ▒
>0.04%  epfd: 0x0003, events: 0x2f6af77b9c00, maxevents: 0x0020, 
> timeout: 0x   
> ▒
>0.04%  epfd: 0x002b, events: 0x121ebf63ac00, maxevents: 0x0040, 
> timeout: 0x   
> ▒
>0.03%  epfd: 0x0026, events: 0x7f51f80019c0, maxevents: 0x0020, 
> timeout: 0x   
> ▒
>0.02%  epfd: 0x0003, events: 0x7fe4d80019d0, maxevents: 0x0020, 
> timeout: 0x
> {code}
> Running this time with perf record -ag for call traces:
> {code}
> # Children  Self   sys   usr  Trace output
> 
> #         
> 
> #
>  8.61% 8.61% 0.00% 8.61%  epfd: 0x00a7, events: 
> 0x7fca452d6000, maxevents: 0x1000, timeout: 0x
> |
> ---0x1000200af313
>|  
> --8.61%--0x7fca6117bdac
>   0x7fca60459804
>   epoll_wait
>  2.98% 2.98% 0.00% 2.98%  epfd: 0x00a7, events: 
> 0x7fca452d6000, maxevents: 0x1000, timeout: 0x03e8
> |
> ---0x1000200af313
>0x7fca6117b830
>0x7fca60459804
>epoll_wait
> {code}
> That looks like a lot of CPU used to wait for nothing. I'm not sure if perf 
> reports a per-CPU percentage or a per-system percentage, but that would 
> still be at least 10% of the total CPU usage of Cassandra.
> I went further and found the code behind all that: we schedule a lot of 
> {{Message::Flusher}} tasks with a deadline of 10 usec (5 per message, I think), 
> but netty+epoll only supports timeouts of a millisecond or more and converts 
> everything below that to 0.
> I added some traces to netty (4.1):
> {code}
> diff --git 
> 
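The truncation described in the quoted comment can be illustrated with a small sketch (the helper name toEpollTimeoutMillis is hypothetical, not Netty's actual code): an epoll timeout expressed in whole milliseconds turns any sub-millisecond deadline into 0, which makes epoll_wait() return immediately instead of blocking.

```java
// Sketch of the truncation described above: epoll_wait() takes its timeout in
// whole milliseconds, so a 10 usec flush deadline integer-divides down to 0.
// toEpollTimeoutMillis is an illustrative helper, not Netty's actual API.
public class TimeoutTruncation {
    static int toEpollTimeoutMillis(long delayNanos) {
        return (int) (delayNanos / 1_000_000L);
    }

    public static void main(String[] args) {
        // A 10 usec deadline becomes a zero (non-blocking) epoll timeout.
        System.out.println(toEpollTimeoutMillis(10_000L));    // prints 0
        // Only deadlines of at least one millisecond survive truncation.
        System.out.println(toEpollTimeoutMillis(2_000_000L)); // prints 2
    }
}
```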

[jira] [Updated] (CASSANDRA-13647) cassandra-test: URI is not absolute

2017-07-27 Thread Corentin Chary (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corentin Chary updated CASSANDRA-13647:
---
Priority: Minor  (was: Major)

> cassandra-test: URI is not absolute
> ---
>
> Key: CASSANDRA-13647
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13647
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Corentin Chary
>Priority: Minor
> Fix For: 4.x
>
>
> With current trunk (I just added the code to print the exception):
> {code}
> $ ./tools/bin/cassandra-stress user profile=./biggraphite.yaml n=10 
> 'ops(insert=1)' no-warmup cl=ONEjava.lang.IllegalArgumentException: URI is 
> not absolute
> at java.net.URI.toURL(URI.java:1088)
> at 
> org.apache.cassandra.stress.StressProfile.load(StressProfile.java:771)
> at 
> org.apache.cassandra.stress.settings.SettingsCommandUser.<init>(SettingsCommandUser.java:76)
> at 
> org.apache.cassandra.stress.settings.SettingsCommandUser.build(SettingsCommandUser.java:190)
> at 
> org.apache.cassandra.stress.settings.SettingsCommand.get(SettingsCommand.java:220)
> at 
> org.apache.cassandra.stress.settings.StressSettings.get(StressSettings.java:192)
> at 
> org.apache.cassandra.stress.settings.StressSettings.parse(StressSettings.java:169)
> at org.apache.cassandra.stress.Stress.run(Stress.java:80)
> at org.apache.cassandra.stress.Stress.main(Stress.java:62)
> {code}
> I wasn't able to quickly find the change that caused that.
> cc: [~tjake]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13432) MemtableReclaimMemory can get stuck because of lack of timeout in getTopLevelColumns()

2017-07-27 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16103168#comment-16103168
 ] 

Corentin Chary commented on CASSANDRA-13432:


[~rgerard] this is already part of 3.x, this only applies to 2.x.


> MemtableReclaimMemory can get stuck because of lack of timeout in 
> getTopLevelColumns()
> --
>
> Key: CASSANDRA-13432
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13432
> Project: Cassandra
>  Issue Type: Bug
> Environment: cassandra 2.1.15
>Reporter: Corentin Chary
> Fix For: 2.1.x
>
>
> This might affect 3.x too, I'm not sure.
> {code}
> $ nodetool tpstats
> Pool NameActive   Pending  Completed   Blocked  All 
> time blocked
> MutationStage 0 0   32135875 0
>  0
> ReadStage   114 0   29492940 0
>  0
> RequestResponseStage  0 0   86090931 0
>  0
> ReadRepairStage   0 0 166645 0
>  0
> CounterMutationStage  0 0  0 0
>  0
> MiscStage 0 0  0 0
>  0
> HintedHandoff 0 0 47 0
>  0
> GossipStage   0 0 188769 0
>  0
> CacheCleanupExecutor  0 0  0 0
>  0
> InternalResponseStage 0 0  0 0
>  0
> CommitLogArchiver 0 0  0 0
>  0
> CompactionExecutor0 0  86835 0
>  0
> ValidationExecutor0 0  0 0
>  0
> MigrationStage0 0  0 0
>  0
> AntiEntropyStage  0 0  0 0
>  0
> PendingRangeCalculator0 0 92 0
>  0
> Sampler   0 0  0 0
>  0
> MemtableFlushWriter   0 0563 0
>  0
> MemtablePostFlush 0 0   1500 0
>  0
> MemtableReclaimMemory 129534 0
>  0
> Native-Transport-Requests41 0   54819182 0
>   1896
> {code}
> {code}
> "MemtableReclaimMemory:195" - Thread t@6268
>java.lang.Thread.State: WAITING
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>   at 
> org.apache.cassandra.utils.concurrent.WaitQueue$AbstractSignal.awaitUninterruptibly(WaitQueue.java:283)
>   at 
> org.apache.cassandra.utils.concurrent.OpOrder$Barrier.await(OpOrder.java:417)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore$Flush$1.runMayThrow(ColumnFamilyStore.java:1151)
>   at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - locked <6e7b1160> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> "SharedPool-Worker-195" - Thread t@989
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.cassandra.db.RangeTombstoneList.addInternal(RangeTombstoneList.java:690)
>   at 
> org.apache.cassandra.db.RangeTombstoneList.insertFrom(RangeTombstoneList.java:650)
>   at 
> org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:171)
>   at 
> org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:143)
>   at org.apache.cassandra.db.DeletionInfo.add(DeletionInfo.java:240)
>   at 
> org.apache.cassandra.db.ArrayBackedSortedColumns.delete(ArrayBackedSortedColumns.java:483)
>   at org.apache.cassandra.db.ColumnFamily.addAtom(ColumnFamily.java:153)
>   at 
> org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:184)
>  

[jira] [Commented] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)

2017-07-25 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099971#comment-16099971
 ] 

Corentin Chary commented on CASSANDRA-13651:


I tested the latest patch (https://github.com/iksaif/cassandra/tree/cassandra-13651-trunk) 
with an actual workload; the screenshot is attached.
Looks like we can get ~2% of CPU back with 
-Dcassandra.netty_flush_delay_nanoseconds=0 and some more with 
-Dio.netty.eventLoopThreads=6. This does not seem to affect latency.

!Screenshot (5).png|thumbnail!

> Large amount of CPU used by epoll_wait(.., .., .., 0)
> -
>
> Key: CASSANDRA-13651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13651
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Corentin Chary
> Fix For: 4.x
>
>
> I was trying to profile Cassandra under my workload and I kept seeing this 
> backtrace:
> {code}
> epollEventLoopGroup-2-3 State: RUNNABLE CPU usage on sample: 240ms
> io.netty.channel.epoll.Native.epollWait0(int, long, int, int) Native.java 
> (native)
> io.netty.channel.epoll.Native.epollWait(int, EpollEventArray, int) 
> Native.java:111
> io.netty.channel.epoll.EpollEventLoop.epollWait(boolean) 
> EpollEventLoop.java:230
> io.netty.channel.epoll.EpollEventLoop.run() EpollEventLoop.java:254
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run() 
> SingleThreadEventExecutor.java:858
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run() 
> DefaultThreadFactory.java:138
> java.lang.Thread.run() Thread.java:745
> {code}
> At first I thought that the profiler might not be able to profile native code 
> properly, but I went further and realized that most of the CPU was used by 
> {{epoll_wait()}} calls with a timeout of zero.
> Here is the output of perf on this system, which confirms that most of the 
> overhead was with timeout == 0.
> {code}
> Samples: 11M of event 'syscalls:sys_enter_epoll_wait', Event count (approx.): 
> 11594448
> Overhead  Trace output
>   
>  ◆
>   90.06%  epfd: 0x0047, events: 0x7f5588c0c000, maxevents: 0x2000, 
> timeout: 0x   
> ▒
>5.77%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, 
> timeout: 0x   
> ▒
>1.98%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, 
> timeout: 0x03e8   
> ▒
>0.04%  epfd: 0x0003, events: 0x2f6af77b9c00, maxevents: 0x0020, 
> timeout: 0x   
> ▒
>0.04%  epfd: 0x002b, events: 0x121ebf63ac00, maxevents: 0x0040, 
> timeout: 0x   
> ▒
>0.03%  epfd: 0x0026, events: 0x7f51f80019c0, maxevents: 0x0020, 
> timeout: 0x   
> ▒
>0.02%  epfd: 0x0003, events: 0x7fe4d80019d0, maxevents: 0x0020, 
> timeout: 0x
> {code}
> Running this time with perf record -ag for call traces:
> {code}
> # Children  Self   sys   usr  Trace output
> 
> #         
> 
> #
>  8.61% 8.61% 0.00% 8.61%  epfd: 0x00a7, events: 
> 0x7fca452d6000, maxevents: 0x1000, timeout: 0x
> |
> ---0x1000200af313
>|  
> --8.61%--0x7fca6117bdac
>   0x7fca60459804
>   epoll_wait
>  2.98% 2.98% 0.00% 2.98%  epfd: 0x00a7, events: 
> 0x7fca452d6000, maxevents: 0x1000, timeout: 0x03e8
> |
> ---0x1000200af313
>0x7fca6117b830
>0x7fca60459804
>epoll_wait
> {code}
> That looks like a lot of CPU used to wait for nothing. I'm not sure if perf 
> reports a per-CPU percentage or a per-system percentage, but that would 
> still be at least 10% of the total CPU usage of Cassandra.
> I went further and found the code 

[jira] [Updated] (CASSANDRA-13677) Make SASI timeouts easier to debug

2017-07-06 Thread Corentin Chary (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corentin Chary updated CASSANDRA-13677:
---
Description: 
This would now give something like:
{code}
WARN  [ReadStage-15] 2017-06-08 12:47:57,799 
AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread 
Thread[ReadStage-15,5,main]: {}
java.lang.RuntimeException: 
org.apache.cassandra.index.sasi.exceptions.TimeQuotaExceededException: Command 
'Read(biggraphite_metadata.directories columns=* rowfilter=component_0 = criteo 
limits=LIMIT 5000 range=(min(-9223372036854775808), min(-9223372036854775808)] 
pfilter=names(EMPTY))' took too long (100 > 100ms).
at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2591)
 ~[main/:na]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[na:1.8.0_131]
at 
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
 ~[main/:na]
at 
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
 [main/:na]
at 
org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) [main/:na]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
Caused by: 
org.apache.cassandra.index.sasi.exceptions.TimeQuotaExceededException: Command 
'Read(biggraphite_metadata.directories columns=* rowfilter=component_0 = criteo 
limits=LIMIT 5000 range=(min(-9223372036854775808), min(-9223372036854775808)] 
pfilter=names(EMPTY))' took too long (100 > 100ms).
at 
org.apache.cassandra.index.sasi.plan.QueryController.checkpoint(QueryController.java:163)
 ~[main/:na]
at 
org.apache.cassandra.index.sasi.plan.QueryController.getPartition(QueryController.java:117)
 ~[main/:na]
at 
org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:116)
 ~[main/:na]
at 
org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:71)
 ~[main/:na]
at 
org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
~[main/:na]
at 
org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:92)
 ~[main/:na]
at 
org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:310)
 ~[main/:na]
at 
org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:145)
 ~[main/:na]
at 
org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:138)
 ~[main/:na]
at 
org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:134)
 ~[main/:na]
at 
org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:76) 
~[main/:na]
at 
org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:333) 
~[main/:na]
at 
org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1884)
 ~[main/:na]
at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2587)
 ~[main/:na]
... 5 common frames omitted
{code}

Not having the query makes it super hard to debug. Even worse, because it 
potentially stops before the slow_query threshold, it won't show up as a slow query.

  was:
This would now give something like:
{code}
WARN  [ReadStage-15] 2017-06-08 12:47:57,799 
AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread 
Thread[ReadStage-15,5,main]: {}
java.lang.RuntimeException: 
org.apache.cassandra.index.sasi.exceptions.TimeQuotaExceededException: Command 
'Read(biggraphite_metadata.directories columns=* rowfilter=component_0 = criteo 
limits=LIMIT 5000 range=(min(-9223372036854775808), min(-9223372036854775808)] 
pfilter=names(EMPTY))' took too long (100 > 100ms).
at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2591)
 ~[main/:na]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[na:1.8.0_131]
at 
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
 ~[main/:na]
at 
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
 [main/:na]
at 
org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) [main/:na]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
Caused by: 
org.apache.cassandra.index.sasi.exceptions.TimeQuotaExceededException: Command 
'Read(biggraphite_metadata.directories columns=* 

[jira] [Created] (CASSANDRA-13677) Make SASI timeouts easier to debug

2017-07-06 Thread Corentin Chary (JIRA)
Corentin Chary created CASSANDRA-13677:
--

 Summary: Make SASI timeouts easier to debug
 Key: CASSANDRA-13677
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13677
 Project: Cassandra
  Issue Type: Improvement
  Components: sasi
Reporter: Corentin Chary
Assignee: Corentin Chary
Priority: Minor
 Fix For: 4.x


This would now give something like:
{code}
WARN  [ReadStage-15] 2017-06-08 12:47:57,799 
AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread 
Thread[ReadStage-15,5,main]: {}
java.lang.RuntimeException: 
org.apache.cassandra.index.sasi.exceptions.TimeQuotaExceededException: Command 
'Read(biggraphite_metadata.directories columns=* rowfilter=component_0 = criteo 
limits=LIMIT 5000 range=(min(-9223372036854775808), min(-9223372036854775808)] 
pfilter=names(EMPTY))' took too long (100 > 100ms).
at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2591)
 ~[main/:na]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[na:1.8.0_131]
at 
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
 ~[main/:na]
at 
org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
 [main/:na]
at 
org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) [main/:na]
at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
Caused by: 
org.apache.cassandra.index.sasi.exceptions.TimeQuotaExceededException: Command 
'Read(biggraphite_metadata.directories columns=* rowfilter=component_0 = criteo 
limits=LIMIT 5000 range=(min(-9223372036854775808), min(-9223372036854775808)] 
pfilter=names(EMPTY))' took too long (100 > 100ms).
at 
org.apache.cassandra.index.sasi.plan.QueryController.checkpoint(QueryController.java:163)
 ~[main/:na]
at 
org.apache.cassandra.index.sasi.plan.QueryController.getPartition(QueryController.java:117)
 ~[main/:na]
at 
org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:116)
 ~[main/:na]
at 
org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:71)
 ~[main/:na]
at 
org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
~[main/:na]
at 
org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:92)
 ~[main/:na]
at 
org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:310)
 ~[main/:na]
at 
org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:145)
 ~[main/:na]
at 
org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:138)
 ~[main/:na]
at 
org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:134)
 ~[main/:na]
at 
org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:76) 
~[main/:na]
at 
org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:333) 
~[main/:na]
at 
org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1884)
 ~[main/:na]
at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2587)
 ~[main/:na]
... 5 common frames omitted
{code}

Not having the query makes it super hard to debug
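The improvement being described amounts to embedding the read command's textual form in the quota exception so the offending query shows up in the log. A minimal sketch, under hypothetical names (QueryController and checkpoint mirror the classes visible in the trace above, but this is not the actual patch):

```java
// Hedged sketch: attach the query description to the timeout exception so a
// log reader can see which read exceeded its quota. Names are illustrative,
// not Cassandra's actual SASI code.
class TimeQuotaExceededException extends RuntimeException {
    TimeQuotaExceededException(String message) { super(message); }
}

final class QueryController {
    private final String command;   // e.g. the ReadCommand's toString()
    private final long startNanos = System.nanoTime();
    private final long quotaMillis;

    QueryController(String command, long quotaMillis) {
        this.command = command;
        this.quotaMillis = quotaMillis;
    }

    // Called periodically during index iteration; aborts with a message
    // that includes the command once the time quota is exhausted.
    void checkpoint() {
        long elapsedMillis = (System.nanoTime() - startNanos) / 1_000_000L;
        if (elapsedMillis >= quotaMillis)
            throw new TimeQuotaExceededException(
                "Command '" + command + "' took too long ("
                + elapsedMillis + " >= " + quotaMillis + "ms).");
    }
}
```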



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13677) Make SASI timeouts easier to debug

2017-07-06 Thread Corentin Chary (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corentin Chary updated CASSANDRA-13677:
---
Attachment: 0001-SASI-Make-timeouts-easier-to-debug.patch

> Make SASI timeouts easier to debug
> --
>
> Key: CASSANDRA-13677
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13677
> Project: Cassandra
>  Issue Type: Improvement
>  Components: sasi
>Reporter: Corentin Chary
>Assignee: Corentin Chary
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 0001-SASI-Make-timeouts-easier-to-debug.patch
>
>
> This would now give something like:
> {code}
> WARN  [ReadStage-15] 2017-06-08 12:47:57,799 
> AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread 
> Thread[ReadStage-15,5,main]: {}
> java.lang.RuntimeException: 
> org.apache.cassandra.index.sasi.exceptions.TimeQuotaExceededException: 
> Command 'Read(biggraphite_metadata.directories columns=* 
> rowfilter=component_0 = criteo limits=LIMIT 5000 
> range=(min(-9223372036854775808), min(-9223372036854775808)] 
> pfilter=names(EMPTY))' took too long (100 > 100ms).
> at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2591)
>  ~[main/:na]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_131]
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
>  ~[main/:na]
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
>  [main/:na]
> at 
> org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) [main/:na]
> at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
> Caused by: 
> org.apache.cassandra.index.sasi.exceptions.TimeQuotaExceededException: 
> Command 'Read(biggraphite_metadata.directories columns=* 
> rowfilter=component_0 = criteo limits=LIMIT 5000 
> range=(min(-9223372036854775808), min(-9223372036854775808)] 
> pfilter=names(EMPTY))' took too long (100 > 100ms).
> at 
> org.apache.cassandra.index.sasi.plan.QueryController.checkpoint(QueryController.java:163)
>  ~[main/:na]
> at 
> org.apache.cassandra.index.sasi.plan.QueryController.getPartition(QueryController.java:117)
>  ~[main/:na]
> at 
> org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:116)
>  ~[main/:na]
> at 
> org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:71)
>  ~[main/:na]
> at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[main/:na]
> at 
> org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:92)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:310)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:145)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:138)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:134)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:76) 
> ~[main/:na]
> at 
> org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:333) 
> ~[main/:na]
> at 
> org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1884)
>  ~[main/:na]
> at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2587)
>  ~[main/:na]
> ... 5 common frames omitted
> {code}
> Not having the query makes it super hard to debug



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-13677) Make SASI timeouts easier to debug

2017-07-06 Thread Corentin Chary (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corentin Chary updated CASSANDRA-13677:
---
Status: Patch Available  (was: Open)

> Make SASI timeouts easier to debug
> --
>
> Key: CASSANDRA-13677
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13677
> Project: Cassandra
>  Issue Type: Improvement
>  Components: sasi
>Reporter: Corentin Chary
>Assignee: Corentin Chary
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 0001-SASI-Make-timeouts-easier-to-debug.patch
>
>
> This would now give something like:
> {code}
> WARN  [ReadStage-15] 2017-06-08 12:47:57,799 
> AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread 
> Thread[ReadStage-15,5,main]: {}
> java.lang.RuntimeException: 
> org.apache.cassandra.index.sasi.exceptions.TimeQuotaExceededException: 
> Command 'Read(biggraphite_metadata.directories columns=* 
> rowfilter=component_0 = criteo limits=LIMIT 5000 
> range=(min(-9223372036854775808), min(-9223372036854775808)] 
> pfilter=names(EMPTY))' took too long (100 > 100ms).
> at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2591)
>  ~[main/:na]
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> ~[na:1.8.0_131]
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
>  ~[main/:na]
> at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
>  [main/:na]
> at 
> org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:109) [main/:na]
> at java.lang.Thread.run(Thread.java:748) [na:1.8.0_131]
> Caused by: 
> org.apache.cassandra.index.sasi.exceptions.TimeQuotaExceededException: 
> Command 'Read(biggraphite_metadata.directories columns=* 
> rowfilter=component_0 = criteo limits=LIMIT 5000 
> range=(min(-9223372036854775808), min(-9223372036854775808)] 
> pfilter=names(EMPTY))' took too long (100 > 100ms).
> at 
> org.apache.cassandra.index.sasi.plan.QueryController.checkpoint(QueryController.java:163)
>  ~[main/:na]
> at 
> org.apache.cassandra.index.sasi.plan.QueryController.getPartition(QueryController.java:117)
>  ~[main/:na]
> at 
> org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:116)
>  ~[main/:na]
> at 
> org.apache.cassandra.index.sasi.plan.QueryPlan$ResultIterator.computeNext(QueryPlan.java:71)
>  ~[main/:na]
> at 
> org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) 
> ~[main/:na]
> at 
> org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:92)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:310)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:145)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:138)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:134)
>  ~[main/:na]
> at 
> org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:76) 
> ~[main/:na]
> at 
> org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:333) 
> ~[main/:na]
> at 
> org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1884)
>  ~[main/:na]
> at 
> org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2587)
>  ~[main/:na]
> ... 5 common frames omitted
> {code}
> Not having the query makes it super hard to debug



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)

2017-07-04 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16073502#comment-16073502
 ] 

Corentin Chary commented on CASSANDRA-13651:


I ran tests on a 3-node cluster, and I can confirm that not using scheduled tasks 
and using a simpler batcher removes all the {{epoll_wait(..., 0)}} calls. This 
reduces the CPU used by the epoll threads.
I need to take more time to check how efficient the batching still is, and to 
compare the context switches with and without it.

> Large amount of CPU used by epoll_wait(.., .., .., 0)
> -
>
> Key: CASSANDRA-13651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13651
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Corentin Chary
> Fix For: 4.x
>
>
> I was trying to profile Cassandra under my workload and I kept seeing this 
> backtrace:
> {code}
> epollEventLoopGroup-2-3 State: RUNNABLE CPU usage on sample: 240ms
> io.netty.channel.epoll.Native.epollWait0(int, long, int, int) Native.java 
> (native)
> io.netty.channel.epoll.Native.epollWait(int, EpollEventArray, int) 
> Native.java:111
> io.netty.channel.epoll.EpollEventLoop.epollWait(boolean) 
> EpollEventLoop.java:230
> io.netty.channel.epoll.EpollEventLoop.run() EpollEventLoop.java:254
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run() 
> SingleThreadEventExecutor.java:858
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run() 
> DefaultThreadFactory.java:138
> java.lang.Thread.run() Thread.java:745
> {code}
> At first I thought that the profiler might not be able to profile native code 
> properly, but I went further and realized that most of the CPU was used by 
> {{epoll_wait()}} calls with a timeout of zero.
> Here is the output of perf on this system, which confirms that most of the 
> overhead was with timeout == 0.
> {code}
> Samples: 11M of event 'syscalls:sys_enter_epoll_wait', Event count (approx.): 
> 11594448
> Overhead  Trace output
>   
>  ◆
>   90.06%  epfd: 0x0047, events: 0x7f5588c0c000, maxevents: 0x2000, 
> timeout: 0x   
> ▒
>5.77%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, 
> timeout: 0x   
> ▒
>1.98%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, 
> timeout: 0x03e8   
> ▒
>0.04%  epfd: 0x0003, events: 0x2f6af77b9c00, maxevents: 0x0020, 
> timeout: 0x   
> ▒
>0.04%  epfd: 0x002b, events: 0x121ebf63ac00, maxevents: 0x0040, 
> timeout: 0x   
> ▒
>0.03%  epfd: 0x0026, events: 0x7f51f80019c0, maxevents: 0x0020, 
> timeout: 0x   
> ▒
>0.02%  epfd: 0x0003, events: 0x7fe4d80019d0, maxevents: 0x0020, 
> timeout: 0x
> {code}
> Running this time with perf record -ag for call traces:
> {code}
> # Children  Self   sys   usr  Trace output
> 
> #         
> 
> #
>  8.61% 8.61% 0.00% 8.61%  epfd: 0x00a7, events: 
> 0x7fca452d6000, maxevents: 0x1000, timeout: 0x
> |
> ---0x1000200af313
>|  
> --8.61%--0x7fca6117bdac
>   0x7fca60459804
>   epoll_wait
>  2.98% 2.98% 0.00% 2.98%  epfd: 0x00a7, events: 
> 0x7fca452d6000, maxevents: 0x1000, timeout: 0x03e8
> |
> ---0x1000200af313
>0x7fca6117b830
>0x7fca60459804
>epoll_wait
> {code}
> That looks like a lot of CPU used to wait for nothing. I'm not sure if perf 
> reports a per-CPU percentage or a per-system percentage, but that would 
> still be at least 10% of the total CPU usage of Cassandra.
> I went further and found the code of all that: We schedule a 

[jira] [Commented] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)

2017-07-03 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16072162#comment-16072162
 ] 

Corentin Chary commented on CASSANDRA-13651:


I cooked a patch to use Spotify's netty-batch-flusher instead of the current 
flusher. Here are some results:

{code}
normal:

Results:
Op rate   :4,220 op/s  [insert: 4,220 op/s]
Partition rate:4,220 pk/s  [insert: 4,220 pk/s]
Row rate  :   41,851 row/s [insert: 41,851 row/s]
Latency mean  :0.2 ms [insert: 0.2 ms]
Latency median:0.2 ms [insert: 0.2 ms]
Latency 95th percentile   :0.2 ms [insert: 0.2 ms]
Latency 99th percentile   :0.3 ms [insert: 0.3 ms]
Latency 99.9th percentile :0.4 ms [insert: 0.4 ms]
Latency max   :   65.5 ms [insert: 65.5 ms]
Total partitions  :100,000 [insert: 100,000]
Total errors  :  0 [insert: 0]
Total GC count: 6
Total GC memory   : 3.473 GiB
Total GC time :0.4 seconds
Avg GC time   :   60.0 ms
StdDev GC time:5.1 ms
Total operation time  : 00:00:23
{code}

{code}
batched:

Results:
Op rate   :4,344 op/s  [insert: 4,344 op/s]
Partition rate:4,344 pk/s  [insert: 4,344 pk/s]
Row rate  :   43,121 row/s [insert: 43,121 row/s]
Latency mean  :0.2 ms [insert: 0.2 ms]
Latency median:0.2 ms [insert: 0.2 ms]
Latency 95th percentile   :0.2 ms [insert: 0.2 ms]
Latency 99th percentile   :0.3 ms [insert: 0.3 ms]
Latency 99.9th percentile :0.4 ms [insert: 0.4 ms]
Latency max   :   63.4 ms [insert: 63.4 ms]
Total partitions  :100,000 [insert: 100,000]
Total errors  :  0 [insert: 0]
Total GC count: 6
Total GC memory   : 3.467 GiB
Total GC time :0.4 seconds
Avg GC time   :   60.0 ms
StdDev GC time:3.3 ms
Total operation time  : 00:00:23
{code}

So slightly more QPS, but more interestingly, the epoll thread now uses about 4 
times less CPU. I'll try to do a full-scale benchmark on a bigger workload with 
3 nodes tomorrow.

Patch at https://github.com/iksaif/cassandra/tree/cassandra-13651-trunk
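To make the batching idea concrete, here is a standalone sketch (the class and method names are illustrative, not the spotify/netty-batch-flusher API): instead of flushing the channel once per message, writes are queued and a single flush is issued per event-loop pass, so the epoll thread wakes up far less often.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch of a batching flusher: enqueue writes, flush once per
// event-loop pass instead of once per message.
public class BatchFlusherSketch {
    private final Queue<String> pending = new ArrayDeque<>();
    int flushCount = 0;

    // Enqueue only; no syscall, no epoll wakeup per message.
    void write(String msg) {
        pending.add(msg);
    }

    // Called once at the end of an event-loop iteration.
    void flushIfNeeded() {
        if (!pending.isEmpty()) {
            pending.clear(); // stand-in for channel.flush()
            flushCount++;
        }
    }

    public static void main(String[] args) {
        BatchFlusherSketch f = new BatchFlusherSketch();
        for (int i = 0; i < 100; i++) f.write("msg" + i);
        f.flushIfNeeded();
        System.out.println(f.flushCount); // 1 flush for 100 writes
    }
}
```

With per-message `writeAndFlush()` the same 100 writes would trigger 100 flushes, which matches the observed reduction in epoll-thread CPU when coalescing.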

> Large amount of CPU used by epoll_wait(.., .., .., 0)
> -
>
> Key: CASSANDRA-13651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13651
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Corentin Chary
> Fix For: 4.x
>
>
> I was trying to profile Cassandra under my workload and I kept seeing this 
> backtrace:
> {code}
> epollEventLoopGroup-2-3 State: RUNNABLE CPU usage on sample: 240ms
> io.netty.channel.epoll.Native.epollWait0(int, long, int, int) Native.java 
> (native)
> io.netty.channel.epoll.Native.epollWait(int, EpollEventArray, int) 
> Native.java:111
> io.netty.channel.epoll.EpollEventLoop.epollWait(boolean) 
> EpollEventLoop.java:230
> io.netty.channel.epoll.EpollEventLoop.run() EpollEventLoop.java:254
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run() 
> SingleThreadEventExecutor.java:858
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run() 
> DefaultThreadFactory.java:138
> java.lang.Thread.run() Thread.java:745
> {code}
> At first I thought that the profiler might not be able to profile native code 
> properly, but I went further and realized that most of the CPU was used by 
> {{epoll_wait()}} calls with a timeout of zero.
> Here is the output of perf on this system, which confirms that most of the 
> overhead was with timeout == 0.
> {code}
> Samples: 11M of event 'syscalls:sys_enter_epoll_wait', Event count (approx.): 
> 11594448
> Overhead  Trace output
>   
>  ◆
>   90.06%  epfd: 0x0047, events: 0x7f5588c0c000, maxevents: 0x2000, 
> timeout: 0x   
> ▒
>5.77%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, 
> timeout: 0x   
> ▒
>1.98%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, 
> timeout: 0x03e8   
> ▒
>0.04%  epfd: 0x0003, events: 0x2f6af77b9c00, maxevents: 0x0020, 
> timeout: 0x   
> 

[jira] [Commented] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)

2017-06-30 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070997#comment-16070997
 ] 

Corentin Chary commented on CASSANDRA-13651:


Also check:
* https://github.com/netty/netty/issues/1759
* https://gist.github.com/jadbaz/47d98da0ead2e71659f343b14ef05de6
* Benchmark batching vs. stupid writeAndFlush()
* It's unclear why sending the response is done in the flusher right now
* https://github.com/spotify/netty-batch-flusher

> Large amount of CPU used by epoll_wait(.., .., .., 0)
> -
>
> Key: CASSANDRA-13651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13651
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Corentin Chary
> Fix For: 4.x
>
>
> I was trying to profile Cassandra under my workload and I kept seeing this 
> backtrace:
> {code}
> epollEventLoopGroup-2-3 State: RUNNABLE CPU usage on sample: 240ms
> io.netty.channel.epoll.Native.epollWait0(int, long, int, int) Native.java 
> (native)
> io.netty.channel.epoll.Native.epollWait(int, EpollEventArray, int) 
> Native.java:111
> io.netty.channel.epoll.EpollEventLoop.epollWait(boolean) 
> EpollEventLoop.java:230
> io.netty.channel.epoll.EpollEventLoop.run() EpollEventLoop.java:254
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run() 
> SingleThreadEventExecutor.java:858
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run() 
> DefaultThreadFactory.java:138
> java.lang.Thread.run() Thread.java:745
> {code}
> At first I thought that the profiler might not be able to profile native code 
> properly, but I went further and realized that most of the CPU was used by 
> {{epoll_wait()}} calls with a timeout of zero.
> Here is the output of perf on this system, which confirms that most of the 
> overhead was with timeout == 0.
> {code}
> Samples: 11M of event 'syscalls:sys_enter_epoll_wait', Event count (approx.): 
> 11594448
> Overhead  Trace output
>   
>  ◆
>   90.06%  epfd: 0x0047, events: 0x7f5588c0c000, maxevents: 0x2000, 
> timeout: 0x   
> ▒
>5.77%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, 
> timeout: 0x   
> ▒
>1.98%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, 
> timeout: 0x03e8   
> ▒
>0.04%  epfd: 0x0003, events: 0x2f6af77b9c00, maxevents: 0x0020, 
> timeout: 0x   
> ▒
>0.04%  epfd: 0x002b, events: 0x121ebf63ac00, maxevents: 0x0040, 
> timeout: 0x   
> ▒
>0.03%  epfd: 0x0026, events: 0x7f51f80019c0, maxevents: 0x0020, 
> timeout: 0x   
> ▒
>0.02%  epfd: 0x0003, events: 0x7fe4d80019d0, maxevents: 0x0020, 
> timeout: 0x
> {code}
> Running this time with perf record -ag for call traces:
> {code}
> # Children  Self   sys   usr  Trace output
> 
> #         
> 
> #
>  8.61% 8.61% 0.00% 8.61%  epfd: 0x00a7, events: 
> 0x7fca452d6000, maxevents: 0x1000, timeout: 0x
> |
> ---0x1000200af313
>|  
> --8.61%--0x7fca6117bdac
>   0x7fca60459804
>   epoll_wait
>  2.98% 2.98% 0.00% 2.98%  epfd: 0x00a7, events: 
> 0x7fca452d6000, maxevents: 0x1000, timeout: 0x03e8
> |
> ---0x1000200af313
>0x7fca6117b830
>0x7fca60459804
>epoll_wait
> {code}
> That looks like a lot of CPU used to wait for nothing. I'm not sure if perf 
> reports a per-CPU percentage or a per-system percentage, but that would 
> still be 10% of the total CPU usage of Cassandra at a minimum.
> I went further and found the code of all that: We schedule a lot of 
> 

[jira] [Commented] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)

2017-06-30 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16070535#comment-16070535
 ] 

Corentin Chary commented on CASSANDRA-13651:


Things to check or try (for me):
* io.netty.eventLoopThreads
* Check if we could use the same eventloop instead of starting two
* Create a custom SelectStrategy that skips looking at fds if there is a 
scheduled task due in a few microseconds
* Try to understand why Message::Flusher currently works this way
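The SelectStrategy idea above can be sketched without Netty (assumption: this standalone mock only mirrors the shape of `io.netty.channel.SelectStrategy`; the threshold and names are illustrative): when a scheduled task is due soon, skip the blocking select and fall through to task execution instead of issuing an `epoll_wait(..., 0)`.

```java
// Standalone sketch of a deadline-aware select strategy. Return values mimic
// Netty's convention: a negative value means "block in select", a
// non-negative value means "don't block, handle tasks now".
public class DeadlineAwareStrategy {
    static final int SELECT = -1;   // block in epoll_wait with a real timeout
    static final int POLL_NOW = 0;  // skip blocking, run pending/near tasks

    // Illustrative threshold: tasks due within 100 usec are "imminent".
    static final long THRESHOLD_NANOS = 100_000L;

    static int calculateStrategy(long nanosUntilNextTask, boolean hasTasks) {
        if (hasTasks) return POLL_NOW;
        // Only block when nothing is due soon; avoids the zero-timeout spin.
        return nanosUntilNextTask > THRESHOLD_NANOS ? SELECT : POLL_NOW;
    }

    public static void main(String[] args) {
        System.out.println(calculateStrategy(10_000L, false));    // 0: task in 10 usec
        System.out.println(calculateStrategy(5_000_000L, false)); // -1: safe to block
    }
}
```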

> Large amount of CPU used by epoll_wait(.., .., .., 0)
> -
>
> Key: CASSANDRA-13651
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13651
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Corentin Chary
> Fix For: 4.x
>
>
> I was trying to profile Cassandra under my workload and I kept seeing this 
> backtrace:
> {code}
> epollEventLoopGroup-2-3 State: RUNNABLE CPU usage on sample: 240ms
> io.netty.channel.epoll.Native.epollWait0(int, long, int, int) Native.java 
> (native)
> io.netty.channel.epoll.Native.epollWait(int, EpollEventArray, int) 
> Native.java:111
> io.netty.channel.epoll.EpollEventLoop.epollWait(boolean) 
> EpollEventLoop.java:230
> io.netty.channel.epoll.EpollEventLoop.run() EpollEventLoop.java:254
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run() 
> SingleThreadEventExecutor.java:858
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run() 
> DefaultThreadFactory.java:138
> java.lang.Thread.run() Thread.java:745
> {code}
> At first I thought that the profiler might not be able to profile native code 
> properly, but I went further and realized that most of the CPU was used by 
> {{epoll_wait()}} calls with a timeout of zero.
> Here is the output of perf on this system, which confirms that most of the 
> overhead was with timeout == 0.
> {code}
> Samples: 11M of event 'syscalls:sys_enter_epoll_wait', Event count (approx.): 
> 11594448
> Overhead  Trace output
>   
>  ◆
>   90.06%  epfd: 0x0047, events: 0x7f5588c0c000, maxevents: 0x2000, 
> timeout: 0x   
> ▒
>5.77%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, 
> timeout: 0x   
> ▒
>1.98%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, 
> timeout: 0x03e8   
> ▒
>0.04%  epfd: 0x0003, events: 0x2f6af77b9c00, maxevents: 0x0020, 
> timeout: 0x   
> ▒
>0.04%  epfd: 0x002b, events: 0x121ebf63ac00, maxevents: 0x0040, 
> timeout: 0x   
> ▒
>0.03%  epfd: 0x0026, events: 0x7f51f80019c0, maxevents: 0x0020, 
> timeout: 0x   
> ▒
>0.02%  epfd: 0x0003, events: 0x7fe4d80019d0, maxevents: 0x0020, 
> timeout: 0x
> {code}
> Running this time with perf record -ag for call traces:
> {code}
> # Children  Self   sys   usr  Trace output
> 
> #         
> 
> #
>  8.61% 8.61% 0.00% 8.61%  epfd: 0x00a7, events: 
> 0x7fca452d6000, maxevents: 0x1000, timeout: 0x
> |
> ---0x1000200af313
>|  
> --8.61%--0x7fca6117bdac
>   0x7fca60459804
>   epoll_wait
>  2.98% 2.98% 0.00% 2.98%  epfd: 0x00a7, events: 
> 0x7fca452d6000, maxevents: 0x1000, timeout: 0x03e8
> |
> ---0x1000200af313
>0x7fca6117b830
>0x7fca60459804
>epoll_wait
> {code}
> That looks like a lot of CPU used to wait for nothing. I'm not sure if perf 
> reports a per-CPU percentage or a per-system percentage, but that would 
> still be 10% of the total CPU usage of Cassandra at a minimum.
> I went further and found the code of all that: We schedule a lot of 

[jira] [Updated] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)

2017-06-30 Thread Corentin Chary (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corentin Chary updated CASSANDRA-13651:
---
Description: 
I was trying to profile Cassandra under my workload and I kept seeing this 
backtrace:
{code}
epollEventLoopGroup-2-3 State: RUNNABLE CPU usage on sample: 240ms
io.netty.channel.epoll.Native.epollWait0(int, long, int, int) Native.java 
(native)
io.netty.channel.epoll.Native.epollWait(int, EpollEventArray, int) 
Native.java:111
io.netty.channel.epoll.EpollEventLoop.epollWait(boolean) EpollEventLoop.java:230
io.netty.channel.epoll.EpollEventLoop.run() EpollEventLoop.java:254
io.netty.util.concurrent.SingleThreadEventExecutor$5.run() 
SingleThreadEventExecutor.java:858
io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run() 
DefaultThreadFactory.java:138
java.lang.Thread.run() Thread.java:745
{code}

At first I thought that the profiler might not be able to profile native code 
properly, but I went further and realized that most of the CPU was used by 
{{epoll_wait()}} calls with a timeout of zero.

Here is the output of perf on this system, which confirms that most of the 
overhead was with timeout == 0.

{code}
Samples: 11M of event 'syscalls:sys_enter_epoll_wait', Event count (approx.): 
11594448
Overhead  Trace output  

 ◆
  90.06%  epfd: 0x0047, events: 0x7f5588c0c000, maxevents: 0x2000, 
timeout: 0x 
  ▒
   5.77%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, 
timeout: 0x 
  ▒
   1.98%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, 
timeout: 0x03e8 
  ▒
   0.04%  epfd: 0x0003, events: 0x2f6af77b9c00, maxevents: 0x0020, 
timeout: 0x 
  ▒
   0.04%  epfd: 0x002b, events: 0x121ebf63ac00, maxevents: 0x0040, 
timeout: 0x 
  ▒
   0.03%  epfd: 0x0026, events: 0x7f51f80019c0, maxevents: 0x0020, 
timeout: 0x 
  ▒
   0.02%  epfd: 0x0003, events: 0x7fe4d80019d0, maxevents: 0x0020, 
timeout: 0x
{code}

Running this time with perf record -ag for call traces:
{code}
# Children  Self   sys   usr  Trace output  
  
#         

#
 8.61% 8.61% 0.00% 8.61%  epfd: 0x00a7, events: 
0x7fca452d6000, maxevents: 0x1000, timeout: 0x
|
---0x1000200af313
   |  
--8.61%--0x7fca6117bdac
  0x7fca60459804
  epoll_wait

 2.98% 2.98% 0.00% 2.98%  epfd: 0x00a7, events: 
0x7fca452d6000, maxevents: 0x1000, timeout: 0x03e8
|
---0x1000200af313
   0x7fca6117b830
   0x7fca60459804
   epoll_wait
{code}

That looks like a lot of CPU used to wait for nothing. I'm not sure if perf 
reports a per-CPU percentage or a per-system percentage, but that would 
still be 10% of the total CPU usage of Cassandra at a minimum.

I went further and found the code behind all that: we schedule a lot of 
{{Message::Flusher}} tasks with a deadline of 10 usec (5 per message, I think), 
but netty+epoll only supports timeouts of a millisecond or more and will 
convert everything below that to 0.
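The truncation is just integer arithmetic (a minimal sketch, assuming the nanosecond-to-millisecond conversion on the epoll path is a plain integer division; the method name is illustrative): `epoll_wait` takes its timeout as an int number of milliseconds, so any sub-millisecond deadline becomes a zero timeout, i.e. a busy poll.

```java
// Sketch of the unit mismatch: a 10 usec scheduling deadline truncates to a
// 0 ms epoll_wait timeout, turning the wait into a spin.
public class TimeoutTruncation {
    static int toEpollTimeoutMillis(long delayNanos) {
        return (int) (delayNanos / 1_000_000L); // integer division drops sub-ms
    }

    public static void main(String[] args) {
        System.out.println(toEpollTimeoutMillis(10_000L));    // 10 usec -> 0
        System.out.println(toEpollTimeoutMillis(1_000_000L)); // 1 ms -> 1
    }
}
```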

I added some traces to netty (4.1):
{code}
diff --git 
a/transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java
 
b/transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java
index 909088fde..8734bbfd4 100644
--- 
a/transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java
+++ 
b/transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java
@@ -208,10 +208,15 @@ final class EpollEventLoop extends SingleThreadEventLoop {
 long currentTimeNanos = System.nanoTime();
 long selectDeadLineNanos = currentTimeNanos + 
delayNanos(currentTimeNanos);
 for (;;) 

[jira] [Created] (CASSANDRA-13651) Large amount of CPU used by epoll_wait(.., .., .., 0)

2017-06-30 Thread Corentin Chary (JIRA)
Corentin Chary created CASSANDRA-13651:
--

 Summary: Large amount of CPU used by epoll_wait(.., .., .., 0)
 Key: CASSANDRA-13651
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13651
 Project: Cassandra
  Issue Type: Bug
Reporter: Corentin Chary
 Fix For: 4.x


I was trying to profile Cassandra under my workload and I kept seeing this 
backtrace:
{code}
epollEventLoopGroup-2-3 State: RUNNABLE CPU usage on sample: 240ms
io.netty.channel.epoll.Native.epollWait0(int, long, int, int) Native.java 
(native)
io.netty.channel.epoll.Native.epollWait(int, EpollEventArray, int) 
Native.java:111
io.netty.channel.epoll.EpollEventLoop.epollWait(boolean) EpollEventLoop.java:230
io.netty.channel.epoll.EpollEventLoop.run() EpollEventLoop.java:254
io.netty.util.concurrent.SingleThreadEventExecutor$5.run() 
SingleThreadEventExecutor.java:858
io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run() 
DefaultThreadFactory.java:138
java.lang.Thread.run() Thread.java:745
{code}

At first I thought that the profiler might not be able to profile native code 
properly, but I went further and realized that most of the CPU was used by 
epoll_wait() calls with a timeout of zero.

Here is the output of perf on this system, which confirms that most of the 
overhead was with timeout == 0.

{code}
Samples: 11M of event 'syscalls:sys_enter_epoll_wait', Event count (approx.): 
11594448
Overhead  Trace output  

 ◆
  90.06%  epfd: 0x0047, events: 0x7f5588c0c000, maxevents: 0x2000, 
timeout: 0x 
  ▒
   5.77%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, 
timeout: 0x 
  ▒
   1.98%  epfd: 0x00b5, events: 0x7fca419ef000, maxevents: 0x1000, 
timeout: 0x03e8 
  ▒
   0.04%  epfd: 0x0003, events: 0x2f6af77b9c00, maxevents: 0x0020, 
timeout: 0x 
  ▒
   0.04%  epfd: 0x002b, events: 0x121ebf63ac00, maxevents: 0x0040, 
timeout: 0x 
  ▒
   0.03%  epfd: 0x0026, events: 0x7f51f80019c0, maxevents: 0x0020, 
timeout: 0x 
  ▒
   0.02%  epfd: 0x0003, events: 0x7fe4d80019d0, maxevents: 0x0020, 
timeout: 0x
{code}

Running this time with perf record -ag for call traces:
{code}
# Children  Self   sys   usr  Trace output  
  
#         

#
 8.61% 8.61% 0.00% 8.61%  epfd: 0x00a7, events: 
0x7fca452d6000, maxevents: 0x1000, timeout: 0x
|
---0x1000200af313
   |  
--8.61%--0x7fca6117bdac
  0x7fca60459804
  epoll_wait

 2.98% 2.98% 0.00% 2.98%  epfd: 0x00a7, events: 
0x7fca452d6000, maxevents: 0x1000, timeout: 0x03e8
|
---0x1000200af313
   0x7fca6117b830
   0x7fca60459804
   epoll_wait
{code}

That looks like a lot of CPU used to wait for nothing. I'm not sure if perf 
reports a per-CPU percentage or a per-system percentage, but that would 
still be 10% of the total CPU usage of Cassandra at a minimum.

I went further and found the code behind all that: we schedule a lot of 
Message::Flusher tasks with a deadline of 10 usec (5 per message, I think), 
but netty+epoll only supports timeouts of a millisecond or more and will 
convert everything below that to 0.

I added some traces to netty (4.1):
{code}
diff --git 
a/transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java
 
b/transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java
index 909088fde..8734bbfd4 100644
--- 
a/transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java
+++ 
b/transport-native-epoll/src/main/java/io/netty/channel/epoll/EpollEventLoop.java
@@ -208,10 +208,15 @@ final class EpollEventLoop extends SingleThreadEventLoop 

[jira] [Commented] (CASSANDRA-13647) cassandra-test: URI is not absolute

2017-06-29 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16068416#comment-16068416
 ] 

Corentin Chary commented on CASSANDRA-13647:


Note: using file:/// works
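The difference is reproducible with the JDK alone (the file paths below are illustrative): `URI.toURL()` only accepts absolute URIs, so the relative profile path fails while a `file:///` URI works.

```java
import java.net.URI;

// Minimal reproduction of the stress error: URI.toURL() throws
// IllegalArgumentException for a relative (non-absolute) URI.
public class UriRepro {
    public static void main(String[] args) throws Exception {
        // Works: absolute file URI, matching the "file:/// works" note.
        System.out.println(URI.create("file:///tmp/biggraphite.yaml").toURL());

        try {
            URI.create("./biggraphite.yaml").toURL(); // relative reference
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // the same "URI is not absolute"
        }
    }
}
```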

> cassandra-test: URI is not absolute
> ---
>
> Key: CASSANDRA-13647
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13647
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Corentin Chary
> Fix For: 4.x
>
>
> With current trunk (I just added the code to print the exception):
> {code}
> $ ./tools/bin/cassandra-stress user profile=./biggraphite.yaml n=10 
> 'ops(insert=1)' no-warmup cl=ONE
> java.lang.IllegalArgumentException: URI is not absolute
> at java.net.URI.toURL(URI.java:1088)
> at 
> org.apache.cassandra.stress.StressProfile.load(StressProfile.java:771)
> at 
> org.apache.cassandra.stress.settings.SettingsCommandUser.(SettingsCommandUser.java:76)
> at 
> org.apache.cassandra.stress.settings.SettingsCommandUser.build(SettingsCommandUser.java:190)
> at 
> org.apache.cassandra.stress.settings.SettingsCommand.get(SettingsCommand.java:220)
> at 
> org.apache.cassandra.stress.settings.StressSettings.get(StressSettings.java:192)
> at 
> org.apache.cassandra.stress.settings.StressSettings.parse(StressSettings.java:169)
> at org.apache.cassandra.stress.Stress.run(Stress.java:80)
> at org.apache.cassandra.stress.Stress.main(Stress.java:62)
> {code}
> I wasn't able to quickly find the change that caused that.
> cc: [~tjake]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-13647) cassandra-test: URI is not absolute

2017-06-29 Thread Corentin Chary (JIRA)
Corentin Chary created CASSANDRA-13647:
--

 Summary: cassandra-test: URI is not absolute
 Key: CASSANDRA-13647
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13647
 Project: Cassandra
  Issue Type: Bug
  Components: Tools
Reporter: Corentin Chary
 Fix For: 4.x


With current trunk (I just added the code to print the exception):

{code}
$ ./tools/bin/cassandra-stress user profile=./biggraphite.yaml n=10 
'ops(insert=1)' no-warmup cl=ONE
java.lang.IllegalArgumentException: URI is not absolute
at java.net.URI.toURL(URI.java:1088)
at 
org.apache.cassandra.stress.StressProfile.load(StressProfile.java:771)
at 
org.apache.cassandra.stress.settings.SettingsCommandUser.(SettingsCommandUser.java:76)
at 
org.apache.cassandra.stress.settings.SettingsCommandUser.build(SettingsCommandUser.java:190)
at 
org.apache.cassandra.stress.settings.SettingsCommand.get(SettingsCommand.java:220)
at 
org.apache.cassandra.stress.settings.StressSettings.get(StressSettings.java:192)
at 
org.apache.cassandra.stress.settings.StressSettings.parse(StressSettings.java:169)
at org.apache.cassandra.stress.Stress.run(Stress.java:80)
at org.apache.cassandra.stress.Stress.main(Stress.java:62)
{code}

I wasn't able to quickly find the change that caused that.

cc: [~tjake]






[jira] [Commented] (CASSANDRA-13444) Fast and garbage-free Streaming Histogram

2017-06-22 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16058932#comment-16058932
 ] 

Corentin Chary commented on CASSANDRA-13444:


Should we consider this for inclusion in 3.11?

> Fast and garbage-free Streaming Histogram
> -
>
> Key: CASSANDRA-13444
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13444
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Fuud
>Assignee: Fuud
> Fix For: 4.0
>
> Attachments: results.csv, results.xlsx
>
>
> StreamingHistogram is a cause of high CPU usage and GC pressure.
> It was improved in CASSANDRA-13038 by introducing an intermediate buffer to 
> accumulate different values into the big map before merging them into the 
> smaller one.
> But that was not enough for TTLs distributed over a large time range. 
> Rounding (also introduced in 13038) can help, but it reduces histogram 
> precision, especially when TTLs are not distributed uniformly.
> There are several improvements that can help to reduce CPU and GC usage. 
> They are all included in the pull request as separate revisions, so you can 
> test them independently.
> Improvements list:
> # Use Map.computeIfAbsent instead of the get->checkIfNull->put chain. This 
> way the "add-or-accumulate" operation takes one map operation instead of 
> two. But this method (default-defined in interface Map) is overridden in 
> HashMap but not in TreeMap, so I changed the spool type to HashMap.
> # As we round incoming values to _roundSeconds_, we can also round values on 
> merge. This increases the hit rate for bin operations.
> # Because we insert only integers into the histogram and round values to 
> integers, we can use the *int* type everywhere.
> # The histogram spends a huge amount of time merging values. In the merge 
> method, most of the time is spent finding the nearest points. This can be 
> eliminated by holding an additional TreeSet of differences, sorted from 
> smallest to greatest.
> # Because we know the maximum size of the _bin_ and _differences_ maps, we 
> can replace them with sorted arrays. Search can be done with 
> _Arrays.binarySearch_ and insertions/deletions with _System.arraycopy_. 
> This also allows merging some operations into one.
> # Because the spool map is also bounded, we can replace it with an 
> open-addressing primitive map. This finally reduces the allocation rate to 
> zero.
> You can see the gain from each step in the attached file. The first number 
> is the time for one benchmark invocation and the second is the allocation 
> rate in MB per operation.
> Depending on the payload, time is reduced by up to 90%.
> Overall gain:
> |.|.|Payload/SpoolSize|.|.|.|% from original
> |.|.|.|original|.|optimized|
> |.|.|secondInMonth/0|.|.|.|
> |time ms/op|.|.|10747,684|.|5545,063|51,6
> |allocation Mb/op|.|.|2441,38858|.|0,002105713|0
> |.|.|.|.|.|.|
> |.|.|secondInMonth/1000|.|.|.|
> |time ms/op|.|.|8988,578|.|5791,179|64,4
> |allocation Mb/op|.|.|2440,951141|.|0,017715454|0
> |.|.|.|.|.|.|
> |.|.|secondInMonth/1|.|.|.|
> |time ms/op|.|.|10711,671|.|5765,243|53,8
> |allocation Mb/op|.|.|2437,022537|.|0,264083862|0
> |.|.|.|.|.|.|
> |.|.|secondInMonth/10|.|.|.|
> |time ms/op|.|.|13001,841|.|5638,069|43,4
> |allocation Mb/op|.|.|2396,947113|.|2,003662109|0,1
> |.|.|.|.|.|.|
> |.|.|secondInDay/0|.|.|.|
> |time ms/op|.|.|10381,833|.|5497,804|53
> |allocation Mb/op|.|.|2441,166107|.|0,002105713|0
> |.|.|.|.|.|.|
> |.|.|secondInDay/1000|.|.|.|
> |time ms/op|.|.|8522,157|.|5929,871|69,6
> |allocation Mb/op|.|.|1973,112381|.|0,017715454|0
> |.|.|.|.|.|.|
> |.|.|secondInDay/1|.|.|.|
> |time ms/op|.|.|10234,978|.|5480,077|53,5
> |allocation Mb/op|.|.|2306,057404|.|0,262969971|0
> |.|.|.|.|.|.|
> |.|.|secondInDay/10|.|.|.|
> |time ms/op|.|.|2971,178|.|139,079|4,7
> |allocation Mb/op|.|.|172,1276245|.|2,001721191|1,2
> |.|.|.|.|.|.|
> |.|.|secondIn3Hour/0|.|.|.|
> |time ms/op|.|.|10663,123|.|5605,672|52,6
> |allocation Mb/op|.|.|2439,456818|.|0,002105713|0
> |.|.|.|.|.|.|
> |.|.|secondIn3Hour/1000|.|.|.|
> |time ms/op|.|.|9029,788|.|5838,618|64,7
> |allocation Mb/op|.|.|2331,839249|.|0,180664063|0
> |.|.|.|.|.|.|
> |.|.|secondIn3Hour/1|.|.|.|
> |time ms/op|.|.|4862,409|.|89,001|1,8
> |allocation Mb/op|.|.|965,4871887|.|0,251711652|0
> |.|.|.|.|.|.|
> |.|.|secondIn3Hour/10|.|.|.|
> |time ms/op|.|.|1484,454|.|95,044|6,4
> |allocation Mb/op|.|.|153,2464722|.|2,001712809|1,3
> |.|.|.|.|.|.|
> |.|.|secondInMin/0|.|.|.|
> |time ms/op|.|.|875,118|.|424,11|48,5
> |allocation Mb/op|.|.|610,3554993|.|0,001776123|0
> |.|.|.|.|.|.|
> |.|.|secondInMin/1000|.|.|.|
> |time ms/op|.|.|568,7|.|84,208|14,8
> |allocation Mb/op|.|.|0,007598114|.|0,01810023|238,2
> |.|.|.|.|.|.|
> |.|.|secondInMin/1|.|.|.|
> |time ms/op|.|.|573,595|.|83,862|14,6
> |allocation 
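Improvement #1 from the quoted list can be sketched in isolation (a hedged sketch: the key and value types here are illustrative stand-ins, not the actual StreamingHistogram internals): `Map.computeIfAbsent` turns the get -> null-check -> put chain into a single add-or-accumulate step.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of "add-or-accumulate" with one map operation instead of two.
public class SpoolAccumulate {
    public static void main(String[] args) {
        Map<Integer, long[]> spool = new HashMap<>();
        int[] roundedTtls = {60, 60, 120, 60}; // TTLs already rounded to bins

        for (int ttl : roundedTtls) {
            // One lookup per sample: create the counter if absent, then bump it.
            spool.computeIfAbsent(ttl, k -> new long[1])[0]++;
        }

        System.out.println(spool.get(60)[0]);  // 3
        System.out.println(spool.get(120)[0]); // 1
    }
}
```

A `long[1]` holder avoids re-boxing a `Long` value on every increment, which is in the same spirit as the allocation-rate reductions described above.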

[jira] [Comment Edited] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-06-21 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057392#comment-16057392
 ] 

Corentin Chary edited comment on CASSANDRA-13418 at 6/21/17 12:21 PM:
--

Latest version of the patch works as it should: 
https://github.com/criteo-forks/cassandra/commit/da4a5c17448dab64aeb4295bb7401afbea9edf51

!twcs-cleanup.png!


was (Author: iksaif):
Latest version of the patch works as it should: 
https://github.com/criteo-forks/cassandra/commit/da4a5c17448dab64aeb4295bb7401afbea9edf51

!twcs-cleanup.png|thumbnail!

> Allow TWCS to ignore overlaps when dropping fully expired sstables
> --
>
> Key: CASSANDRA-13418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Corentin Chary
>  Labels: twcs
> Attachments: twcs-cleanup.png
>
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If 
> you really want read-repairs you're going to have sstables blocking the 
> expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a 
> very low value and that will purge the blockers of old data that should 
> already have expired, thus removing the overlaps and allowing the other 
> SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have 
> time series, you might not care if all your data doesn't exactly expire at 
> the right time, or if data re-appears for some time, as long as it gets 
> deleted as soon as it can. And in this situation I believe it would be really 
> beneficial to allow users to simply ignore overlapping SSTables when looking 
> for fully expired ones.
> To the question: why would you need read-repairs?
> - Full repairs basically take longer than the TTL of the data on my dataset, 
> so this isn't really effective.
> - Even with a 10% chance of doing a repair, we found out that this would be 
> enough to greatly reduce the entropy of the most used data (and if you have 
> timeseries, you're likely to have a dashboard doing the same important 
> queries over and over again).
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on 
> our system and report the effects.
> cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.






[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-06-21 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057392#comment-16057392
 ] 

Corentin Chary commented on CASSANDRA-13418:


Latest version of the patch works as it should: 
https://github.com/criteo-forks/cassandra/commit/da4a5c17448dab64aeb4295bb7401afbea9edf51

!twcs-cleanup.png|thumbnail!

> Allow TWCS to ignore overlaps when dropping fully expired sstables
> --
>
> Key: CASSANDRA-13418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Corentin Chary
>  Labels: twcs
> Attachments: twcs-cleanup.png
>
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If 
> you really want read-repairs you're going to have sstables blocking the 
> expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a 
> very low value and that will purge the blockers of old data that should 
> already have expired, thus removing the overlaps and allowing the other 
> SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have 
> time series, you might not care if all your data doesn't exactly expire at 
> the right time, or if data re-appears for some time, as long as it gets 
> deleted as soon as it can. And in this situation I believe it would be really 
> beneficial to allow users to simply ignore overlapping SSTables when looking 
> for fully expired ones.
> To the question: why would you need read-repairs?
> - Full repairs basically take longer than the TTL of the data on my dataset, 
> so this isn't really effective.
> - Even with a 10% chance of doing a repair, we found out that this would be 
> enough to greatly reduce the entropy of the most used data (and if you have 
> timeseries, you're likely to have a dashboard doing the same important 
> queries over and over again).
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on 
> our system and report the effects.
> cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.






[jira] [Updated] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-06-21 Thread Corentin Chary (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corentin Chary updated CASSANDRA-13418:
---
Attachment: twcs-cleanup.png

> Allow TWCS to ignore overlaps when dropping fully expired sstables
> --
>
> Key: CASSANDRA-13418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Corentin Chary
>  Labels: twcs
> Attachments: twcs-cleanup.png
>
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If 
> you really want read-repairs you're going to have sstables blocking the 
> expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a 
> very low value and that will purge the blockers of old data that should 
> already have expired, thus removing the overlaps and allowing the other 
> SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have 
> time series, you might not care if all your data doesn't exactly expire at 
> the right time, or if data re-appears for some time, as long as it gets 
> deleted as soon as it can. And in this situation I believe it would be really 
> beneficial to allow users to simply ignore overlapping SSTables when looking 
> for fully expired ones.
> To the question: why would you need read-repairs?
> - Full repairs basically take longer than the TTL of the data on my dataset, 
> so this isn't really effective.
> - Even with a 10% chance of doing a repair, we found out that this would be 
> enough to greatly reduce entropy of the most used data (and if you have 
> timeseries, you're likely to have a dashboard doing the same important 
> queries over and over again).
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on 
> our system and report the effects.
> cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.
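The workaround described above maps to per-table compaction subproperties (`unchecked_tombstone_compaction` and `tombstone_threshold` are the standard options named in the description); a hedged sketch, where the keyspace/table name and the threshold value are illustrative only:

```sql
-- Illustrative form of the workaround from the description: aggressively run
-- single-sstable tombstone compactions so the blockers of already-expired
-- data get purged. Table name and threshold value are examples only.
ALTER TABLE metrics.datapoints WITH compaction = {
  'class': 'TimeWindowCompactionStrategy',
  'unchecked_tombstone_compaction': 'true',
  'tombstone_threshold': '0.05'
};
```

As the description notes, this trades CPU (extra compactions) for earlier reclamation of expired data.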






[jira] [Commented] (CASSANDRA-13432) MemtableReclaimMemory can get stuck because of lack of timeout in getTopLevelColumns()

2017-06-09 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16044493#comment-16044493
 ] 

Corentin Chary commented on CASSANDRA-13432:


We have a case internally where upgrading to 3.0 or changing the data model 
won't happen, and we know that we *need* this patch for another year. We're 
currently keeping a forked version, so that's not so much of an issue.

I don't believe this patch really changes the behavior: it simply aborts 
earlier what would have been aborted later anyway (where "later" may currently 
mean minutes to hours).

> MemtableReclaimMemory can get stuck because of lack of timeout in 
> getTopLevelColumns()
> --
>
> Key: CASSANDRA-13432
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13432
> Project: Cassandra
>  Issue Type: Bug
> Environment: cassandra 2.1.15
>Reporter: Corentin Chary
> Fix For: 2.1.x
>
>
> This might affect 3.x too, I'm not sure.
> {code}
> $ nodetool tpstats
> Pool NameActive   Pending  Completed   Blocked  All 
> time blocked
> MutationStage 0 0   32135875 0
>  0
> ReadStage   114 0   29492940 0
>  0
> RequestResponseStage  0 0   86090931 0
>  0
> ReadRepairStage   0 0 166645 0
>  0
> CounterMutationStage  0 0  0 0
>  0
> MiscStage 0 0  0 0
>  0
> HintedHandoff 0 0 47 0
>  0
> GossipStage   0 0 188769 0
>  0
> CacheCleanupExecutor  0 0  0 0
>  0
> InternalResponseStage 0 0  0 0
>  0
> CommitLogArchiver 0 0  0 0
>  0
> CompactionExecutor0 0  86835 0
>  0
> ValidationExecutor0 0  0 0
>  0
> MigrationStage0 0  0 0
>  0
> AntiEntropyStage  0 0  0 0
>  0
> PendingRangeCalculator0 0 92 0
>  0
> Sampler   0 0  0 0
>  0
> MemtableFlushWriter   0 0563 0
>  0
> MemtablePostFlush 0 0   1500 0
>  0
> MemtableReclaimMemory 129534 0
>  0
> Native-Transport-Requests41 0   54819182 0
>   1896
> {code}
> {code}
> "MemtableReclaimMemory:195" - Thread t@6268
>java.lang.Thread.State: WAITING
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>   at 
> org.apache.cassandra.utils.concurrent.WaitQueue$AbstractSignal.awaitUninterruptibly(WaitQueue.java:283)
>   at 
> org.apache.cassandra.utils.concurrent.OpOrder$Barrier.await(OpOrder.java:417)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore$Flush$1.runMayThrow(ColumnFamilyStore.java:1151)
>   at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - locked <6e7b1160> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> "SharedPool-Worker-195" - Thread t@989
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.cassandra.db.RangeTombstoneList.addInternal(RangeTombstoneList.java:690)
>   at 
> org.apache.cassandra.db.RangeTombstoneList.insertFrom(RangeTombstoneList.java:650)
>   at 
> org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:171)
>   at 
> org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:143)
>   at 

[jira] [Comment Edited] (CASSANDRA-10765) add RangeIterator interface and QueryPlan for SI

2017-06-09 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042791#comment-16042791
 ] 

Corentin Chary edited comment on CASSANDRA-10765 at 6/9/17 12:20 PM:
-

Note: 
https://github.com/iksaif/cassandra/commit/edbc0a0572b47ef5d5f25d56bd43587eb136170a
 was an attempt at improving that, which works very well in cases where 
multiple indexes are queried and some of them intersect but not all of them do.

Before/After: !server-load.png|thumbnail!


was (Author: iksaif):
Note: 
https://github.com/iksaif/cassandra/commit/edbc0a0572b47ef5d5f25d56bd43587eb136170a
 was an attempt at improving that, which works very well in cases where 
multiple indexes are queried and some of them intersect but not all of them do.

Before/After: !server-load.jpg|thumbnail!

> add RangeIterator interface and QueryPlan for SI
> 
>
> Key: CASSANDRA-10765
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10765
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local Write-Read Paths
>Reporter: Pavel Yaskevich
>Assignee: Pavel Yaskevich
> Fix For: 4.x
>
> Attachments: server-load.png
>
>
> Currently built-in indexes have only one way of handling 
> intersections/unions: pick the highest selectivity predicate and filter on 
> other index expressions. This is not always the most efficient approach. 
> Dynamic query planning based on the different index characteristics would be 
> more optimal. The Query Plan should be able to choose how to do intersections and 
> unions based on the metadata provided by indexes (returned by RangeIterator); 
> RangeIterator would become the basis for cross-index interactions and should 
> carry information such as min/max token, estimated number of wrapped tokens, etc.
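As a rough illustration of the cross-index intersection the description argues for, here is a self-contained sketch: the smallest (most selective) token stream drives the intersection, and a binary search stands in for what a RangeIterator-style skipTo() would do on the other iterators. Class and method names are illustrative, not the actual Cassandra API.

```java
import java.util.*;

// Sketch: intersect several sorted, deduplicated token streams by letting
// the smallest one drive lookups into the others. Names are illustrative.
public class IntersectSketch {
    static List<Long> intersect(List<long[]> streams) {
        // Start from the smallest stream (highest selectivity).
        streams.sort(Comparator.comparingInt((long[] a) -> a.length));
        List<Long> result = new ArrayList<>();
        outer:
        for (long token : streams.get(0)) {
            for (int i = 1; i < streams.size(); i++) {
                // binarySearch stands in for RangeIterator.skipTo(token)
                if (Arrays.binarySearch(streams.get(i), token) < 0) continue outer;
            }
            result.add(token);
        }
        return result;
    }

    public static void main(String[] args) {
        List<long[]> streams = new ArrayList<>(List.of(
                new long[]{2, 4, 6},              // most selective index
                new long[]{1, 2, 3, 4, 5, 6, 7},
                new long[]{2, 3, 4, 8}));
        System.out.println(intersect(streams)); // [2, 4]
    }
}
```

The metadata mentioned in the description (min/max token, estimated token counts) is what would let a query plan pick the driving stream up front instead of always filtering on one fixed predicate.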






[jira] [Comment Edited] (CASSANDRA-10765) add RangeIterator interface and QueryPlan for SI

2017-06-09 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042791#comment-16042791
 ] 

Corentin Chary edited comment on CASSANDRA-10765 at 6/9/17 12:20 PM:
-

Note: 
https://github.com/iksaif/cassandra/commit/edbc0a0572b47ef5d5f25d56bd43587eb136170a
 was an attempt at improving that, which works very well in cases where 
multiple indexes are queried and some of them intersect but not all of them do.

Before/After:
 !server-load.png!


was (Author: iksaif):
Note: 
https://github.com/iksaif/cassandra/commit/edbc0a0572b47ef5d5f25d56bd43587eb136170a
 was an attempt at improving that, which works very well in cases where 
multiple indexes are queried and some of them intersect but not all of them do.

Before/After: !server-load.png|thumbnail!

> add RangeIterator interface and QueryPlan for SI
> 
>
> Key: CASSANDRA-10765
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10765
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local Write-Read Paths
>Reporter: Pavel Yaskevich
>Assignee: Pavel Yaskevich
> Fix For: 4.x
>
> Attachments: server-load.png
>
>
> Currently built-in indexes have only one way of handling 
> intersections/unions: pick the highest selectivity predicate and filter on 
> other index expressions. This is not always the most efficient approach. 
> Dynamic query planning based on the different index characteristics would be 
> more optimal. The Query Plan should be able to choose how to do intersections and 
> unions based on the metadata provided by indexes (returned by RangeIterator); 
> RangeIterator would become the basis for cross-index interactions and should 
> carry information such as min/max token, estimated number of wrapped tokens, etc.






[jira] [Updated] (CASSANDRA-10765) add RangeIterator interface and QueryPlan for SI

2017-06-09 Thread Corentin Chary (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corentin Chary updated CASSANDRA-10765:
---
Attachment: server-load.png

> add RangeIterator interface and QueryPlan for SI
> 
>
> Key: CASSANDRA-10765
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10765
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local Write-Read Paths
>Reporter: Pavel Yaskevich
>Assignee: Pavel Yaskevich
> Fix For: 4.x
>
> Attachments: server-load.png
>
>
> Currently built-in indexes have only one way of handling 
> intersections/unions: pick the highest selectivity predicate and filter on 
> other index expressions. This is not always the most efficient approach. 
> Dynamic query planning based on the different index characteristics would be 
> more optimal. The Query Plan should be able to choose how to do intersections and 
> unions based on the metadata provided by indexes (returned by RangeIterator); 
> RangeIterator would become the basis for cross-index interactions and should 
> carry information such as min/max token, estimated number of wrapped tokens, etc.






[jira] [Comment Edited] (CASSANDRA-10765) add RangeIterator interface and QueryPlan for SI

2017-06-09 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042791#comment-16042791
 ] 

Corentin Chary edited comment on CASSANDRA-10765 at 6/9/17 12:20 PM:
-

Note: 
https://github.com/iksaif/cassandra/commit/edbc0a0572b47ef5d5f25d56bd43587eb136170a
 was an attempt at improving that, which works very well in cases where 
multiple indexes are queried and some of them intersect but not all of them do.

Before/After: !server-load.jpg|thumbnail!


was (Author: iksaif):
Note: 
https://github.com/iksaif/cassandra/commit/edbc0a0572b47ef5d5f25d56bd43587eb136170a
 was an attempt at improving that, which works very well in cases where 
multiple indexes are queried and some of them intersect but not all of them do.


> add RangeIterator interface and QueryPlan for SI
> 
>
> Key: CASSANDRA-10765
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10765
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local Write-Read Paths
>Reporter: Pavel Yaskevich
>Assignee: Pavel Yaskevich
> Fix For: 4.x
>
> Attachments: server-load.png
>
>
> Currently built-in indexes have only one way of handling 
> intersections/unions: pick the highest selectivity predicate and filter on 
> other index expressions. This is not always the most efficient approach. 
> Dynamic query planning based on the different index characteristics would be 
> more optimal. The Query Plan should be able to choose how to do intersections and 
> unions based on the metadata provided by indexes (returned by RangeIterator); 
> RangeIterator would become the basis for cross-index interactions and should 
> carry information such as min/max token, estimated number of wrapped tokens, etc.






[jira] [Commented] (CASSANDRA-10765) add RangeIterator interface and QueryPlan for SI

2017-06-08 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042791#comment-16042791
 ] 

Corentin Chary commented on CASSANDRA-10765:


Note: 
https://github.com/iksaif/cassandra/commit/edbc0a0572b47ef5d5f25d56bd43587eb136170a
 was an attempt at improving that, which works very well in cases where 
multiple indexes are queried and some of them intersect but not all of them do.


> add RangeIterator interface and QueryPlan for SI
> 
>
> Key: CASSANDRA-10765
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10765
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Local Write-Read Paths
>Reporter: Pavel Yaskevich
>Assignee: Pavel Yaskevich
> Fix For: 4.x
>
>
> Currently built-in indexes have only one way of handling 
> intersections/unions: pick the highest selectivity predicate and filter on 
> other index expressions. This is not always the most efficient approach. 
> Dynamic query planning based on the different index characteristics would be 
> more optimal. The Query Plan should be able to choose how to do intersections and 
> unions based on the metadata provided by indexes (returned by RangeIterator); 
> RangeIterator would become the basis for cross-index interactions and should 
> carry information such as min/max token, estimated number of wrapped tokens, etc.






[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-04-28 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988490#comment-15988490
 ] 

Corentin Chary commented on CASSANDRA-13418:


I agree that fixing CASSANDRA-13418 would be a better solution, but it is 
likely to take more time, and I'm unsure we will get to the point where it 
really solves our issues in all cases.

I would be inclined to add the option too, with appropriate documentation.

> Allow TWCS to ignore overlaps when dropping fully expired sstables
> --
>
> Key: CASSANDRA-13418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Corentin Chary
>  Labels: twcs
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If 
> you really want read-repairs you're going to have sstables blocking the 
> expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a 
> very low value and that will purge the blockers of old data that should 
> already have expired, thus removing the overlaps and allowing the other 
> SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have 
> time series, you might not care if all your data doesn't exactly expire at 
> the right time, or if data re-appears for some time, as long as it gets 
> deleted as soon as it can. And in this situation I believe it would be really 
> beneficial to allow users to simply ignore overlapping SSTables when looking 
> for fully expired ones.
> To the question: why would you need read-repairs?
> - Full repairs basically take longer than the TTL of the data on my dataset, 
> so this isn't really effective.
> - Even with a 10% chance of doing a repair, we found out that this would be 
> enough to greatly reduce entropy of the most used data (and if you have 
> timeseries, you're likely to have a dashboard doing the same important 
> queries over and over again).
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on 
> our system and report the effects.
> cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.






[jira] [Commented] (CASSANDRA-10496) Make DTCS/TWCS split partitions based on time during compaction

2017-04-28 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988485#comment-15988485
 ] 

Corentin Chary commented on CASSANDRA-10496:


Inspecting each timestamp on each cell is surely more correct, but in the first 
version I'll be looking only at the minTimestamp of the partition (as long as 
you have short-lived partitions).

With the current writer mechanism I didn't find a way to switch the writer in 
the middle of a partition anyway.

> Make DTCS/TWCS split partitions based on time during compaction
> ---
>
> Key: CASSANDRA-10496
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10496
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>  Labels: dtcs
> Fix For: 3.11.x
>
>
> To avoid getting old data in new time windows with DTCS (or related, like 
> [TWCS|CASSANDRA-9666]), we need to split out old data into its own sstable 
> during compaction.
> My initial idea is to just create two sstables, when we create the compaction 
> task we state the start and end times for the window, and any data older than 
> the window will be put in its own sstable.
> By creating a single sstable with old data, we will incrementally get the 
> windows correct - say we have an sstable with these timestamps:
> {{[100, 99, 98, 97, 75, 50, 10]}}
> and we are compacting in window {{[100, 80]}} - we would create two sstables:
> {{[100, 99, 98, 97]}}, {{[75, 50, 10]}}, and the first window is now 
> 'correct'. The next compaction would compact in window {{[80, 60]}} and 
> create sstables {{[75]}}, {{[50, 10]}} etc.
> We will probably also want to base the windows on the newest data in the 
> sstables so that we actually have older data than the window.
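The worked example above can be sketched as a simple split by the window's lower bound. This is a toy model of the proposal only; `split()` is an illustrative helper, not a Cassandra API.

```java
import java.util.*;

// Sketch of the splitting proposed above: compacting in window [100, 80],
// timestamps at or above the lower bound stay in the window's sstable and
// older data goes to its own sstable.
public class WindowSplitSketch {
    static List<List<Long>> split(long[] timestamps, long windowLower) {
        List<Long> inWindow = new ArrayList<>();
        List<Long> older = new ArrayList<>();
        for (long ts : timestamps) {
            (ts >= windowLower ? inWindow : older).add(ts);
        }
        return List.of(inWindow, older);
    }

    public static void main(String[] args) {
        long[] ts = {100, 99, 98, 97, 75, 50, 10};
        // compacting in window [100, 80] produces the two sstables
        // described above
        System.out.println(split(ts, 80)); // [[100, 99, 98, 97], [75, 50, 10]]
    }
}
```

Repeating this split on each successive compaction window is what incrementally makes the windows "correct", as described.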






[jira] [Commented] (CASSANDRA-10496) Make DTCS/TWCS split partitions based on time during compaction

2017-04-28 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15988455#comment-15988455
 ] 

Corentin Chary commented on CASSANDRA-10496:


I wanted to give it a shot for TWCS because of CASSANDRA-13418. I was thinking 
about using a custom CompactionAwareWriter to segregate data by timestamp in 
the first window (and also make --split-output work). Currently I'm planning to 
use partition.stats().minTimestamp, but I'm not sure how it is affected by 
read-repairs. It may be a better idea to group data by deletion time instead.
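A toy model of the writer-routing idea discussed here: bucket each partition by the time window containing its minTimestamp, with one sstable writer per bucket. The window size and bucket key are illustrative; the real CompactionAwareWriter machinery is not modeled.

```java
import java.util.*;

// Toy model: route whole partitions to per-window writers by minTimestamp.
public class WriterSwitchSketch {
    static final long WINDOW_MS = 3_600_000L; // 1-hour windows (example value)

    // Bucket key = start of the time window containing minTimestamp.
    static long bucketFor(long minTimestampMs) {
        return (minTimestampMs / WINDOW_MS) * WINDOW_MS;
    }

    public static void main(String[] args) {
        long[] partitionMinTimestamps = {100L, 3_600_100L, 3_600_200L, 7_200_050L};
        Map<Long, List<Long>> buckets = new TreeMap<>();
        for (long ts : partitionMinTimestamps) {
            buckets.computeIfAbsent(bucketFor(ts), k -> new ArrayList<>()).add(ts);
        }
        // Each bucket would get its own sstable writer.
        System.out.println(buckets);
        // {0=[100], 3600000=[3600100, 3600200], 7200000=[7200050]}
    }
}
```

Grouping by deletion time instead, as suggested, would only change what `bucketFor` is computed from.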

> Make DTCS/TWCS split partitions based on time during compaction
> ---
>
> Key: CASSANDRA-10496
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10496
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>  Labels: dtcs
> Fix For: 3.11.x
>
>
> To avoid getting old data in new time windows with DTCS (or related, like 
> [TWCS|CASSANDRA-9666]), we need to split out old data into its own sstable 
> during compaction.
> My initial idea is to just create two sstables, when we create the compaction 
> task we state the start and end times for the window, and any data older than 
> the window will be put in its own sstable.
> By creating a single sstable with old data, we will incrementally get the 
> windows correct - say we have an sstable with these timestamps:
> {{[100, 99, 98, 97, 75, 50, 10]}}
> and we are compacting in window {{[100, 80]}} - we would create two sstables:
> {{[100, 99, 98, 97]}}, {{[75, 50, 10]}}, and the first window is now 
> 'correct'. The next compaction would compact in window {{[80, 60]}} and 
> create sstables {{[75]}}, {{[50, 10]}} etc.
> We will probably also want to base the windows on the newest data in the 
> sstables so that we actually have older data than the window.






[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-04-26 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15984600#comment-15984600
 ] 

Corentin Chary commented on CASSANDRA-13418:


[~krummas]: good point, CASSANDRA-10496 seems to come with its own set of 
issues: the number of sstables would probably get huge, unless you add some 
kind of "buffering" like what is done for the first window.
I'll see if I can find a reasonable solution for TWCS or propose it in the 
related ticket. If we can't agree on a good solution, we can fall back to what 
is proposed here.

[~adejanovski]: about skipping getOverlappingSSTables() completely, I thought 
about that too, but I think it's used in some other places and I wasn't sure 
what the result would be.

> Allow TWCS to ignore overlaps when dropping fully expired sstables
> --
>
> Key: CASSANDRA-13418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Corentin Chary
>  Labels: twcs
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If 
> you really want read-repairs you're going to have sstables blocking the 
> expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a 
> very low value and that will purge the blockers of old data that should 
> already have expired, thus removing the overlaps and allowing the other 
> SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have 
> time series, you might not care if all your data doesn't exactly expire at 
> the right time, or if data re-appears for some time, as long as it gets 
> deleted as soon as it can. And in this situation I believe it would be really 
> beneficial to allow users to simply ignore overlapping SSTables when looking 
> for fully expired ones.
> To the question: why would you need read-repairs?
> - Full repairs basically take longer than the TTL of the data on my dataset, 
> so this isn't really effective.
> - Even with a 10% chance of doing a repair, we found out that this would be 
> enough to greatly reduce entropy of the most used data (and if you have 
> timeseries, you're likely to have a dashboard doing the same important 
> queries over and over again).
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on 
> our system and report the effects.
> cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.





[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-04-26 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15984354#comment-15984354
 ] 

Corentin Chary commented on CASSANDRA-13418:


Trying to go forward, [~jjirsa], [~adejanovski], [~krummas]: what is your 
opinion on adding a custom option to TWCS and DTCS that basically does what my 
current patch does?
The only drawback I see, if a fully expired overlapping table is removed, is 
that read-repaired data that was explicitly deleted could eventually 
re-appear. If you're aware of more dangerous situations I'd be glad to hear 
about them.

> Allow TWCS to ignore overlaps when dropping fully expired sstables
> --
>
> Key: CASSANDRA-13418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Corentin Chary
>  Labels: twcs
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If 
> you really want read-repairs you're going to have sstables blocking the 
> expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a 
> very low value and that will purge the blockers of old data that should 
> already have expired, thus removing the overlaps and allowing the other 
> SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have 
> time series, you might not care if all your data doesn't exactly expire at 
> the right time, or if data re-appears for some time, as long as it gets 
> deleted as soon as it can. And in this situation I believe it would be really 
> beneficial to allow users to simply ignore overlapping SSTables when looking 
> for fully expired ones.
> To the question: why would you need read-repairs?
> - Full repairs basically take longer than the TTL of the data on my dataset, 
> so this isn't really effective.
> - Even with a 10% chance of doing a repair, we found out that this would be 
> enough to greatly reduce entropy of the most used data (and if you have 
> timeseries, you're likely to have a dashboard doing the same important 
> queries over and over again).
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on 
> our system and report the effects.
> cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.





[jira] [Updated] (CASSANDRA-13432) MemtableReclaimMemory can get stuck because of lack of timeout in getTopLevelColumns()

2017-04-21 Thread Corentin Chary (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corentin Chary updated CASSANDRA-13432:
---
Attachment: (was: CASSANDRA-13432.patch)

> MemtableReclaimMemory can get stuck because of lack of timeout in 
> getTopLevelColumns()
> --
>
> Key: CASSANDRA-13432
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13432
> Project: Cassandra
>  Issue Type: Bug
> Environment: cassandra 2.1.15
>Reporter: Corentin Chary
> Fix For: 2.1.x
>
>
> This might affect 3.x too, I'm not sure.
> {code}
> $ nodetool tpstats
> Pool NameActive   Pending  Completed   Blocked  All 
> time blocked
> MutationStage 0 0   32135875 0
>  0
> ReadStage   114 0   29492940 0
>  0
> RequestResponseStage  0 0   86090931 0
>  0
> ReadRepairStage   0 0 166645 0
>  0
> CounterMutationStage  0 0  0 0
>  0
> MiscStage 0 0  0 0
>  0
> HintedHandoff 0 0 47 0
>  0
> GossipStage   0 0 188769 0
>  0
> CacheCleanupExecutor  0 0  0 0
>  0
> InternalResponseStage 0 0  0 0
>  0
> CommitLogArchiver 0 0  0 0
>  0
> CompactionExecutor0 0  86835 0
>  0
> ValidationExecutor0 0  0 0
>  0
> MigrationStage0 0  0 0
>  0
> AntiEntropyStage  0 0  0 0
>  0
> PendingRangeCalculator0 0 92 0
>  0
> Sampler   0 0  0 0
>  0
> MemtableFlushWriter   0 0563 0
>  0
> MemtablePostFlush 0 0   1500 0
>  0
> MemtableReclaimMemory 129534 0
>  0
> Native-Transport-Requests41 0   54819182 0
>   1896
> {code}
> {code}
> "MemtableReclaimMemory:195" - Thread t@6268
>java.lang.Thread.State: WAITING
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>   at 
> org.apache.cassandra.utils.concurrent.WaitQueue$AbstractSignal.awaitUninterruptibly(WaitQueue.java:283)
>   at 
> org.apache.cassandra.utils.concurrent.OpOrder$Barrier.await(OpOrder.java:417)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore$Flush$1.runMayThrow(ColumnFamilyStore.java:1151)
>   at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - locked <6e7b1160> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> "SharedPool-Worker-195" - Thread t@989
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.cassandra.db.RangeTombstoneList.addInternal(RangeTombstoneList.java:690)
>   at 
> org.apache.cassandra.db.RangeTombstoneList.insertFrom(RangeTombstoneList.java:650)
>   at 
> org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:171)
>   at 
> org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:143)
>   at org.apache.cassandra.db.DeletionInfo.add(DeletionInfo.java:240)
>   at 
> org.apache.cassandra.db.ArrayBackedSortedColumns.delete(ArrayBackedSortedColumns.java:483)
>   at org.apache.cassandra.db.ColumnFamily.addAtom(ColumnFamily.java:153)
>   at 
> org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:184)
>   at 
> 

[jira] [Updated] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-04-19 Thread Corentin Chary (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corentin Chary updated CASSANDRA-13418:
---

Agreed for the option. Would be easy to implement it using a new one.
IMHO it's more dangerous to have nothing, as this would degrade write
performance and take up to twice the space originally planned. Compared
to that, it isn't really an issue to have re-appearing data after an
explicit deletion (I think that's the worst that can happen, but I could be wrong).




> Allow TWCS to ignore overlaps when dropping fully expired sstables
> --
>
> Key: CASSANDRA-13418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Corentin Chary
>  Labels: twcs
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If 
> you really want read-repairs you're going to have sstables blocking the 
> expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a 
> very low value and that will purge the blockers of old data that should 
> already have expired, thus removing the overlaps and allowing the other 
> SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have 
> time series, you might not care if all your data doesn't exactly expire at 
> the right time, or if data re-appears for some time, as long as it gets 
> deleted as soon as it can. And in this situation I believe it would be really 
> beneficial to allow users to simply ignore overlapping SSTables when looking 
> for fully expired ones.
> To the question: why would you need read-repairs?
> - Full repairs basically take longer than the TTL of the data on my dataset, 
> so this isn't really effective.
> - Even with a 10% chances of doing a repair, we found out that this would be 
> enough to greatly reduce entropy of the most used data (and if you have 
> timeseries, you're likely to have a dashboard doing the same important 
> queries over and over again).
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on 
> our system and report the effects.
> cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.
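
The drop condition under discussion can be sketched out as follows. This is a hypothetical standalone illustration, not the actual Cassandra compaction code: all names here ({{SSTable}}, {{droppable}}, {{ignoreOverlaps}}) are made up for the example. A fully TTL-expired sstable is normally droppable only when no still-live sstable's timestamp window overlaps it; the proposed option would skip that overlap check.

```java
import java.util.List;

// Hypothetical sketch (not Cassandra code) of the overlap check discussed
// above: a fully TTL-expired sstable is droppable only if no still-live
// sstable's timestamp window overlaps it, unless overlaps are ignored.
public class ExpirationSketch {
    record SSTable(long minTimestamp, long maxTimestamp, long maxLocalDeletionTime) {}

    static boolean fullyExpired(SSTable s, long nowInSeconds) {
        // Every cell's local deletion time has passed: nothing live remains.
        return s.maxLocalDeletionTime() < nowInSeconds;
    }

    static boolean overlaps(SSTable a, SSTable b) {
        return a.minTimestamp() <= b.maxTimestamp() && b.minTimestamp() <= a.maxTimestamp();
    }

    static boolean droppable(SSTable candidate, List<SSTable> others, long now, boolean ignoreOverlaps) {
        if (!fullyExpired(candidate, now))
            return false;
        if (ignoreOverlaps)
            return true; // the relaxed behaviour this ticket proposes to allow
        // Strict rule: any live overlapping sstable blocks the drop.
        return others.stream().noneMatch(o -> !fullyExpired(o, now) && overlaps(candidate, o));
    }

    public static void main(String[] args) {
        long now = 1000;
        SSTable expired = new SSTable(0, 100, 900);        // all data TTL'd out
        SSTable liveOverlap = new SSTable(50, 500, 2000);  // still live, overlapping window
        if (droppable(expired, List.of(liveOverlap), now, false))
            throw new AssertionError("strict check should block the drop");
        if (!droppable(expired, List.of(liveOverlap), now, true))
            throw new AssertionError("ignoring overlaps should allow the drop");
        System.out.println("ok");
    }
}
```

Under the strict check, the expired sstable stays on disk as long as any live overlapping sstable exists, which is exactly the blocking behaviour the ticket wants to make optional.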



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-04-18 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15974024#comment-15974024
 ] 

Corentin Chary commented on CASSANDRA-13418:


[~rgerard]: No, it should certainly not be the default. If you look at the 
description of our use case, it's only necessary when you have short-lived data 
with a lot of cells, which makes running periodic repairs impossible or very 
impractical, and when you also need/want read-repairs because you can't afford 
QUORUM reads (datacenters on separate continents and low latency requirements). 
So there is a need for it, but it should not be the default.

[~jjirsa], any opinion ?



[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-04-18 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972898#comment-15972898
 ] 

Corentin Chary commented on CASSANDRA-13418:


Here is an attempt at a patch: 
https://github.com/iksaif/cassandra/tree/CASSANDRA-13005-trunk

Works with:
{code}
ALTER TABLE test.test WITH compaction = {'class': 
'TimeWindowCompactionStrategy', 'provide_overlapping_tombstones': 
'ignore_overlaps'};
{code}

This outputs:
{code}
WARN  [CompactionExecutor:4] 2017-04-18 17:17:00,538 
CompactionController.java:96 - You are running with overlapping sstable sanity 
checks for tombstones disabled on test:test,this can lead to inconsistencies 
when running explicit deletions.
{code}

I'm still not sure about reusing the existing option; I could be convinced 
otherwise (but it should not be hard to change).

Once we agree on that I can add documentation and unit tests.




[jira] [Commented] (CASSANDRA-13432) MemtableReclaimMemory can get stuck because of lack of timeout in getTopLevelColumns()

2017-04-18 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15972825#comment-15972825
 ] 

Corentin Chary commented on CASSANDRA-13432:


Latest patch https://github.com/iksaif/cassandra/commits/CASSANDRA-13432-2.x

> MemtableReclaimMemory can get stuck because of lack of timeout in 
> getTopLevelColumns()
> --
>
> Key: CASSANDRA-13432
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13432
> Project: Cassandra
>  Issue Type: Bug
> Environment: cassandra 2.1.15
>Reporter: Corentin Chary
> Fix For: 2.1.x
>
> Attachments: CASSANDRA-13432.patch
>
>
> This might affect 3.x too, I'm not sure.
> {code}
> $ nodetool tpstats
> Pool NameActive   Pending  Completed   Blocked  All 
> time blocked
> MutationStage 0 0   32135875 0
>  0
> ReadStage   114 0   29492940 0
>  0
> RequestResponseStage  0 0   86090931 0
>  0
> ReadRepairStage   0 0 166645 0
>  0
> CounterMutationStage  0 0  0 0
>  0
> MiscStage 0 0  0 0
>  0
> HintedHandoff 0 0 47 0
>  0
> GossipStage   0 0 188769 0
>  0
> CacheCleanupExecutor  0 0  0 0
>  0
> InternalResponseStage 0 0  0 0
>  0
> CommitLogArchiver 0 0  0 0
>  0
> CompactionExecutor0 0  86835 0
>  0
> ValidationExecutor0 0  0 0
>  0
> MigrationStage0 0  0 0
>  0
> AntiEntropyStage  0 0  0 0
>  0
> PendingRangeCalculator0 0 92 0
>  0
> Sampler   0 0  0 0
>  0
> MemtableFlushWriter   0 0563 0
>  0
> MemtablePostFlush 0 0   1500 0
>  0
> MemtableReclaimMemory 129534 0
>  0
> Native-Transport-Requests41 0   54819182 0
>   1896
> {code}
> {code}
> "MemtableReclaimMemory:195" - Thread t@6268
>java.lang.Thread.State: WAITING
>   at sun.misc.Unsafe.park(Native Method)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
>   at 
> org.apache.cassandra.utils.concurrent.WaitQueue$AbstractSignal.awaitUninterruptibly(WaitQueue.java:283)
>   at 
> org.apache.cassandra.utils.concurrent.OpOrder$Barrier.await(OpOrder.java:417)
>   at 
> org.apache.cassandra.db.ColumnFamilyStore$Flush$1.runMayThrow(ColumnFamilyStore.java:1151)
>   at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - locked <6e7b1160> (a java.util.concurrent.ThreadPoolExecutor$Worker)
> "SharedPool-Worker-195" - Thread t@989
>java.lang.Thread.State: RUNNABLE
>   at 
> org.apache.cassandra.db.RangeTombstoneList.addInternal(RangeTombstoneList.java:690)
>   at 
> org.apache.cassandra.db.RangeTombstoneList.insertFrom(RangeTombstoneList.java:650)
>   at 
> org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:171)
>   at 
> org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:143)
>   at org.apache.cassandra.db.DeletionInfo.add(DeletionInfo.java:240)
>   at 
> org.apache.cassandra.db.ArrayBackedSortedColumns.delete(ArrayBackedSortedColumns.java:483)
>   at org.apache.cassandra.db.ColumnFamily.addAtom(ColumnFamily.java:153)
>   at 
> 

[jira] [Updated] (CASSANDRA-13432) MemtableReclaimMemory can get stuck because of lack of timeout in getTopLevelColumns()

2017-04-13 Thread Corentin Chary (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corentin Chary updated CASSANDRA-13432:
---
Status: Patch Available  (was: Open)


[jira] [Updated] (CASSANDRA-13432) MemtableReclaimMemory can get stuck because of lack of timeout in getTopLevelColumns()

2017-04-13 Thread Corentin Chary (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corentin Chary updated CASSANDRA-13432:
---
Attachment: CASSANDRA-13432.patch


[jira] [Comment Edited] (CASSANDRA-13432) MemtableReclaimMemory can get stuck because of lack of timeout in getTopLevelColumns()

2017-04-13 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967256#comment-15967256
 ] 

Corentin Chary edited comment on CASSANDRA-13432 at 4/13/17 7:54 AM:
-

Tried the patch, setting the tombstone threshold to one:
{code}
ERROR [SharedPool-Worker-4] 2017-04-13 09:51:55,891 QueryFilter.java:201 - 
Scanned over 1 tombstones in system.size_estimates for key: unknown; query 
aborted (see tombstone_failure_threshold).
WARN  [SharedPool-Worker-4] 2017-04-13 09:51:55,894 
AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread 
Thread[SharedPool-Worker-4,10,main]: {}
java.lang.RuntimeException: 
org.apache.cassandra.db.filter.TombstoneOverwhelmingException
at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2249)
 ~[main/:na]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[na:1.8.0_121]
at 
org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164)
 ~[main/:na]
at 
org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$TraceSessionFutureTask.run(AbstractTracingAwareExecutorService.java:136)
 [main/:na]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) 
[main/:na]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
Caused by: org.apache.cassandra.db.filter.TombstoneOverwhelmingException: null
at 
org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:202) 
~[main/:na]
at 
org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:163) 
~[main/:na]
at 
org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:146)
 ~[main/:na]
at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:125)
 ~[main/:na]
at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:99)
 ~[main/:na]
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
 ~[guava-16.0.jar:na]
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) 
~[guava-16.0.jar:na]
at 
org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:263)
 ~[main/:na]
at 
org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:114) 
~[main/:na]
at 
org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:88)
 ~[main/:na]
at 
org.apache.cassandra.db.RowIteratorFactory$2.getReduced(RowIteratorFactory.java:99)
 ~[main/:na]
at 
org.apache.cassandra.db.RowIteratorFactory$2.getReduced(RowIteratorFactory.java:71)
 ~[main/:na]
at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:117)
 ~[main/:na]
at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:100)
 ~[main/:na]
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
 ~[guava-16.0.jar:na]
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) 
~[guava-16.0.jar:na]
at 
org.apache.cassandra.db.ColumnFamilyStore$9.computeNext(ColumnFamilyStore.java:2115)
 ~[main/:na]
at 
org.apache.cassandra.db.ColumnFamilyStore$9.computeNext(ColumnFamilyStore.java:2111)
 ~[main/:na]
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
 ~[guava-16.0.jar:na]
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) 
~[guava-16.0.jar:na]
at 
org.apache.cassandra.db.ColumnFamilyStore.filter(ColumnFamilyStore.java:2266) 
~[main/:na]
at 
org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:2224)
 ~[main/:na]
at 
org.apache.cassandra.db.PagedRangeCommand.executeLocally(PagedRangeCommand.java:115)
 ~[main/:na]
at 
org.apache.cassandra.service.StorageProxy$LocalRangeSliceRunnable.runMayThrow(StorageProxy.java:1572)
 ~[main/:na]
at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2246)
 ~[main/:na]
... 5 common frames omitted
{code}


was (Author: iksaif):
Tried the patch, setting the tombstone threshold to one:
{code}
WARN  [SharedPool-Worker-4] 2017-04-13 09:51:55,894 
AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread 
Thread[SharedPool-Worker-4,10,main]: {}
java.lang.RuntimeException: 
org.apache.cassandra.db.filter.TombstoneOverwhelmingException
at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2249)
 ~[main/:na]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[na:1.8.0_121]
at 

[jira] [Commented] (CASSANDRA-13432) MemtableReclaimMemory can get stuck because of lack of timeout in getTopLevelColumns()

2017-04-13 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15967256#comment-15967256
 ] 

Corentin Chary commented on CASSANDRA-13432:


Tried the patch, setting the tombstone threshold to one:
{code}
WARN  [SharedPool-Worker-4] 2017-04-13 09:51:55,894 
AbstractTracingAwareExecutorService.java:169 - Uncaught exception on thread 
Thread[SharedPool-Worker-4,10,main]: {}
java.lang.RuntimeException: 
org.apache.cassandra.db.filter.TombstoneOverwhelmingException
at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2249)
 ~[main/:na]
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
~[na:1.8.0_121]
at 
org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164)
 ~[main/:na]
at 
org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$TraceSessionFutureTask.run(AbstractTracingAwareExecutorService.java:136)
 [main/:na]
at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) 
[main/:na]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
Caused by: org.apache.cassandra.db.filter.TombstoneOverwhelmingException: null
at 
org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:202) 
~[main/:na]
at 
org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:163) 
~[main/:na]
at 
org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:146)
 ~[main/:na]
at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:125)
 ~[main/:na]
at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:99)
 ~[main/:na]
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
 ~[guava-16.0.jar:na]
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) 
~[guava-16.0.jar:na]
at 
org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:263)
 ~[main/:na]
at 
org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:114) 
~[main/:na]
at 
org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:88)
 ~[main/:na]
at 
org.apache.cassandra.db.RowIteratorFactory$2.getReduced(RowIteratorFactory.java:99)
 ~[main/:na]
at 
org.apache.cassandra.db.RowIteratorFactory$2.getReduced(RowIteratorFactory.java:71)
 ~[main/:na]
at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:117)
 ~[main/:na]
at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:100)
 ~[main/:na]
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
 ~[guava-16.0.jar:na]
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) 
~[guava-16.0.jar:na]
at 
org.apache.cassandra.db.ColumnFamilyStore$9.computeNext(ColumnFamilyStore.java:2115)
 ~[main/:na]
at 
org.apache.cassandra.db.ColumnFamilyStore$9.computeNext(ColumnFamilyStore.java:2111)
 ~[main/:na]
at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
 ~[guava-16.0.jar:na]
at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) 
~[guava-16.0.jar:na]
at 
org.apache.cassandra.db.ColumnFamilyStore.filter(ColumnFamilyStore.java:2266) 
~[main/:na]
at 
org.apache.cassandra.db.ColumnFamilyStore.getRangeSlice(ColumnFamilyStore.java:2224)
 ~[main/:na]
at 
org.apache.cassandra.db.PagedRangeCommand.executeLocally(PagedRangeCommand.java:115)
 ~[main/:na]
at 
org.apache.cassandra.service.StorageProxy$LocalRangeSliceRunnable.runMayThrow(StorageProxy.java:1572)
 ~[main/:na]
at 
org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2246)
 ~[main/:na]
... 5 common frames omitted
{code}


[jira] [Commented] (CASSANDRA-13432) MemtableReclaimMemory can get stuck because of lack of timeout in getTopLevelColumns()

2017-04-12 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15965534#comment-15965534
 ] 

Corentin Chary commented on CASSANDRA-13432:


It's 2.1.15 but I don't believe it has been fixed. I believe that it's stuck in 
org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:156) 
which doesn't count tombstones. A simple patch could be something like:

{code}
diff --git a/src/java/org/apache/cassandra/db/filter/QueryFilter.java b/src/java/org/apache/cassandra/db/filter/QueryFilter.java
index db531a5..8b718db 100644
--- a/src/java/org/apache/cassandra/db/filter/QueryFilter.java
+++ b/src/java/org/apache/cassandra/db/filter/QueryFilter.java
@@ -23,6 +23,10 @@ import java.util.Iterator;
 import java.util.List;
 import java.util.SortedSet;
 
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.db.Cell;
 import org.apache.cassandra.db.ColumnFamily;
 import org.apache.cassandra.db.DecoratedKey;
@@ -34,10 +38,12 @@ import org.apache.cassandra.db.columniterator.OnDiskAtomIterator;
 import org.apache.cassandra.db.composites.CellName;
 import org.apache.cassandra.db.composites.Composite;
 import org.apache.cassandra.io.sstable.SSTableReader;
+import org.apache.cassandra.tracing.Tracing;
 import org.apache.cassandra.utils.MergeIterator;
 
 public class QueryFilter
 {
+    private static final Logger logger = LoggerFactory.getLogger(QueryFilter.class);
     public final DecoratedKey key;
     public final String cfName;
     public final IDiskAtomFilter filter;
@@ -147,6 +153,7 @@ public class QueryFilter
         return new Iterator<Cell>()
         {
             private Cell next;
+            private int tombstoneCount = 0;
 
             public boolean hasNext()
             {
@@ -181,6 +188,19 @@ public class QueryFilter
                 }
                 else
                 {
+                    tombstoneCount++;
+                    if (tombstoneCount > DatabaseDescriptor.getTombstoneFailureThreshold())
+                    {
+                        Tracing.trace("Scanned over {} tombstones; query aborted (see tombstone_failure_threshold)",
+                                      DatabaseDescriptor.getTombstoneFailureThreshold());
[jira] [Commented] (CASSANDRA-13432) MemtableReclaimMemory can get stuck because of lack of timeout in getTopLevelColumns()

2017-04-11 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15964465#comment-15964465
 ] 

Corentin Chary commented on CASSANDRA-13432:


I checked: 3.x uses different code to count tombstones, so it's likely not 
affected.


[jira] [Updated] (CASSANDRA-13432) MemtableReclaimMemory can get stuck because of lack of timeout in getTopLevelColumns()

2017-04-11 Thread Corentin Chary (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13432?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corentin Chary updated CASSANDRA-13432:
---
Description: 
This might affect 3.x too, I'm not sure.

{code}
$ nodetool tpstats
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0       32135875         0                 0
ReadStage                       114         0       29492940         0                 0
RequestResponseStage              0         0       86090931         0                 0
ReadRepairStage                   0         0         166645         0                 0
CounterMutationStage              0         0              0         0                 0
MiscStage                         0         0              0         0                 0
HintedHandoff                     0         0             47         0                 0
GossipStage                       0         0         188769         0                 0
CacheCleanupExecutor              0         0              0         0                 0
InternalResponseStage             0         0              0         0                 0
CommitLogArchiver                 0         0              0         0                 0
CompactionExecutor                0         0          86835         0                 0
ValidationExecutor                0         0              0         0                 0
MigrationStage                    0         0              0         0                 0
AntiEntropyStage                  0         0              0         0                 0
PendingRangeCalculator            0         0             92         0                 0
Sampler                           0         0              0         0                 0
MemtableFlushWriter               0         0            563         0                 0
MemtablePostFlush                 0         0           1500         0                 0
MemtableReclaimMemory            129534              0                 0
Native-Transport-Requests        41         0       54819182         0              1896
{code}

{code}
"MemtableReclaimMemory:195" - Thread t@6268
   java.lang.Thread.State: WAITING
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
    at org.apache.cassandra.utils.concurrent.WaitQueue$AbstractSignal.awaitUninterruptibly(WaitQueue.java:283)
    at org.apache.cassandra.utils.concurrent.OpOrder$Barrier.await(OpOrder.java:417)
    at org.apache.cassandra.db.ColumnFamilyStore$Flush$1.runMayThrow(ColumnFamilyStore.java:1151)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
    - locked <6e7b1160> (a java.util.concurrent.ThreadPoolExecutor$Worker)

"SharedPool-Worker-195" - Thread t@989
   java.lang.Thread.State: RUNNABLE
    at org.apache.cassandra.db.RangeTombstoneList.addInternal(RangeTombstoneList.java:690)
    at org.apache.cassandra.db.RangeTombstoneList.insertFrom(RangeTombstoneList.java:650)
    at org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:171)
    at org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:143)
    at org.apache.cassandra.db.DeletionInfo.add(DeletionInfo.java:240)
    at org.apache.cassandra.db.ArrayBackedSortedColumns.delete(ArrayBackedSortedColumns.java:483)
    at org.apache.cassandra.db.ColumnFamily.addAtom(ColumnFamily.java:153)
    at org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:184)
    at org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:156)
    at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:146)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:125)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:99)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:263)
    at 

[jira] [Created] (CASSANDRA-13432) MemtableReclaimMemory can get stuck because of lack of timeout in getTopLevelColumns()

2017-04-11 Thread Corentin Chary (JIRA)
Corentin Chary created CASSANDRA-13432:
--

 Summary: MemtableReclaimMemory can get stuck because of lack of 
timeout in getTopLevelColumns()
 Key: CASSANDRA-13432
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13432
 Project: Cassandra
  Issue Type: Bug
Reporter: Corentin Chary
 Fix For: 2.1.x


This might affect 3.x too, I'm not sure.

{code}
$ nodetool tpstats
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
MutationStage                     0         0       32135875         0                 0
ReadStage                       114         0       29492940         0                 0
RequestResponseStage              0         0       86090931         0                 0
ReadRepairStage                   0         0         166645         0                 0
CounterMutationStage              0         0              0         0                 0
MiscStage                         0         0              0         0                 0
HintedHandoff                     0         0             47         0                 0
GossipStage                       0         0         188769         0                 0
CacheCleanupExecutor              0         0              0         0                 0
InternalResponseStage             0         0              0         0                 0
CommitLogArchiver                 0         0              0         0                 0
CompactionExecutor                0         0          86835         0                 0
ValidationExecutor                0         0              0         0                 0
MigrationStage                    0         0              0         0                 0
AntiEntropyStage                  0         0              0         0                 0
PendingRangeCalculator            0         0             92         0                 0
Sampler                           0         0              0         0                 0
MemtableFlushWriter               0         0            563         0                 0
MemtablePostFlush                 0         0           1500         0                 0
MemtableReclaimMemory            129534              0                 0
Native-Transport-Requests        41         0       54819182         0              1896
{code}

{code}
"MemtableReclaimMemory:195" - Thread t@6268
   java.lang.Thread.State: WAITING
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
    at org.apache.cassandra.utils.concurrent.WaitQueue$AbstractSignal.awaitUninterruptibly(WaitQueue.java:283)
    at org.apache.cassandra.utils.concurrent.OpOrder$Barrier.await(OpOrder.java:417)
    at org.apache.cassandra.db.ColumnFamilyStore$Flush$1.runMayThrow(ColumnFamilyStore.java:1151)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
    - locked <6e7b1160> (a java.util.concurrent.ThreadPoolExecutor$Worker)

"SharedPool-Worker-195" - Thread t@989
   java.lang.Thread.State: RUNNABLE
    at org.apache.cassandra.db.RangeTombstoneList.addInternal(RangeTombstoneList.java:690)
    at org.apache.cassandra.db.RangeTombstoneList.insertFrom(RangeTombstoneList.java:650)
    at org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:171)
    at org.apache.cassandra.db.RangeTombstoneList.add(RangeTombstoneList.java:143)
    at org.apache.cassandra.db.DeletionInfo.add(DeletionInfo.java:240)
    at org.apache.cassandra.db.ArrayBackedSortedColumns.delete(ArrayBackedSortedColumns.java:483)
    at org.apache.cassandra.db.ColumnFamily.addAtom(ColumnFamily.java:153)
    at org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:184)
    at org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:156)
    at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:146)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:125)
    at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:99)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at 

[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-04-09 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15962212#comment-15962212
 ] 

Corentin Chary commented on CASSANDRA-13418:


If I understand things correctly, the worst that can happen is that data 
could re-appear. Remember that we only drop SSTables where *all* the items 
have expired.

(The worst that can happen if you don't have the option is that you suddenly 
stop dropping SSTables and your disks fill up.)
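The trade-off being weighed can be sketched with a toy model. To be clear, none of these names exist in Cassandra (the real logic lives in the Java compaction code); `fully_expired`, the dict fields, and the `ignore_overlaps` flag are all hypothetical stand-ins for the behaviour described in this thread:

```python
# Toy model (NOT Cassandra's actual code) of the expired-sstable check:
# by default an expired sstable is only dropped when no still-live sstable
# overlaps its timestamp range; the proposed option skips that overlap check.
def fully_expired(sstables, now, ignore_overlaps=False):
    def overlaps(a, b):
        return a["min_ts"] <= b["max_ts"] and b["min_ts"] <= a["max_ts"]

    expired = [s for s in sstables if s["max_deletion_time"] < now]
    if ignore_overlaps:
        return expired  # drop everything past its deletion time
    live = [s for s in sstables if s["max_deletion_time"] >= now]
    return [s for s in expired if not any(overlaps(s, l) for l in live)]

old = {"min_ts": 0, "max_ts": 100, "max_deletion_time": 1000}
# e.g. read-repaired data landing in a newer window, overlapping the old one
blocker = {"min_ts": 90, "max_ts": 200, "max_deletion_time": 5000}

assert fully_expired([old, blocker], now=2000) == []            # drop blocked
assert fully_expired([old, blocker], now=2000, ignore_overlaps=True) == [old]
```

With the flag on, the expired sstable is reclaimed immediately, at the cost of possibly resurrecting the overlapping data, which is exactly the "data could re-appear" risk above.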

> Allow TWCS to ignore overlaps when dropping fully expired sstables
> --
>
> Key: CASSANDRA-13418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Corentin Chary
>  Labels: twcs
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If 
> you really want read-repairs you're going to have sstables blocking the 
> expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a 
> very low value and that will purge the blockers of old data that should 
> already have expired, thus removing the overlaps and allowing the other 
> SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have 
> time series, you might not care if all your data doesn't exactly expire at 
> the right time, or if data re-appears for some time, as long as it gets 
> deleted as soon as it can. And in this situation I believe it would be really 
> beneficial to allow users to simply ignore overlapping SSTables when looking 
> for fully expired ones.
> To the question: why would you need read-repairs ?
> - Full repairs basically take longer than the TTL of the data on my dataset, 
> so this isn't really effective.
> - Even with a 10% chances of doing a repair, we found out that this would be 
> enough to greatly reduce entropy of the most used data (and if you have 
> timeseries, you're likely to have a dashboard doing the same important 
> queries over and over again).
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on 
> our system and report the effects.
> cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-04-09 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15962183#comment-15962183
 ] 

Corentin Chary commented on CASSANDRA-13418:


AFAIK provide_overlapping_tombstones is a compaction property that we already 
have. I'm suggesting adding "ignore" on top of the existing "none" (default), 
"cell" and "row" values.
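A minimal sketch of that option surface (hypothetical Python stand-in; in Cassandra the real setting is parsed in the Java compaction params, and only the none/row/cell values exist today):

```python
# Hypothetical sketch: provide_overlapping_tombstones currently accepts
# "none" (default), "row" and "cell"; the suggestion is a fourth value,
# "ignore", rather than a brand-new compaction option.
from enum import Enum

class TombstoneOption(Enum):
    NONE = "none"
    ROW = "row"
    CELL = "cell"
    IGNORE = "ignore"  # proposed addition

def parse_option(raw: str) -> TombstoneOption:
    try:
        return TombstoneOption(raw.strip().lower())
    except ValueError:
        raise ValueError(f"unknown provide_overlapping_tombstones value: {raw!r}")

assert parse_option("Ignore") is TombstoneOption.IGNORE
assert parse_option("none") is TombstoneOption.NONE
```

Reusing the existing option keeps table schemas backward compatible: old values keep their meaning, and "ignore" only changes behaviour where it is explicitly set.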

> Allow TWCS to ignore overlaps when dropping fully expired sstables
> --
>
> Key: CASSANDRA-13418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Corentin Chary
>  Labels: twcs
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If 
> you really want read-repairs you're going to have sstables blocking the 
> expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a 
> very low value and that will purge the blockers of old data that should 
> already have expired, thus removing the overlaps and allowing the other 
> SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have 
> time series, you might not care if all your data doesn't exactly expire at 
> the right time, or if data re-appears for some time, as long as it gets 
> deleted as soon as it can. And in this situation I believe it would be really 
> beneficial to allow users to simply ignore overlapping SSTables when looking 
> for fully expired ones.
> To the question: why would you need read-repairs ?
> - Full repairs basically take longer than the TTL of the data on my dataset, 
> so this isn't really effective.
> - Even with a 10% chances of doing a repair, we found out that this would be 
> enough to greatly reduce entropy of the most used data (and if you have 
> timeseries, you're likely to have a dashboard doing the same important 
> queries over and over again).
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on 
> our system and report the effects.
> cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.





[jira] [Updated] (CASSANDRA-13418) Allow TWCS to ignore overlaps when dropping fully expired sstables

2017-04-06 Thread Corentin Chary (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corentin Chary updated CASSANDRA-13418:
---
Summary: Allow TWCS to ignore overlaps when dropping fully expired sstables 
 (was: Allow TWCS to ignore overlaps)

> Allow TWCS to ignore overlaps when dropping fully expired sstables
> --
>
> Key: CASSANDRA-13418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Corentin Chary
>  Labels: twcs
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If 
> you really want read-repairs you're going to have sstables blocking the 
> expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a 
> very low value and that will purge the blockers of old data that should 
> already have expired, thus removing the overlaps and allowing the other 
> SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have 
> time series, you might not care if all your data doesn't exactly expire at 
> the right time, or if data re-appears for some time, as long as it gets 
> deleted as soon as it can. And in this situation I believe it would be really 
> beneficial to allow users to simply ignore overlapping SSTables when looking 
> for fully expired ones.
> To the question: why would you need read-repairs ?
> - Full repairs basically take longer than the TTL of the data on my dataset, 
> so this isn't really effective.
> - Even with a 10% chances of doing a repair, we found out that this would be 
> enough to greatly reduce entropy of the most used data (and if you have 
> timeseries, you're likely to have a dashboard doing the same important 
> queries over and over again).
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on 
> our system and report the effects.
> cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.





[jira] [Comment Edited] (CASSANDRA-12962) SASI: Index are rebuilt on restart

2017-04-06 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958936#comment-15958936
 ] 

Corentin Chary edited comment on CASSANDRA-12962 at 4/6/17 1:45 PM:


Sure. I looked at the patch again and it looks perfectly fine.


was (Author: iksaif):
Sure
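For context, the rebuild decision this issue describes can be reduced to a sketch (hypothetical Python; the real check is in SASIIndex's Java constructor, which compares on-disk index components against live sstables):

```python
# Toy sketch (not Cassandra's actual code) of the rebuild decision: any live
# sstable without an on-disk SASI index component is scheduled for rebuild,
# including sstables that simply contain no value for the indexed column --
# which is why such indexes get rebuilt on every restart.
def sstables_to_rebuild(live_sstables, indexed_sstables):
    """Return live sstables that have no index component on disk."""
    return [s for s in live_sstables if s not in indexed_sstables]

live = ["mc-1-big", "mc-2-big", "mc-3-big"]
indexed = {"mc-1-big", "mc-3-big"}  # mc-2-big holds no value for the column
assert sstables_to_rebuild(live, indexed) == ["mc-2-big"]
```

The fix under review distinguishes "no index component because nothing was indexed" from "index component genuinely missing", so only the latter triggers a rebuild.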

> SASI: Index are rebuilt on restart
> --
>
> Key: CASSANDRA-12962
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12962
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
>Reporter: Corentin Chary
>Assignee: Alex Petrov
>Priority: Minor
> Fix For: 3.11.x
>
> Attachments: screenshot-1.png
>
>
> Apparently, when Cassandra restarts, any index that does not index a value in 
> *every* live SSTable gets rebuilt. The offending code can be found in the 
> constructor of SASIIndex.
> You can easily reproduce it:
> {code}
> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': '1'}  AND durable_writes = true;
> CREATE TABLE test.test (
> a text PRIMARY KEY,
> b text,
> c text
> ) WITH bloom_filter_fp_chance = 0.01
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
> AND comment = ''
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
> 'max_threshold': '32', 'min_threshold': '4'}
> AND compression = {'chunk_length_in_kb': '64', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND crc_check_chance = 1.0
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99PERCENTILE';
> CREATE CUSTOM INDEX test_b_idx ON test.test (b) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex';
> CREATE CUSTOM INDEX test_c_idx ON test.test (c) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex';
> INSERT INTO test.test (a, b) VALUES ('a', 'b');
> {code}
> Log (I added additional traces):
> {code}
> INFO  [main] 2016-11-28 15:32:21,191 ColumnFamilyStore.java:406 - 
> Initializing test.test
> DEBUG [SSTableBatchOpen:1] 2016-11-28 15:32:21,192 SSTableReader.java:505 - 
> Opening 
> /mnt/ssd/tmp/data/data/test/test-229e6380b57711e68407158fde22e121/mc-1-big 
> (0.034KiB)
> DEBUG [main] 2016-11-28 15:32:21,194 SASIIndex.java:118 - index: 
> org.apache.cassandra.schema.IndexMetadata@2f661b1a[id=6b00489b-7010-396e-9348-9f32f5167f88,name=test_b_idx,kind=CUSTOM,options={class_name=org.a\
> pache.cassandra.index.sasi.SASIIndex, target=b}], base CFS(Keyspace='test', 
> ColumnFamily='test'), tracker 
> org.apache.cassandra.db.lifecycle.Tracker@15900b83
> INFO  [main] 2016-11-28 15:32:21,194 DataTracker.java:152 - 
> SSTableIndex.open(column: b, minTerm: value, maxTerm: value, minKey: key, 
> maxKey: key, sstable: BigTableReader(path='/mnt/ssd/tmp/data/data/test/test\
> -229e6380b57711e68407158fde22e121/mc-1-big-Data.db'))
> DEBUG [main] 2016-11-28 15:32:21,195 SASIIndex.java:129 - Rebuilding SASI 
> Indexes: {}
> DEBUG [main] 2016-11-28 15:32:21,195 ColumnFamilyStore.java:895 - Enqueuing 
> flush of IndexInfo: 0.386KiB (0%) on-heap, 0.000KiB (0%) off-heap
> DEBUG [PerDiskMemtableFlushWriter_0:1] 2016-11-28 15:32:21,204 
> Memtable.java:465 - Writing Memtable-IndexInfo@748981977(0.054KiB serialized 
> bytes, 1 ops, 0%/0% of on/off-heap limit), flushed range = (min(-9223\
> 372036854775808), max(9223372036854775807)]
> DEBUG [PerDiskMemtableFlushWriter_0:1] 2016-11-28 15:32:21,204 
> Memtable.java:494 - Completed flushing 
> /mnt/ssd/tmp/data/data/system/IndexInfo-9f5c6374d48532299a0a5094af9ad1e3/mc-4256-big-Data.db
>  (0.035KiB) for\
>  commitlog position CommitLogPosition(segmentId=1480343535479, position=15652)
> DEBUG [MemtableFlushWriter:1] 2016-11-28 15:32:21,224 
> ColumnFamilyStore.java:1200 - Flushed to 
> [BigTableReader(path='/mnt/ssd/tmp/data/data/system/IndexInfo-9f5c6374d48532299a0a5094af9ad1e3/mc-4256-big-Data.db\
> ')] (1 sstables, 4.838KiB), biggest 4.838KiB, smallest 4.838KiB
> DEBUG [main] 2016-11-28 15:32:21,224 SASIIndex.java:118 - index: 
> org.apache.cassandra.schema.IndexMetadata@12f3d291[id=45fcb286-b87a-3d18-a04b-b899a9880c91,name=test_c_idx,kind=CUSTOM,options={class_name=org.a\
> pache.cassandra.index.sasi.SASIIndex, target=c}], base CFS(Keyspace='test', 
> ColumnFamily='test'), tracker 
> org.apache.cassandra.db.lifecycle.Tracker@15900b83
> DEBUG [main] 2016-11-28 15:32:21,224 SASIIndex.java:121 - to rebuild: index: 
> BigTableReader(path='/mnt/ssd/tmp/data/data/test/test-229e6380b57711e68407158fde22e121/mc-1-big-Data.db'),
>  sstable: org.apache.cassa\
> ndra.index.sasi.conf.ColumnIndex@6cbb6b0e
> DEBUG 

[jira] [Updated] (CASSANDRA-12962) SASI: Index are rebuilt on restart

2017-04-06 Thread Corentin Chary (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corentin Chary updated CASSANDRA-12962:
---
Reviewer: Corentin Chary

> SASI: Index are rebuilt on restart
> --
>
> Key: CASSANDRA-12962
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12962
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
>Reporter: Corentin Chary
>Assignee: Alex Petrov
>Priority: Minor
> Fix For: 3.11.x
>
> Attachments: screenshot-1.png
>
>
> Apparently, when Cassandra restarts, any index that does not index a value in 
> *every* live SSTable gets rebuilt. The offending code can be found in the 
> constructor of SASIIndex.
> You can easily reproduce it:
> {code}
> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': '1'}  AND durable_writes = true;
> CREATE TABLE test.test (
> a text PRIMARY KEY,
> b text,
> c text
> ) WITH bloom_filter_fp_chance = 0.01
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
> AND comment = ''
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
> 'max_threshold': '32', 'min_threshold': '4'}
> AND compression = {'chunk_length_in_kb': '64', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND crc_check_chance = 1.0
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99PERCENTILE';
> CREATE CUSTOM INDEX test_b_idx ON test.test (b) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex';
> CREATE CUSTOM INDEX test_c_idx ON test.test (c) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex';
> INSERT INTO test.test (a, b) VALUES ('a', 'b');
> {code}
> Log (I added additional traces):
> {code}
> INFO  [main] 2016-11-28 15:32:21,191 ColumnFamilyStore.java:406 - 
> Initializing test.test
> DEBUG [SSTableBatchOpen:1] 2016-11-28 15:32:21,192 SSTableReader.java:505 - 
> Opening 
> /mnt/ssd/tmp/data/data/test/test-229e6380b57711e68407158fde22e121/mc-1-big 
> (0.034KiB)
> DEBUG [main] 2016-11-28 15:32:21,194 SASIIndex.java:118 - index: 
> org.apache.cassandra.schema.IndexMetadata@2f661b1a[id=6b00489b-7010-396e-9348-9f32f5167f88,name=test_b_idx,kind=CUSTOM,options={class_name=org.a\
> pache.cassandra.index.sasi.SASIIndex, target=b}], base CFS(Keyspace='test', 
> ColumnFamily='test'), tracker 
> org.apache.cassandra.db.lifecycle.Tracker@15900b83
> INFO  [main] 2016-11-28 15:32:21,194 DataTracker.java:152 - 
> SSTableIndex.open(column: b, minTerm: value, maxTerm: value, minKey: key, 
> maxKey: key, sstable: BigTableReader(path='/mnt/ssd/tmp/data/data/test/test\
> -229e6380b57711e68407158fde22e121/mc-1-big-Data.db'))
> DEBUG [main] 2016-11-28 15:32:21,195 SASIIndex.java:129 - Rebuilding SASI 
> Indexes: {}
> DEBUG [main] 2016-11-28 15:32:21,195 ColumnFamilyStore.java:895 - Enqueuing 
> flush of IndexInfo: 0.386KiB (0%) on-heap, 0.000KiB (0%) off-heap
> DEBUG [PerDiskMemtableFlushWriter_0:1] 2016-11-28 15:32:21,204 
> Memtable.java:465 - Writing Memtable-IndexInfo@748981977(0.054KiB serialized 
> bytes, 1 ops, 0%/0% of on/off-heap limit), flushed range = (min(-9223\
> 372036854775808), max(9223372036854775807)]
> DEBUG [PerDiskMemtableFlushWriter_0:1] 2016-11-28 15:32:21,204 
> Memtable.java:494 - Completed flushing 
> /mnt/ssd/tmp/data/data/system/IndexInfo-9f5c6374d48532299a0a5094af9ad1e3/mc-4256-big-Data.db
>  (0.035KiB) for\
>  commitlog position CommitLogPosition(segmentId=1480343535479, position=15652)
> DEBUG [MemtableFlushWriter:1] 2016-11-28 15:32:21,224 
> ColumnFamilyStore.java:1200 - Flushed to 
> [BigTableReader(path='/mnt/ssd/tmp/data/data/system/IndexInfo-9f5c6374d48532299a0a5094af9ad1e3/mc-4256-big-Data.db\
> ')] (1 sstables, 4.838KiB), biggest 4.838KiB, smallest 4.838KiB
> DEBUG [main] 2016-11-28 15:32:21,224 SASIIndex.java:118 - index: 
> org.apache.cassandra.schema.IndexMetadata@12f3d291[id=45fcb286-b87a-3d18-a04b-b899a9880c91,name=test_c_idx,kind=CUSTOM,options={class_name=org.a\
> pache.cassandra.index.sasi.SASIIndex, target=c}], base CFS(Keyspace='test', 
> ColumnFamily='test'), tracker 
> org.apache.cassandra.db.lifecycle.Tracker@15900b83
> DEBUG [main] 2016-11-28 15:32:21,224 SASIIndex.java:121 - to rebuild: index: 
> BigTableReader(path='/mnt/ssd/tmp/data/data/test/test-229e6380b57711e68407158fde22e121/mc-1-big-Data.db'),
>  sstable: org.apache.cassa\
> ndra.index.sasi.conf.ColumnIndex@6cbb6b0e
> DEBUG [main] 2016-11-28 15:32:21,224 SASIIndex.java:129 - Rebuilding SASI 
> Indexes: 
> 

[jira] [Commented] (CASSANDRA-12962) SASI: Index are rebuilt on restart

2017-04-06 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958936#comment-15958936
 ] 

Corentin Chary commented on CASSANDRA-12962:


Sure

> SASI: Index are rebuilt on restart
> --
>
> Key: CASSANDRA-12962
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12962
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
>Reporter: Corentin Chary
>Assignee: Alex Petrov
>Priority: Minor
> Fix For: 3.11.x
>
> Attachments: screenshot-1.png
>
>
> Apparently, when Cassandra restarts, any index that does not index a value in 
> *every* live SSTable gets rebuilt. The offending code can be found in the 
> constructor of SASIIndex.
> You can easily reproduce it:
> {code}
> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': '1'}  AND durable_writes = true;
> CREATE TABLE test.test (
> a text PRIMARY KEY,
> b text,
> c text
> ) WITH bloom_filter_fp_chance = 0.01
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
> AND comment = ''
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
> 'max_threshold': '32', 'min_threshold': '4'}
> AND compression = {'chunk_length_in_kb': '64', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND crc_check_chance = 1.0
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99PERCENTILE';
> CREATE CUSTOM INDEX test_b_idx ON test.test (b) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex';
> CREATE CUSTOM INDEX test_c_idx ON test.test (c) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex';
> INSERT INTO test.test (a, b) VALUES ('a', 'b');
> {code}
> Log (I added additional traces):
> {code}
> INFO  [main] 2016-11-28 15:32:21,191 ColumnFamilyStore.java:406 - 
> Initializing test.test
> DEBUG [SSTableBatchOpen:1] 2016-11-28 15:32:21,192 SSTableReader.java:505 - 
> Opening 
> /mnt/ssd/tmp/data/data/test/test-229e6380b57711e68407158fde22e121/mc-1-big 
> (0.034KiB)
> DEBUG [main] 2016-11-28 15:32:21,194 SASIIndex.java:118 - index: 
> org.apache.cassandra.schema.IndexMetadata@2f661b1a[id=6b00489b-7010-396e-9348-9f32f5167f88,name=test_b_idx,kind=CUSTOM,options={class_name=org.a\
> pache.cassandra.index.sasi.SASIIndex, target=b}], base CFS(Keyspace='test', 
> ColumnFamily='test'), tracker 
> org.apache.cassandra.db.lifecycle.Tracker@15900b83
> INFO  [main] 2016-11-28 15:32:21,194 DataTracker.java:152 - 
> SSTableIndex.open(column: b, minTerm: value, maxTerm: value, minKey: key, 
> maxKey: key, sstable: BigTableReader(path='/mnt/ssd/tmp/data/data/test/test\
> -229e6380b57711e68407158fde22e121/mc-1-big-Data.db'))
> DEBUG [main] 2016-11-28 15:32:21,195 SASIIndex.java:129 - Rebuilding SASI 
> Indexes: {}
> DEBUG [main] 2016-11-28 15:32:21,195 ColumnFamilyStore.java:895 - Enqueuing 
> flush of IndexInfo: 0.386KiB (0%) on-heap, 0.000KiB (0%) off-heap
> DEBUG [PerDiskMemtableFlushWriter_0:1] 2016-11-28 15:32:21,204 
> Memtable.java:465 - Writing Memtable-IndexInfo@748981977(0.054KiB serialized 
> bytes, 1 ops, 0%/0% of on/off-heap limit), flushed range = (min(-9223\
> 372036854775808), max(9223372036854775807)]
> DEBUG [PerDiskMemtableFlushWriter_0:1] 2016-11-28 15:32:21,204 
> Memtable.java:494 - Completed flushing 
> /mnt/ssd/tmp/data/data/system/IndexInfo-9f5c6374d48532299a0a5094af9ad1e3/mc-4256-big-Data.db
>  (0.035KiB) for\
>  commitlog position CommitLogPosition(segmentId=1480343535479, position=15652)
> DEBUG [MemtableFlushWriter:1] 2016-11-28 15:32:21,224 
> ColumnFamilyStore.java:1200 - Flushed to 
> [BigTableReader(path='/mnt/ssd/tmp/data/data/system/IndexInfo-9f5c6374d48532299a0a5094af9ad1e3/mc-4256-big-Data.db\
> ')] (1 sstables, 4.838KiB), biggest 4.838KiB, smallest 4.838KiB
> DEBUG [main] 2016-11-28 15:32:21,224 SASIIndex.java:118 - index: 
> org.apache.cassandra.schema.IndexMetadata@12f3d291[id=45fcb286-b87a-3d18-a04b-b899a9880c91,name=test_c_idx,kind=CUSTOM,options={class_name=org.a\
> pache.cassandra.index.sasi.SASIIndex, target=c}], base CFS(Keyspace='test', 
> ColumnFamily='test'), tracker 
> org.apache.cassandra.db.lifecycle.Tracker@15900b83
> DEBUG [main] 2016-11-28 15:32:21,224 SASIIndex.java:121 - to rebuild: index: 
> BigTableReader(path='/mnt/ssd/tmp/data/data/test/test-229e6380b57711e68407158fde22e121/mc-1-big-Data.db'),
>  sstable: org.apache.cassa\
> ndra.index.sasi.conf.ColumnIndex@6cbb6b0e
> DEBUG [main] 2016-11-28 15:32:21,224 SASIIndex.java:129 - Rebuilding SASI 
> Indexes: 
> 

[jira] [Commented] (CASSANDRA-13418) Allow TWCS to ignore overlaps

2017-04-05 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958337#comment-15958337
 ] 

Corentin Chary commented on CASSANDRA-13418:


What do you think about provide_overlapping_tombstones = "ignore"? It would 
integrate nicely with the existing code and does not add yet another 
compaction option (but it sounds a little weird).


> Allow TWCS to ignore overlaps
> -
>
> Key: CASSANDRA-13418
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Corentin Chary
>  Labels: twcs
>
> http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html explains it well. If 
> you really want read-repairs you're going to have sstables blocking the 
> expiration of other fully expired SSTables because they overlap.
> You can set unchecked_tombstone_compaction = true or tombstone_threshold to a 
> very low value and that will purge the blockers of old data that should 
> already have expired, thus removing the overlaps and allowing the other 
> SSTables to expire.
> The thing is that this is rather CPU intensive and not optimal. If you have 
> time series, you might not care if all your data doesn't exactly expire at 
> the right time, or if data re-appears for some time, as long as it gets 
> deleted as soon as it can. And in this situation I believe it would be really 
> beneficial to allow users to simply ignore overlapping SSTables when looking 
> for fully expired ones.
> To the question: why would you need read-repairs?
> - Full repairs basically take longer than the TTL of the data on my dataset, 
> so this isn't really effective.
> - Even with a 10% chance of doing a repair, we found out that this would be 
> enough to greatly reduce the entropy of the most used data (and if you have 
> timeseries, you're likely to have a dashboard doing the same important 
> queries over and over again).
> - LOCAL_QUORUM is too expensive (need >3 replicas), QUORUM is too slow.
> I'll try to come up with a patch demonstrating how this would work, try it on 
> our system and report the effects.
> cc: [~adejanovski], [~rgerard] as I know you worked on similar issues already.
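The trade-off described above can be sketched as follows (a hypothetical model of the expired-SSTable check, not the actual TWCS code): an SSTable is fully expired once its max local deletion time has passed, and the stock logic additionally refuses to drop it while a non-expired SSTable overlaps its timestamp range; the proposed option simply skips that overlap check.

```python
# Sketch of the proposed "ignore overlaps" behavior for expired-SSTable
# detection. Hypothetical model, not actual Cassandra code.
def fully_expired(sstables, now, ignore_overlaps=False):
    """Return the SSTables whose data is entirely past its deletion time."""
    expired = [s for s in sstables if s["max_deletion_time"] <= now]
    if ignore_overlaps:
        return expired  # drop them regardless of overlapping tables
    # Stock behavior: keep an expired SSTable if a non-expired one
    # overlaps its timestamp range (it might shadow newer data).
    live = [s for s in sstables if s["max_deletion_time"] > now]
    def overlaps(a, b):
        return a["min_ts"] <= b["max_ts"] and b["min_ts"] <= a["max_ts"]
    return [s for s in expired if not any(overlaps(s, l) for l in live)]

old = {"min_ts": 0, "max_ts": 10, "max_deletion_time": 100}
new = {"min_ts": 5, "max_ts": 20, "max_deletion_time": 10**9}
# With the default check, `old` is blocked by the overlapping `new`:
assert fully_expired([old, new], now=200) == []
# With overlaps ignored, `old` becomes droppable immediately:
assert fully_expired([old, new], now=200, ignore_overlaps=True) == [old]
```

With the overlap check skipped, deleted data may transiently reappear, which is exactly the trade-off the description accepts for time-series workloads.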



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (CASSANDRA-13418) Allow TWCS to ignore overlaps

2017-04-05 Thread Corentin Chary (JIRA)
Corentin Chary created CASSANDRA-13418:
--

 Summary: Allow TWCS to ignore overlaps
 Key: CASSANDRA-13418
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13418
 Project: Cassandra
  Issue Type: Improvement
  Components: Compaction
Reporter: Corentin Chary







[jira] [Commented] (CASSANDRA-12962) SASI: Index are rebuilt on restart

2017-03-27 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15943403#comment-15943403
 ] 

Corentin Chary commented on CASSANDRA-12962:


Looks robust enough to me :)

> SASI: Index are rebuilt on restart
> --
>
> Key: CASSANDRA-12962
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12962
> Project: Cassandra
>  Issue Type: Bug
>  Components: sasi
>Reporter: Corentin Chary
>Assignee: Alex Petrov
>Priority: Minor
> Fix For: 3.11.x
>
> Attachments: screenshot-1.png
>
>
> Apparently, when Cassandra restarts, any index that does not index a value in 
> *every* live SSTable gets rebuilt. The offending code can be found in the 
> constructor of SASIIndex.
> You can easily reproduce it:
> {code}
> CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': '1'}  AND durable_writes = true;
> CREATE TABLE test.test (
> a text PRIMARY KEY,
> b text,
> c text
> ) WITH bloom_filter_fp_chance = 0.01
> AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
> AND comment = ''
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 
> 'max_threshold': '32', 'min_threshold': '4'}
> AND compression = {'chunk_length_in_kb': '64', 'class': 
> 'org.apache.cassandra.io.compress.LZ4Compressor'}
> AND crc_check_chance = 1.0
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = '99PERCENTILE';
> CREATE CUSTOM INDEX test_b_idx ON test.test (b) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex';
> CREATE CUSTOM INDEX test_c_idx ON test.test (c) USING 
> 'org.apache.cassandra.index.sasi.SASIIndex';
> INSERT INTO test.test (a, b) VALUES ('a', 'b');
> {code}
> Log (I added additional traces):
> {code}
> INFO  [main] 2016-11-28 15:32:21,191 ColumnFamilyStore.java:406 - 
> Initializing test.test
> DEBUG [SSTableBatchOpen:1] 2016-11-28 15:32:21,192 SSTableReader.java:505 - 
> Opening 
> /mnt/ssd/tmp/data/data/test/test-229e6380b57711e68407158fde22e121/mc-1-big 
> (0.034KiB)
> DEBUG [main] 2016-11-28 15:32:21,194 SASIIndex.java:118 - index: 
> org.apache.cassandra.schema.IndexMetadata@2f661b1a[id=6b00489b-7010-396e-9348-9f32f5167f88,name=test_b_idx,kind=CUSTOM,options={class_name=org.a\
> pache.cassandra.index.sasi.SASIIndex, target=b}], base CFS(Keyspace='test', 
> ColumnFamily='test'), tracker 
> org.apache.cassandra.db.lifecycle.Tracker@15900b83
> INFO  [main] 2016-11-28 15:32:21,194 DataTracker.java:152 - 
> SSTableIndex.open(column: b, minTerm: value, maxTerm: value, minKey: key, 
> maxKey: key, sstable: BigTableReader(path='/mnt/ssd/tmp/data/data/test/test\
> -229e6380b57711e68407158fde22e121/mc-1-big-Data.db'))
> DEBUG [main] 2016-11-28 15:32:21,195 SASIIndex.java:129 - Rebuilding SASI 
> Indexes: {}
> DEBUG [main] 2016-11-28 15:32:21,195 ColumnFamilyStore.java:895 - Enqueuing 
> flush of IndexInfo: 0.386KiB (0%) on-heap, 0.000KiB (0%) off-heap
> DEBUG [PerDiskMemtableFlushWriter_0:1] 2016-11-28 15:32:21,204 
> Memtable.java:465 - Writing Memtable-IndexInfo@748981977(0.054KiB serialized 
> bytes, 1 ops, 0%/0% of on/off-heap limit), flushed range = (min(-9223\
> 372036854775808), max(9223372036854775807)]
> DEBUG [PerDiskMemtableFlushWriter_0:1] 2016-11-28 15:32:21,204 
> Memtable.java:494 - Completed flushing 
> /mnt/ssd/tmp/data/data/system/IndexInfo-9f5c6374d48532299a0a5094af9ad1e3/mc-4256-big-Data.db
>  (0.035KiB) for\
>  commitlog position CommitLogPosition(segmentId=1480343535479, position=15652)
> DEBUG [MemtableFlushWriter:1] 2016-11-28 15:32:21,224 
> ColumnFamilyStore.java:1200 - Flushed to 
> [BigTableReader(path='/mnt/ssd/tmp/data/data/system/IndexInfo-9f5c6374d48532299a0a5094af9ad1e3/mc-4256-big-Data.db\
> ')] (1 sstables, 4.838KiB), biggest 4.838KiB, smallest 4.838KiB
> DEBUG [main] 2016-11-28 15:32:21,224 SASIIndex.java:118 - index: 
> org.apache.cassandra.schema.IndexMetadata@12f3d291[id=45fcb286-b87a-3d18-a04b-b899a9880c91,name=test_c_idx,kind=CUSTOM,options={class_name=org.a\
> pache.cassandra.index.sasi.SASIIndex, target=c}], base CFS(Keyspace='test', 
> ColumnFamily='test'), tracker 
> org.apache.cassandra.db.lifecycle.Tracker@15900b83
> DEBUG [main] 2016-11-28 15:32:21,224 SASIIndex.java:121 - to rebuild: index: 
> BigTableReader(path='/mnt/ssd/tmp/data/data/test/test-229e6380b57711e68407158fde22e121/mc-1-big-Data.db'),
>  sstable: org.apache.cassa\
> ndra.index.sasi.conf.ColumnIndex@6cbb6b0e
> DEBUG [main] 2016-11-28 15:32:21,224 SASIIndex.java:129 - Rebuilding SASI 
> Indexes: 
> 
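The rebuild decision visible in the log above ("to rebuild: ... mc-1-big-Data.db") can be modeled with a small sketch (illustrative only, not the real SASIIndex constructor): any live SSTable without an index component for the column triggers a rebuild, even when it simply holds no data for that column.

```python
# Simplified model of the rebuild decision described in this issue
# (hypothetical, not the real SASIIndex constructor logic).
def sstables_to_rebuild(live_sstables, indexed_sstables):
    """Return live SSTables lacking an index component for this column."""
    return [s for s in live_sstables if s not in indexed_sstables]

live = ["mc-1-big", "mc-2-big", "mc-3-big"]
indexed = {"mc-1-big", "mc-2-big"}  # mc-3-big holds no value for the column
# A single un-indexed SSTable schedules a rebuild, even though it merely
# contains no data for the indexed column:
assert sstables_to_rebuild(live, indexed) == ["mc-3-big"]
```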

[jira] [Commented] (CASSANDRA-12962) SASI: Index are rebuilt on restart

2017-03-21 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935171#comment-15935171
 ] 

Corentin Chary commented on CASSANDRA-12962:


Alex: I do not expect to have time to work on that in the coming weeks, so feel 
free to take it :)

> SASI: Index are rebuilt on restart
> --
>
> Key: CASSANDRA-12962
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12962
> Project: Cassandra
>  Issue Type: Improvement
>  Components: sasi
>Reporter: Corentin Chary
>Priority: Minor
> Fix For: 3.11.x
>

[jira] [Commented] (CASSANDRA-12962) SASI: Index are rebuilt on restart

2017-03-21 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15935166#comment-15935166
 ] 

Corentin Chary commented on CASSANDRA-12962:


Exact. In my degenerate case I had 64 columns, all indexed, but most of the data 
was sparse. This led to ~2h rebuilds after each restart.

> SASI: Index are rebuilt on restart
> --
>
> Key: CASSANDRA-12962
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12962
> Project: Cassandra
>  Issue Type: Improvement
>  Components: sasi
>Reporter: Corentin Chary
>Priority: Minor
> Fix For: 3.11.x
>

[jira] [Created] (CASSANDRA-13338) JMX: EstimatedPartitionCount / SnapshotSize are expensive

2017-03-16 Thread Corentin Chary (JIRA)
Corentin Chary created CASSANDRA-13338:
--

 Summary: JMX: EstimatedPartitionCount / SnapshotSize are expensive
 Key: CASSANDRA-13338
 URL: https://issues.apache.org/jira/browse/CASSANDRA-13338
 Project: Cassandra
  Issue Type: Improvement
  Components: Observability
Reporter: Corentin Chary


EstimatedPartitionCount / EstimatedRowCount / SnapshotSize seem particularly 
expensive. For example on our system 
org.apache.cassandra.metrics:type=ColumnFamily,name=SnapshotsSize can take as 
much as half a second.

All this cumulated means that exporting stats for all your tables (with 
metrics-graphite or jmx_exporter) is going to take quite some time.

We should certainly try to find the most expensive endpoints and see if there 
is a way to cache some of the values.
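One plausible mitigation is to memoize the expensive gauges with a short TTL, trading slight staleness for bounded cost. A minimal sketch (illustrative only, not Cassandra's metrics code; the class and parameter names are made up):

```python
import time

class CachedGauge:
    """Wrap an expensive value supplier and recompute at most once per TTL."""
    def __init__(self, supplier, ttl_seconds=60.0, clock=time.monotonic):
        self.supplier = supplier
        self.ttl = ttl_seconds
        self.clock = clock
        self._value = None
        self._expires_at = -float("inf")

    def get(self):
        now = self.clock()
        if now >= self._expires_at:
            self._value = self.supplier()  # pay the cost only on expiry
            self._expires_at = now + self.ttl
        return self._value

calls = []
gauge = CachedGauge(lambda: calls.append(1) or len(calls), ttl_seconds=60)
assert gauge.get() == 1
assert gauge.get() == 1   # served from cache, supplier not re-invoked
assert len(calls) == 1
```

Dropwizard Metrics ships a similar CachedGauge abstraction, which would be a natural fit here.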

cc: [~rgerard]





[jira] [Commented] (CASSANDRA-11380) Client visible backpressure mechanism

2017-03-16 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927952#comment-15927952
 ] 

Corentin Chary commented on CASSANDRA-11380:


From my tests I didn't find a way to create a setup where there would be fair 
backpressure using this (which is an issue when you have a cluster shared by 
multiple clients/workloads).

> Client visible backpressure mechanism
> -
>
> Key: CASSANDRA-11380
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11380
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Coordination
>Reporter: Wei Deng
>
> Cassandra currently lacks a sophisticated back pressure mechanism to prevent 
> clients ingesting data at too high throughput. One of the reasons why it 
> hasn't done so is because of its SEDA (Staged Event Driven Architecture) 
> design. With SEDA, an overloaded thread pool can drop those droppable 
> messages (in this case, MutationStage can drop mutation or counter mutation 
> messages) when they exceed the 2-second timeout. This can save the JVM from 
> running out of memory and crash. However, one downside from this kind of 
> load-shedding based backpressure approach is that increased number of dropped 
> mutations will increase the chance of inconsistency among replicas and will 
> likely require more repair (hints can help to some extent, but it's not 
> designed to cover all inconsistencies); another downside is that excessive 
> writes will also introduce much more pressure on compaction (especially LCS), 
>  and backlogged compaction will increase read latency and cause more frequent 
> GC pauses, and depending on the type of compaction, some backlog can take a 
> long time to clear up even after the write is removed. It seems that the 
> current load-shedding mechanism is not adequate to address a common bulk 
> loading scenario, where clients are trying to ingest data at highest 
> throughput possible. We need a more direct way to tell the client drivers to 
> slow down.
> It appears that HBase had suffered similar situation as discussed in 
> HBASE-5162, and they introduced some special exception type to tell the 
> client to slow down when a certain "overloaded" criteria is met. If we can 
> leverage a similar mechanism, our dropped mutation event can be used to 
> trigger such exceptions to push back on the client; at the same time, 
> backlogged compaction (when the number of pending compactions exceeds a 
> certain threshold) can also be used for the push back and this can prevent 
> vicious cycle mentioned in 
> https://issues.apache.org/jira/browse/CASSANDRA-11366?focusedCommentId=15198786=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15198786.
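The HBase-style push-back mentioned above amounts to the server throwing a dedicated "overloaded" exception when a criterion such as dropped mutations or backlogged compactions is met, and clients retrying with backoff. A rough sketch of the client side (hypothetical names, not a real driver API):

```python
import random
import time

class OverloadedError(Exception):
    """Server-side signal that the client should slow down (cf. HBASE-5162)."""

def write_with_backoff(send, max_retries=5, base_delay=0.05):
    """Retry a write with jittered exponential backoff on server push-back."""
    for attempt in range(max_retries):
        try:
            return send()
        except OverloadedError:
            # Jittered exponential backoff spreads retries across clients.
            time.sleep(random.uniform(0, base_delay * 2 ** attempt))
    raise OverloadedError("still overloaded after %d retries" % max_retries)

attempts = []
def flaky_send():
    attempts.append(1)
    if len(attempts) < 3:
        raise OverloadedError()
    return "ok"

assert write_with_backoff(flaky_send) == "ok"
assert len(attempts) == 3
```

Note that per-connection backoff alone does not guarantee fairness across tenants, which matches the observation in the comment above.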





[jira] [Commented] (CASSANDRA-12962) SASI: Index are rebuilt on restart

2017-03-16 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927949#comment-15927949
 ] 

Corentin Chary commented on CASSANDRA-12962:


[~ifesdjeen] any idea why the code was made like that in the first place?

> SASI: Index are rebuilt on restart
> --
>
> Key: CASSANDRA-12962
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12962
> Project: Cassandra
>  Issue Type: Improvement
>  Components: sasi
>Reporter: Corentin Chary
>Priority: Minor
> Fix For: 3.11.x
>

[jira] [Commented] (CASSANDRA-13189) Use prompt_toolkit in cqlsh

2017-03-16 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15927945#comment-15927945
 ] 

Corentin Chary commented on CASSANDRA-13189:


I'll try to add some unit tests and send a more formal patch later this month. 
But if anybody has time to play with it before then, feel free to!

> Use prompt_toolkit in cqlsh
> ---
>
> Key: CASSANDRA-13189
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13189
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Tools
>Reporter: Corentin Chary
>Assignee: Corentin Chary
>Priority: Minor
> Attachments: cqlsh-prompt-tookit.png
>
>
> prompt_toolkit is an alternative to readline 
> (https://github.com/jonathanslenders/python-prompt-toolkit) and is used in a 
> lot of software, including the upcoming version of IPython.
> I'm working on an initial version that keeps compatibility with readline, 
> which is available here: 
> https://github.com/iksaif/cassandra/tree/prompt_toolkit
> It's still missing tests and a few things, but I'm opening this for tracking 
> and feedback.
> !cqlsh-prompt-tookit.png|thumbnail!





[jira] [Commented] (CASSANDRA-12915) SASI: Index intersection with an empty range really inefficient

2017-03-09 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15903086#comment-15903086
 ] 

Corentin Chary commented on CASSANDRA-12915:


LGTM. Thanks for cleaning up, this is way better now.

> SASI: Index intersection with an empty range really inefficient
> ---
>
> Key: CASSANDRA-12915
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12915
> Project: Cassandra
>  Issue Type: Improvement
>  Components: sasi
>Reporter: Corentin Chary
>Assignee: Corentin Chary
> Fix For: 3.11.x, 4.x
>
>
> It looks like RangeIntersectionIterator.java can be pretty inefficient in 
> some cases. Let's take the following query:
> SELECT data FROM table WHERE index1 = 'foo' AND index2 = 'bar';
> In this case:
> * index1 = 'foo' will match 2 items
> * index2 = 'bar' will match ~300k items
> On my setup, the query will take ~1 sec, with most of the time being spent in 
> disk.TokenTree.getTokenAt().
> If I patch RangeIntersectionIterator so that it doesn't try to do the 
> intersection (and effectively only uses 'index1'), the query will run in a few 
> tenths of a millisecond.
> I see multiple solutions for that:
> * Add a static threshold to avoid using the index for the intersection when 
> we know it will be slow, probably when the range size factor is very small 
> and the range size is big.
> * CASSANDRA-10765
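The cost asymmetry behind this issue can be sketched as follows (an illustrative model, not the actual RangeIterator code): intersecting by walking both ranges in step visits every element of the large range, whereas iterating the most selective range and probing the others touches only a handful of entries.

```python
# Illustrative comparison of two intersection strategies
# (a model of the trade-off, not the actual SASI implementation).
def intersect_by_merge(ranges):
    """Walk all ranges: cost grows with the sum of the range sizes."""
    result = set(ranges[0])
    for r in ranges[1:]:
        result &= set(r)           # touches every element of every range
    return sorted(result)

def intersect_smallest_first(ranges):
    """Iterate the most selective range, probe the rest."""
    ranges = sorted(ranges, key=len)
    others = [set(r) for r in ranges[1:]]
    return sorted(t for t in ranges[0] if all(t in o for o in others))

index1 = [5, 42]                    # 'foo' matches 2 tokens
index2 = list(range(100_000))       # 'bar' matches ~100k tokens
assert intersect_by_merge([index1, index2]) == [5, 42]
assert intersect_smallest_first([index1, index2]) == [5, 42]
```

Both strategies return the same result; a query planner (CASSANDRA-10765) would pick between them using per-index metadata such as estimated token counts.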





[jira] [Commented] (CASSANDRA-12915) SASI: Index intersection with an empty range really inefficient

2017-03-08 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901335#comment-15901335
 ] 

Corentin Chary commented on CASSANDRA-12915:


Looks good now, would be nice to see the results of the CI on this version :)

> SASI: Index intersection with an empty range really inefficient
> ---
>
> Key: CASSANDRA-12915
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12915
> Project: Cassandra
>  Issue Type: Improvement
>  Components: sasi
>Reporter: Corentin Chary
>Assignee: Corentin Chary
> Fix For: 3.11.x, 4.x
>





[jira] [Commented] (CASSANDRA-12915) SASI: Index intersection with an empty range really inefficient

2017-03-08 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15900977#comment-15900977
 ] 

Corentin Chary commented on CASSANDRA-12915:


{code} this(range == null ? null : range.min, range == null ? null : range.max, 
range == null ? 0 : range.count);{code}

I think it would be better not to assume that a null range equals an empty 
range, mostly because they aren't treated the same way in add().

{code} If either range is empty. Empty range is a subrange of (overlaps with) 
any range.{code}

That's not how intersection usually works: shouldn't the result of intersecting 
an empty range with anything be an empty range? (Which means that an empty 
range overlaps with nothing.)
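To make the semantics argued for above concrete, here is a toy sketch in plain Python (illustrative only, not Cassandra's Java code): with conventional set semantics, an empty range is absorbing under intersection.

{code}
def intersect(a, b):
    # Conventional intersection semantics: intersecting with an empty
    # range yields an empty range, i.e. an empty range overlaps with nothing.
    if not a or not b:
        return []
    return sorted(set(a) & set(b))

print(intersect([1, 2, 3], []))      # -> []
print(intersect([1, 2, 3], [2, 4]))  # -> [2]
{code}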


> SASI: Index intersection with an empty range really inefficient
> ---
>
> Key: CASSANDRA-12915
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12915
> Project: Cassandra
>  Issue Type: Improvement
>  Components: sasi
>Reporter: Corentin Chary
>Assignee: Corentin Chary
> Fix For: 3.11.x, 4.x
>
>
> It looks like RangeIntersectionIterator.java can be pretty inefficient in 
> some cases. Let's take the following query:
> SELECT data FROM table WHERE index1 = 'foo' AND index2 = 'bar';
> In this case:
> * index1 = 'foo' will match 2 items
> * index2 = 'bar' will match ~300k items
> On my setup, the query will take ~1 sec, most of the time being spent in 
> disk.TokenTree.getTokenAt().
> If I patch RangeIntersectionIterator so that it doesn't try to do the 
> intersection (and effectively only uses 'index1') the query will run in a few 
> tenths of milliseconds.
> I see multiple solutions for that:
> * Add a static threshold to avoid the use of the index for the intersection 
> when we know it will be slow. Probably when the range size factor is very 
> small and the range size is big.
> * CASSANDRA-10765





[jira] [Commented] (CASSANDRA-12915) SASI: Index intersection with an empty range really inefficient

2017-03-06 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15898148#comment-15898148
 ] 

Corentin Chary commented on CASSANDRA-12915:


Could you rephrase the question? I thought I answered everything from [this 
comment|https://issues.apache.org/jira/browse/CASSANDRA-12915?focusedCommentId=15897393=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15897393]
 but it looks like I didn't.

The idea of my approach is that I'm looking for this behavior:

{code}
builder = RangeIntersectionIterator.builder(strategy);
builder.add(new LongIterator(new long[] {}));
builder.add(new LongIterator(new long[] {1}));
range = builder.build();

Assert.assertEquals(0, range.getCount());
Assert.assertFalse(range.hasNext()); // optimized through isOverlapping() returning false
{code}

In other words, adding an empty iterator to a RangeIntersectionIterator should 
make it empty, and there is a strong difference between an empty and a null 
iterator. I believe that in your case the empty iterator will just get ignored, 
because you would need to remove this check: 
https://github.com/ifesdjeen/cassandra/blob/78b1ff630536b0f48787ced74a66d702d13637ba/src/java/org/apache/cassandra/index/sasi/utils/RangeIterator.java#L151
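A minimal model of the behavior asked for here, sketched in Python (the class and method names are illustrative stand-ins, not Cassandra's actual builder API): an empty input must not be silently dropped, and must force the built intersection to be empty.

{code}
class IntersectionBuilder:
    """Toy stand-in for RangeIntersectionIterator.builder() behavior."""

    def __init__(self):
        self.ranges = []
        self.saw_empty = False

    def add(self, tokens):
        if len(tokens) == 0:
            # Remember the empty input instead of ignoring it.
            self.saw_empty = True
        else:
            self.ranges.append(set(tokens))
        return self

    def build(self):
        # An empty input short-circuits the whole intersection
        # (the isOverlapping() == false fast path).
        if self.saw_empty or not self.ranges:
            return []
        result = self.ranges[0]
        for r in self.ranges[1:]:
            result &= r
        return sorted(result)

builder = IntersectionBuilder()
builder.add([]).add([1])
print(builder.build())  # -> []
{code}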

> SASI: Index intersection with an empty range really inefficient
> ---
>
> Key: CASSANDRA-12915
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12915
> Project: Cassandra
>  Issue Type: Improvement
>  Components: sasi
>Reporter: Corentin Chary
>Assignee: Corentin Chary
> Fix For: 3.11.x, 4.x
>
>
> It looks like RangeIntersectionIterator.java can be pretty inefficient in 
> some cases. Let's take the following query:
> SELECT data FROM table WHERE index1 = 'foo' AND index2 = 'bar';
> In this case:
> * index1 = 'foo' will match 2 items
> * index2 = 'bar' will match ~300k items
> On my setup, the query will take ~1 sec, most of the time being spent in 
> disk.TokenTree.getTokenAt().
> If I patch RangeIntersectionIterator so that it doesn't try to do the 
> intersection (and effectively only uses 'index1') the query will run in a few 
> tenths of milliseconds.
> I see multiple solutions for that:
> * Add a static threshold to avoid the use of the index for the intersection 
> when we know it will be slow. Probably when the range size factor is very 
> small and the range size is big.
> * CASSANDRA-10765





[jira] [Commented] (CASSANDRA-12915) SASI: Index intersection with an empty range really inefficient

2017-03-06 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897545#comment-15897545
 ] 

Corentin Chary commented on CASSANDRA-12915:


The fact that you didn't change the following line makes me think that your 
patch doesn't really do what we need:
{code}
Assert.assertEquals(1L, builder.add(new LongIterator(new long[] {})).rangeCount());
{code}

Empty ranges really should not get ignored, and the changes made in 
https://github.com/ifesdjeen/cassandra/commit/78b1ff630536b0f48787ced74a66d702d13637ba#diff-22e58be2cfd42af959cb63c97de7eb3cR246
 show that the code does not behave the way we would like it to.

> SASI: Index intersection with an empty range really inefficient
> ---
>
> Key: CASSANDRA-12915
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12915
> Project: Cassandra
>  Issue Type: Improvement
>  Components: sasi
>Reporter: Corentin Chary
>Assignee: Corentin Chary
> Fix For: 3.11.x, 4.x
>
>
> It looks like RangeIntersectionIterator.java can be pretty inefficient in 
> some cases. Let's take the following query:
> SELECT data FROM table WHERE index1 = 'foo' AND index2 = 'bar';
> In this case:
> * index1 = 'foo' will match 2 items
> * index2 = 'bar' will match ~300k items
> On my setup, the query will take ~1 sec, most of the time being spent in 
> disk.TokenTree.getTokenAt().
> If I patch RangeIntersectionIterator so that it doesn't try to do the 
> intersection (and effectively only uses 'index1') the query will run in a few 
> tenths of milliseconds.
> I see multiple solutions for that:
> * Add a static threshold to avoid the use of the index for the intersection 
> when we know it will be slow. Probably when the range size factor is very 
> small and the range size is big.
> * CASSANDRA-10765





[jira] [Commented] (CASSANDRA-12915) SASI: Index intersection with an empty range really inefficient

2017-03-06 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15897430#comment-15897430
 ] 

Corentin Chary commented on CASSANDRA-12915:


* Removing ranges.isEmpty() happens in another function; removing it doesn't 
change anything, as forEach() will iterate over an empty list.
* True for min() and max(). It's this way for the switch() because computing 
min/max keys with an empty range doesn't make much sense.

Anything else? If not, I'll remove the duplicated code in min() and max().

> SASI: Index intersection with an empty range really inefficient
> ---
>
> Key: CASSANDRA-12915
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12915
> Project: Cassandra
>  Issue Type: Improvement
>  Components: sasi
>Reporter: Corentin Chary
>Assignee: Corentin Chary
> Fix For: 3.11.x, 4.x
>
>
> It looks like RangeIntersectionIterator.java can be pretty inefficient in 
> some cases. Let's take the following query:
> SELECT data FROM table WHERE index1 = 'foo' AND index2 = 'bar';
> In this case:
> * index1 = 'foo' will match 2 items
> * index2 = 'bar' will match ~300k items
> On my setup, the query will take ~1 sec, most of the time being spent in 
> disk.TokenTree.getTokenAt().
> If I patch RangeIntersectionIterator so that it doesn't try to do the 
> intersection (and effectively only uses 'index1') the query will run in a few 
> tenths of milliseconds.
> I see multiple solutions for that:
> * Add a static threshold to avoid the use of the index for the intersection 
> when we know it will be slow. Probably when the range size factor is very 
> small and the range size is big.
> * CASSANDRA-10765





[jira] [Updated] (CASSANDRA-12915) SASI: Index intersection with an empty range really inefficient

2017-02-17 Thread Corentin Chary (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corentin Chary updated CASSANDRA-12915:
---
Fix Version/s: 4.x
   Status: Patch Available  (was: Open)

> SASI: Index intersection with an empty range really inefficient
> ---
>
> Key: CASSANDRA-12915
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12915
> Project: Cassandra
>  Issue Type: Improvement
>  Components: sasi
>Reporter: Corentin Chary
>Assignee: Corentin Chary
> Fix For: 3.11.x, 4.x
>
>
> It looks like RangeIntersectionIterator.java can be pretty inefficient in 
> some cases. Let's take the following query:
> SELECT data FROM table WHERE index1 = 'foo' AND index2 = 'bar';
> In this case:
> * index1 = 'foo' will match 2 items
> * index2 = 'bar' will match ~300k items
> On my setup, the query will take ~1 sec, most of the time being spent in 
> disk.TokenTree.getTokenAt().
> If I patch RangeIntersectionIterator so that it doesn't try to do the 
> intersection (and effectively only uses 'index1') the query will run in a few 
> tenths of milliseconds.
> I see multiple solutions for that:
> * Add a static threshold to avoid the use of the index for the intersection 
> when we know it will be slow. Probably when the range size factor is very 
> small and the range size is big.
> * CASSANDRA-10765





[jira] [Commented] (CASSANDRA-12915) SASI: Index intersection with an empty range really inefficient

2017-02-17 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871958#comment-15871958
 ] 

Corentin Chary commented on CASSANDRA-12915:


{code}
CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 
'replication_factor': '1'}  AND durable_writes = true;

CREATE TABLE test.test (
r text PRIMARY KEY,
a text,
b text,
c text,
data text
);

CREATE CUSTOM INDEX test_a_idx ON test.test (a) USING 
'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'analyzer_class': 
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
'case_sensitive': 'true'};
CREATE CUSTOM INDEX test_c_idx ON test.test (c) USING 
'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'analyzer_class': 
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
'case_sensitive': 'true'};
CREATE CUSTOM INDEX test_b_idx ON test.test (b) USING 
'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = {'analyzer_class': 
'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', 
'case_sensitive': 'true'};
{code}

{code}
$ cat > generate.py
import sys
import random

def main(args):
    n = int(args[1])

    for i in xrange(n):
        a = '0'
        b = i % 10
        c = i % (n / 10) + random.randint(0, 10)
        print ("%d,%s,%d,%d,%d" % (i, a, b, c, i))

if __name__ == '__main__':
    main(sys.argv)
$ python generate.py 200 > test.csv
{code}
{code}
COPY test.test FROM 'test.csv'  WITH MAXBATCHSIZE = 100 AND MAXATTEMPTS = 10 
AND MAXINSERTERRORS = 99;
{code}

{code}
cqlsh> SELECT * FROM test.test WHERE a = '1' AND c = '38151' LIMIT 1 ALLOW FILTERING;

 r | a | b | c | data
---+---+---+---+------

(0 rows)

Tracing session: fbc23200-f522-11e6-95df-69d39475f5a8

 activity                                                                                                                                              | timestamp                  | source    | source_elapsed | client
-------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+-----------
 Execute CQL3 query                                                                                                                                    | 2017-02-17 16:08:48.288000 | 127.0.0.1 |              0 | 127.0.0.1
 Parsing SELECT * FROM test.test WHERE a = '1' AND c = '38151' LIMIT 1 ALLOW FILTERING; [Native-Transport-Requests-1]                                  | 2017-02-17 16:08:48.288000 | 127.0.0.1 |            268 | 127.0.0.1
 Preparing statement [Native-Transport-Requests-1]                                                                                                     | 2017-02-17 16:08:48.289000 | 127.0.0.1 |            513 | 127.0.0.1
 Index mean cardinalities are test_a_idx:-9223372036854775808,test_c_idx:-9223372036854775808. Scanning with test_a_idx. [Native-Transport-Requests-1] | 2017-02-17 16:08:48.289000 | 127.0.0.1 |            913 | 127.0.0.1
 Computing ranges to query [Native-Transport-Requests-1]                                                                                               | 2017-02-17 16:08:48.289000 | 127.0.0.1 |           1027 | 127.0.0.1
 Submitting range requests on 257 ranges with a concurrency of 1 (-3.24259165E16 rows per range expected) [Native-Transport-Requests-1]                | 2017-02-17 16:08:48.289001 | 127.0.0.1 |           1319 | 127.0.0.1
 Submitted 1 concurrent range requests [Native-Transport-Requests-1]                                                                                   | 2017-02-17 16:08:48.29     | 127.0.0.1 |           2229 | 127.0.0.1
 Executing read on test.test using index test_a_idx [ReadStage-3]                                                                                      | 2017-02-17 16:08:48.292000 | 127.0.0.1 |           3494 | 127.0.0.1
 Read 0 live and 0 tombstone cells [ReadStage-3]                                                                                                       | 2017-02-17 16:08:48.293000 | 127.0.0.1 |           4694 | 127.0.0.1
 Request complete                                                                                                                                      | 2017-02-17 16:08:48.292930 | 127.0.0.1 |           4930 | 127.0.0.1
{code}


Yay! No more iterating over the useless index.

Patch is on https://github.com/iksaif/cassandra/tree/sasi-null-intersect


> SASI: Index intersection with an empty range really inefficient
> ---
>
> Key: CASSANDRA-12915
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12915
> Project: Cassandra
>  Issue Type: Improvement
>  Components: 

[jira] [Updated] (CASSANDRA-13189) Use prompt_toolkit in cqlsh

2017-02-17 Thread Corentin Chary (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corentin Chary updated CASSANDRA-13189:
---
Description: 
prompt_toolkit is an alternative to readline 
(https://github.com/jonathanslenders/python-prompt-toolkit) and is used in a 
lot of software, including the upcoming version of ipython.

I'm working on an initial version that keeps compatibility with readline, which 
is available here: https://github.com/iksaif/cassandra/tree/prompt_toolkit

It's still missing tests and a few things, but I'm opening this for tracking 
and feedback.

!cqlsh-prompt-tookit.png|thumbnail!

  was:
prompt_toolkit is an alternative to readline 
(https://github.com/jonathanslenders/python-prompt-toolkit) and is used in a 
lot of software, including the upcoming version of ipython.

I'm working on an initial version that keeps compatibility with readline, which 
is available here: https://github.com/iksaif/cassandra/tree/prompt_toolkit

It's still missing tests and a few things, but I'm opening this for tracking 
and feedback.

!https://issues.apache.org/jira/secure/attachment/12851335/cqlsh-prompt-tookit.png|thumbnail!


> Use prompt_toolkit in cqlsh
> ---
>
> Key: CASSANDRA-13189
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13189
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Tools
>Reporter: Corentin Chary
>Assignee: Corentin Chary
>Priority: Minor
> Attachments: cqlsh-prompt-tookit.png
>
>
> prompt_toolkit is an alternative to readline 
> (https://github.com/jonathanslenders/python-prompt-toolkit) and is used in a 
> lot of software, including the upcoming version of ipython.
> I'm working on an initial version that keeps compatibility with readline, 
> which is available here: 
> https://github.com/iksaif/cassandra/tree/prompt_toolkit
> It's still missing tests and a few things, but I'm opening this for tracking 
> and feedback.
> !cqlsh-prompt-tookit.png|thumbnail!





[jira] [Commented] (CASSANDRA-13189) Use prompt_toolkit in cqlsh

2017-02-17 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871933#comment-15871933
 ] 

Corentin Chary commented on CASSANDRA-13189:


!cqlsh-prompt-tookit.png|thumbnail!

> Use prompt_toolkit in cqlsh
> ---
>
> Key: CASSANDRA-13189
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13189
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Tools
>Reporter: Corentin Chary
>Assignee: Corentin Chary
>Priority: Minor
> Attachments: cqlsh-prompt-tookit.png
>
>
> prompt_toolkit is an alternative to readline 
> (https://github.com/jonathanslenders/python-prompt-toolkit) and is used in a 
> lot of software, including the upcoming version of ipython.
> I'm working on an initial version that keeps compatibility with readline, 
> which is available here: 
> https://github.com/iksaif/cassandra/tree/prompt_toolkit
> It's still missing tests and a few things, but I'm opening this for tracking 
> and feedback.
> !cqlsh-prompt-tookit.png|thumbnail!





[jira] [Updated] (CASSANDRA-13189) Use prompt_toolkit in cqlsh

2017-02-17 Thread Corentin Chary (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corentin Chary updated CASSANDRA-13189:
---
Description: 
prompt_toolkit is an alternative to readline 
(https://github.com/jonathanslenders/python-prompt-toolkit) and is used in a 
lot of software, including the upcoming version of ipython.

I'm working on an initial version that keeps compatibility with readline, which 
is available here: https://github.com/iksaif/cassandra/tree/prompt_toolkit

It's still missing tests and a few things, but I'm opening this for tracking 
and feedback.

!https://issues.apache.org/jira/secure/attachment/12851335/cqlsh-prompt-tookit.png|thumbnail!

  was:
prompt_toolkit is an alternative to readline 
(https://github.com/jonathanslenders/python-prompt-toolkit) and is used in a 
lot of software, including the upcoming version of ipython.

I'm working on an initial version that keeps compatibility with readline, which 
is available here: https://github.com/iksaif/cassandra/tree/prompt_toolkit

It's still missing tests and a few things, but I'm opening this for tracking 
and feedback.

!cqlsh-prompt-tookit.png!


> Use prompt_toolkit in cqlsh
> ---
>
> Key: CASSANDRA-13189
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13189
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Tools
>Reporter: Corentin Chary
>Assignee: Corentin Chary
>Priority: Minor
> Attachments: cqlsh-prompt-tookit.png
>
>
> prompt_toolkit is an alternative to readline 
> (https://github.com/jonathanslenders/python-prompt-toolkit) and is used in a 
> lot of software, including the upcomming version of ipython.
> I'm working on an initial version that keeps compatibility with readline, 
> which is available here: 
> https://github.com/iksaif/cassandra/tree/prompt_toolkit
> It's still missing tests and a few things, but I'm opening this for tracking 
> and feedbacks.
> !https://issues.apache.org/jira/secure/attachment/12851335/cqlsh-prompt-tookit.png|thumbnail!





[jira] [Comment Edited] (CASSANDRA-13189) Use prompt_toolkit in cqlsh

2017-02-17 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15871933#comment-15871933
 ] 

Corentin Chary edited comment on CASSANDRA-13189 at 2/17/17 2:48 PM:
-

!cqlsh-prompt-tookit.png!


was (Author: iksaif):
!cqlsh-prompt-tookit.png|thumbnail!

> Use prompt_toolkit in cqlsh
> ---
>
> Key: CASSANDRA-13189
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13189
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Tools
>Reporter: Corentin Chary
>Assignee: Corentin Chary
>Priority: Minor
> Attachments: cqlsh-prompt-tookit.png
>
>
> prompt_toolkit is an alternative to readline 
> (https://github.com/jonathanslenders/python-prompt-toolkit) and is used in a 
> lot of software, including the upcoming version of ipython.
> I'm working on an initial version that keeps compatibility with readline, 
> which is available here: 
> https://github.com/iksaif/cassandra/tree/prompt_toolkit
> It's still missing tests and a few things, but I'm opening this for tracking 
> and feedback.
> !cqlsh-prompt-tookit.png|thumbnail!





[jira] [Updated] (CASSANDRA-13189) Use prompt_toolkit in cqlsh

2017-02-17 Thread Corentin Chary (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corentin Chary updated CASSANDRA-13189:
---
Description: 
prompt_toolkit is an alternative to readline 
(https://github.com/jonathanslenders/python-prompt-toolkit) and is used in a 
lot of software, including the upcoming version of ipython.

I'm working on an initial version that keeps compatibility with readline, which 
is available here: https://github.com/iksaif/cassandra/tree/prompt_toolkit

It's still missing tests and a few things, but I'm opening this for tracking 
and feedback.

!cqlsh-prompt-tookit.png!

  was:
prompt_toolkit is an alternative to readline 
(https://github.com/jonathanslenders/python-prompt-toolkit) and is used in a 
lot of software, including the upcoming version of ipython.

I'm working on an initial version that keeps compatibility with readline, which 
is available here: https://github.com/iksaif/cassandra/tree/prompt_toolkit

It's still missing tests and a few things, but I'm opening this for tracking 
and feedback.


> Use prompt_toolkit in cqlsh
> ---
>
> Key: CASSANDRA-13189
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13189
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Tools
>Reporter: Corentin Chary
>Assignee: Corentin Chary
>Priority: Minor
> Attachments: cqlsh-prompt-tookit.png
>
>
> prompt_toolkit is an alternative to readline 
> (https://github.com/jonathanslenders/python-prompt-toolkit) and is used in a 
> lot of software, including the upcoming version of ipython.
> I'm working on an initial version that keeps compatibility with readline, 
> which is available here: 
> https://github.com/iksaif/cassandra/tree/prompt_toolkit
> It's still missing tests and a few things, but I'm opening this for tracking 
> and feedback.
> !cqlsh-prompt-tookit.png!





[jira] [Updated] (CASSANDRA-12915) SASI: Index intersection with an empty range really inefficient

2017-02-17 Thread Corentin Chary (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Corentin Chary updated CASSANDRA-12915:
---
Summary: SASI: Index intersection with an empty range really inefficient  
(was: SASI: Index intersection can be very inefficient)

> SASI: Index intersection with an empty range really inefficient
> ---
>
> Key: CASSANDRA-12915
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12915
> Project: Cassandra
>  Issue Type: Improvement
>  Components: sasi
>Reporter: Corentin Chary
>Assignee: Corentin Chary
> Fix For: 3.11.x
>
>
> It looks like RangeIntersectionIterator.java can be pretty inefficient in 
> some cases. Let's take the following query:
> SELECT data FROM table WHERE index1 = 'foo' AND index2 = 'bar';
> In this case:
> * index1 = 'foo' will match 2 items
> * index2 = 'bar' will match ~300k items
> On my setup, the query will take ~1 sec, most of the time being spent in 
> disk.TokenTree.getTokenAt().
> If I patch RangeIntersectionIterator so that it doesn't try to do the 
> intersection (and effectively only uses 'index1') the query will run in a few 
> tenths of milliseconds.
> I see multiple solutions for that:
> * Add a static threshold to avoid the use of the index for the intersection 
> when we know it will be slow. Probably when the range size factor is very 
> small and the range size is big.
> * CASSANDRA-10765





[jira] [Commented] (CASSANDRA-13038) 33% of compaction time spent in StreamingHistogram.update()

2017-02-13 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865188#comment-15865188
 ] 

Corentin Chary commented on CASSANDRA-13038:


The code and the remaining property look good to me.
The benchmark code could probably be slightly refactored, but that's not 
really a big deal.

> 33% of compaction time spent in StreamingHistogram.update()
> ---
>
> Key: CASSANDRA-13038
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13038
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
>Reporter: Corentin Chary
>Assignee: Jeff Jirsa
> Attachments: compaction-speedup.patch, 
> compaction-streaminghistrogram.png, profiler-snapshot.nps
>
>
> With the following table, that contains a *lot* of cells: 
> {code}
> CREATE TABLE biggraphite.datapoints_11520p_60s (
> metric uuid,
> time_start_ms bigint,
> offset smallint,
> count int,
> value double,
> PRIMARY KEY ((metric, time_start_ms), offset)
> ) WITH CLUSTERING ORDER BY (offset DESC)
>   AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 
> 'compaction_window_size': '6', 'compaction_window_unit': 'HOURS', 
> 'max_threshold': '32', 'min_threshold': '6'};
> Keyspace : biggraphite
> Read Count: 1822
> Read Latency: 1.8870054884742042 ms.
> Write Count: 2212271647
> Write Latency: 0.027705127678653473 ms.
> Pending Flushes: 0
> Table: datapoints_11520p_60s
> SSTable count: 47
> Space used (live): 300417555945
> Space used (total): 303147395017
> Space used by snapshots (total): 0
> Off heap memory used (total): 207453042
> SSTable Compression Ratio: 0.4955200053039823
> Number of keys (estimate): 16343723
> Memtable cell count: 220576
> Memtable data size: 17115128
> Memtable off heap memory used: 0
> Memtable switch count: 2872
> Local read count: 0
> Local read latency: NaN ms
> Local write count: 1103167888
> Local write latency: 0.025 ms
> Pending flushes: 0
> Percent repaired: 0.0
> Bloom filter false positives: 0
> Bloom filter false ratio: 0.0
> Bloom filter space used: 105118296
> Bloom filter off heap memory used: 106547192
> Index summary off heap memory used: 27730962
> Compression metadata off heap memory used: 73174888
> Compacted partition minimum bytes: 61
> Compacted partition maximum bytes: 51012
> Compacted partition mean bytes: 7899
> Average live cells per slice (last five minutes): NaN
> Maximum live cells per slice (last five minutes): 0
> Average tombstones per slice (last five minutes): NaN
> Maximum tombstones per slice (last five minutes): 0
> Dropped Mutations: 0
> {code}
> It looks like a good chunk of the compaction time is lost in 
> StreamingHistogram.update() (which is used to store the estimated tombstone 
> drop times).
> This could be caused by a huge number of distinct deletion times, which would 
> make the bins huge, but this histogram should be capped to 100 keys. It's 
> more likely caused by the huge number of cells.
> A simple solution could be to only take into account part of the cells; the 
> fact that this table uses TWCS also gives us an additional hint that sampling 
> deletion times would be fine.
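A rough sketch of the idea described above, in Python (illustrative only, not Cassandra's Java StreamingHistogram; the sample_rate knob is a hypothetical name for the "only account for part of the cells" idea): a histogram capped at 100 bins that merges the two closest bins when full.

{code}
import random

class StreamingHistogram:
    """Capped streaming histogram sketch, in the spirit of
    StreamingHistogram.update(); sample_rate is a hypothetical knob."""

    def __init__(self, max_bins=100, sample_rate=1.0):
        self.max_bins = max_bins
        self.sample_rate = sample_rate
        self.bins = {}  # bin position -> count

    def update(self, point, count=1):
        # Optionally sample: skip a fraction of the cells entirely.
        if self.sample_rate < 1.0 and random.random() > self.sample_rate:
            return
        self.bins[point] = self.bins.get(point, 0) + count
        if len(self.bins) > self.max_bins:
            self._merge_closest()

    def _merge_closest(self):
        keys = sorted(self.bins)
        # Merge the pair of adjacent bins with the smallest gap into a
        # count-weighted average position, preserving the total count.
        i = min(range(len(keys) - 1), key=lambda j: keys[j + 1] - keys[j])
        c1, c2 = self.bins.pop(keys[i]), self.bins.pop(keys[i + 1])
        merged = (keys[i] * c1 + keys[i + 1] * c2) / (c1 + c2)
        self.bins[merged] = self.bins.get(merged, 0) + c1 + c2

h = StreamingHistogram(max_bins=100)
for t in range(10000):
    h.update(t)
print(len(h.bins) <= 100)  # -> True
{code}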





[jira] [Comment Edited] (CASSANDRA-13038) 33% of compaction time spent in StreamingHistogram.update()

2017-02-13 Thread Corentin Chary (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15865188#comment-15865188
 ] 

Corentin Chary edited comment on CASSANDRA-13038 at 2/14/17 6:41 AM:
-

The code and the remaining property look good to me.
The benchmark code could probably be slightly refactored, but that's not 
really a big deal.
Thanks for doing it!


was (Author: iksaif):
The code and the remaining property look good to me.
The benchmark code could probably be slightly refactored, but that's not 
really a big deal.

> 33% of compaction time spent in StreamingHistogram.update()
> ---
>
> Key: CASSANDRA-13038
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13038
> Project: Cassandra
>  Issue Type: Bug
>  Components: Compaction
>Reporter: Corentin Chary
>Assignee: Jeff Jirsa
> Attachments: compaction-speedup.patch, 
> compaction-streaminghistrogram.png, profiler-snapshot.nps
>
>
> With the following table, that contains a *lot* of cells: 
> {code}
> CREATE TABLE biggraphite.datapoints_11520p_60s (
> metric uuid,
> time_start_ms bigint,
> offset smallint,
> count int,
> value double,
> PRIMARY KEY ((metric, time_start_ms), offset)
> ) WITH CLUSTERING ORDER BY (offset DESC)
>   AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.TimeWindowCompactionStrategy', 
> 'compaction_window_size': '6', 'compaction_window_unit': 'HOURS', 
> 'max_threshold': '32', 'min_threshold': '6'};
> Keyspace : biggraphite
> Read Count: 1822
> Read Latency: 1.8870054884742042 ms.
> Write Count: 2212271647
> Write Latency: 0.027705127678653473 ms.
> Pending Flushes: 0
> Table: datapoints_11520p_60s
> SSTable count: 47
> Space used (live): 300417555945
> Space used (total): 303147395017
> Space used by snapshots (total): 0
> Off heap memory used (total): 207453042
> SSTable Compression Ratio: 0.4955200053039823
> Number of keys (estimate): 16343723
> Memtable cell count: 220576
> Memtable data size: 17115128
> Memtable off heap memory used: 0
> Memtable switch count: 2872
> Local read count: 0
> Local read latency: NaN ms
> Local write count: 1103167888
> Local write latency: 0.025 ms
> Pending flushes: 0
> Percent repaired: 0.0
> Bloom filter false positives: 0
> Bloom filter false ratio: 0.0
> Bloom filter space used: 105118296
> Bloom filter off heap memory used: 106547192
> Index summary off heap memory used: 27730962
> Compression metadata off heap memory used: 73174888
> Compacted partition minimum bytes: 61
> Compacted partition maximum bytes: 51012
> Compacted partition mean bytes: 7899
> Average live cells per slice (last five minutes): NaN
> Maximum live cells per slice (last five minutes): 0
> Average tombstones per slice (last five minutes): NaN
> Maximum tombstones per slice (last five minutes): 0
> Dropped Mutations: 0
> {code}
> It looks like a good chunk of the compaction time is lost in 
> StreamingHistogram.update() (which is used to store the estimated tombstone 
> drop times).
> This could be caused by a huge number of distinct deletion times, which would 
> make the bins huge, but this histogram should be capped to 100 keys. It's 
> more likely caused by the huge number of cells.
> A simple solution could be to only take into account part of the cells; the 
> fact that this table uses TWCS also gives us an additional hint that sampling 
> deletion times would be fine.




