[jira] [Updated] (CASSANDRA-11052) Cannot use Java 8 lambda expression inside UDF code body
[ https://issues.apache.org/jira/browse/CASSANDRA-11052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Bridges updated CASSANDRA-11052:
Attachment: 11052-2.patch

> Cannot use Java 8 lambda expression inside UDF code body
>
> Key: CASSANDRA-11052
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11052
> Project: Cassandra
> Issue Type: Improvement
> Components: CQL
> Reporter: DOAN DuyHai
> Assignee: Robert Stupp
> Fix For: 3.x
> Attachments: 11052-2.patch, 11052.patch
>
> When creating the following **UDF** using Java 8 lambda syntax
> {code:sql}
> CREATE FUNCTION IF NOT EXISTS music.udf(state map, styles list)
> RETURNS NULL ON NULL INPUT
> RETURNS map
> LANGUAGE java
> AS $$
>   styles.forEach((Object o) -> {
>     String style = (String)o;
>     if(state.containsKey(style)) {
>       state.put(style, (Long)state.get(style)+1);
>     } else {
>       state.put(style, 1L);
>     }
>   });
>
>   return state;
> $$;
> {code}
> I got the following exception:
> {code:java}
> Caused by: com.datastax.driver.core.exceptions.InvalidQueryException: Could not compile function 'music.udf' from Java source:
> org.apache.cassandra.exceptions.InvalidRequestException: Java source compilation failed:
> Line 2: The type java.util.function.Consumer cannot be resolved. It is indirectly referenced from required .class files
> Line 2: The method forEach(Consumer) from the type Iterable refers to the missing type Consumer
> Line 2: The target type of this expression must be a functional interface
>     at com.datastax.driver.core.Responses$Error.asException(Responses.java:136)
>     at com.datastax.driver.core.DefaultResultSetFuture.onSet(DefaultResultSetFuture.java:179)
>     at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:184)
>     at com.datastax.driver.core.RequestHandler.access$2500(RequestHandler.java:43)
>     at com.datastax.driver.core.RequestHandler$SpeculativeExecution.setFinalResult(RequestHandler.java:798)
>     at com.datastax.driver.core.RequestHandler$SpeculativeExecution.onSet(RequestHandler.java:617)
>     at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:1005)
>     at com.datastax.driver.core.Connection$Dispatcher.channelRead0(Connection.java:928)
>     at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
>     at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
>     at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
>     at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:276)
>     at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:263)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:318)
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:304)
>     at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
>     at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)
>     ... 1 more
> {code}
> It looks like the compiler requires importing java.util.function.Consumer, but I have checked the source code and the compiler options already support Java 8 source code, so I'm pretty puzzled here...
> /cc [~snazy]
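One workaround worth trying while lambdas are rejected is to avoid the functional interface entirely. The sketch below is a minimal, hypothetical standalone version of the UDF body above (the class name `StyleCounter` and method `countStyles` are illustrative, not Cassandra APIs), assuming text keys and bigint counts as the casts in the original body suggest. It replaces `styles.forEach(lambda)` with an enhanced for loop, so the source never names `java.util.function.Consumer`. Whether the UDF sandbox accepts it is an assumption not confirmed in this thread; if the arguments arrive as raw collections, the original casts may still be needed inside `$$ ... $$`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StyleCounter {
    // Same counting logic as the UDF body in the report, written with an
    // enhanced for loop instead of styles.forEach(lambda), so no reference
    // to java.util.function.Consumer is needed.
    public static Map<String, Long> countStyles(Map<String, Long> state, List<String> styles) {
        for (String style : styles) {
            if (state.containsKey(style)) {
                state.put(style, state.get(style) + 1);
            } else {
                state.put(style, 1L);
            }
        }
        return state;
    }

    public static void main(String[] args) {
        Map<String, Long> state = new HashMap<>();
        countStyles(state, Arrays.asList("jazz", "rock", "jazz"));
        System.out.println(state.get("jazz") + " " + state.get("rock")); // prints "2 1"
    }
}
```

Inside `CREATE FUNCTION`, only the loop and the `return state;` would go between the `$$` delimiters.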
[jira] [Commented] (CASSANDRA-11052) Cannot use Java 8 lambda expression inside UDF code body
[ https://issues.apache.org/jira/browse/CASSANDRA-11052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136352#comment-15136352 ]

Sean Bridges commented on CASSANDRA-11052:

What is the purpose of UDFByteCodeVerifier? Non-Java UDFs don't use UDFByteCodeVerifier, so UDFByteCodeVerifier shouldn't be used to enforce policies such as disallowing java.lang.invoke or use of the common fork-join pool. JavaScript UDFs can do both now. It seems all policy/security enforcement should be done using ThreadAwareSecurityManager or some other mechanism common to all UDFs.

To stop UDFs from using the common fork-join pool, we could set the system property java.util.concurrent.ForkJoinPool.common.threadFactory to a thread factory that never creates new threads. This would disallow using the common pool for the entire JVM, though.

I can add tests to make sure ThreadAwareSecurityManager does not allow using java.lang.invoke in a malicious way. I can also add tests that make sure ThreadAwareSecurityManager doesn't allow UDFs to create a new ForkJoinPool.
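The thread-factory idea in the comment above can be sketched as follows. This is a minimal illustration, not a patch from this ticket: the class name `NoThreadsFactory` is hypothetical. As of JDK 8 the common pool loads the class named by `java.util.concurrent.ForkJoinPool.common.threadFactory` through the system class loader, so it must be public with a public no-arg constructor. Returning `null` from `newThread` means the pool gets no worker, so, as the comment warns, this affects every user of the common pool in the JVM and work submitted to it may never run.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinWorkerThread;

// Hypothetical factory. To install it for the JVM-wide common pool, start with
//   -Djava.util.concurrent.ForkJoinPool.common.threadFactory=NoThreadsFactory
public class NoThreadsFactory implements ForkJoinPool.ForkJoinWorkerThreadFactory {
    @Override
    public ForkJoinWorkerThread newThread(ForkJoinPool pool) {
        // Never hand the pool a worker thread; the pool treats null as
        // "no thread created", so tasks submitted to it may never execute.
        return null;
    }

    public static void main(String[] args) {
        // Checks the factory directly; no work is submitted to any pool.
        System.out.println(new NoThreadsFactory().newThread(ForkJoinPool.commonPool())); // prints "null"
    }
}
```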
[jira] [Commented] (CASSANDRA-11052) Cannot use Java 8 lambda expression inside UDF code body
[ https://issues.apache.org/jira/browse/CASSANDRA-11052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136326#comment-15136326 ]

Sean Bridges commented on CASSANDRA-11052:

Thanks for the feedback; I'll try to limit what is allowed. What are we trying to protect against? Is this to secure Cassandra in a multi-tenant environment from a malicious tenant, or to stop a user from accidentally causing instability?
[jira] [Updated] (CASSANDRA-11052) Cannot use Java 8 lambda expression inside UDF code body
[ https://issues.apache.org/jira/browse/CASSANDRA-11052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Bridges updated CASSANDRA-11052:
Attachment: 11052.patch
[jira] [Commented] (CASSANDRA-8177) sequential repair is much more expensive than parallel repair
[ https://issues.apache.org/jira/browse/CASSANDRA-8177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14185680#comment-14185680 ]

Sean Bridges commented on CASSANDRA-8177:

[~jbellis] Is the fix version of 2.1.2 right? I'm not sure this affects 2.1.

sequential repair is much more expensive than parallel repair
Key: CASSANDRA-8177
URL: https://issues.apache.org/jira/browse/CASSANDRA-8177
Project: Cassandra
Issue Type: Bug
Reporter: Sean Bridges
Assignee: Yuki Morishita
Fix For: 2.1.2
Attachments: cassc-week.png, iostats.png

This is with 2.0.10.

The attached graph shows I/O read/write throughput (as measured with iostat) when doing repairs. The large hump on the left is a sequential repair of one node. The two much smaller peaks on the right are parallel repairs. This is a 3-node cluster using vnodes (I know vnodes on small clusters aren't recommended). Cassandra reports a load of 40 GB.

We noticed a similar problem with a larger cluster.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-8177) sequential repair is much more expensive than parallel repair
[ https://issues.apache.org/jira/browse/CASSANDRA-8177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183783#comment-14183783 ]

Sean Bridges edited comment on CASSANDRA-8177 at 10/25/14 12:18 AM:

{quote}
My guess for sequential repair generating lots of IO is that, when reading from snapshot, it is hitting disk for each snapshot SSTable to read its bloom filters, index files etc
{quote}
When you snapshot, you are hard-linking the original SSTables, so the snapshot and the originals are the same files; the OS cache shouldn't make the difference.

was (Author: sgbridges):
{quote}
My guess for sequential repair generating lots of IO is that, when reading from snapshot, it is hitting disk for each snapshot SSTable to read its bloom filters, index files etc
{quote}
When you snapshot, you are hard-linking the original SSTables, so the snapshot and the originals are the same file; the OS cache shouldn't make the difference.
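The hard-link argument above can be checked in isolation. A minimal sketch, assuming a file system that supports hard links (the file names are illustrative, not Cassandra's snapshot layout): it writes a file, creates a hard link to it the way a snapshot links each SSTable component, and asks the JDK whether both paths resolve to the same underlying file. On Linux the page cache is keyed by that underlying file, which is why reads through the snapshot path should hit the same cached pages.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class HardLinkDemo {
    // Creates a file, hard-links it (as a snapshot does for each SSTable
    // component), and reports whether both paths are the same underlying file.
    public static boolean linkSharesFile(Path dir) throws IOException {
        Path original = dir.resolve("original-Data.db");  // illustrative name
        Path snapshot = dir.resolve("snapshot-Data.db");  // illustrative name
        Files.write(original, "sstable bytes".getBytes(StandardCharsets.UTF_8));
        Files.createLink(snapshot, original); // (new link, existing file)
        return Files.isSameFile(original, snapshot);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("hardlink-demo");
        System.out.println(linkSharesFile(dir)); // prints "true" where hard links are supported
    }
}
```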
[jira] [Commented] (CASSANDRA-8177) sequential repair is much more expensive than parallel repair
[ https://issues.apache.org/jira/browse/CASSANDRA-8177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183783#comment-14183783 ]

Sean Bridges commented on CASSANDRA-8177:

{quote}
My guess for sequential repair generating lots of IO is that, when reading from snapshot, it is hitting disk for each snapshot SSTable to read its bloom filters, index files etc
{quote}
When you snapshot, you are hard-linking the original SSTables, so the snapshot and the originals are the same file; the OS cache shouldn't make the difference.
[jira] [Created] (CASSANDRA-8177) sequential repair is much more expensive than parallel repair
Sean Bridges created CASSANDRA-8177:

Summary: sequential repair is much more expensive than parallel repair
[jira] [Updated] (CASSANDRA-8177) sequential repair is much more expensive than parallel repair
[ https://issues.apache.org/jira/browse/CASSANDRA-8177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Bridges updated CASSANDRA-8177:
Attachment: iostats.png
[jira] [Commented] (CASSANDRA-8177) sequential repair is much more expensive than parallel repair
[ https://issues.apache.org/jira/browse/CASSANDRA-8177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182157#comment-14182157 ]

Sean Bridges commented on CASSANDRA-8177:

We can't easily upgrade to 2.1. I don't think this issue is a dupe of CASSANDRA-5220. Looking at the graphs, I think something is quite wrong with sequential or parallel repair. With a 3-node cluster, using sequential repair shouldn't make repairs take 13 times as long and use a lot more I/O.
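The "13 times as long" figure is easier to judge against a rough expectation. A minimal sketch under a deliberately naive model, all of it assumption rather than anything stated in this thread: every replica needs the same validation time, parallel repair validates all replicas at once, sequential repair validates them one after another, and streaming and I/O contention are ignored. With 3 replicas (assuming a replication factor equal to the 3 nodes), that model predicts roughly a 3x slowdown. The class name and the 60-second figure are hypothetical.

```java
public class RepairSlowdownModel {
    // Naive model (assumption): equal validation time per replica; parallel
    // repair runs validations concurrently, sequential runs them back to back.
    public static double expectedSlowdown(int replicas, double validationSecondsPerReplica) {
        double parallel = validationSecondsPerReplica;               // wall time, concurrent
        double sequential = replicas * validationSecondsPerReplica;  // wall time, one at a time
        return sequential / parallel;
    }

    public static void main(String[] args) {
        double expected = expectedSlowdown(3, 60.0); // hypothetical 60 s per replica
        double observed = 13.0;                      // ratio reported in the comment above
        System.out.println("expected ~" + expected + "x, observed " + observed + "x");
        // prints "expected ~3.0x, observed 13.0x"
    }
}
```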
[jira] [Reopened] (CASSANDRA-8177) sequential repair is much more expensive than parallel repair
[ https://issues.apache.org/jira/browse/CASSANDRA-8177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Bridges reopened CASSANDRA-8177:
[jira] [Commented] (CASSANDRA-6456) log listen address at startup
[ https://issues.apache.org/jira/browse/CASSANDRA-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13879451#comment-13879451 ]

Sean Bridges commented on CASSANDRA-6456:

Sorry, didn't know you were waiting for me. Latest patch looks good to me.

log listen address at startup
Key: CASSANDRA-6456
URL: https://issues.apache.org/jira/browse/CASSANDRA-6456
Project: Cassandra
Issue Type: Wish
Components: Core
Reporter: Jeremy Hanna
Assignee: Sean Bridges
Priority: Trivial
Attachments: 6456_v4_trunk.patch, CASSANDRA-6456-2.patch, CASSANDRA-6456-3.patch, CASSANDRA-6456.patch

When looking through logs from a cluster, it's sometimes handy to be able to tell a node's address from its logs. It would be convenient if, on startup, we indicated the listen address for that node.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
[jira] [Updated] (CASSANDRA-6456) log listen address at startup
[ https://issues.apache.org/jira/browse/CASSANDRA-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Bridges updated CASSANDRA-6456:
Attachment: CASSANDRA-6456-3.patch
[jira] [Commented] (CASSANDRA-6456) log listen address at startup
[ https://issues.apache.org/jira/browse/CASSANDRA-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13862774#comment-13862774 ]

Sean Bridges commented on CASSANDRA-6456:

New patch removes all the existing log lines that this change makes redundant.
[jira] [Updated] (CASSANDRA-6456) log listen address at startup
[ https://issues.apache.org/jira/browse/CASSANDRA-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Bridges updated CASSANDRA-6456:
Attachment: CASSANDRA-6456-2.patch
[jira] [Commented] (CASSANDRA-6456) log listen address at startup
[ https://issues.apache.org/jira/browse/CASSANDRA-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861183#comment-13861183 ]

Sean Bridges commented on CASSANDRA-6456:

New patch attached.

{quote}
I think we should change the format to a single line (helps when grep'ing) (see this gist)
{quote}
Changed to log on a single line, with a slightly modified format to be consistent with other log lines.

{quote}
For the original intent of this JIRA I think we need to add a call to get address or something. As the IP's in the yaml can be left blank.
{quote}
I added a line to log InetAddress.getLocalHost() on startup in case the listen address is not set.

{quote}
I think this makes some ad-hoc config logging redundant as well?
{quote}
A couple of log lines were removed with the original patch; let me know if there are more to remove.
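The fallback described in the comment above (log `InetAddress.getLocalHost()` when no listen address is configured) might look roughly like this. A minimal sketch only: the class `StartupAddressLog`, the method `describeListenAddress`, and the message text are hypothetical, and a real patch would hand the string to Cassandra's logger rather than print it.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class StartupAddressLog {
    // Hypothetical helper that builds the startup message for the listen address.
    // When listen_address is left blank in the yaml, fall back to what the JVM
    // reports for InetAddress.getLocalHost().
    public static String describeListenAddress(String configured) throws UnknownHostException {
        if (configured != null && !configured.isEmpty()) {
            return "listen_address=" + configured;
        }
        return "listen_address not set; InetAddress.getLocalHost()=" + InetAddress.getLocalHost();
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println(describeListenAddress("10.0.0.5")); // prints "listen_address=10.0.0.5"
        System.out.println(describeListenAddress(null));       // output depends on the host
    }
}
```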
[jira] [Updated] (CASSANDRA-6456) log listen address at startup
[ https://issues.apache.org/jira/browse/CASSANDRA-6456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Bridges updated CASSANDRA-6456:
Attachment: CASSANDRA-6456.patch

This patch logs all config settings on startup, except for some settings that may contain passwords.
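A rough sketch of "log all config settings on one line, except those that may contain passwords". Everything here is hypothetical: the nested `Config` class stands in for the node's configuration object, the rule "field name contains `password`" is an assumed redaction policy, and the `Node configuration:[...]` prefix and `; ` separator are illustrative. The idea is simply to reflect over public fields, mask sensitive ones, and join the rest into a single greppable line.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.StringJoiner;

public class ConfigDump {
    // Hypothetical stand-in for the node's configuration object.
    public static class Config {
        public String cluster_name = "Test Cluster";
        public String listen_address = "10.0.0.5";
        public int native_transport_port = 9042;
        public String keystore_password = "secret";
    }

    // Renders every public instance field on one line (easy to grep),
    // masking any field whose name suggests it holds a password.
    public static String describe(Object config) throws IllegalAccessException {
        StringJoiner line = new StringJoiner("; ", "Node configuration:[", "]");
        for (Field f : config.getClass().getFields()) {
            if (Modifier.isStatic(f.getModifiers())) {
                continue;
            }
            Object value = f.getName().toLowerCase().contains("password") ? "<REDACTED>" : f.get(config);
            line.add(f.getName() + "=" + value);
        }
        return line.toString();
    }

    public static void main(String[] args) throws IllegalAccessException {
        // Field order from getFields() is unspecified, so the order may vary.
        System.out.println(describe(new Config()));
    }
}
```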
[jira] [Commented] (CASSANDRA-5293) formalize that timestamps are epoch-in-micros in 2.0
[ https://issues.apache.org/jira/browse/CASSANDRA-5293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13620144#comment-13620144 ]

Sean Bridges commented on CASSANDRA-5293:

This will break a lot of our code, as we use non-epoch-in-micros values as timestamps quite a bit. It is very handy for ensuring order when you have another monotonically increasing id available.

As an example, we compute metadata for versioned objects and store the metadata in Cassandra. The version id is a monotonically increasing long, and we write the metadata to Cassandra with a timestamp of the version id. Due to retries, multiple machines may be processing the same object with different version ids, but since we always write to Cassandra with a timestamp of the version id, the latest version id always wins.

We have a couple of other use cases, but having a user-set timestamp that does not have to be epoch-in-micros is very useful. If you want a real timestamp, perhaps it would be better to add a new timestamp-micros field that is set by the coordinator and not visible to Thrift/CQL.

formalize that timestamps are epoch-in-micros in 2.0
Key: CASSANDRA-5293
URL: https://issues.apache.org/jira/browse/CASSANDRA-5293
Project: Cassandra
Issue Type: Task
Components: Core
Reporter: Jonathan Ellis
Fix For: 2.0

We've worked around "don't assume timestamps are actually timestamps", but the utility is not worth the complexity and lost opportunities to optimize this imposes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
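Cassandra reconciles two writes to the same column by keeping the one with the higher timestamp. That rule is what makes the version-id-as-timestamp scheme work. The toy model below shows a slow retry that carries an older version id arriving last and being discarded. It is plain Java with hypothetical names, not Cassandra's reconciliation code. Equal timestamps simply keep the stored value here; Cassandra has its own tie-breaking rule, which this sketch does not model. With CQL, a client can supply such a timestamp through the USING TIMESTAMP clause.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of timestamp-based last-write-wins; not Cassandra's actual reconciliation code.
final class VersionedMetadataStore {
    private static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final Map<String, Cell> cells = new HashMap<>();

    // Apply a write, keeping it only if its timestamp is newer than the stored one.
    void write(String key, String value, long timestamp) {
        Cell current = cells.get(key);
        if (current == null || timestamp > current.timestamp) {
            cells.put(key, new Cell(value, timestamp));
        }
    }

    String read(String key) {
        Cell cell = cells.get(key);
        return cell == null ? null : cell.value;
    }

    public static void main(String[] args) {
        VersionedMetadataStore store = new VersionedMetadataStore();
        // The version id is used directly as the write timestamp.
        store.write("object-42", "metadata computed from version 9", 9);
        // A slow retry finishes later but carries an older version id, so it is discarded.
        store.write("object-42", "metadata computed from version 7", 7);
        System.out.println(store.read("object-42")); // prints "metadata computed from version 9"
    }
}
```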
[jira] [Created] (CASSANDRA-5392) cassandra-all 1.2.0 pom missing netty dependency
Sean Bridges created CASSANDRA-5392:

Summary: cassandra-all 1.2.0 pom missing netty dependency
Key: CASSANDRA-5392
URL: https://issues.apache.org/jira/browse/CASSANDRA-5392
Project: Cassandra
Issue Type: Bug
Components: Packaging
Affects Versions: 1.2.3
Reporter: Sean Bridges
Fix For: 1.2.4

It seems that Cassandra now depends on Netty; however, the pom excludes this dependency. This was previously reported as CASSANDRA-5181, but the fix for 5181 added Netty to the dependencyManagement section of the pom, not the dependencies section.
[jira] [Updated] (CASSANDRA-5392) cassandra-all 1.2.0 pom missing netty dependency
[ https://issues.apache.org/jira/browse/CASSANDRA-5392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Bridges updated CASSANDRA-5392:
Attachment: CASSANDRA-5392.txt

cassandra-all 1.2.0 pom missing netty dependency
Key: CASSANDRA-5392
URL: https://issues.apache.org/jira/browse/CASSANDRA-5392
Project: Cassandra
Issue Type: Bug
Components: Packaging
Affects Versions: 1.2.3
Reporter: Sean Bridges
Fix For: 1.2.4
Attachments: CASSANDRA-5392.txt

It seems that Cassandra now depends on Netty; however, the pom excludes this dependency. This was previously reported as CASSANDRA-5181, but the fix for 5181 added Netty to the dependencyManagement section of the pom, not the dependencies section.
[jira] [Commented] (CASSANDRA-2494) Quorum reads are not consistent
[ https://issues.apache.org/jira/browse/CASSANDRA-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023233#comment-13023233 ]

Sean Bridges commented on CASSANDRA-2494:

I think the guarantee that quorum reads do not see old writes once a quorum read has seen a new write is very useful. I suspect most people already assume this guarantee holds, including, it seems, Jonathan Ellis, whose quote can be found in the email thread linked to in the bug:

{quote}
The important guarantee this gives you is that once one quorum read sees the new value, all others will too. You can't see the newest version, then see an older version on a subsequent write [sic, I assume he meant read], which is the characteristic of non-strong consistency
{quote}

Quorum reads are not consistent
Key: CASSANDRA-2494
URL: https://issues.apache.org/jira/browse/CASSANDRA-2494
Project: Cassandra
Issue Type: Bug
Reporter: Sean Bridges

As discussed in this thread:
http://www.mail-archive.com/user@cassandra.apache.org/msg12421.html

Quorum reads should be consistent. Assume we have a cluster of 3 nodes (X,Y,Z) and a replication factor of 3. If a write of N is committed to X, but not to Y and Z, then a read from X should not return N unless N has been committed to at least two nodes. To ensure this, a read from X should wait for an ack of the read repair write from either Y or Z before returning.

Are there system tests for Cassandra? If so, there should be a test similar to the original post in the email thread. One thread should write 1,2,3... at consistency level ONE. Another thread should read at consistency level QUORUM from a random host, and verify that each read is >= the last read.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
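The test proposed in the description can be sketched as a toy, single-process simulation. It uses three in-memory replicas (X, Y, Z). Writes at ONE reach a single replica. Quorum reads contact two replicas and return the newest value. The blockingReadRepair flag models the proposed fix: the newest value is written back to the contacted replicas before the read returns, standing in for waiting on the read-repair acks. Everything here, including the class and method names, is hypothetical, and the model ignores networking, failures and timestamps.

```java
import java.util.Random;

// Toy, single-process model of the proposed behaviour; not Cassandra code.
final class QuorumReadSimulation {
    private final long[] replicas;            // newest value held by each replica
    private final boolean blockingReadRepair; // the change proposed in this ticket

    QuorumReadSimulation(int replicationFactor, boolean blockingReadRepair) {
        this.replicas = new long[replicationFactor];
        this.blockingReadRepair = blockingReadRepair;
    }

    // Write at consistency level ONE: only one replica has the value when the write returns.
    void writeAtOne(int replica, long value) {
        replicas[replica] = Math.max(replicas[replica], value);
    }

    // Quorum read: return the newest value among the contacted replicas. With blocking
    // read repair, that value is first written to every contacted replica (standing in
    // for "wait for the repair acks"), so a quorum holds it before the read returns.
    long readAtQuorum(int... contacted) {
        if (contacted.length <= replicas.length / 2) {
            throw new IllegalArgumentException("not a quorum");
        }
        long newest = Long.MIN_VALUE;
        for (int r : contacted) newest = Math.max(newest, replicas[r]);
        if (blockingReadRepair) {
            for (int r : contacted) replicas[r] = newest;
        }
        return newest;
    }

    // The suggested test: write 1, 2, 3, ... at ONE to random replicas, read at QUORUM
    // from a random pair of replicas after each write, and check reads never go backwards.
    static boolean readsAreMonotonic(boolean blockingReadRepair, long seed, int steps) {
        Random random = new Random(seed);
        QuorumReadSimulation cluster = new QuorumReadSimulation(3, blockingReadRepair);
        long lastRead = 0;
        for (long value = 1; value <= steps; value++) {
            cluster.writeAtOne(random.nextInt(3), value);
            int skipped = random.nextInt(3); // with RF 3, a quorum is every replica but one
            long read = cluster.readAtQuorum((skipped + 1) % 3, (skipped + 2) % 3);
            if (read < lastRead) return false;
            lastRead = read;
        }
        return true;
    }

    public static void main(String[] args) {
        for (boolean repair : new boolean[] {false, true}) {
            QuorumReadSimulation cluster = new QuorumReadSimulation(3, repair);
            cluster.writeAtOne(0, 1);                 // value 1 reaches X only
            long first = cluster.readAtQuorum(0, 1);  // quorum {X, Y} sees 1
            long second = cluster.readAtQuorum(1, 2); // quorum {Y, Z}
            System.out.println("blocking read repair=" + repair + ": read " + first + ", then " + second);
        }
        // prints:
        // blocking read repair=false: read 1, then 0
        // blocking read repair=true: read 1, then 1
    }
}
```

The repair step is what makes the guarantee hold. Once a read returns v, both contacted replicas hold at least v, and any later pair of replicas must include one of them.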
[jira] [Commented] (CASSANDRA-2494) Quorum reads are not consistent
[ https://issues.apache.org/jira/browse/CASSANDRA-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023242#comment-13023242 ]

Sean Bridges commented on CASSANDRA-2494:

To be clear, this is a new guarantee. The current guarantee is that R + W > N gives you consistency. This bug is asking that a quorum read of A means that A has been committed to a quorum of nodes.

{quote}
How can we ensure the quorum read property that you want?
{quote}

If, when reading at quorum, no quorum can be found that agrees on a particular value, then the coordinator (?) will wait for acks of read repair writes (or perhaps just do normal writes) from a sufficient number of nodes to ensure that the value has been committed to a quorum of nodes.

Without this new guarantee it is hard for readers to function correctly. The reader does not know whether the quorum write failed or is still in progress, so unless it reads at ALL, the R + W > N guarantee does not help the reader.

Quorum reads are not consistent
Key: CASSANDRA-2494
URL: https://issues.apache.org/jira/browse/CASSANDRA-2494
Project: Cassandra
Issue Type: Bug
Reporter: Sean Bridges

As discussed in this thread:
http://www.mail-archive.com/user@cassandra.apache.org/msg12421.html

Quorum reads should be consistent. Assume we have a cluster of 3 nodes (X,Y,Z) and a replication factor of 3. If a write of N is committed to X, but not to Y and Z, then a read from X should not return N unless N has been committed to at least two nodes. To ensure this, a read from X should wait for an ack of the read repair write from either Y or Z before returning.

Are there system tests for Cassandra? If so, there should be a test similar to the original post in the email thread. One thread should write 1,2,3... at consistency level ONE. Another thread should read at consistency level QUORUM from a random host, and verify that each read is >= the last read.
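The R + W > N rule referred to above is a counting argument. Suppose a write was acknowledged by W of N replicas and a read contacts R replicas. Then R + W > N forces the two sets to share at least R + W - N replicas. Below is a small sketch of that arithmetic, using QUORUM = floor(N / 2) + 1; the helper names are hypothetical. It only describes a write that actually reached W replicas. As the comment points out, a reader cannot tell whether that happened.

```java
// Hypothetical helpers for the R + W > N overlap arithmetic.
final class ReplicaOverlap {
    private ReplicaOverlap() {}

    // QUORUM for a replication factor n is floor(n / 2) + 1.
    static int quorum(int n) {
        return n / 2 + 1;
    }

    // R + W > N guarantees every read set intersects every acknowledged write set.
    static boolean readsOverlapWrites(int r, int w, int n) {
        return r + w > n;
    }

    // Minimum number of replicas the read set and the write set must share.
    static int minimumOverlap(int r, int w, int n) {
        return Math.max(0, r + w - n);
    }

    public static void main(String[] args) {
        int n = 3;
        int q = quorum(n); // 2
        System.out.println("QUORUM write, QUORUM read: " + readsOverlapWrites(q, q, n)); // true  (2 + 2 > 3)
        System.out.println("ONE write, QUORUM read:    " + readsOverlapWrites(q, 1, n)); // false (2 + 1 = 3)
    }
}
```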
[jira] [Issue Comment Edited] (CASSANDRA-2494) Quorum reads are not consistent
[ https://issues.apache.org/jira/browse/CASSANDRA-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13023242#comment-13023242 ]

Sean Bridges edited comment on CASSANDRA-2494 at 4/22/11 3:23 PM:

To be clear, this is a new guarantee. The current guarantee is that R + W > N gives you consistency. This bug is asking that a successful quorum read of A means that A has been committed to a quorum of nodes.

{quote}
How can we ensure the quorum read property that you want?
{quote}

If, when reading at quorum, no quorum can be found that agrees on a particular value, then the coordinator ( ? ) will wait for acks of read repair writes (or perhaps just do normal writes) from a sufficient number of nodes to ensure that the value has been committed to a quorum of nodes.

Without this new guarantee it is hard for readers to function correctly. The reader does not know whether the quorum write failed or is still in progress, so unless it reads at ALL, the R + W > N guarantee does not help the reader.

was (Author: sbridges):
To be clear, this is a new guarantee. The current guarantee is that R + W > N gives you consistency. This bug is asking that a quorum read of A means that A has been committed to a quorum of nodes.

{quote}
How can we ensure the quorum read property that you want?
{quote}

If, when reading at quorum, no quorum can be found that agrees on a particular value, then the coordinator (?) will wait for acks of read repair writes (or perhaps just do normal writes) from a sufficient number of nodes to ensure that the value has been committed to a quorum of nodes.

Without this new guarantee it is hard for readers to function correctly. The reader does not know whether the quorum write failed or is still in progress, so unless it reads at ALL, the R + W > N guarantee does not help the reader.
Quorum reads are not consistent
Key: CASSANDRA-2494
URL: https://issues.apache.org/jira/browse/CASSANDRA-2494
Project: Cassandra
Issue Type: Bug
Reporter: Sean Bridges

As discussed in this thread:
http://www.mail-archive.com/user@cassandra.apache.org/msg12421.html

Quorum reads should be consistent. Assume we have a cluster of 3 nodes (X,Y,Z) and a replication factor of 3. If a write of N is committed to X, but not to Y and Z, then a read from X should not return N unless N has been committed to at least two nodes. To ensure this, a read from X should wait for an ack of the read repair write from either Y or Z before returning.

Are there system tests for Cassandra? If so, there should be a test similar to the original post in the email thread. One thread should write 1,2,3... at consistency level ONE. Another thread should read at consistency level QUORUM from a random host, and verify that each read is >= the last read.
[jira] [Created] (CASSANDRA-2494) Quorum reads are not consistent
Quorum reads are not consistent
Key: CASSANDRA-2494
URL: https://issues.apache.org/jira/browse/CASSANDRA-2494
Project: Cassandra
Issue Type: Bug
Reporter: Sean Bridges

As discussed in this thread:
http://www.mail-archive.com/user@cassandra.apache.org/msg12421.html

If we have a cluster of 3 nodes (X,Y,Z) and a replication factor of 3, quorum reads should be consistent. If a write of N is committed to X, but not to Y and Z, then a read from X should not return N unless N has been committed to at least two nodes. To ensure this, a read from X should wait for an ack of the read repair write from either Y or Z before returning.

Are there system tests for Cassandra? If so, there should be a test similar to the original post in the email thread. One thread should write 1,2,3... at consistency level ONE. Another thread should read at consistency level QUORUM from a random host, and verify that each read is >= the last read.
[jira] [Updated] (CASSANDRA-2494) Quorum reads are not consistent
[ https://issues.apache.org/jira/browse/CASSANDRA-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Bridges updated CASSANDRA-2494:
Description:
As discussed in this thread:
http://www.mail-archive.com/user@cassandra.apache.org/msg12421.html

Quorum reads should be consistent. Assume we have a cluster of 3 nodes (X,Y,Z) and a replication factor of 3. If a write of N is committed to X, but not to Y and Z, then a read from X should not return N unless N has been committed to at least two nodes. To ensure this, a read from X should wait for an ack of the read repair write from either Y or Z before returning.

Are there system tests for Cassandra? If so, there should be a test similar to the original post in the email thread. One thread should write 1,2,3... at consistency level ONE. Another thread should read at consistency level QUORUM from a random host, and verify that each read is >= the last read.

was:
As discussed in this thread:
http://www.mail-archive.com/user@cassandra.apache.org/msg12421.html

If we have a cluster of 3 nodes (X,Y,Z) and a replication factor of 3, quorum reads should be consistent. If a write of N is committed to X, but not to Y and Z, then a read from X should not return N unless N has been committed to at least two nodes. To ensure this, a read from X should wait for an ack of the read repair write from either Y or Z before returning.

Are there system tests for Cassandra? If so, there should be a test similar to the original post in the email thread. One thread should write 1,2,3... at consistency level ONE. Another thread should read at consistency level QUORUM from a random host, and verify that each read is >= the last read.
Quorum reads are not consistent
Key: CASSANDRA-2494
URL: https://issues.apache.org/jira/browse/CASSANDRA-2494
Project: Cassandra
Issue Type: Bug
Reporter: Sean Bridges

As discussed in this thread:
http://www.mail-archive.com/user@cassandra.apache.org/msg12421.html

Quorum reads should be consistent. Assume we have a cluster of 3 nodes (X,Y,Z) and a replication factor of 3. If a write of N is committed to X, but not to Y and Z, then a read from X should not return N unless N has been committed to at least two nodes. To ensure this, a read from X should wait for an ack of the read repair write from either Y or Z before returning.

Are there system tests for Cassandra? If so, there should be a test similar to the original post in the email thread. One thread should write 1,2,3... at consistency level ONE. Another thread should read at consistency level QUORUM from a random host, and verify that each read is >= the last read.
[jira] [Commented] (CASSANDRA-2494) Quorum reads are not consistent
[ https://issues.apache.org/jira/browse/CASSANDRA-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020898#comment-13020898 ]

Sean Bridges commented on CASSANDRA-2494:

Peter Schuller wrote:

{quote}
However, it sounds like what is being asked for is not that they don't propagate in the event of a write failure, but just that reads don't see the writes until they are sufficiently propagated to guarantee that any future QUORUM read will also see the data.
{quote}

Yes, that is the issue. The comment in the bug about writing at ONE and reading at QUORUM is just a way of testing this new guarantee in a distributed test, if Cassandra has those.

Quorum reads are not consistent
Key: CASSANDRA-2494
URL: https://issues.apache.org/jira/browse/CASSANDRA-2494
Project: Cassandra
Issue Type: Bug
Reporter: Sean Bridges

As discussed in this thread:
http://www.mail-archive.com/user@cassandra.apache.org/msg12421.html

Quorum reads should be consistent. Assume we have a cluster of 3 nodes (X,Y,Z) and a replication factor of 3. If a write of N is committed to X, but not to Y and Z, then a read from X should not return N unless N has been committed to at least two nodes. To ensure this, a read from X should wait for an ack of the read repair write from either Y or Z before returning.

Are there system tests for Cassandra? If so, there should be a test similar to the original post in the email thread. One thread should write 1,2,3... at consistency level ONE. Another thread should read at consistency level QUORUM from a random host, and verify that each read is >= the last read.
[jira] Updated: (CASSANDRA-1187) make the number of compaction threads configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Bridges updated CASSANDRA-1187:
Attachment: CASSANDRA-1187-2.patch

Is this what you were thinking of? The patch adds a new ConcurrentCompactedRow, which can read columns from multiple SSTables in parallel.

I'm not sure how much parallelism this patch gives. For the case where two SSTables have no rows in common, there is no benefit. Trying to read from multiple rows in parallel seems like it would get messy.

make the number of compaction threads configurable
Key: CASSANDRA-1187
URL: https://issues.apache.org/jira/browse/CASSANDRA-1187
Project: Cassandra
Issue Type: Improvement
Components: Core
Reporter: Sean Bridges
Attachments: CASSANDRA-1187-2.patch, CASSANDRA-1187.patch

On our test machines, compaction is the limiting factor when we are writing to Cassandra. It's easy to write to Cassandra faster than the single compaction thread can keep up with, leading to a large number of SSTables. In one extreme example, we inserted a TB of data into a single Cassandra node overnight and ended up with 100,000 SSTables, which took another two days to finish compacting. If the number of compaction threads were configurable, we could tune Cassandra to support a higher write workload.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
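Here is a rough sketch of the idea behind the patch, for a single row: read the row's columns from each SSTable concurrently, then merge the sorted results. Each SSTable read is stood in for by a Supplier that returns sorted column names. The names are hypothetical, this is not the ConcurrentCompactedRow from the patch, and column reconciliation by timestamp is left out. As the comment says, when only one SSTable contains the row there is nothing to overlap, so the parallel step buys nothing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical sketch of the idea; not the ConcurrentCompactedRow from the patch.
final class ParallelRowMerge {
    private ParallelRowMerge() {}

    // Each supplier stands in for "read this row's column names, sorted, from one SSTable".
    static List<String> mergeRow(List<Supplier<List<String>>> sstables, ExecutorService pool) {
        // 1. Issue the per-SSTable reads concurrently, then wait for all of them.
        List<CompletableFuture<List<String>>> reads = new ArrayList<>();
        for (Supplier<List<String>> sstable : sstables) {
            reads.add(CompletableFuture.supplyAsync(sstable, pool));
        }
        List<List<String>> columns = new ArrayList<>();
        for (CompletableFuture<List<String>> read : reads) {
            columns.add(read.join());
        }

        // 2. k-way merge of the sorted lists. A name found in several SSTables is kept once;
        //    real compaction would keep the version with the newest timestamp instead.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                (a, b) -> columns.get(a[0]).get(a[1]).compareTo(columns.get(b[0]).get(b[1])));
        for (int i = 0; i < columns.size(); i++) {
            if (!columns.get(i).isEmpty()) heap.add(new int[] {i, 0});
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] cursor = heap.poll(); // {sstable index, position within that sstable}
            String name = columns.get(cursor[0]).get(cursor[1]);
            if (merged.isEmpty() || !merged.get(merged.size() - 1).equals(name)) {
                merged.add(name);
            }
            if (cursor[1] + 1 < columns.get(cursor[0]).size()) {
                heap.add(new int[] {cursor[0], cursor[1] + 1});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<Supplier<List<String>>> sstables = Arrays.asList(
                () -> Arrays.asList("a", "c", "e"),
                () -> Arrays.asList("b", "c", "d"));
        System.out.println(mergeRow(sstables, pool)); // prints [a, b, c, d, e]
        pool.shutdown();
    }
}
```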
[jira] Updated: (CASSANDRA-912) First-class commandline interface
[ https://issues.apache.org/jira/browse/CASSANDRA-912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Bridges updated CASSANDRA-912:
Attachment: CASSANDRA-912-2.patch.txt

Rebased the previous patch onto trunk.

First-class commandline interface
Key: CASSANDRA-912
URL: https://issues.apache.org/jira/browse/CASSANDRA-912
Project: Cassandra
Issue Type: Improvement
Components: Tools
Affects Versions: 0.6
Reporter: Eric Evans
Fix For: 0.7
Attachments: CASSANDRA-912-2.patch.txt, CASSANDRA-912.patch

While a useful tool for education and simple tests, cassandra-cli is ultimately limited by the fact that column names and values are binary (and eventually keys will be as well; see CASSANDRA-767). The current approach when writing consists of encoding column names as UTF8 and passing the value as a byte[] of the String parsed from the command. When performing a read, the column names output are the result of the toString() method of the comparator (the result of which is not always meaningful), and values are again treated as raw strings. This is almost certainly broken anywhere the CF comparator is not UTF8Type, or values are anything but strings.

One possible approach would be to follow HBase's lead and simply allow binary values to be encoded as strings (see: http://wiki.apache.org/hadoop/Hbase/Shell).
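The HBase-style approach mentioned in the description amounts to a reversible escaping scheme. Here is a minimal sketch of one. Printable ASCII is shown as-is. Every other byte, plus the backslash itself so that the encoding round-trips, is written as \xNN, similar in spirit to the binary-string escaping the HBase shell uses. The class and method names are hypothetical and not part of cassandra-cli.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical escaping helper for showing and typing binary names/values as text.
final class BinaryStrings {
    private BinaryStrings() {}

    // Printable ASCII (except backslash) is kept as-is; every other byte becomes \xNN.
    static String encode(byte[] bytes) {
        StringBuilder out = new StringBuilder();
        for (byte b : bytes) {
            int v = b & 0xFF;
            if (v >= 0x20 && v < 0x7F && v != '\\') {
                out.append((char) v);
            } else {
                out.append(String.format("\\x%02X", v));
            }
        }
        return out.toString();
    }

    // Inverse of encode: \xNN becomes one byte; other characters are assumed to be ASCII.
    static byte[] decode(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\\' && i + 3 < text.length() && text.charAt(i + 1) == 'x') {
                out.write(Integer.parseInt(text.substring(i + 2, i + 4), 16));
                i += 3; // skip the "xNN" we just consumed
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] value = {0x00, 'o', 'k', (byte) 0xFF};
        String shown = encode(value);
        System.out.println(shown);                               // prints \x00ok\xFF
        System.out.println(Arrays.equals(decode(shown), value)); // prints true
    }
}
```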
[jira] Updated: (CASSANDRA-1187) make the number of compaction threads configurable
[ https://issues.apache.org/jira/browse/CASSANDRA-1187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Bridges updated CASSANDRA-1187:
Attachment: CASSANDRA-1187.patch

This patch allows setting the number of threads used for compaction. A queue is created for each column family, and only one compaction thread is allowed to compact a given column family at a time.

make the number of compaction threads configurable
Key: CASSANDRA-1187
URL: https://issues.apache.org/jira/browse/CASSANDRA-1187
Project: Cassandra
Issue Type: Improvement
Components: Core
Affects Versions: 0.6.1
Reporter: Sean Bridges
Attachments: CASSANDRA-1187.patch

On our test machines, compaction is the limiting factor when we are writing to Cassandra. It's easy to write to Cassandra faster than the single compaction thread can keep up with, leading to a large number of SSTables. In one extreme example, we inserted a TB of data into a single Cassandra node overnight and ended up with 100,000 SSTables, which took another two days to finish compacting. If the number of compaction threads were configurable, we could tune Cassandra to support a higher write workload.
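The scheduling rule in the patch allows a configurable number of compaction threads, but at most one compaction per column family at a time. It can be sketched with one shared fixed-size thread pool plus one serial queue per column family. The sketch follows the SerialExecutor example from the java.util.concurrent.Executor documentation. The names are hypothetical; this is not the patch's code.

```java
import java.util.ArrayDeque;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: N shared compaction threads, at most one compaction per column family.
final class CompactionScheduler {
    private final ExecutorService pool;
    private final Map<String, SerialQueue> queues = new ConcurrentHashMap<>();

    CompactionScheduler(int compactionThreads) {
        this.pool = Executors.newFixedThreadPool(compactionThreads);
    }

    // Queue a compaction; compactions of the same column family never overlap.
    void submit(String columnFamily, Runnable compaction) {
        queues.computeIfAbsent(columnFamily, cf -> new SerialQueue(pool)).execute(compaction);
    }

    // Call only once all submitted compactions have finished.
    void shutdown() {
        pool.shutdown();
    }

    // Runs its tasks one at a time, in submission order, on the shared pool.
    private static final class SerialQueue implements Executor {
        private final Queue<Runnable> tasks = new ArrayDeque<>();
        private final Executor pool;
        private Runnable active;

        SerialQueue(Executor pool) {
            this.pool = pool;
        }

        @Override
        public synchronized void execute(Runnable r) {
            tasks.add(() -> {
                try {
                    r.run();
                } finally {
                    scheduleNext(); // hand the next queued task for this CF to the pool
                }
            });
            if (active == null) {
                scheduleNext();
            }
        }

        private synchronized void scheduleNext() {
            if ((active = tasks.poll()) != null) {
                pool.execute(active);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CompactionScheduler scheduler = new CompactionScheduler(2);
        CountDownLatch finished = new CountDownLatch(4);
        for (String cf : new String[] {"users", "users", "events", "events"}) {
            scheduler.submit(cf, () -> {
                System.out.println("compacting " + cf + " on " + Thread.currentThread().getName());
                finished.countDown();
            });
        }
        finished.await();
        scheduler.shutdown();
    }
}
```

Compactions for different column families can run in parallel, up to the pool size. Compactions for the same column family run one after another, in submission order.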
[jira] Updated: (CASSANDRA-912) First-class commandline interface
[ https://issues.apache.org/jira/browse/CASSANDRA-912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Bridges updated CASSANDRA-912:
Attachment: CASSANDRA-912.patch

First-class commandline interface
Key: CASSANDRA-912
URL: https://issues.apache.org/jira/browse/CASSANDRA-912
Project: Cassandra
Issue Type: Improvement
Components: Tools
Affects Versions: 0.6
Reporter: Eric Evans
Attachments: CASSANDRA-912.patch

While a useful tool for education and simple tests, cassandra-cli is ultimately limited by the fact that column names and values are binary (and eventually keys will be as well; see CASSANDRA-767). The current approach when writing consists of encoding column names as UTF8 and passing the value as a byte[] of the String parsed from the command. When performing a read, the column names output are the result of the toString() method of the comparator (the result of which is not always meaningful), and values are again treated as raw strings. This is almost certainly broken anywhere the CF comparator is not UTF8Type, or values are anything but strings.

One possible approach would be to follow HBase's lead and simply allow binary values to be encoded as strings (see: http://wiki.apache.org/hadoop/Hbase/Shell).