[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15044921#comment-15044921 ]

Jim Meyer commented on CASSANDRA-10070:
---------------------------------------

Wouldn't it be safer if node A checked itself how long it had been down and scheduled its own repairs? Why have node B guess that node A was down? I've seen cases where nodes couldn't communicate, so each thought the other was down when actually both nodes were up.

> Automatic repair scheduling
> ---------------------------
>
>                 Key: CASSANDRA-10070
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10070
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Marcus Olsson
>            Assignee: Marcus Olsson
>            Priority: Minor
>             Fix For: 3.x
>
>
> Scheduling and running repairs in a Cassandra cluster is most often a
> required task, but it can be hard for new users and it also requires a
> bit of manual configuration. There are good tools out there that can be used
> to simplify things, but wouldn't this be a good feature to have inside of
> Cassandra? To automatically schedule and run repairs, so that when you start
> up your cluster it basically maintains itself in terms of normal
> anti-entropy, with the possibility for manual configuration.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045336#comment-15045336 ]

Jim Meyer commented on CASSANDRA-10070:
---------------------------------------

I don't know much about Cassandra internals, so one of the regular devs would know better, but my thought is this: during a restart, the node figures out that it needs to replay part of the commit log to rebuild memtables that hadn't been flushed to disk. The timestamp of the last entry in the commit log might be a good estimate of when the node went down, and you could compare that to the current time to figure out how long the node was down. I wouldn't worry about the second case, since it would be hard to get right.
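The commit-log heuristic suggested above can be sketched roughly as follows. This is a minimal Python sketch under the assumption that the modification time of the newest commit log segment approximates the last write before shutdown; the function name and directory handling are hypothetical, not actual Cassandra code.

```python
import os
import time


def estimate_downtime_seconds(commitlog_dir, now=None):
    """Estimate how long this node was down by comparing the newest
    commit log segment's mtime against the current time.

    Hypothetical helper: Cassandra's real replay logic reads record
    timestamps inside the segments, not file mtimes."""
    now = now if now is not None else time.time()
    segments = [os.path.join(commitlog_dir, f) for f in os.listdir(commitlog_dir)]
    if not segments:
        return None  # nothing to base an estimate on
    last_write = max(os.path.getmtime(p) for p in segments)
    return max(0.0, now - last_write)
```

A scheduler could then compare the returned estimate against the hint window to decide whether a recovery repair is needed.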
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045350#comment-15045350 ]

Jim Meyer commented on CASSANDRA-10070:
---------------------------------------

I think this is part of the motivation for building repair scheduling into Cassandra. When we write an external repair scheduler, it has no idea what the state of the cluster is, so it just blindly issues repairs based on a time schedule.
[jira] [Commented] (CASSANDRA-6246) EPaxos
[ https://issues.apache.org/jira/browse/CASSANDRA-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14980634#comment-14980634 ]

Jim Meyer commented on CASSANDRA-6246:
--------------------------------------

Does anyone know if this patch will help with CASSANDRA-9328 (i.e. the outcome of an LWT not being reported to the client when there is contention)? There's a suggestion to that effect in the comments of 9328, but I don't know if anyone has tried running the test code in 9328 to see if this patch has an effect on that issue.

Is this patch compatible with rc2 of Cassandra 3.0.0, or does it need to be updated? When is it planned to add EPaxos to an official build?

Thanks for any info.

> EPaxos
> ------
>
>                 Key: CASSANDRA-6246
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6246
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Blake Eggleston
>             Fix For: 3.x
>
>
> One reason we haven't optimized our Paxos implementation with Multi-paxos is
> that Multi-paxos requires leader election and hence, a period of
> unavailability when the leader dies.
> EPaxos is a Paxos variant that (1) requires fewer messages than Multi-paxos,
> (2) is particularly useful across multiple datacenters, and (3) allows any
> node to act as coordinator:
> http://sigops.org/sosp/sosp13/papers/p358-moraru.pdf
> However, there is substantial additional complexity involved if we choose to
> implement it.
[jira] [Commented] (CASSANDRA-9328) WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms
[ https://issues.apache.org/jira/browse/CASSANDRA-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14980613#comment-14980613 ]

Jim Meyer commented on CASSANDRA-9328:
--------------------------------------

I'm not sure I understand the proposed workaround. For example, suppose two clients are trying to create the same username in a table. If one of them gets a WTE, it can't do a simple read to see if its insert was successful, since the other client may have been the one that created the row.

So it seems that each client would need to set the data in a unique way, such that it can do a simple read and determine that its specific transaction was applied. For example, the client could set a UUID field as a transaction id, and then on the extra read, check whether the UUID matches what it wrote to differentiate its LWT from those of other clients.

I guess that would work, but besides being slow, it would waste quite a bit of space to store transaction IDs. Is there a better way?

> WriteTimeoutException thrown when LWT concurrency > 1, despite the query
> duration taking MUCH less than cas_contention_timeout_in_ms
> ------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9328
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9328
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Aaron Whiteside
>            Priority: Critical
>             Fix For: 2.1.x
>
>         Attachments: CassandraLWTTest.java, CassandraLWTTest2.java
>
>
> WriteTimeoutException thrown when LWT concurrency > 1, despite the query
> duration taking MUCH less than cas_contention_timeout_in_ms.
> Unit test attached, run against a 3 node cluster running 2.1.5.
> If you reduce the threadCount to 1, you never see a WriteTimeoutException. If
> the WTE is due to not being able to communicate with other nodes, why does
> concurrency > 1 cause inter-node communication to fail?
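The transaction-id read-back described in this comment can be illustrated with a toy model: plain Python with an in-memory dict standing in for the table. A real client would issue a CQL `INSERT ... IF NOT EXISTS` and a follow-up `SELECT`; the function names here are invented for illustration.

```python
import uuid

# In-memory stand-in for the usernames table: name -> txn_id.
table = {}


def lwt_insert(name, txn_id):
    """Simulates INSERT ... IF NOT EXISTS: returns True iff applied."""
    if name in table:
        return False
    table[name] = txn_id
    return True


def did_my_write_apply(name, my_txn_id):
    """After a WriteTimeoutException, a plain read alone cannot tell
    whose insert won; comparing a client-generated transaction id can."""
    return table.get(name) == my_txn_id


# Client A and client B race to create the same username.
a_id, b_id = uuid.uuid4(), uuid.uuid4()
lwt_insert("alice", a_id)   # suppose A's insert applied...
lwt_insert("alice", b_id)   # ...so B's does not
assert did_my_write_apply("alice", a_id) is True
assert did_my_write_apply("alice", b_id) is False
```

This makes the cost the comment complains about concrete: every row must carry an extra UUID column purely so a timed-out client can disambiguate its own write.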
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718522#comment-14718522 ]

Jim Meyer commented on CASSANDRA-10070:
---------------------------------------

Would it be difficult to add an option like that? One of the advantages of building the scheduler into C* is that it could have insight into the state of the cluster and respond to node downtime. It could reduce the consistency gap between the hint window being exceeded and the next regularly scheduled repair for a critical table. Then one could set the hint window smaller, the regular schedule to once a week, and the recovery repair to queue a repair when downtime had exceeded the hint window.

Separate question: is the 'parallelism' attribute scalable for large clusters? If I have a 1000-node cluster and want to allow up to 10% of my nodes to run repairs at the same time, how would I specify that? Would that be a system config param or a table-level attribute?
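One way a fractional parallelism setting like the "10% of 1000 nodes" example could be interpreted is as a simple cap on concurrent repair sessions. This is illustrative only; the ticket does not define the attribute's semantics, and the function below is hypothetical.

```python
def max_concurrent_repairs(cluster_size, fraction):
    """Cap concurrent repair sessions at a fraction of the cluster,
    always allowing at least one node to repair.

    Hypothetical policy helper, not part of the proposed design."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return max(1, int(cluster_size * fraction))


# 10% of a 1000-node cluster allows 100 concurrent repair sessions;
# a small cluster still gets at least one.
assert max_concurrent_repairs(1000, 0.10) == 100
assert max_concurrent_repairs(5, 0.10) == 1
```

Expressing the limit as a fraction rather than an absolute count is what would make it scale with cluster size, which seems to be the point of the question.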
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14718527#comment-14718527 ]

Jim Meyer commented on CASSANDRA-10070:
---------------------------------------

+1 to including this. The feature would help with multi-tenancy and extend the idea of tunable consistency, since different use cases will have different repair requirements. Individual applications could self-serve their repair frequency via the table properties instead of having an administrator guess what frequency is needed.

It is a difficult and error-prone chore for an application developer to devise a reliable external mechanism for scheduling repairs. It often ends up as a simple cron job that blindly repairs all keyspaces once a day.
[jira] [Commented] (CASSANDRA-10070) Automatic repair scheduling
[ https://issues.apache.org/jira/browse/CASSANDRA-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14713496#comment-14713496 ]

Jim Meyer commented on CASSANDRA-10070:
---------------------------------------

This sounds like a very useful feature. I'm wondering what the behavior will be when a node that has been down for a while comes back up. I assume it would see that it is overdue for some repairs and schedule them in a load-friendly manner.

Now suppose I have a table where consistency is very important. Would I be able to set table attributes to schedule a high-priority repair if the node had been down longer than max_hint_window_in_ms, so that it can be made consistent as soon as possible? Or would that still need to be done manually?
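The table-level policy asked about here amounts to a threshold check: once downtime exceeds the hint window, hinted handoff alone can no longer restore consistency, so a repair should jump the queue. A hypothetical sketch, not part of the proposed design (3 hours is the stock max_hint_window_in_ms default in cassandra.yaml):

```python
DEFAULT_MAX_HINT_WINDOW_IN_MS = 3 * 3600 * 1000  # cassandra.yaml default: 3 hours


def repair_priority(downtime_ms, max_hint_window_in_ms=DEFAULT_MAX_HINT_WINDOW_IN_MS):
    """If the node was down longer than the hint window, hints were
    dropped and only a repair can close the consistency gap.

    Hypothetical policy helper for illustration."""
    return "high" if downtime_ms > max_hint_window_in_ms else "normal"


# Down 4 hours: past the 3-hour hint window, repair urgently.
assert repair_priority(4 * 3600 * 1000) == "high"
# Down 1 hour: hints cover it, a regularly scheduled repair is fine.
assert repair_priority(1 * 3600 * 1000) == "normal"
```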
[jira] [Created] (CASSANDRA-10074) cqlsh HELP SELECT_EXPR gives outdated incorrect information
Jim Meyer created CASSANDRA-10074:
-------------------------------------

             Summary: cqlsh HELP SELECT_EXPR gives outdated incorrect information
                 Key: CASSANDRA-10074
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10074
             Project: Cassandra
          Issue Type: Bug
          Components: Tools
         Environment: 3.0.0-alpha1-SNAPSHOT
            Reporter: Jim Meyer
            Priority: Trivial
             Fix For: 3.x


Within cqlsh, HELP SELECT_EXPR states that COUNT is the only function supported by CQL. It is missing descriptions of the SUM, AVG, MIN, and MAX built-in functions. It should probably also mention that user-defined functions can be invoked via SELECT.

The outdated text is in pylib/cqlshlib/helptopics.py, in def help_select_expr.
[jira] [Created] (CASSANDRA-9952) UDF with no parameters prevents cqlsh DESCRIBE from working
Jim Meyer created CASSANDRA-9952:
------------------------------------

             Summary: UDF with no parameters prevents cqlsh DESCRIBE from working
                 Key: CASSANDRA-9952
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9952
             Project: Cassandra
          Issue Type: Bug
          Components: Tools
         Environment: ubuntu 64-bit, using the ccm tool with a one-node cluster, release 2.2.0
            Reporter: Jim Meyer
            Priority: Minor
             Fix For: 2.2.x


If I create a user-defined function that takes no parameters like this:

    cqlsh:test> CREATE FUNCTION no_parm()
                CALLED ON NULL INPUT
                RETURNS bigint
                LANGUAGE java
                AS 'return System.currentTimeMillis() / 1000L;';

the function works fine in queries, but in cqlsh the DESCRIBE command stops working:

    cqlsh:test> DESC KEYSPACE test;
    izip argument #1 must support iteration

If I drop the function, then DESCRIBE starts working normally again. It appears DESCRIBE assumes there is at least one argument for UDFs, but creating and using the function does not.
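The reported error message is consistent with zip()/izip() being handed a non-iterable. A guess at the failure mode and the obvious defensive fix follows; this is hypothetical illustration, not the actual cqlshlib code, and `column_pairs` is an invented name.

```python
# Guesswork: if a zero-argument UDF's argument metadata comes back as
# None instead of an empty list, izip/zip raises exactly this TypeError.
def column_pairs(arg_names, arg_types):
    """Pair up argument names and types, tolerating missing metadata.

    The `or []` guard is the kind of fix that would avoid the reported
    "argument #1 must support iteration" error for zero-argument UDFs."""
    return list(zip(arg_names or [], arg_types or []))


try:
    list(zip(None, ["bigint"]))  # reproduces the TypeError
except TypeError as exc:
    print(exc)  # e.g. "zip argument #1 must support iteration"

assert column_pairs(None, None) == []
assert column_pairs(["a"], ["int"]) == [("a", "int")]
```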