[jira] [Commented] (CASSANDRA-8032) User based request scheduler
[ https://issues.apache.org/jira/browse/CASSANDRA-8032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155395#comment-14155395 ]

Mck SembWever commented on CASSANDRA-8032:
------------------------------------------

Here's a quick initial [attempt|https://github.com/michaelsembwever/cassandra/commit/4516f635b923763155c524b04235a6aa39e2e5a3]. Looks like this could be done in just two lines of code. But the unit tests… hmm… I'll give this more testing and post an update.

User based request scheduler
----------------------------

                Key: CASSANDRA-8032
                URL: https://issues.apache.org/jira/browse/CASSANDRA-8032
            Project: Cassandra
         Issue Type: Improvement
         Components: Core
           Reporter: Mck SembWever
           Priority: Minor

Today only a keyspace based request scheduler exists. Post CASSANDRA-4898 it could be possible to implement a request_scheduler based on users (from system_auth.credentials) rather than keyspaces. This could offer a finer granularity of control, from read-only vs read-write users on keyspaces, to application dedicated vs ad-hoc users. Alternatively it could also offer a granularity larger and easier to work with than per keyspace.

The request scheduler is a useful concept, but I think that setups with enough nodes often favour separate clusters rather than either creating separate virtual datacenters or using the request scheduler. Giving the request scheduler another, more flexible, implementation could especially help those users that don't yet have enough nodes to warrant separate clusters, or even separate virtual datacenters. On such smaller clusters Cassandra can still be seen as an unstable technology, because poor consumers/schemas can easily affect, even bring down, a whole cluster.

I haven't looked into the feasibility of this within the code, but it comes to mind as rather simple, and I would be interested in offering a patch if the idea carries validity.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-8032) User based request scheduler
[ https://issues.apache.org/jira/browse/CASSANDRA-8032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155395#comment-14155395 ]

Mck SembWever edited comment on CASSANDRA-8032 at 10/1/14 8:10 PM:
-------------------------------------------------------------------

Here's a quick initial [attempt|https://github.com/michaelsembwever/cassandra/commit/c1f87ad3be011444d6f9d15f7d8e9e1014244cf3]. Looks like this could be done in just two lines of code. But the unit tests… hmm… I'll give this more testing and post an update.

was (Author: michaelsembwever):
Here's a quick initial [attempt|https://github.com/michaelsembwever/cassandra/commit/4516f635b923763155c524b04235a6aa39e2e5a3]. Looks like this could be done in just two lines of code. But the unit tests… hmm… I'll give this more testing and post an update.
[jira] [Created] (CASSANDRA-8032) User based request scheduler
Mck SembWever created CASSANDRA-8032:
-------------------------------------

            Summary: User based request scheduler
                Key: CASSANDRA-8032
                URL: https://issues.apache.org/jira/browse/CASSANDRA-8032
            Project: Cassandra
         Issue Type: Improvement
         Components: Core
           Reporter: Mck SembWever
           Priority: Minor
[jira] [Commented] (CASSANDRA-8032) User based request scheduler
[ https://issues.apache.org/jira/browse/CASSANDRA-8032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14154380#comment-14154380 ]

Mck SembWever commented on CASSANDRA-8032:
------------------------------------------

Oh, if it's already implemented I've missed it. Looking a little deeper into trunk, cassandra.yaml still reports that only keyspace is available as a request_scheduler_id, and looking into ThriftClientState I can only see support for a scheduling value based on keyspace.
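To make the proposal concrete, here is a rough, hypothetical sketch (this is not the actual Cassandra scheduler API; the class and method names below are invented for illustration) of what a user-based scheduling value could look like. The existing RoundRobinScheduler throttles per scheduling id, and today that id is always the keyspace; the idea is only to derive the id from the authenticated user instead.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Hypothetical sketch of per-user request scheduling. One throttle per
// scheduling id, created on demand; the only real change the ticket asks
// for is which field feeds schedulingId().
public class UserScheduler {
    private final Map<String, Semaphore> queues = new ConcurrentHashMap<>();
    private final int throttleLimit;

    public UserScheduler(int throttleLimit) {
        this.throttleLimit = throttleLimit;
    }

    /** Derive the scheduling id: the user instead of today's keyspace. */
    static String schedulingId(String user, String keyspace, boolean byUser) {
        return byUser ? user : keyspace;
    }

    /** Try to admit a request under the given scheduling id. */
    public boolean tryQueue(String id) {
        return queues.computeIfAbsent(id, k -> new Semaphore(throttleLimit))
                     .tryAcquire();
    }

    /** Release a slot once the request completes. */
    public void release(String id) {
        queues.get(id).release();
    }

    public static void main(String[] args) {
        UserScheduler s = new UserScheduler(2);
        String id = schedulingId("adhoc_user", "ks1", true);
        System.out.println(s.tryQueue(id)); // true
        System.out.println(s.tryQueue(id)); // true
        System.out.println(s.tryQueue(id)); // false: third concurrent request throttled
    }
}
```

With this shape, an ad-hoc user hammering the cluster exhausts only its own semaphore, while application users keep their own admission slots.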
[jira] [Comment Edited] (CASSANDRA-6332) Cassandra startup failure: java.util.concurrent.ExecutionException: java.lang.RuntimeException: 706167655f74616773 is not defined as a collection
[ https://issues.apache.org/jira/browse/CASSANDRA-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895873#comment-13895873 ]

Mck SembWever edited comment on CASSANDRA-6332 at 2/10/14 9:06 AM:
-------------------------------------------------------------------

Being a dev environment it's pretty much an open playground, so it could well have been that without us knowing about it. (I've been away the past 2 months but will try and chase up whether this was the case…)

update: yes, a table of the same name was dropped and then created again with a different definition. This happened in a timeframe of 15 minutes or less…

was (Author: michaelsembwever):
Being a dev environment it's pretty much an open playground, so it could well have been that without us knowing about it. (I've been away the past 2 months but will try and chase up whether this was the case…)

Cassandra startup failure: java.util.concurrent.ExecutionException: java.lang.RuntimeException: 706167655f74616773 is not defined as a collection
------------------------------------------------------------------------------------------------------------------------------------------------

                Key: CASSANDRA-6332
                URL: https://issues.apache.org/jira/browse/CASSANDRA-6332
            Project: Cassandra
         Issue Type: Bug
        Environment: Ubuntu 12.04, Cassandra 2.0.1
           Reporter: Prateek
           Priority: Critical

The Cassandra node fails to start up with the following error message. This is currently impacting availability of our production cluster, so your quick response is highly appreciated.
ERROR 22:58:26,046 Exception encountered during startup
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: 706167655f74616773 is not defined as a collection
	at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:411)
	at org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:400)
	at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:273)
	at org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:96)
	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:146)
	at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:126)
	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:299)
	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:442)
	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:485)
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: 706167655f74616773 is not defined as a collection
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:188)
	at org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:407)
	... 8 more
Caused by: java.lang.RuntimeException: 706167655f74616773 is not defined as a collection
	at org.apache.cassandra.db.marshal.ColumnToCollectionType.compareCollectionMembers(ColumnToCollectionType.java:72)
	at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:85)
	at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35)
	at edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538)
	at edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108)
	at edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1192)
	at edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059)
	at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023)
	at edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985)
	at org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:323)
	at org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:195)
	at org.apache.cassandra.db.Memtable.resolve(Memtable.java:196)
	at org.apache.cassandra.db.Memtable.put(Memtable.java:160)
	at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:842)
	at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:373)
	at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:338)
	at org.apache.cassandra.db.commitlog.CommitLogReplayer$1.runMayThrow(CommitLogReplayer.java:265)

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
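For reference, the hex identifier in the exception is simply the column name as hex-encoded ASCII bytes, so decoding it reveals which column the stale commitlog mutation refers to. A minimal decoder (the class and method names here are just for illustration):

```java
// Decode a hex-encoded ASCII identifier like the one in the error above.
public class HexColumnName {
    static String decodeAscii(String hex) {
        StringBuilder sb = new StringBuilder();
        // each pair of hex digits is one ASCII byte
        for (int i = 0; i < hex.length(); i += 2) {
            sb.append((char) Integer.parseInt(hex.substring(i, i + 2), 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 70 61 67 65 5f 74 61 67 73 -> "page_tags"
        System.out.println(decodeAscii("706167655f74616773"));
    }
}
```

Here it decodes to "page_tags", which fits the finding that a table was dropped and recreated with a different definition: the replayed commitlog still references the old schema's collection column.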
[jira] [Commented] (CASSANDRA-6332) Cassandra startup failure: java.util.concurrent.ExecutionException: java.lang.RuntimeException: 706167655f74616773 is not defined as a collection
[ https://issues.apache.org/jira/browse/CASSANDRA-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895873#comment-13895873 ]

Mck SembWever commented on CASSANDRA-6332:
------------------------------------------

Being a dev environment it's pretty much an open playground, so it could well have been that without us knowing about it. (I've been away the past 2 months but will try and chase up whether this was the case…)
[jira] [Commented] (CASSANDRA-6332) Cassandra startup failure: java.util.concurrent.ExecutionException: java.lang.RuntimeException: 706167655f74616773 is not defined as a collection
[ https://issues.apache.org/jira/browse/CASSANDRA-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894382#comment-13894382 ]

Mck SembWever commented on CASSANDRA-6332:
------------------------------------------

No recipe to reproduce, but on a 1.2.9 non-prod cluster we came across this problem. The above exception occurs while reading commitlogs. Removing the commitlogs was a workaround.
[jira] [Commented] (CASSANDRA-5201) Cassandra/Hadoop does not support current Hadoop releases
[ https://issues.apache.org/jira/browse/CASSANDRA-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13805107#comment-13805107 ]

Mck SembWever commented on CASSANDRA-5201:
------------------------------------------

Hadoop-2 only just came out of alpha/beta with hadoop-2.2.0.

Cassandra/Hadoop does not support current Hadoop releases
---------------------------------------------------------

                Key: CASSANDRA-5201
                URL: https://issues.apache.org/jira/browse/CASSANDRA-5201
            Project: Cassandra
         Issue Type: Bug
         Components: Hadoop
   Affects Versions: 1.2.0
           Reporter: Brian Jeltema
           Assignee: Dave Brosius
        Attachments: 5201_a.txt

Using Hadoop 0.22.0 with Cassandra results in the stack trace below. It appears that version 0.21+ changed org.apache.hadoop.mapreduce.JobContext from a class to an interface.

Exception in thread "main" java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected
	at org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSplits(ColumnFamilyInputFormat.java:103)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:445)
	at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:462)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:357)
	at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1045)
	at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1042)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1042)
	at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1062)
	at MyHadoopApp.run(MyHadoopApp.java:163)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
	at MyHadoopApp.main(MyHadoopApp.java:82)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:192)

--
This message was sent by Atlassian JIRA (v6.1#6144)
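The root cause is a binary-compatibility break: code compiled against Hadoop ≤ 0.20, where JobContext was a class, fails at runtime against 0.21+, where it became an interface, with the IncompatibleClassChangeError above. A runtime probe can detect which API generation is on the classpath. In this self-contained sketch java.lang.Runnable and java.lang.Thread stand in for the two cases, since the Hadoop jars are assumed absent:

```java
// Sketch: probe whether a class name ships as a class or an interface,
// the same distinction that breaks JobContext between Hadoop generations.
public class HadoopApiProbe {
    static boolean shipsAsInterface(String className) {
        try {
            return Class.forName(className).isInterface();
        } catch (ClassNotFoundException e) {
            return false; // not on the classpath at all
        }
    }

    public static void main(String[] args) {
        // With Hadoop on the classpath one would probe
        // "org.apache.hadoop.mapreduce.JobContext" instead.
        System.out.println(shipsAsInterface("java.lang.Runnable")); // true
        System.out.println(shipsAsInterface("java.lang.Thread"));   // false
    }
}
```

Note that a probe like this can only detect the mismatch; the separate-jar or separate-project approaches discussed in the later comments are what actually resolve it.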
[jira] [Commented] (CASSANDRA-5883) Switch to Logback
[ https://issues.apache.org/jira/browse/CASSANDRA-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13802649#comment-13802649 ]

Mck SembWever commented on CASSANDRA-5883:
------------------------------------------

No, most people don't log that much. But it's nice to know that you can, if you have to, turn all loggers to DEBUG, or even TRACE, in production without hurting performance. With old log4j and logback this often isn't possible.

The performance gain isn't an across-the-board improvement, as far as I've understood it, but one that comes from removing all the little contentions from synchronised blocks. I can imagine that this appeals to the C* community, even if it's of little practical meaning for normal production C*.

Log4j2 also provides auto-reload of configuration files. Other features it offers are listed at http://logging.apache.org/log4j/2.x/

Switch to Logback
-----------------

                Key: CASSANDRA-5883
                URL: https://issues.apache.org/jira/browse/CASSANDRA-5883
            Project: Cassandra
         Issue Type: Improvement
         Components: Core, Tools
           Reporter: Jonathan Ellis
           Assignee: Dave Brosius
           Priority: Minor
            Fix For: 2.1
        Attachments: 0001-Additional-migration-to-logback.patch, 5883-1.txt, 5883-additional1.txt, 5883.txt

Logback has a number of advantages over log4j, and switching will be straightforward since we are already using the slf4j translation layer: http://logback.qos.ch/reasonsToSwitch.html

--
This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Comment Edited] (CASSANDRA-5201) Cassandra/Hadoop does not support current Hadoop releases
[ https://issues.apache.org/jira/browse/CASSANDRA-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792563#comment-13792563 ]

Mck SembWever edited comment on CASSANDRA-5201 at 10/13/13 5:37 PM:
--------------------------------------------------------------------

I've updated the github project so as to be a [patch|https://github.com/michaelsembwever/cassandra-hadoop/commit/6d7555ea205354a606907e40c16db35072004594] off the InputFormat and OutputFormat classes as found in cassandra-1.2.10. It works against hadoop-0.22.0.

was (Author: michaelsembwever):
I've updated the github project so as to be a patch off the InputFormat and OutputFormat classes as found in cassandra-1.2.10. It works against hadoop-0.22.0.
[jira] [Commented] (CASSANDRA-5883) Switch to Logback
[ https://issues.apache.org/jira/browse/CASSANDRA-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13762916#comment-13762916 ]

Mck SembWever commented on CASSANDRA-5883:
------------------------------------------

Has log4j2 been considered? Both log4j (old) and logback have awful Java code in their internals using synchronised blocks/methods. Log4j2 seems to take a large step forward here and ensures an application won't lock up the way a log4j/logback application can. Such contention locks are not unusual once you increase logging in any highly concurrent application. Log4j2 can be more than 1000x faster… http://logging.apache.org/log4j/2.x/manual/async.html#Performance
[jira] [Commented] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.
[ https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13667902#comment-13667902 ]

Mck SembWever commented on CASSANDRA-2388:
------------------------------------------

{quote}The biggest problem is [avoiding endpoints in a different DC]. Maybe the way to do this is change getSplits logic to never return replicas in another DC. I think this would require adding DC info to the describe_ring call{quote}

Tasktrackers may have access to a set of datacenters, so this DC info needs to contain a list of DCs. For example, our setup separates datacenters by physical datacenter and hadoop-usage, like:
{noformat}
DC1 Production + Hadoop
  c*01
  c*03
DC2 Production + Hadoop
  c*02
  c*04
DC3 Production
  c*05
DC4 Production
  c*06
{noformat}
So here we'd pass to getSplits() a DC info like DC1,DC2. But the problem remains: given a task executing on c*01 that fails to connect to localhost, although we can now prevent a connection to DC3 or DC4, we can't favour a connection to any other split in DC1 over anything in DC2. Is this solvable?

ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.
-------------------------------------------------------------------------------------------------------------------------------------

                Key: CASSANDRA-2388
                URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
            Project: Cassandra
         Issue Type: Bug
         Components: Hadoop
   Affects Versions: 0.6
           Reporter: Eldon Stegall
           Assignee: Mck SembWever
           Priority: Minor
             Labels: hadoop, inputformat
            Fix For: 1.2.6
        Attachments: 0002_On_TException_try_next_split.patch, CASSANDRA-2388-addition1.patch, CASSANDRA-2388-extended.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch

ColumnFamilyRecordReader only tries the first location for a given split. We should try multiple locations for a given split.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
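The filter-plus-preference the comment asks about can at least be sketched: drop replicas in DCs the tasktracker can't reach, then stable-sort so local-DC replicas are tried first. This is a hypothetical illustration, not the ColumnFamilyRecordReader code; the DC lookup map stands in for an endpoint snitch and all names are invented:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch: DC-aware ordering of a split's replica list.
public class DcAwareReplicas {
    static List<String> order(List<String> replicas,
                              Map<String, String> dcOf,
                              Set<String> allowedDcs,
                              String localDc) {
        List<String> result = new ArrayList<>();
        // 1. never return replicas in an unreachable DC
        for (String r : replicas)
            if (allowedDcs.contains(dcOf.get(r)))
                result.add(r);
        // 2. stable sort: local-DC replicas first, others keep their order
        result.sort(Comparator.comparing((String r) -> !localDc.equals(dcOf.get(r))));
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> dcOf = new HashMap<>();
        dcOf.put("c*01", "DC1");
        dcOf.put("c*02", "DC2");
        dcOf.put("c*05", "DC3");
        // task runs in DC1; tasktrackers can reach DC1 and DC2 only
        System.out.println(order(
                Arrays.asList("c*05", "c*02", "c*01"),
                dcOf,
                new HashSet<>(Arrays.asList("DC1", "DC2")),
                "DC1")); // [c*01, c*02]
    }
}
```

This answers the "prevent DC3/DC4" half directly; the "favour DC1 over DC2" half only works if the local DC is known to the task, which is exactly the extra information the comment says describe_ring would need to carry.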
[jira] [Commented] (CASSANDRA-5201) Cassandra/Hadoop does not support current Hadoop releases
[ https://issues.apache.org/jira/browse/CASSANDRA-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13661509#comment-13661509 ]

Mck SembWever commented on CASSANDRA-5201:
------------------------------------------

What about simply putting the hadoop2 package into a github project? It would become available for others to use, and C* can switch to it when they feel ready to drop support for hadoop-0.20.

Otherwise I'm in favour of separate jar files (apache-cassandra-hadoop-legacy-XXX.jar and apache-cassandra-hadoop-XXX.jar). C* already bundles too much into the one jar file IMHO.
[jira] [Commented] (CASSANDRA-5201) Cassandra/Hadoop does not support current Hadoop releases
[ https://issues.apache.org/jira/browse/CASSANDRA-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13661512#comment-13661512 ] Mck SembWever commented on CASSANDRA-5201: -- {quote}What about simply putting the hadoop2 package into a github project?{quote} Done @ https://github.com/michaelsembwever/cassandra-hadoop (I refactored the new package to hadoop1 instead of hadoop2, to better match the Hadoop version we are actually supporting).
[jira] [Commented] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.
[ https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13661436#comment-13661436 ] Mck SembWever commented on CASSANDRA-2388: -- Jonathan, I can't say I'm in favour of enforcing data locality, because data locality in Hadoop doesn't work this way: when a tasktracker, through the next heartbeat, announces that it has a task slot free, the jobtracker will do its best to assign it a task with data locality, but failing that will assign it a random task. The number of these random tasks can be quite high, just as I mentioned above: {quote} Tasks are still being evenly distributed around the ring regardless of what the ColumnFamilySplit.locations is. {quote} This can be almost solved by upgrading to hadoop-0.21+, using the fair scheduler and setting the property {code}<property> <name>mapred.fairscheduler.locality.delay</name> <value>36000</value> </property>{code}. At the end of the day, while Hadoop encourages data locality it does not enforce it. The ideal approach would be to sort all locations by proximity; the feasible approach, hopefully, is still [~tjake]'s above. In addition I'd be in favour of a setting in the job's configuration as to whether a location from another datacenter can be used. References: - http://www.infoq.com/articles/HadoopInputFormat - http://www.mentby.com/matei-zaharia/running-only-node-local-jobs.html - https://groups.google.com/a/cloudera.org/forum/?fromgroups#!topic/cdh-user/3ggnE5hV0PY - http://www.cs.berkeley.edu/~matei/papers/2010/eurosys_delay_scheduling.pdf ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica. 
- Key: CASSANDRA-2388 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 0.6 Reporter: Eldon Stegall Assignee: Mck SembWever Priority: Minor Labels: hadoop, inputformat Fix For: 1.2.5 Attachments: 0002_On_TException_try_next_split.patch, CASSANDRA-2388-addition1.patch, CASSANDRA-2388-extended.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch ColumnFamilyRecordReader only tries the first location for a given split. We should try multiple locations for a given split. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
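The failover behaviour this ticket asks for — "try multiple locations for a given split" instead of giving up after the first — can be sketched as a simple loop over the split's replica list. This is an illustrative sketch, not the attached patch; the Connector interface and host names are stand-ins so the example is self-contained.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of per-split replica failover: try each location
// listed for the split before failing the task.
public class SplitFailover {
    interface Connector { String connect(String host) throws Exception; }

    static String firstReachable(List<String> replicas, Connector c) {
        for (String host : replicas) {
            try {
                return c.connect(host);   // success: read the split from here
            } catch (Exception e) {
                // connection failed: fall through to the next replica
            }
        }
        throw new RuntimeException("no replica reachable for split");
    }

    public static void main(String[] args) {
        List<String> replicas = Arrays.asList("host-down", "host-up");
        String used = firstReachable(replicas, h -> {
            if (h.equals("host-down")) throw new Exception("connection refused");
            return h;
        });
        System.out.println(used); // prints host-up
    }
}
```

The real record reader would also need to resume paging from the last key it saw, so a mid-split failover does not re-read rows already emitted.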
[jira] [Commented] (CASSANDRA-4977) Expose new SliceQueryFilter features through Thrift interface
[ https://issues.apache.org/jira/browse/CASSANDRA-4977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13575211#comment-13575211 ] Mck SembWever commented on CASSANDRA-4977: -- duplicate of CASSANDRA-2710 ? Expose new SliceQueryFilter features through Thrift interface - Key: CASSANDRA-4977 URL: https://issues.apache.org/jira/browse/CASSANDRA-4977 Project: Cassandra Issue Type: Improvement Components: API Affects Versions: 1.2.0 beta 2 Reporter: aaa SliceQueryFilter has some very useful new features like ability to specify a composite column prefix to group by and specify a limit of groups to return. This is very useful if for example I have a wide row with columns prefixed by timestamp and I want to retrieve the latest columns, but I don't know the column names. Say I have a row {{row - (t1, c1), (t1, c2)... (t1, cn) ... (t0,c1) ... etc}} Query slice range (t1,) group by prefix (1) limit (1) As a more general question, is the Thrift interface going to be kept up-to-date with the feature changes or will it be left behind (a mistake IMO) ? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
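The grouped-limit semantics requested above (slice from (t1,), group by the first composite component, limit one group) can be expressed independently of Thrift. The sketch below mimics that behaviour over an in-memory reverse-sorted slice; the class and method names are hypothetical, purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of SliceQueryFilter's group limit: return all cells
// belonging to the first `groupLimit` prefix-groups of a sorted slice.
public class GroupLimitSlice {
    static List<String[]> firstGroups(List<String[]> slice, int groupLimit) {
        List<String[]> result = new ArrayList<>();
        String current = null;   // prefix of the group being collected
        int groups = 0;
        for (String[] cell : slice) {          // cell = {prefix, columnName}
            if (!cell[0].equals(current)) {
                current = cell[0];
                if (++groups > groupLimit) break;  // group limit reached
            }
            result.add(cell);
        }
        return result;
    }

    public static void main(String[] args) {
        // Reverse-sorted wide row: (t1,c1), (t1,c2), (t0,c1)
        List<String[]> slice = Arrays.asList(
            new String[]{"t1", "c1"}, new String[]{"t1", "c2"},
            new String[]{"t0", "c1"});
        System.out.println(firstGroups(slice, 1).size()); // prints 2
    }
}
```

With groupLimit 1 this returns every cell sharing the newest timestamp prefix without the caller knowing the column names, which is exactly the use case described.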
[jira] [Updated] (CASSANDRA-4417) invalid counter shard detected
[ https://issues.apache.org/jira/browse/CASSANDRA-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mck SembWever updated CASSANDRA-4417: - Attachment: cassandra-mck.log.bz2 Sylvain, here's the log from one node. For most of the log we were running 1.0.8. Then at line 2883399 we upgraded (and this was the first node to upgrade) to 1.1.6. The error msg comes every few seconds. Our counters are sub-columns inside supercolumns. We completed the upgrade on all nodes. Then restarted again (because jna was missing). We are now running upgradesstables but that's not in this logfile. The error msgs still appear. An operational problem we've had recently is that we had one node down for ~one month (faulty raid controller), and when we finally brought the node back into the cluster nightly repairs would never finish. In the end we just disabled nightly repairs (we don't have tombstones) with the plan that an upgrade and upgradesstables would bring us back to a state where repairs would work again. I have no idea if this can be related. invalid counter shard detected --- Key: CASSANDRA-4417 URL: https://issues.apache.org/jira/browse/CASSANDRA-4417 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.1.1 Environment: Amazon Linux Reporter: Senthilvel Rangaswamy Attachments: cassandra-mck.log.bz2, err.txt Seeing errors like these: 2012-07-06_07:00:27.22662 ERROR 07:00:27,226 invalid counter shard detected; (17bfd850-ac52-11e1--6ecd0b5b61e7, 1, 13) and (17bfd850-ac52-11e1--6ecd0b5b61e7, 1, 1) differ only in count; will pick highest to self-heal; this indicates a bug or corruption generated a bad counter shard What does it mean ? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Comment Edited] (CASSANDRA-4417) invalid counter shard detected
[ https://issues.apache.org/jira/browse/CASSANDRA-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13492245#comment-13492245 ] Mck SembWever edited comment on CASSANDRA-4417 at 11/7/12 10:20 AM: Sylvain, here's log from one node. For most of the log we were running 1.0.8. And then at line 2883399 we upgraded (and this was the first node to upgrade) to 1.1.6. The error msg comes every few seconds. Our counters are sub-columns inside supercolumns. We completed the upgrade on all nodes. Then restarted again (because jna was missing). We are now running upgradesstables but that's not in this logfile. The error msgs still appear. An operational problem we've had recently is that we had one node down for ~one month (faulty raid controller) and when we finally brought the node back into the cluster nightly repairs would never finish. In the end we just disabled nightly repairs (we don't have tombstones) with the plan that an upgrade and upgradesstables would bring us back to a state where repairs would work again. I have no idea if this can be related. was (Author: michaelsembwever): Sylvain, here's log from one node. For most of the log we were running 1.0.8. And then at line 2883399 we upgraded (and this was the first node to upgrade) to 1.1.6. The error msg comes every few seconds. Our counters are sub-columns inside supercolumns. We completed the upgrade on all nodes. Then restarted again (because jna was missing). We are now running upgradesstables but that's not in this logfile. The error msgs still appear. An operational problem we're had recently is that we had one node down for ~one month (faulty raid controller) and when we finally brought the node back into the cluster nightly repairs would never finish. In the end we just disabled nightly repairs (we don't have tombstones) with the plan that an upgrade and upgradesstables would bring us back to a state where repairs would work again. 
I have no idea if this can be related. invalid counter shard detected --- Key: CASSANDRA-4417 URL: https://issues.apache.org/jira/browse/CASSANDRA-4417 Project: Cassandra Issue Type: Bug Components: Core Affects Versions: 1.1.1 Environment: Amazon Linux Reporter: Senthilvel Rangaswamy Attachments: cassandra-mck.log.bz2, err.txt Seeing errors like these: 2012-07-06_07:00:27.22662 ERROR 07:00:27,226 invalid counter shard detected; (17bfd850-ac52-11e1--6ecd0b5b61e7, 1, 13) and (17bfd850-ac52-11e1--6ecd0b5b61e7, 1, 1) differ only in count; will pick highest to self-heal; this indicates a bug or corruption generated a bad counter shard What does it mean ? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3237) refactor super column implmentation to use composite column names instead
[ https://issues.apache.org/jira/browse/CASSANDRA-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13492684#comment-13492684 ] Mck SembWever commented on CASSANDRA-3237: -- {quote}... in any case I'm not convinced a startup scrub is the right approach. I think that what we need is to write conversions functions...{quote} Despite there being no startup scrub, this still means that a manual `nodetool upgradesstables` will use these conversion functions to rewrite all sstables to composite columns? refactor super column implmentation to use composite column names instead - Key: CASSANDRA-3237 URL: https://issues.apache.org/jira/browse/CASSANDRA-3237 Project: Cassandra Issue Type: Improvement Reporter: Matthew F. Dennis Priority: Minor Labels: ponies Fix For: 1.3 Attachments: cassandra-supercolumn-irc.log super columns are annoying. composite columns offer a better API and performance. people should use composites over super columns. some people are already using super columns. C* should implement the super column API in terms of composites to reduce code, complexity and testing as well as increase performance. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Comment Edited] (CASSANDRA-3237) refactor super column implmentation to use composite column names instead
[ https://issues.apache.org/jira/browse/CASSANDRA-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13492684#comment-13492684 ] Mck SembWever edited comment on CASSANDRA-3237 at 11/7/12 8:47 PM: --- {quote}... in any case I'm not convinced a startup scrub is the right approach. I think that what we need is to write conversions functions...{quote} Despite there being no startup scrub this still means that a manual `nodetool upgradesstables` will use these conversions functions to rewrite all sstables to composite columns? was (Author: michaelsembwever): {quote}... in any case I'm not convinced a startup scrub is the right approach. I think that what we need is to write conversions functions...{quote} Despite there being no startup scrub this still means that a manual `nodetool upgradesstables` will using these conversions functions to rewrite all sstables to composite columns? refactor super column implmentation to use composite column names instead - Key: CASSANDRA-3237 URL: https://issues.apache.org/jira/browse/CASSANDRA-3237 Project: Cassandra Issue Type: Improvement Reporter: Matthew F. Dennis Priority: Minor Labels: ponies Fix For: 1.3 Attachments: cassandra-supercolumn-irc.log super columns are annoying. composite columns offer a better API and performance. people should use composites over super columns. some people are already using super columns. C* should implement the super column API in terms of composites to reduce code, complexity and testing as well as increase performance. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
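The conversion being discussed maps each super column cell (scName, columnName) onto a single composite column name. The sketch below packs components using CompositeType-style framing (2-byte big-endian length, component bytes, end-of-component byte); it is a simplified illustration of the idea, not the actual conversion function, and the class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;

// Hedged sketch of supercolumn-to-composite name conversion: each component
// is framed as <2-byte length><bytes><end-of-component byte>, in the style
// of CompositeType's encoding.
public class CompositePack {
    static byte[] pack(byte[]... components) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] c : components) {
            out.write((c.length >> 8) & 0xff);  // length, high byte
            out.write(c.length & 0xff);         // length, low byte
            out.write(c, 0, c.length);          // component bytes
            out.write(0);                       // end-of-component byte
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // (scName "sc", columnName "col") -> one composite name
        byte[] name = pack("sc".getBytes(), "col".getBytes());
        System.out.println(name.length); // (2+2+1) + (2+3+1) = 11
    }
}
```

A manual `nodetool upgradesstables` pass would then rewrite every cell of a super column family through a function like this, which is the behaviour the comment asks to confirm.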
[jira] [Issue Comment Edited] (CASSANDRA-3150) ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack)
[ https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13114220#comment-13114220 ] Mck SembWever edited comment on CASSANDRA-3150 at 9/26/11 8:29 AM: --- {{fullscan-example1.log}} is debug from a full scan job. It scans data over a full year (and since the cluster's ring range only holds 3 months of data such a job guarantees a full scan). In the debug you see the splits. {{`nodetool ring`}} gives {noformat}Address DC RackStatus State Load OwnsToken Token(bytes[5554]) 152.90.241.22 DC1 RAC1Up Normal 16.65 GB33.33% Token(bytes[00]) 152.90.241.23 DC2 RAC1Up Normal 63.22 GB33.33% Token(bytes[2aaa]) 152.90.241.24 DC1 RAC1Up Normal 72.4 KB 33.33% Token(bytes[5554]) {noformat} The problematic split ends up being {noformat}ColumnFamilySplit{startToken='0528cbe0b2b5ff6b816c68b59973bcbc', endToken='2aaa', dataNodes=[cassandra02.finn.no]}{noformat} This is the split that's receiving new data (5-10k rows/second). This new data is being written directly using {{StorageProxy.mutate(..)}} with code somewhat similar to the second example in [wiki: ScribeToCassandra|http://wiki.apache.org/cassandra/ScribeToCassandra]. This cluster has {{binary_memtable_throughput_in_mb: 1024}} and {{Xmx8g}}. There are 3 nodes in the cluster each 48g ram and 24cpus. was (Author: michaelsembwever): {{fullscan-example1.log}} is debug from a full scan job. It scans data over a full year (and since the cluster's ring range only holds 3 months of data such a job guarantees a full scan). In the debug you see the splits. 
{{`nodetool ring`}} gives {noformat}Address DC RackStatus State Load OwnsToken Token(bytes[5554]) 152.90.241.22 DC1 RAC1Up Normal 16.65 GB33.33% Token(bytes[00]) 152.90.241.23 DC2 RAC1Up Normal 63.22 GB33.33% Token(bytes[2aaa]) 152.90.241.24 DC1 RAC1Up Normal 72.4 KB 33.33% Token(bytes[5554]) {noformat} The problematic split ends up being {noformat}ColumnFamilySplit{startToken='0528cbe0b2b5ff6b816c68b59973bcbc', endToken='2aaa', dataNodes=[cassandra02.finn.no]}{noformat} This is the split that's receiving new data (5-10k rows/second). This new data is being written directly using {{StorageProxy.mutate(..)}} with code somewhat similar to the second example in [wiki: ScribeToCassandra|http://wiki.apache.org/cassandra/ScribeToCassandra]. ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack) -- Key: CASSANDRA-3150 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 0.8.4, 0.8.5 Reporter: Mck SembWever Assignee: Mck SembWever Priority: Critical Fix For: 0.8.6 Attachments: CASSANDRA-3150.patch, Screenshot-Counters for task_201109212019_1060_m_29 - Mozilla Firefox.png, Screenshot-Hadoop map task list for job_201109212019_1060 on cassandra01 - Mozilla Firefox.png, attempt_201109071357_0044_m_003040_0.grep-get_range_slices.log, fullscan-example1.log From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039 {quote} bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner bq. CFIF's inputSplitSize=196608 bq. 3 map tasks (from 4013) is still running after read 25 million rows. bq. Can this be a bug in StorageService.getSplits(..) ? getSplits looks pretty foolproof to me but I guess we'd need to add more debug logging to rule out a bug there for sure. I guess the main alternative would be a bug in the recordreader paging. {quote} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3150) ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack)
[ https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13114196#comment-13114196 ] Mck SembWever commented on CASSANDRA-3150: -- Yes, I see now a CFRR that went to 600%. But it's still a long way from problematic. My understanding of how the sampled keys are generated is that it all happens when an sstable is read. If a restart or a (manual) compact doesn't help, why did an upgrade help? What can I investigate to provide more debug here? ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack) -- Key: CASSANDRA-3150 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 0.8.4, 0.8.5 Reporter: Mck SembWever Assignee: Mck SembWever Priority: Critical Fix For: 0.8.6 Attachments: CASSANDRA-3150.patch, Screenshot-Counters for task_201109212019_1060_m_29 - Mozilla Firefox.png, Screenshot-Hadoop map task list for job_201109212019_1060 on cassandra01 - Mozilla Firefox.png, attempt_201109071357_0044_m_003040_0.grep-get_range_slices.log From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039 {quote} bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner bq. CFIF's inputSplitSize=196608 bq. 3 map tasks (from 4013) is still running after read 25 million rows. bq. Can this be a bug in StorageService.getSplits(..) ? getSplits looks pretty foolproof to me but I guess we'd need to add more debug logging to rule out a bug there for sure. I guess the main alternative would be a bug in the recordreader paging. {quote} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-3150) ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack)
[ https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mck SembWever updated CASSANDRA-3150: - Attachment: fullscan-example1.log Here's debug from a full scan job. It scans data over a full year (and since the cluster's ring range only hold 3 months of data this job guarantees a full scan). In the debug you see the splits. {{`nodetool ring`}} gives {noformat}Address DC RackStatus State Load OwnsToken Token(bytes[5554]) 152.90.241.22 DC1 RAC1Up Normal 16.65 GB33.33% Token(bytes[00]) 152.90.241.23 DC2 RAC1Up Normal 63.22 GB33.33% Token(bytes[2aaa]) 152.90.241.24 DC1 RAC1Up Normal 72.4 KB 33.33% Token(bytes[5554]) {noformat} The problematic split ends up being {noformat}ColumnFamilySplit{startToken='0528cbe0b2b5ff6b816c68b59973bcbc', endToken='2aaa', dataNodes=[cassandra02.finn.no]}{noformat} ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack) -- Key: CASSANDRA-3150 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 0.8.4, 0.8.5 Reporter: Mck SembWever Assignee: Mck SembWever Priority: Critical Fix For: 0.8.6 Attachments: CASSANDRA-3150.patch, Screenshot-Counters for task_201109212019_1060_m_29 - Mozilla Firefox.png, Screenshot-Hadoop map task list for job_201109212019_1060 on cassandra01 - Mozilla Firefox.png, attempt_201109071357_0044_m_003040_0.grep-get_range_slices.log, fullscan-example1.log From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039 {quote} bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner bq. CFIF's inputSplitSize=196608 bq. 3 map tasks (from 4013) is still running after read 25 million rows. bq. Can this be a bug in StorageService.getSplits(..) ? getSplits looks pretty foolproof to me but I guess we'd need to add more debug logging to rule out a bug there for sure. I guess the main alternative would be a bug in the recordreader paging. 
{quote} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-3150) ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack)
[ https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13114220#comment-13114220 ] Mck SembWever edited comment on CASSANDRA-3150 at 9/25/11 11:23 AM: Here's debug from a full scan job. It scans data over a full year (and since the cluster's ring range only holds 3 months of data such a job guarantees a full scan). In the debug you see the splits. {{`nodetool ring`}} gives {noformat}Address DC RackStatus State Load OwnsToken Token(bytes[5554]) 152.90.241.22 DC1 RAC1Up Normal 16.65 GB33.33% Token(bytes[00]) 152.90.241.23 DC2 RAC1Up Normal 63.22 GB33.33% Token(bytes[2aaa]) 152.90.241.24 DC1 RAC1Up Normal 72.4 KB 33.33% Token(bytes[5554]) {noformat} The problematic split ends up being {noformat}ColumnFamilySplit{startToken='0528cbe0b2b5ff6b816c68b59973bcbc', endToken='2aaa', dataNodes=[cassandra02.finn.no]}{noformat} was (Author: michaelsembwever): Here's debug from a full scan job. It scans data over a full year (and since the cluster's ring range only hold 3 months of data this job guarantees a full scan). In the debug you see the splits. {{`nodetool ring`}} gives {noformat}Address DC RackStatus State Load OwnsToken Token(bytes[5554]) 152.90.241.22 DC1 RAC1Up Normal 16.65 GB33.33% Token(bytes[00]) 152.90.241.23 DC2 RAC1Up Normal 63.22 GB33.33% Token(bytes[2aaa]) 152.90.241.24 DC1 RAC1Up Normal 72.4 KB 33.33% Token(bytes[5554]) {noformat} The problematic split ends up being {noformat}ColumnFamilySplit{startToken='0528cbe0b2b5ff6b816c68b59973bcbc', endToken='2aaa', dataNodes=[cassandra02.finn.no]}{noformat} ColumnFormatRecordReader loops forever (StorageService.getSplits(..) 
out of whack) -- Key: CASSANDRA-3150 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 0.8.4, 0.8.5 Reporter: Mck SembWever Assignee: Mck SembWever Priority: Critical Fix For: 0.8.6 Attachments: CASSANDRA-3150.patch, Screenshot-Counters for task_201109212019_1060_m_29 - Mozilla Firefox.png, Screenshot-Hadoop map task list for job_201109212019_1060 on cassandra01 - Mozilla Firefox.png, attempt_201109071357_0044_m_003040_0.grep-get_range_slices.log, fullscan-example1.log From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039 {quote} bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner bq. CFIF's inputSplitSize=196608 bq. 3 map tasks (from 4013) is still running after read 25 million rows. bq. Can this be a bug in StorageService.getSplits(..) ? getSplits looks pretty foolproof to me but I guess we'd need to add more debug logging to rule out a bug there for sure. I guess the main alternative would be a bug in the recordreader paging. {quote} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-3150) ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack)
[ https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13114220#comment-13114220 ] Mck SembWever edited comment on CASSANDRA-3150 at 9/25/11 11:27 AM: Here's debug from a full scan job. It scans data over a full year (and since the cluster's ring range only holds 3 months of data such a job guarantees a full scan). In the debug you see the splits. {{`nodetool ring`}} gives {noformat}Address DC RackStatus State Load OwnsToken Token(bytes[5554]) 152.90.241.22 DC1 RAC1Up Normal 16.65 GB33.33% Token(bytes[00]) 152.90.241.23 DC2 RAC1Up Normal 63.22 GB33.33% Token(bytes[2aaa]) 152.90.241.24 DC1 RAC1Up Normal 72.4 KB 33.33% Token(bytes[5554]) {noformat} The problematic split ends up being {noformat}ColumnFamilySplit{startToken='0528cbe0b2b5ff6b816c68b59973bcbc', endToken='2aaa', dataNodes=[cassandra02.finn.no]}{noformat} This is the split that's receiving new data (5-10k rows/second). was (Author: michaelsembwever): Here's debug from a full scan job. It scans data over a full year (and since the cluster's ring range only holds 3 months of data such a job guarantees a full scan). In the debug you see the splits. {{`nodetool ring`}} gives {noformat}Address DC RackStatus State Load OwnsToken Token(bytes[5554]) 152.90.241.22 DC1 RAC1Up Normal 16.65 GB33.33% Token(bytes[00]) 152.90.241.23 DC2 RAC1Up Normal 63.22 GB33.33% Token(bytes[2aaa]) 152.90.241.24 DC1 RAC1Up Normal 72.4 KB 33.33% Token(bytes[5554]) {noformat} The problematic split ends up being {noformat}ColumnFamilySplit{startToken='0528cbe0b2b5ff6b816c68b59973bcbc', endToken='2aaa', dataNodes=[cassandra02.finn.no]}{noformat} ColumnFormatRecordReader loops forever (StorageService.getSplits(..) 
out of whack) -- Key: CASSANDRA-3150 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 0.8.4, 0.8.5 Reporter: Mck SembWever Assignee: Mck SembWever Priority: Critical Fix For: 0.8.6 Attachments: CASSANDRA-3150.patch, Screenshot-Counters for task_201109212019_1060_m_29 - Mozilla Firefox.png, Screenshot-Hadoop map task list for job_201109212019_1060 on cassandra01 - Mozilla Firefox.png, attempt_201109071357_0044_m_003040_0.grep-get_range_slices.log, fullscan-example1.log From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039 {quote} bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner bq. CFIF's inputSplitSize=196608 bq. 3 map tasks (from 4013) is still running after read 25 million rows. bq. Can this be a bug in StorageService.getSplits(..) ? getSplits looks pretty foolproof to me but I guess we'd need to add more debug logging to rule out a bug there for sure. I guess the main alternative would be a bug in the recordreader paging. {quote} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-3150) ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack)
[ https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13114220#comment-13114220 ] Mck SembWever edited comment on CASSANDRA-3150 at 9/25/11 11:30 AM: {{fullscan-example1.log}} is debug from a full scan job. It scans data over a full year (and since the cluster's ring range only holds 3 months of data such a job guarantees a full scan). In the debug you see the splits. {{`nodetool ring`}} gives {noformat}Address DC RackStatus State Load OwnsToken Token(bytes[5554]) 152.90.241.22 DC1 RAC1Up Normal 16.65 GB33.33% Token(bytes[00]) 152.90.241.23 DC2 RAC1Up Normal 63.22 GB33.33% Token(bytes[2aaa]) 152.90.241.24 DC1 RAC1Up Normal 72.4 KB 33.33% Token(bytes[5554]) {noformat} The problematic split ends up being {noformat}ColumnFamilySplit{startToken='0528cbe0b2b5ff6b816c68b59973bcbc', endToken='2aaa', dataNodes=[cassandra02.finn.no]}{noformat} This is the split that's receiving new data (5-10k rows/second). was (Author: michaelsembwever): Here's debug from a full scan job. It scans data over a full year (and since the cluster's ring range only holds 3 months of data such a job guarantees a full scan). In the debug you see the splits. {{`nodetool ring`}} gives {noformat}Address DC RackStatus State Load OwnsToken Token(bytes[5554]) 152.90.241.22 DC1 RAC1Up Normal 16.65 GB33.33% Token(bytes[00]) 152.90.241.23 DC2 RAC1Up Normal 63.22 GB33.33% Token(bytes[2aaa]) 152.90.241.24 DC1 RAC1Up Normal 72.4 KB 33.33% Token(bytes[5554]) {noformat} The problematic split ends up being {noformat}ColumnFamilySplit{startToken='0528cbe0b2b5ff6b816c68b59973bcbc', endToken='2aaa', dataNodes=[cassandra02.finn.no]}{noformat} This is the split that's receiving new data (5-10k rows/second). ColumnFormatRecordReader loops forever (StorageService.getSplits(..) 
out of whack) -- Key: CASSANDRA-3150 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 0.8.4, 0.8.5 Reporter: Mck SembWever Assignee: Mck SembWever Priority: Critical Fix For: 0.8.6 Attachments: CASSANDRA-3150.patch, Screenshot-Counters for task_201109212019_1060_m_29 - Mozilla Firefox.png, Screenshot-Hadoop map task list for job_201109212019_1060 on cassandra01 - Mozilla Firefox.png, attempt_201109071357_0044_m_003040_0.grep-get_range_slices.log, fullscan-example1.log From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039 {quote} bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner bq. CFIF's inputSplitSize=196608 bq. 3 map tasks (from 4013) is still running after read 25 million rows. bq. Can this be a bug in StorageService.getSplits(..) ? getSplits looks pretty foolproof to me but I guess we'd need to add more debug logging to rule out a bug there for sure. I guess the main alternative would be a bug in the recordreader paging. {quote} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-3150) ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack)
[ https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13114220#comment-13114220 ] Mck SembWever edited comment on CASSANDRA-3150 at 9/25/11 11:32 AM: {{fullscan-example1.log}} is debug from a full scan job. It scans data over a full year (and since the cluster's ring range only holds 3 months of data such a job guarantees a full scan). In the debug you see the splits. {{`nodetool ring`}} gives {noformat}Address DC RackStatus State Load OwnsToken Token(bytes[5554]) 152.90.241.22 DC1 RAC1Up Normal 16.65 GB33.33% Token(bytes[00]) 152.90.241.23 DC2 RAC1Up Normal 63.22 GB33.33% Token(bytes[2aaa]) 152.90.241.24 DC1 RAC1Up Normal 72.4 KB 33.33% Token(bytes[5554]) {noformat} The problematic split ends up being {noformat}ColumnFamilySplit{startToken='0528cbe0b2b5ff6b816c68b59973bcbc', endToken='2aaa', dataNodes=[cassandra02.finn.no]}{noformat} This is the split that's receiving new data (5-10k rows/second). This new data is being written directly using {{StorageProxy.mutate(..)}} with code somewhat similar to the second example in [wiki: ScribeToCassandra|http://wiki.apache.org/cassandra/ScribeToCassandra]. was (Author: michaelsembwever): {{fullscan-example1.log}} is debug from a full scan job. It scans data over a full year (and since the cluster's ring range only holds 3 months of data such a job guarantees a full scan). In the debug you see the splits. {{`nodetool ring`}} gives {noformat}Address DC RackStatus State Load OwnsToken Token(bytes[5554]) 152.90.241.22 DC1 RAC1Up Normal 16.65 GB33.33% Token(bytes[00]) 152.90.241.23 DC2 RAC1Up Normal 63.22 GB33.33% Token(bytes[2aaa]) 152.90.241.24 DC1 RAC1Up Normal 72.4 KB 33.33% Token(bytes[5554]) {noformat} The problematic split ends up being {noformat}ColumnFamilySplit{startToken='0528cbe0b2b5ff6b816c68b59973bcbc', endToken='2aaa', dataNodes=[cassandra02.finn.no]}{noformat} This is the split that's receiving new data (5-10k rows/second). 
ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack) -- Key: CASSANDRA-3150 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 0.8.4, 0.8.5 Reporter: Mck SembWever Assignee: Mck SembWever Priority: Critical Fix For: 0.8.6 Attachments: CASSANDRA-3150.patch, Screenshot-Counters for task_201109212019_1060_m_29 - Mozilla Firefox.png, Screenshot-Hadoop map task list for job_201109212019_1060 on cassandra01 - Mozilla Firefox.png, attempt_201109071357_0044_m_003040_0.grep-get_range_slices.log, fullscan-example1.log From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039 {quote} bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner bq. CFIF's inputSplitSize=196608 bq. 3 map tasks (from 4013) is still running after read 25 million rows. bq. Can this be a bug in StorageService.getSplits(..) ? getSplits looks pretty foolproof to me but I guess we'd need to add more debug logging to rule out a bug there for sure. I guess the main alternative would be a bug in the recordreader paging. {quote} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-3150) ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack)
[ https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mck SembWever updated CASSANDRA-3150: - Affects Version/s: 0.8.5
[jira] [Resolved] (CASSANDRA-3150) ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack)
[ https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mck SembWever resolved CASSANDRA-3150. -- Resolution: Cannot Reproduce Fix Version/s: 0.8.6
An upgrade to 0.8.6 /seems/ to have fixed this. I saw one map task (CFRR) reach 150%, but nothing like what I was seeing before. Maybe there is something else happening in the upgrade process that fixed it? Otherwise I'm happy to close this issue for now and reopen it if the problem recurs.
[jira] [Commented] (CASSANDRA-3150) ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack)
[ https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13113326#comment-13113326 ] Mck SembWever commented on CASSANDRA-3150: --
This continues to be a problem, and it gets worse through the day. I can see about half my cf being read from one split. It doesn't matter whether it's ByteOrderedPartitioner or RandomPartitioner. I've attached two screenshots from the Hadoop web pages.
[jira] [Updated] (CASSANDRA-3150) ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack)
[ https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mck SembWever updated CASSANDRA-3150: - Attachment: Screenshot-Hadoop map task list for job_201109212019_1060 on cassandra01 - Mozilla Firefox.png
[jira] [Issue Comment Edited] (CASSANDRA-3150) ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack)
[ https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13113326#comment-13113326 ] Mck SembWever edited comment on CASSANDRA-3150 at 9/23/11 11:04 AM:
This continues to be a problem, and it gets worse through the day. I can see about half my cf being read from one split. It doesn't matter whether it's ByteOrderedPartitioner or RandomPartitioner. I've attached two screenshots from the Hadoop web pages (you see 36 million map input records when cassandra.input.split.size is set to 393216).
[jira] [Updated] (CASSANDRA-3150) ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack)
[ https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mck SembWever updated CASSANDRA-3150: - Attachment: Screenshot-Counters for task_201109212019_1060_m_29 - Mozilla Firefox.png
[jira] [Issue Comment Edited] (CASSANDRA-3150) ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack)
[ https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13113326#comment-13113326 ] Mck SembWever edited comment on CASSANDRA-3150 at 9/23/11 12:05 PM:
This continues to be a problem, and it gets worse through the day. I can see about half my cf being read from one split. It doesn't matter whether it's ByteOrderedPartitioner or RandomPartitioner. I've attached two screenshots from the Hadoop web pages (you see 36 million map input records when cassandra.input.split.size is set to 393216).
bq. ... double-check how many rows there really are in the given split range.
I can confirm that if I let these jobs run through to completion (however terribly long they may take), the results are correct. It would seem that the split ranges are incorrect (not the rows within them).
[jira] [Issue Comment Edited] (CASSANDRA-3150) ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack)
[ https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13113326#comment-13113326 ] Mck SembWever edited comment on CASSANDRA-3150 at 9/23/11 7:30 PM: ---
This continues to be a problem, and it gets worse through the day. I can see two thirds of my cf being read from one split. It doesn't matter whether it's ByteOrderedPartitioner or RandomPartitioner, although I'm having a greater problem with the former. A compact and restart didn't help. I've attached two screenshots from the Hadoop web pages (you see 36 million map input records when cassandra.input.split.size is set to 393216).
bq. ... double-check how many rows there really are in the given split range.
I can confirm that if I let these jobs run through to completion (however terribly long they may take), the results are correct. It would seem that the split ranges are incorrect (not the rows within them).
[jira] [Updated] (CASSANDRA-3197) Separate input and output connection details in ConfigHelper
[ https://issues.apache.org/jira/browse/CASSANDRA-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mck SembWever updated CASSANDRA-3197: - Attachment: CASSANDRA-3197-extra.patch patch for contrib/pig
Separate input and output connection details in ConfigHelper Key: CASSANDRA-3197 URL: https://issues.apache.org/jira/browse/CASSANDRA-3197 Project: Cassandra Issue Type: Improvement Components: Hadoop Affects Versions: 0.8.5 Reporter: Mck SembWever Assignee: Mck SembWever Attachments: CASSANDRA-3197-extra.patch, CASSANDRA-3197.patch
Currently ConfigHelper's getInitialAddress(..), getRpcPort(..), and getPartitioner(..) presume CFIF will be using the same cluster as CFOF. The latter two are a problem for me, as on the same servers I'm running two clusters (one with ByteOrderedPartitioner and the other with RandomPartitioner), and I would like to read from the BOP cluster and write to the RP cluster. getInitialAddress(..) is of little concern to me.
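The shape of the separation this ticket proposes can be sketched with a small self-contained class; the class name ({{InOutConfig}}), method names, and property keys below are illustrative stand-ins, not the actual ConfigHelper API or the patch's contents:

```java
// Illustrative sketch only: hold separate input-side and output-side
// connection details so a Hadoop job can read from one cluster (CFIF)
// while writing to another (CFOF). All names here are hypothetical.
import java.util.HashMap;
import java.util.Map;

public class InOutConfig {
    private final Map<String, String> conf = new HashMap<>();

    public void setInputPartitioner(String p)  { conf.put("cassandra.input.partitioner.class", p); }
    public void setOutputPartitioner(String p) { conf.put("cassandra.output.partitioner.class", p); }
    public String getInputPartitioner()  { return conf.get("cassandra.input.partitioner.class"); }
    public String getOutputPartitioner() { return conf.get("cassandra.output.partitioner.class"); }

    public static void main(String[] args) {
        InOutConfig c = new InOutConfig();
        // Read from the ByteOrderedPartitioner cluster, write to the RandomPartitioner one.
        c.setInputPartitioner("org.apache.cassandra.dht.ByteOrderedPartitioner");
        c.setOutputPartitioner("org.apache.cassandra.dht.RandomPartitioner");
        System.out.println(c.getInputPartitioner());
        System.out.println(c.getOutputPartitioner());
    }
}
```

The key design point is simply that each setting exists twice, once per side, instead of one shared value that both CFIF and CFOF consult.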
[jira] [Updated] (CASSANDRA-3197) Separate input and output connection details in ConfigHelper
[ https://issues.apache.org/jira/browse/CASSANDRA-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mck SembWever updated CASSANDRA-3197: - Fix Version/s: 0.8.6
[jira] [Updated] (CASSANDRA-3150) ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack)
[ https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mck SembWever updated CASSANDRA-3150: - Summary: ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack) (was: ColumnFormatRecordReader loops forever)
[jira] [Updated] (CASSANDRA-3150) ColumnFormatRecordReader loops forever
[ https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mck SembWever updated CASSANDRA-3150: - Attachment: attempt_201109071357_0044_m_003040_0.grep-get_range_slices.log
Debug from a task that was still running at 1200%. The initial split for this CFRR is 30303030303031333131313739353337303038d4e7f72db2ed11e09d7c68b59973a5d8 : 303030303030313331323631393735313231381778518cc00711e0acb968b59973a5d8. This job was run with cassandra.input.split.size=196608 and cassandra.range.batch.size=16000, so there shouldn't be more than 13 calls to get_range_slices(..) in this task. There were already 166 calls in this log. What I can see here is that the original split for this task is just way too big, and this comes from {{describe_splits(..)}}.
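The arithmetic behind the "no more than 13 calls" bound above: each get_range_slices page fetches at most cassandra.range.batch.size rows, so a split of cassandra.input.split.size rows needs about the ceiling of their quotient in pages. A minimal sketch (the class and method names are ours, for illustration only):

```java
// Sketch: expected upper bound on get_range_slices pages per split.
// A split of `splitSize` rows read in pages of `batchSize` rows needs
// ceil(splitSize / batchSize) calls. Observing 166 calls where 13 were
// expected means the split held far more rows than describe_splits estimated.
public class SplitPageMath {
    static long expectedPages(long splitSize, long batchSize) {
        return (splitSize + batchSize - 1) / batchSize; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(expectedPages(196608, 16000)); // 13
        // 166 pages at batch size 16000 implies roughly 2.65 million rows in the split:
        System.out.println(166L * 16000L); // 2656000
    }
}
```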
[jira] [Issue Comment Edited] (CASSANDRA-3150) ColumnFormatRecordReader loops forever
[ https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102062#comment-13102062 ] Mck SembWever edited comment on CASSANDRA-3150 at 9/10/11 6:02 PM: ---
Debug from a task that was still running at 1200%. The initial split for this CFRR is 30303030303031333131313739353337303038d4e7f72db2ed11e09d7c68b59973a5d8 : 303030303030313331323631393735313231381778518cc00711e0acb968b59973a5d8. This job was run with cassandra.input.split.size=196608 and cassandra.range.batch.size=16000, so there shouldn't be more than 13 calls to get_range_slices(..) in this task. There were already 166 calls in this log. What I can see here is that the original split for this task is just way too big, and this comes from {{describe_splits(..)}}, which in turn depends on index_interval. Reading {{StorageService.getSplits(..)}}, I would guess that the split can in fact contain many more keys with the default sampling of 128. The question is how low can/should I bring index_interval?
[jira] [Issue Comment Edited] (CASSANDRA-3150) ColumnFormatRecordReader loops forever
[ https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102062#comment-13102062 ] Mck SembWever edited comment on CASSANDRA-3150 at 9/10/11 6:04 PM: ---
Debug from a task that was still running at 1200%. The initial split for this CFRR is 30303030303031333131313739353337303038d4e7f72db2ed11e09d7c68b59973a5d8 : 303030303030313331323631393735313231381778518cc00711e0acb968b59973a5d8. This job was run with cassandra.input.split.size=196608 and cassandra.range.batch.size=16000, so there shouldn't be more than 13 calls to get_range_slices(..) in this task. There were already 166 calls in this log. What I can see here is that the original split for this task is just way too big, and this comes from {{describe_splits(..)}}, which in turn depends on index_interval. Reading {{StorageService.getSplits(..)}}, I would guess that the split can in fact contain many more keys with the default sampling of 128. The question is how low can/should I bring index_interval (this cf can have up to 8 billion rows)?
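The index_interval trade-off discussed above can be sketched numerically: if each index sample stands for roughly index_interval keys, then split size estimates have index_interval granularity, and lowering it refines the estimate at the cost of a larger key index. This is a rough self-contained sketch, not Cassandra's actual estimation code; the names are ours:

```java
// Sketch of the sampling trade-off behind describe_splits estimates.
// Each index sample represents ~indexInterval keys, so a token range
// covering `samples` index entries is estimated at samples * indexInterval
// keys; the estimate can be off by up to ~indexInterval keys per sample.
public class IndexSampleEstimate {
    static long estimatedKeys(long samplesInRange, int indexInterval) {
        return samplesInRange * indexInterval;
    }

    // Cost side: how many index entries the cf would keep in memory.
    static long indexEntries(long totalRows, int indexInterval) {
        return totalRows / indexInterval;
    }

    public static void main(String[] args) {
        // A split estimated at 196608 keys corresponds to 1536 samples at the default 128:
        System.out.println(estimatedKeys(1536, 128)); // 196608
        // Lowering index_interval to 16 on an 8-billion-row cf means many more entries:
        System.out.println(indexEntries(8_000_000_000L, 128)); // 62500000
        System.out.println(indexEntries(8_000_000_000L, 16));  // 500000000
    }
}
```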
[jira] [Commented] (CASSANDRA-3150) ColumnFormatRecordReader loops forever
[ https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102115#comment-13102115 ] Mck SembWever commented on CASSANDRA-3150: --
There's a lot to learn about Cassandra, so forgive my ignorance in so many areas. So how can {{StorageService.getSplits(..)}} be so out of whack? Is there anything I can tune to improve this situation? (Or is there any other debug I can provide?)
[jira] [Commented] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.
[ https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13100093#comment-13100093 ] Mck SembWever commented on CASSANDRA-2388: --
In the meantime, could we make this behavior configurable? E.g. replace CFRR:176 with something like
{noformat}
if (ConfigHelper.isDataLocalityDisabled()) {
    return split.getLocations()[0];
} else {
    throw new UnsupportedOperationException("no local connection available");
}
{noformat}
ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica. - Key: CASSANDRA-2388 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 0.6 Reporter: Eldon Stegall Assignee: Mck SembWever Labels: hadoop, inputformat Fix For: 0.8.6 Attachments: 0002_On_TException_try_next_split.patch, CASSANDRA-2388-addition1.patch, CASSANDRA-2388-extended.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch
ColumnFamilyRecordReader only tries the first location for a given split. We should try multiple locations for a given split.
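The ticket's thesis, trying the split's other replicas before giving up, could be sketched as below. This is a self-contained illustration, not CFRR's actual code; the {{pickLocation}} method and the reachability predicate are stand-ins for opening a Thrift connection to each candidate host:

```java
// Sketch: instead of failing when the first replica of a split is down,
// walk split.getLocations() in order and use the first reachable host.
// `isReachable` stands in for actually opening a connection.
import java.util.List;
import java.util.function.Predicate;

public class ReplicaFallback {
    static String pickLocation(List<String> locations, Predicate<String> isReachable) {
        for (String host : locations) {
            if (isReachable.test(host))
                return host; // first live replica wins
        }
        throw new RuntimeException("no replica reachable for this split");
    }

    public static void main(String[] args) {
        List<String> replicas = List.of("cass01", "cass02", "cass03");
        // Pretend cass01 is down:
        String chosen = pickLocation(replicas, h -> !h.equals("cass01"));
        System.out.println(chosen); // cass02
    }
}
```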
[jira] [Commented] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.
[ https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13100717#comment-13100717 ] Mck SembWever commented on CASSANDRA-2388: --
Well, that would work for me; I was only thinking you'd want to push a default behavior (especially for those using RandomPartitioner). But I think a better understanding (at least on my part) of Hadoop's task scheduling is required before enforcing data locality, as as-is it certainly doesn't work.
[jira] [Issue Comment Edited] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.
[ https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13100717#comment-13100717 ] Mck SembWever edited comment on CASSANDRA-2388 at 9/8/11 9:37 PM: -- Well that would work for me, was only thinking you want to push a default behavior (especially for those using a RP). But I think a better understanding (at least from me) of hadoop's task scheduling is required before enforcing data locality, as as-is it certainly doesn't work for all. was (Author: michaelsembwever): Well that would work for me, was only thinking you want to push a default behavior (especially for those using a RP). But I think a better understanding (at least from me) of hadoop's task scheduling is required before enforcing data locality, as as-is it certainly doesn't work.
[jira] [Commented] (CASSANDRA-3108) Make Range and Bounds objects client-safe
[ https://issues.apache.org/jira/browse/CASSANDRA-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13098810#comment-13098810 ] Mck SembWever commented on CASSANDRA-3108: -- No, in itself it doesn't implement wrapping ranges. But by making intersectionBothWrapping(..) and intersectionOneWrapping(..) client-safe it makes it possible to implement. I think the best answer to your question, Jonathan, is to look at the patch available in CASSANDRA-3137. Make Range and Bounds objects client-safe - Key: CASSANDRA-3108 URL: https://issues.apache.org/jira/browse/CASSANDRA-3108 Project: Cassandra Issue Type: Bug Affects Versions: 0.8.2 Reporter: Jonathan Ellis Assignee: Mck SembWever Labels: hadoop Fix For: 0.8.5 Attachments: 3108.txt From Mck's comment on CASSANDRA-1125: Something broke here in production once we went out with 0.8.2. It may have been some poor testing, I'm not entirely sure and a little surprised. CFIF:135 breaks because inside dhtRange.intersects(jobRange) there's a call to new Range(token, token) which calls StorageService.getPartitioner() and StorageService is null as we're not inside the server. A quick fix is to change Range:148 from new Range(token, token) to new Range(token, token, partitioner), making the presumption that the partitioner for the new Range will be the same as this Range. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
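The quick fix quoted above (passing the partitioner explicitly instead of reaching for StorageService) can be illustrated with a toy model. `RangeModel` and its fields are illustrative stand-ins, not Cassandra's actual Range class:

```java
// Toy model of the CASSANDRA-3108 problem: a Range-like object must not
// reach for a server-side singleton when constructed client-side.
public class RangeModel
{
    interface Partitioner { }

    // null outside the server, mirroring the bug report's situation
    static Partitioner serverPartitioner = null;

    final Object left, right;
    final Partitioner partitioner;

    // Old-style constructor: fails on the client, where no server
    // partitioner is available.
    RangeModel(Object left, Object right)
    {
        this(left, right, requireServerPartitioner());
    }

    // Fixed-style constructor: the caller supplies the partitioner,
    // presumed to be the same as the enclosing Range's.
    RangeModel(Object left, Object right, Partitioner p)
    {
        this.left = left;
        this.right = right;
        this.partitioner = p;
    }

    static Partitioner requireServerPartitioner()
    {
        if (serverPartitioner == null)
            throw new IllegalStateException("not inside the server");
        return serverPartitioner;
    }
}
```

The design point is the same as in the real patch: client-safe classes take their dependencies through the constructor rather than through server singletons.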
[jira] [Issue Comment Edited] (CASSANDRA-3137) Implement wrapping intersections for ConfigHelper's InputKeyRange
[ https://issues.apache.org/jira/browse/CASSANDRA-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13099124#comment-13099124 ] Mck SembWever edited comment on CASSANDRA-3137 at 9/7/11 5:37 PM: -- Indeed. I could be using this asap. The use case is... We're using a ByteOrderedPartitioner because we run incremental Hadoop jobs over one of our column families where events initially come in. This cf has RF=1 and time-based UUID keys that are manipulated so that their byte ordering is time ordered (the byte-unsigned timestamp put up front). Each column has a ttl of 3 months. After 3 months of data we saw all data on one node. Now I understand: as the token range is the timestamp range, which is from 1970 to 2270, of course our 3 month period fell on one node (with a 3 node cluster even 100 years would fall on one node). To properly manage this cf we need to either continuously move nodes around, a cumbersome operation, or change the key so it's prefixed with {{timestamp % 3months}}. This would allow 3 months of data to cycle over the whole cluster and wrap around again. Obviously we're leaning towards the latter solution as it simplifies operations. But it does require this patch. (When CFIF supports IndexClause everything changes: we change our cluster to RandomPartitioner, use secondary indexes, and never look back...) was (Author: michaelsembwever): Indeed. I could be using this asap. The use case is... We're using a ByteOrderedPartition because we run incremental hadoop jobs over one of our column families where events initially come in. This cf has RF=1 and time-based UUID keys that are manipulated so that their byte ordering are time ordered. (mostSigBits extracted and the byte-unsigned timestamp put up front). Each column has ttl of 3 months. After 3 months of data we saw all data on one node. 
Now i understand as the token range is the timestamp range which is from 1970 to 2270 so of course our 3 month period fell on one node (with 3 node cluster even 100 years would fall on one node). To properly manage this cf we need to either continuous move nodes around, a cumbersome operation, or change the key so it's prefixed with {{timestamp % 3months}}. This would allow 3 months of data to cycle over the whole cluster and wrap around again. Obviously we're leaning towards the latter solution as it simplifies operations. But it does require this patch. (When CFIF supports IndexClause everything changes, we change our cluster to RandomPartitioner, use secondary indexes, and never look back...) Implement wrapping intersections for ConfigHelper's InputKeyRange - Key: CASSANDRA-3137 URL: https://issues.apache.org/jira/browse/CASSANDRA-3137 Project: Cassandra Issue Type: Improvement Components: Hadoop Affects Versions: 0.8.5 Reporter: Mck SembWever Assignee: Mck SembWever Attachments: CASSANDRA-3137.patch, CASSANDRA-3137.patch Before there was no support for multiple intersections between the split's range and the job's configured range. After CASSANDRA-3108 it is now possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3137) Implement wrapping intersections for ConfigHelper's InputKeyRange
[ https://issues.apache.org/jira/browse/CASSANDRA-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13099124#comment-13099124 ] Mck SembWever commented on CASSANDRA-3137: -- Indeed. I could be using this asap. The use case is... We're using a ByteOrderedPartitioner because we run incremental Hadoop jobs over one of our column families where events initially come in. This cf has RF=1 and time-based UUID keys that are manipulated so that their byte ordering is time ordered (mostSigBits extracted and the byte-unsigned timestamp put up front). Each column has a ttl of 3 months. After 3 months of data we saw all data on one node. Now I understand: as the token range is the timestamp range, which is from 1970 to 2270, of course our 3 month period fell on one node (with a 3 node cluster even 100 years would fall on one node). To properly manage this cf we need to either continuously move nodes around, a cumbersome operation, or change the key so it's prefixed with {{timestamp % 3months}}. This would allow 3 months of data to cycle over the whole cluster and wrap around again. Obviously we're leaning towards the latter solution as it simplifies operations. But it does require this patch. (When CFIF supports IndexClause everything changes: we change our cluster to RandomPartitioner, use secondary indexes, and never look back...)
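The {{timestamp % 3months}} key manipulation described above can be sketched as follows. The field layout and the three-month constant are illustrative assumptions, not the reporter's actual key format:

```java
import java.nio.ByteBuffer;

// Sketch of prefixing a time-based key with (timestamp % period) so rows
// cycle around a ByteOrderedPartitioner ring instead of piling onto one
// node. The 3-month period and 8+8 byte layout are illustrative.
public class CyclingKey
{
    static final long PERIOD_MS = 90L * 24 * 60 * 60 * 1000; // ~3 months

    // Builds a byte-ordered key: (timestamp % period) prefix, then the
    // full timestamp so keys remain unique and fully ordered within a cycle.
    public static byte[] key(long timestampMs)
    {
        ByteBuffer bb = ByteBuffer.allocate(16);
        bb.putLong(timestampMs % PERIOD_MS); // cycles over the token range
        bb.putLong(timestampMs);             // keeps the original timestamp
        return bb.array();
    }
}
```

Because the prefix wraps every period, three months of data spreads over the whole token range and then wraps, which is exactly why the job's InputKeyRange must support wrapping intersections.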
[jira] [Created] (CASSANDRA-3150) ColumnFormatRecordReader loops forever
ColumnFormatRecordReader loops forever -- Key: CASSANDRA-3150 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 0.8.4 Reporter: Mck SembWever Assignee: Mck SembWever Priority: Critical From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039 {quote} bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner bq. CFIF's inputSplitSize=196608 bq. 3 map tasks (from 4013) is still running after read 25 million rows. bq. Can this be a bug in StorageService.getSplits(..) ? getSplits looks pretty foolproof to me but I guess we'd need to add more debug logging to rule out a bug there for sure. I guess the main alternative would be a bug in the recordreader paging. {quote} -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-3150) ColumnFormatRecordReader loops forever
[ https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mck SembWever updated CASSANDRA-3150: - Attachment: CASSANDRA-3150.patch If the split's end token does not match any of the row key tokens, the RowIterator will never stop (see RowIterator:243). This patch 1) presumes this is the problem, 2) compares each row token with the split end token and exits when need be (which only works on order-preserving partitioners), and 3) stops iterating when totalRowCount has been read. Only (3) has been tested and works.
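The patch's two mechanical stop conditions can be sketched as a standalone guard. Names are illustrative, not the actual CFRR fields, and string comparison stands in for byte-ordered token comparison, so this models only order-preserving partitioners:

```java
// Sketch of the stop conditions the patch adds to the row iterator.
// String compareTo stands in for byte-ordered token comparison.
public class RowIteratorGuard
{
    public static boolean shouldStop(long rowsRead, long totalRowCount,
                                     String rowToken, String splitEndToken)
    {
        // (3) stop once the split's expected row count has been read;
        //     this works regardless of partitioner
        if (rowsRead >= totalRowCount)
            return true;
        // (2) stop once a row's token reaches the split's end token;
        //     only valid for order-preserving partitioners
        return rowToken.compareTo(splitEndToken) >= 0;
    }
}
```

Condition (3) is the coarse safety net that was actually tested; condition (2) catches the loop earlier but depends on token order matching key order.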
[jira] [Commented] (CASSANDRA-3150) ColumnFormatRecordReader loops forever
[ https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13099234#comment-13099234 ] Mck SembWever commented on CASSANDRA-3150: -- Here keyRange is startToken to split.getEndToken(). startToken is updated each iteration to the last row read (each iteration is batchRowCount rows). What happens if split.getEndToken() doesn't correspond to any of the rowKeys? To me it reads that startToken will hop over split.getEndToken() and get_range_slices(..) will start returning wrapping ranges. These will still return rows and so the iteration will continue, now forever. The only way out for this code today is a) startToken equals split.getEndToken(), or b) get_range_slices(..) is called with startToken equal to split.getEndToken() OR a gap so small there exist no rows in between.
[jira] [Issue Comment Edited] (CASSANDRA-3150) ColumnFormatRecordReader loops forever
[ https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13099234#comment-13099234 ] Mck SembWever edited comment on CASSANDRA-3150 at 9/7/11 7:17 PM: -- Here keyRange is startToken to split.getEndToken() startToken is updated each iterate to the last row read (each iterate is batchRowCount rows). What happens if split.getEndToken() doesn't correspond to any of the rowKeys? To me it reads that startToken will hop over split.getEndToken() and get_range_slices(..) will start returning wrapping ranges. This will still return rows and so the iteration will continue, now forever. The only way out for this code today is a) startToken equals split.getEndToken(), or b) get_range_slices(..) is called with startToken equals split.getEndToken() OR a gap so small there exists no rows in between. was (Author: michaelsembwever): Here keyRange is startToken to split.getEndToken() startToken is updated each iterate to the last row read (each iterate is batchRowCount rows). What happens is split.getEndToken() doesn't correspond to any of the rowKeys? To me it reads that startToken will hop over split.getEndToken() and get_rage_slices(..) will start returning wrapping ranges. This will still return rows and so the iteration will continue, now forever. The only way out for this code today is a) startToken equals split.getEndToken(), or b) get_range_slices(..) is called with startToken equals split.getEndToken() OR a gap so small there exists no rows in between.
[jira] [Issue Comment Edited] (CASSANDRA-3150) ColumnFormatRecordReader loops forever
[ https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13099234#comment-13099234 ] Mck SembWever edited comment on CASSANDRA-3150 at 9/7/11 7:24 PM: -- Here keyRange is startToken to split.getEndToken() startToken is updated each iterate to the last row read (each iterate is batchRowCount rows). What happens if split.getEndToken() doesn't correspond to any of the rowKeys? To me it reads that startToken will hop over split.getEndToken() and get_range_slices(..) will start querying against wrapping ranges. This will still return rows and so the iteration will continue, now forever. The only way out for this code today is a) startToken equals split.getEndToken(), or b) get_range_slices(..) is called with startToken equals split.getEndToken() OR a gap so small there exists no rows in between. was (Author: michaelsembwever): Here keyRange is startToken to split.getEndToken() startToken is updated each iterate to the last row read (each iterate is batchRowCount rows). What happens if split.getEndToken() doesn't correspond to any of the rowKeys? To me it reads that startToken will hop over split.getEndToken() and get_range_slices(..) will start returning wrapping ranges. This will still return rows and so the iteration will continue, now forever. The only way out for this code today is a) startToken equals split.getEndToken(), or b) get_range_slices(..) is called with startToken equals split.getEndToken() OR a gap so small there exists no rows in between.
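The hop-over failure mode described in this comment can be reproduced with a toy paging loop. Integer tokens and the batch logic are simplifications of the real get_range_slices paging, not the actual CFRR code:

```java
// Toy model of CFRR paging: integer tokens stand in for real tokens.
// If the split's end token falls between two row tokens, the next start
// token steps past it and the (start, end] range becomes wrapping, so
// paging keeps returning rows instead of terminating.
public class PagingModel
{
    // Returns true if the start token steps past endToken, i.e. the next
    // get_range_slices call would be issued with a wrapping range and the
    // iterator would loop forever.
    public static boolean hopsOverEnd(int[] rowTokens, int endToken, int batchRowCount)
    {
        for (int i = 0; i < rowTokens.length; i += batchRowCount)
        {
            // the last row of each batch becomes the next start token
            int nextStart = rowTokens[Math.min(i + batchRowCount, rowTokens.length) - 1];
            if (nextStart == endToken)
                return false; // clean exit: start token lands exactly on the end token
            if (nextStart > endToken)
                return true;  // hopped over: the next range wraps
        }
        return false; // ran out of rows before reaching the end token
    }
}
```

In the first test below the end token 4 matches no row, so the start token jumps from 3 to 5 and the range wraps; in the second, the end token is an actual row key and the loop exits cleanly.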
[jira] [Commented] (CASSANDRA-3150) ColumnFormatRecordReader loops forever
[ https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13099254#comment-13099254 ] Mck SembWever commented on CASSANDRA-3150: -- What about the case where tokens of different length exist? I don't know if this is actually possible, but from {noformat} Address Status State Load Owns Token Token(bytes[76118303760208547436305468318170713656]) 152.90.241.22 Up Normal 270.46 GB 33.33% Token(bytes[30303030303031333131313739353337303038d4e7f72db2ed11e09d7c68b59973a5d8]) 152.90.241.24 Up Normal 247.89 GB 33.33% Token(bytes[303030303030313331323631393735313231381778518cc00711e0acb968b59973a5d8]) 152.90.241.23 Up Normal 1.1 TB 33.33% Token(bytes[76118303760208547436305468318170713656]) {noformat} you see the real tokens are very long compared to the initial_tokens the cluster was configured with. (The two long tokens have since been moved, and to note the load on .23 never decreased to ~300GB as it should have...)
[jira] [Issue Comment Edited] (CASSANDRA-3150) ColumnFormatRecordReader loops forever
[ https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13099254#comment-13099254 ] Mck SembWever edited comment on CASSANDRA-3150 at 9/7/11 7:38 PM: -- What about the case where tokens of different length exist. I don't know if this is actually possible but from {noformat} Address Status State LoadOwnsToken Token(bytes[76118303760208547436305468318170713656]) 152.90.241.22 Up Normal 270.46 GB 33.33% Token(bytes[30303030303031333131313739353337303038d4e7f72db2ed11e09d7c68b59973a5d8]) 152.90.241.24 Up Normal 247.89 GB 33.33% Token(bytes[303030303030313331323631393735313231381778518cc00711e0acb968b59973a5d8]) 152.90.241.23 Up Normal 1.1 TB 33.33% Token(bytes[76118303760208547436305468318170713656]) {noformat} you see the real tokens are very long compared to the initial_tokens the cluster was configured with. (The two long tokens have been moved off their initial_tokens, and to note the load on .23 never decreased to ~300GB as it should have...). was (Author: michaelsembwever): What about the case where tokens of different length exist. I don't know if this is actually possible but from {noformat} Address Status State LoadOwnsToken Token(bytes[76118303760208547436305468318170713656]) 152.90.241.22 Up Normal 270.46 GB 33.33% Token(bytes[30303030303031333131313739353337303038d4e7f72db2ed11e09d7c68b59973a5d8]) 152.90.241.24 Up Normal 247.89 GB 33.33% Token(bytes[303030303030313331323631393735313231381778518cc00711e0acb968b59973a5d8]) 152.90.241.23 Up Normal 1.1 TB 33.33% Token(bytes[76118303760208547436305468318170713656]) {noformat} you see the real tokens are very long compared to the initial_tokens the cluster was configured with. (The two long tokens has since been moved, and to note the load on .23 never decreased to ~300GB as it should have...). 
[jira] [Issue Comment Edited] (CASSANDRA-3150) ColumnFormatRecordReader loops forever
[ https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13099254#comment-13099254 ] Mck SembWever edited comment on CASSANDRA-3150 at 9/7/11 7:43 PM: -- What about the case where tokens of different length exist. Could get_range_slices be busted there? I don't know if this is actually possible but from {noformat} Address Status State LoadOwnsToken Token(bytes[76118303760208547436305468318170713656]) 152.90.241.22 Up Normal 270.46 GB 33.33% Token(bytes[30303030303031333131313739353337303038d4e7f72db2ed11e09d7c68b59973a5d8]) 152.90.241.24 Up Normal 247.89 GB 33.33% Token(bytes[303030303030313331323631393735313231381778518cc00711e0acb968b59973a5d8]) 152.90.241.23 Up Normal 1.1 TB 33.33% Token(bytes[76118303760208547436305468318170713656]) {noformat} you see the real tokens are very long compared to the initial_tokens the cluster was configured with. (The two long tokens have been moved off their initial_tokens, and to note the load on .23 never decreased to ~300GB as it should have...). was (Author: michaelsembwever): What about the case where tokens of different length exist. I don't know if this is actually possible but from {noformat} Address Status State LoadOwnsToken Token(bytes[76118303760208547436305468318170713656]) 152.90.241.22 Up Normal 270.46 GB 33.33% Token(bytes[30303030303031333131313739353337303038d4e7f72db2ed11e09d7c68b59973a5d8]) 152.90.241.24 Up Normal 247.89 GB 33.33% Token(bytes[303030303030313331323631393735313231381778518cc00711e0acb968b59973a5d8]) 152.90.241.23 Up Normal 1.1 TB 33.33% Token(bytes[76118303760208547436305468318170713656]) {noformat} you see the real tokens are very long compared to the initial_tokens the cluster was configured with. (The two long tokens have been moved off their initial_tokens, and to note the load on .23 never decreased to ~300GB as it should have...). 
[jira] [Commented] (CASSANDRA-3150) ColumnFormatRecordReader loops forever
[ https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13099269#comment-13099269 ] Mck SembWever commented on CASSANDRA-3150: -- bq. BOP is sorting the shorter token correctly since 3 7. Sorry, so that doesn't explain this bug? bq. Load won't decrease until you run cleanup. Never worked. Repair and cleanup are run every night; the moves were done one week ago and more than a couple of weeks ago, respectively.
[jira] [Commented] (CASSANDRA-3150) ColumnFormatRecordReader loops forever
[ https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13099943#comment-13099943 ] Mck SembWever commented on CASSANDRA-3150: -- I'll try and put debug logging in so I can get a log of get_range_slices calls from CFRR... (this may take some days)
[jira] [Created] (CASSANDRA-3136) Allow CFIF to keep going despite unavailable ranges
Allow CFIF to keep going despite unavailable ranges --- Key: CASSANDRA-3136 URL: https://issues.apache.org/jira/browse/CASSANDRA-3136 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Mck SembWever Priority: Minor From http://thread.gmane.org/gmane.comp.db.cassandra.user/18902 use-case-1 We use Cassandra as storage for web pages; we store the HTML, all URLs that have the same HTML data, and some computed data. We run Hadoop MR jobs to compute lexical and thematic data for each page and to export the data to binary files for later use. A URL gets into Cassandra on user request (a pageview), so if we delete a URL, it comes back quickly if the page is active. Because of that, and because there is lots of data, we have the keyspace set to RF=1. We can drop the whole keyspace and it will regenerate quickly and would contain only fresh data, so we don't care about losing a node. /use-case-1 use-case-2 trying to extract a small random sample (like a pig SAMPLE) of data out of cassandra. /use-case-2 use-case-3 searching for something or some pattern where one hit is enough. If you get the hit it's a positive result regardless of whether ranges were ignored; if you don't, and you *know* there was a range ignored along the way, you can re-run the job later. For example such a job could be run at regular intervals in the day until a hit was found. /use-case-3 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-3136) Allow CFIF to keep going despite unavailable ranges
[ https://issues.apache.org/jira/browse/CASSANDRA-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mck SembWever updated CASSANDRA-3136: - Description: From http://thread.gmane.org/gmane.comp.db.cassandra.user/18902 use-case-1 from=Patrik Modesto We use Cassandra as storage for web pages: we store the HTML, all URLs that have the same HTML data, and some computed data. We run Hadoop MR jobs to compute lexical and thematic data for each page and to export the data to binary files for later use. A URL gets into Cassandra on user request (a pageview), so if we delete a URL it comes back quickly if the page is active. Because of that, and because there is lots of data, we have the keyspace set to RF=1. We can drop the whole keyspace and it will regenerate quickly and contain only fresh data, so we don't care about losing a node. /use-case-1 use-case-2 Trying to extract a small random sample (like a Pig SAMPLE) of data out of Cassandra. /use-case-2 use-case-3 Searching for something or some pattern where one hit is enough. If you get the hit it's a positive result regardless of whether ranges were ignored; if you don't, and you *know* a range was ignored along the way, you can re-run the job later. For example, such a job could be run at regular intervals during the day until a hit was found. /use-case-3 was: From http://thread.gmane.org/gmane.comp.db.cassandra.user/18902 use-case-1 We use Cassandra as storage for web pages: we store the HTML, all URLs that have the same HTML data, and some computed data. We run Hadoop MR jobs to compute lexical and thematic data for each page and to export the data to binary files for later use. A URL gets into Cassandra on user request (a pageview), so if we delete a URL it comes back quickly if the page is active. Because of that, and because there is lots of data, we have the keyspace set to RF=1.
We can drop the whole keyspace and it will regenerate quickly and contain only fresh data, so we don't care about losing a node. /use-case-1 use-case-2 Trying to extract a small random sample (like a Pig SAMPLE) of data out of Cassandra. /use-case-2 use-case-3 Searching for something or some pattern where one hit is enough. If you get the hit it's a positive result regardless of whether ranges were ignored; if you don't, and you *know* a range was ignored along the way, you can re-run the job later. For example, such a job could be run at regular intervals during the day until a hit was found. /use-case-3 Allow CFIF to keep going despite unavailable ranges --- Key: CASSANDRA-3136 URL: https://issues.apache.org/jira/browse/CASSANDRA-3136 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Mck SembWever Priority: Minor From http://thread.gmane.org/gmane.comp.db.cassandra.user/18902 use-case-1 from=Patrik Modesto We use Cassandra as storage for web pages: we store the HTML, all URLs that have the same HTML data, and some computed data. We run Hadoop MR jobs to compute lexical and thematic data for each page and to export the data to binary files for later use. A URL gets into Cassandra on user request (a pageview), so if we delete a URL it comes back quickly if the page is active. Because of that, and because there is lots of data, we have the keyspace set to RF=1. We can drop the whole keyspace and it will regenerate quickly and contain only fresh data, so we don't care about losing a node. /use-case-1 use-case-2 Trying to extract a small random sample (like a Pig SAMPLE) of data out of Cassandra. /use-case-2 use-case-3 Searching for something or some pattern where one hit is enough. If you get the hit it's a positive result regardless of whether ranges were ignored; if you don't, and you *know* a range was ignored along the way, you can re-run the job later.
For example, such a job could be run at regular intervals during the day until a hit was found. /use-case-3 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (CASSANDRA-3137) Implement wrapping intersections for ConfigHelper's InputKeyRange
Implement wrapping intersections for ConfigHelper's InputKeyRange - Key: CASSANDRA-3137 URL: https://issues.apache.org/jira/browse/CASSANDRA-3137 Project: Cassandra Issue Type: Improvement Components: Hadoop Affects Versions: 0.8.4 Reporter: Mck SembWever Assignee: Mck SembWever Before there was no support for multiple intersections between the split's range and the job's configured range. After CASSANDRA-3108 it is now possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3108) Make Range and Bounds objects client-safe
[ https://issues.apache.org/jira/browse/CASSANDRA-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13097126#comment-13097126 ] Mck SembWever commented on CASSANDRA-3108: -- Didn't see it until now but your patch Jonathan removes the limitation that ConfigHelper's InputKeyRange cannot wrap. I've entered CASSANDRA-3137 to allow wrapping intersections in {{ColumnFamilyInputFormat}}. Make Range and Bounds objects client-safe - Key: CASSANDRA-3108 URL: https://issues.apache.org/jira/browse/CASSANDRA-3108 Project: Cassandra Issue Type: Bug Affects Versions: 0.8.2 Reporter: Jonathan Ellis Assignee: Mck SembWever Labels: hadoop Fix For: 0.8.5 Attachments: 3108.txt From Mck's comment on CASSANDRA-1125: Something broke here in production once we went out with 0.8.2. It may have been some poor testing, i'm not entirely sure and a little surprised. CFIF:135 breaks because inside dhtRange.intersects(jobRange) there's a call to new Range(token, token) which calls StorageService.getPartitioner() and StorageService is null as we're not inside the server. A quick fix is to change Range:148 from new Range(token, token) to new Range(token, token, partitioner) making the presumption that the partitioner for the new Range will be the same as this Range. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
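The failure mode described in the comment can be sketched in a few lines. This is an illustrative mock, not the actual Cassandra 0.8 source: the class shapes, field names, and the use of `long` as a stand-in token type are all assumptions. It shows why `new Range(token, token)` NPEs on the client side, and the client-safe alternative of passing the partitioner explicitly.

```java
// Hypothetical sketch of the problem (illustrative classes, not Cassandra's).
interface IPartitioner { }

class RandomPartitioner implements IPartitioner { }

class StorageService {
    // Inside the server this is initialised at startup; in a Hadoop
    // client JVM it stays null, so any code path that reaches it NPEs.
    static IPartitioner partitioner = null;
    static IPartitioner getPartitioner() { return partitioner; }
}

class Range {
    final long left, right;          // stand-in for Token
    final IPartitioner partitioner;

    // The constructor CFIF tripped over: it resolves the partitioner
    // via the server-side singleton.
    Range(long left, long right) {
        this(left, right, StorageService.getPartitioner());
    }

    // The quick fix from the comment: the caller supplies the
    // partitioner, presuming it matches the enclosing Range's.
    Range(long left, long right, IPartitioner partitioner) {
        this.left = left;
        this.right = right;
        this.partitioner = partitioner;
    }
}
```

With the two-argument constructor the partitioner ends up null in a client JVM; the three-argument form sidesteps the server singleton entirely.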
[jira] [Updated] (CASSANDRA-3137) Implement wrapping intersections for ConfigHelper's InputKeyRange
[ https://issues.apache.org/jira/browse/CASSANDRA-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mck SembWever updated CASSANDRA-3137: - Affects Version/s: (was: 0.8.4) 0.8.5 Implement wrapping intersections for ConfigHelper's InputKeyRange - Key: CASSANDRA-3137 URL: https://issues.apache.org/jira/browse/CASSANDRA-3137 Project: Cassandra Issue Type: Improvement Components: Hadoop Affects Versions: 0.8.5 Reporter: Mck SembWever Assignee: Mck SembWever Before there was no support for multiple intersections between the split's range and the job's configured range. After CASSANDRA-3108 it is now possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-3137) Implement wrapping intersections for ConfigHelper's InputKeyRange
[ https://issues.apache.org/jira/browse/CASSANDRA-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mck SembWever updated CASSANDRA-3137: - Attachment: CASSANDRA-3137.patch Haven't tested this (with real data) yet, but the code looks pretty simple and straightforward here... Implement wrapping intersections for ConfigHelper's InputKeyRange - Key: CASSANDRA-3137 URL: https://issues.apache.org/jira/browse/CASSANDRA-3137 Project: Cassandra Issue Type: Improvement Components: Hadoop Affects Versions: 0.8.5 Reporter: Mck SembWever Assignee: Mck SembWever Attachments: CASSANDRA-3137.patch Previously there was no support for multiple intersections between the split's range and the job's configured range; after CASSANDRA-3108 it is now possible. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
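As a sketch of what "wrapping intersections" means here (hypothetical code, not the attached patch; the token type and half-open `(left, right]` convention are assumptions): a job range whose left token is greater than or equal to its right wraps around the ring, and intersecting it with a non-wrapping split range can yield up to two disjoint pieces.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative token range on a circular ring; wraps when left >= right.
class TokenRange {
    final long left, right;
    TokenRange(long left, long right) { this.left = left; this.right = right; }
    boolean wraps() { return left >= right; }
}

class WrappingIntersect {
    // A wrapping job range unwraps into (left, MAX] and (MIN, right];
    // each half may intersect the (non-wrapping) split separately,
    // giving up to two result ranges.
    static List<TokenRange> intersect(TokenRange job, TokenRange split) {
        List<TokenRange> halves = new ArrayList<>();
        if (job.wraps()) {
            halves.add(new TokenRange(job.left, Long.MAX_VALUE));
            halves.add(new TokenRange(Long.MIN_VALUE, job.right));
        } else {
            halves.add(job);
        }
        List<TokenRange> out = new ArrayList<>();
        for (TokenRange h : halves) {
            long l = Math.max(h.left, split.left);
            long r = Math.min(h.right, split.right);
            if (l < r) out.add(new TokenRange(l, r));  // non-empty piece
        }
        return out;
    }
}
```

For example, a job range (80, 20] wrapping past the end of the ring intersected with a split (0, 100] produces the two pieces (80, 100] and (0, 20].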
[jira] [Commented] (CASSANDRA-3136) Allow CFIF to keep going despite unavailable ranges
[ https://issues.apache.org/jira/browse/CASSANDRA-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13097280#comment-13097280 ] Mck SembWever commented on CASSANDRA-3136: -- Ok... it was mentioned in CASSANDRA-2388 (by Patrik Modesto), but no one there paid it any attention as it didn't belong to that issue. Allow CFIF to keep going despite unavailable ranges --- Key: CASSANDRA-3136 URL: https://issues.apache.org/jira/browse/CASSANDRA-3136 Project: Cassandra Issue Type: Improvement Components: Hadoop Reporter: Mck SembWever Priority: Minor From http://thread.gmane.org/gmane.comp.db.cassandra.user/18902 use-case-1 from=Patrik Modesto We use Cassandra as storage for web pages: we store the HTML, all URLs that have the same HTML data, and some computed data. We run Hadoop MR jobs to compute lexical and thematic data for each page and to export the data to binary files for later use. A URL gets into Cassandra on user request (a pageview), so if we delete a URL it comes back quickly if the page is active. Because of that, and because there is lots of data, we have the keyspace set to RF=1. We can drop the whole keyspace and it will regenerate quickly and contain only fresh data, so we don't care about losing a node. /use-case-1 use-case-2 Trying to extract a small random sample (like a Pig SAMPLE) of data out of Cassandra. /use-case-2 use-case-3 Searching for something or some pattern where one hit is enough. If you get the hit it's a positive result regardless of whether ranges were ignored; if you don't, and you *know* a range was ignored along the way, you can re-run the job later. For example, such a job could be run at regular intervals during the day until a hit was found. /use-case-3 -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-3108) Make Range and Bounds objects client-safe
[ https://issues.apache.org/jira/browse/CASSANDRA-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13097282#comment-13097282 ] Mck SembWever commented on CASSANDRA-3108: -- You drastically removed the usage of the {{Range(left, right)}} constructor so that even the usage of {{intersectionBothWrapping(..)}} and {{intersectionOneWrapping(..)}} avoids any server-side calls. Make Range and Bounds objects client-safe - Key: CASSANDRA-3108 URL: https://issues.apache.org/jira/browse/CASSANDRA-3108 Project: Cassandra Issue Type: Bug Affects Versions: 0.8.2 Reporter: Jonathan Ellis Assignee: Mck SembWever Labels: hadoop Fix For: 0.8.5 Attachments: 3108.txt From Mck's comment on CASSANDRA-1125: Something broke here in production once we went out with 0.8.2. It may have been some poor testing, i'm not entirely sure and a little surprised. CFIF:135 breaks because inside dhtRange.intersects(jobRange) there's a call to new Range(token, token) which calls StorageService.getPartitioner() and StorageService is null as we're not inside the server. A quick fix is to change Range:148 from new Range(token, token) to new Range(token, token, partitioner) making the presumption that the partitioner for the new Range will be the same as this Range. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-3108) Make Range and Bounds objects client-safe
[ https://issues.apache.org/jira/browse/CASSANDRA-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13097282#comment-13097282 ] Mck SembWever edited comment on CASSANDRA-3108 at 9/5/11 7:41 PM: -- You drastically removed the usage of the {{Range(left, right)}} constructor so that even the usage of {{intersectionBothWrapping(..)}} and {{intersectionOneWrapping(..)}} avoids any server-side calls. In CFIF there AFAIK doesn't seem any other limitation to wrapping ranges... was (Author: michaelsembwever): You drastically removed the usage of the {{Range(left, right)}} constructor so that even the usage of {{intersectionBothWrapping(..)}} and {{intersectionOneWrapping(..)}} avoids any server-side calls. It CFIF there AFAIK doesn't seem any other limitation to wrapping ranges... Make Range and Bounds objects client-safe - Key: CASSANDRA-3108 URL: https://issues.apache.org/jira/browse/CASSANDRA-3108 Project: Cassandra Issue Type: Bug Affects Versions: 0.8.2 Reporter: Jonathan Ellis Assignee: Mck SembWever Labels: hadoop Fix For: 0.8.5 Attachments: 3108.txt From Mck's comment on CASSANDRA-1125: Something broke here in production once we went out with 0.8.2. It may have been some poor testing, i'm not entirely sure and a little surprised. CFIF:135 breaks because inside dhtRange.intersects(jobRange) there's a call to new Range(token, token) which calls StorageService.getPartitioner() and StorageService is null as we're not inside the server. A quick fix is to change Range:148 from new Range(token, token) to new Range(token, token, partitioner) making the presumption that the partitioner for the new Range will be the same as this Range. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-3108) Make Range and Bounds objects client-safe
[ https://issues.apache.org/jira/browse/CASSANDRA-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13097282#comment-13097282 ] Mck SembWever edited comment on CASSANDRA-3108 at 9/5/11 7:41 PM: -- You drastically removed the usage of the {{Range(left, right)}} constructor so that even the usage of {{intersectionBothWrapping(..)}} and {{intersectionOneWrapping(..)}} avoids any server-side calls. It CFIF there AFAIK doesn't seem any other limitation to wrapping ranges... was (Author: michaelsembwever): You drastically removed the usage of the {{Range(left, right)}} constructor so that even the usage of {{intersectionBothWrapping(..)}} and {{intersectionOneWrapping(..)}} avoids any server-side calls. Make Range and Bounds objects client-safe - Key: CASSANDRA-3108 URL: https://issues.apache.org/jira/browse/CASSANDRA-3108 Project: Cassandra Issue Type: Bug Affects Versions: 0.8.2 Reporter: Jonathan Ellis Assignee: Mck SembWever Labels: hadoop Fix For: 0.8.5 Attachments: 3108.txt From Mck's comment on CASSANDRA-1125: Something broke here in production once we went out with 0.8.2. It may have been some poor testing, i'm not entirely sure and a little surprised. CFIF:135 breaks because inside dhtRange.intersects(jobRange) there's a call to new Range(token, token) which calls StorageService.getPartitioner() and StorageService is null as we're not inside the server. A quick fix is to change Range:148 from new Range(token, token) to new Range(token, token, partitioner) making the presumption that the partitioner for the new Range will be the same as this Range. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.
[ https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094658#comment-13094658 ] Mck SembWever edited comment on CASSANDRA-2388 at 9/1/11 6:39 AM: -- see last comment. (say if this should be a separate bug...) Maybe hadoop's task allocation isn't working properly because i've an unbalanced ring (i'm working in parallel to fix that). If this is the case i think it's an unfortunate limitation (the ring must be balanced to get any decent hadoop performance). It's also probably likely when using {{ConfigHelper.setInputRange(..)}} that the number of nodes involved is small (approaching RF). With the default hadoop scheduler your hadoop cluster is occupied while just a few taskTrackers are busy. Of course switching to FairScheduler will help some here. I'll take a look into hadoop's task allocation code as well... was (Author: michaelsembwever): see last comment. (say if this should be a separate bug...) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica. - Key: CASSANDRA-2388 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 0.6 Reporter: Eldon Stegall Assignee: Mck SembWever Labels: hadoop, inputformat Fix For: 0.7.9, 0.8.5 Attachments: 0002_On_TException_try_next_split.patch, CASSANDRA-2388-addition1.patch, CASSANDRA-2388-extended.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch ColumnFamilyRecordReader only tries the first location for a given split. We should try multiple locations for a given split. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Reopened] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.
[ https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mck SembWever reopened CASSANDRA-2388: -- see last comment. (say if this should be a separate bug...) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica. - Key: CASSANDRA-2388 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 0.6 Reporter: Eldon Stegall Assignee: Mck SembWever Labels: hadoop, inputformat Fix For: 0.7.9, 0.8.5 Attachments: 0002_On_TException_try_next_split.patch, CASSANDRA-2388-addition1.patch, CASSANDRA-2388-extended.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch ColumnFamilyRecordReader only tries the first location for a given split. We should try multiple locations for a given split. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Reopened] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query (using a KeyRange)
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mck SembWever reopened CASSANDRA-1125: -- Something broke here in production once we went out with 0.8.2. It may have been some poor testing, i'm not entirely sure and a little surprised. CFIF:135 breaks because inside {{dhtRange.intersects(jobRange)}} there's a call to {{new Range(token, token)}} which calls {{StorageService.getPartitioner()}} and StorageService is null as we're not inside the server. A quick fix (tested) is to change Range:148 from {{new Range(token, token)}} to {{new Range(token, token, partitioner)}} making the presumption that the partitioner for the new Range will be the same as this Range. Filter out ColumnFamily rows that aren't part of the query (using a KeyRange) - Key: CASSANDRA-1125 URL: https://issues.apache.org/jira/browse/CASSANDRA-1125 Project: Cassandra Issue Type: New Feature Components: Hadoop Reporter: Jeremy Hanna Assignee: Mck SembWever Priority: Minor Fix For: 0.8.2 Attachments: 1125-formatted.txt, 1125-v3.txt, CASSANDRA-1125.patch, CASSANDRA-1125.patch Currently, when running a MapReduce job against data in a Cassandra data store, it reads through all the data for a particular ColumnFamily. This could be optimized to only read through those rows that have to do with the query. It's a small change but wanted to put it in Jira so that it didn't fall through the cracks. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query (using a KeyRange)
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094036#comment-13094036 ] Mck SembWever edited comment on CASSANDRA-1125 at 8/30/11 8:02 PM: --- Something broke here in production once we went out with 0.8.2. It may have been some poor testing, i'm not entirely sure and a little surprised. CFIF:135 breaks because inside {{dhtRange.intersects(jobRange)}} there's a call to {{new Range(token, token)}} which calls {{StorageService.getPartitioner()}} and StorageService is null as we're not inside the server. A quick fix is to change Range:148 from {{new Range(token, token)}} to {{new Range(token, token, partitioner)}} making the presumption that the partitioner for the new Range will be the same as this Range. was (Author: michaelsembwever): Something broke here in production once we went out with 0.8.2. It may have been some poor testing, i'm not entirely sure and a little surprised. CFIF:135 breaks because inside {{dhtRange.intersects(jobRange)}} there's a call to {{new Range(token, token)}} which calls {{StorageService.getPartitioner()}} and StorageService is null as we're not inside the server. A quick fix (tested) is to change Range:148 from {{new Range(token, token)}} to {{new Range(token, token, partitioner)}} making the presumption that the partitioner for the new Range will be the same as this Range. Filter out ColumnFamily rows that aren't part of the query (using a KeyRange) - Key: CASSANDRA-1125 URL: https://issues.apache.org/jira/browse/CASSANDRA-1125 Project: Cassandra Issue Type: New Feature Components: Hadoop Reporter: Jeremy Hanna Assignee: Mck SembWever Priority: Minor Fix For: 0.8.2 Attachments: 1125-formatted.txt, 1125-v3.txt, CASSANDRA-1125.patch, CASSANDRA-1125.patch Currently, when running a MapReduce job against data in a Cassandra data store, it reads through all the data for a particular ColumnFamily. 
This could be optimized to only read through those rows that have to do with the query. It's a small change but wanted to put it in Jira so that it didn't fall through the cracks. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query (using a KeyRange)
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094036#comment-13094036 ] Mck SembWever edited comment on CASSANDRA-1125 at 8/30/11 8:55 PM: --- Something broke here in production once we went out with 0.8.2. It may have been some poor testing, i'm not entirely sure and a little surprised. CFIF:135 breaks because inside {{dhtRange.intersects(jobRange)}} there's a call to {{new Range(token, token)}} which calls {{StorageService.getPartitioner()}} and StorageService is null as we're not inside the server. A quick fix is to change Range:148 from {{new Range(token, token)}} to {{new Range(token, token, partitioner)}} making the presumption that the partitioner for the new Range will be the same as this Range. This won't work if the Range wraps in any way (which could be just a limitation of the current KeyRange filtering), but otherwise tests ok. was (Author: michaelsembwever): Something broke here in production once we went out with 0.8.2. It may have been some poor testing, i'm not entirely sure and a little surprised. CFIF:135 breaks because inside {{dhtRange.intersects(jobRange)}} there's a call to {{new Range(token, token)}} which calls {{StorageService.getPartitioner()}} and StorageService is null as we're not inside the server. A quick fix is to change Range:148 from {{new Range(token, token)}} to {{new Range(token, token, partitioner)}} making the presumption that the partitioner for the new Range will be the same as this Range. 
Filter out ColumnFamily rows that aren't part of the query (using a KeyRange) - Key: CASSANDRA-1125 URL: https://issues.apache.org/jira/browse/CASSANDRA-1125 Project: Cassandra Issue Type: New Feature Components: Hadoop Reporter: Jeremy Hanna Assignee: Mck SembWever Priority: Minor Fix For: 0.8.2 Attachments: 1125-formatted.txt, 1125-v3.txt, CASSANDRA-1125.patch, CASSANDRA-1125.patch Currently, when running a MapReduce job against data in a Cassandra data store, it reads through all the data for a particular ColumnFamily. This could be optimized to only read through those rows that have to do with the query. It's a small change but wanted to put it in Jira so that it didn't fall through the cracks. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.
[ https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094097#comment-13094097 ] Mck SembWever commented on CASSANDRA-2388: -- This approach isn't really working for me and was committed too quickly i believe. bq. Although the documentation on inputSplit.getLocations() is a little thin as to whether this restricts which trackers it should run on or whether is just a preference Tasks are still being evenly distributed around the ring regardless of what the ColumnFamilySplit.locations is. The chance of a task actually working is RF/N. Therefore the chances of a blacklisted node are high. Worse is that the whole ring can quickly become blacklisted. ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica. - Key: CASSANDRA-2388 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 0.6 Reporter: Eldon Stegall Assignee: Mck SembWever Labels: hadoop, inputformat Fix For: 0.7.9, 0.8.5 Attachments: 0002_On_TException_try_next_split.patch, CASSANDRA-2388-addition1.patch, CASSANDRA-2388-extended.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch ColumnFamilyRecordReader only tries the first location for a given split. We should try multiple locations for a given split. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
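The RF/N arithmetic in the comment can be made concrete. This is a rough back-of-the-envelope model, an assumption for illustration rather than code from Cassandra or Hadoop: the scheduler ignores split locations and places each task attempt on one of the N nodes uniformly at random, and only RF of those nodes hold the split's data.

```java
// Hypothetical model of the comment's arithmetic (not project code).
class LocalityOdds {
    // Probability a randomly placed task attempt lands on a replica
    // of its split: RF out of N nodes hold the data.
    static double chanceLocal(int rf, int n) {
        return (double) rf / n;
    }

    // Probability that all `attempts` retries miss a replica, i.e. the
    // task fails everywhere and nodes accumulate blacklist strikes.
    static double chanceAllMiss(int rf, int n, int attempts) {
        return Math.pow(1.0 - chanceLocal(rf, n), attempts);
    }
}
```

With RF=3 on a 12-node ring, a random placement works only 25% of the time, so failed attempts, and with them blacklisting, pile up quickly across the ring.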
[jira] [Commented] (CASSANDRA-3108) Make Range and Bounds objects client-safe
[ https://issues.apache.org/jira/browse/CASSANDRA-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094104#comment-13094104 ] Mck SembWever commented on CASSANDRA-3108: -- Tested in production. +1 Make Range and Bounds objects client-safe - Key: CASSANDRA-3108 URL: https://issues.apache.org/jira/browse/CASSANDRA-3108 Project: Cassandra Issue Type: Bug Affects Versions: 0.8.2 Reporter: Jonathan Ellis Assignee: Jonathan Ellis Labels: hadoop Fix For: 0.8.5 Attachments: 3108.txt From Mck's comment on CASSANDRA-1125: Something broke here in production once we went out with 0.8.2. It may have been some poor testing, i'm not entirely sure and a little surprised. CFIF:135 breaks because inside dhtRange.intersects(jobRange) there's a call to new Range(token, token) which calls StorageService.getPartitioner() and StorageService is null as we're not inside the server. A quick fix is to change Range:148 from new Range(token, token) to new Range(token, token, partitioner) making the presumption that the partitioner for the new Range will be the same as this Range. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.
[ https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094097#comment-13094097 ] Mck SembWever edited comment on CASSANDRA-2388 at 8/30/11 9:22 PM: --- This approach isn't really working for me and was committed too quickly i believe. bq. Although the documentation on inputSplit.getLocations() is a little thin as to whether this restricts which trackers it should run on or whether is just a preference Tasks are still being evenly distributed around the ring regardless of what ColumnFamilySplit.locations is. The chance of a task actually working is RF/N. Therefore the chances of a blacklisted node are high. Worse is that the whole ring can quickly become blacklisted. http://abel-perez.com/hadoop-task-assignment has an interesting section in it explaining how the task assignment is supposed to work (and that data locality is preferred but not a requirement). Could ColumnFamilySplit.locations be in the wrong format? (e.g. should they be IP addresses, not hostnames?). was (Author: michaelsembwever): This approach isn't really working for me and was committed too quickly i believe. bq. Although the documentation on inputSplit.getLocations() is a little thin as to whether this restricts which trackers it should run on or whether is just a preference Tasks are still being evenly distributed around the ring regardless of what ColumnFamilySplit.locations is. The chance of a task actually working is RF/N. Therefore the chances of a blacklisted node are high. Worse is that the whole ring can quickly become blacklisted. ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.
- Key: CASSANDRA-2388 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 0.6 Reporter: Eldon Stegall Assignee: Mck SembWever Labels: hadoop, inputformat Fix For: 0.7.9, 0.8.5 Attachments: 0002_On_TException_try_next_split.patch, CASSANDRA-2388-addition1.patch, CASSANDRA-2388-extended.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch ColumnFamilyRecordReader only tries the first location for a given split. We should try multiple locations for a given split. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-1034) Remove assumption that Key to Token is one-to-one
[ https://issues.apache.org/jira/browse/CASSANDRA-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088280#comment-13088280 ] Mck SembWever commented on CASSANDRA-1034: -- What's the status on this? This issue and its relations back to CASSANDRA-2878 are the only reason we're using OPP. I suspect other users set up with both cassandra and hadoop (or brisk) could be in the same boat. Not only does OPP leave an unbalanced ring (i've had a case where all data went to one node because the keys/tokens were longer than normal), it also leaves hadoop jobs with poor performance, as tasks' requirement on data locality has become stricter (w/ CASSANDRA-2388). Remove assumption that Key to Token is one-to-one - Key: CASSANDRA-1034 URL: https://issues.apache.org/jira/browse/CASSANDRA-1034 Project: Cassandra Issue Type: Bug Reporter: Stu Hood Assignee: Sylvain Lebresne Priority: Minor Fix For: 1.1 Attachments: 0001-Make-range-accept-both-Token-and-DecoratedKey.patch, 0002-LengthPartitioner.patch, 1034-1-Generify-AbstractBounds-v3.patch, 1034-2-Remove-assumption-that-token-and-keys-are-one-to-one-v3.patch, 1034_v1.txt get_range_slices assumes that Tokens do not collide and converts a KeyRange to an AbstractBounds. For RandomPartitioner, this assumption isn't safe, and would lead to a very weird heisenberg. Converting AbstractBounds to use a DecoratedKey would solve this, because the byte[] key portion of the DecoratedKey can act as a tiebreaker. Alternatively, we could make DecoratedKey extend Token, and then use DecoratedKeys in places where collisions are unacceptable. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-1034) Remove assumption that Key to Token is one-to-one
[ https://issues.apache.org/jira/browse/CASSANDRA-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088280#comment-13088280 ] Mck SembWever edited comment on CASSANDRA-1034 at 8/20/11 10:10 PM: What's the status on this? This issue and its relations back to CASSANDRA-2878 are the only reason we're using OPP. I suspect other users setup with both cassandra and hadoop (or brisk) could be in the same boat. Not only does OPP leave an unbalanced ring (i've had a case where all data went to one node because the keys/tokens were longer than normal) it leaves poor performance to hadoop jobs as tasks requirement on data locality has become stricter (w/ CASSANDRA-2388). Apart from the plain preference to be using secondary indexes over OPP. was (Author: michaelsembwever): What's the status on this? This issue and its relations back to CASSANDRA-2878 are the only reason we're using OPP. I suspect other users setup with both cassandra and hadoop (or brisk) could be in the same boat. Not only does OPP leave an unbalanced ring (i've had a case where all data went to one node because the keys/tokens were longer than normal) it leaves poor performance to hadoop jobs as tasks requirement on data locality has become stricter (w/ CASSANDRA-2388). Remove assumption that Key to Token is one-to-one - Key: CASSANDRA-1034 URL: https://issues.apache.org/jira/browse/CASSANDRA-1034 Project: Cassandra Issue Type: Bug Reporter: Stu Hood Assignee: Sylvain Lebresne Priority: Minor Fix For: 1.1 Attachments: 0001-Make-range-accept-both-Token-and-DecoratedKey.patch, 0002-LengthPartitioner.patch, 1034-1-Generify-AbstractBounds-v3.patch, 1034-2-Remove-assumption-that-token-and-keys-are-one-to-one-v3.patch, 1034_v1.txt get_range_slices assumes that Tokens do not collide and converts a KeyRange to an AbstractBounds. For RandomPartitioner, this assumption isn't safe, and would lead to a very weird heisenberg. 
Converting AbstractBounds to use a DecoratedKey would solve this, because the byte[] key portion of the DecoratedKey can act as a tiebreaker. Alternatively, we could make DecoratedKey extend Token, and then use DecoratedKeys in places where collisions are unacceptable. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
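The tiebreaker idea above, ordering primarily by token and falling back to the raw key bytes on a collision, can be sketched as follows. This class and its field names are a hypothetical illustration of the comparison rule only, not Cassandra's actual DecoratedKey or partitioner code:

```java
// Sketch: a DecoratedKey-style comparison where the raw key bytes break
// token collisions, keeping range bounds unambiguous under RandomPartitioner.
class DecoratedKeySketch implements Comparable<DecoratedKeySketch> {
    final long token;   // the partitioner's token for the key (may collide)
    final byte[] key;   // the original row key, used as the tiebreaker

    DecoratedKeySketch(long token, byte[] key) {
        this.token = token;
        this.key = key;
    }

    @Override
    public int compareTo(DecoratedKeySketch o) {
        // Primary order: token. On collision, fall back to lexicographic
        // byte order of the key so no two distinct keys compare equal.
        if (token != o.token) return Long.compare(token, o.token);
        int n = Math.min(key.length, o.key.length);
        for (int i = 0; i < n; i++) {
            int c = Byte.compare(key[i], o.key[i]);
            if (c != 0) return c;
        }
        return Integer.compare(key.length, o.key.length);
    }

    public static void main(String[] args) {
        DecoratedKeySketch a = new DecoratedKeySketch(42, new byte[]{1});
        DecoratedKeySketch b = new DecoratedKeySketch(42, new byte[]{2});
        // Same token, different keys: the key bytes decide the order.
        System.out.println(a.compareTo(b) < 0);  // true
    }
}
```

With such an ordering, get_range_slices could convert a KeyRange to bounds over decorated keys without two colliding tokens collapsing into one bound.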
[jira] [Commented] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13062812#comment-13062812 ] Mck SembWever commented on CASSANDRA-1125: -- +1 (tested) on 1125-v3.txt Filter out ColumnFamily rows that aren't part of the query -- Key: CASSANDRA-1125 URL: https://issues.apache.org/jira/browse/CASSANDRA-1125 Project: Cassandra Issue Type: New Feature Components: Hadoop Reporter: Jeremy Hanna Assignee: Mck SembWever Priority: Minor Fix For: 1.0 Attachments: 1125-formatted.txt, 1125-v3.txt, CASSANDRA-1125.patch, CASSANDRA-1125.patch Currently, when running a MapReduce job against data in a Cassandra data store, it reads through all the data for a particular ColumnFamily. This could be optimized to only read through those rows that have to do with the query. It's a small change but wanted to put it in Jira so that it didn't fall through the cracks. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query (using a KeyRange)
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mck SembWever updated CASSANDRA-1125: - Summary: Filter out ColumnFamily rows that aren't part of the query (using a KeyRange) (was: Filter out ColumnFamily rows that aren't part of the query) Filter out ColumnFamily rows that aren't part of the query (using a KeyRange) - Key: CASSANDRA-1125 URL: https://issues.apache.org/jira/browse/CASSANDRA-1125 Project: Cassandra Issue Type: New Feature Components: Hadoop Reporter: Jeremy Hanna Assignee: Mck SembWever Priority: Minor Fix For: 1.0 Attachments: 1125-formatted.txt, 1125-v3.txt, CASSANDRA-1125.patch, CASSANDRA-1125.patch Currently, when running a MapReduce job against data in a Cassandra data store, it reads through all the data for a particular ColumnFamily. This could be optimized to only read through those rows that have to do with the query. It's a small change but wanted to put it in Jira so that it didn't fall through the cracks. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (CASSANDRA-2878) Filter out ColumnFamily rows that aren't part of the query (using a IndexClause)
Filter out ColumnFamily rows that aren't part of the query (using a IndexClause) Key: CASSANDRA-2878 URL: https://issues.apache.org/jira/browse/CASSANDRA-2878 Project: Cassandra Issue Type: New Feature Components: Hadoop Reporter: Mck SembWever Assignee: Mck SembWever Priority: Minor Fix For: 1.0 Currently, when running a MapReduce job against data in a Cassandra data store, it reads through all the data for a particular ColumnFamily. This could be optimized to only read through those rows that have to do with the query. It's a small change but wanted to put it in Jira so that it didn't fall through the cracks. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query (using a KeyRange)
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13062814#comment-13062814 ] Mck SembWever commented on CASSANDRA-1125: -- Created CASSANDRA-2878 for the better solution using a IndexClause Filter out ColumnFamily rows that aren't part of the query (using a KeyRange) - Key: CASSANDRA-1125 URL: https://issues.apache.org/jira/browse/CASSANDRA-1125 Project: Cassandra Issue Type: New Feature Components: Hadoop Reporter: Jeremy Hanna Assignee: Mck SembWever Priority: Minor Fix For: 1.0 Attachments: 1125-formatted.txt, 1125-v3.txt, CASSANDRA-1125.patch, CASSANDRA-1125.patch Currently, when running a MapReduce job against data in a Cassandra data store, it reads through all the data for a particular ColumnFamily. This could be optimized to only read through those rows that have to do with the query. It's a small change but wanted to put it in Jira so that it didn't fall through the cracks. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.
[ https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13059334#comment-13059334 ] Mck SembWever commented on CASSANDRA-2388: -- {quote}2) If we ARE in that situation, the right solution would be to send the job to a TT whose local replica IS live, not to read the data from a nonlocal replica. How can we signal that?{quote}To /really/ solve this issue can we do the following? In CFIF.getRangeMap() take out of each range any endpoints that are not alive. A client connection already exists in this method. This filtering out of dead endpoints wouldn't be difficult, and would move tasks *to* the data, making use of replicas. This approach does need a new method in cassandra.thrift, e.g. {{list<string> describe_alive_nodes()}} ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica. - Key: CASSANDRA-2388 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 0.7.6, 0.8.0 Reporter: Eldon Stegall Assignee: Jeremy Hanna Labels: hadoop, inputformat Fix For: 0.7.7, 0.8.2 Attachments: 0002_On_TException_try_next_split.patch, CASSANDRA-2388-addition1.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch ColumnFamilyRecordReader only tries the first location for a given split. We should try multiple locations for a given split.
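The endpoint filtering proposed in the comment above could look roughly like this. The {{describe_alive_nodes()}} thrift call does not exist yet, so the live-node set is passed in as a parameter; the map shape and method names are illustrative, not CFIF's actual API:

```java
import java.util.*;

// Hedged sketch: drop endpoints that are not alive from each split's
// replica list, so tasks are moved *to* live replicas of the data.
class RangeEndpointFilter {
    static Map<String, List<String>> filterDead(
            Map<String, List<String>> rangeToEndpoints, Set<String> liveNodes) {
        Map<String, List<String>> filtered = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : rangeToEndpoints.entrySet()) {
            List<String> alive = new ArrayList<>();
            for (String ep : e.getValue()) {
                if (liveNodes.contains(ep)) alive.add(ep);
            }
            // Keep the original endpoints if none are alive, so the split
            // still carries locations rather than silently becoming empty.
            filtered.put(e.getKey(), alive.isEmpty() ? e.getValue() : alive);
        }
        return filtered;
    }

    public static void main(String[] args) {
        Map<String, List<String>> ranges = new LinkedHashMap<>();
        ranges.put("(0,100]", Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3"));
        Set<String> live = new HashSet<>(Arrays.asList("10.0.0.2", "10.0.0.3"));
        System.out.println(filterDead(ranges, live));  // dead 10.0.0.1 dropped
    }
}
```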
[jira] [Commented] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13059401#comment-13059401 ] Mck SembWever commented on CASSANDRA-1125: -- bq. using KeyRange but with tokens (which Thrift also uses for start-exclusive) this is my preference. i'll make a patch for it. Filter out ColumnFamily rows that aren't part of the query -- Key: CASSANDRA-1125 URL: https://issues.apache.org/jira/browse/CASSANDRA-1125 Project: Cassandra Issue Type: New Feature Components: Hadoop Reporter: Jeremy Hanna Assignee: Mck SembWever Priority: Minor Fix For: 1.0 Attachments: 1125-formatted.txt, CASSANDRA-1125.patch Currently, when running a MapReduce job against data in a Cassandra data store, it reads through all the data for a particular ColumnFamily. This could be optimized to only read through those rows that have to do with the query. It's a small change but wanted to put it in Jira so that it didn't fall through the cracks. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.
[ https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mck SembWever updated CASSANDRA-2388: - Attachment: CASSANDRA-2388-extended.patch ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica. - Key: CASSANDRA-2388 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 0.7.6, 0.8.0 Reporter: Eldon Stegall Assignee: Jeremy Hanna Labels: hadoop, inputformat Fix For: 0.7.7, 0.8.2 Attachments: 0002_On_TException_try_next_split.patch, CASSANDRA-2388-addition1.patch, CASSANDRA-2388-extended.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch ColumnFamilyRecordReader only tries the first location for a given split. We should try multiple locations for a given split. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query
[ https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mck SembWever updated CASSANDRA-1125: - Attachment: CASSANDRA-1125.patch Filter out ColumnFamily rows that aren't part of the query -- Key: CASSANDRA-1125 URL: https://issues.apache.org/jira/browse/CASSANDRA-1125 Project: Cassandra Issue Type: New Feature Components: Hadoop Reporter: Jeremy Hanna Assignee: Mck SembWever Priority: Minor Fix For: 1.0 Attachments: 1125-formatted.txt, CASSANDRA-1125.patch, CASSANDRA-1125.patch Currently, when running a MapReduce job against data in a Cassandra data store, it reads through all the data for a particular ColumnFamily. This could be optimized to only read through those rows that have to do with the query. It's a small change but wanted to put it in Jira so that it didn't fall through the cracks. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.
[ https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13059134#comment-13059134 ] Mck SembWever edited comment on CASSANDRA-2388 at 7/2/11 10:08 PM: --- The idea is to setup splits to have only endpoints that are valid trackers. But now i see this is just a brainfart :-) Ofc the jobTracker will apply this match for us. And that CFIF was always 'restricted' to running on endpoints. Although the documentation on inputSplit.getLocations() is a little thin as to whether this restricts which trackers it should run on or whether is just a preference... I guess it doesn't matter, as you point out Jonathan all that's required here is the one line changed in CFRR. was (Author: michaelsembwever): The idea is to setup splits to have only endpoints that are valid trackers. But now i see this is just a brainfart :-) Ofc the jobTracker will apply this match for us. And that CFIF was always 'restricted' to running on endpoints. Although the documentation on inputSplit.getLocations() is a little thin as to whether this restricts which trackers it should run on or whether is just a recommendation... I guess it doesn't matter, as you point out Jonathan all that's required here is the one line changed in CFRR. ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica. - Key: CASSANDRA-2388 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 0.7.6, 0.8.0 Reporter: Eldon Stegall Assignee: Jeremy Hanna Labels: hadoop, inputformat Fix For: 0.7.7, 0.8.2 Attachments: 0002_On_TException_try_next_split.patch, CASSANDRA-2388-addition1.patch, CASSANDRA-2388-local-nodes-only.rough-sketch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch ColumnFamilyRecordReader only tries the first location for a given split. 
We should try multiple locations for a given split. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.
[ https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13059134#comment-13059134 ] Mck SembWever commented on CASSANDRA-2388: -- The idea is to setup splits to have only endpoints that are valid trackers. But now i see this is just a brainfart :-) Ofc the jobTracker will apply this match for us. And that CFIF was always 'restricted' to running on endpoints. Although the documentation on inputSplit.getLocations() is a little thin as to whether this restricts which trackers it should run on or whether is just a recommendation... I guess it doesn't matter, as you point out Jonathan all that's required here is the one line changed in CFRR. ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica. - Key: CASSANDRA-2388 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 0.7.6, 0.8.0 Reporter: Eldon Stegall Assignee: Jeremy Hanna Labels: hadoop, inputformat Fix For: 0.7.7, 0.8.2 Attachments: 0002_On_TException_try_next_split.patch, CASSANDRA-2388-addition1.patch, CASSANDRA-2388-local-nodes-only.rough-sketch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch ColumnFamilyRecordReader only tries the first location for a given split. We should try multiple locations for a given split. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.
[ https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mck SembWever updated CASSANDRA-2388: - Attachment: (was: CASSANDRA-2388-local-nodes-only.rough-sketch) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica. - Key: CASSANDRA-2388 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 0.7.6, 0.8.0 Reporter: Eldon Stegall Assignee: Jeremy Hanna Labels: hadoop, inputformat Fix For: 0.7.7, 0.8.2 Attachments: 0002_On_TException_try_next_split.patch, CASSANDRA-2388-addition1.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch ColumnFamilyRecordReader only tries the first location for a given split. We should try multiple locations for a given split. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.
[ https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mck SembWever updated CASSANDRA-2388: - Attachment: CASSANDRA-2388.patch ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica. - Key: CASSANDRA-2388 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 0.7.6, 0.8.0 Reporter: Eldon Stegall Assignee: Jeremy Hanna Labels: hadoop, inputformat Fix For: 0.7.7, 0.8.2 Attachments: 0002_On_TException_try_next_split.patch, CASSANDRA-2388-addition1.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch ColumnFamilyRecordReader only tries the first location for a given split. We should try multiple locations for a given split. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.
[ https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13059136#comment-13059136 ] Mck SembWever commented on CASSANDRA-2388: -- the new one-liner CASSANDRA-2388 attached. i'll submit patch once i've tested it some... ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica. - Key: CASSANDRA-2388 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 0.7.6, 0.8.0 Reporter: Eldon Stegall Assignee: Jeremy Hanna Labels: hadoop, inputformat Fix For: 0.7.7, 0.8.2 Attachments: 0002_On_TException_try_next_split.patch, CASSANDRA-2388-addition1.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch ColumnFamilyRecordReader only tries the first location for a given split. We should try multiple locations for a given split. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.
[ https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13057659#comment-13057659 ] Mck SembWever commented on CASSANDRA-2388: -- Then i would hope for two separate InputFormats. One optimised for local node connection, where cassandra is deemed the more important system over hadoop, and another where data can be read in from anywhere. I think the latter should be supported in some manner since users may not always have the possibility to install hadoop and cassandra on the same servers, or they might not think it to be so critical part (eg if CFIF is reading using a IndexClause the input data set might be quite small and the remaining code in the m/r be the bulk of the processing...) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica. - Key: CASSANDRA-2388 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 0.7.6, 0.8.0 Reporter: Eldon Stegall Assignee: Jeremy Hanna Labels: hadoop, inputformat Fix For: 0.7.7, 0.8.2 Attachments: 0002_On_TException_try_next_split.patch, CASSANDRA-2388-addition1.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch ColumnFamilyRecordReader only tries the first location for a given split. We should try multiple locations for a given split. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.
[ https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mck SembWever updated CASSANDRA-2388: - Attachment: CASSANDRA-2388-local-nodes-only.rough-sketch Is CASSANDRA-2388-local-nodes-only-rough-sketch the direction we want then? This is very initial code, i can't get {{new JobClient(JobTracker.getAddress(conf), conf).getClusterStatus().getActiveTrackerNames()}} to work, need a little help here. (Also CFRR.getLocations() can be drastically reduced). ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica. - Key: CASSANDRA-2388 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 0.7.6, 0.8.0 Reporter: Eldon Stegall Assignee: Jeremy Hanna Labels: hadoop, inputformat Fix For: 0.7.7, 0.8.2 Attachments: 0002_On_TException_try_next_split.patch, CASSANDRA-2388-addition1.patch, CASSANDRA-2388-local-nodes-only.rough-sketch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch ColumnFamilyRecordReader only tries the first location for a given split. We should try multiple locations for a given split. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
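One wrinkle with the {{getActiveTrackerNames()}} approach above is that tracker names are not bare hostnames; in Hadoop 0.20-era versions they look roughly like {{tracker_host1.example.com:localhost/127.0.0.1:44656}} (this format is an assumption, check your Hadoop release). A sketch of normalising them to hostnames so they can be matched against a split's endpoints:

```java
import java.util.*;

// Sketch: extract hostnames from TaskTracker names so they can be compared
// with a split's Cassandra endpoints. The "tracker_<host>:..." format is an
// assumption based on Hadoop 0.20-era tracker naming; verify per version.
class TrackerNames {
    static Set<String> activeHosts(Collection<String> trackerNames) {
        Set<String> hosts = new HashSet<>();
        for (String name : trackerNames) {
            // Strip the "tracker_" prefix, then keep everything up to the
            // first ':' (the hostname portion).
            String s = name.startsWith("tracker_")
                    ? name.substring("tracker_".length()) : name;
            int colon = s.indexOf(':');
            hosts.add(colon >= 0 ? s.substring(0, colon) : s);
        }
        return hosts;
    }

    public static void main(String[] args) {
        System.out.println(activeHosts(
                Arrays.asList("tracker_host1.example.com:localhost/127.0.0.1:44656")));
    }
}
```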
[jira] [Issue Comment Edited] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.
[ https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13057002#comment-13057002 ] Mck SembWever edited comment on CASSANDRA-2388 at 6/29/11 6:31 AM: --- This does happen already (i've seen it while testing initial patches that were no good). Problem is that the TT is blacklisted, reducing hadoop's throughput for all jobs running. I bet too that a fallback to a replica is faster than a fallback to another TT. On a side note, there is no guarantee that any given TT will have its split accessible via a local c* node - this is only a preference in CFRR. A failed job may just as likely got to a random c* node. At least now we can actually properly limit to the one DC and sort by proximity. One thing we're not doing here is applying this same DC limit and sort by proximity in the case when there isn't a localhost preference. See CFRR.initialize(..) It would make sense to rewrite CFRR.getLocations(..) to {noformat}private Iterator<String> getLocations(final Configuration conf) throws IOException { return new SplitEndpointIterator(conf); }{noformat} and then to move the finding-a-preference-to-localhost code into SplitEndpointIterator... was (Author: michaelsembwever): This does happen already (i've seen it while testing initial patches that were no good). Problem is that the TT is blacklisted, reducing hadoop's throughput for all jobs running. I bet too that a fallback to a replica is faster than a fallback to another TT. For example a c* node may die in the middle of a TT... ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica. 
- Key: CASSANDRA-2388 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 0.7.6, 0.8.0 Reporter: Eldon Stegall Assignee: Jeremy Hanna Labels: hadoop, inputformat Fix For: 0.7.7, 0.8.2 Attachments: 0002_On_TException_try_next_split.patch, CASSANDRA-2388-addition1.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch ColumnFamilyRecordReader only tries the first location for a given split. We should try multiple locations for a given split.
[jira] [Issue Comment Edited] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.
[ https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13057002#comment-13057002 ] Mck SembWever edited comment on CASSANDRA-2388 at 6/29/11 6:32 AM: --- This does happen already (i've seen it while testing initial patches that were no good). Problem is that the TT is blacklisted, reducing hadoop's throughput for all jobs running. I bet too that a fallback to a replica is faster than a fallback to another TT. On a side note, there is no guarantee that any given TT will have its split accessible via a local c* node - this is only a preference in CFRR. A failed job may just as likely go to a random c* node. At least now we can actually properly limit to the one DC and sort by proximity. One thing we're not doing here is applying this same DC limit and sort by proximity in the case when there isn't a localhost preference. See CFRR.initialize(..) It would make sense to rewrite CFRR.getLocations(..) to {noformat}private Iterator<String> getLocations(final Configuration conf) throws IOException { return new SplitEndpointIterator(conf); }{noformat} and then to move the finding-a-preference-to-localhost code into SplitEndpointIterator... was (Author: michaelsembwever): This does happen already (i've seen it while testing initial patches that were no good). Problem is that the TT is blacklisted, reducing hadoop's throughput for all jobs running. I bet too that a fallback to a replica is faster than a fallback to another TT. On a side note, there is no guarantee that any given TT will have its split accessible via a local c* node - this is only a preference in CFRR. A failed job may just as likely got to a random c* node. At least now we can actually properly limit to the one DC and sort by proximity. One thing we're not doing here is applying this same DC limit and sort by proximity in the case when there isn't a localhost preference. See CFRR.initialize(..) 
It would make sense to rewrite CFRR.getLocations(..) to {noformat}private Iterator<String> getLocations(final Configuration conf) throws IOException { return new SplitEndpointIterator(conf); }{noformat} and then to move the finding-a-preference-to-localhost code into SplitEndpointIterator... ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica. - Key: CASSANDRA-2388 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 0.7.6, 0.8.0 Reporter: Eldon Stegall Assignee: Jeremy Hanna Labels: hadoop, inputformat Fix For: 0.7.7, 0.8.2 Attachments: 0002_On_TException_try_next_split.patch, CASSANDRA-2388-addition1.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch ColumnFamilyRecordReader only tries the first location for a given split. We should try multiple locations for a given split.
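The {{getLocations(..)}} rewrite discussed above delegates endpoint ordering to a {{SplitEndpointIterator}}. A minimal, hypothetical version of such an iterator, with the preference-for-localhost logic folded in (names and behaviour assumed from the discussion, not taken from the actual CFRR code):

```java
import java.util.*;

// Sketch: iterate a split's replica endpoints, yielding any endpoint local
// to this task tracker first so the localhost preference lives in one place
// and remote replicas remain as ordered fallbacks.
class SplitEndpointIterator implements Iterator<String> {
    private final Deque<String> ordered = new ArrayDeque<>();

    SplitEndpointIterator(List<String> endpoints, String localHost) {
        for (String ep : endpoints) {
            if (ep.equals(localHost)) ordered.addFirst(ep);  // local replica first
            else ordered.addLast(ep);                        // remote fallbacks after
        }
    }

    @Override public boolean hasNext() { return !ordered.isEmpty(); }
    @Override public String next() { return ordered.removeFirst(); }

    public static void main(String[] args) {
        Iterator<String> it =
                new SplitEndpointIterator(Arrays.asList("a", "b", "local"), "local");
        while (it.hasNext()) System.out.println(it.next());  // local, a, b
    }
}
```

The record reader would then simply walk the iterator on connection failure instead of special-casing the first location.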
[jira] [Issue Comment Edited] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.
[ https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13057002#comment-13057002 ] Mck SembWever edited comment on CASSANDRA-2388 at 6/29/11 7:19 AM: --- This does happen already (i've seen it while testing initial patches that were no good). Problem is that the TT is blacklisted, reducing hadoop's throughput for all jobs running. I bet too that a fallback to a replica is faster than a fallback to another TT. On a side note, there is no guarantee that any given TT will have its split accessible via a local c* node - this is only a preference in CFRR. A failed task may just as likely go to a random c* node. At least now we can actually properly limit to the one DC and sort by proximity. One thing we're not doing here is applying this same DC limit and sort by proximity in the case when there isn't a localhost preference. See CFRR.initialize(..) It would make sense to rewrite CFRR.getLocations(..) to {noformat}private Iterator<String> getLocations(final Configuration conf) throws IOException { return new SplitEndpointIterator(conf); }{noformat} and then to move the finding-a-preference-to-localhost code into SplitEndpointIterator... was (Author: michaelsembwever): This does happen already (i've seen it while testing initial patches that were no good). Problem is that the TT is blacklisted, reducing hadoop's throughput for all jobs running. I bet too that a fallback to a replica is faster than a fallback to another TT. On a side note, there is no guarantee that any given TT will have its split accessible via a local c* node - this is only a preference in CFRR. A failed job may just as likely go to a random c* node. At least now we can actually properly limit to the one DC and sort by proximity. One thing we're not doing here is applying this same DC limit and sort by proximity in the case when there isn't a localhost preference. See CFRR.initialize(..) 
It would make sense to rewrite CFRR.getLocations(..) to {noformat}private Iterator<String> getLocations(final Configuration conf) throws IOException { return new SplitEndpointIterator(conf); }{noformat} and then to move the finding-a-preference-to-localhost code into SplitEndpointIterator... ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica. - Key: CASSANDRA-2388 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388 Project: Cassandra Issue Type: Bug Components: Hadoop Affects Versions: 0.7.6, 0.8.0 Reporter: Eldon Stegall Assignee: Jeremy Hanna Labels: hadoop, inputformat Fix For: 0.7.7, 0.8.2 Attachments: 0002_On_TException_try_next_split.patch, CASSANDRA-2388-addition1.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch ColumnFamilyRecordReader only tries the first location for a given split. We should try multiple locations for a given split.
[jira] [Issue Comment Edited] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.
[ https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057002#comment-13057002 ] Mck SembWever edited comment on CASSANDRA-2388 at 6/29/11 7:27 AM:
---
This does happen already (I've seen it while testing initial patches that were no good). The problem is that the TT is blacklisted, reducing hadoop's throughput for all jobs running. I bet too that a fallback to a replica is faster than a fallback to another TT.

On a side note, there is no guarantee that any given TT will have its split accessible via a local c* node - this is only a preference in CFRR. A failed task may just as likely go to a random c* node. At least now we can actually properly limit to the one DC and sort by proximity.

One thing we're not doing here is applying this same DC limit and sort-by-proximity in the case when there isn't a localhost preference. See CFRR.initialize(..). It would make sense to rewrite CFRR.getLocations(..) to
{noformat}private Iterator<String> getLocations(final Configuration conf) throws IOException {
    return new SplitEndpointIterator(conf);
}{noformat}
and then to move the finding-a-preference-to-localhost code into SplitEndpointIterator...

A bug I can see in the patch that did get accepted already is in CassandraServer.java:763: when endpointValid is false and restrictToSameDC is true, we end up restricting to a random DC. I can fix this so restrictToSameDC is disabled in such situations.

ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.
-
Key: CASSANDRA-2388
URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
Project: Cassandra
Issue Type: Bug
Components: Hadoop
Affects Versions: 0.7.6, 0.8.0
Reporter: Eldon Stegall
Assignee: Jeremy Hanna
Labels: hadoop, inputformat
Fix For: 0.7.7, 0.8.2
Attachments: 0002_On_TException_try_next_split.patch, CASSANDRA-2388-addition1.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch

ColumnFamilyRecordReader only tries the first location for a given split. We should try multiple locations for a given split.
--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
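A SplitEndpointIterator along the lines proposed above could be sketched roughly as follows. This is hypothetical code, not from any committed patch: the class name comes from the comment, but the constructor arguments and the localhost-preference ordering are assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of the SplitEndpointIterator suggested above:
// yield the replica endpoints for a split with the local node (if any)
// first, so the record reader tries the localhost preference before
// falling back to the remaining replicas.
class SplitEndpointIterator implements Iterator<String> {
    private final Iterator<String> delegate;

    SplitEndpointIterator(List<String> replicaEndpoints, String localhost) {
        List<String> ordered = new ArrayList<>(replicaEndpoints);
        // Stable sort: endpoints equal to localhost come first; the
        // relative order of the remaining replicas is preserved.
        ordered.sort(Comparator.comparing(e -> !e.equals(localhost)));
        this.delegate = ordered.iterator();
    }

    @Override public boolean hasNext() { return delegate.hasNext(); }
    @Override public String next() { return delegate.next(); }
}
```

With this shape, getLocations(..) would simply return a new iterator built from the split's replica list, and the finding-a-preference-to-localhost logic lives in one place.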
[jira] [Issue Comment Edited] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.
[ https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057002#comment-13057002 ] Mck SembWever edited comment on CASSANDRA-2388 at 6/29/11 7:49 AM:
---
- This does happen already (I've seen it while testing initial patches that were no good). The problem is that the TT is blacklisted, reducing hadoop's throughput for all jobs running. I bet too that a fallback to a replica is faster than a fallback to another TT.
- There is no guarantee that any given TT will have its split accessible via a local c* node - this is only a preference in CFRR. A failed task may just as likely go to a random c* node. At least now we can actually properly limit to the one DC and sort by proximity.
- One thing we're not doing here is applying this same DC limit and sort-by-proximity in the case when there isn't a localhost preference. See CFRR.initialize(..). It would make sense to rewrite CFRR.getLocations(..) to
{noformat}private Iterator<String> getLocations(final Configuration conf) throws IOException {
    return new SplitEndpointIterator(conf);
}{noformat}
and then to move the finding-a-preference-to-localhost code into SplitEndpointIterator...
- A bug I can see in the patch that did get accepted already is in CassandraServer.java:763: when endpointValid is false and restrictToSameDC is true, we end up restricting to a random DC. I could fix this so restrictToSameDC is disabled in such situations, but this actually invalidates the previous point: we can't restrict to a DC anymore and we can only sortByProximity to a random node... I think this supports Jonathan's point that it's overall a poor approach. I'm more and more in favour of my original approach using just client.getDatacenter(..) and not worrying about proximity within the datacenter.
- Another bug is that, contrary to my patch, the committed code
bq. committed with a change to use the dynamic snitch if the passed endpoint is valid
can call {{DynamicEndpointSnitch.sortByProximity(..)}} with an address that is not localhost, and this breaks the assertion in the method.
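The CassandraServer.java:763 fix described above amounts to a one-line guard. A hypothetical sketch (the method and parameter names here are assumptions for illustration, not the actual patch):

```java
// Hypothetical sketch of the fix: only honour restrictToSameDC when the
// requesting endpoint was actually validated. Otherwise the "same DC"
// would be the DC of an arbitrarily chosen fallback node, i.e. the
// restriction would pin replica selection to a random DC.
final class DcRestriction {
    static boolean effectiveRestrictToSameDC(boolean endpointValid, boolean restrictToSameDC) {
        return endpointValid && restrictToSameDC;
    }
}
```

As the comment notes, though, disabling the restriction this way gives up the DC limit entirely in the invalid-endpoint case, which is what motivates falling back to the simpler client.getDatacenter(..) approach.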
[jira] [Issue Comment Edited] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.
[ https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13057470#comment-13057470 ] Mck SembWever edited comment on CASSANDRA-2388 at 6/29/11 9:18 PM:
---
bq. tlipcon says it comes back after 24h
Just to be clear about my concerns: this means a dead c* node will bring down a TT. In a hadoop cluster with 3 nodes this means that for 24 hours you've lost 33% of your throughput. (If less than 10% of hadoop jobs used CFIF, I could well imagine some pissed users.) (What if you have a temporary problem with flapping c* nodes and you end up with a handful of blacklisted TTs? etc etc etc.) All this when using a replica, any replica, could have kept things going smoothly, the only slowdown being that some of the data into CFIF had to go over the network instead...
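The 33% figure above is just the fraction of TaskTrackers removed from the cluster for the blacklist period. As a quick illustrative check (arithmetic only, not Cassandra or Hadoop code):

```java
// Blacklisting one TaskTracker in an N-node Hadoop cluster removes 1/N of
// the cluster's task capacity for the blacklist period (24h in this case):
// 1 of 3 nodes blacklisted -> one third of throughput lost.
final class BlacklistImpact {
    static double lostThroughputFraction(int totalTaskTrackers, int blacklisted) {
        return (double) blacklisted / totalTaskTrackers;
    }
}
```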