[jira] [Commented] (CASSANDRA-8032) User based request scheduler

2014-10-01 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14155395#comment-14155395
 ] 

Mck SembWever commented on CASSANDRA-8032:
--

Here's a quick initial 
[attempt|https://github.com/michaelsembwever/cassandra/commit/4516f635b923763155c524b04235a6aa39e2e5a3].

Looks like this could be but two lines of code. But the unit tests… h…
I'll give this more testing and post an update.
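
For context, a minimal sketch of the idea, assuming a new {{user}} value for request_scheduler_id next to the existing {{keyspace}} one (the names below are assumptions, not the linked commit):
{code}
// Hypothetical sketch only, not the linked commit. Assumes a new
// RequestSchedulerId enum value "user" alongside the existing "keyspace",
// living in ThriftClientState next to the current keyspace lookup.
public String getSchedulingValue()
{
    switch (DatabaseDescriptor.getRequestSchedulerId())
    {
        case keyspace:
            return getRawKeyspace();       // existing behaviour
        case user:                         // assumed new option
            return getUser().getName();    // authenticated user from ClientState
    }
    return DEFAULT_KEY;
}
{code}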

 User based request scheduler
 

 Key: CASSANDRA-8032
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8032
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Mck SembWever
Priority: Minor

 Today only a keyspace based request scheduler exists.
 Post CASSANDRA-4898 it could be possible to implement a request_scheduler 
 based on users (from system_auth.credentials) rather than keyspaces. This 
 could offer a finer granularity of control, from read-only vs read-write 
 users on keyspaces, to application dedicated vs ad-hoc users. Alternatively 
 it could also offer a granularity larger and easier to work with than per 
 keyspace.
 The request scheduler is a useful concept, but I think that setups with enough 
 nodes often favour separate clusters rather than either creating separate 
 virtual datacenters or using the request scheduler. Giving the request 
 scheduler another, more flexible, implementation could especially help those 
 users that don't yet have enough nodes to warrant separate clusters, or even 
 separate virtual datacenters. On such smaller clusters Cassandra can still be 
 seen as an unstable technology because poor consumers/schemas can easily 
 affect, even bring down, a whole cluster.
 I haven't looked into the feasibility of this within the code, but it comes to 
 mind as rather simple, and I would be interested in offering a patch if the 
 idea carries validity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-8032) User based request scheduler

2014-10-01 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14155395#comment-14155395
 ] 

Mck SembWever edited comment on CASSANDRA-8032 at 10/1/14 8:10 PM:
---

Here's a quick initial 
[attempt|https://github.com/michaelsembwever/cassandra/commit/c1f87ad3be011444d6f9d15f7d8e9e1014244cf3].

Looks like this could be but two lines of code. But the unit tests… h…
I'll give this more testing and post an update.


was (Author: michaelsembwever):
Here's a quick initial 
[attempt|https://github.com/michaelsembwever/cassandra/commit/4516f635b923763155c524b04235a6aa39e2e5a3].

Looks like this could be but two lines of code. But the unit tests… h…
I'll give this more testing and post an update.

 User based request scheduler
 

 Key: CASSANDRA-8032
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8032
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Mck SembWever
Priority: Minor

 Today only a keyspace based request scheduler exists.
 Post CASSANDRA-4898 it could be possible to implement a request_scheduler 
 based on users (from system_auth.credentials) rather than keyspaces. This 
 could offer a finer granularity of control, from read-only vs read-write 
 users on keyspaces, to application dedicated vs ad-hoc users. Alternatively 
 it could also offer a granularity larger and easier to work with than per 
 keyspace.
 The request scheduler is a useful concept, but I think that setups with enough 
 nodes often favour separate clusters rather than either creating separate 
 virtual datacenters or using the request scheduler. Giving the request 
 scheduler another, more flexible, implementation could especially help those 
 users that don't yet have enough nodes to warrant separate clusters, or even 
 separate virtual datacenters. On such smaller clusters Cassandra can still be 
 seen as an unstable technology because poor consumers/schemas can easily 
 affect, even bring down, a whole cluster.
 I haven't looked into the feasibility of this within the code, but it comes to 
 mind as rather simple, and I would be interested in offering a patch if the 
 idea carries validity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-8032) User based request scheduler

2014-09-30 Thread Mck SembWever (JIRA)
Mck SembWever created CASSANDRA-8032:


 Summary: User based request scheduler
 Key: CASSANDRA-8032
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8032
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Mck SembWever
Priority: Minor


Today only a keyspace based request scheduler exists.

Post CASSANDRA-4898 it could be possible to implement a request_scheduler based 
on users (from system_auth.credentials) rather than keyspaces. This could offer 
a finer granularity of control, from read-only vs read-write users on 
keyspaces, to application dedicated vs ad-hoc users. Alternatively it could 
also offer a granularity larger and easier to work with than per keyspace.

The request scheduler is a useful concept, but I think that setups with enough 
nodes often favour separate clusters rather than either creating separate 
virtual datacenters or using the request scheduler. Giving the request 
scheduler another, more flexible, implementation could especially help those 
users that don't yet have enough nodes to warrant separate clusters, or even 
separate virtual datacenters. On such smaller clusters Cassandra can still be 
seen as an unstable technology because poor consumers/schemas can easily 
affect, even bring down, a whole cluster.

I haven't looked into the feasibility of this within the code, but it comes to 
mind as rather simple, and I would be interested in offering a patch if the 
idea carries validity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8032) User based request scheduler

2014-09-30 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14154380#comment-14154380
 ] 

Mck SembWever commented on CASSANDRA-8032:
--

Oh, if it's already implemented I've missed it.

Looking a little deeper into trunk, cassandra.yaml still reports that only 
keyspace is available as a request_scheduler_id, and looking into 
ThriftClientState I can only see support for a scheduling value based on 
keyspace.
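
For reference, the keyspace-only support in ThriftClientState boils down to 
something like this (paraphrased from memory, not verbatim trunk code):
{code}
// Paraphrase of current trunk behaviour: the request scheduler can only
// key on the connected keyspace; no other RequestSchedulerId values exist.
public String getSchedulingValue()
{
    switch (DatabaseDescriptor.getRequestSchedulerId())
    {
        case keyspace:
            return getRawKeyspace();
    }
    return DEFAULT_KEY;
}
{code}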

 User based request scheduler
 

 Key: CASSANDRA-8032
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8032
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Mck SembWever
Priority: Minor

 Today only a keyspace based request scheduler exists.
 Post CASSANDRA-4898 it could be possible to implement a request_scheduler 
 based on users (from system_auth.credentials) rather than keyspaces. This 
 could offer a finer granularity of control, from read-only vs read-write 
 users on keyspaces, to application dedicated vs ad-hoc users. Alternatively 
 it could also offer a granularity larger and easier to work with than per 
 keyspace.
 The request scheduler is a useful concept, but I think that setups with enough 
 nodes often favour separate clusters rather than either creating separate 
 virtual datacenters or using the request scheduler. Giving the request 
 scheduler another, more flexible, implementation could especially help those 
 users that don't yet have enough nodes to warrant separate clusters, or even 
 separate virtual datacenters. On such smaller clusters Cassandra can still be 
 seen as an unstable technology because poor consumers/schemas can easily 
 affect, even bring down, a whole cluster.
 I haven't looked into the feasibility of this within the code, but it comes to 
 mind as rather simple, and I would be interested in offering a patch if the 
 idea carries validity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-6332) Cassandra startup failure: java.util.concurrent.ExecutionException: java.lang.RuntimeException: 706167655f74616773 is not defined as a collection

2014-02-10 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895873#comment-13895873
 ] 

Mck SembWever edited comment on CASSANDRA-6332 at 2/10/14 9:06 AM:
---

being a dev environment it's pretty much an open playground so it could well 
have been that without us knowing about it.
  (i've been away the past 2 months but will try and chase up if this was the 
case…)


update:
yes a table of the same name was dropped and then created again with a 
different definition.
this happened in a timeframe of 15 minutes or less…


was (Author: michaelsembwever):
being a dev environment it's pretty much an open playground so it could well 
have been that without us knowing about it.
  (i've been away the past 2 months but will try and chase up if this was the 
case…)

 Cassandra startup failure:  java.util.concurrent.ExecutionException: 
 java.lang.RuntimeException: 706167655f74616773 is not defined as a collection
 --

 Key: CASSANDRA-6332
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6332
 Project: Cassandra
  Issue Type: Bug
 Environment: Ubuntu 12.04
 Cassandra 2.0.1
Reporter: Prateek
Priority: Critical

 The cassandra node fails to startup with the following error message. This is 
 currently impacting availability of our production cluster so your quick 
 response is highly appreciated.
 ERROR 22:58:26,046 Exception encountered during startup
 java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
 java.lang.RuntimeException: 706167655f74616773 is not defined as a collection
 at 
 org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:411)
 at 
 org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:400)
 at 
 org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:273)
 at 
 org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:96)
 at 
 org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:146)
 at 
 org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:126)
 at 
 org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:299)
 at 
 org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:442)
 at 
 org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:485)
 Caused by: java.util.concurrent.ExecutionException: 
 java.lang.RuntimeException: 706167655f74616773 is not defined as a collection
 at java.util.concurrent.FutureTask.report(FutureTask.java:122)
 at java.util.concurrent.FutureTask.get(FutureTask.java:188)
 at 
 org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:407)
 ... 8 more
 Caused by: java.lang.RuntimeException: 706167655f74616773 is not defined as a 
 collection
 at 
 org.apache.cassandra.db.marshal.ColumnToCollectionType.compareCollectionMembers(ColumnToCollectionType.java:72)
 at 
 org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:85)
 at 
 org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35)
 at 
 edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538)
 at 
 edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108)
 at 
 edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1192)
 at 
 edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059)
 at 
 edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023)
 at 
 edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985)
 at 
 org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:323)
 at 
 org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:195)
 at org.apache.cassandra.db.Memtable.resolve(Memtable.java:196)
 at org.apache.cassandra.db.Memtable.put(Memtable.java:160)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:842)
 at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:373)
 at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:338)
 at 
 org.apache.cassandra.db.commitlog.CommitLogReplayer$1.runMayThrow(CommitLogReplayer.java:265)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-6332) Cassandra startup failure: java.util.concurrent.ExecutionException: java.lang.RuntimeException: 706167655f74616773 is not defined as a collection

2014-02-09 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13895873#comment-13895873
 ] 

Mck SembWever commented on CASSANDRA-6332:
--

being a dev environment it's pretty much an open playground so it could well 
have been that without us knowing about it.
  (i've been away the past 2 months but will try and chase up if this was the 
case…)

 Cassandra startup failure:  java.util.concurrent.ExecutionException: 
 java.lang.RuntimeException: 706167655f74616773 is not defined as a collection
 --

 Key: CASSANDRA-6332
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6332
 Project: Cassandra
  Issue Type: Bug
 Environment: Ubuntu 12.04
 Cassandra 2.0.1
Reporter: Prateek
Priority: Critical

 The cassandra node fails to startup with the following error message. This is 
 currently impacting availability of our production cluster so your quick 
 response is highly appreciated.
 ERROR 22:58:26,046 Exception encountered during startup
 java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
 java.lang.RuntimeException: 706167655f74616773 is not defined as a collection
 at 
 org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:411)
 at 
 org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:400)
 at 
 org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:273)
 at 
 org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:96)
 at 
 org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:146)
 at 
 org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:126)
 at 
 org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:299)
 at 
 org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:442)
 at 
 org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:485)
 Caused by: java.util.concurrent.ExecutionException: 
 java.lang.RuntimeException: 706167655f74616773 is not defined as a collection
 at java.util.concurrent.FutureTask.report(FutureTask.java:122)
 at java.util.concurrent.FutureTask.get(FutureTask.java:188)
 at 
 org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:407)
 ... 8 more
 Caused by: java.lang.RuntimeException: 706167655f74616773 is not defined as a 
 collection
 at 
 org.apache.cassandra.db.marshal.ColumnToCollectionType.compareCollectionMembers(ColumnToCollectionType.java:72)
 at 
 org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:85)
 at 
 org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35)
 at 
 edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538)
 at 
 edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108)
 at 
 edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1192)
 at 
 edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059)
 at 
 edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023)
 at 
 edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985)
 at 
 org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:323)
 at 
 org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:195)
 at org.apache.cassandra.db.Memtable.resolve(Memtable.java:196)
 at org.apache.cassandra.db.Memtable.put(Memtable.java:160)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:842)
 at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:373)
 at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:338)
 at 
 org.apache.cassandra.db.commitlog.CommitLogReplayer$1.runMayThrow(CommitLogReplayer.java:265)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-6332) Cassandra startup failure: java.util.concurrent.ExecutionException: java.lang.RuntimeException: 706167655f74616773 is not defined as a collection

2014-02-07 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13894382#comment-13894382
 ] 

Mck SembWever commented on CASSANDRA-6332:
--

No recipe to reproduce.
But on a 1.2.9 non-prod cluster we came across this problem.
The above exception occurs while reading commitlogs. Removing the commitlogs 
was a workaround.

 Cassandra startup failure:  java.util.concurrent.ExecutionException: 
 java.lang.RuntimeException: 706167655f74616773 is not defined as a collection
 --

 Key: CASSANDRA-6332
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6332
 Project: Cassandra
  Issue Type: Bug
 Environment: Ubuntu 12.04
 Cassandra 2.0.1
Reporter: Prateek
Priority: Critical

 The cassandra node fails to startup with the following error message. This is 
 currently impacting availability of our production cluster so your quick 
 response is highly appreciated.
 ERROR 22:58:26,046 Exception encountered during startup
 java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
 java.lang.RuntimeException: 706167655f74616773 is not defined as a collection
 at 
 org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:411)
 at 
 org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:400)
 at 
 org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:273)
 at 
 org.apache.cassandra.db.commitlog.CommitLogReplayer.recover(CommitLogReplayer.java:96)
 at 
 org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:146)
 at 
 org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:126)
 at 
 org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:299)
 at 
 org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:442)
 at 
 org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:485)
 Caused by: java.util.concurrent.ExecutionException: 
 java.lang.RuntimeException: 706167655f74616773 is not defined as a collection
 at java.util.concurrent.FutureTask.report(FutureTask.java:122)
 at java.util.concurrent.FutureTask.get(FutureTask.java:188)
 at 
 org.apache.cassandra.utils.FBUtilities.waitOnFuture(FBUtilities.java:407)
 ... 8 more
 Caused by: java.lang.RuntimeException: 706167655f74616773 is not defined as a 
 collection
 at 
 org.apache.cassandra.db.marshal.ColumnToCollectionType.compareCollectionMembers(ColumnToCollectionType.java:72)
 at 
 org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:85)
 at 
 org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35)
 at 
 edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538)
 at 
 edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108)
 at 
 edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1192)
 at 
 edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059)
 at 
 edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023)
 at 
 edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985)
 at 
 org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:323)
 at 
 org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:195)
 at org.apache.cassandra.db.Memtable.resolve(Memtable.java:196)
 at org.apache.cassandra.db.Memtable.put(Memtable.java:160)
 at 
 org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:842)
 at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:373)
 at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:338)
 at 
 org.apache.cassandra.db.commitlog.CommitLogReplayer$1.runMayThrow(CommitLogReplayer.java:265)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-5201) Cassandra/Hadoop does not support current Hadoop releases

2013-10-25 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13805107#comment-13805107
 ] 

Mck SembWever commented on CASSANDRA-5201:
--

Hadoop-2 only just came out of alpha/beta with hadoop-2.2.0

 Cassandra/Hadoop does not support current Hadoop releases
 -

 Key: CASSANDRA-5201
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5201
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 1.2.0
Reporter: Brian Jeltema
Assignee: Dave Brosius
 Attachments: 5201_a.txt


 Using Hadoop 0.22.0 with Cassandra results in the stack trace below.
 It appears that version 0.21+ changed org.apache.hadoop.mapreduce.JobContext
 from a class to an interface.
 Exception in thread "main" java.lang.IncompatibleClassChangeError: Found 
 interface org.apache.hadoop.mapreduce.JobContext, but class was expected
   at 
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSplits(ColumnFamilyInputFormat.java:103)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:445)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:462)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:357)
   at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1045)
   at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1042)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153)
   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1042)
   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1062)
   at MyHadoopApp.run(MyHadoopApp.java:163)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
   at MyHadoopApp.main(MyHadoopApp.java:82)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:601)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:192)



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (CASSANDRA-5883) Switch to Logback

2013-10-23 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13802649#comment-13802649
 ] 

Mck SembWever commented on CASSANDRA-5883:
--

No, most people don't log that much. But it's nice to know that you can, if you 
have to, turn all loggers to DEBUG, or even TRACE, in production without 
hurting performance. With old log4j and logback this often isn't possible.

The performance gain isn't an across-the-board improvement, as far as I've 
understood it, but one that comes from removing all the little contentions 
around synchronised blocks. I can imagine that this appeals to the c* 
community, even if it's of little practical meaning for normal production c*.

Log4j2 also provides auto-reload of configuration files. Other features it 
offers are listed at http://logging.apache.org/log4j/2.x/

 Switch to Logback
 -

 Key: CASSANDRA-5883
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5883
 Project: Cassandra
  Issue Type: Improvement
  Components: Core, Tools
Reporter: Jonathan Ellis
Assignee: Dave Brosius
Priority: Minor
 Fix For: 2.1

 Attachments: 0001-Additional-migration-to-logback.patch, 5883-1.txt, 
 5883-additional1.txt, 5883.txt


 Logback has a number of advantages over log4j, and switching will be 
 straightforward since we are already using the slf4j translation layer: 
 http://logback.qos.ch/reasonsToSwitch.html



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Comment Edited] (CASSANDRA-5201) Cassandra/Hadoop does not support current Hadoop releases

2013-10-13 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13792563#comment-13792563
 ] 

Mck SembWever edited comment on CASSANDRA-5201 at 10/13/13 5:37 PM:


I've updated the github project to be a 
[patch|https://github.com/michaelsembwever/cassandra-hadoop/commit/6d7555ea205354a606907e40c16db35072004594]
 off the InputFormat and OutputFormat classes as found in cassandra-1.2.10.
It works against hadoop-0.22.0.


was (Author: michaelsembwever):
I've updated the github project to be a patch off the InputFormat and 
OutputFormat classes as found in cassandra-1.2.10.
It works against hadoop-0.22.0.

 Cassandra/Hadoop does not support current Hadoop releases
 -

 Key: CASSANDRA-5201
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5201
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 1.2.0
Reporter: Brian Jeltema
Assignee: Dave Brosius
 Attachments: 5201_a.txt


 Using Hadoop 0.22.0 with Cassandra results in the stack trace below.
 It appears that version 0.21+ changed org.apache.hadoop.mapreduce.JobContext
 from a class to an interface.
 Exception in thread "main" java.lang.IncompatibleClassChangeError: Found 
 interface org.apache.hadoop.mapreduce.JobContext, but class was expected
   at 
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSplits(ColumnFamilyInputFormat.java:103)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:445)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:462)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:357)
   at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1045)
   at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1042)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153)
   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1042)
   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1062)
   at MyHadoopApp.run(MyHadoopApp.java:163)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
   at MyHadoopApp.main(MyHadoopApp.java:82)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:601)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:192)



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (CASSANDRA-5883) Switch to Logback

2013-09-10 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13762916#comment-13762916
 ] 

Mck SembWever commented on CASSANDRA-5883:
--

Has log4j2 been considered?
Both log4j (old) and logback have awful Java code in their internals, using 
synchronised blocks/methods.

log4j2 seems to take a large step forward here and ensures an application won't 
lock up in the same way a log4j/logback application can. Such contention locks 
are not unusual once you increase logging in any highly concurrent application.

log4j2 can be more than 1000x faster…
http://logging.apache.org/log4j/2.x/manual/async.html#Performance
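
To make that concrete, the all-async mode benchmarked on that page is enabled 
with a single system property (documented in the linked manual); no application 
code changes are needed:
{code}
// Set before any logger is initialised, e.g. first thing in main() or
// via -D on the command line; every logger then goes through log4j2's
// lock-free async path instead of synchronised appenders.
System.setProperty("Log4jContextSelector",
    "org.apache.logging.log4j.core.async.AsyncLoggerContextSelector");
{code}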


 Switch to Logback
 -

 Key: CASSANDRA-5883
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5883
 Project: Cassandra
  Issue Type: Bug
  Components: Core, Tools
Reporter: Jonathan Ellis
Assignee: Dave Brosius
Priority: Minor
 Fix For: 2.1

 Attachments: 0001-Additional-migration-to-logback.patch, 5883-1.txt, 
 5883-additional1.txt, 5883.txt


 Logback has a number of advantages over log4j, and switching will be 
 straightforward since we are already using the slf4j translation layer: 
 http://logback.qos.ch/reasonsToSwitch.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2013-05-27 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13667902#comment-13667902
 ] 

Mck SembWever commented on CASSANDRA-2388:
--

{quote}The biggest problem is [avoiding endpoints in a different DC]. Maybe the 
way todo this is change getSplits logic to never return replicas in another DC. 
I think this would require adding DC info to the describe_ring call{quote}

Tasktrackers may have access to a set of datacenters, so this DC info needs to 
contain a list of DCs.

For example, our setup separates datacenters by physical datacenter and 
hadoop-usage, like:{noformat}DC1 Production + Hadoop
  c*01 c*03
DC2 Production + Hadoop
  c*02 c*04
DC3 Production
  c*05
DC4 Production
  c*06{noformat}

So here we'd pass to getSplits() a DC info like DC1,DC2.
But the problem remains: given a task executing on c*01 that fails to connect 
to localhost, although we can now prevent a connection to DC3 or DC4, we can't 
favour a connection to any other split in DC1 over anything in DC2. Is this 
solvable?
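
A rough sketch of the DC-filtering half, for the record. 
IEndpointSnitch.getDatacenter(InetAddress) is real snitch API; the helper 
itself, and wiring it into describe_ring/getSplits, is the hypothetical part:
{code}
// Sketch: keep only those replica endpoints that sit in one of the job's
// allowed datacenters (e.g. {"DC1", "DC2"} in the example above).
public static List<String> filterByDc(List<String> endpoints, Set<String> allowedDcs)
        throws UnknownHostException
{
    IEndpointSnitch snitch = DatabaseDescriptor.getEndpointSnitch();
    List<String> filtered = new ArrayList<String>();
    for (String endpoint : endpoints)
        if (allowedDcs.contains(snitch.getDatacenter(InetAddress.getByName(endpoint))))
            filtered.add(endpoint);
    return filtered;
}
{code}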

 ColumnFamilyRecordReader fails for a given split because a host is down, even 
 if records could reasonably be read from other replica.
 -

 Key: CASSANDRA-2388
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.6
Reporter: Eldon Stegall
Assignee: Mck SembWever
Priority: Minor
  Labels: hadoop, inputformat
 Fix For: 1.2.6

 Attachments: 0002_On_TException_try_next_split.patch, 
 CASSANDRA-2388-addition1.patch, CASSANDRA-2388-extended.patch, 
 CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, 
 CASSANDRA-2388.patch


 ColumnFamilyRecordReader only tries the first location for a given split. We 
 should try multiple locations for a given split.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-5201) Cassandra/Hadoop does not support current Hadoop releases

2013-05-19 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13661509#comment-13661509
 ] 

Mck SembWever commented on CASSANDRA-5201:
--

What about simply putting the hadoop2 package into a github project?
It would become available for others to use, and c* can switch to it when they 
feel ready to drop support for hadoop-0.20.

Otherwise I'm in favour of separate jar files 
(apache-cassandra-hadoop-legacy-XXX.jar and apache-cassandra-hadoop-XXX.jar). 
c* already bundles too much into the one jar file IMHO.

 Cassandra/Hadoop does not support current Hadoop releases
 -

 Key: CASSANDRA-5201
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5201
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 1.2.0
Reporter: Brian Jeltema
Assignee: Dave Brosius
 Attachments: 5201_a.txt


 Using Hadoop 0.22.0 with Cassandra results in the stack trace below.
 It appears that version 0.21+ changed org.apache.hadoop.mapreduce.JobContext
 from a class to an interface.
 Exception in thread "main" java.lang.IncompatibleClassChangeError: Found 
 interface org.apache.hadoop.mapreduce.JobContext, but class was expected
   at 
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSplits(ColumnFamilyInputFormat.java:103)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:445)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:462)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:357)
   at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1045)
   at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1042)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153)
   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1042)
   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1062)
   at MyHadoopApp.run(MyHadoopApp.java:163)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
   at MyHadoopApp.main(MyHadoopApp.java:82)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:601)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:192)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-5201) Cassandra/Hadoop does not support current Hadoop releases

2013-05-19 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13661512#comment-13661512
 ] 

Mck SembWever commented on CASSANDRA-5201:
--

{quote}What about simply putting the hadoop2 package into a github 
project?{quote}

Done @ https://github.com/michaelsembwever/cassandra-hadoop
 (I refactored the new package to hadoop1 instead of hadoop2, to better match 
the hadoop version we are actually supporting.)

 Cassandra/Hadoop does not support current Hadoop releases
 -

 Key: CASSANDRA-5201
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5201
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 1.2.0
Reporter: Brian Jeltema
Assignee: Dave Brosius
 Attachments: 5201_a.txt


 Using Hadoop 0.22.0 with Cassandra results in the stack trace below.
 It appears that version 0.21+ changed org.apache.hadoop.mapreduce.JobContext
 from a class to an interface.
 Exception in thread "main" java.lang.IncompatibleClassChangeError: Found 
 interface org.apache.hadoop.mapreduce.JobContext, but class was expected
   at 
 org.apache.cassandra.hadoop.ColumnFamilyInputFormat.getSplits(ColumnFamilyInputFormat.java:103)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:445)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:462)
   at 
 org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:357)
   at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1045)
   at org.apache.hadoop.mapreduce.Job$2.run(Job.java:1042)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1153)
   at org.apache.hadoop.mapreduce.Job.submit(Job.java:1042)
   at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1062)
   at MyHadoopApp.run(MyHadoopApp.java:163)
   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
   at MyHadoopApp.main(MyHadoopApp.java:82)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:601)
   at org.apache.hadoop.util.RunJar.main(RunJar.java:192)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2013-05-18 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13661436#comment-13661436
 ] 

Mck SembWever commented on CASSANDRA-2388:
--

Jonathan,
 I can't say I'm in favour of enforcing data locality, because data locality in 
hadoop doesn't work this way… When a tasktracker announces through the next 
heartbeat that it has a task slot free, the jobtracker will do its best to 
assign it a task with data locality, but failing that it will assign it a 
random task. The number of these random tasks can be quite high, just like I 
mentioned above:
{quote} Tasks are still being evenly distributed around the ring regardless of 
what the ColumnFamilySplit.locations is. {quote}

This can be almost solved by upgrading to hadoop-0.21+, using the fair 
scheduler, and setting the property {code}<property>
  <name>mapred.fairscheduler.locality.delay</name>
  <value>36000</value>
</property>{code}

At the end of the day, while hadoop encourages data locality, it does not 
enforce it.
The ideal approach would be to sort all locations by proximity (see the sketch 
after the references below).
The feasible approach hopefully is still [~tjake]'s above. In addition I'd be 
in favour of a setting in the job's configuration as to whether a location from 
another datacenter can be used.

references:
 - http://www.infoq.com/articles/HadoopInputFormat
 - http://www.mentby.com/matei-zaharia/running-only-node-local-jobs.html
 - 
https://groups.google.com/a/cloudera.org/forum/?fromgroups#!topic/cdh-user/3ggnE5hV0PY
 - http://www.cs.berkeley.edu/~matei/papers/2010/eurosys_delay_scheduling.pdf
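
The proximity-sort sketch mentioned above. The snitch methods are real C* API 
of this era; calling them during split generation is the assumption:
{code}
// Sketch: order candidate replica endpoints nearest-first relative to
// this node, then hand them to the split in that order.
// candidateEndpoints is a hypothetical input collection.
List<InetAddress> replicas = new ArrayList<InetAddress>(candidateEndpoints);
DatabaseDescriptor.getEndpointSnitch()
                  .sortByProximity(FBUtilities.getBroadcastAddress(), replicas);
// replicas.get(0) is now the closest location for the split
{code}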

 ColumnFamilyRecordReader fails for a given split because a host is down, even 
 if records could reasonably be read from other replica.
 -

 Key: CASSANDRA-2388
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.6
Reporter: Eldon Stegall
Assignee: Mck SembWever
Priority: Minor
  Labels: hadoop, inputformat
 Fix For: 1.2.5

 Attachments: 0002_On_TException_try_next_split.patch, 
 CASSANDRA-2388-addition1.patch, CASSANDRA-2388-extended.patch, 
 CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, 
 CASSANDRA-2388.patch


 ColumnFamilyRecordReader only tries the first location for a given split. We 
 should try multiple locations for a given split.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-4977) Expose new SliceQueryFilter features through Thrift interface

2013-02-09 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13575211#comment-13575211
 ] 

Mck SembWever commented on CASSANDRA-4977:
--

Duplicate of CASSANDRA-2710?

 Expose new SliceQueryFilter features through Thrift interface
 -

 Key: CASSANDRA-4977
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4977
 Project: Cassandra
  Issue Type: Improvement
  Components: API
Affects Versions: 1.2.0 beta 2
Reporter: aaa

 SliceQueryFilter has some very useful new features like ability to specify a 
 composite column prefix to group by and specify a limit of groups to return.
 This is very useful if for example I have a wide row with columns prefixed by 
 timestamp and I want to retrieve the latest columns, but I don't know the 
 column names. Say I have a row
 {{row - (t1, c1), (t1, c2)... (t1, cn) ... (t0,c1) ... etc}}
 Query slice range (t1,) group by prefix (1) limit (1)
 As a more general question, is the Thrift interface going to be kept 
 up-to-date with the feature changes or will it be left behind (a mistake IMO) 
 ?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-4417) invalid counter shard detected

2012-11-07 Thread Mck SembWever (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mck SembWever updated CASSANDRA-4417:
-

Attachment: cassandra-mck.log.bz2

Sylvain, here's the log from one node. For most of the log we were running 1.0.8. 
And then at line 2883399 we upgraded (and this was the first node to upgrade) 
to 1.1.6.

The error msg comes every few seconds.
Our counters are sub-columns inside supercolumns.
We completed the upgrade on all nodes. Then restarted again (because jna was 
missing).

We are now running upgradesstables but that's not in this logfile. The error 
msgs still appear.

An operational problem we've had recently is that we had one node down for ~one 
month (faulty raid controller) and when we finally brought the node back into 
the cluster nightly repairs would never finish. In the end we just disabled 
nightly repairs (we don't have tombstones) with the plan that an upgrade and 
upgradesstables would bring us back to a state where repairs would work again. 
I have no idea if this can be related. 

 invalid counter shard detected 
 ---

 Key: CASSANDRA-4417
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4417
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.1
 Environment: Amazon Linux
Reporter: Senthilvel Rangaswamy
 Attachments: cassandra-mck.log.bz2, err.txt


 Seeing errors like these:
 2012-07-06_07:00:27.22662 ERROR 07:00:27,226 invalid counter shard detected; 
 (17bfd850-ac52-11e1--6ecd0b5b61e7, 1, 13) and 
 (17bfd850-ac52-11e1--6ecd0b5b61e7, 1, 1) differ only in count; will pick 
 highest to self-heal; this indicates a bug or corruption generated a bad 
 counter shard
 What does it mean ?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Comment Edited] (CASSANDRA-4417) invalid counter shard detected

2012-11-07 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13492245#comment-13492245
 ] 

Mck SembWever edited comment on CASSANDRA-4417 at 11/7/12 10:20 AM:


Sylvain, here's the log from one node. For most of the log we were running 1.0.8. 
And then at line 2883399 we upgraded (and this was the first node to upgrade) 
to 1.1.6.

The error msg comes every few seconds.
Our counters are sub-columns inside supercolumns.
We completed the upgrade on all nodes. Then restarted again (because jna was 
missing).

We are now running upgradesstables but that's not in this logfile. The error 
msgs still appear.

An operational problem we've had recently is that we had one node down for ~one 
month (faulty raid controller) and when we finally brought the node back into 
the cluster nightly repairs would never finish. In the end we just disabled 
nightly repairs (we don't have tombstones) with the plan that an upgrade and 
upgradesstables would bring us back to a state where repairs would work again. 
I have no idea if this can be related. 

  was (Author: michaelsembwever):
Sylvain, here's log from one node. For most of the log we were running 
1.0.8. And then at line 2883399 we upgraded (and this was the first node to 
upgrade) to 1.1.6.

The error msg comes every few seconds.
Our counters are sub-columns inside supercolumns.
We completed the upgrade on all nodes. Then restarted again (because jna was 
missing).

We are now running upgradesstables but that's not in this logfile. The error 
msgs still appear.

An operational problem we're had recently is that we had one node down for ~one 
month (faulty raid controller) and when we finally brought the node back into 
the cluster nightly repairs would never finish. In the end we just disabled 
nightly repairs (we don't have tombstones) with the plan that an upgrade and 
upgradesstables would bring us back to a state where repairs would work again. 
I have no idea if this can be related. 
  
 invalid counter shard detected 
 ---

 Key: CASSANDRA-4417
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4417
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.1
 Environment: Amazon Linux
Reporter: Senthilvel Rangaswamy
 Attachments: cassandra-mck.log.bz2, err.txt


 Seeing errors like these:
 2012-07-06_07:00:27.22662 ERROR 07:00:27,226 invalid counter shard detected; 
 (17bfd850-ac52-11e1--6ecd0b5b61e7, 1, 13) and 
 (17bfd850-ac52-11e1--6ecd0b5b61e7, 1, 1) differ only in count; will pick 
 highest to self-heal; this indicates a bug or corruption generated a bad 
 counter shard
 What does it mean ?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-3237) refactor super column implmentation to use composite column names instead

2012-11-07 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13492684#comment-13492684
 ] 

Mck SembWever commented on CASSANDRA-3237:
--

{quote}... in any case I'm not convinced a startup scrub is the right approach. 
I think that what we need is to write conversions functions...{quote}
Despite there being no startup scrub, this still means that a manual `nodetool 
upgradesstables` will use these conversion functions to rewrite all sstables 
to composite columns?

 refactor super column implmentation to use composite column names instead
 -

 Key: CASSANDRA-3237
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3237
 Project: Cassandra
  Issue Type: Improvement
Reporter: Matthew F. Dennis
Priority: Minor
  Labels: ponies
 Fix For: 1.3

 Attachments: cassandra-supercolumn-irc.log


 super columns are annoying.  composite columns offer a better API and 
 performance.  people should use composites over super columns.  some people 
 are already using super columns.  C* should implement the super column API in 
 terms of composites to reduce code, complexity and testing as well as 
 increase performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Comment Edited] (CASSANDRA-3237) refactor super column implmentation to use composite column names instead

2012-11-07 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13492684#comment-13492684
 ] 

Mck SembWever edited comment on CASSANDRA-3237 at 11/7/12 8:47 PM:
---

{quote}... in any case I'm not convinced a startup scrub is the right approach. 
I think that what we need is to write conversions functions...{quote}
Despite there being no startup scrub, this still means that a manual `nodetool 
upgradesstables` will use these conversion functions to rewrite all sstables 
to composite columns?

  was (Author: michaelsembwever):
{quote}... in any case I'm not convinced a startup scrub is the right 
approach. I think that what we need is to write conversions functions...{quote}
Despite there being no startup scrub this still means that a manual `nodetool 
upgradesstables` will using these conversions functions to rewrite all sstables 
to composite columns?
  
 refactor super column implmentation to use composite column names instead
 -

 Key: CASSANDRA-3237
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3237
 Project: Cassandra
  Issue Type: Improvement
Reporter: Matthew F. Dennis
Priority: Minor
  Labels: ponies
 Fix For: 1.3

 Attachments: cassandra-supercolumn-irc.log


 super columns are annoying.  composite columns offer a better API and 
 performance.  people should use composites over super columns.  some people 
 are already using super columns.  C* should implement the super column API in 
 terms of composites to reduce code, complexity and testing as well as 
 increase performance.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Issue Comment Edited] (CASSANDRA-3150) ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack)

2011-09-26 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13114220#comment-13114220
 ] 

Mck SembWever edited comment on CASSANDRA-3150 at 9/26/11 8:29 AM:
---

{{fullscan-example1.log}} is debug from a full scan job. It scans data over a 
full year (and since the cluster's ring range only holds 3 months of data such 
a job guarantees a full scan).

In the debug you see the splits.
{{`nodetool ring`}} gives

{noformat}Address DC  RackStatus State   Load   
 OwnsToken   
   
Token(bytes[5554])
152.90.241.22   DC1 RAC1Up Normal  16.65 GB33.33%  
Token(bytes[00])
152.90.241.23   DC2 RAC1Up Normal  63.22 GB33.33%  
Token(bytes[2aaa])
152.90.241.24   DC1 RAC1Up Normal  72.4 KB 33.33%  
Token(bytes[5554])
{noformat}

The problematic split ends up being 
{noformat}ColumnFamilySplit{startToken='0528cbe0b2b5ff6b816c68b59973bcbc', 
endToken='2aaa', 
dataNodes=[cassandra02.finn.no]}{noformat} This is the split that's receiving 
new data (5-10k rows/second). This new data is being written directly using 
{{StorageProxy.mutate(..)}} with code somewhat similar to the second example in 
[wiki: ScribeToCassandra|http://wiki.apache.org/cassandra/ScribeToCassandra].

This cluster has {{binary_memtable_throughput_in_mb: 1024}} and {{Xmx8g}}. 
There are 3 nodes in the cluster each 48g ram and 24cpus.

  was (Author: michaelsembwever):
{{fullscan-example1.log}} is debug from a full scan job. It scans data 
over a full year (and since the cluster's ring range only holds 3 months of 
data such a job guarantees a full scan).

In the debug you see the splits.
{{`nodetool ring`}} gives

{noformat}Address DC  RackStatus State   Load   
 OwnsToken   
   
Token(bytes[5554])
152.90.241.22   DC1 RAC1Up Normal  16.65 GB33.33%  
Token(bytes[00])
152.90.241.23   DC2 RAC1Up Normal  63.22 GB33.33%  
Token(bytes[2aaa])
152.90.241.24   DC1 RAC1Up Normal  72.4 KB 33.33%  
Token(bytes[5554])
{noformat}

The problematic split ends up being 
{noformat}ColumnFamilySplit{startToken='0528cbe0b2b5ff6b816c68b59973bcbc', 
endToken='2aaa', 
dataNodes=[cassandra02.finn.no]}{noformat} This is the split that's receiving 
new data (5-10k rows/second). This new data is being written directly using 
{{StorageProxy.mutate(..)}} with code somewhat similar to the second example in 
[wiki: ScribeToCassandra|http://wiki.apache.org/cassandra/ScribeToCassandra].
  
 ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of 
 whack)
 --

 Key: CASSANDRA-3150
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.8.4, 0.8.5
Reporter: Mck SembWever
Assignee: Mck SembWever
Priority: Critical
 Fix For: 0.8.6

 Attachments: CASSANDRA-3150.patch, Screenshot-Counters for 
 task_201109212019_1060_m_29 - Mozilla Firefox.png, Screenshot-Hadoop map 
 task list for job_201109212019_1060 on cassandra01 - Mozilla Firefox.png, 
 attempt_201109071357_0044_m_003040_0.grep-get_range_slices.log, 
 fullscan-example1.log


 From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039
 {quote}
 bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner
 bq. CFIF's inputSplitSize=196608
 bq. 3 map tasks (from 4013) is still running after read 25 million rows.
 bq. Can this be a bug in StorageService.getSplits(..) ?
 getSplits looks pretty foolproof to me but I guess we'd need to add
 more debug logging to rule out a bug there for sure.
 I guess the main alternative would be a bug in the recordreader paging.
 {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3150) ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack)

2011-09-25 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13114196#comment-13114196
 ] 

Mck SembWever commented on CASSANDRA-3150:
--

Yes, I see now a CFRR that went to 600%. But it's still a long way from 
problematic.

My understanding of how the sampled keys are generated is that it all happens 
when an sstable is read.
If a restart or a (manual) compact doesn't help, why did an upgrade help?

What can I investigate to provide more debug here?
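
For the record, my mental model of that sampling, paraphrased (0.8-era names, 
approximate; not verbatim code):
{code}
// Index samples are collected per-sstable when the sstable is opened, so
// what getSplits sees is only as fresh as the sstables currently on disk.
List<DecoratedKey> samples = new ArrayList<DecoratedKey>();
for (SSTableReader sstable : cfs.getSSTables())
    for (DecoratedKey key : sstable.getKeySamples(range))
        samples.add(key);
{code}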

 ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of 
 whack)
 --

 Key: CASSANDRA-3150
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.8.4, 0.8.5
Reporter: Mck SembWever
Assignee: Mck SembWever
Priority: Critical
 Fix For: 0.8.6

 Attachments: CASSANDRA-3150.patch, Screenshot-Counters for 
 task_201109212019_1060_m_29 - Mozilla Firefox.png, Screenshot-Hadoop map 
 task list for job_201109212019_1060 on cassandra01 - Mozilla Firefox.png, 
 attempt_201109071357_0044_m_003040_0.grep-get_range_slices.log


 From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039
 {quote}
 bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner
 bq. CFIF's inputSplitSize=196608
 bq. 3 map tasks (from 4013) is still running after read 25 million rows.
 bq. Can this be a bug in StorageService.getSplits(..) ?
 getSplits looks pretty foolproof to me but I guess we'd need to add
 more debug logging to rule out a bug there for sure.
 I guess the main alternative would be a bug in the recordreader paging.
 {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3150) ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack)

2011-09-25 Thread Mck SembWever (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mck SembWever updated CASSANDRA-3150:
-

Attachment: fullscan-example1.log

Here's debug from a full scan job. It scans data over a full year (and since 
the cluster's ring range only holds 3 months of data, such a job guarantees a 
full scan).

In the debug you see the splits.
{{`nodetool ring`}} gives

{noformat}Address DC  RackStatus State   Load   
 OwnsToken   
   
Token(bytes[5554])
152.90.241.22   DC1 RAC1Up Normal  16.65 GB33.33%  
Token(bytes[00])
152.90.241.23   DC2 RAC1Up Normal  63.22 GB33.33%  
Token(bytes[2aaa])
152.90.241.24   DC1 RAC1Up Normal  72.4 KB 33.33%  
Token(bytes[5554])
{noformat}

The problematic split ends up being 
{noformat}ColumnFamilySplit{startToken='0528cbe0b2b5ff6b816c68b59973bcbc', 
endToken='2aaa', 
dataNodes=[cassandra02.finn.no]}{noformat}
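
For anyone wanting to dump these splits without running the whole job, a 
minimal driver along these lines should do it (a sketch; the ConfigHelper 
setter names are from the 0.8-era API as I recall it, and the host, keyspace 
and cf names are placeholders):

{noformat}
import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;

// Sketch: print every split CFIF computes, to compare against `nodetool ring`.
public class DumpSplits
{
    public static void main(String[] args) throws Exception
    {
        Configuration conf = new Configuration();
        ConfigHelper.setRpcPort(conf, "9160");
        ConfigHelper.setInitialAddress(conf, "cassandra01");  // placeholder host
        ConfigHelper.setPartitioner(conf, "org.apache.cassandra.dht.ByteOrderedPartitioner");
        ConfigHelper.setColumnFamily(conf, "MyKeyspace", "MyColumnFamily");  // placeholders

        for (InputSplit split : new ColumnFamilyInputFormat().getSplits(new Job(conf)))
            System.out.println(split);  // ColumnFamilySplit{startToken=..., endToken=..., dataNodes=[..]}
    }
}
{noformat}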

 ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of 
 whack)
 --

 Key: CASSANDRA-3150
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.8.4, 0.8.5
Reporter: Mck SembWever
Assignee: Mck SembWever
Priority: Critical
 Fix For: 0.8.6

 Attachments: CASSANDRA-3150.patch, Screenshot-Counters for 
 task_201109212019_1060_m_29 - Mozilla Firefox.png, Screenshot-Hadoop map 
 task list for job_201109212019_1060 on cassandra01 - Mozilla Firefox.png, 
 attempt_201109071357_0044_m_003040_0.grep-get_range_slices.log, 
 fullscan-example1.log


 From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039
 {quote}
 bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner
 bq. CFIF's inputSplitSize=196608
 bq. 3 map tasks (from 4013) is still running after read 25 million rows.
 bq. Can this be a bug in StorageService.getSplits(..) ?
 getSplits looks pretty foolproof to me but I guess we'd need to add
 more debug logging to rule out a bug there for sure.
 I guess the main alternative would be a bug in the recordreader paging.
 {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-3150) ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack)

2011-09-25 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13114220#comment-13114220
 ] 

Mck SembWever edited comment on CASSANDRA-3150 at 9/25/11 11:23 AM:


Here's debug from a full scan job. It scans data over a full year (and since 
the cluster's ring range only holds 3 months of data such a job guarantees a 
full scan).

In the debug you see the splits.
{{nodetool ring}} gives

{noformat}
Address         DC   Rack  Status  State   Load      Owns    Token
                                                             Token(bytes[5554])
152.90.241.22   DC1  RAC1  Up      Normal  16.65 GB  33.33%  Token(bytes[00])
152.90.241.23   DC2  RAC1  Up      Normal  63.22 GB  33.33%  Token(bytes[2aaa])
152.90.241.24   DC1  RAC1  Up      Normal  72.4 KB   33.33%  Token(bytes[5554])
{noformat}

The problematic split ends up being 
{noformat}ColumnFamilySplit{startToken='0528cbe0b2b5ff6b816c68b59973bcbc', 
endToken='2aaa', 
dataNodes=[cassandra02.finn.no]}{noformat}

  was (Author: michaelsembwever):
Here's debug from a full scan job. It scans data over a full year (and 
since the cluster's ring range only hold 3 months of data this job guarantees a 
full scan).

In the debug you see the splits.
{{nodetool ring}} gives

{noformat}
Address         DC   Rack  Status  State   Load      Owns    Token
                                                             Token(bytes[5554])
152.90.241.22   DC1  RAC1  Up      Normal  16.65 GB  33.33%  Token(bytes[00])
152.90.241.23   DC2  RAC1  Up      Normal  63.22 GB  33.33%  Token(bytes[2aaa])
152.90.241.24   DC1  RAC1  Up      Normal  72.4 KB   33.33%  Token(bytes[5554])
{noformat}

The problematic split ends up being 
{noformat}ColumnFamilySplit{startToken='0528cbe0b2b5ff6b816c68b59973bcbc', 
endToken='2aaa', 
dataNodes=[cassandra02.finn.no]}{noformat}
  
 ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of 
 whack)
 --

 Key: CASSANDRA-3150
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.8.4, 0.8.5
Reporter: Mck SembWever
Assignee: Mck SembWever
Priority: Critical
 Fix For: 0.8.6

 Attachments: CASSANDRA-3150.patch, Screenshot-Counters for 
 task_201109212019_1060_m_29 - Mozilla Firefox.png, Screenshot-Hadoop map 
 task list for job_201109212019_1060 on cassandra01 - Mozilla Firefox.png, 
 attempt_201109071357_0044_m_003040_0.grep-get_range_slices.log, 
 fullscan-example1.log


 From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039
 {quote}
 bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner
 bq. CFIF's inputSplitSize=196608
 bq. 3 map tasks (from 4013) is still running after read 25 million rows.
 bq. Can this be a bug in StorageService.getSplits(..) ?
 getSplits looks pretty foolproof to me but I guess we'd need to add
 more debug logging to rule out a bug there for sure.
 I guess the main alternative would be a bug in the recordreader paging.
 {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-3150) ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack)

2011-09-25 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13114220#comment-13114220
 ] 

Mck SembWever edited comment on CASSANDRA-3150 at 9/25/11 11:27 AM:


Here's debug from a full scan job. It scans data over a full year (and since 
the cluster's ring range only holds 3 months of data such a job guarantees a 
full scan).

In the debug you see the splits.
{{nodetool ring}} gives

{noformat}
Address         DC   Rack  Status  State   Load      Owns    Token
                                                             Token(bytes[5554])
152.90.241.22   DC1  RAC1  Up      Normal  16.65 GB  33.33%  Token(bytes[00])
152.90.241.23   DC2  RAC1  Up      Normal  63.22 GB  33.33%  Token(bytes[2aaa])
152.90.241.24   DC1  RAC1  Up      Normal  72.4 KB   33.33%  Token(bytes[5554])
{noformat}

The problematic split ends up being 
{noformat}ColumnFamilySplit{startToken='0528cbe0b2b5ff6b816c68b59973bcbc', 
endToken='2aaa', 
dataNodes=[cassandra02.finn.no]}{noformat} This is the split that's receiving 
new data (5-10k rows/second).

  was (Author: michaelsembwever):
Here's debug from a full scan job. It scans data over a full year (and 
since the cluster's ring range only holds 3 months of data such a job 
guarantees a full scan).

In the debug you see the splits.
{{nodetool ring}} gives

{noformat}
Address         DC   Rack  Status  State   Load      Owns    Token
                                                             Token(bytes[5554])
152.90.241.22   DC1  RAC1  Up      Normal  16.65 GB  33.33%  Token(bytes[00])
152.90.241.23   DC2  RAC1  Up      Normal  63.22 GB  33.33%  Token(bytes[2aaa])
152.90.241.24   DC1  RAC1  Up      Normal  72.4 KB   33.33%  Token(bytes[5554])
{noformat}

The problematic split ends up being 
{noformat}ColumnFamilySplit{startToken='0528cbe0b2b5ff6b816c68b59973bcbc', 
endToken='2aaa', 
dataNodes=[cassandra02.finn.no]}{noformat}
  
 ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of 
 whack)
 --

 Key: CASSANDRA-3150
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.8.4, 0.8.5
Reporter: Mck SembWever
Assignee: Mck SembWever
Priority: Critical
 Fix For: 0.8.6

 Attachments: CASSANDRA-3150.patch, Screenshot-Counters for 
 task_201109212019_1060_m_29 - Mozilla Firefox.png, Screenshot-Hadoop map 
 task list for job_201109212019_1060 on cassandra01 - Mozilla Firefox.png, 
 attempt_201109071357_0044_m_003040_0.grep-get_range_slices.log, 
 fullscan-example1.log


 From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039
 {quote}
 bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner
 bq. CFIF's inputSplitSize=196608
 bq. 3 map tasks (from 4013) is still running after read 25 million rows.
 bq. Can this be a bug in StorageService.getSplits(..) ?
 getSplits looks pretty foolproof to me but I guess we'd need to add
 more debug logging to rule out a bug there for sure.
 I guess the main alternative would be a bug in the recordreader paging.
 {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-3150) ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack)

2011-09-25 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13114220#comment-13114220
 ] 

Mck SembWever edited comment on CASSANDRA-3150 at 9/25/11 11:30 AM:


{{fullscan-example1.log}} is debug from a full scan job. It scans data over a 
full year (and since the cluster's ring range only holds 3 months of data such 
a job guarantees a full scan).

In the debug you see the splits.
{{nodetool ring}} gives

{noformat}
Address         DC   Rack  Status  State   Load      Owns    Token
                                                             Token(bytes[5554])
152.90.241.22   DC1  RAC1  Up      Normal  16.65 GB  33.33%  Token(bytes[00])
152.90.241.23   DC2  RAC1  Up      Normal  63.22 GB  33.33%  Token(bytes[2aaa])
152.90.241.24   DC1  RAC1  Up      Normal  72.4 KB   33.33%  Token(bytes[5554])
{noformat}

The problematic split ends up being 
{noformat}ColumnFamilySplit{startToken='0528cbe0b2b5ff6b816c68b59973bcbc', 
endToken='2aaa', 
dataNodes=[cassandra02.finn.no]}{noformat} This is the split that's receiving 
new data (5-10k rows/second).

  was (Author: michaelsembwever):
Here's debug from a full scan job. It scans data over a full year (and 
since the cluster's ring range only holds 3 months of data such a job 
guarantees a full scan).

In the debug you see the splits.
{{nodetool ring}} gives

{noformat}
Address         DC   Rack  Status  State   Load      Owns    Token
                                                             Token(bytes[5554])
152.90.241.22   DC1  RAC1  Up      Normal  16.65 GB  33.33%  Token(bytes[00])
152.90.241.23   DC2  RAC1  Up      Normal  63.22 GB  33.33%  Token(bytes[2aaa])
152.90.241.24   DC1  RAC1  Up      Normal  72.4 KB   33.33%  Token(bytes[5554])
{noformat}

The problematic split ends up being 
{noformat}ColumnFamilySplit{startToken='0528cbe0b2b5ff6b816c68b59973bcbc', 
endToken='2aaa', 
dataNodes=[cassandra02.finn.no]}{noformat} This is the split that's receiving 
new data (5-10k rows/second).
  
 ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of 
 whack)
 --

 Key: CASSANDRA-3150
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.8.4, 0.8.5
Reporter: Mck SembWever
Assignee: Mck SembWever
Priority: Critical
 Fix For: 0.8.6

 Attachments: CASSANDRA-3150.patch, Screenshot-Counters for 
 task_201109212019_1060_m_29 - Mozilla Firefox.png, Screenshot-Hadoop map 
 task list for job_201109212019_1060 on cassandra01 - Mozilla Firefox.png, 
 attempt_201109071357_0044_m_003040_0.grep-get_range_slices.log, 
 fullscan-example1.log


 From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039
 {quote}
 bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner
 bq. CFIF's inputSplitSize=196608
 bq. 3 map tasks (from 4013) is still running after read 25 million rows.
 bq. Can this be a bug in StorageService.getSplits(..) ?
 getSplits looks pretty foolproof to me but I guess we'd need to add
 more debug logging to rule out a bug there for sure.
 I guess the main alternative would be a bug in the recordreader paging.
 {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-3150) ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack)

2011-09-25 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13114220#comment-13114220
 ] 

Mck SembWever edited comment on CASSANDRA-3150 at 9/25/11 11:32 AM:


{{fullscan-example1.log}} is debug from a full scan job. It scans data over a 
full year (and since the cluster's ring range only holds 3 months of data such 
a job guarantees a full scan).

In the debug you see the splits.
{{nodetool ring}} gives

{noformat}
Address         DC   Rack  Status  State   Load      Owns    Token
                                                             Token(bytes[5554])
152.90.241.22   DC1  RAC1  Up      Normal  16.65 GB  33.33%  Token(bytes[00])
152.90.241.23   DC2  RAC1  Up      Normal  63.22 GB  33.33%  Token(bytes[2aaa])
152.90.241.24   DC1  RAC1  Up      Normal  72.4 KB   33.33%  Token(bytes[5554])
{noformat}

The problematic split ends up being 
{noformat}ColumnFamilySplit{startToken='0528cbe0b2b5ff6b816c68b59973bcbc', 
endToken='2aaa', 
dataNodes=[cassandra02.finn.no]}{noformat} This is the split that's receiving 
new data (5-10k rows/second). This new data is being written directly using 
{{StorageProxy.mutate(..)}} with code somewhat similar to the second example in 
[wiki: ScribeToCassandra|http://wiki.apache.org/cassandra/ScribeToCassandra].

  was (Author: michaelsembwever):
{{fullscan-example1.log}} is debug from a full scan job. It scans data 
over a full year (and since the cluster's ring range only holds 3 months of 
data such a job guarantees a full scan).

In the debug you see the splits.
{{nodetool ring}} gives

{noformat}
Address         DC   Rack  Status  State   Load      Owns    Token
                                                             Token(bytes[5554])
152.90.241.22   DC1  RAC1  Up      Normal  16.65 GB  33.33%  Token(bytes[00])
152.90.241.23   DC2  RAC1  Up      Normal  63.22 GB  33.33%  Token(bytes[2aaa])
152.90.241.24   DC1  RAC1  Up      Normal  72.4 KB   33.33%  Token(bytes[5554])
{noformat}

The problematic split ends up being 
{noformat}ColumnFamilySplit{startToken='0528cbe0b2b5ff6b816c68b59973bcbc', 
endToken='2aaa', 
dataNodes=[cassandra02.finn.no]}{noformat} This is the split that's receiving 
new data (5-10k rows/second).
  
 ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of 
 whack)
 --

 Key: CASSANDRA-3150
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.8.4, 0.8.5
Reporter: Mck SembWever
Assignee: Mck SembWever
Priority: Critical
 Fix For: 0.8.6

 Attachments: CASSANDRA-3150.patch, Screenshot-Counters for 
 task_201109212019_1060_m_29 - Mozilla Firefox.png, Screenshot-Hadoop map 
 task list for job_201109212019_1060 on cassandra01 - Mozilla Firefox.png, 
 attempt_201109071357_0044_m_003040_0.grep-get_range_slices.log, 
 fullscan-example1.log


 From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039
 {quote}
 bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner
 bq. CFIF's inputSplitSize=196608
 bq. 3 map tasks (from 4013) is still running after read 25 million rows.
 bq. Can this be a bug in StorageService.getSplits(..) ?
 getSplits looks pretty foolproof to me but I guess we'd need to add
 more debug logging to rule out a bug there for sure.
 I guess the main alternative would be a bug in the recordreader paging.
 {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3150) ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack)

2011-09-24 Thread Mck SembWever (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mck SembWever updated CASSANDRA-3150:
-

Affects Version/s: 0.8.5

 ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of 
 whack)
 --

 Key: CASSANDRA-3150
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.8.4, 0.8.5
Reporter: Mck SembWever
Assignee: Mck SembWever
Priority: Critical
 Attachments: CASSANDRA-3150.patch, Screenshot-Counters for 
 task_201109212019_1060_m_29 - Mozilla Firefox.png, Screenshot-Hadoop map 
 task list for job_201109212019_1060 on cassandra01 - Mozilla Firefox.png, 
 attempt_201109071357_0044_m_003040_0.grep-get_range_slices.log


 From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039
 {quote}
 bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner
 bq. CFIF's inputSplitSize=196608
 bq. 3 map tasks (from 4013) is still running after read 25 million rows.
 bq. Can this be a bug in StorageService.getSplits(..) ?
 getSplits looks pretty foolproof to me but I guess we'd need to add
 more debug logging to rule out a bug there for sure.
 I guess the main alternative would be a bug in the recordreader paging.
 {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (CASSANDRA-3150) ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack)

2011-09-24 Thread Mck SembWever (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mck SembWever resolved CASSANDRA-3150.
--

   Resolution: Cannot Reproduce
Fix Version/s: 0.8.6

An upgrade to 0.8.6 /seems/ to have fixed this.
I saw one map task (CFRR) reach 150%, but nothing like what I was seeing before.

Maybe there is something else happening in the upgrade process that fixed it?

Otherwise I'm happy to close this issue for now and reopen it if the problem 
resurfaces.

 ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of 
 whack)
 --

 Key: CASSANDRA-3150
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.8.4, 0.8.5
Reporter: Mck SembWever
Assignee: Mck SembWever
Priority: Critical
 Fix For: 0.8.6

 Attachments: CASSANDRA-3150.patch, Screenshot-Counters for 
 task_201109212019_1060_m_29 - Mozilla Firefox.png, Screenshot-Hadoop map 
 task list for job_201109212019_1060 on cassandra01 - Mozilla Firefox.png, 
 attempt_201109071357_0044_m_003040_0.grep-get_range_slices.log


 From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039
 {quote}
 bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner
 bq. CFIF's inputSplitSize=196608
 bq. 3 map tasks (from 4013) is still running after read 25 million rows.
 bq. Can this be a bug in StorageService.getSplits(..) ?
 getSplits looks pretty foolproof to me but I guess we'd need to add
 more debug logging to rule out a bug there for sure.
 I guess the main alternative would be a bug in the recordreader paging.
 {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3150) ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack)

2011-09-23 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13113326#comment-13113326
 ] 

Mck SembWever commented on CASSANDRA-3150:
--

This is persisting as a problem, and it continuously gets worse through the day.
I can see roughly half my cf being read from one split.
It doesn't matter whether it's ByteOrderedPartitioner or RandomPartitioner.
I've attached two screenshots from the Hadoop web pages.

 ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of 
 whack)
 --

 Key: CASSANDRA-3150
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.8.4
Reporter: Mck SembWever
Assignee: Mck SembWever
Priority: Critical
 Attachments: CASSANDRA-3150.patch, 
 attempt_201109071357_0044_m_003040_0.grep-get_range_slices.log


 From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039
 {quote}
 bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner
 bq. CFIF's inputSplitSize=196608
 bq. 3 map tasks (from 4013) is still running after read 25 million rows.
 bq. Can this be a bug in StorageService.getSplits(..) ?
 getSplits looks pretty foolproof to me but I guess we'd need to add
 more debug logging to rule out a bug there for sure.
 I guess the main alternative would be a bug in the recordreader paging.
 {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3150) ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack)

2011-09-23 Thread Mck SembWever (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mck SembWever updated CASSANDRA-3150:
-

Attachment: Screenshot-Hadoop map task list for job_201109212019_1060 on 
cassandra01 - Mozilla Firefox.png

 ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of 
 whack)
 --

 Key: CASSANDRA-3150
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.8.4
Reporter: Mck SembWever
Assignee: Mck SembWever
Priority: Critical
 Attachments: CASSANDRA-3150.patch, Screenshot-Hadoop map task list 
 for job_201109212019_1060 on cassandra01 - Mozilla Firefox.png, 
 attempt_201109071357_0044_m_003040_0.grep-get_range_slices.log


 From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039
 {quote}
 bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner
 bq. CFIF's inputSplitSize=196608
 bq. 3 map tasks (from 4013) is still running after read 25 million rows.
 bq. Can this be a bug in StorageService.getSplits(..) ?
 getSplits looks pretty foolproof to me but I guess we'd need to add
 more debug logging to rule out a bug there for sure.
 I guess the main alternative would be a bug in the recordreader paging.
 {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-3150) ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack)

2011-09-23 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13113326#comment-13113326
 ] 

Mck SembWever edited comment on CASSANDRA-3150 at 9/23/11 11:04 AM:


This is persisting as a problem, and it continuously gets worse through the day.
I can see roughly half my cf being read from one split.
It doesn't matter whether it's ByteOrderedPartitioner or RandomPartitioner.
I've attached two screenshots from the Hadoop web pages (you can see 36 million 
map input records when cassandra.input.split.size is set to 393216).

  was (Author: michaelsembwever):
This is persisting to be a problem. And continuously gets worse through the 
day.
I can see like half my cf being read from one split.
It doesn't matter if it's ByteOrderingPartition or RandomPartitioner.
I've attached two screenshots from hadoop webpages.
  
 ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of 
 whack)
 --

 Key: CASSANDRA-3150
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.8.4
Reporter: Mck SembWever
Assignee: Mck SembWever
Priority: Critical
 Attachments: CASSANDRA-3150.patch, Screenshot-Counters for 
 task_201109212019_1060_m_29 - Mozilla Firefox.png, Screenshot-Hadoop map 
 task list for job_201109212019_1060 on cassandra01 - Mozilla Firefox.png, 
 attempt_201109071357_0044_m_003040_0.grep-get_range_slices.log


 From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039
 {quote}
 bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner
 bq. CFIF's inputSplitSize=196608
 bq. 3 map tasks (from 4013) is still running after read 25 million rows.
 bq. Can this be a bug in StorageService.getSplits(..) ?
 getSplits looks pretty foolproof to me but I guess we'd need to add
 more debug logging to rule out a bug there for sure.
 I guess the main alternative would be a bug in the recordreader paging.
 {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3150) ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack)

2011-09-23 Thread Mck SembWever (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mck SembWever updated CASSANDRA-3150:
-

Attachment: Screenshot-Counters for task_201109212019_1060_m_29 - 
Mozilla Firefox.png

 ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of 
 whack)
 --

 Key: CASSANDRA-3150
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.8.4
Reporter: Mck SembWever
Assignee: Mck SembWever
Priority: Critical
 Attachments: CASSANDRA-3150.patch, Screenshot-Counters for 
 task_201109212019_1060_m_29 - Mozilla Firefox.png, Screenshot-Hadoop map 
 task list for job_201109212019_1060 on cassandra01 - Mozilla Firefox.png, 
 attempt_201109071357_0044_m_003040_0.grep-get_range_slices.log


 From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039
 {quote}
 bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner
 bq. CFIF's inputSplitSize=196608
 bq. 3 map tasks (from 4013) is still running after read 25 million rows.
 bq. Can this be a bug in StorageService.getSplits(..) ?
 getSplits looks pretty foolproof to me but I guess we'd need to add
 more debug logging to rule out a bug there for sure.
 I guess the main alternative would be a bug in the recordreader paging.
 {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-3150) ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack)

2011-09-23 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13113326#comment-13113326
 ] 

Mck SembWever edited comment on CASSANDRA-3150 at 9/23/11 12:05 PM:


This is persisting as a problem, and it continuously gets worse through the day.
I can see roughly half my cf being read from one split.
It doesn't matter whether it's ByteOrderedPartitioner or RandomPartitioner.
I've attached two screenshots from the Hadoop web pages (you can see 36 million 
map input records when cassandra.input.split.size is set to 393216).

bq. ... double-check how many rows there really are in the given split range.
I can confirm that if I let these jobs run through to completion (however 
terribly long they may take), the results are correct. It would seem that the 
split ranges are incorrect (not the rows within them).

  was (Author: michaelsembwever):
This is persisting to be a problem. And continuously gets worse through the 
day.
I can see like half my cf being read from one split.
It doesn't matter if it's ByteOrderingPartition or RandomPartitioner.
I've attached two screenshots from hadoop webpages (you see 36million map input 
records when cassandra.input.split.size is set to 393216).
  
 ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of 
 whack)
 --

 Key: CASSANDRA-3150
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.8.4
Reporter: Mck SembWever
Assignee: Mck SembWever
Priority: Critical
 Attachments: CASSANDRA-3150.patch, Screenshot-Counters for 
 task_201109212019_1060_m_29 - Mozilla Firefox.png, Screenshot-Hadoop map 
 task list for job_201109212019_1060 on cassandra01 - Mozilla Firefox.png, 
 attempt_201109071357_0044_m_003040_0.grep-get_range_slices.log


 From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039
 {quote}
 bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner
 bq. CFIF's inputSplitSize=196608
 bq. 3 map tasks (from 4013) is still running after read 25 million rows.
 bq. Can this be a bug in StorageService.getSplits(..) ?
 getSplits looks pretty foolproof to me but I guess we'd need to add
 more debug logging to rule out a bug there for sure.
 I guess the main alternative would be a bug in the recordreader paging.
 {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-3150) ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack)

2011-09-23 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13113326#comment-13113326
 ] 

Mck SembWever edited comment on CASSANDRA-3150 at 9/23/11 7:30 PM:
---

This is persisting as a problem, and it continuously gets worse through the day.
I can see two thirds of my cf being read from one split.
It doesn't matter whether it's ByteOrderedPartitioner or RandomPartitioner, 
although I'm having a greater problem with the former. A compaction and restart 
didn't help.
I've attached two screenshots from the Hadoop web pages (you can see 36 million 
map input records when cassandra.input.split.size is set to 393216).

bq. ... double-check how many rows there really are in the given split range.
I can confirm that if I let these jobs run through to completion (however 
terribly long they may take), the results are correct. It would seem that the 
split ranges are incorrect (not the rows within them).

  was (Author: michaelsembwever):
This is persisting to be a problem. And continuously gets worse through the 
day.
I can see like half my cf being read from one split.
It doesn't matter if it's ByteOrderingPartition or RandomPartitioner.
I've attached two screenshots from hadoop webpages (you see 36million map input 
records when cassandra.input.split.size is set to 393216).

bq. ... double-check how many rows there really are in the given split range.
I can confirm that if i let these jobs run through to their completion (despite 
how terribly long they may take) that the results are correct. It would seem 
that the split ranges are incorrect (not the rows within in them).
  
 ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of 
 whack)
 --

 Key: CASSANDRA-3150
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.8.4
Reporter: Mck SembWever
Assignee: Mck SembWever
Priority: Critical
 Attachments: CASSANDRA-3150.patch, Screenshot-Counters for 
 task_201109212019_1060_m_29 - Mozilla Firefox.png, Screenshot-Hadoop map 
 task list for job_201109212019_1060 on cassandra01 - Mozilla Firefox.png, 
 attempt_201109071357_0044_m_003040_0.grep-get_range_slices.log


 From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039
 {quote}
 bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner
 bq. CFIF's inputSplitSize=196608
 bq. 3 map tasks (from 4013) is still running after read 25 million rows.
 bq. Can this be a bug in StorageService.getSplits(..) ?
 getSplits looks pretty foolproof to me but I guess we'd need to add
 more debug logging to rule out a bug there for sure.
 I guess the main alternative would be a bug in the recordreader paging.
 {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3197) Separate input and output connection details in ConfigHelper

2011-09-13 Thread Mck SembWever (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mck SembWever updated CASSANDRA-3197:
-

Attachment: CASSANDRA-3197-extra.patch

Patch for contrib/pig.
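
With the patches applied, job setup can point input and output at different 
clusters, roughly like this (a sketch; I'm using the Input/Output variants of 
the setters that the patch introduces, with placeholder hosts and ports):

{noformat}
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.hadoop.conf.Configuration;

// Sketch: read from the BOP cluster, write to the RP cluster on the same servers.
public class TwoClusterJobSetup
{
    public static Configuration configure()
    {
        Configuration conf = new Configuration();
        // input side: the ByteOrderedPartitioner cluster
        ConfigHelper.setInputInitialAddress(conf, "bop-node");  // placeholder host
        ConfigHelper.setInputRpcPort(conf, "9160");             // placeholder port
        ConfigHelper.setInputPartitioner(conf, "org.apache.cassandra.dht.ByteOrderedPartitioner");
        // output side: the RandomPartitioner cluster
        ConfigHelper.setOutputInitialAddress(conf, "rp-node");  // placeholder host
        ConfigHelper.setOutputRpcPort(conf, "9170");            // placeholder port
        ConfigHelper.setOutputPartitioner(conf, "org.apache.cassandra.dht.RandomPartitioner");
        return conf;
    }
}
{noformat}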

 Separate input and output connection details in ConfigHelper
 

 Key: CASSANDRA-3197
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3197
 Project: Cassandra
  Issue Type: Improvement
  Components: Hadoop
Affects Versions: 0.8.5
Reporter: Mck SembWever
Assignee: Mck SembWever
 Attachments: CASSANDRA-3197-extra.patch, CASSANDRA-3197.patch


 Currently ConfigHelper's getInitialAddress(..), getRpcPort(..) and 
 getPartitioner(..) 
 presume CFIF will be using the same cluster as CFOF.
 The latter two are a problem for me, as on the same servers I'm running two 
 clusters (one w/ ByteOrderedPartitioner and the other with RP), and I would 
 like to read from the BOP cluster and write to the RP cluster.
 getInitialAddress(..) is of little concern to me.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3197) Separate input and output connection details in ConfigHelper

2011-09-13 Thread Mck SembWever (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mck SembWever updated CASSANDRA-3197:
-

Fix Version/s: 0.8.6

 Separate input and output connection details in ConfigHelper
 

 Key: CASSANDRA-3197
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3197
 Project: Cassandra
  Issue Type: Improvement
  Components: Hadoop
Affects Versions: 0.8.5
Reporter: Mck SembWever
Assignee: Mck SembWever
 Fix For: 0.8.6

 Attachments: CASSANDRA-3197-extra.patch, CASSANDRA-3197.patch


 Currently ConfigHelper's getInitialAddress(..), getRpcPort(..) and 
 getPartitioner(..) 
 presume CFIF will be using the same cluster as CFOF.
 The latter two are a problem for me, as on the same servers I'm running two 
 clusters (one w/ ByteOrderedPartitioner and the other with RP), and I would 
 like to read from the BOP cluster and write to the RP cluster.
 getInitialAddress(..) is of little concern to me.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3150) ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of whack)

2011-09-12 Thread Mck SembWever (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mck SembWever updated CASSANDRA-3150:
-

Summary: ColumnFormatRecordReader loops forever 
(StorageService.getSplits(..) out of whack)  (was: ColumnFormatRecordReader 
loops forever)

 ColumnFormatRecordReader loops forever (StorageService.getSplits(..) out of 
 whack)
 --

 Key: CASSANDRA-3150
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.8.4
Reporter: Mck SembWever
Assignee: Mck SembWever
Priority: Critical
 Attachments: CASSANDRA-3150.patch, 
 attempt_201109071357_0044_m_003040_0.grep-get_range_slices.log


 From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039
 {quote}
 bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner
 bq. CFIF's inputSplitSize=196608
 bq. 3 map tasks (from 4013) is still running after read 25 million rows.
 bq. Can this be a bug in StorageService.getSplits(..) ?
 getSplits looks pretty foolproof to me but I guess we'd need to add
 more debug logging to rule out a bug there for sure.
 I guess the main alternative would be a bug in the recordreader paging.
 {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3150) ColumnFormatRecordReader loops forever

2011-09-10 Thread Mck SembWever (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mck SembWever updated CASSANDRA-3150:
-

Attachment: attempt_201109071357_0044_m_003040_0.grep-get_range_slices.log

Debug from a task that was still running at 1200%

The initial split for this CFRR is 
30303030303031333131313739353337303038d4e7f72db2ed11e09d7c68b59973a5d8 : 
303030303030313331323631393735313231381778518cc00711e0acb968b59973a5d8

This job was run with 
 cassandra.input.split.size=196608
 cassandra.range.batch.size=16000

therefore there shouldn't be more than 13 calls to get_range_slices(..) in this 
task. There were already 166 calls in this log.


What I can see here is that the original split for this task is just way too 
big, and this comes from {{describe_splits(..)}}.
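
As a sanity check, the arithmetic behind that ceiling (plain Java, using the 
figures above):

{noformat}
// Sketch: expected upper bound on get_range_slices(..) calls for one task.
public class ExpectedCalls
{
    public static void main(String[] args)
    {
        long splitSize = 196608;  // cassandra.input.split.size: rows per split (estimated)
        long batchSize = 16000;   // cassandra.range.batch.size: rows fetched per call

        // each call pages batchSize rows, so one split needs at most ceil(splitSize / batchSize) calls
        long expected = (splitSize + batchSize - 1) / batchSize;
        System.out.println("expected <= " + expected + " calls, observed: 166");  // prints 13
    }
}
{noformat}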

 ColumnFormatRecordReader loops forever
 --

 Key: CASSANDRA-3150
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.8.4
Reporter: Mck SembWever
Assignee: Mck SembWever
Priority: Critical
 Attachments: CASSANDRA-3150.patch, 
 attempt_201109071357_0044_m_003040_0.grep-get_range_slices.log


 From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039
 {quote}
 bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner
 bq. CFIF's inputSplitSize=196608
 bq. 3 map tasks (from 4013) is still running after read 25 million rows.
 bq. Can this be a bug in StorageService.getSplits(..) ?
 getSplits looks pretty foolproof to me but I guess we'd need to add
 more debug logging to rule out a bug there for sure.
 I guess the main alternative would be a bug in the recordreader paging.
 {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-3150) ColumnFormatRecordReader loops forever

2011-09-10 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102062#comment-13102062
 ] 

Mck SembWever edited comment on CASSANDRA-3150 at 9/10/11 6:02 PM:
---

Debug from a task that was still running at 1200%

The initial split for this CFRR is 
30303030303031333131313739353337303038d4e7f72db2ed11e09d7c68b59973a5d8 : 
303030303030313331323631393735313231381778518cc00711e0acb968b59973a5d8

This job was run with 
 cassandra.input.split.size=196608
 cassandra.range.batch.size=16000

therefore there shouldn't be more than 13 calls to get_range_slices(..) in this 
task. There were already 166 calls in this log.


What I can see here is that the original split for this task is just way too 
big, and this comes from {{describe_splits(..)}}, 
which in turn depends on index_interval. Reading 
{{StorageService.getSplits(..)}}, I would guess that the split can in fact 
contain many more keys with the default sampling of 128. The question is: how 
low can/should I bring index_interval?

  was (Author: michaelsembwever):
Debug from a task that was still running at 1200%

The initial split for this CFRR is 
30303030303031333131313739353337303038d4e7f72db2ed11e09d7c68b59973a5d8 : 
303030303030313331323631393735313231381778518cc00711e0acb968b59973a5d8

This job was run with 
 cassandra.input.split.size=196608
 cassandra.range.batch.size=16000

therefore there shouldn't be more than 13 calls to get_range_slices(..) in this 
task. There was already 166 calls in this log.


What i can see here is that the original split for this task is just way too 
big and this comes from {{describe_splits(..)}}
  
 ColumnFormatRecordReader loops forever
 --

 Key: CASSANDRA-3150
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.8.4
Reporter: Mck SembWever
Assignee: Mck SembWever
Priority: Critical
 Attachments: CASSANDRA-3150.patch, 
 attempt_201109071357_0044_m_003040_0.grep-get_range_slices.log


 From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039
 {quote}
 bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner
 bq. CFIF's inputSplitSize=196608
 bq. 3 map tasks (from 4013) is still running after read 25 million rows.
 bq. Can this be a bug in StorageService.getSplits(..) ?
 getSplits looks pretty foolproof to me but I guess we'd need to add
 more debug logging to rule out a bug there for sure.
 I guess the main alternative would be a bug in the recordreader paging.
 {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-3150) ColumnFormatRecordReader loops forever

2011-09-10 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102062#comment-13102062
 ] 

Mck SembWever edited comment on CASSANDRA-3150 at 9/10/11 6:04 PM:
---

Debug from a task that was still running at 1200%

The initial split for this CFRR is 
30303030303031333131313739353337303038d4e7f72db2ed11e09d7c68b59973a5d8 : 
303030303030313331323631393735313231381778518cc00711e0acb968b59973a5d8

This job was run with 
 cassandra.input.split.size=196608
 cassandra.range.batch.size=16000

therefore there shouldn't be more than 13 calls to get_range_slices(..) in this 
task. There were already 166 calls in this log.


What I can see here is that the original split for this task is just way too 
big, and this comes from {{describe_splits(..)}}, 
which in turn depends on index_interval. Reading 
{{StorageService.getSplits(..)}}, I would guess that the split can in fact 
contain many more keys with the default sampling of 128. The question is: how 
low can/should I bring index_interval (this cf can have up to 8 billion rows)?

  was (Author: michaelsembwever):
Debug from a task that was still running at 1200%

The initial split for this CFRR is 
30303030303031333131313739353337303038d4e7f72db2ed11e09d7c68b59973a5d8 : 
303030303030313331323631393735313231381778518cc00711e0acb968b59973a5d8

This job was run with 
 cassandra.input.split.size=196608
 cassandra.range.batch.size=16000

therefore there shouldn't be more than 13 calls to get_range_slices(..) in this 
task. There was already 166 calls in this log.


What i can see here is that the original split for this task is just way too 
big and this comes from {{describe_splits(..)}}
which in turn depends on index_interval. Reading 
{{StorageService.getSplits(..)}} i would guess that the split can in fact 
contain many more keys with the default sampling of 128. Question is how low 
can/should i bring index_interval ?
  
 ColumnFormatRecordReader loops forever
 --

 Key: CASSANDRA-3150
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.8.4
Reporter: Mck SembWever
Assignee: Mck SembWever
Priority: Critical
 Attachments: CASSANDRA-3150.patch, 
 attempt_201109071357_0044_m_003040_0.grep-get_range_slices.log


 From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039
 {quote}
 bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner
 bq. CFIF's inputSplitSize=196608
 bq. 3 map tasks (from 4013) is still running after read 25 million rows.
 bq. Can this be a bug in StorageService.getSplits(..) ?
 getSplits looks pretty foolproof to me but I guess we'd need to add
 more debug logging to rule out a bug there for sure.
 I guess the main alternative would be a bug in the recordreader paging.
 {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3150) ColumnFormatRecordReader loops forever

2011-09-10 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102115#comment-13102115
 ] 

Mck SembWever commented on CASSANDRA-3150:
--

There's a lot to learn about Cassandra, so forgive my ignorance in so many areas.
So how can {{StorageService.getSplits(..)}} be so out of whack? Is there 
anything I can tune to improve this situation?
(Or is there any other debug I can provide?)

 ColumnFormatRecordReader loops forever
 --

 Key: CASSANDRA-3150
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.8.4
Reporter: Mck SembWever
Assignee: Mck SembWever
Priority: Critical
 Attachments: CASSANDRA-3150.patch, 
 attempt_201109071357_0044_m_003040_0.grep-get_range_slices.log


 From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039
 {quote}
 bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner
 bq. CFIF's inputSplitSize=196608
 bq. 3 map tasks (from 4013) is still running after read 25 million rows.
 bq. Can this be a bug in StorageService.getSplits(..) ?
 getSplits looks pretty foolproof to me but I guess we'd need to add
 more debug logging to rule out a bug there for sure.
 I guess the main alternative would be a bug in the recordreader paging.
 {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2011-09-08 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13100093#comment-13100093
 ] 

Mck SembWever commented on CASSANDRA-2388:
--

In the meantime, could we make this behavior configurable?
E.g. replace CFRR:176 with something like
{noformat}
if (ConfigHelper.isDataLocalityDisabled())
{
    return split.getLocations()[0];
}
else
{
    throw new UnsupportedOperationException("no local connection available");
}
{noformat}
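
The ConfigHelper side could be as simple as this (a hypothetical addition for 
illustration only; neither the property name nor these methods exist today, 
and I've sketched it with an explicit Configuration argument):

{noformat}
import org.apache.hadoop.conf.Configuration;

// Sketch of a hypothetical ConfigHelper addition backing isDataLocalityDisabled().
public class ConfigHelperLocalityAddition
{
    // hypothetical property name, not an existing Cassandra setting
    private static final String DATA_LOCALITY_DISABLED = "cassandra.input.data.locality.disabled";

    public static void setDataLocalityDisabled(Configuration conf, boolean disabled)
    {
        conf.setBoolean(DATA_LOCALITY_DISABLED, disabled);
    }

    public static boolean isDataLocalityDisabled(Configuration conf)
    {
        // default false: keep today's behavior of enforcing data locality
        return conf.getBoolean(DATA_LOCALITY_DISABLED, false);
    }
}
{noformat}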


 ColumnFamilyRecordReader fails for a given split because a host is down, even 
 if records could reasonably be read from other replica.
 -

 Key: CASSANDRA-2388
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.6
Reporter: Eldon Stegall
Assignee: Mck SembWever
  Labels: hadoop, inputformat
 Fix For: 0.8.6

 Attachments: 0002_On_TException_try_next_split.patch, 
 CASSANDRA-2388-addition1.patch, CASSANDRA-2388-extended.patch, 
 CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, 
 CASSANDRA-2388.patch


 ColumnFamilyRecordReader only tries the first location for a given split. We 
 should try multiple locations for a given split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2011-09-08 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13100717#comment-13100717
 ] 

Mck SembWever commented on CASSANDRA-2388:
--

Well, that would work for me; I was only thinking you'd want to push a default 
behavior (especially for those using a RP). 
But I think a better understanding (at least on my part) of Hadoop's task 
scheduling is required before enforcing data locality, as, as-is, it certainly 
doesn't work.

 ColumnFamilyRecordReader fails for a given split because a host is down, even 
 if records could reasonably be read from other replica.
 -

 Key: CASSANDRA-2388
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.6
Reporter: Eldon Stegall
Assignee: Mck SembWever
  Labels: hadoop, inputformat
 Fix For: 0.8.6

 Attachments: 0002_On_TException_try_next_split.patch, 
 CASSANDRA-2388-addition1.patch, CASSANDRA-2388-extended.patch, 
 CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, 
 CASSANDRA-2388.patch


 ColumnFamilyRecordReader only tries the first location for a given split. We 
 should try multiple locations for a given split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2011-09-08 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13100717#comment-13100717
 ] 

Mck SembWever edited comment on CASSANDRA-2388 at 9/8/11 9:37 PM:
--

Well, that would work for me; I was only thinking you'd want to push a default 
behavior (especially for those using a RP). 
But I think a better understanding (at least on my part) of Hadoop's task 
scheduling is required before enforcing data locality, as, as-is, it certainly 
doesn't work for all.

  was (Author: michaelsembwever):
Well that would work for me, was only thinking you want to push a default 
behavior (especially for those using a RP). 
But I think a better understanding (at least from me) of hadoop's task 
scheduling is required before enforcing data locality, as as-is it certainly 
doesn't work.
  
 ColumnFamilyRecordReader fails for a given split because a host is down, even 
 if records could reasonably be read from other replica.
 -

 Key: CASSANDRA-2388
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.6
Reporter: Eldon Stegall
Assignee: Mck SembWever
  Labels: hadoop, inputformat
 Fix For: 0.8.6

 Attachments: 0002_On_TException_try_next_split.patch, 
 CASSANDRA-2388-addition1.patch, CASSANDRA-2388-extended.patch, 
 CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, 
 CASSANDRA-2388.patch


 ColumnFamilyRecordReader only tries the first location for a given split. We 
 should try multiple locations for a given split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3108) Make Range and Bounds objects client-safe

2011-09-07 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13098810#comment-13098810
 ] 

Mck SembWever commented on CASSANDRA-3108:
--

No, in itself it doesn't implement wrapping ranges. But by making 
intersectionBothWrapping(..) and intersectionOneWrapping(..) client-safe, it 
becomes possible to implement.
I think the best answer to your question, Jonathan, is to look at the patch 
available in CASSANDRA-3137.
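
In miniature, the client-safe pattern is just threading the partitioner 
through explicitly instead of asking StorageService for it (a toy sketch of 
the pattern, not the actual Range class):

{noformat}
// Toy sketch: derived ranges inherit an explicitly supplied partitioner,
// so the code also works client-side (e.g. inside CFIF) where StorageService is null.
public class ClientSafeRange
{
    interface IPartitioner { }  // stand-in for org.apache.cassandra.dht.IPartitioner

    final Comparable left, right;
    final IPartitioner partitioner;

    ClientSafeRange(Comparable left, Comparable right, IPartitioner partitioner)
    {
        this.left = left;
        this.right = right;
        this.partitioner = partitioner;  // no StorageService.getPartitioner() call
    }

    ClientSafeRange pointRange(Comparable token)
    {
        // instead of new Range(token, token), which reaches into StorageService:
        return new ClientSafeRange(token, token, partitioner);
    }
}
{noformat}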

 Make Range and Bounds objects client-safe
 -

 Key: CASSANDRA-3108
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3108
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.2
Reporter: Jonathan Ellis
Assignee: Mck SembWever
  Labels: hadoop
 Fix For: 0.8.5

 Attachments: 3108.txt


 From Mck's comment on CASSANDRA-1125:
 Something broke here in production once we went out with 0.8.2. It may have 
 been some poor testing; I'm not entirely sure, and a little surprised.
 CFIF:135 breaks because inside dhtRange.intersects(jobRange) there's a call 
 to new Range(token, token) which calls StorageService.getPartitioner() and 
 StorageService is null as we're not inside the server.
 A quick fix is to change Range:148 from new Range(token, token) to new 
 Range(token, token, partitioner) making the presumption that the partitioner 
 for the new Range will be the same as this Range.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-3137) Implement wrapping intersections for ConfigHelper's InputKeyRange

2011-09-07 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13099124#comment-13099124
 ] 

Mck SembWever edited comment on CASSANDRA-3137 at 9/7/11 5:37 PM:
--

Indeed. I could be using this asap.

The use case is...
We're using a ByteOrderedPartitioner because we run incremental Hadoop jobs over 
one of our column families where events initially come in. This cf has RF=1 
and time-based UUID keys that are manipulated so that their byte ordering is 
time ordered (the byte-unsigned timestamp put up front). Each column has a TTL 
of 3 months.
After 3 months of data we saw all data on one node. Now I understand: the 
token range is the timestamp range, which runs from 1970 to 2270, so of course 
our 3-month period fell on one node (with a 3-node cluster even 100 years would 
fall on one node).

To properly manage this cf we need to either continuously move nodes around, a 
cumbersome operation, or change the key so it's prefixed with {{timestamp % 
3months}}. This would allow 3 months of data to cycle over the whole cluster 
and wrap around again. Obviously we're leaning towards the latter solution, as 
it simplifies operations. But it does require this patch.

(When CFIF supports IndexClause everything changes: we change our cluster to 
RandomPartitioner, use secondary indexes, and never look back...)

  was (Author: michaelsembwever):
Indeed. I could be using this asap.
The use case is...
We're using a ByteOrderedPartition because we run incremental hadoop jobs over 
one of our column families where events initially come in. This cf has RF=1 
and time-based UUID keys that are manipulated so that their byte ordering are 
time ordered. (mostSigBits extracted and the byte-unsigned timestamp put up 
front). Each column has ttl of 3 months.
After 3 months of data we saw all data on one node. Now i understand as the 
token range is the timestamp range which is from 1970 to 2270 so of course our 
3 month period fell on one node (with 3 node cluster even 100 years would fall 
on one node).

To properly manage this cf we need to either continuous move nodes around, a 
cumbersome operation, or change the key so it's prefixed with {{timestamp % 
3months}}. This would allow 3 months of data to cycle over the whole cluster 
and wrap around again. Obviously we're leaning towards the latter solution as 
it simplifies operations. But it does require this patch.

(When CFIF supports IndexClause everything changes, we change our cluster to 
RandomPartitioner, use secondary indexes, and never look back...)
  
 Implement wrapping intersections for ConfigHelper's InputKeyRange
 -

 Key: CASSANDRA-3137
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3137
 Project: Cassandra
  Issue Type: Improvement
  Components: Hadoop
Affects Versions: 0.8.5
Reporter: Mck SembWever
Assignee: Mck SembWever
 Attachments: CASSANDRA-3137.patch, CASSANDRA-3137.patch


 Before there was no support for multiple intersections between the split's 
 range and the job's configured range.
 After CASSANDRA-3108 it is now possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3137) Implement wrapping intersections for ConfigHelper's InputKeyRange

2011-09-07 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13099124#comment-13099124
 ] 

Mck SembWever commented on CASSANDRA-3137:
--

Indeed. I could be using this asap.
The use case is...
We're using a ByteOrderedPartitioner because we run incremental Hadoop jobs over 
one of our column families where events initially come in. This cf has RF=1 
and time-based UUID keys that are manipulated so that their byte ordering is 
time ordered (mostSigBits extracted and the byte-unsigned timestamp put up 
front). Each column has a TTL of 3 months.
After 3 months of data we saw all data on one node. Now I understand: the 
token range is the timestamp range, which runs from 1970 to 2270, so of course 
our 3-month period fell on one node (with a 3-node cluster even 100 years would 
fall on one node).

To properly manage this cf we need to either continuously move nodes around, a 
cumbersome operation, or change the key so it's prefixed with {{timestamp % 
3months}}. This would allow 3 months of data to cycle over the whole cluster 
and wrap around again. Obviously we're leaning towards the latter solution, as 
it simplifies operations. But it does require this patch.

(When CFIF supports IndexClause everything changes: we change our cluster to 
RandomPartitioner, use secondary indexes, and never look back...)
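
For the curious, the key manipulation amounts to something like this (my own 
illustration; the bucket length and byte layout are assumptions, not our 
production code):

{noformat}
import java.nio.ByteBuffer;
import java.util.UUID;

// Sketch: a byte-ordered row key whose leading bytes cycle every ~3 months,
// so writes sweep the whole BOP token range instead of pinning to one node.
public class BucketedKey
{
    private static final long BUCKET_MILLIS = 90L * 24 * 60 * 60 * 1000;  // ~3 months

    public static ByteBuffer makeKey(long timestampMillis, UUID id)
    {
        long bucket = timestampMillis % BUCKET_MILLIS;  // the timestamp % 3months prefix
        ByteBuffer key = ByteBuffer.allocate(8 + 16);
        key.putLong(bucket);  // up front, so byte ordering == time-within-bucket ordering
        key.putLong(id.getMostSignificantBits());
        key.putLong(id.getLeastSignificantBits());
        key.flip();
        return key;
    }
}
{noformat}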

 Implement wrapping intersections for ConfigHelper's InputKeyRange
 -

 Key: CASSANDRA-3137
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3137
 Project: Cassandra
  Issue Type: Improvement
  Components: Hadoop
Affects Versions: 0.8.5
Reporter: Mck SembWever
Assignee: Mck SembWever
 Attachments: CASSANDRA-3137.patch, CASSANDRA-3137.patch


 Before there was no support for multiple intersections between the split's 
 range and the job's configured range.
 After CASSANDRA-3108 it is now possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-3150) ColumnFormatRecordReader loops forever

2011-09-07 Thread Mck SembWever (JIRA)
ColumnFormatRecordReader loops forever
--

 Key: CASSANDRA-3150
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.8.4
Reporter: Mck SembWever
Assignee: Mck SembWever
Priority: Critical


From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039

{quote}
bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner
bq. CFIF's inputSplitSize=196608
bq. 3 map tasks (from 4013) is still running after read 25 million rows.
bq. Can this be a bug in StorageService.getSplits(..) ?

getSplits looks pretty foolproof to me but I guess we'd need to add
more debug logging to rule out a bug there for sure.

I guess the main alternative would be a bug in the recordreader paging.
{quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3150) ColumnFormatRecordReader loops forever

2011-09-07 Thread Mck SembWever (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mck SembWever updated CASSANDRA-3150:
-

Attachment: CASSANDRA-3150.patch

If the split's end token does not match any of the row key tokens, the 
RowIterator will never stop (see RowIterator:243).

This patch 1) presumes this is the problem, 2) compares each row token with the 
split end token and exits when need be (which only works on order-preserving 
partitioners), and 3) stops iterating when totalRowCount rows have been read.

So far only (3) has been tested and works.
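
As a toy model of guards (2) and (3), with tokens modelled as plain longs and 
the method shape invented for illustration rather than lifted from RowIterator:
{noformat}
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

public class SplitPagingGuard
{
    /** Pages through row tokens after startToken, stopping once a token passes
     *  endToken (guard 2, only valid on an order-preserving partitioner) or
     *  once totalRowCount rows have been read (guard 3). */
    public static List<Long> readSplit(NavigableMap<Long, String> rows,
                                       long startToken, long endToken,
                                       int batchRowCount, int totalRowCount)
    {
        List<Long> read = new ArrayList<Long>();
        long cursor = startToken;
        while (true)
        {
            int n = 0;
            for (long token : rows.tailMap(cursor, false).keySet()) // one batch
            {
                if (token > endToken) // guard (2): past the split's end, so done
                    return read;
                read.add(token);
                cursor = token;
                if (read.size() >= totalRowCount) // guard (3): row-count bound
                    return read;
                if (++n == batchRowCount)
                    break;
            }
            if (n == 0) // no rows left at all
                return read;
        }
    }
}
{noformat}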

 ColumnFormatRecordReader loops forever
 --

 Key: CASSANDRA-3150
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.8.4
Reporter: Mck SembWever
Assignee: Mck SembWever
Priority: Critical
 Attachments: CASSANDRA-3150.patch


 From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039
 {quote}
 bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner
 bq. CFIF's inputSplitSize=196608
 bq. 3 map tasks (from 4013) is still running after read 25 million rows.
 bq. Can this be a bug in StorageService.getSplits(..) ?
 getSplits looks pretty foolproof to me but I guess we'd need to add
 more debug logging to rule out a bug there for sure.
 I guess the main alternative would be a bug in the recordreader paging.
 {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3150) ColumnFormatRecordReader loops forever

2011-09-07 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13099234#comment-13099234
 ] 

Mck SembWever commented on CASSANDRA-3150:
--

Here keyRange is startToken to split.getEndToken().
startToken is updated on each iteration to the last row read (each iteration is 
batchRowCount rows).

What happens if split.getEndToken() doesn't correspond to any of the rowKeys?
To me it reads that startToken will hop over split.getEndToken() and 
get_range_slices(..) will start returning wrapping ranges. This will still 
return rows and so the iteration will continue, now forever.

The only way out of this code today is a) startToken equals 
split.getEndToken(), or b) get_range_slices(..) is called with startToken equal 
to split.getEndToken(), or with a gap so small that no rows exist in between.
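
A toy reproduction of that hop-over (tokens as longs, one row per "batch" for 
brevity; illustrative only, not the CFRR code):
{noformat}
import java.util.Arrays;
import java.util.TreeSet;

public class WrappingLoopDemo
{
    public static void main(String[] args)
    {
        TreeSet<Long> rowTokens = new TreeSet<Long>(Arrays.asList(10L, 20L, 30L, 40L));
        long startToken = 5, endToken = 25; // 25 matches no row key

        for (int page = 0; page < 8; page++) // bounded here; the real loop is not
        {
            Long next = rowTokens.higher(startToken); // next row past startToken
            if (next == null)
                next = rowTokens.first(); // the wrapping range query kicks in
            System.out.println("page " + page + ": read token " + next);
            if (next == endToken) // the only exit condition
                return;
            startToken = next; // hops 20 -> 30, straight over 25
        }
        System.out.println("...and so on, forever");
    }
}
{noformat}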

 ColumnFormatRecordReader loops forever
 --

 Key: CASSANDRA-3150
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.8.4
Reporter: Mck SembWever
Assignee: Mck SembWever
Priority: Critical
 Attachments: CASSANDRA-3150.patch


 From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039
 {quote}
 bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner
 bq. CFIF's inputSplitSize=196608
 bq. 3 map tasks (from 4013) is still running after read 25 million rows.
 bq. Can this be a bug in StorageService.getSplits(..) ?
 getSplits looks pretty foolproof to me but I guess we'd need to add
 more debug logging to rule out a bug there for sure.
 I guess the main alternative would be a bug in the recordreader paging.
 {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-3150) ColumnFormatRecordReader loops forever

2011-09-07 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13099234#comment-13099234
 ] 

Mck SembWever edited comment on CASSANDRA-3150 at 9/7/11 7:17 PM:
--

Here keyRange is startToken to split.getEndToken().
startToken is updated on each iteration to the last row read (each iteration is 
batchRowCount rows).

What happens if split.getEndToken() doesn't correspond to any of the rowKeys?
To me it reads that startToken will hop over split.getEndToken() and 
get_range_slices(..) will start returning wrapping ranges. This will still 
return rows and so the iteration will continue, now forever.

The only way out of this code today is a) startToken equals 
split.getEndToken(), or b) get_range_slices(..) is called with startToken equal 
to split.getEndToken(), or with a gap so small that no rows exist in between.

  was (Author: michaelsembwever):
Here keyRange is startToken to split.getEndToken()
startToken is updated each iterate to the last row read (each iterate is 
batchRowCount rows).

What happens is split.getEndToken() doesn't correspond to any of the rowKeys?
To me it reads that startToken will hop over split.getEndToken() and 
get_rage_slices(..) will start returning wrapping ranges. This will still 
return rows and so the iteration will continue, now forever.

The only way out for this code today is a) startToken equals 
split.getEndToken(), or b) get_range_slices(..) is called with startToken 
equals split.getEndToken() OR a gap so small there exists no rows in between.
  
 ColumnFormatRecordReader loops forever
 --

 Key: CASSANDRA-3150
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.8.4
Reporter: Mck SembWever
Assignee: Mck SembWever
Priority: Critical
 Attachments: CASSANDRA-3150.patch


 From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039
 {quote}
 bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner
 bq. CFIF's inputSplitSize=196608
 bq. 3 map tasks (from 4013) is still running after read 25 million rows.
 bq. Can this be a bug in StorageService.getSplits(..) ?
 getSplits looks pretty foolproof to me but I guess we'd need to add
 more debug logging to rule out a bug there for sure.
 I guess the main alternative would be a bug in the recordreader paging.
 {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-3150) ColumnFormatRecordReader loops forever

2011-09-07 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13099234#comment-13099234
 ] 

Mck SembWever edited comment on CASSANDRA-3150 at 9/7/11 7:24 PM:
--

Here keyRange is startToken to split.getEndToken().
startToken is updated on each iteration to the last row read (each iteration is 
batchRowCount rows).

What happens if split.getEndToken() doesn't correspond to any of the rowKeys?
To me it reads that startToken will hop over split.getEndToken() and 
get_range_slices(..) will start querying against wrapping ranges. This will 
still return rows and so the iteration will continue, now forever.

The only way out of this code today is a) startToken equals 
split.getEndToken(), or b) get_range_slices(..) is called with startToken equal 
to split.getEndToken(), or with a gap so small that no rows exist in between.

  was (Author: michaelsembwever):
Here keyRange is startToken to split.getEndToken()
startToken is updated each iterate to the last row read (each iterate is 
batchRowCount rows).

What happens if split.getEndToken() doesn't correspond to any of the rowKeys?
To me it reads that startToken will hop over split.getEndToken() and 
get_range_slices(..) will start returning wrapping ranges. This will still 
return rows and so the iteration will continue, now forever.

The only way out for this code today is a) startToken equals 
split.getEndToken(), or b) get_range_slices(..) is called with startToken 
equals split.getEndToken() OR a gap so small there exists no rows in between.
  
 ColumnFormatRecordReader loops forever
 --

 Key: CASSANDRA-3150
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.8.4
Reporter: Mck SembWever
Assignee: Mck SembWever
Priority: Critical
 Attachments: CASSANDRA-3150.patch


 From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039
 {quote}
 bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner
 bq. CFIF's inputSplitSize=196608
 bq. 3 map tasks (from 4013) is still running after read 25 million rows.
 bq. Can this be a bug in StorageService.getSplits(..) ?
 getSplits looks pretty foolproof to me but I guess we'd need to add
 more debug logging to rule out a bug there for sure.
 I guess the main alternative would be a bug in the recordreader paging.
 {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3150) ColumnFormatRecordReader loops forever

2011-09-07 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13099254#comment-13099254
 ] 

Mck SembWever commented on CASSANDRA-3150:
--

What about the case where tokens of different length exist?
I don't know if this is actually possible, but from 
{noformat}
Address         Status State   Load        Owns    Token
                                                    Token(bytes[76118303760208547436305468318170713656])
152.90.241.22   Up     Normal  270.46 GB   33.33%  Token(bytes[30303030303031333131313739353337303038d4e7f72db2ed11e09d7c68b59973a5d8])
152.90.241.24   Up     Normal  247.89 GB   33.33%  Token(bytes[303030303030313331323631393735313231381778518cc00711e0acb968b59973a5d8])
152.90.241.23   Up     Normal  1.1 TB      33.33%  Token(bytes[76118303760208547436305468318170713656])
{noformat}
you see the real tokens are very long compared to the initial_tokens the 
cluster was configured with. (The two long tokens have since been moved, and 
note that the load on .23 never decreased to ~300GB as it should have...).

 ColumnFormatRecordReader loops forever
 --

 Key: CASSANDRA-3150
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.8.4
Reporter: Mck SembWever
Assignee: Mck SembWever
Priority: Critical
 Attachments: CASSANDRA-3150.patch


 From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039
 {quote}
 bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner
 bq. CFIF's inputSplitSize=196608
 bq. 3 map tasks (from 4013) is still running after read 25 million rows.
 bq. Can this be a bug in StorageService.getSplits(..) ?
 getSplits looks pretty foolproof to me but I guess we'd need to add
 more debug logging to rule out a bug there for sure.
 I guess the main alternative would be a bug in the recordreader paging.
 {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-3150) ColumnFormatRecordReader loops forever

2011-09-07 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13099254#comment-13099254
 ] 

Mck SembWever edited comment on CASSANDRA-3150 at 9/7/11 7:38 PM:
--

What about the case where tokens of different length exist?
I don't know if this is actually possible, but from 
{noformat}
Address         Status State   Load        Owns    Token
                                                    Token(bytes[76118303760208547436305468318170713656])
152.90.241.22   Up     Normal  270.46 GB   33.33%  Token(bytes[30303030303031333131313739353337303038d4e7f72db2ed11e09d7c68b59973a5d8])
152.90.241.24   Up     Normal  247.89 GB   33.33%  Token(bytes[303030303030313331323631393735313231381778518cc00711e0acb968b59973a5d8])
152.90.241.23   Up     Normal  1.1 TB      33.33%  Token(bytes[76118303760208547436305468318170713656])
{noformat}
you see the real tokens are very long compared to the initial_tokens the 
cluster was configured with. (The two long tokens have been moved off their 
initial_tokens, and note that the load on .23 never decreased to ~300GB as it 
should have...).

  was (Author: michaelsembwever):
What about the case where tokens of different length exist.
I don't know if this is actually possible but from 
{noformat}
Address         Status State   Load        Owns    Token
                                                    Token(bytes[76118303760208547436305468318170713656])
152.90.241.22   Up     Normal  270.46 GB   33.33%  Token(bytes[30303030303031333131313739353337303038d4e7f72db2ed11e09d7c68b59973a5d8])
152.90.241.24   Up     Normal  247.89 GB   33.33%  Token(bytes[303030303030313331323631393735313231381778518cc00711e0acb968b59973a5d8])
152.90.241.23   Up     Normal  1.1 TB      33.33%  Token(bytes[76118303760208547436305468318170713656])
{noformat}
you see the real tokens are very long compared to the initial_tokens the 
cluster was configured with. (The two long tokens has since been moved, and to 
note the load on .23 never decreased to ~300GB as it should have...).
  
 ColumnFormatRecordReader loops forever
 --

 Key: CASSANDRA-3150
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.8.4
Reporter: Mck SembWever
Assignee: Mck SembWever
Priority: Critical
 Attachments: CASSANDRA-3150.patch


 From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039
 {quote}
 bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner
 bq. CFIF's inputSplitSize=196608
 bq. 3 map tasks (from 4013) is still running after read 25 million rows.
 bq. Can this be a bug in StorageService.getSplits(..) ?
 getSplits looks pretty foolproof to me but I guess we'd need to add
 more debug logging to rule out a bug there for sure.
 I guess the main alternative would be a bug in the recordreader paging.
 {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-3150) ColumnFormatRecordReader loops forever

2011-09-07 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13099254#comment-13099254
 ] 

Mck SembWever edited comment on CASSANDRA-3150 at 9/7/11 7:43 PM:
--

What about the case where tokens of different length exist? Could 
get_range_slices be busted there?
I don't know if this is actually possible, but from 
{noformat}
Address         Status State   Load        Owns    Token
                                                    Token(bytes[76118303760208547436305468318170713656])
152.90.241.22   Up     Normal  270.46 GB   33.33%  Token(bytes[30303030303031333131313739353337303038d4e7f72db2ed11e09d7c68b59973a5d8])
152.90.241.24   Up     Normal  247.89 GB   33.33%  Token(bytes[303030303030313331323631393735313231381778518cc00711e0acb968b59973a5d8])
152.90.241.23   Up     Normal  1.1 TB      33.33%  Token(bytes[76118303760208547436305468318170713656])
{noformat}
you see the real tokens are very long compared to the initial_tokens the 
cluster was configured with. (The two long tokens have been moved off their 
initial_tokens, and note that the load on .23 never decreased to ~300GB as it 
should have...).

  was (Author: michaelsembwever):
What about the case where tokens of different length exist.
I don't know if this is actually possible but from 
{noformat}
Address         Status State   Load        Owns    Token
                                                    Token(bytes[76118303760208547436305468318170713656])
152.90.241.22   Up     Normal  270.46 GB   33.33%  Token(bytes[30303030303031333131313739353337303038d4e7f72db2ed11e09d7c68b59973a5d8])
152.90.241.24   Up     Normal  247.89 GB   33.33%  Token(bytes[303030303030313331323631393735313231381778518cc00711e0acb968b59973a5d8])
152.90.241.23   Up     Normal  1.1 TB      33.33%  Token(bytes[76118303760208547436305468318170713656])
{noformat}
you see the real tokens are very long compared to the initial_tokens the 
cluster was configured with. (The two long tokens have been moved off their 
initial_tokens, and to note the load on .23 never decreased to ~300GB as it 
should have...).
  
 ColumnFormatRecordReader loops forever
 --

 Key: CASSANDRA-3150
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.8.4
Reporter: Mck SembWever
Assignee: Mck SembWever
Priority: Critical
 Attachments: CASSANDRA-3150.patch


 From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039
 {quote}
 bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner
 bq. CFIF's inputSplitSize=196608
 bq. 3 map tasks (from 4013) is still running after read 25 million rows.
 bq. Can this be a bug in StorageService.getSplits(..) ?
 getSplits looks pretty foolproof to me but I guess we'd need to add
 more debug logging to rule out a bug there for sure.
 I guess the main alternative would be a bug in the recordreader paging.
 {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3150) ColumnFormatRecordReader loops forever

2011-09-07 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13099269#comment-13099269
 ] 

Mck SembWever commented on CASSANDRA-3150:
--

bq. BOP is sorting the shorter token correctly since 3 < 7.
Sorry, so that doesn't explain this bug?

bq. Load won't decrease until you run cleanup.
Never worked.
Repair and cleanup are run every night; the moves were done one week ago and 
more than a couple of weeks ago, respectively.
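
To make that ordering concrete, a toy of the unsigned byte-wise comparison 
that BOP-style ordering implies (not Cassandra's actual partitioner code):
{noformat}
public class ByteOrderDemo
{
    /** Lexicographic comparison on unsigned bytes: the first differing byte
     *  decides; a shorter token that is a prefix of a longer one sorts first. */
    static int compareUnsigned(byte[] a, byte[] b)
    {
        for (int i = 0; i < Math.min(a.length, b.length); i++)
        {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0)
                return d;
        }
        return a.length - b.length;
    }

    public static void main(String[] args)
    {
        byte[] longTok  = "30303030303133".getBytes(); // starts with '3'
        byte[] shortTok = "76118303".getBytes();       // starts with '7'
        System.out.println(compareUnsigned(longTok, shortTok) < 0); // true: 3 < 7
    }
}
{noformat}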

 ColumnFormatRecordReader loops forever
 --

 Key: CASSANDRA-3150
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.8.4
Reporter: Mck SembWever
Assignee: Mck SembWever
Priority: Critical
 Attachments: CASSANDRA-3150.patch


 From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039
 {quote}
 bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner
 bq. CFIF's inputSplitSize=196608
 bq. 3 map tasks (from 4013) is still running after read 25 million rows.
 bq. Can this be a bug in StorageService.getSplits(..) ?
 getSplits looks pretty foolproof to me but I guess we'd need to add
 more debug logging to rule out a bug there for sure.
 I guess the main alternative would be a bug in the recordreader paging.
 {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3150) ColumnFormatRecordReader loops forever

2011-09-07 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13099943#comment-13099943
 ] 

Mck SembWever commented on CASSANDRA-3150:
--

I'll try to put debug logging in so i can get a log of get_range_slices calls 
from CFRR... (this may take some days)

 ColumnFormatRecordReader loops forever
 --

 Key: CASSANDRA-3150
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.8.4
Reporter: Mck SembWever
Assignee: Mck SembWever
Priority: Critical
 Attachments: CASSANDRA-3150.patch


 From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039
 {quote}
 bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner
 bq. CFIF's inputSplitSize=196608
 bq. 3 map tasks (from 4013) is still running after read 25 million rows.
 bq. Can this be a bug in StorageService.getSplits(..) ?
 getSplits looks pretty foolproof to me but I guess we'd need to add
 more debug logging to rule out a bug there for sure.
 I guess the main alternative would be a bug in the recordreader paging.
 {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-3136) Allow CFIF to keep going despite unavailable ranges

2011-09-05 Thread Mck SembWever (JIRA)
Allow CFIF to keep going despite unavailable ranges
---

 Key: CASSANDRA-3136
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3136
 Project: Cassandra
  Issue Type: Improvement
  Components: Hadoop
Reporter: Mck SembWever
Priority: Minor


From http://thread.gmane.org/gmane.comp.db.cassandra.user/18902

<use-case-1>
We use Cassandra as a storage for web-pages, we store the HTML, all
URLs that has the same HTML data and some computed data. We run Hadoop
MR jobs to compute lexical and thematical data for each page and for
exporting the data to a binary files for later use. URL gets to a
Cassandra on user request (a pageview) so if we delete an URL, it gets
back quickly if the page is active. Because of that and because there
is lots of data, we have the keyspace set to RF=1. We can drop the
whole keyspace and it will regenerate quickly and would contain only
fresh data, so we don't care about lossing a node.
</use-case-1>

<use-case-2>
trying to extract a small random sample (like a pig SAMPLE) of data out of 
cassandra.
</use-case-2>

<use-case-3>
searching for something or some-pattern and one hit
is enough. If you get the hit it's a positive result regardless if
ranges were ignored, if you don't and you *know* there was a range
ignored along the way you can re-run the job later. 
For example such a job could be run at regular intervals in the day until a hit 
was found.
</use-case-3>
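
One possible shape for this, as an entirely hypothetical sketch: the property 
name and the RangeScanner interface below are invented for illustration and 
are not an existing CFIF/ConfigHelper API:
{noformat}
import java.util.Iterator;
import java.util.List;

public class LenientSplitReader
{
    // hypothetical job property; not an existing ConfigHelper setting
    public static final String IGNORE_UNAVAILABLE_RANGES =
            "cassandra.input.ignore.unavailable.ranges";

    interface RangeScanner
    {
        Iterator<byte[]> scan(String startToken, String endToken) throws Exception;
    }

    /** true afterwards if any range was skipped, so a job can *know* its
     *  results are partial (use-case-3 above). */
    static boolean rangesIgnored = false;

    static void readAll(List<String[]> ranges, RangeScanner scanner, boolean lenient)
            throws Exception
    {
        for (String[] range : ranges)
        {
            try
            {
                for (Iterator<byte[]> it = scanner.scan(range[0], range[1]); it.hasNext();)
                    process(it.next());
            }
            catch (Exception unavailable)
            {
                if (!lenient)
                    throw unavailable;
                rangesIgnored = true; // note it, skip the range, keep going
            }
        }
    }

    static void process(byte[] row) { /* the map task's work */ }
}
{noformat}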

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3136) Allow CFIF to keep going despite unavailable ranges

2011-09-05 Thread Mck SembWever (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mck SembWever updated CASSANDRA-3136:
-

Description: 
From http://thread.gmane.org/gmane.comp.db.cassandra.user/18902

<use-case-1 from=Patrik Modesto>
We use Cassandra as a storage for web-pages, we store the HTML, all
URLs that has the same HTML data and some computed data. We run Hadoop
MR jobs to compute lexical and thematical data for each page and for
exporting the data to a binary files for later use. URL gets to a
Cassandra on user request (a pageview) so if we delete an URL, it gets
back quickly if the page is active. Because of that and because there
is lots of data, we have the keyspace set to RF=1. We can drop the
whole keyspace and it will regenerate quickly and would contain only
fresh data, so we don't care about lossing a node.
</use-case-1>

<use-case-2>
trying to extract a small random sample (like a pig SAMPLE) of data out of 
cassandra.
</use-case-2>

<use-case-3>
searching for something or some-pattern and one hit
is enough. If you get the hit it's a positive result regardless if
ranges were ignored, if you don't and you *know* there was a range
ignored along the way you can re-run the job later. 
For example such a job could be run at regular intervals in the day until a hit 
was found.
</use-case-3>

  was:
From http://thread.gmane.org/gmane.comp.db.cassandra.user/18902

<use-case-1>
We use Cassandra as a storage for web-pages, we store the HTML, all
URLs that has the same HTML data and some computed data. We run Hadoop
MR jobs to compute lexical and thematical data for each page and for
exporting the data to a binary files for later use. URL gets to a
Cassandra on user request (a pageview) so if we delete an URL, it gets
back quickly if the page is active. Because of that and because there
is lots of data, we have the keyspace set to RF=1. We can drop the
whole keyspace and it will regenerate quickly and would contain only
fresh data, so we don't care about lossing a node.
</use-case-1>

<use-case-2>
trying to extract a small random sample (like a pig SAMPLE) of data out of 
cassandra.
</use-case-2>

<use-case-3>
searching for something or some-pattern and one hit
is enough. If you get the hit it's a positive result regardless if
ranges were ignored, if you don't and you *know* there was a range
ignored along the way you can re-run the job later. 
For example such a job could be run at regular intervals in the day until a hit 
was found.
</use-case-3>


 Allow CFIF to keep going despite unavailable ranges
 ---

 Key: CASSANDRA-3136
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3136
 Project: Cassandra
  Issue Type: Improvement
  Components: Hadoop
Reporter: Mck SembWever
Priority: Minor

 From http://thread.gmane.org/gmane.comp.db.cassandra.user/18902
 <use-case-1 from=Patrik Modesto>
 We use Cassandra as a storage for web-pages, we store the HTML, all
 URLs that has the same HTML data and some computed data. We run Hadoop
 MR jobs to compute lexical and thematical data for each page and for
 exporting the data to a binary files for later use. URL gets to a
 Cassandra on user request (a pageview) so if we delete an URL, it gets
 back quickly if the page is active. Because of that and because there
 is lots of data, we have the keyspace set to RF=1. We can drop the
 whole keyspace and it will regenerate quickly and would contain only
 fresh data, so we don't care about lossing a node.
 </use-case-1>
 <use-case-2>
 trying to extract a small random sample (like a pig SAMPLE) of data out of 
 cassandra.
 </use-case-2>
 <use-case-3>
 searching for something or some-pattern and one hit
 is enough. If you get the hit it's a positive result regardless if
 ranges were ignored, if you don't and you *know* there was a range
 ignored along the way you can re-run the job later. 
 For example such a job could be run at regular intervals in the day until a 
 hit was found.
 </use-case-3>

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-3137) Implement wrapping intersections for ConfigHelper's InputKeyRange

2011-09-05 Thread Mck SembWever (JIRA)
Implement wrapping intersections for ConfigHelper's InputKeyRange
-

 Key: CASSANDRA-3137
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3137
 Project: Cassandra
  Issue Type: Improvement
  Components: Hadoop
Affects Versions: 0.8.4
Reporter: Mck SembWever
Assignee: Mck SembWever


Before, there was no support for multiple intersections between the split's 
range and the job's configured range.
After CASSANDRA-3108 it is now possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3108) Make Range and Bounds objects client-safe

2011-09-05 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13097126#comment-13097126
 ] 

Mck SembWever commented on CASSANDRA-3108:
--

Didn't see it until now, but your patch, Jonathan, removes the limitation that 
ConfigHelper's InputKeyRange cannot wrap.
I've entered CASSANDRA-3137 to allow wrapping intersections in 
{{ColumnFamilyInputFormat}}.

 Make Range and Bounds objects client-safe
 -

 Key: CASSANDRA-3108
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3108
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.2
Reporter: Jonathan Ellis
Assignee: Mck SembWever
  Labels: hadoop
 Fix For: 0.8.5

 Attachments: 3108.txt


 From Mck's comment on CASSANDRA-1125:
 Something broke here in production once we went out with 0.8.2. It may have 
 been some poor testing, i'm not entirely sure and a little surprised.
 CFIF:135 breaks because inside dhtRange.intersects(jobRange) there's a call 
 to new Range(token, token) which calls StorageService.getPartitioner() and 
 StorageService is null as we're not inside the server.
 A quick fix is to change Range:148 from new Range(token, token) to new 
 Range(token, token, partitioner) making the presumption that the partitioner 
 for the new Range will be the same as this Range.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3137) Implement wrapping intersections for ConfigHelper's InputKeyRange

2011-09-05 Thread Mck SembWever (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mck SembWever updated CASSANDRA-3137:
-

Affects Version/s: (was: 0.8.4)
   0.8.5

 Implement wrapping intersections for ConfigHelper's InputKeyRange
 -

 Key: CASSANDRA-3137
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3137
 Project: Cassandra
  Issue Type: Improvement
  Components: Hadoop
Affects Versions: 0.8.5
Reporter: Mck SembWever
Assignee: Mck SembWever

 Before there was no support for multiple intersections between the split's 
 range and the job's configured range.
 After CASSANDRA-3108 it is now possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-3137) Implement wrapping intersections for ConfigHelper's InputKeyRange

2011-09-05 Thread Mck SembWever (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mck SembWever updated CASSANDRA-3137:
-

Attachment: CASSANDRA-3137.patch

Haven't tested this (with real data) yet.

But the code looks pretty simple and straightforward here...
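
As a toy model of what a wrapping InputKeyRange implies (longs stand in for 
tokens; this is illustrative, not the CFIF code):
{noformat}
import java.util.ArrayList;
import java.util.List;

public class WrappingIntersections
{
    static final long MIN = 0, MAX = 100; // the toy ring

    static class Range
    {
        final long left, right; // (left, right]
        Range(long left, long right) { this.left = left; this.right = right; }
        public String toString() { return "(" + left + ", " + right + "]"; }
    }

    /** A wrapping range (left >= right) unrolls into two plain ranges. */
    static List<Range> unwrap(Range r)
    {
        List<Range> out = new ArrayList<Range>();
        if (r.left >= r.right)
        {
            out.add(new Range(r.left, MAX));
            out.add(new Range(MIN, r.right));
        }
        else
            out.add(r);
        return out;
    }

    /** A single split range can intersect a wrapping job range twice. */
    static List<Range> intersections(Range split, Range job)
    {
        List<Range> out = new ArrayList<Range>();
        for (Range s : unwrap(split))
            for (Range j : unwrap(job))
            {
                long l = Math.max(s.left, j.left);
                long r = Math.min(s.right, j.right);
                if (l < r)
                    out.add(new Range(l, r));
            }
        return out;
    }

    public static void main(String[] args)
    {
        // job range (80, 30] wraps; split (20, 90] intersects it twice
        System.out.println(intersections(new Range(20, 90), new Range(80, 30)));
        // -> [(80, 90], (20, 30]]
    }
}
{noformat}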

 Implement wrapping intersections for ConfigHelper's InputKeyRange
 -

 Key: CASSANDRA-3137
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3137
 Project: Cassandra
  Issue Type: Improvement
  Components: Hadoop
Affects Versions: 0.8.5
Reporter: Mck SembWever
Assignee: Mck SembWever
 Attachments: CASSANDRA-3137.patch


 Before there was no support for multiple intersections between the split's 
 range and the job's configured range.
 After CASSANDRA-3108 it is now possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3136) Allow CFIF to keep going despite unavailable ranges

2011-09-05 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13097280#comment-13097280
 ] 

Mck SembWever commented on CASSANDRA-3136:
--

Ok... it was mentioned in CASSANDRA-2388 (by Patrik Modesto), but no one there 
paid it any attention as it didn't belong to that issue.

 Allow CFIF to keep going despite unavailable ranges
 ---

 Key: CASSANDRA-3136
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3136
 Project: Cassandra
  Issue Type: Improvement
  Components: Hadoop
Reporter: Mck SembWever
Priority: Minor

 From http://thread.gmane.org/gmane.comp.db.cassandra.user/18902
 <use-case-1 from=Patrik Modesto>
 We use Cassandra as a storage for web-pages, we store the HTML, all
 URLs that has the same HTML data and some computed data. We run Hadoop
 MR jobs to compute lexical and thematical data for each page and for
 exporting the data to a binary files for later use. URL gets to a
 Cassandra on user request (a pageview) so if we delete an URL, it gets
 back quickly if the page is active. Because of that and because there
 is lots of data, we have the keyspace set to RF=1. We can drop the
 whole keyspace and it will regenerate quickly and would contain only
 fresh data, so we don't care about lossing a node.
 </use-case-1>
 <use-case-2>
 trying to extract a small random sample (like a pig SAMPLE) of data out of 
 cassandra.
 </use-case-2>
 <use-case-3>
 searching for something or some-pattern and one hit
 is enough. If you get the hit it's a positive result regardless if
 ranges were ignored, if you don't and you *know* there was a range
 ignored along the way you can re-run the job later. 
 For example such a job could be run at regular intervals in the day until a 
 hit was found.
 </use-case-3>

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3108) Make Range and Bounds objects client-safe

2011-09-05 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13097282#comment-13097282
 ] 

Mck SembWever commented on CASSANDRA-3108:
--

You drastically removed the usage of the {{Range(left, right)}} constructor so 
that even the usage of {{intersectionBothWrapping(..)}} and 
{{intersectionOneWrapping(..)}} avoids any server-side calls.

 Make Range and Bounds objects client-safe
 -

 Key: CASSANDRA-3108
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3108
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.2
Reporter: Jonathan Ellis
Assignee: Mck SembWever
  Labels: hadoop
 Fix For: 0.8.5

 Attachments: 3108.txt


 From Mck's comment on CASSANDRA-1125:
 Something broke here in production once we went out with 0.8.2. It may have 
 been some poor testing, i'm not entirely sure and a little surprised.
 CFIF:135 breaks because inside dhtRange.intersects(jobRange) there's a call 
 to new Range(token, token) which calls StorageService.getPartitioner() and 
 StorageService is null as we're not inside the server.
 A quick fix is to change Range:148 from new Range(token, token) to new 
 Range(token, token, partitioner) making the presumption that the partitioner 
 for the new Range will be the same as this Range.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-3108) Make Range and Bounds objects client-safe

2011-09-05 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13097282#comment-13097282
 ] 

Mck SembWever edited comment on CASSANDRA-3108 at 9/5/11 7:41 PM:
--

You drastically removed the usage of the {{Range(left, right)}} constructor so 
that even the usage of {{intersectionBothWrapping(..)}} and 
{{intersectionOneWrapping(..)}} avoids any server-side calls.

In CFIF there AFAIK doesn't seem to be any other limitation on wrapping ranges...

  was (Author: michaelsembwever):
You drastically removed the usage of the {{Range(left, right)}} constructor 
so that even the usage of {{intersectionBothWrapping(..)}} and 
{{intersectionOneWrapping(..)}} avoids any server-side calls.

It CFIF there AFAIK doesn't seem any other limitation to wrapping ranges...
  
 Make Range and Bounds objects client-safe
 -

 Key: CASSANDRA-3108
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3108
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.2
Reporter: Jonathan Ellis
Assignee: Mck SembWever
  Labels: hadoop
 Fix For: 0.8.5

 Attachments: 3108.txt


 From Mck's comment on CASSANDRA-1125:
 Something broke here in production once we went out with 0.8.2. It may have 
 been some poor testing, i'm not entirely sure and a little surprised.
 CFIF:135 breaks because inside dhtRange.intersects(jobRange) there's a call 
 to new Range(token, token) which calls StorageService.getPartitioner() and 
 StorageService is null as we're not inside the server.
 A quick fix is to change Range:148 from new Range(token, token) to new 
 Range(token, token, partitioner) making the presumption that the partitioner 
 for the new Range will be the same as this Range.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-3108) Make Range and Bounds objects client-safe

2011-09-05 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13097282#comment-13097282
 ] 

Mck SembWever edited comment on CASSANDRA-3108 at 9/5/11 7:41 PM:
--

You drastically removed the usage of the {{Range(left, right)}} constructor so 
that even the usage of {{intersectionBothWrapping(..)}} and 
{{intersectionOneWrapping(..)}} avoids any server-side calls.

It CFIF there AFAIK doesn't seem any other limitation to wrapping ranges...

  was (Author: michaelsembwever):
You drastically removed the usage of the {{Range(left, right)}} constructor 
so that even the usage of {{intersectionBothWrapping(..)}} and 
{{intersectionOneWrapping(..)}} avoids any server-side calls.
  
 Make Range and Bounds objects client-safe
 -

 Key: CASSANDRA-3108
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3108
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.2
Reporter: Jonathan Ellis
Assignee: Mck SembWever
  Labels: hadoop
 Fix For: 0.8.5

 Attachments: 3108.txt


 From Mck's comment on CASSANDRA-1125:
 Something broke here in production once we went out with 0.8.2. It may have 
 been some poor testing, i'm not entirely sure and a little surprised.
 CFIF:135 breaks because inside dhtRange.intersects(jobRange) there's a call 
 to new Range(token, token) which calls StorageService.getPartitioner() and 
 StorageService is null as we're not inside the server.
 A quick fix is to change Range:148 from new Range(token, token) to new 
 Range(token, token, partitioner) making the presumption that the partitioner 
 for the new Range will be the same as this Range.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2011-09-01 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094658#comment-13094658
 ] 

Mck SembWever edited comment on CASSANDRA-2388 at 9/1/11 6:39 AM:
--

see last comment. (say if this should be a separate bug...)

Maybe hadoop's task allocation isn't working properly because i've an 
unbalanced ring (i'm working in parallel to fix that).
If this is the case i think it's an unfortunate limitation (the ring must be 
balanced to get any decent hadoop performance).
It's also quite likely when using {{ConfigHelper.setInputRange(..)}} that the 
number of nodes involved is small (approaching RF).
With the default hadoop scheduler your hadoop cluster is occupied while just a 
few taskTrackers are busy. Of course switching to FairScheduler will help some 
here.

I'll take a look into hadoop's task allocation code as well...
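
For reference, the split/location relationship in question, sketched (the real 
class is org.apache.cassandra.hadoop.ColumnFamilySplit; fields and bodies here 
are illustrative):
{noformat}
import java.io.IOException;
import org.apache.hadoop.mapreduce.InputSplit;

public class LocationsSketch extends InputSplit
{
    private final String startToken, endToken;
    private final String[] dataNodes; // currently hostnames; should they be IPs?

    public LocationsSketch(String startToken, String endToken, String[] dataNodes)
    {
        this.startToken = startToken;
        this.endToken = endToken;
        this.dataNodes = dataNodes;
    }

    @Override
    public String[] getLocations() throws IOException
    {
        // one entry per replica owning (startToken, endToken]. Hadoop treats
        // these only as a scheduling *preference*, not a requirement -- which
        // is exactly the behaviour being questioned above.
        return dataNodes;
    }

    @Override
    public long getLength() throws IOException
    {
        return Long.MAX_VALUE; // the row count for a range isn't cheap to know
    }
}
{noformat}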

  was (Author: michaelsembwever):
see last comment. (say if this should be a separate bug...)
  
 ColumnFamilyRecordReader fails for a given split because a host is down, even 
 if records could reasonably be read from other replica.
 -

 Key: CASSANDRA-2388
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.6
Reporter: Eldon Stegall
Assignee: Mck SembWever
  Labels: hadoop, inputformat
 Fix For: 0.7.9, 0.8.5

 Attachments: 0002_On_TException_try_next_split.patch, 
 CASSANDRA-2388-addition1.patch, CASSANDRA-2388-extended.patch, 
 CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, 
 CASSANDRA-2388.patch


 ColumnFamilyRecordReader only tries the first location for a given split. We 
 should try multiple locations for a given split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Reopened] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2011-08-31 Thread Mck SembWever (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mck SembWever reopened CASSANDRA-2388:
--


see last comment. (say if this should be a separate bug...)

 ColumnFamilyRecordReader fails for a given split because a host is down, even 
 if records could reasonably be read from other replica.
 -

 Key: CASSANDRA-2388
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.6
Reporter: Eldon Stegall
Assignee: Mck SembWever
  Labels: hadoop, inputformat
 Fix For: 0.7.9, 0.8.5

 Attachments: 0002_On_TException_try_next_split.patch, 
 CASSANDRA-2388-addition1.patch, CASSANDRA-2388-extended.patch, 
 CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, 
 CASSANDRA-2388.patch


 ColumnFamilyRecordReader only tries the first location for a given split. We 
 should try multiple locations for a given split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Reopened] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query (using a KeyRange)

2011-08-30 Thread Mck SembWever (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mck SembWever reopened CASSANDRA-1125:
--


Something broke here in production once we went out with 0.8.2. It may have 
been some poor testing, i'm not entirely sure and a little surprised.

CFIF:135 breaks because inside {{dhtRange.intersects(jobRange)}} there's a call 
to {{new Range(token, token)}} which calls {{StorageService.getPartitioner()}} 
and StorageService is null as we're not inside the server. 

A quick fix (tested) is to change Range:148 from {{new Range(token, token)}} to 
{{new Range(token, token, partitioner)}} making the presumption that the 
partitioner for the new Range will be the same as this Range.


 Filter out ColumnFamily rows that aren't part of the query (using a KeyRange)
 -

 Key: CASSANDRA-1125
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Mck SembWever
Priority: Minor
 Fix For: 0.8.2

 Attachments: 1125-formatted.txt, 1125-v3.txt, CASSANDRA-1125.patch, 
 CASSANDRA-1125.patch


 Currently, when running a MapReduce job against data in a Cassandra data 
 store, it reads through all the data for a particular ColumnFamily.  This 
 could be optimized to only read through those rows that have to do with the 
 query.
 It's a small change but wanted to put it in Jira so that it didn't fall 
 through the cracks.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query (using a KeyRange)

2011-08-30 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094036#comment-13094036
 ] 

Mck SembWever edited comment on CASSANDRA-1125 at 8/30/11 8:02 PM:
---

Something broke here in production once we went out with 0.8.2. It may have 
been some poor testing, i'm not entirely sure and a little surprised.

CFIF:135 breaks because inside {{dhtRange.intersects(jobRange)}} there's a call 
to {{new Range(token, token)}} which calls {{StorageService.getPartitioner()}} 
and StorageService is null as we're not inside the server. 

A quick fix is to change Range:148 from {{new Range(token, token)}} to {{new 
Range(token, token, partitioner)}} making the presumption that the partitioner 
for the new Range will be the same as this Range.


  was (Author: michaelsembwever):
Something broke here in production once we went out with 0.8.2. It may have 
been some poor testing, i'm not entirely sure and a little surprised.

CFIF:135 breaks because inside {{dhtRange.intersects(jobRange)}} there's a call 
to {{new Range(token, token)}} which calls {{StorageService.getPartitioner()}} 
and StorageService is null as we're not inside the server. 

A quick fix (tested) is to change Range:148 from {{new Range(token, token)}} to 
{{new Range(token, token, partitioner)}} making the presumption that the 
partitioner for the new Range will be the same as this Range.

  
 Filter out ColumnFamily rows that aren't part of the query (using a KeyRange)
 -

 Key: CASSANDRA-1125
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Mck SembWever
Priority: Minor
 Fix For: 0.8.2

 Attachments: 1125-formatted.txt, 1125-v3.txt, CASSANDRA-1125.patch, 
 CASSANDRA-1125.patch


 Currently, when running a MapReduce job against data in a Cassandra data 
 store, it reads through all the data for a particular ColumnFamily.  This 
 could be optimized to only read through those rows that have to do with the 
 query.
 It's a small change but wanted to put it in Jira so that it didn't fall 
 through the cracks.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query (using a KeyRange)

2011-08-30 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094036#comment-13094036
 ] 

Mck SembWever edited comment on CASSANDRA-1125 at 8/30/11 8:55 PM:
---

Something broke here in production once we went out with 0.8.2. It may have 
been some poor testing, i'm not entirely sure and a little surprised.

CFIF:135 breaks because inside {{dhtRange.intersects(jobRange)}} there's a call 
to {{new Range(token, token)}} which calls {{StorageService.getPartitioner()}} 
and StorageService is null as we're not inside the server. 

A quick fix is to change Range:148 from {{new Range(token, token)}} to {{new 
Range(token, token, partitioner)}} making the presumption that the partitioner 
for the new Range will be the same as this Range. This won't work if the Range 
wraps in any way (which could be just a limitation of the current KeyRange 
filtering), but otherwise tests ok.


  was (Author: michaelsembwever):
Something broke here in production once we went out with 0.8.2. It may have 
been some poor testing, i'm not entirely sure and a little surprised.

CFIF:135 breaks because inside {{dhtRange.intersects(jobRange)}} there's a call 
to {{new Range(token, token)}} which calls {{StorageService.getPartitioner()}} 
and StorageService is null as we're not inside the server. 

A quick fix is to change Range:148 from {{new Range(token, token)}} to {{new 
Range(token, token, partitioner)}} making the presumption that the partitioner 
for the new Range will be the same as this Range.

  
 Filter out ColumnFamily rows that aren't part of the query (using a KeyRange)
 -

 Key: CASSANDRA-1125
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Mck SembWever
Priority: Minor
 Fix For: 0.8.2

 Attachments: 1125-formatted.txt, 1125-v3.txt, CASSANDRA-1125.patch, 
 CASSANDRA-1125.patch


 Currently, when running a MapReduce job against data in a Cassandra data 
 store, it reads through all the data for a particular ColumnFamily.  This 
 could be optimized to only read through those rows that have to do with the 
 query.
 It's a small change but wanted to put it in Jira so that it didn't fall 
 through the cracks.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2011-08-30 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094097#comment-13094097
 ] 

Mck SembWever commented on CASSANDRA-2388:
--

This approach isn't really working for me and was committed too quickly i 
believe.

bq. Although the documentation on inputSplit.getLocations() is a little thin as 
to whether this restricts which trackers it should run on or whether is just a 
preference

Tasks are still being evenly distributed around the ring regardless of what the 
ColumnFamilySplit.locations is.

The chance of a task actually working is RF/N: the task has to land on one of 
the RF nodes holding that split's data, out of N in the cluster (e.g. 1-in-3 
with RF=3 on a 9 node ring). Therefore the chances of a blacklisted node are 
high. Worse is that the whole ring can quickly become blacklisted.

 ColumnFamilyRecordReader fails for a given split because a host is down, even 
 if records could reasonably be read from other replica.
 -

 Key: CASSANDRA-2388
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.6
Reporter: Eldon Stegall
Assignee: Mck SembWever
  Labels: hadoop, inputformat
 Fix For: 0.7.9, 0.8.5

 Attachments: 0002_On_TException_try_next_split.patch, 
 CASSANDRA-2388-addition1.patch, CASSANDRA-2388-extended.patch, 
 CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, 
 CASSANDRA-2388.patch


 ColumnFamilyRecordReader only tries the first location for a given split. We 
 should try multiple locations for a given split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3108) Make Range and Bounds objects client-safe

2011-08-30 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094104#comment-13094104
 ] 

Mck SembWever commented on CASSANDRA-3108:
--

Tested in production. 
+1

 Make Range and Bounds objects client-safe
 -

 Key: CASSANDRA-3108
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3108
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.2
Reporter: Jonathan Ellis
Assignee: Jonathan Ellis
  Labels: hadoop
 Fix For: 0.8.5

 Attachments: 3108.txt


 From Mck's comment on CASSANDRA-1125:
 Something broke here in production once we went out with 0.8.2. It may have 
 been some poor testing, i'm not entirely sure and a little surprised.
 CFIF:135 breaks because inside dhtRange.intersects(jobRange) there's a call 
 to new Range(token, token) which calls StorageService.getPartitioner() and 
 StorageService is null as we're not inside the server.
 A quick fix is to change Range:148 from new Range(token, token) to new 
 Range(token, token, partitioner) making the presumption that the partitioner 
 for the new Range will be the same as this Range.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2011-08-30 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13094097#comment-13094097
 ] 

Mck SembWever edited comment on CASSANDRA-2388 at 8/30/11 9:22 PM:
---

This approach isn't really working for me and was committed too quickly i 
believe.

bq. Although the documentation on inputSplit.getLocations() is a little thin as 
to whether this restricts which trackers it should run on or whether is just a 
preference

Tasks are still being evenly distributed around the ring regardless of what the 
ColumnFamilySplit.locations is.

The chance of a task actually working is RF/N: the task has to land on one of 
the RF nodes holding that split's data, out of N in the cluster (e.g. 1-in-3 
with RF=3 on a 9 node ring). Therefore the chances of a blacklisted node are 
high. Worse is that the whole ring can quickly become blacklisted.

http://abel-perez.com/hadoop-task-assignment has an interesting section 
explaining how the task assignment is supposed to work (and that data locality 
is preferred but not a requirement). Could ColumnFamilySplit.locations be in 
the wrong format? (e.g. should they be IPs, not hostnames?)

  was (Author: michaelsembwever):
This approach isn't really working for me and was committed too quickly i 
believe.

bq. Although the documentation on inputSplit.getLocations() is a little thin as 
to whether this restricts which trackers it should run on or whether is just a 
preference

Tasks are still being evenly distributed around the ring regardless of what the 
ColumnFamilySplit.locations is.

The chance of a task actually working is RF/N. Therefore the chances of a 
blacklisted node are high. Worse is that the whole ring can quickly become 
blacklisted.
  
 ColumnFamilyRecordReader fails for a given split because a host is down, even 
 if records could reasonably be read from other replica.
 -

 Key: CASSANDRA-2388
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.6
Reporter: Eldon Stegall
Assignee: Mck SembWever
  Labels: hadoop, inputformat
 Fix For: 0.7.9, 0.8.5

 Attachments: 0002_On_TException_try_next_split.patch, 
 CASSANDRA-2388-addition1.patch, CASSANDRA-2388-extended.patch, 
 CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, 
 CASSANDRA-2388.patch


 ColumnFamilyRecordReader only tries the first location for a given split. We 
 should try multiple locations for a given split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1034) Remove assumption that Key to Token is one-to-one

2011-08-20 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088280#comment-13088280
 ] 

Mck SembWever commented on CASSANDRA-1034:
--

What's the status on this? This issue and its relations back to CASSANDRA-2878 
are the only reason we're using OPP. I suspect other users set up with both 
cassandra and hadoop (or brisk) could be in the same boat. Not only does OPP 
leave an unbalanced ring (i've had a case where all data went to one node 
because the keys/tokens were longer than normal), it also gives hadoop jobs 
poor performance, as tasks' requirement on data locality has become stricter 
(w/ CASSANDRA-2388).

 Remove assumption that Key to Token is one-to-one
 -

 Key: CASSANDRA-1034
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1034
 Project: Cassandra
  Issue Type: Bug
Reporter: Stu Hood
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-Make-range-accept-both-Token-and-DecoratedKey.patch, 
 0002-LengthPartitioner.patch, 1034-1-Generify-AbstractBounds-v3.patch, 
 1034-2-Remove-assumption-that-token-and-keys-are-one-to-one-v3.patch, 
 1034_v1.txt


 get_range_slices assumes that Tokens do not collide and converts a KeyRange 
 to an AbstractBounds. For RandomPartitioner, this assumption isn't safe, and 
 would lead to a very weird heisenberg.
 Converting AbstractBounds to use a DecoratedKey would solve this, because the 
 byte[] key portion of the DecoratedKey can act as a tiebreaker. 
 Alternatively, we could make DecoratedKey extend Token, and then use 
 DecoratedKeys in places where collisions are unacceptable.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-1034) Remove assumption that Key to Token is one-to-one

2011-08-20 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088280#comment-13088280
 ] 

Mck SembWever edited comment on CASSANDRA-1034 at 8/20/11 10:10 PM:


What's the status on this? This issue and its relations back to CASSANDRA-2878 
are the only reason we're using OPP. I suspect other users set up with both 
cassandra and hadoop (or brisk) could be in the same boat. Not only does OPP 
leave an unbalanced ring (i've had a case where all data went to one node 
because the keys/tokens were longer than normal), it also leaves hadoop jobs 
with poor performance, as tasks' requirement on data locality has become 
stricter (w/ CASSANDRA-2388). That's apart from the plain preference for using 
secondary indexes over OPP.

  was (Author: michaelsembwever):
What's the status on this? This issue and its relations back to 
CASSANDRA-2878 are the only reason we're using OPP. I suspect other users setup 
with both cassandra and hadoop (or brisk) could be in the same boat. Not only 
does OPP leave an unbalanced ring (i've had a case where all data went to one 
node because the keys/tokens were longer than normal) it leaves poor 
performance to hadoop jobs as tasks requirement on data locality has become 
stricter (w/ CASSANDRA-2388).
  
 Remove assumption that Key to Token is one-to-one
 -

 Key: CASSANDRA-1034
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1034
 Project: Cassandra
  Issue Type: Bug
Reporter: Stu Hood
Assignee: Sylvain Lebresne
Priority: Minor
 Fix For: 1.1

 Attachments: 
 0001-Make-range-accept-both-Token-and-DecoratedKey.patch, 
 0002-LengthPartitioner.patch, 1034-1-Generify-AbstractBounds-v3.patch, 
 1034-2-Remove-assumption-that-token-and-keys-are-one-to-one-v3.patch, 
 1034_v1.txt


 get_range_slices assumes that Tokens do not collide and converts a KeyRange 
 to an AbstractBounds. For RandomPartitioner, this assumption isn't safe, and 
 would lead to a very weird heisenberg.
 Converting AbstractBounds to use a DecoratedKey would solve this, because the 
 byte[] key portion of the DecoratedKey can act as a tiebreaker. 
 Alternatively, we could make DecoratedKey extend Token, and then use 
 DecoratedKeys in places where collisions are unacceptable.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query

2011-07-10 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13062812#comment-13062812
 ] 

Mck SembWever commented on CASSANDRA-1125:
--

+1 (tested) on 1125-v3.txt

 Filter out ColumnFamily rows that aren't part of the query
 --

 Key: CASSANDRA-1125
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Mck SembWever
Priority: Minor
 Fix For: 1.0

 Attachments: 1125-formatted.txt, 1125-v3.txt, CASSANDRA-1125.patch, 
 CASSANDRA-1125.patch


 Currently, when running a MapReduce job against data in a Cassandra data 
 store, it reads through all the data for a particular ColumnFamily.  This 
 could be optimized to only read through those rows that have to do with the 
 query.
 It's a small change but wanted to put it in Jira so that it didn't fall 
 through the cracks.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query (using a KeyRange)

2011-07-10 Thread Mck SembWever (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mck SembWever updated CASSANDRA-1125:
-

Summary: Filter out ColumnFamily rows that aren't part of the query (using 
a KeyRange)  (was: Filter out ColumnFamily rows that aren't part of the query)

 Filter out ColumnFamily rows that aren't part of the query (using a KeyRange)
 -

 Key: CASSANDRA-1125
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Mck SembWever
Priority: Minor
 Fix For: 1.0

 Attachments: 1125-formatted.txt, 1125-v3.txt, CASSANDRA-1125.patch, 
 CASSANDRA-1125.patch


 Currently, when running a MapReduce job against data in a Cassandra data 
 store, it reads through all the data for a particular ColumnFamily.  This 
 could be optimized to only read through those rows that have to do with the 
 query.
 It's a small change but wanted to put it in Jira so that it didn't fall 
 through the cracks.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-2878) Filter out ColumnFamily rows that aren't part of the query (using a IndexClause)

2011-07-10 Thread Mck SembWever (JIRA)
Filter out ColumnFamily rows that aren't part of the query (using a IndexClause)


 Key: CASSANDRA-2878
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2878
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Mck SembWever
Assignee: Mck SembWever
Priority: Minor
 Fix For: 1.0


Currently, when running a MapReduce job against data in a Cassandra data store, 
it reads through all the data for a particular ColumnFamily.  This could be 
optimized to only read through those rows that have to do with the query.

It's a small change but wanted to put it in Jira so that it didn't fall through 
the cracks.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query (using a KeyRange)

2011-07-10 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13062814#comment-13062814
 ] 

Mck SembWever commented on CASSANDRA-1125:
--

Created CASSANDRA-2878 for the better solution using an IndexClause.
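
(For context, a hypothetical sketch of what such an IndexClause looks like at 
the thrift level; the column name and value below are made up:)

{noformat}
// Hypothetical example (imports: org.apache.cassandra.thrift.*,
// org.apache.cassandra.utils.ByteBufferUtil). Restricts input rows to those
// matching a secondary-index expression instead of scanning the whole CF.
IndexClause clause = new IndexClause();
clause.setStart_key(ByteBufferUtil.EMPTY_BYTE_BUFFER); // scan from the start
clause.setCount(100);                                  // page size
clause.addToExpressions(new IndexExpression(
        ByteBufferUtil.bytes("birth_year"), // an indexed column (made up)
        IndexOperator.EQ,
        ByteBufferUtil.bytes(1985L)));
{noformat}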

 Filter out ColumnFamily rows that aren't part of the query (using a KeyRange)
 -

 Key: CASSANDRA-1125
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Mck SembWever
Priority: Minor
 Fix For: 1.0

 Attachments: 1125-formatted.txt, 1125-v3.txt, CASSANDRA-1125.patch, 
 CASSANDRA-1125.patch


 Currently, when running a MapReduce job against data in a Cassandra data 
 store, it reads through all the data for a particular ColumnFamily.  This 
 could be optimized to only read through those rows that have to do with the 
 query.
 It's a small change but wanted to put it in Jira so that it didn't fall 
 through the cracks.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2011-07-04 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13059334#comment-13059334
 ] 

Mck SembWever commented on CASSANDRA-2388:
--

{quote}2) If we ARE in that situation, the right solution would be to send 
the job to a TT whose local replica IS live, not to read the data from a 
nonlocal replica. How can we signal that?{quote}To /really/ solve this issue, 
could we do the following? 
In CFIF.getRangeMap(), take out of each range any endpoints that are not alive. 
A client connection already exists in this method. Filtering out dead endpoints 
wouldn't be difficult, and it would move tasks *to* the data, making use of 
replicas. This approach does need a new method in cassandra.thrift, eg 
{{list<string> describe_alive_nodes()}}
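
(A minimal sketch of that idea; {{describe_alive_nodes()}} is the proposed, 
not-yet-existing thrift method, so all of this is hypothetical:)

{noformat}
// Hypothetical sketch for CFIF.getRangeMap() (imports: java.util.*,
// org.apache.cassandra.thrift.*, org.apache.thrift.TException).
// Drops dead endpoints from each token range so tasks only target replicas
// that are actually alive.
private static List<TokenRange> filterDeadEndpoints(Cassandra.Client client,
                                                    List<TokenRange> ranges)
        throws TException
{
    Set<String> alive = new HashSet<String>(client.describe_alive_nodes());
    for (TokenRange range : ranges)
    {
        List<String> live = new ArrayList<String>();
        for (String endpoint : range.endpoints)
            if (alive.contains(endpoint))
                live.add(endpoint);
        range.setEndpoints(live);
    }
    return ranges;
}
{noformat}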

 ColumnFamilyRecordReader fails for a given split because a host is down, even 
 if records could reasonably be read from other replica.
 -

 Key: CASSANDRA-2388
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.7.6, 0.8.0
Reporter: Eldon Stegall
Assignee: Jeremy Hanna
  Labels: hadoop, inputformat
 Fix For: 0.7.7, 0.8.2

 Attachments: 0002_On_TException_try_next_split.patch, 
 CASSANDRA-2388-addition1.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, 
 CASSANDRA-2388.patch, CASSANDRA-2388.patch


 ColumnFamilyRecordReader only tries the first location for a given split. We 
 should try multiple locations for a given split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query

2011-07-04 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13059401#comment-13059401
 ] 

Mck SembWever commented on CASSANDRA-1125:
--

bq. using KeyRange but with tokens (which Thrift also uses for start-exclusive)
this is my preference. i'll make a patch for it.
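
(Hypothetically, the thrift side of such a patch might look like the sketch 
below; field names follow cassandra.thrift's KeyRange, token values are made up:)

{noformat}
// Sketch: a KeyRange restricted by tokens rather than keys.
// start_token is exclusive, matching Thrift's start-exclusive semantics.
KeyRange tokenRange = new KeyRange(4096);  // count: max rows per page
tokenRange.setStart_token("42");           // exclusive
tokenRange.setEnd_token("10000");          // inclusive
{noformat}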

 Filter out ColumnFamily rows that aren't part of the query
 --

 Key: CASSANDRA-1125
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Mck SembWever
Priority: Minor
 Fix For: 1.0

 Attachments: 1125-formatted.txt, CASSANDRA-1125.patch


 Currently, when running a MapReduce job against data in a Cassandra data 
 store, it reads through all the data for a particular ColumnFamily.  This 
 could be optimized to only read through those rows that have to do with the 
 query.
 It's a small change but wanted to put it in Jira so that it didn't fall 
 through the cracks.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2011-07-04 Thread Mck SembWever (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mck SembWever updated CASSANDRA-2388:
-

Attachment: CASSANDRA-2388-extended.patch

 ColumnFamilyRecordReader fails for a given split because a host is down, even 
 if records could reasonably be read from other replica.
 -

 Key: CASSANDRA-2388
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.7.6, 0.8.0
Reporter: Eldon Stegall
Assignee: Jeremy Hanna
  Labels: hadoop, inputformat
 Fix For: 0.7.7, 0.8.2

 Attachments: 0002_On_TException_try_next_split.patch, 
 CASSANDRA-2388-addition1.patch, CASSANDRA-2388-extended.patch, 
 CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, 
 CASSANDRA-2388.patch


 ColumnFamilyRecordReader only tries the first location for a given split. We 
 should try multiple locations for a given split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query

2011-07-04 Thread Mck SembWever (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mck SembWever updated CASSANDRA-1125:
-

Attachment: CASSANDRA-1125.patch

 Filter out ColumnFamily rows that aren't part of the query
 --

 Key: CASSANDRA-1125
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Mck SembWever
Priority: Minor
 Fix For: 1.0

 Attachments: 1125-formatted.txt, CASSANDRA-1125.patch, 
 CASSANDRA-1125.patch


 Currently, when running a MapReduce job against data in a Cassandra data 
 store, it reads through all the data for a particular ColumnFamily.  This 
 could be optimized to only read through those rows that have to do with the 
 query.
 It's a small change but wanted to put it in Jira so that it didn't fall 
 through the cracks.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2011-07-02 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13059134#comment-13059134
 ] 

Mck SembWever edited comment on CASSANDRA-2388 at 7/2/11 10:08 PM:
---

The idea is to set up splits to have only endpoints that are valid trackers. 
But now i see this is just a brainfart :-) Ofc the jobTracker will apply this 
match for us, and CFIF was always 'restricted' to running on endpoints. 
Although the documentation on inputSplit.getLocations() is a little thin as to 
whether this restricts which trackers it should run on or whether it is just a 
preference... I guess it doesn't matter; as you point out Jonathan, all that's 
required here is the one line changed in CFRR.



  was (Author: michaelsembwever):
The idea is to setup splits to have only endpoints that are valid trackers. 
But now i see this is just a brainfart :-) Ofc the jobTracker will apply this 
match for us. And that CFIF was always 'restricted' to running on endpoints. 
Although the documentation on inputSplit.getLocations() is a little thin as to 
whether this restricts which trackers it should run on or whether is just a 
recommendation... I guess it doesn't matter, as you point out Jonathan all 
that's required here is the one line changed in CFRR.


  
 ColumnFamilyRecordReader fails for a given split because a host is down, even 
 if records could reasonably be read from other replica.
 -

 Key: CASSANDRA-2388
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.7.6, 0.8.0
Reporter: Eldon Stegall
Assignee: Jeremy Hanna
  Labels: hadoop, inputformat
 Fix For: 0.7.7, 0.8.2

 Attachments: 0002_On_TException_try_next_split.patch, 
 CASSANDRA-2388-addition1.patch, CASSANDRA-2388-local-nodes-only.rough-sketch, 
 CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch


 ColumnFamilyRecordReader only tries the first location for a given split. We 
 should try multiple locations for a given split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2011-07-02 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13059134#comment-13059134
 ] 

Mck SembWever commented on CASSANDRA-2388:
--

The idea is to set up splits to have only endpoints that are valid trackers. 
But now i see this is just a brainfart :-) Ofc the jobTracker will apply this 
match for us, and CFIF was always 'restricted' to running on endpoints. 
Although the documentation on inputSplit.getLocations() is a little thin as to 
whether this restricts which trackers it should run on or whether it is just a 
recommendation... I guess it doesn't matter; as you point out Jonathan, all 
that's required here is the one line changed in CFRR.



 ColumnFamilyRecordReader fails for a given split because a host is down, even 
 if records could reasonably be read from other replica.
 -

 Key: CASSANDRA-2388
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.7.6, 0.8.0
Reporter: Eldon Stegall
Assignee: Jeremy Hanna
  Labels: hadoop, inputformat
 Fix For: 0.7.7, 0.8.2

 Attachments: 0002_On_TException_try_next_split.patch, 
 CASSANDRA-2388-addition1.patch, CASSANDRA-2388-local-nodes-only.rough-sketch, 
 CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch


 ColumnFamilyRecordReader only tries the first location for a given split. We 
 should try multiple locations for a given split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2011-07-02 Thread Mck SembWever (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mck SembWever updated CASSANDRA-2388:
-

Attachment: (was: CASSANDRA-2388-local-nodes-only.rough-sketch)

 ColumnFamilyRecordReader fails for a given split because a host is down, even 
 if records could reasonably be read from other replica.
 -

 Key: CASSANDRA-2388
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.7.6, 0.8.0
Reporter: Eldon Stegall
Assignee: Jeremy Hanna
  Labels: hadoop, inputformat
 Fix For: 0.7.7, 0.8.2

 Attachments: 0002_On_TException_try_next_split.patch, 
 CASSANDRA-2388-addition1.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, 
 CASSANDRA-2388.patch, CASSANDRA-2388.patch


 ColumnFamilyRecordReader only tries the first location for a given split. We 
 should try multiple locations for a given split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2011-07-02 Thread Mck SembWever (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mck SembWever updated CASSANDRA-2388:
-

Attachment: CASSANDRA-2388.patch

 ColumnFamilyRecordReader fails for a given split because a host is down, even 
 if records could reasonably be read from other replica.
 -

 Key: CASSANDRA-2388
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.7.6, 0.8.0
Reporter: Eldon Stegall
Assignee: Jeremy Hanna
  Labels: hadoop, inputformat
 Fix For: 0.7.7, 0.8.2

 Attachments: 0002_On_TException_try_next_split.patch, 
 CASSANDRA-2388-addition1.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, 
 CASSANDRA-2388.patch, CASSANDRA-2388.patch


 ColumnFamilyRecordReader only tries the first location for a given split. We 
 should try multiple locations for a given split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira





[jira] [Commented] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2011-07-02 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13059136#comment-13059136
 ] 

Mck SembWever commented on CASSANDRA-2388:
--

the new one-liner CASSANDRA-2388.patch is attached. i'll submit the patch once 
i've tested it some...

 ColumnFamilyRecordReader fails for a given split because a host is down, even 
 if records could reasonably be read from other replica.
 -

 Key: CASSANDRA-2388
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.7.6, 0.8.0
Reporter: Eldon Stegall
Assignee: Jeremy Hanna
  Labels: hadoop, inputformat
 Fix For: 0.7.7, 0.8.2

 Attachments: 0002_On_TException_try_next_split.patch, 
 CASSANDRA-2388-addition1.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, 
 CASSANDRA-2388.patch, CASSANDRA-2388.patch


 ColumnFamilyRecordReader only tries the first location for a given split. We 
 should try multiple locations for a given split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2011-06-30 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13057659#comment-13057659
 ] 

Mck SembWever commented on CASSANDRA-2388:
--

Then i would hope for two separate InputFormats: one optimised for local node 
connections, where cassandra is deemed the more important system over hadoop, 
and another where data can be read in from anywhere. I think the latter should 
be supported in some manner, since users may not always have the possibility to 
install hadoop and cassandra on the same servers, or they might not think it so 
critical a part (eg if CFIF is reading using an IndexClause the input data set 
might be quite small, and the remaining code in the m/r job may be the bulk of 
the processing...)

 ColumnFamilyRecordReader fails for a given split because a host is down, even 
 if records could reasonably be read from other replica.
 -

 Key: CASSANDRA-2388
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.7.6, 0.8.0
Reporter: Eldon Stegall
Assignee: Jeremy Hanna
  Labels: hadoop, inputformat
 Fix For: 0.7.7, 0.8.2

 Attachments: 0002_On_TException_try_next_split.patch, 
 CASSANDRA-2388-addition1.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, 
 CASSANDRA-2388.patch


 ColumnFamilyRecordReader only tries the first location for a given split. We 
 should try multiple locations for a given split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2011-06-30 Thread Mck SembWever (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mck SembWever updated CASSANDRA-2388:
-

Attachment: CASSANDRA-2388-local-nodes-only.rough-sketch

Is CASSANDRA-2388-local-nodes-only.rough-sketch the direction we want then?

This is very initial code: i can't get {{new 
JobClient(JobTracker.getAddress(conf), 
conf).getClusterStatus().getActiveTrackerNames()}} to work, and need a little 
help here.
(Also, CFRR.getLocations() can be drastically reduced.)
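
(For reference, a self-contained sketch of that call against a 0.20-era mapred 
API; untested here, which is rather the point. One possible gotcha: the no-arg 
{{getClusterStatus()}} may return a non-detailed status without tracker names, 
so the boolean overload is used below:)

{noformat}
import java.io.IOException;
import java.util.Collection;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.ClusterStatus;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobTracker;

public class ActiveTrackers
{
    // Ask the JobTracker for the names of its live TaskTrackers.
    public static Collection<String> get(Configuration conf) throws IOException
    {
        JobClient client = new JobClient(JobTracker.getAddress(conf), conf);
        ClusterStatus status = client.getClusterStatus(true); // detailed
        return status.getActiveTrackerNames();
    }
}
{noformat}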

 ColumnFamilyRecordReader fails for a given split because a host is down, even 
 if records could reasonably be read from other replica.
 -

 Key: CASSANDRA-2388
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.7.6, 0.8.0
Reporter: Eldon Stegall
Assignee: Jeremy Hanna
  Labels: hadoop, inputformat
 Fix For: 0.7.7, 0.8.2

 Attachments: 0002_On_TException_try_next_split.patch, 
 CASSANDRA-2388-addition1.patch, CASSANDRA-2388-local-nodes-only.rough-sketch, 
 CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch


 ColumnFamilyRecordReader only tries the first location for a given split. We 
 should try multiple locations for a given split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2011-06-29 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13057002#comment-13057002
 ] 

Mck SembWever edited comment on CASSANDRA-2388 at 6/29/11 6:31 AM:
---

This does happen already (i've seen it while testing initial patches that were 
no good).
Problem is that the TT is blacklisted, reducing hadoop's throughput for all 
jobs running.
I bet too that a fallback to a replica is faster than a fallback to another TT.

On a side note, there is no guarantee that any given TT will have its split 
accessible via a local c* node - this is only a preference in CFRR. A failed 
job may just as likely got to a random c* node. At least now we can actually 
properly limit to the one DC and sort by proximity. 

One thing we're not doing here is applying this same DC limit and sort by 
proximity in the case when there isn't a localhost preference. See 
CFRR.initialize(..)
It would make sense to rewrite CFRR.getLocations(..) to
{noformat}private Iterator<String> getLocations(final Configuration conf) 
throws IOException
{
return new SplitEndpointIterator(conf);
}{noformat} and then to move the finding-a-preference-to-localhost code 
into SplitEndpointIterator...
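
(Hypothetically, such a SplitEndpointIterator might look like the sketch below; 
the class is only proposed here, so everything about it, including how it 
receives the split's endpoints, is illustrative:)

{noformat}
// Illustrative sketch only (imports: java.net.*, java.util.*).
// Yields the split's endpoints, localhost first when it is among them;
// in CFRR the endpoints would come from the split itself.
private static class SplitEndpointIterator implements Iterator<String>
{
    private final Iterator<String> endpoints;

    SplitEndpointIterator(List<String> splitEndpoints) throws UnknownHostException
    {
        String local = InetAddress.getLocalHost().getHostAddress();
        List<String> ordered = new ArrayList<String>(splitEndpoints);
        if (ordered.remove(local))
            ordered.add(0, local); // the finding-a-preference-to-localhost code
        this.endpoints = ordered.iterator();
    }

    public boolean hasNext() { return endpoints.hasNext(); }
    public String next() { return endpoints.next(); }
    public void remove() { throw new UnsupportedOperationException(); }
}
{noformat}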

  was (Author: michaelsembwever):
This does happen already (i've seen it while testing initial patches that 
were no good).
Problem is that the TT is blacklisted, reducing hadoop's throughput for all 
jobs running.
I bet too that a fallback to a replica is faster than  a fallback to another 
TT. For example a c* node may die in the middle of a TT...
  
 ColumnFamilyRecordReader fails for a given split because a host is down, even 
 if records could reasonably be read from other replica.
 -

 Key: CASSANDRA-2388
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.7.6, 0.8.0
Reporter: Eldon Stegall
Assignee: Jeremy Hanna
  Labels: hadoop, inputformat
 Fix For: 0.7.7, 0.8.2

 Attachments: 0002_On_TException_try_next_split.patch, 
 CASSANDRA-2388-addition1.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, 
 CASSANDRA-2388.patch


 ColumnFamilyRecordReader only tries the first location for a given split. We 
 should try multiple locations for a given split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2011-06-29 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13057002#comment-13057002
 ] 

Mck SembWever edited comment on CASSANDRA-2388 at 6/29/11 6:32 AM:
---

This does happen already (i've seen it while testing initial patches that were 
no good).
Problem is that the TT is blacklisted, reducing hadoop's throughput for all 
jobs running.
I bet too that a fallback to a replica is faster than a fallback to another TT.

On a side note, there is no guarantee that any given TT will have its split 
accessible via a local c* node - this is only a preference in CFRR. A failed 
job may just as likely go to a random c* node. At least now we can actually 
properly limit to the one DC and sort by proximity. 

One thing we're not doing here is applying this same DC limit and sort by 
proximity in the case when there isn't a localhost preference. See 
CFRR.initialize(..)
It would make sense to rewrite CFRR.getLocations(..) to
{noformat}private Iterator<String> getLocations(final Configuration conf) 
throws IOException
{
return new SplitEndpointIterator(conf);
}{noformat} and then to move the finding-a-preference-to-localhost code 
into SplitEndpointIterator...

  was (Author: michaelsembwever):
This does happen already (i've seen it while testing initial patches that 
were no good).
Problem is that the TT is blacklisted, reducing hadoop's throughput for all 
jobs running.
I bet too that a fallback to a replica is faster than a fallback to another TT.

On a side note, there is no guarantee that any given TT will have its split 
accessible via a local c* node - this is only a preference in CFRR. A failed 
job may just as likely got to a random c* node. At least now we can actually 
properly limit to the one DC and sort by proximity. 

One thing we're not doing here is applying this same DC limit and sort by 
proximity in the case when there isn't a localhost preference. See 
CFRR.initialize(..)
It would make sense to rewrite CFRR.getLocations(..) to
{noformat}private Iterator<String> getLocations(final Configuration conf) 
throws IOException
{
return new SplitEndpointIterator(conf);
}{noformat} and then to move the finding-a-preference-to-localhost code 
into SplitEndpointIterator...
  
 ColumnFamilyRecordReader fails for a given split because a host is down, even 
 if records could reasonably be read from other replica.
 -

 Key: CASSANDRA-2388
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.7.6, 0.8.0
Reporter: Eldon Stegall
Assignee: Jeremy Hanna
  Labels: hadoop, inputformat
 Fix For: 0.7.7, 0.8.2

 Attachments: 0002_On_TException_try_next_split.patch, 
 CASSANDRA-2388-addition1.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, 
 CASSANDRA-2388.patch


 ColumnFamilyRecordReader only tries the first location for a given split. We 
 should try multiple locations for a given split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2011-06-29 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13057002#comment-13057002
 ] 

Mck SembWever edited comment on CASSANDRA-2388 at 6/29/11 7:19 AM:
---

This does happen already (i've seen it while testing initial patches that were 
no good).
Problem is that the TT is blacklisted, reducing hadoop's throughput for all 
jobs running.
I bet too that a fallback to a replica is faster than a fallback to another TT.

On a side note, there is no guarantee that any given TT will have its split 
accessible via a local c* node - this is only a preference in CFRR. A failed 
task may just as likely go to a random c* node. At least now we can actually 
properly limit to the one DC and sort by proximity. 

One thing we're not doing here is applying this same DC limit and sort by 
proximity in the case when there isn't a localhost preference. See 
CFRR.initialize(..)
It would make sense to rewrite CFRR.getLocations(..) to
{noformat}private Iterator<String> getLocations(final Configuration conf) 
throws IOException
{
return new SplitEndpointIterator(conf);
}{noformat} and then to move the finding-a-preference-to-localhost code 
into SplitEndpointIterator...

  was (Author: michaelsembwever):
This does happen already (i've seen it while testing initial patches that 
were no good).
Problem is that the TT is blacklisted, reducing hadoop's throughput for all 
jobs running.
I bet too that a fallback to a replica is faster than a fallback to another TT.

On a side note, there is no guarantee that any given TT will have its split 
accessible via a local c* node - this is only a preference in CFRR. A failed 
job may just as likely go to a random c* node. At least now we can actually 
properly limit to the one DC and sort by proximity. 

One thing we're not doing here is applying this same DC limit and sort by 
proximity in the case when there isn't a localhost preference. See 
CFRR.initialize(..)
It would make sense to rewrite CFRR.getLocations(..) to
{noformat}private Iterator<String> getLocations(final Configuration conf) 
throws IOException
{
return new SplitEndpointIterator(conf);
}{noformat} and then to move the finding-a-preference-to-localhost code 
into SplitEndpointIterator...
  
 ColumnFamilyRecordReader fails for a given split because a host is down, even 
 if records could reasonably be read from other replica.
 -

 Key: CASSANDRA-2388
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.7.6, 0.8.0
Reporter: Eldon Stegall
Assignee: Jeremy Hanna
  Labels: hadoop, inputformat
 Fix For: 0.7.7, 0.8.2

 Attachments: 0002_On_TException_try_next_split.patch, 
 CASSANDRA-2388-addition1.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, 
 CASSANDRA-2388.patch


 ColumnFamilyRecordReader only tries the first location for a given split. We 
 should try multiple locations for a given split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2011-06-29 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13057002#comment-13057002
 ] 

Mck SembWever edited comment on CASSANDRA-2388 at 6/29/11 7:27 AM:
---

This does happen already (i've seen it while testing initial patches that were 
no good).
Problem is that the TT is blacklisted, reducing hadoop's throughput for all 
jobs running.
I bet too that a fallback to a replica is faster than a fallback to another TT.

On a side note, there is no guarantee that any given TT will have its split 
accessible via a local c* node - this is only a preference in CFRR. A failed 
task may just as likely go to a random c* node. At least now we can actually 
properly limit to the one DC and sort by proximity. 

One thing we're not doing here is applying this same DC limit and sort by 
proximity in the case when there isn't a localhost preference. See 
CFRR.initialize(..)
It would make sense to rewrite CFRR.getLocations(..) to
{noformat}private Iterator<String> getLocations(final Configuration conf) 
throws IOException
{
return new SplitEndpointIterator(conf);
}{noformat} and then to move the finding-a-preference-to-localhost code 
into SplitEndpointIterator...

A bug i can see in the patch that did get accepted already is in 
CassandraServer.java:763 when endpointValid is false and restrictToSameDC is 
true we end up restricting to a random DC. I can fix this so restrictToSameDC 
is disabled in such situations.

  was (Author: michaelsembwever):
This does happen already (i've seen it while testing initial patches that 
were no good).
Problem is that the TT is blacklisted, reducing hadoop's throughput for all 
jobs running.
I bet too that a fallback to a replica is faster than a fallback to another TT.

On a side note, there is no guarantee that any given TT will have its split 
accessible via a local c* node - this is only a preference in CFRR. A failed 
task may just as likely go to a random c* node. At least now we can actually 
properly limit to the one DC and sort by proximity. 

One thing we're not doing here is applying this same DC limit and sort by 
proximity in the case when there isn't a localhost preference. See 
CFRR.initialize(..)
It would make sense to rewrite CFRR.getLocations(..) to
{noformat}private Iterator<String> getLocations(final Configuration conf) 
throws IOException
{
return new SplitEndpointIterator(conf);
}{noformat} and then to move the finding-a-preference-to-localhost code 
into SplitEndpointIterator...
  
 ColumnFamilyRecordReader fails for a given split because a host is down, even 
 if records could reasonably be read from other replica.
 -

 Key: CASSANDRA-2388
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.7.6, 0.8.0
Reporter: Eldon Stegall
Assignee: Jeremy Hanna
  Labels: hadoop, inputformat
 Fix For: 0.7.7, 0.8.2

 Attachments: 0002_On_TException_try_next_split.patch, 
 CASSANDRA-2388-addition1.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, 
 CASSANDRA-2388.patch


 ColumnFamilyRecordReader only tries the first location for a given split. We 
 should try multiple locations for a given split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2011-06-29 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13057002#comment-13057002
 ] 

Mck SembWever edited comment on CASSANDRA-2388 at 6/29/11 7:49 AM:
---

 - This does happen already (i've seen it while testing initial patches that 
were no good).
Problem is that the TT is blacklisted, reducing hadoop's throughput for all 
jobs running.
I bet too that a fallback to a replica is faster than a fallback to another TT.

 - There is no guarantee that any given TT will have its split accessible via a 
local c* node - this is only a preference in CFRR. A failed task may just as 
likely go to a random c* node. At least now we can actually properly limit to 
the one DC and sort by proximity. 

 - One thing we're not doing here is applying this same DC limit and sort by 
proximity in the case when there isn't a localhost preference. See 
CFRR.initialize(..)
It would make sense to rewrite CFRR.getLocations(..) to
{noformat}private Iterator<String> getLocations(final Configuration conf) 
throws IOException
{
return new SplitEndpointIterator(conf);
}{noformat} and then to move the finding-a-preference-to-localhost code 
into SplitEndpointIterator...

 - A bug i can see in the patch that did get accepted already is in 
CassandraServer.java:763: when endpointValid is false and restrictToSameDC is 
true, we end up restricting to a random DC. I could fix this so restrictToSameDC 
is disabled in such situations, but this actually invalidates the previous 
point: we can't restrict to the DC anymore and can only sortByProximity to a 
random node... I think this supports Jonathan's point that it's overall a poor 
approach. I'm leaning more and more towards my original approach of using just 
client.getDatacenter(..) and not worrying about proximity within the datacenter.

 - Another bug is that, contrary to my patch, the code committed
bq. committed with a change to use the dynamic snitch if the passed endpoint is 
valid.
 can call {{DynamicEndpointSnitch.sortByProximity(..)}} with an address that is 
not localhost, and this breaks the assertion in the method. 

  was (Author: michaelsembwever):
 - This does happen already (i've seen it while testing initial patches 
that were no good).
Problem is that the TT is blacklisted, reducing hadoop's throughput for all 
jobs running.
I bet too that a fallback to a replica is faster than a fallback to another TT.

 - There is no guarantee that any given TT will have its split accessible via a 
local c* node - this is only a preference in CFRR. A failed task may just as 
likely go to a random c* node. At least now we can actually properly limit to 
the one DC and sort by proximity. 

 - One thing we're not doing here is applying this same DC limit and sort by 
proximity in the case when there isn't a localhost preference. See 
CFRR.initialize(..)
It would make sense to rewrite CFRR.getLocations(..) to
{noformat}private Iterator<String> getLocations(final Configuration conf) 
throws IOException
{
return new SplitEndpointIterator(conf);
}{noformat} and then to move the finding-a-preference-to-localhost code 
into SplitEndpointIterator...

 - A bug i can see in the patch that did get accepted already is in 
CassandraServer.java:763 when endpointValid is false and restrictToSameDC is 
true we end up restricting to a random DC. I can fix this so restrictToSameDC 
is disabled in such situations. This actually invalidates the previous point: 
we can't restrict to DC anymore and we can only sortByProximity to a random 
node... I think this supports Jonathan's point that it's overall a poor 
approach. I'm more and more in preference of my original approach using just 
client.getDatacenter(..) and not worrying about proximity within the datacenter.

 - Another bug is that, contray to my patch, the code committed
bq. committed with a change to use the dynamic snitch id the passed endpoint is 
valid.
 can call {{DynamicEndpointSnitch.sortByProximity(..)}} with an address that is 
not localhost and this breaks the assertion in the method. 
  
 ColumnFamilyRecordReader fails for a given split because a host is down, even 
 if records could reasonably be read from other replica.
 -

 Key: CASSANDRA-2388
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.7.6, 0.8.0
Reporter: Eldon Stegall
Assignee: Jeremy Hanna
  Labels: hadoop, inputformat
 Fix For: 0.7.7, 0.8.2

 Attachments: 0002_On_TException_try_next_split.patch, 
 CASSANDRA-2388-addition1.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, 
 CASSANDRA-2388.patch


 ColumnFamilyRecordReader only tries the first location for a given split. We 
 should try multiple locations for a given split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



[jira] [Issue Comment Edited] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2011-06-29 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13057002#comment-13057002
 ] 

Mck SembWever edited comment on CASSANDRA-2388 at 6/29/11 7:48 AM:
---

 - This does happen already (i've seen it while testing initial patches that 
were no good).
Problem is that the TT is blacklisted, reducing hadoop's throughput for all 
jobs running.
I bet too that a fallback to a replica is faster than a fallback to another TT.

 - There is no guarantee that any given TT will have its split accessible via a 
local c* node - this is only a preference in CFRR. A failed task may just as 
likely go to a random c* node. At least now we can actually properly limit to 
the one DC and sort by proximity. 

 - One thing we're not doing here is applying this same DC limit and sort by 
proximity in the case when there isn't a localhost preference. See 
CFRR.initialize(..)
It would make sense to rewrite CFRR.getLocations(..) to
{noformat}private Iterator<String> getLocations(final Configuration conf) 
throws IOException
{
return new SplitEndpointIterator(conf);
}{noformat} and then to move the finding-a-preference-to-localhost code 
into SplitEndpointIterator...

 - A bug i can see in the patch that did get accepted already is in 
CassandraServer.java:763 when endpointValid is false and restrictToSameDC is 
true we end up restricting to a random DC. I can fix this so restrictToSameDC 
is disabled in such situations. This actually invalidates the previous point: 
we can't restrict to DC anymore and we can only sortByProximity to a random 
node... I think this supports Jonathan's point that it's overall a poor 
approach. I'm more and more in preference of my original approach using just 
client.getDatacenter(..) and not worrying about proximity within the datacenter.

 - Another bug is that, contrary to my patch, the code committed
bq. committed with a change to use the dynamic snitch if the passed endpoint is 
valid.
 can call {{DynamicEndpointSnitch.sortByProximity(..)}} with an address that is 
not localhost, and this breaks the assertion in the method. 

  was (Author: michaelsembwever):
This does happen already (i've seen it while testing initial patches that 
were no good).
Problem is that the TT is blacklisted, reducing hadoop's throughput for all 
jobs running.
I bet too that a fallback to a replica is faster than a fallback to another TT.

On a side note, there is no guarantee that any given TT will have its split 
accessible via a local c* node - this is only a preference in CFRR. A failed 
task may just as likely go to a random c* node. At least now we can actually 
properly limit to the one DC and sort by proximity. 

One thing we're not doing here is applying this same DC limit and sort by 
proximity in the case when there isn't a localhost preference. See 
CFRR.initialize(..)
It would make sense to rewrite CFRR.getLocations(..) to
{noformat}private Iterator<String> getLocations(final Configuration conf) 
throws IOException
{
return new SplitEndpointIterator(conf);
}{noformat} and then to move the finding-a-preference-to-localhost code 
into SplitEndpointIterator...

A bug i can see in the patch that did get accepted already is in 
CassandraServer.java:763 when endpointValid is false and restrictToSameDC is 
true we end up restricting to a random DC. I can fix this so restrictToSameDC 
is disabled in such situations.
  
 ColumnFamilyRecordReader fails for a given split because a host is down, even 
 if records could reasonably be read from other replica.
 -

 Key: CASSANDRA-2388
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.7.6, 0.8.0
Reporter: Eldon Stegall
Assignee: Jeremy Hanna
  Labels: hadoop, inputformat
 Fix For: 0.7.7, 0.8.2

 Attachments: 0002_On_TException_try_next_split.patch, 
 CASSANDRA-2388-addition1.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, 
 CASSANDRA-2388.patch


 ColumnFamilyRecordReader only tries the first location for a given split. We 
 should try multiple locations for a given split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2011-06-29 Thread Mck SembWever (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13057470#comment-13057470
 ] 

Mck SembWever edited comment on CASSANDRA-2388 at 6/29/11 9:18 PM:
---

bq. tlipcon says it comes back after 24h
just to be clear about my concerns: 
this means a dead c* node will bring down a TT. In a hadoop cluster with 3 
nodes this means that for 24hrs you've lost 33% of your throughput. (If less 
than 10% of hadoop jobs used CFIF i could well imagine some pissed users.) 
(What if you have a temporary problem with flapping c* nodes and you end up 
with a handful of blacklisted TTs? etc etc etc)

All this when using a replica, any replica, could have kept things going 
smoothly, the only slowdown being that some of the data into CFIF had to go 
over the network instead...


  was (Author: michaelsembwever):
bq. tlipcon says it comes back after 24h
just to be clear about my concerns. 
this means a dead c* node will bring down a TT. In a hadoop cluster with 3 
nodes this means for 24hrs you're lost 33% throughput. (If less than 10% of 
hadoop jobs used CFIF i could well imagine some pissed customers). (What if you 
have a temporarily problem with flapping c* nodes and you end up with a handful 
of blacklisted TTs? etc etc etc).

All this when using a replica, any replica, could have kept things going 
smoothly, the only slowdown being some of the data into CFIF had to go over the 
network instead...

  
 ColumnFamilyRecordReader fails for a given split because a host is down, even 
 if records could reasonably be read from other replica.
 -

 Key: CASSANDRA-2388
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.7.6, 0.8.0
Reporter: Eldon Stegall
Assignee: Jeremy Hanna
  Labels: hadoop, inputformat
 Fix For: 0.7.7, 0.8.2

 Attachments: 0002_On_TException_try_next_split.patch, 
 CASSANDRA-2388-addition1.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch, 
 CASSANDRA-2388.patch


 ColumnFamilyRecordReader only tries the first location for a given split. We 
 should try multiple locations for a given split.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



