[jira] [Created] (CASSANDRA-2658) Pig + CassandraStorage should work when trying to cast data after it's loaded

2011-05-16 Thread Jeremy Hanna (JIRA)
Pig + CassandraStorage should work when trying to cast data after it's loaded
-

 Key: CASSANDRA-2658
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2658
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.7.5
Reporter: Jeremy Hanna
Priority: Minor


We currently do a lot with pig + cassandra, but one thing I've found is that 
currently it's very touchy with data that comes from Cassandra for some reason. 
 For example, if I try to do a SUM of data that has not been validated as a 
LongType in Cassandra, it borks.  See this schema script for Cassandra - 
https://github.com/jeromatron/pygmalion/blob/master/cassandra/example_data.txt 
- and remove the validation on the num_heads data type and try to SUM that over 
the data and it gives data type errors.  (It breaks with the num_heads 
validation removed and with or without the default_validation class being set.)

We currently do analysis over data that is either just String (UTF8) data or 
that we have validated, so it works for us.  However, I've seen a couple of 
people trying to use Cassandra with Pig that have had issues because of this.  
One of the tenets of pig is that it will eat anything and it kind of goes 
against this if the load/store somehow interferes with that.  So in essence, I 
think this is a big deal for those wanting to use pig with cassandra in the 
ways that pig is normally used.
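The type error comes down to how the loader decodes raw column bytes. A rough Python sketch of the idea (decode_value is a hypothetical helper for illustration, not CassandraStorage's actual code): with a LongType validator the bytes become a real number that SUM can add; without one, Pig only sees opaque bytes and the numeric cast fails downstream.

```python
import struct

def decode_value(raw, validator):
    """Sketch of how a loader might decode a column value.

    With a LongType validator the 8 raw bytes become a number that
    aggregate functions like SUM can use; without a validator the
    caller only gets an uninterpreted byte string back.
    """
    if validator == "LongType":
        # LongType stores a big-endian signed 64-bit integer.
        return struct.unpack(">q", raw)[0]
    # No validator: leave the bytes opaque.
    return raw
```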

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (CASSANDRA-2658) Pig + CassandraStorage should work when trying to cast data after it's loaded

2011-05-16 Thread Jeremy Hanna (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna updated CASSANDRA-2658:


Description: 
We currently do a lot with pig + cassandra, but one thing I've found is that 
currently it's very touchy with data that comes from Cassandra for some reason. 
 For example, if I try to do a SUM of data that has not been validated as a 
LongType in Cassandra, it borks.  See this schema script for Cassandra - 
https://github.com/jeromatron/pygmalion/blob/master/cassandra/example_data.txt 
- and remove the validation on the num_heads data type and try to SUM that over 
the data and it gives data type errors.  (It breaks with the num_heads 
validation removed and with or without the default_validation class being set.)

We currently do analysis over data that is either just String (UTF8) data or 
that we have validated, so it works for us.  However, I've seen a couple of 
people trying to use Cassandra with Pig that have had issues because of this.  
One of the tenets of pig is that it will eat anything and it kind of goes 
against this if the load/store somehow interferes with that.  So in essence, I 
think this is a big deal for those wanting to use pig with cassandra in the 
ways that pig is normally used.

  was:
We currently do a lot with pig + cassandra, but one thing I've found is that 
currently it's very touchy with data that comes from Cassandra for some reason. 
 For example, if I try to a SUM of data that has not been validated as an 
LongType in Cassandra, it borks.  See this schema script for Cassandra - 
https://github.com/jeromatron/pygmalion/blob/master/cassandra/example_data.txt 
- and remove the validation on the num_heads data type and try to SUM that over 
the data and it gives data type errors.  (It breaks with the num_heads 
validation removed and with or without the default_validation class being set.)

We currently do analysis over data that is either just String (UTF8) data or 
that we have validated, so it works for us.  However, I've seen a couple of 
people trying to use Cassandra with Pig that have had issues because of this.  
One of the tenants of pig is that it will eat anything and it kind of goes 
against this if the load/store somehow interferes with that.  So in essence, I 
think this is a big deal for those wanting to use pig with cassandra in the 
ways that pig is normally used.


 Pig + CassandraStorage should work when trying to cast data after it's loaded
 -

 Key: CASSANDRA-2658
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2658
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.7.5
Reporter: Jeremy Hanna
Priority: Minor
  Labels: pig

 We currently do a lot with pig + cassandra, but one thing I've found is that 
 currently it's very touchy with data that comes from Cassandra for some 
 reason.  For example, if I try to do a SUM of data that has not been validated 
 as a LongType in Cassandra, it borks.  See this schema script for Cassandra 
 - 
 https://github.com/jeromatron/pygmalion/blob/master/cassandra/example_data.txt
  - and remove the validation on the num_heads data type and try to SUM that 
 over the data and it gives data type errors.  (It breaks with the num_heads 
 validation removed and with or without the default_validation class being 
 set.)
 We currently do analysis over data that is either just String (UTF8) data or 
 that we have validated, so it works for us.  However, I've seen a couple of 
 people trying to use Cassandra with Pig that have had issues because of this. 
  One of the tenets of pig is that it will eat anything and it kind of goes 
 against this if the load/store somehow interferes with that.  So in essence, 
 I think this is a big deal for those wanting to use pig with cassandra in the 
 ways that pig is normally used.



[jira] [Created] (CASSANDRA-2688) Support wide rows with Hadoop support

2011-05-23 Thread Jeremy Hanna (JIRA)
Support wide rows with Hadoop support
-

 Key: CASSANDRA-2688
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2688
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jeremy Hanna


Currently the Hadoop support can only operate over the maximum row width of 
thrift afaik.  Then a user must do paging of the row within their hadoop 
interface - java, pig, hive.  It would be much nicer to have the hadoop support 
page through the row internally, if possible.  Seeing that one of cassandra's 
features is extremely wide rows, it would be nice to have feature parity so that people 
didn't have to adjust their cassandra plans based on hadoop support limitations.
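The internal paging being asked for can be sketched like this (fetch_slice is a hypothetical stand-in for a Thrift get_slice call; this is not Cassandra's actual Hadoop code):

```python
def iter_row_columns(fetch_slice, row_key, batch_size=1000):
    """Page through the columns of one (possibly very wide) row.

    fetch_slice(row_key, start, count) stands in for a Thrift
    get_slice call: it returns up to `count` (name, value) pairs
    whose column name is >= start, in column-name order.
    """
    start = b""  # empty start means "from the beginning of the row"
    while True:
        batch = fetch_slice(row_key, start, batch_size)
        if not batch:
            return
        for name, value in batch:
            # Slice ranges are inclusive, so every page after the
            # first begins with the column we already emitted.
            if start == b"" or name != start:
                yield name, value
        if len(batch) < batch_size:
            return
        start = batch[-1][0]
```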



[jira] [Resolved] (CASSANDRA-1497) Add input support for Hadoop Streaming

2011-05-25 Thread Jeremy Hanna (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna resolved CASSANDRA-1497.
-

Resolution: Won't Fix

It turns out that currently there's not much interest in streaming.  The way to 
go is something like pig or hive if Java is not an option.

 Add input support for Hadoop Streaming
 --

 Key: CASSANDRA-1497
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1497
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Jeremy Hanna
 Attachments: 0001-An-updated-avro-based-input-streaming-solution.patch


 related to CASSANDRA-1368 - create similar functionality for input streaming.



[jira] [Reopened] (CASSANDRA-1498) Add a simple MapReduce system test

2011-05-25 Thread Jeremy Hanna (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna reopened CASSANDRA-1498:
-


 Add a simple MapReduce system test
 --

 Key: CASSANDRA-1498
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1498
 Project: Cassandra
  Issue Type: Test
  Components: Hadoop, Tools
Reporter: Jeremy Hanna
Assignee: Jeremy Hanna

 We don't have a good way to regression test MapReduce functionality.  This 
 ticket entails making a simple system test to do that (in python).



[jira] [Resolved] (CASSANDRA-1498) Add a simple MapReduce system test

2011-05-25 Thread Jeremy Hanna (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna resolved CASSANDRA-1498.
-

Resolution: Won't Fix

While it would still be nice to have automated testing for mapreduce 
functionality and hadoop integration, it appears that doing it via streaming is 
not the way to go.

 Add a simple MapReduce system test
 --

 Key: CASSANDRA-1498
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1498
 Project: Cassandra
  Issue Type: Test
  Components: Hadoop, Tools
Reporter: Jeremy Hanna
Assignee: Jeremy Hanna
 Fix For: 0.8.1


 We don't have a good way to regression test MapReduce functionality.  This 
 ticket entails making a simple system test to do that (in python).



[jira] [Resolved] (CASSANDRA-1498) Add a simple MapReduce system test

2011-05-25 Thread Jeremy Hanna (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna resolved CASSANDRA-1498.
-

   Resolution: Won't Fix
Fix Version/s: (was: 0.8.1)

Re-resolving to remove the fix version.

 Add a simple MapReduce system test
 --

 Key: CASSANDRA-1498
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1498
 Project: Cassandra
  Issue Type: Test
  Components: Hadoop, Tools
Reporter: Jeremy Hanna
Assignee: Jeremy Hanna

 We don't have a good way to regression test MapReduce functionality.  This 
 ticket entails making a simple system test to do that (in python).



[jira] [Created] (CASSANDRA-2706) Pig output not working with 0.8.0 branch

2011-05-25 Thread Jeremy Hanna (JIRA)
Pig output not working with 0.8.0 branch


 Key: CASSANDRA-2706
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2706
 Project: Cassandra
  Issue Type: Bug
Reporter: Jeremy Hanna


For some reason running a simple column family copy with pig is not writing 
out, though pig reports that it is successful.
Steps to reproduce on a local node:
1. Create the schema:
http://aep.appspot.com/display/VgbvdtP6QExc3OTY3HBry9ncC3k/
2. Run the following pig script (I did it with pig 0.8.0 from cdh3) using 
contrib/pig/bin/pig_cassandra -x local:
http://aep.appspot.com/display/PaWJkCqRGbp7CRgjt7qoyx9izN8/



[jira] [Commented] (CASSANDRA-2706) Pig output not working with 0.8.0 branch

2011-05-25 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13039275#comment-13039275
 ] 

Jeremy Hanna commented on CASSANDRA-2706:
-

+1
Fixed the case I mentioned, and I tried a simple use of the from/to cassandra 
bag from the pygmalion stuff just to have something a little different.

 Pig output not working with 0.8.0 branch
 

 Key: CASSANDRA-2706
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2706
 Project: Cassandra
  Issue Type: Bug
Reporter: Jeremy Hanna
Assignee: Brandon Williams
  Labels: pig
 Fix For: 0.8.0

 Attachments: 2706.txt


 For some reason running a simple column family copy with pig is not writing 
 out, though pig reports that it is successful.
 Steps to reproduce on a local node:
 1. Create the schema:
 http://aep.appspot.com/display/VgbvdtP6QExc3OTY3HBry9ncC3k/
 2. Run the following pig script (I did it with pig 0.8.0 from cdh3) using 
 contrib/pig/bin/pig_cassandra -x local:
 http://aep.appspot.com/display/PaWJkCqRGbp7CRgjt7qoyx9izN8/



[jira] [Created] (CASSANDRA-2727) examples/hadoop_word_count reducer to cassandra doesn't output into the output_words cf

2011-05-31 Thread Jeremy Hanna (JIRA)
examples/hadoop_word_count reducer to cassandra doesn't output into the 
output_words cf
---

 Key: CASSANDRA-2727
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2727
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.0 beta 2
Reporter: Jeremy Hanna
Priority: Minor


I tried the examples/hadoop_word_count example and could output to the 
filesystem but when I output to cassandra (the default), nothing shows up in 
output_words.  I can output to cassandra using pig so I think the problem is 
isolated to this example.



[jira] [Created] (CASSANDRA-2763) When dropping a keyspace you're currently authenticated to, might be nice to de-authenticate upon completion

2011-06-11 Thread Jeremy Hanna (JIRA)
When dropping a keyspace you're currently authenticated to, might be nice to 
de-authenticate upon completion


 Key: CASSANDRA-2763
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2763
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jeremy Hanna
Priority: Trivial


I found that when I'm authenticated to MyKeyspace, then do 'drop keyspace 
MyKeyspace;', I'm still authenticated to it.  It's trivial I know, but seems 
reasonable to unauthenticate from it.





[jira] [Updated] (CASSANDRA-2777) Pig storage handler should implement LoadMetadata

2011-06-15 Thread Jeremy Hanna (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna updated CASSANDRA-2777:


Reviewer: jeromatron

 Pig storage handler should implement LoadMetadata
 -

 Key: CASSANDRA-2777
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2777
 Project: Cassandra
  Issue Type: Improvement
  Components: Contrib
Reporter: Brandon Williams
Assignee: Brandon Williams
Priority: Minor
 Attachments: 2777.txt


 The reason for this is many builtin functions like SUM won't work on longs 
 (you can workaround using LongSum, but that's lame) because the query planner 
 doesn't know about the types beforehand, even though we are casting to native 
 longs.
 There is some impact to this, though.  With LoadMetadata implemented, 
 existing scripts that specify schema will need to remove it (since LM is 
 doing it for them) and they will need to conform to LM's terminology (key, 
 columns, name, value) within the script.  This is trivial to change, however, 
 and the increased functionality is worth the switch.





[jira] [Issue Comment Edited] (CASSANDRA-2777) Pig storage handler should implement LoadMetadata

2011-06-15 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050159#comment-13050159
 ] 

Jeremy Hanna edited comment on CASSANDRA-2777 at 6/16/11 12:21 AM:
---

while we're at it, can we remove the redundant addMutation call on line 505 and 
on line 513 add the e param on:
{code}
throw new IOException(e + " Output must be (key, {(column,value)...}) for ColumnFamily or (key, {supercolumn:{(column,value)...}...}) for SuperColumnFamily", e);
{code}

  was (Author: jeromatron):
while we're add it can we remove the redundant addMutation call on line 505 
and on line 513 add the e param on:
{quote}
throw new IOException(e +  Output must be (key, {(column,value)...}) for 
ColumnFamily or (key, {supercolumn:{(column,value)...}...}) for 
SuperColumnFamily, e);
{quote}
  
 Pig storage handler should implement LoadMetadata
 -

 Key: CASSANDRA-2777
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2777
 Project: Cassandra
  Issue Type: Improvement
  Components: Contrib
Reporter: Brandon Williams
Assignee: Brandon Williams
Priority: Minor
 Fix For: 0.7.7

 Attachments: 2777.txt


 The reason for this is many builtin functions like SUM won't work on longs 
 (you can workaround using LongSum, but that's lame) because the query planner 
 doesn't know about the types beforehand, even though we are casting to native 
 longs.
 There is some impact to this, though.  With LoadMetadata implemented, 
 existing scripts that specify schema will need to remove it (since LM is 
 doing it for them) and they will need to conform to LM's terminology (key, 
 columns, name, value) within the script.  This is trivial to change, however, 
 and the increased functionality is worth the switch.





[jira] [Commented] (CASSANDRA-2777) Pig storage handler should implement LoadMetadata

2011-06-15 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050159#comment-13050159
 ] 

Jeremy Hanna commented on CASSANDRA-2777:
-

while we're at it, can we remove the redundant addMutation call on line 505 and 
on line 513 add the e param on:
{quote}
throw new IOException(e + " Output must be (key, {(column,value)...}) for ColumnFamily or (key, {supercolumn:{(column,value)...}...}) for SuperColumnFamily", e);
{quote}

 Pig storage handler should implement LoadMetadata
 -

 Key: CASSANDRA-2777
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2777
 Project: Cassandra
  Issue Type: Improvement
  Components: Contrib
Reporter: Brandon Williams
Assignee: Brandon Williams
Priority: Minor
 Fix For: 0.7.7

 Attachments: 2777.txt


 The reason for this is many builtin functions like SUM won't work on longs 
 (you can workaround using LongSum, but that's lame) because the query planner 
 doesn't know about the types beforehand, even though we are casting to native 
 longs.
 There is some impact to this, though.  With LoadMetadata implemented, 
 existing scripts that specify schema will need to remove it (since LM is 
 doing it for them) and they will need to conform to LM's terminology (key, 
 columns, name, value) within the script.  This is trivial to change, however, 
 and the increased functionality is worth the switch.





[jira] [Created] (CASSANDRA-2799) Implement old style api support for ColumnFamilyInputFormat and ColumnFamilyRecordReader

2011-06-20 Thread Jeremy Hanna (JIRA)
Implement old style api support for ColumnFamilyInputFormat and 
ColumnFamilyRecordReader


 Key: CASSANDRA-2799
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2799
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jeremy Hanna
Assignee: Jeremy Hanna
Priority: Minor
 Fix For: 0.7.7, 0.8.2


For better compatibility with hadoop, I would like to add old style hadoop 
support (mapred) to the ColumnFamilyInputFormat and ColumnFamilyRecordReader.  
We already have it in the output.  Oozie in particular handles the old style 
api better.  That is the motivation for us.  I already did this as part of my 
patch for CASSANDRA-1497 so it should be trivial.  We are just in a tight 
schedule right now and I'll come back to this once we have a bit of breathing 
room.

I think it would help with compatibility with other systems that rely on hadoop 
as well.





[jira] [Commented] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2011-06-24 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054780#comment-13054780
 ] 

Jeremy Hanna commented on CASSANDRA-2388:
-

This patch applies to 0.7 with minimal problems - just some imports on 
CassandraServer that it couldn't resolve properly.  Can this be committed 
against 0.7-branch for inclusion in 0.7.7?

 ColumnFamilyRecordReader fails for a given split because a host is down, even 
 if records could reasonably be read from other replica.
 -

 Key: CASSANDRA-2388
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Reporter: Eldon Stegall
Assignee: Mck SembWever
  Labels: hadoop, inputformat
 Fix For: 0.8.2

 Attachments: 0002_On_TException_try_next_split.patch, 
 CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch


 ColumnFamilyRecordReader only tries the first location for a given split. We 
 should try multiple locations for a given split.
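The fix being discussed amounts to a failover loop over a split's replicas, roughly like this sketch (connect and the host list are hypothetical stand-ins, not the real ColumnFamilyRecordReader API):

```python
def read_split(locations, connect):
    """Open a reader for a split, trying every replica rather than
    only the first.

    locations is the ordered list of hosts that hold the split;
    connect(host) is a stand-in that returns a reader or raises
    IOError when the host is down.  The task fails only once every
    replica has failed.
    """
    errors = []
    for host in locations:
        try:
            return connect(host)
        except IOError as exc:
            errors.append((host, exc))
    raise IOError("all replicas failed for split: %r" % errors)
```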





[jira] [Issue Comment Edited] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2011-06-24 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054780#comment-13054780
 ] 

Jeremy Hanna edited comment on CASSANDRA-2388 at 6/25/11 12:50 AM:
---

This patch applies to the current 0.7-branch with minimal problems - just some 
imports on CassandraServer that it couldn't resolve properly.  Can this be 
committed against 0.7-branch for inclusion in 0.7.7?

  was (Author: jeromatron):
This patch applies to 0.7 with minimal problems - just some imports on 
CassandraServer that it couldn't resolve properly.  Can this be committed 
against 0.7-branch for inclusion in 0.7.7?
  
 ColumnFamilyRecordReader fails for a given split because a host is down, even 
 if records could reasonably be read from other replica.
 -

 Key: CASSANDRA-2388
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Reporter: Eldon Stegall
Assignee: Mck SembWever
  Labels: hadoop, inputformat
 Fix For: 0.8.2

 Attachments: 0002_On_TException_try_next_split.patch, 
 CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch


 ColumnFamilyRecordReader only tries the first location for a given split. We 
 should try multiple locations for a given split.





[jira] [Commented] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2011-06-24 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054789#comment-13054789
 ] 

Jeremy Hanna commented on CASSANDRA-2388:
-

I've done basic testing with the word count and pig examples to make sure that 
the basic hadoop integration isn't negatively affected by this.  I'll also try 
it against our dev cluster before and after the patch - killing one node to see 
if it fails over to another replica - to make sure it does what it should that 
way.

 ColumnFamilyRecordReader fails for a given split because a host is down, even 
 if records could reasonably be read from other replica.
 -

 Key: CASSANDRA-2388
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Reporter: Eldon Stegall
Assignee: Mck SembWever
  Labels: hadoop, inputformat
 Fix For: 0.8.2

 Attachments: 0002_On_TException_try_next_split.patch, 
 CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch


 ColumnFamilyRecordReader only tries the first location for a given split. We 
 should try multiple locations for a given split.





[jira] [Updated] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2011-06-24 Thread Jeremy Hanna (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna updated CASSANDRA-2388:


Affects Version/s: 0.7.6
   0.8.0
Fix Version/s: 0.7.7

 ColumnFamilyRecordReader fails for a given split because a host is down, even 
 if records could reasonably be read from other replica.
 -

 Key: CASSANDRA-2388
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.7.6, 0.8.0
Reporter: Eldon Stegall
Assignee: Jeremy Hanna
  Labels: hadoop, inputformat
 Fix For: 0.7.7, 0.8.2

 Attachments: 0002_On_TException_try_next_split.patch, 
 CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch


 ColumnFamilyRecordReader only tries the first location for a given split. We 
 should try multiple locations for a given split.





[jira] [Commented] (CASSANDRA-2388) ColumnFamilyRecordReader fails for a given split because a host is down, even if records could reasonably be read from other replica.

2011-06-24 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054806#comment-13054806
 ] 

Jeremy Hanna commented on CASSANDRA-2388:
-

Jonathan - is it possible to attach an updated patch based on your changes to 
0.8 branch?  Not sure if that would be simple to extract.

 ColumnFamilyRecordReader fails for a given split because a host is down, even 
 if records could reasonably be read from other replica.
 -

 Key: CASSANDRA-2388
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2388
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.7.6, 0.8.0
Reporter: Eldon Stegall
Assignee: Jeremy Hanna
  Labels: hadoop, inputformat
 Fix For: 0.7.7, 0.8.2

 Attachments: 0002_On_TException_try_next_split.patch, 
 CASSANDRA-2388.patch, CASSANDRA-2388.patch, CASSANDRA-2388.patch


 ColumnFamilyRecordReader only tries the first location for a given split. We 
 should try multiple locations for a given split.





[jira] [Commented] (CASSANDRA-2807) ColumnFamilyInputFormat configuration should support multiple initial addresses

2011-06-27 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055780#comment-13055780
 ] 

Jeremy Hanna commented on CASSANDRA-2807:
-

is this an easy thing to commit to 0.7-branch as well?

 ColumnFamilyInputFormat configuration should support multiple initial 
 addresses
 ---

 Key: CASSANDRA-2807
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2807
 Project: Cassandra
  Issue Type: Improvement
  Components: Hadoop
Affects Versions: 0.6
Reporter: Greg Katz
Assignee: Mck SembWever
Priority: Minor
 Fix For: 0.8.1

 Attachments: CASSANDRA-2807.patch


 The {{ColumnFamilyInputFormat}} class only allows a single initial node to be 
 specified through the cassandra.thrift.address configuration property. The 
 configuration should support a list of nodes in order to account for the 
 possibility that the initial node becomes unavailable.
 By contrast, the {{RingCache}} class used by the {{ColumnFamilyRecordWriter}} 
 reads the exact same {{cassandra.thrift.address}} property but splits its 
 value on commas to allow multiple initial nodes to be specified.
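The requested input-side behavior is essentially the comma-splitting RingCache already does, sketched here (conf is a plain dict standing in for the Hadoop Configuration object; this is not the actual ColumnFamilyInputFormat code):

```python
def initial_nodes(conf):
    """Split cassandra.thrift.address on commas, RingCache-style.

    conf stands in for the Hadoop Configuration; the property name
    is the one discussed in the ticket.  Whitespace around hosts is
    tolerated and empty entries are dropped.
    """
    raw = conf.get("cassandra.thrift.address", "")
    return [host.strip() for host in raw.split(",") if host.strip()]
```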





[jira] [Commented] (CASSANDRA-1125) Filter out ColumnFamily rows that aren't part of the query

2011-07-02 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059098#comment-13059098
 ] 

Jeremy Hanna commented on CASSANDRA-1125:
-

So it sounds like this only includes key ranges.  And it sounds like indexes 
are out for now too - e.g. where timebucket = 12345.

 Filter out ColumnFamily rows that aren't part of the query
 --

 Key: CASSANDRA-1125
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1125
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Mck SembWever
Priority: Minor
 Fix For: 1.0

 Attachments: 1125-formatted.txt, CASSANDRA-1125.patch


 Currently, when running a MapReduce job against data in a Cassandra data 
 store, it reads through all the data for a particular ColumnFamily.  This 
 could be optimized to only read through those rows that have to do with the 
 query.
 It's a small change but wanted to put it in Jira so that it didn't fall 
 through the cracks.





[jira] [Created] (CASSANDRA-2855) Add hadoop support option to skip rows with empty columns

2011-07-04 Thread Jeremy Hanna (JIRA)
Add hadoop support option to skip rows with empty columns
-

 Key: CASSANDRA-2855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2855
 Project: Cassandra
  Issue Type: Improvement
  Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Jeremy Hanna


We have been finding that range ghosts appear in results from Hadoop via Pig.  
This could also happen if rows don't have data for the slice predicate that is 
given.  This leads to having to do a painful amount of defensive checking on 
the Pig side, especially in the case of range ghosts.

We would like to add an option to skip rows that have no column values in them.  
That functionality existed before in core Cassandra but was removed because of 
the performance penalty of that checking.  However with Hadoop support in the 
RecordReader, that is batch oriented anyway, so individual row reading 
performance isn't as much of an issue.  Also we would make it an optional 
config parameter for each job anyway, so people wouldn't have to incur that 
penalty if they are confident that there won't be those empty rows or they 
don't care.

The parameter could be cassandra.skip.empty.rows, set to true/false.
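The proposed option would behave roughly like this sketch (names are hypothetical; cassandra.skip.empty.rows is only the property suggested in this ticket, not an existing setting):

```python
def filter_empty_rows(rows, skip_empty):
    """Optionally drop rows whose slice came back with no columns.

    rows yields (key, columns) pairs the way a RecordReader might;
    skip_empty mirrors the proposed cassandra.skip.empty.rows job
    setting, so jobs that don't care pay no extra cost.
    """
    for key, columns in rows:
        if skip_empty and not columns:
            # Range ghost, or a row with nothing matching the slice
            # predicate: skip it instead of handing Pig an empty bag.
            continue
        yield key, columns
```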





[jira] [Commented] (CASSANDRA-2855) Add hadoop support option to skip rows with empty columns

2011-07-05 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059978#comment-13059978
 ] 

Jeremy Hanna commented on CASSANDRA-2855:
-

{quote}
What I think we could do is not bother including empty rows in the resultset, 
IF we are doing a slice query for the entire row. (Since, as soon as the 
tombstones expire, they will be gone anyway.)
{quote}
Yeah - our primary concern is tombstones.  Would be great to get that done at a 
lower level.

 Add hadoop support option to skip rows with empty columns
 -

 Key: CASSANDRA-2855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2855
 Project: Cassandra
  Issue Type: Improvement
  Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Jeremy Hanna
  Labels: hadoop

 We have been finding that range ghosts appear in results from Hadoop via Pig. 
  This could also happen if rows don't have data for the slice predicate that 
 is given.  This leads to having to do a painful amount of defensive checking 
 on the Pig side, especially in the case of range ghosts.
 We would like to add an option to skip rows that have no column values in it. 
  That functionality existed before in core Cassandra but was removed because 
 of the performance penalty of that checking.  However with Hadoop support in 
 the RecordReader, that is batch oriented anyway, so individual row reading 
 performance isn't as much of an issue.  Also we would make it an optional 
 config parameter for each job anyway, so people wouldn't have to incur that 
 penalty if they are confident that there won't be those empty rows or they 
 don't care.
 It could be parameter cassandra.skip.empty.rows and be true/false.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2855) Skip rows with empty columns when slicing entire row

2011-07-05 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059995#comment-13059995
 ] 

Jeremy Hanna commented on CASSANDRA-2855:
-

Is it more expensive/complicated to do it for an empty slice, or is that just 
orthogonal to this since it's handled in a different place?

 Skip rows with empty columns when slicing entire row
 

 Key: CASSANDRA-2855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2855
 Project: Cassandra
  Issue Type: Improvement
  Components: API
Reporter: Jeremy Hanna
Assignee: Jonathan Ellis
Priority: Minor
  Labels: hadoop
 Fix For: 0.8.2


 We have been finding that range ghosts appear in results from Hadoop via Pig. 
  This could also happen if rows don't have data for the slice predicate that 
 is given.  This leads to having to do a painful amount of defensive checking 
 on the Pig side, especially in the case of range ghosts.
 We would like to add an option to skip rows that have no column values in it. 
  That functionality existed before in core Cassandra but was removed because 
 of the performance penalty of that checking.  However with Hadoop support in 
 the RecordReader, that is batch oriented anyway, so individual row reading 
 performance isn't as much of an issue.  Also we would make it an optional 
 config parameter for each job anyway, so people wouldn't have to incur that 
 penalty if they are confident that there won't be those empty rows or they 
 don't care.
 It could be parameter cassandra.skip.empty.rows and be true/false.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (CASSANDRA-2855) Skip rows with empty columns when slicing entire row

2011-07-05 Thread Jeremy Hanna (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna reassigned CASSANDRA-2855:
---

Assignee: Jeremy Hanna

 Skip rows with empty columns when slicing entire row
 

 Key: CASSANDRA-2855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2855
 Project: Cassandra
  Issue Type: Improvement
  Components: API
Reporter: Jeremy Hanna
Assignee: Jeremy Hanna
Priority: Minor
  Labels: hadoop
 Fix For: 0.8.2


 We have been finding that range ghosts appear in results from Hadoop via Pig. 
  This could also happen if rows don't have data for the slice predicate that 
 is given.  This leads to having to do a painful amount of defensive checking 
 on the Pig side, especially in the case of range ghosts.
 We would like to add an option to skip rows that have no column values in it. 
  That functionality existed before in core Cassandra but was removed because 
 of the performance penalty of that checking.  However with Hadoop support in 
 the RecordReader, that is batch oriented anyway, so individual row reading 
 performance isn't as much of an issue.  Also we would make it an optional 
 config parameter for each job anyway, so people wouldn't have to incur that 
 penalty if they are confident that there won't be those empty rows or they 
 don't care.
 It could be parameter cassandra.skip.empty.rows and be true/false.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-2866) Add higher nofile (ulimit -n) property to the install configuration for debian and rpm packaging

2011-07-06 Thread Jeremy Hanna (JIRA)
Add higher nofile (ulimit -n) property to the install configuration for debian 
and rpm packaging


 Key: CASSANDRA-2866
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2866
 Project: Cassandra
  Issue Type: Improvement
  Components: Packaging
Reporter: Jeremy Hanna
Priority: Minor


Currently in the packaging we set memlock to unlimited.  We should also raise 
the nofile value (ulimit -n) so that it's more than 1024, likely unlimited.  
Otherwise, there can be odd indirect bugs.  For example, I've seen compaction 
fail with a "too many open files" error.
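For illustration, the packaging change might amount to shipping a limits fragment along these lines (the path and values are assumptions; the debian/rpm scripts may instead set this in the init script with ulimit -n):

```
# /etc/security/limits.d/cassandra.conf -- illustrative
cassandra  -  memlock  unlimited
cassandra  -  nofile   unlimited
```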

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2870) dynamic snitch + read repair off can cause LOCAL_QUORUM reads to return spurious UnavailableException

2011-07-08 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062256#comment-13062256
 ] 

Jeremy Hanna commented on CASSANDRA-2870:
-

This also appears to affect 0.7.6, and it happens even when read repair is not 
off.  I didn't set read repair on my CFs (it defaults to 100%) and tried a 
simple rowcount pig script using read consistency LOCAL_QUORUM, and it fails 
with UE.  If that's the case, I would think the priority should be higher and 
it should go into 0.7.7.  Any thoughts?

 dynamic snitch + read repair off can cause LOCAL_QUORUM reads to return 
 spurious UnavailableException
 -

 Key: CASSANDRA-2870
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2870
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.0
Reporter: Jonathan Ellis
Assignee: Jonathan Ellis
Priority: Minor
 Fix For: 0.7.8, 0.8.2

 Attachments: 2870.txt


 When Read Repair is off, we want to avoid doing requests to more nodes than 
 necessary to satisfy the ConsistencyLevel.  ReadCallback does this here:
 {code}
 this.endpoints = repair || resolver instanceof RowRepairResolver
                ? endpoints
                : endpoints.subList(0, Math.min(endpoints.size(), blockfor)); // min so as to not throw exception until assureSufficient is called
 {code}
 You can see that it is assuming that the endpoints list is sorted in order 
 of preferred-ness for the read.
 Then the LOCAL_QUORUM code in DatacenterReadCallback checks to see if we have 
 enough nodes to do the read:
 {code}
 int localEndpoints = 0;
 for (InetAddress endpoint : endpoints)
 {
     if (localdc.equals(snitch.getDatacenter(endpoint)))
         localEndpoints++;
 }
 if (localEndpoints < blockfor)
     throw new UnavailableException();
 {code}
 So if repair is off (so we truncate our endpoints list) AND dynamic snitch 
 has decided that nodes in another DC are to be preferred over local ones, 
 we'll throw UE even if all the replicas are healthy.
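To make the interaction concrete, here is a minimal, self-contained sketch of the two steps above (DC labels and names are hypothetical, not the real ReadCallback classes): truncating the snitch-sorted endpoint list to blockfor and then counting local-DC endpoints can leave zero local replicas even though all of them are healthy.

```java
import java.util.*;

public class LocalQuorumTruncation
{
    // Mirror ReadCallback's truncation followed by DatacenterReadCallback's count:
    // take the preference-sorted endpoint list, keep only the first blockfor,
    // then count how many of the survivors are in the local DC.
    static int localEndpointsAfterTruncation(List<String> sortedEndpointDcs,
                                             String localDc, int blockfor)
    {
        List<String> truncated =
            sortedEndpointDcs.subList(0, Math.min(sortedEndpointDcs.size(), blockfor));
        int local = 0;
        for (String dc : truncated)
            if (localDc.equals(dc))
                local++;
        return local;
    }

    public static void main(String[] args)
    {
        // Dynamic snitch has decided two DC2 nodes are preferred over the DC1 replicas.
        List<String> sorted = Arrays.asList("DC2", "DC2", "DC1", "DC1", "DC1");
        int blockfor = 2; // LOCAL_QUORUM with RF=3 in DC1
        int local = localEndpointsAfterTruncation(sorted, "DC1", blockfor);
        // local (0) < blockfor (2), so UnavailableException would be thrown
        // even though all three DC1 replicas are healthy.
        System.out.println(local < blockfor);
    }
}
```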

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2870) dynamic snitch + read repair off can cause LOCAL_QUORUM reads to return spurious UnavailableException

2011-07-08 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13062324#comment-13062324
 ] 

Jeremy Hanna commented on CASSANDRA-2870:
-

Okay - it just seemed like a higher priority issue with the scope expanded.  
We'll probably just disable dynamic snitch until the fix is in a release then.

 dynamic snitch + read repair off can cause LOCAL_QUORUM reads to return 
 spurious UnavailableException
 -

 Key: CASSANDRA-2870
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2870
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.7.0
Reporter: Jonathan Ellis
Assignee: Jonathan Ellis
Priority: Minor
 Fix For: 0.7.8, 0.8.2

 Attachments: 2870.txt


 When Read Repair is off, we want to avoid doing requests to more nodes than 
 necessary to satisfy the ConsistencyLevel.  ReadCallback does this here:
 {code}
 this.endpoints = repair || resolver instanceof RowRepairResolver
                ? endpoints
                : endpoints.subList(0, Math.min(endpoints.size(), blockfor)); // min so as to not throw exception until assureSufficient is called
 {code}
 You can see that it is assuming that the endpoints list is sorted in order 
 of preferred-ness for the read.
 Then the LOCAL_QUORUM code in DatacenterReadCallback checks to see if we have 
 enough nodes to do the read:
 {code}
 int localEndpoints = 0;
 for (InetAddress endpoint : endpoints)
 {
     if (localdc.equals(snitch.getDatacenter(endpoint)))
         localEndpoints++;
 }
 if (localEndpoints < blockfor)
     throw new UnavailableException();
 {code}
 So if repair is off (so we truncate our endpoints list) AND dynamic snitch 
 has decided that nodes in another DC are to be preferred over local ones, 
 we'll throw UE even if all the replicas are healthy.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (CASSANDRA-2869) CassandraStorage does not function properly when used multiple times in a single pig script due to UDFContext sharing issues

2011-07-11 Thread Jeremy Hanna (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna reassigned CASSANDRA-2869:
---

Assignee: Jeremy Hanna

 CassandraStorage does not function properly when used multiple times in a 
 single pig script due to UDFContext sharing issues
 

 Key: CASSANDRA-2869
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2869
 Project: Cassandra
  Issue Type: Bug
  Components: Contrib
Affects Versions: 0.7.2
Reporter: Grant Ingersoll
Assignee: Jeremy Hanna

 CassandraStorage appears to have threading issues along the lines of those 
 described at http://pig.markmail.org/message/oz7oz2x2dwp66eoz due to the 
 sharing of the UDFContext.
 I believe the fix lies in implementing
 {code}
 public void setStoreFuncUDFContextSignature(String signature)
 {
 }
 {code}
 and then using that signature when getting the UDFContext.
 From the Pig manual:
 {quote}
 setStoreFuncUDFContextSignature(): This method will be called by Pig both in 
 the front end and back end to pass a unique signature to the Storer. The 
 signature can be used to store into the UDFContext any information which the 
 Storer needs to store between various method invocations in the front end and 
 back end. The default implementation in StoreFunc has an empty body. This 
 method will be called before other methods.
 {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2869) CassandraStorage does not function properly when used multiple times in a single pig script due to UDFContext sharing issues

2011-07-12 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13064149#comment-13064149
 ] 

Jeremy Hanna commented on CASSANDRA-2869:
-

Yes. I was about to post an updated patch last night but got sidetracked. Do 
you mind removing that if it's otherwise good to go? Otherwise I can do that 
later today.

 CassandraStorage does not function properly when used multiple times in a 
 single pig script due to UDFContext sharing issues
 

 Key: CASSANDRA-2869
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2869
 Project: Cassandra
  Issue Type: Bug
  Components: Contrib
Affects Versions: 0.7.2
Reporter: Grant Ingersoll
Assignee: Jeremy Hanna
 Attachments: 2869.txt


 CassandraStorage appears to have threading issues along the lines of those 
 described at http://pig.markmail.org/message/oz7oz2x2dwp66eoz due to the 
 sharing of the UDFContext.
 I believe the fix lies in implementing
 {code}
 public void setStoreFuncUDFContextSignature(String signature)
 {
 }
 {code}
 and then using that signature when getting the UDFContext.
 From the Pig manual:
 {quote}
 setStoreFuncUDFContextSignature(): This method will be called by Pig both in 
 the front end and back end to pass a unique signature to the Storer. The 
 signature can be used to store into the UDFContext any information which the 
 Storer needs to store between various method invocations in the front end and 
 back end. The default implementation in StoreFunc has an empty body. This 
 method will be called before other methods.
 {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2869) CassandraStorage does not function properly when used multiple times in a single pig script due to UDFContext sharing issues

2011-07-13 Thread Jeremy Hanna (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna updated CASSANDRA-2869:


Attachment: 2869-2.txt

Removed that String.  Also removed adding the mutation twice, and wrapped the 
nested exception from putNext in the IOException.  We've been meaning to add 
those last two items to one of these tickets.

 CassandraStorage does not function properly when used multiple times in a 
 single pig script due to UDFContext sharing issues
 

 Key: CASSANDRA-2869
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2869
 Project: Cassandra
  Issue Type: Bug
  Components: Contrib
Affects Versions: 0.7.2
Reporter: Grant Ingersoll
Assignee: Jeremy Hanna
 Attachments: 2869-2.txt, 2869.txt


 CassandraStorage appears to have threading issues along the lines of those 
 described at http://pig.markmail.org/message/oz7oz2x2dwp66eoz due to the 
 sharing of the UDFContext.
 I believe the fix lies in implementing
 {code}
 public void setStoreFuncUDFContextSignature(String signature)
 {
 }
 {code}
 and then using that signature when getting the UDFContext.
 From the Pig manual:
 {quote}
 setStoreFuncUDFContextSignature(): This method will be called by Pig both in 
 the front end and back end to pass a unique signature to the Storer. The 
 signature can be used to store into the UDFContext any information which the 
 Storer needs to store between various method invocations in the front end and 
 back end. The default implementation in StoreFunc has an empty body. This 
 method will be called before other methods.
 {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-2905) Add retry logic to ColumnFamilyRecordReader

2011-07-16 Thread Jeremy Hanna (JIRA)
Add retry logic to ColumnFamilyRecordReader
---

 Key: CASSANDRA-2905
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2905
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jeremy Hanna
Assignee: Jeremy Hanna


One thing that would improve the built-in ColumnFamilyRecordReader is some 
retry logic when it times out in hasNext.  It could help in addition to tuning 
rpc_timeout_in_ms, so that timeouts surface less frequently and there are 
fewer blacklisted task trackers (blacklisting results from task errors, 
including this timeout).

{code}
java.lang.RuntimeException: TimedOutException() at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:264)
 at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:279)
 at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:176)
 at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
 at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131) 
at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:135)
 at org.apache.cassandra.hadoop.pig.CassandraStorage.getNext(Unknown Source) at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
 at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:455)
 at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67) at 
org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143) at 
org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:646) at 
org.apache.hadoop.mapred.MapTask.run(MapTask.java:322) at 
org.apache.hadoop.mapred.Child$4.run(Child.java:268) at 
java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:396) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
 at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: 
TimedOutException() at 
org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12104)
 at 
org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:732)
 at 
org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:704)
 at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:242)
 ... 17 more
{code}
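A rough sketch of the kind of retry wrapper this suggests, around the call that produced the RuntimeException above (the helper name, attempt count, and lack of backoff are assumptions, not the eventual patch):

```java
import java.util.function.Supplier;

public class RetryingCall
{
    // Retry an operation that may throw (e.g. the RuntimeException wrapping
    // TimedOutException in the trace above) up to maxAttempts times.
    static <T> T withRetries(Supplier<T> call, int maxAttempts)
    {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                return call.get();
            }
            catch (RuntimeException e)
            {
                last = e; // a real patch could also back off before the next attempt
            }
        }
        throw last; // all attempts failed; the task fails as it does today
    }

    public static void main(String[] args)
    {
        final int[] calls = { 0 };
        int rows = withRetries(() -> {
            calls[0]++;
            if (calls[0] < 3)
                throw new RuntimeException("TimedOutException()"); // simulated timeout
            return 42;
        }, 5);
        System.out.println(rows + " after " + calls[0] + " attempts");
    }
}
```

Transient timeouts are absorbed instead of immediately failing the map task and, eventually, blacklisting its task tracker.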

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2905) Add retry logic to ColumnFamilyRecordReader

2011-07-16 Thread Jeremy Hanna (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna updated CASSANDRA-2905:


Description: 
One thing that would improve the built-in ColumnFamilyRecordReader is some 
retry logic if it times out on hasNext.  It could help in addition to setting 
the rpc_timeout_in_ms, so that timeouts happen less frequently so there are 
fewer blacklisted task trackers (which are the result of an error, including 
the timeout).

{quote}
java.lang.RuntimeException: TimedOutException() at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:264)
 at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:279)
 at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:176)
 at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
 at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131) 
at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:135)
 at org.apache.cassandra.hadoop.pig.CassandraStorage.getNext(Unknown Source) at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
 at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:455)
 at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67) at 
org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143) at 
org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:646) at 
org.apache.hadoop.mapred.MapTask.run(MapTask.java:322) at 
org.apache.hadoop.mapred.Child$4.run(Child.java:268) at 
java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:396) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
 at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: 
TimedOutException() at 
org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12104)
 at 
org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:732)
 at 
org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:704)
 at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:242)
 ... 17 more
{quote}

  was:
One thing that would improve the built-in ColumnFamilyRecordReader is some 
retry logic if it times out on hasNext.  It could help in addition to setting 
the rpc_timeout_in_ms, so that timeouts happen less frequently so there are 
fewer blacklisted task trackers (which are the result of an error, including 
the timeout).

{code}
java.lang.RuntimeException: TimedOutException() at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:264)
 at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:279)
 at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:176)
 at 
com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
 at 
com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131) 
at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:135)
 at org.apache.cassandra.hadoop.pig.CassandraStorage.getNext(Unknown Source) at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
 at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:455)
 at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67) at 
org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143) at 
org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:646) at 
org.apache.hadoop.mapred.MapTask.run(MapTask.java:322) at 
org.apache.hadoop.mapred.Child$4.run(Child.java:268) at 
java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:396) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
 at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: 
TimedOutException() at 
org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12104)
 at 
org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:732)
 at 
org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:704)
 at 
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:242)
 ... 17 more
{code}


 Add retry logic to ColumnFamilyRecordReader
 ---

 Key: 

[jira] [Updated] (CASSANDRA-2905) Add retry logic to ColumnFamilyRecordReader

2011-07-16 Thread Jeremy Hanna (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna updated CASSANDRA-2905:


Priority: Minor  (was: Major)

 Add retry logic to ColumnFamilyRecordReader
 ---

 Key: CASSANDRA-2905
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2905
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jeremy Hanna
Assignee: Jeremy Hanna
Priority: Minor
  Labels: hadoop

 One thing that would improve the built-in ColumnFamilyRecordReader is some 
 retry logic if it times out on hasNext.  It could help in addition to setting 
 the rpc_timeout_in_ms, so that timeouts happen less frequently so there are 
 fewer blacklisted task trackers (which are the result of an error, including 
 the timeout).
 {quote}
 java.lang.RuntimeException: TimedOutException() at 
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:264)
  at 
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:279)
  at 
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:176)
  at 
 com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
  at 
 com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131) 
 at 
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:135)
  at org.apache.cassandra.hadoop.pig.CassandraStorage.getNext(Unknown Source) 
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
  at 
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:455)
  at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67) 
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143) at 
 org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:646) at 
 org.apache.hadoop.mapred.MapTask.run(MapTask.java:322) at 
 org.apache.hadoop.mapred.Child$4.run(Child.java:268) at 
 java.security.AccessController.doPrivileged(Native Method) at 
 javax.security.auth.Subject.doAs(Subject.java:396) at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
  at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: 
 TimedOutException() at 
 org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12104)
  at 
 org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:732)
  at 
 org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:704)
  at 
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:242)
  ... 17 more
 {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2905) Add retry logic to ColumnFamilyRecordReader

2011-07-18 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13067323#comment-13067323
 ] 

Jeremy Hanna commented on CASSANDRA-2905:
-

Good point.  I'll put that in there with the default just being current 
behavior to begin with.

 Add retry logic to ColumnFamilyRecordReader
 ---

 Key: CASSANDRA-2905
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2905
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jeremy Hanna
Assignee: Jeremy Hanna
Priority: Minor
  Labels: hadoop

 One thing that would improve the built-in ColumnFamilyRecordReader is some 
 retry logic if it times out on hasNext.  It could help in addition to setting 
 the rpc_timeout_in_ms, so that timeouts happen less frequently so there are 
 fewer blacklisted task trackers (which are the result of an error, including 
 the timeout).
 {quote}
 java.lang.RuntimeException: TimedOutException() at 
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:264)
  at 
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:279)
  at 
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:176)
  at 
 com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
  at 
 com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131) 
 at 
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:135)
  at org.apache.cassandra.hadoop.pig.CassandraStorage.getNext(Unknown Source) 
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
  at 
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:455)
  at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67) 
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143) at 
 org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:646) at 
 org.apache.hadoop.mapred.MapTask.run(MapTask.java:322) at 
 org.apache.hadoop.mapred.Child$4.run(Child.java:268) at 
 java.security.AccessController.doPrivileged(Native Method) at 
 javax.security.auth.Subject.doAs(Subject.java:396) at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
  at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: 
 TimedOutException() at 
 org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12104)
  at 
 org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:732)
  at 
 org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:704)
  at 
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:242)
  ... 17 more
 {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-2932) Implement assume in cqlsh

2011-07-21 Thread Jeremy Hanna (JIRA)
Implement assume in cqlsh
---

 Key: CASSANDRA-2932
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2932
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jeremy Hanna
Priority: Minor


In the CLI there is a handy way to assume validators.  It would be very nice to 
have the assume command in cqlsh as well.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-2940) Make rpc_timeout_in_ms into a jmx mbean property

2011-07-22 Thread Jeremy Hanna (JIRA)
Make rpc_timeout_in_ms into a jmx mbean property


 Key: CASSANDRA-2940
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2940
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jeremy Hanna


When using the hadoop integration especially, experimenting with 
rpc_timeout_in_ms is a pain if you have to restart every server in the cluster 
for it to take effect.  Making it a JMX MBean property that can be set at 
runtime would be an improvement.  The yaml file could still be updated 
separately so the setting remains persistent.
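A rough sketch of what the MBean could look like (the interface, field names, and object name are illustrative; Cassandra's actual MBean layout may differ):

```java
interface RpcTimeoutMBean
{
    long getRpcTimeoutInMs();
    void setRpcTimeoutInMs(long timeoutInMs);
}

public class RpcTimeout implements RpcTimeoutMBean
{
    // volatile so a JMX write is visible to request threads immediately
    private volatile long rpcTimeoutInMs;

    public RpcTimeout(long initialFromYaml)
    {
        this.rpcTimeoutInMs = initialFromYaml;
    }

    public long getRpcTimeoutInMs() { return rpcTimeoutInMs; }
    public void setRpcTimeoutInMs(long timeoutInMs) { rpcTimeoutInMs = timeoutInMs; }

    // Registration would look roughly like:
    //   ManagementFactory.getPlatformMBeanServer().registerMBean(
    //       bean, new ObjectName("org.apache.cassandra.db:type=RpcTimeout"));

    public static void main(String[] args)
    {
        RpcTimeout bean = new RpcTimeout(10000); // initial value read from cassandra.yaml
        bean.setRpcTimeoutInMs(30000);           // what a JMX client would do at runtime
        System.out.println(bean.getRpcTimeoutInMs());
    }
}
```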

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2940) Make rpc_timeout_in_ms into a jmx mbean property

2011-07-22 Thread Jeremy Hanna (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna updated CASSANDRA-2940:


Labels: lhf  (was: )

 Make rpc_timeout_in_ms into a jmx mbean property
 

 Key: CASSANDRA-2940
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2940
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jeremy Hanna
  Labels: lhf

 When using the hadoop integration especially, experimenting with 
 rpc_timeout_in_ms is a pain if you have to restart every server in the 
 cluster for it to take effect.  This would be an improvement to make it into 
 a jmx mbean property to set it at runtime.  The yaml file could be updated 
 separately so it would be persistent still.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2855) Skip rows with empty columns when slicing entire row

2011-07-25 Thread Jeremy Hanna (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna updated CASSANDRA-2855:


Attachment: 2855.txt

Simple patch to skip results that have no values for the key.

 Skip rows with empty columns when slicing entire row
 

 Key: CASSANDRA-2855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2855
 Project: Cassandra
  Issue Type: Improvement
  Components: API
Reporter: Jeremy Hanna
Assignee: Jeremy Hanna
Priority: Minor
  Labels: hadoop
 Fix For: 0.7.9, 0.8.3

 Attachments: 2855.txt


 We have been finding that range ghosts appear in results from Hadoop via Pig. 
  This could also happen if rows don't have data for the slice predicate that 
 is given.  This leads to having to do a painful amount of defensive checking 
 on the Pig side, especially in the case of range ghosts.
 We would like to add an option to skip rows that have no column values in it. 
  That functionality existed before in core Cassandra but was removed because 
 of the performance penalty of that checking.  However with Hadoop support in 
 the RecordReader, that is batch oriented anyway, so individual row reading 
 performance isn't as much of an issue.  Also we would make it an optional 
 config parameter for each job anyway, so people wouldn't have to incur that 
 penalty if they are confident that there won't be those empty rows or they 
 don't care.
 It could be parameter cassandra.skip.empty.rows and be true/false.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2855) Skip rows with empty columns when slicing entire row

2011-07-25 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13070764#comment-13070764
 ] 

Jeremy Hanna commented on CASSANDRA-2855:
-

Brandon was saying that the empty slice comment only referred to core Cassandra, 
so in the CFRR I just skipped any key that didn't have values - hoping that 
isSetColumns handles all cases for that.

 Skip rows with empty columns when slicing entire row
 

 Key: CASSANDRA-2855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2855
 Project: Cassandra
  Issue Type: Improvement
  Components: API
Reporter: Jeremy Hanna
Assignee: Jeremy Hanna
Priority: Minor
  Labels: hadoop
 Fix For: 0.7.9, 0.8.3

 Attachments: 2855.txt


 We have been finding that range ghosts appear in results from Hadoop via Pig. 
  This could also happen if rows don't have data for the slice predicate that 
 is given.  This leads to having to do a painful amount of defensive checking 
 on the Pig side, especially in the case of range ghosts.
 We would like to add an option to skip rows that have no column values in it. 
  That functionality existed before in core Cassandra but was removed because 
 of the performance penalty of that checking.  However with Hadoop support in 
 the RecordReader, that is batch oriented anyway, so individual row reading 
 performance isn't as much of an issue.  Also we would make it an optional 
 config parameter for each job anyway, so people wouldn't have to incur that 
 penalty if they are confident that there won't be those empty rows or they 
 don't care.
 It could be parameter cassandra.skip.empty.rows and be true/false.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2855) Skip rows with empty columns when slicing entire row

2011-07-26 Thread Jeremy Hanna (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna updated CASSANDRA-2855:


Attachment: (was: 2855.txt)

 Skip rows with empty columns when slicing entire row
 

 Key: CASSANDRA-2855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2855
 Project: Cassandra
  Issue Type: Improvement
  Components: API
Reporter: Jeremy Hanna
Assignee: Jeremy Hanna
Priority: Minor
  Labels: hadoop
 Fix For: 0.7.9, 0.8.3


 We have been finding that range ghosts appear in results from Hadoop via Pig. 
  This could also happen if rows don't have data for the slice predicate that 
 is given.  This leads to having to do a painful amount of defensive checking 
 on the Pig side, especially in the case of range ghosts.
 We would like to add an option to skip rows that have no column values in it. 
  That functionality existed before in core Cassandra but was removed because 
 of the performance penalty of that checking.  However with Hadoop support in 
 the RecordReader, that is batch oriented anyway, so individual row reading 
 performance isn't as much of an issue.  Also we would make it an optional 
 config parameter for each job anyway, so people wouldn't have to incur that 
 penalty if they are confident that there won't be those empty rows or they 
 don't care.
 It could be parameter cassandra.skip.empty.rows and be true/false.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2855) Skip rows with empty columns when slicing entire row

2011-07-26 Thread Jeremy Hanna (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna updated CASSANDRA-2855:


Attachment: 2855-v2.txt

v2 is tested to skip results with no columns and tombstones.  It also fixes an 
exception that occurred because lastRow looked at the altered set of rows.

 Skip rows with empty columns when slicing entire row
 

 Key: CASSANDRA-2855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2855
 Project: Cassandra
  Issue Type: Improvement
  Components: API
Reporter: Jeremy Hanna
Assignee: Jeremy Hanna
Priority: Minor
  Labels: hadoop
 Fix For: 0.7.9, 0.8.3

 Attachments: 2855-v2.txt


 We have been finding that range ghosts appear in results from Hadoop via Pig. 
  This could also happen if rows don't have data for the slice predicate that 
 is given.  This leads to having to do a painful amount of defensive checking 
 on the Pig side, especially in the case of range ghosts.
 We would like to add an option to skip rows that have no column values in it. 
  That functionality existed before in core Cassandra but was removed because 
 of the performance penalty of that checking.  However with Hadoop support in 
 the RecordReader, that is batch oriented anyway, so individual row reading 
 performance isn't as much of an issue.  Also we would make it an optional 
 config parameter for each job anyway, so people wouldn't have to incur that 
 penalty if they are confident that there won't be those empty rows or they 
 don't care.
 It could be parameter cassandra.skip.empty.rows and be true/false.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2855) Skip rows with empty columns when slicing entire row

2011-07-28 Thread Jeremy Hanna (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna updated CASSANDRA-2855:


Attachment: 2855-v3.txt

Added a configuration property cassandra.skip.empty.results which defaults to 
false.  We can't skip only completely empty rows because, with a slice 
predicate, there is no way to tell from the result whether the whole row is empty.

 Skip rows with empty columns when slicing entire row
 

 Key: CASSANDRA-2855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2855
 Project: Cassandra
  Issue Type: Improvement
  Components: API
Reporter: Jeremy Hanna
Assignee: Jeremy Hanna
Priority: Minor
  Labels: hadoop
 Fix For: 0.7.9, 0.8.3

 Attachments: 2855-v2.txt, 2855-v3.txt


 We have been finding that range ghosts appear in results from Hadoop via Pig. 
  This could also happen if rows don't have data for the slice predicate that 
 is given.  This leads to having to do a painful amount of defensive checking 
 on the Pig side, especially in the case of range ghosts.
 We would like to add an option to skip rows that have no column values in it. 
  That functionality existed before in core Cassandra but was removed because 
 of the performance penalty of that checking.  However with Hadoop support in 
 the RecordReader, that is batch oriented anyway, so individual row reading 
 performance isn't as much of an issue.  Also we would make it an optional 
 config parameter for each job anyway, so people wouldn't have to incur that 
 penalty if they are confident that there won't be those empty rows or they 
 don't care.
 It could be parameter cassandra.skip.empty.rows and be true/false.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2855) Skip rows with empty columns when slicing entire row

2011-07-28 Thread Jeremy Hanna (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna updated CASSANDRA-2855:


Attachment: (was: 2855-v3.txt)

 Skip rows with empty columns when slicing entire row
 

 Key: CASSANDRA-2855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2855
 Project: Cassandra
  Issue Type: Improvement
  Components: API
Reporter: Jeremy Hanna
Assignee: Jeremy Hanna
Priority: Minor
  Labels: hadoop
 Fix For: 0.7.9, 0.8.3

 Attachments: 2855-v2.txt, 2855-v3.txt


 We have been finding that range ghosts appear in results from Hadoop via Pig. 
  This could also happen if rows don't have data for the slice predicate that 
 is given.  This leads to having to do a painful amount of defensive checking 
 on the Pig side, especially in the case of range ghosts.
 We would like to add an option to skip rows that have no column values in it. 
  That functionality existed before in core Cassandra but was removed because 
 of the performance penalty of that checking.  However with Hadoop support in 
 the RecordReader, that is batch oriented anyway, so individual row reading 
 performance isn't as much of an issue.  Also we would make it an optional 
 config parameter for each job anyway, so people wouldn't have to incur that 
 penalty if they are confident that there won't be those empty rows or they 
 don't care.
 It could be parameter cassandra.skip.empty.rows and be true/false.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-2970) Create separate read repair settings for intra- versus inter- datacenter

2011-07-29 Thread Jeremy Hanna (JIRA)
Create separate read repair settings for intra- versus inter- datacenter


 Key: CASSANDRA-2970
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2970
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jeremy Hanna
Priority: Minor


When doing read repair, it doesn't take into account the datacenter where the 
replicas are.  It simply does a repair based on the read repair chance.  Since 
multi-DC configurations would benefit from a lower chance between DCs, it seems 
reasonable to have a separate setting for read repair between DCs from the one 
used within a DC.

Perhaps there could be a single property still, which would default both inter 
and intra for the sake of simple scenarios and backwards compatibility.  Then 
if a more specific setting is specified (read_repair_chance_global or 
read_repair_chance_local), that would be used.

I'm not sure if this would complicate matters too much, but it builds on 
CASSANDRA-982 and CASSANDRA-1530 to help with efficiency between datacenters, 
especially in the case of an analytics cluster versus a realtime cluster - 
making them tunably more independent.
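
The fallback described above could look roughly like this. The setting names follow the ticket's proposal (read_repair_chance_global / read_repair_chance_local) and the function itself is a hypothetical sketch, not Cassandra code:

```python
# Sketch of the proposed resolution order: a DC-specific setting wins when
# present; otherwise both cases fall back to the single legacy knob, which
# keeps simple configs and old configs working unchanged.
def resolve_read_repair_chance(settings, same_dc):
    key = "read_repair_chance_local" if same_dc else "read_repair_chance_global"
    return settings.get(key, settings["read_repair_chance"])

# Only the inter-DC chance is overridden; intra-DC falls back to the default.
settings = {"read_repair_chance": 0.1, "read_repair_chance_global": 0.01}
assert resolve_read_repair_chance(settings, same_dc=False) == 0.01
assert resolve_read_repair_chance(settings, same_dc=True) == 0.1
```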

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (CASSANDRA-2378) Enhance hadoop support to take a list of hosts for the initial connection

2011-07-30 Thread Jeremy Hanna (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna resolved CASSANDRA-2378.
-

Resolution: Duplicate

This improvement was added in CASSANDRA-2807

 Enhance hadoop support to take a list of hosts for the initial connection
 -

 Key: CASSANDRA-2378
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2378
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jeremy Hanna
Assignee: Jeremy Hanna
  Labels: hadoop

 Currently the hadoop configuration accepts a host to do the initial 
 connection - from there it spreads the load around the ring with the input 
 splits.  Using rrdns with retry logic would be fine externally, but it would 
 be nice to have something built-in to handle multiple hosts.  So as we talked 
 about in the channel, the cassandra hadoop config variable for cassandra host 
 could accept a comma separated list of hosts that it could try for the 
 initial connection.  It could hit a random host and if that timed out, try 
 another host and return an error to the calling client if it couldn't succeed 
 with any of the hosts.
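
A minimal sketch of that retry behaviour, assuming a connect(host) callable that raises IOError on timeout; none of these names are Cassandra's actual Hadoop config API:

```python
import random

def connect_any(hosts, connect):
    """Try hosts in random order; return the first successful connection,
    or re-raise the last error if every host fails."""
    last_error = None
    for host in random.sample(hosts, len(hosts)):
        try:
            return connect(host)
        except IOError as exc:
            last_error = exc  # timed out / refused: try the next host
    raise last_error

# Toy stand-in for a real connection attempt: only one host is reachable.
alive = {"10.0.0.2"}
def connect(host):
    if host in alive:
        return "connected to %s" % host
    raise IOError("timed out: %s" % host)

assert connect_any(["10.0.0.1", "10.0.0.2", "10.0.0.3"], connect) == "connected to 10.0.0.2"
```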

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-1996) Update the hadoop mapreduce and pig examples with a richer set of text to work with.

2011-07-30 Thread Jeremy Hanna (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna updated CASSANDRA-1996:


Description: 
It would be nice to have a more complicated set of text to do the word count 
over - perhaps a more interesting example too.  Based on CASSANDRA-1993, it 
would be nice for sanity checking purposes as well.

It might also be nice to either:
* Unify the mapreduce and pig examples to use the same input text
* Make them separate but have separate dataset creation (currently the pig 
example doesn't have any default data set creation)

  was:It would be nice to have a more complicated set of text to do the word 
count over - perhaps a more interesting example too.  Based on CASSANDRA-1993, 
it would be nice for sanity checking purposes as well.

Summary: Update the hadoop mapreduce and pig examples with a richer set 
of text to work with.  (was: Add to the hadoop integration contrib stuff - more 
complicated set of text)

 Update the hadoop mapreduce and pig examples with a richer set of text to 
 work with.
 

 Key: CASSANDRA-1996
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1996
 Project: Cassandra
  Issue Type: Improvement
  Components: Contrib, Hadoop
Reporter: Jeremy Hanna
Assignee: Jeremy Hanna
Priority: Trivial
  Labels: lhf

 It would be nice to have a more complicated set of text to do the word count 
 over - perhaps a more interesting example too.  Based on CASSANDRA-1993, it 
 would be nice for sanity checking purposes as well.
 It might also be nice to either:
 * Unify the mapreduce and pig examples to use the same input text
 * Make them separate but have separate dataset creation (currently the pig 
 example doesn't have any default data set creation)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (CASSANDRA-1996) Update the hadoop mapreduce and pig examples with a richer set of text to work with.

2011-07-30 Thread Jeremy Hanna (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna reassigned CASSANDRA-1996:
---

Assignee: (was: Jeremy Hanna)

 Update the hadoop mapreduce and pig examples with a richer set of text to 
 work with.
 

 Key: CASSANDRA-1996
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1996
 Project: Cassandra
  Issue Type: Improvement
  Components: Contrib, Hadoop
Reporter: Jeremy Hanna
Priority: Trivial
  Labels: lhf

 It would be nice to have a more complicated set of text to do the word count 
 over - perhaps a more interesting example too.  Based on CASSANDRA-1993, it 
 would be nice for sanity checking purposes as well.
 It might also be nice to either:
 * Unify the mapreduce and pig examples to use the same input text
 * Make them separate but have separate dataset creation (currently the pig 
 example doesn't have any default data set creation)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (CASSANDRA-2855) Skip rows with empty columns when slicing entire row

2011-08-03 Thread Jeremy Hanna (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna updated CASSANDRA-2855:


Attachment: 2855-v4.txt

v4 renames the config var to cassandra.skip.empty.rows and only skips rows when 
the slice predicate is empty.
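
The row-skipping proposed in this ticket can be sketched as follows; iter_rows, the batch shape, and the flag handling are illustrative stand-ins, not the actual CFRR patch:

```python
# Illustrative sketch of cassandra.skip.empty.rows: while iterating a fetched
# batch of (key, columns) pairs, drop rows whose column map came back empty
# (e.g. range ghosts).  Names here are invented, not the real CFRR API.
def iter_rows(batch, skip_empty_rows=False):
    for key, columns in batch:
        if skip_empty_rows and not columns:
            continue  # no live column values for this key: skip it
        yield key, columns

batch = [("a", {"num_heads": 2}), ("ghost", {}), ("b", {"num_heads": 1})]
assert [k for k, _ in iter_rows(batch, skip_empty_rows=True)] == ["a", "b"]
assert len(list(iter_rows(batch))) == 3  # default: nothing is skipped
```

This keeps the defensive checking out of the Pig side: with the flag on, consumers never see keys that carry no values.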

 Skip rows with empty columns when slicing entire row
 

 Key: CASSANDRA-2855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2855
 Project: Cassandra
  Issue Type: Improvement
  Components: API
Reporter: Jeremy Hanna
Assignee: Jeremy Hanna
Priority: Minor
  Labels: hadoop
 Fix For: 0.8.4

 Attachments: 2855-v2.txt, 2855-v3.txt, 2855-v4.txt


 We have been finding that range ghosts appear in results from Hadoop via Pig. 
  This could also happen if rows don't have data for the slice predicate that 
 is given.  This leads to having to do a painful amount of defensive checking 
 on the Pig side, especially in the case of range ghosts.
 We would like to add an option to skip rows that have no column values in it. 
  That functionality existed before in core Cassandra but was removed because 
 of the performance penalty of that checking.  However with Hadoop support in 
 the RecordReader, that is batch oriented anyway, so individual row reading 
 performance isn't as much of an issue.  Also we would make it an optional 
 config parameter for each job anyway, so people wouldn't have to incur that 
 penalty if they are confident that there won't be those empty rows or they 
 don't care.
 It could be parameter cassandra.skip.empty.rows and be true/false.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-2998) Remove dependency on old version of cdh3 in any builds

2011-08-05 Thread Jeremy Hanna (JIRA)
Remove dependency on old version of cdh3 in any builds
--

 Key: CASSANDRA-2998
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2998
 Project: Cassandra
  Issue Type: Task
Reporter: Jeremy Hanna


This is nothing against cloudera or cdh.

For a time we had depended on a version of cdh3 which included a fix required 
to make hadoop output streaming work with Cassandra.  However, in 0.8.x, output 
streaming support has been removed.  We should therefore remove the dependency 
on the cloudera maven repo and cdh and replace them with hadoop-0.20.203, or 
whatever the current stable hadoop release is, for the build.

This likely just involves removing the cloudera maven repo and replacing the 
cdh maven build dependencies with apache stable release dependencies.  This 
would be in the main build.xml.  Worth checking would be the dependencies in 
the examples and/or contrib modules.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2998) Remove dependency on old version of cdh3 in any builds

2011-08-05 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13080201#comment-13080201
 ] 

Jeremy Hanna commented on CASSANDRA-2998:
-

Looks good, though I'm not sure if we even need to depend on hadoop-streaming 
any longer.

 Remove dependency on old version of cdh3 in any builds
 --

 Key: CASSANDRA-2998
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2998
 Project: Cassandra
  Issue Type: Task
Reporter: Jeremy Hanna
  Labels: lhf
 Fix For: 0.8.3, 1.0

 Attachments: 2998_0.8.txt, 2998_1.0.txt


 This is nothing against cloudera or cdh.
 For a time we had depended on a version of cdh3 which included a fix required 
 to make hadoop output streaming work with Cassandra.  However, in 0.8.x, 
 output streaming support has been removed.  We should therefore remove 
 dependency on the cloudera maven repo and cdh and replace them with 
 hadoop-0.20.203 or whatever the current stable release is for hadoop for the 
 build.
 This likely just involves removing the cloudera maven repo and replacing the 
 cdh maven build dependencies with apache stable release dependencies.  This 
 would be in the main build.xml.  Worth checking would be the dependencies in 
 the examples and/or contrib modules.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2221) 'show create' commands on the CLI to export schema

2011-08-08 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13081175#comment-13081175
 ] 

Jeremy Hanna commented on CASSANDRA-2221:
-

Any update on this?  It's been patch available for a while and would be nice 
to have committed.

 'show create' commands on the CLI to export schema
 --

 Key: CASSANDRA-2221
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2221
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Jeremy Hanna
Assignee: Aaron Morton
Priority: Minor
  Labels: cli
 Fix For: 0.8.4

 Attachments: 0001-add-show-schema-statement-8.patch, 
 0001-add-show-schema-statement-v08-2.patch, 
 0001-add-show-schema-statement.patch


 It would be nice to have 'show create' type of commands on the command-line 
 so that it would generate the DDL for the schema.
 A scenario that would make this useful is where a team works out a data model 
 over time with a dev cluster.  They want to use parts of that schema for new 
 clusters that they create, like a staging/prod cluster.  It would be very 
 handy in this scenario to have some sort of export mechanism.
 Another use case is for testing purposes - you want to replicate a problem.
 We currently have schematool for import/export but that is deprecated and it 
 exports into yaml.
 This new feature would just be able to 'show' - or export if they want the 
 entire keyspace - into a script or commands that could be used in a cli 
 script.  It would need to be able to regenerate everything about the keyspace 
 including indexes and metadata.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3010) Java CQL command-line shell

2011-08-09 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13081977#comment-13081977
 ] 

Jeremy Hanna commented on CASSANDRA-3010:
-

If I had to choose one, it would be nice to be more descriptive (describe 
versus \d).  However, it would be really nice to have a basic concept of 
synonyms.  For example mysql's cli supports both describe and desc.  Building 
that type of functionality in from the start shouldn't be too onerous.
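
A toy illustration of command synonyms, independent of however the shell's grammar ends up being implemented: several spellings map onto one canonical handler, so supporting desc alongside describe is a one-line table entry.

```python
# Hypothetical dispatch table: handlers are keyed by a canonical command name,
# and a synonym table folds alternate spellings onto that name first.
HANDLERS = {"describe": lambda target: "schema of " + target}
SYNONYMS = {"desc": "describe", "\\d": "describe"}

def dispatch(command, target):
    canonical = SYNONYMS.get(command.lower(), command.lower())
    return HANDLERS[canonical](target)

assert dispatch("DESC", "users") == dispatch("describe", "users")
assert dispatch("\\d", "users") == "schema of users"
```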

 Java CQL command-line shell
 ---

 Key: CASSANDRA-3010
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3010
 Project: Cassandra
  Issue Type: New Feature
  Components: Tools
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
 Fix For: 1.0


 We need a real CQL shell that:
 - does not require installing additional environments
 - includes show keyspaces and other introspection tools
 - does not break existing cli scripts
 I.e., it needs to be java, but it should be a new tool instead of replacing 
 the existing cli.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-3010) Java CQL command-line shell

2011-08-09 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13081977#comment-13081977
 ] 

Jeremy Hanna edited comment on CASSANDRA-3010 at 8/9/11 10:36 PM:
--

If I had to choose one, it would be nice to be more descriptive (describe 
versus \d).  However, it would be really nice to have a basic concept of 
synonyms.  For example mysql's cli supports both describe and desc.  Building 
that type of functionality in from the start would hopefully not be too onerous.

  was (Author: jeromatron):
If I had to choose one, it would be nice to be more descriptive (describe 
versus \d).  However, it would be really nice to have a basic concept of 
synonyms.  For example mysql's cli supports both describe and desc.  Building 
that type of functionality in from the start shouldn't be too onerous.
  
 Java CQL command-line shell
 ---

 Key: CASSANDRA-3010
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3010
 Project: Cassandra
  Issue Type: New Feature
  Components: Tools
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
 Fix For: 1.0


 We need a real CQL shell that:
 - does not require installing additional environments
 - includes show keyspaces and other introspection tools
 - does not break existing cli scripts
 I.e., it needs to be java, but it should be a new tool instead of replacing 
 the existing cli.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3010) Java CQL command-line shell

2011-08-10 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13082466#comment-13082466
 ] 

Jeremy Hanna commented on CASSANDRA-3010:
-

bq. we can support all of those notations using synonyms in the ANTLR grammar
bq. The drawbacks here outweigh the positives, IMO.

I agree that it would be good to have one as an inspiration.  However, adding 
synonyms for common variations of the same command without any expectation of 
compatibility seems reasonable/trivial.  Are there other drawbacks besides the 
expectation of compatibility?

 Java CQL command-line shell
 ---

 Key: CASSANDRA-3010
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3010
 Project: Cassandra
  Issue Type: New Feature
  Components: Tools
Reporter: Jonathan Ellis
Assignee: Pavel Yaskevich
 Fix For: 1.0


 We need a real CQL shell that:
 - does not require installing additional environments
 - includes show keyspaces and other introspection tools
 - does not break existing cli scripts
 I.e., it needs to be java, but it should be a new tool instead of replacing 
 the existing cli.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-1608) Redesigned Compaction

2011-08-22 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088528#comment-13088528
 ] 

Jeremy Hanna commented on CASSANDRA-1608:
-

Is this going to make it into the 1.0 release?  Seems like it's awfully close.

 Redesigned Compaction
 -

 Key: CASSANDRA-1608
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1608
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Chris Goffinet
Assignee: Benjamin Coverston
 Attachments: 1608-v11.txt, 1608-v13.txt, 1608-v2.txt


 After seeing the I/O issues in CASSANDRA-1470, I've been doing some more 
 thinking on this subject that I wanted to lay out.
 I propose we redo the concept of how compaction works in Cassandra. At the 
 moment, compaction is kicked off based on a write access pattern, not read 
 access pattern. In most cases, you want the opposite. You want to be able to 
 track how well each SSTable is performing in the system. If we were to keep 
 statistics in-memory of each SSTable, prioritize them based on most accessed, 
 and bloom filter hit/miss ratios, we could intelligently group sstables that 
 are being read most often and schedule them for compaction. We could also 
 schedule lower priority maintenance on SSTable's not often accessed.
 I also propose we limit the size of each SSTable to a fixed size, which gives 
 us the ability to better utilize our bloom filters in a predictable manner. 
 At the moment after a certain size, the bloom filters become less reliable. 
 This would also allow us to group data most accessed. Currently the size of 
 an SSTable can grow to a point where large portions of the data might not 
 actually be accessed as often.
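
The prioritisation idea could be sketched like this; the statistics field names are invented for illustration and are not Cassandra's actual in-memory SSTable metrics:

```python
# Toy ranking for compaction scheduling: hot SSTables with poor bloom-filter
# behaviour float to the front of the queue, while rarely-read SSTables sink
# to low-priority maintenance.
def compaction_priority(sstables):
    return sorted(sstables,
                  key=lambda s: (s["reads"], s["bf_false_positive_rate"]),
                  reverse=True)

stats = [
    {"name": "a", "reads": 10, "bf_false_positive_rate": 0.01},
    {"name": "b", "reads": 500, "bf_false_positive_rate": 0.20},
]
assert [s["name"] for s in compaction_priority(stats)] == ["b", "a"]
```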

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3056) Able to set path location of HeapDump in cassandra-env

2011-08-22 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088960#comment-13088960
 ] 

Jeremy Hanna commented on CASSANDRA-3056:
-

For what we're doing (David and I), it would likely be fine to default it to 
the log directory.  We've already put the log directory on a large volume 
because already heap dumps get put there (in the case of OOM exceptions).  
Having it flexible and easy to change would probably be a nice side-effect too, 
in case people want to have it be in a completely separate location.

 Able to set path location of HeapDump in cassandra-env
 --

 Key: CASSANDRA-3056
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3056
 Project: Cassandra
  Issue Type: Improvement
Affects Versions: 0.7.8, 0.8.4
Reporter: David Talbott
Priority: Minor
  Labels: lhf

 We should be able to designate the path location for any heap dumps that are 
 taken.  By default, with this not set, the heap dump can occur on the root 
 disk and fill the drive. 
 Should be able to solve this by simply inserting JVM_OPTS="$JVM_OPTS 
 -XX:HeapDumpPath=path to dir" into cassandra-env.sh as a default option 
 available and set. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-2855) Skip rows with empty columns when slicing entire row

2011-08-22 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089045#comment-13089045
 ] 

Jeremy Hanna commented on CASSANDRA-2855:
-

True - wouldn't matter.

 Skip rows with empty columns when slicing entire row
 

 Key: CASSANDRA-2855
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2855
 Project: Cassandra
  Issue Type: Improvement
  Components: API
Reporter: Jeremy Hanna
Assignee: Jeremy Hanna
Priority: Minor
  Labels: hadoop
 Fix For: 0.8.5

 Attachments: 2855-v2.txt, 2855-v3.txt, 2855-v4.txt


 We have been finding that range ghosts appear in results from Hadoop via Pig. 
  This could also happen if rows don't have data for the slice predicate that 
 is given.  This leads to having to do a painful amount of defensive checking 
 on the Pig side, especially in the case of range ghosts.
 We would like to add an option to skip rows that have no column values in 
 them.  That functionality existed before in core Cassandra but was removed 
 because of the performance penalty of that checking.  However, the Hadoop 
 support in the RecordReader is batch oriented anyway, so individual row 
 reading performance isn't as much of an issue.  Also, we would make it an 
 optional config parameter for each job, so people wouldn't have to incur the 
 penalty if they are confident there won't be empty rows or they don't care.
 It could be a config parameter, cassandra.skip.empty.rows, taking true/false.





[jira] [Commented] (CASSANDRA-3056) Able to set path location of HeapDump in cassandra-env

2011-08-29 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092901#comment-13092901
 ] 

Jeremy Hanna commented on CASSANDRA-3056:
-

Do people generally configure their systems to have enough space in /tmp for 
the whole Cassandra heap?

 Able to set path location of HeapDump in cassandra-env
 --

 Key: CASSANDRA-3056
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3056
 Project: Cassandra
  Issue Type: Improvement
Affects Versions: 0.7.8, 0.8.4
Reporter: David Talbott
Priority: Minor
  Labels: lhf
 Attachments: CASSANDRA-3056-1.txt


 We should be able to designate the path location for any heap dumps that 
 are performed.  By default, with this not set, the heap dump can land on the 
 root disk and fill the drive. 
 Should be able to solve this by simply inserting JVM_OPTS="$JVM_OPTS 
 -XX:HeapDumpPath=<path to dir>" into cassandra-env.sh as a default option 
 available and set. 





[jira] [Created] (CASSANDRA-3100) Secondary index still does minor compacting after deleting index

2011-08-29 Thread Jeremy Hanna (JIRA)
Secondary index still does minor compacting after deleting index


 Key: CASSANDRA-3100
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3100
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.7.8
Reporter: Jeremy Hanna


We deleted all of our secondary indexes.  A couple of days later I was watching 
compactionstats on one of the nodes and it was in the process of minor 
compacting one of the deleted secondary indexes.  I double checked the keyspace 
definitions on the CLI and there were no secondary indexes defined.





[jira] [Issue Comment Edited] (CASSANDRA-3056) Able to set path location of HeapDump in cassandra-env

2011-08-29 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092901#comment-13092901
 ] 

Jeremy Hanna edited comment on CASSANDRA-3056 at 8/29/11 4:20 PM:
--

Do people generally configure their systems to have enough space in /tmp for 
the whole Cassandra heap?  I guess /tmp is generally the best place though.

  was (Author: jeromatron):
Do people generally configure their systems to have enough space in /tmp 
for the whole Cassandra heap?
  
 Able to set path location of HeapDump in cassandra-env
 --

 Key: CASSANDRA-3056
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3056
 Project: Cassandra
  Issue Type: Improvement
Affects Versions: 0.7.8, 0.8.4
Reporter: David Talbott
Priority: Minor
  Labels: lhf
 Attachments: CASSANDRA-3056-1.txt, CASSANDRA-3056-2.txt


 We should be able to designate the path location for any heap dumps that 
 are performed.  By default, with this not set, the heap dump can land on the 
 root disk and fill the drive. 
 Should be able to solve this by simply inserting JVM_OPTS="$JVM_OPTS 
 -XX:HeapDumpPath=<path to dir>" into cassandra-env.sh as a default option 
 available and set. 





[jira] [Issue Comment Edited] (CASSANDRA-3056) Able to set path location of HeapDump in cassandra-env

2011-08-29 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092901#comment-13092901
 ] 

Jeremy Hanna edited comment on CASSANDRA-3056 at 8/29/11 4:21 PM:
--

Do people generally configure their systems to have enough space in /tmp for 
the whole Cassandra heap?  What about the cassandra log directory as a default? 
 /tmp seems fine but this bit us because the heap dump is so big.

  was (Author: jeromatron):
Do people generally configure their systems to have enough space in /tmp 
for the whole Cassandra heap?  I guess /tmp is generally the best place though.
  
 Able to set path location of HeapDump in cassandra-env
 --

 Key: CASSANDRA-3056
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3056
 Project: Cassandra
  Issue Type: Improvement
Affects Versions: 0.7.8, 0.8.4
Reporter: David Talbott
Priority: Minor
  Labels: lhf
 Attachments: CASSANDRA-3056-1.txt, CASSANDRA-3056-2.txt


 We should be able to designate the path location for any heap dumps that 
 are performed.  By default, with this not set, the heap dump can land on the 
 root disk and fill the drive. 
 Should be able to solve this by simply inserting JVM_OPTS="$JVM_OPTS 
 -XX:HeapDumpPath=<path to dir>" into cassandra-env.sh as a default option 
 available and set. 





[jira] [Commented] (CASSANDRA-3100) Secondary index still does minor compacting after deleting index

2011-08-29 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093255#comment-13093255
 ] 

Jeremy Hanna commented on CASSANDRA-3100:
-

Also relevant: when bootstrapping a new node into the ring, it streamed data 
from another node, and in compactionstats I saw it trying to build a 
secondary index that shouldn't be there.

 Secondary index still does minor compacting after deleting index
 

 Key: CASSANDRA-3100
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3100
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.7.8
Reporter: Jeremy Hanna

 We deleted all of our secondary indexes.  A couple of days later I was 
 watching compactionstats on one of the nodes and it was in the process of 
 minor compacting one of the deleted secondary indexes.  I double checked the 
 keyspace definitions on the CLI and there were no secondary indexes defined.





[jira] [Commented] (CASSANDRA-2936) improve dependency situation between JDBC driver and Cassandra

2011-09-01 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095553#comment-13095553
 ] 

Jeremy Hanna commented on CASSANDRA-2936:
-

Rick - does this have to do with using patch -p0 ... versus using patch -p1 
... when trying to apply the patches?
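For reference, the two flags differ only in how many leading path components `patch` strips before locating the target file; a small self-contained illustration (the file and patch names are made up):

```shell
# Git-style patches prefix paths with a/ and b/, so they need -p1.
mkdir -p demo/src && cd demo
printf 'old\n' > src/Example.java
cat > fix.patch <<'EOF'
--- a/src/Example.java
+++ b/src/Example.java
@@ -1 +1 @@
-old
+new
EOF
# patch -p0 would look for a file literally named a/src/Example.java;
# -p1 strips one leading component (the a/ or b/) so src/Example.java is found.
patch -p1 < fix.patch
```

A patch generated with plain `diff -u` against the working directory carries no a/ b/ prefix and applies with -p0 instead.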

 improve dependency situation between JDBC driver and Cassandra
 --

 Key: CASSANDRA-2936
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2936
 Project: Cassandra
  Issue Type: Improvement
  Components: API, Core
Affects Versions: 0.8.1
Reporter: Eric Evans
Assignee: Eric Evans
Priority: Minor
  Labels: cql
 Fix For: 1.0

 Attachments: 
 v2-0001-CASSANDRA-2936-create-package-for-CQL-term-marshaling.txt, 
 v2-0002-convert-drivers-and-tests-to-o.a.c.cql.term.txt, 
 v2-0003-remove-extraneous-methods-from-o.a.c.db.marshal-classe.txt, 
 v2-0004-make-better-reuse-of-new-classes.txt, v2-0005-create-jar-file.txt


 The JDBC jar currently depends on the {{apache-cassandra-$version}} jar, 
 despite the fact that it only (directly) uses a handful of Cassandra's 
 classes.  In a perfect world, we'd break those classes out into their own jar 
 which both the JDBC driver and Cassandra (ala 
 {{apache-cassandra-$version.jar}}) could depend on.  However, the classes 
 used directly don't fall out to anything that makes much sense 
 organizationally (short of creating a 
 {{apache-cassandra-misc-$version.jar}}), and the situation only gets worse 
 when you take into account all of the transitive dependencies.
 See CASSANDRA-2761 for more background, in particular 
 ([this|https://issues.apache.org/jira/browse/CASSANDRA-2761?focusedCommentId=13048734&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13048734]
  and 
 [this|https://issues.apache.org/jira/browse/CASSANDRA-2761?focusedCommentId=13050884&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13050884])





[jira] [Commented] (CASSANDRA-1497) Add input support for Hadoop Streaming

2011-09-03 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096754#comment-13096754
 ] 

Jeremy Hanna commented on CASSANDRA-1497:
-

I don't think anyone was ever particularly against allowing hadoop streaming 
functionality.  I think there just wasn't the interest for a while.  On the 
input side, it will also require CASSANDRA-2799 which should be trivial.

 Add input support for Hadoop Streaming
 --

 Key: CASSANDRA-1497
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1497
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Jeremy Hanna
 Attachments: 0001-An-updated-avro-based-input-streaming-solution.patch


 related to CASSANDRA-1368 - create similar functionality for input streaming.





[jira] [Commented] (CASSANDRA-3134) Patch Hadoop Streaming Source to Support Cassandra IO

2011-09-03 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096809#comment-13096809
 ] 

Jeremy Hanna commented on CASSANDRA-3134:
-

fwiw - it might be simpler, but I'm not sure you necessarily need CDH's 
streaming jar.  Could HADOOP-1722 be backported to 0.20.203 by itself?  That 
would allow it to be seamlessly integrated into Brisk as well.

btw, this sounds great - both streaming support as well as seamless support in 
hadoopy and dumbo.

 Patch Hadoop Streaming Source to Support Cassandra IO
 -

 Key: CASSANDRA-3134
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3134
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Brandyn White
Priority: Minor
  Labels: hadoop, hadoop_examples_streaming
   Original Estimate: 504h
  Remaining Estimate: 504h

 (text is a repost from 
 [CASSANDRA-1497|https://issues.apache.org/jira/browse/CASSANDRA-1497])
 I'm the author of the Hadoopy http://bwhite.github.com/hadoopy/ python 
 library and I'm interested in taking another stab at streaming support. 
 Hadoopy and Dumbo both use the TypedBytes format that is in CDH for 
 communication with the streaming jar. A simple way to get this to work is 
 modify the streaming code (make hadoop-cassandra-streaming.jar) so that it 
 uses the same TypedBytes communication with streaming programs, but the 
 actual job IO is using the Cassandra IO. The user would have the exact same 
 streaming interface, but the user would specify the keyspace, etc using 
 environmental variables.
 The benefits of this are
 1. Easy implementation: Take the cloudera-patched version of streaming and 
 change the IO, add environmental variable reading.
 2. Only Client side: As the streaming jar is included in the job, no server 
 side changes are required.
 3. Simple maintenance: If the Hadoop Cassandra interface changes, then this 
 would require the same simple fixup as any other Hadoop job.
 4. The TypedBytes format supports all of the necessary Cassandra types 
 (https://issues.apache.org/jira/browse/HADOOP-5450)
 5. Compatible with existing streaming libraries: Hadoopy and dumbo would only 
 need to know the path of this new streaming jar
 6. No need for avro
 The negatives of this are
 1. Duplicative code: This would be a dupe and patch of the streaming jar. 
 This can be stored itself as a patch.
 2. I'd have to check but this solution should work on a stock hadoop (cluster 
 side) but it requires TypedBytes (client side) which can be included in the 
 jar.
 I can code this up but I wanted to get some feedback from the community first.





[jira] [Commented] (CASSANDRA-3134) Patch Hadoop Streaming Source to Support Cassandra IO

2011-09-06 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098047#comment-13098047
 ] 

Jeremy Hanna commented on CASSANDRA-3134:
-

Jonathan - any thoughts based on the update from Brandyn?

 Patch Hadoop Streaming Source to Support Cassandra IO
 -

 Key: CASSANDRA-3134
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3134
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Brandyn White
Priority: Minor
  Labels: hadoop, hadoop_examples_streaming
   Original Estimate: 504h
  Remaining Estimate: 504h

 (text is a repost from 
 [CASSANDRA-1497|https://issues.apache.org/jira/browse/CASSANDRA-1497])
 I'm the author of the Hadoopy http://bwhite.github.com/hadoopy/ python 
 library and I'm interested in taking another stab at streaming support. 
 Hadoopy and Dumbo both use the TypedBytes format that is in CDH for 
 communication with the streaming jar. A simple way to get this to work is 
 modify the streaming code (make hadoop-cassandra-streaming.jar) so that it 
 uses the same TypedBytes communication with streaming programs, but the 
 actual job IO is using the Cassandra IO. The user would have the exact same 
 streaming interface, but the user would specify the keyspace, etc using 
 environmental variables.
 The benefits of this are
 1. Easy implementation: Take the cloudera-patched version of streaming and 
 change the IO, add environmental variable reading.
 2. Only Client side: As the streaming jar is included in the job, no server 
 side changes are required.
 3. Simple maintenance: If the Hadoop Cassandra interface changes, then this 
 would require the same simple fixup as any other Hadoop job.
 4. The TypedBytes format supports all of the necessary Cassandra types 
 (https://issues.apache.org/jira/browse/HADOOP-5450)
 5. Compatible with existing streaming libraries: Hadoopy and dumbo would only 
 need to know the path of this new streaming jar
 6. No need for avro
 The negatives of this are
 1. Duplicative code: This would be a dupe and patch of the streaming jar. 
 This can be stored itself as a patch.
 2. I'd have to check but this solution should work on a stock hadoop (cluster 
 side) but it requires TypedBytes (client side) which can be included in the 
 jar.
 I can code this up but I wanted to get some feedback from the community first.





[jira] [Commented] (CASSANDRA-3100) Secondary index still does minor compacting after deleting index

2011-09-07 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099282#comment-13099282
 ] 

Jeremy Hanna commented on CASSANDRA-3100:
-

I'm not 100% sure - we've since moved to 0.8.4 and I haven't seen it happen 
again.

 Secondary index still does minor compacting after deleting index
 

 Key: CASSANDRA-3100
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3100
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.7.8
Reporter: Jeremy Hanna
 Fix For: 0.7.10


 We deleted all of our secondary indexes.  A couple of days later I was 
 watching compactionstats on one of the nodes and it was in the process of 
 minor compacting one of the deleted secondary indexes.  I double checked the 
 keyspace definitions on the CLI and there were no secondary indexes defined.





[jira] [Created] (CASSANDRA-3170) Schema versions output should be on separate lines for separate versions

2011-09-09 Thread Jeremy Hanna (JIRA)
Schema versions output should be on separate lines for separate versions


 Key: CASSANDRA-3170
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3170
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jeremy Hanna
Priority: Minor


On the CLI, if you do a 'describe cluster;' it would be really nice to have 
different versions on different lines, or some other way to call out multiple 
versions.  Right now it's a UUID with a list of nodes, which makes it hard to 
distinguish one version from multiple versions at a glance.





[jira] [Created] (CASSANDRA-3236) nodetool scrub hangs at 100% complete

2011-09-21 Thread Jeremy Hanna (JIRA)
nodetool scrub hangs at 100% complete
-

 Key: CASSANDRA-3236
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3236
 Project: Cassandra
  Issue Type: Bug
Affects Versions: 0.8.6, 0.8.4
Reporter: Jeremy Hanna
Priority: Minor


After running nodetool scrub on our staging and production clusters, it would 
hang at 100% complete.  Eventually I restarted the nodes to get rid of those 
messages in compactionstats.  This was on 0.8.4.  In IRC, two others (mvdir on 
0.8.4 and jborg on 0.8.6) had the same experience.  Granted, it was probably 
fine to restart the servers, as the scrub had likely completed long before the 
restart.  However, it's unnerving to do that when it involves production data.





[jira] [Commented] (CASSANDRA-3255) Sstable scrub status persists in compactionstats after scrub is complete

2011-09-24 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114117#comment-13114117
 ] 

Jeremy Hanna commented on CASSANDRA-3255:
-

Is this the same as CASSANDRA-3236?

 Sstable scrub status persists in compactionstats after scrub is complete
 

 Key: CASSANDRA-3255
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3255
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.8.5
Reporter: Jason Harvey
Assignee: Pavel Yaskevich
Priority: Trivial
  Labels: compaction
 Fix For: 0.8.7


 When scrubbing the sstables on a node, the 'Scrub' info persists in the 
 'compactionstats' nodetool utility, even after the scrub is complete.





[jira] [Commented] (CASSANDRA-4240) Only check the size of indexed column values when they are of type KEYS

2012-07-18 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417646#comment-13417646
 ] 

Jeremy Hanna commented on CASSANDRA-4240:
-

I tested patch 4 on a three-node cluster that was using DSE's alternate 
secondary index implementation, which allows larger values for indexed 
columns.  It worked fine.

Jake, would you mind reviewing the later patch?

 Only check the size of indexed column values when they are of type KEYS
 ---

 Key: CASSANDRA-4240
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4240
 Project: Cassandra
  Issue Type: Improvement
Affects Versions: 1.0.0
Reporter: Joaquin Casares
Priority: Minor
  Labels: datastax_qa
 Attachments: CASSANDRA-4240.patch, CASSANDRA-4240.patch1, 
 cassandra-1.0.8-CASSANDRA-4240-patch2.txt, 
 cassandra-1.0.8-CASSANDRA-4240-patch3.txt, 
 cassandra-1.0.8-CASSANDRA-4240-patch4.txt, cassandra-1.0.8-CASSANDRA-4240.txt


 https://github.com/apache/cassandra/blob/cassandra-1.0.8/src/java/org/apache/cassandra/thrift/ThriftValidation.java#L431
 That line states that "Indexed column values cannot be larger than 64K."  But 
 in some cases we would want the column values to be able to be larger than 
 64K, specifically if the index_type is not of type KEYS.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4275) Oracle Java 1.7 u4 does not allow Xss128k

2012-07-20 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419602#comment-13419602
 ] 

Jeremy Hanna commented on CASSANDRA-4275:
-

Should this be reopened to set it to -Xss256k?  Both Ed Capriolo and another 
person posting in IRC have found 160k insufficient.  Ed is running in 
production with 256k and the other person changed to 256k:
<Rav|2> how should I set -Xss for oracle java 7? 160k causes java.lang.StackOverflowError :(
<ecapriolo> 256 or higher
<Rav|2> ecapriolo: everything is back to normal with 256. big thanks :)
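The corresponding cassandra-env.sh workaround is a one-line change; a sketch (256k is the value reported in IRC to work, not an official recommendation):

```shell
# Per-thread stack size for JDK 7: the JVM rejects 128k outright, and
# 160k has been reported to cause StackOverflowError in Thrift threads.
JVM_OPTS="$JVM_OPTS -Xss256k"
```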

 Oracle Java 1.7 u4 does not allow Xss128k
 -

 Key: CASSANDRA-4275
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4275
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.9, 1.1.0
Reporter: Edward Capriolo
Assignee: Sylvain Lebresne
 Fix For: 1.1.2

 Attachments: 4275.txt, trunk-cassandra-4275.1.patch.txt, 
 v1-0001-CASSANDRA-4275-Use-JVM-s-reported-minimum-stack-size-o.txt


 Problem: This happens when you try to start it with default Xss setting of 
 128k
 ===
 The stack size specified is too small, Specify at least 160k
 Error: Could not create the Java Virtual Machine.
 Error: A fatal exception has occurred. Program will exit.
 Solution
 ===
 Set -Xss to 256k
 Problem: This happens when you try to start it with Xss = 160k
 
 ERROR [Thrift:14] 2012-05-22 14:42:40,479 AbstractCassandraDaemon.java (line 
 139) Fatal exception in thread Thread[Thrift:14,5,main]
 java.lang.StackOverflowError
 Solution
 ===
 Set -Xss to 256k





[jira] [Created] (CASSANDRA-4457) Find the cause for the need for a larger stack size with jdk 7

2012-07-21 Thread Jeremy Hanna (JIRA)
Jeremy Hanna created CASSANDRA-4457:
---

 Summary: Find the cause for the need for a larger stack size with 
jdk 7
 Key: CASSANDRA-4457
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4457
 Project: Cassandra
  Issue Type: Bug
Reporter: Jeremy Hanna
Priority: Minor


Based on discussions post CASSANDRA-4275, it appears that on JDK 7 the 
minimum stack size needs to be set to something higher than 160k.  That 
shouldn't be necessary.





[jira] [Commented] (CASSANDRA-4275) Oracle Java 1.7 u4 does not allow Xss128k

2012-07-21 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419859#comment-13419859
 ] 

Jeremy Hanna commented on CASSANDRA-4275:
-

I'll create another ticket then.  The problem is that out of the box it sounds 
like it just doesn't work for people.  I agree that it's better to find the 
root cause, but leaving it at the lower value means everyone who runs with 
JDK 7 hits the same error until they ask in IRC and get the workaround, which 
is also undesirable in the meantime.

Could 256k be committed for now until the new ticket is resolved so people can 
have it start up properly in the meantime?

Created CASSANDRA-4457.

 Oracle Java 1.7 u4 does not allow Xss128k
 -

 Key: CASSANDRA-4275
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4275
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.9, 1.1.0
Reporter: Edward Capriolo
Assignee: Sylvain Lebresne
 Fix For: 1.1.2

 Attachments: 4275.txt, trunk-cassandra-4275.1.patch.txt, 
 v1-0001-CASSANDRA-4275-Use-JVM-s-reported-minimum-stack-size-o.txt


 Problem: This happens when you try to start it with default Xss setting of 
 128k
 ===
 The stack size specified is too small, Specify at least 160k
 Error: Could not create the Java Virtual Machine.
 Error: A fatal exception has occurred. Program will exit.
 Solution
 ===
 Set -Xss to 256k
 Problem: This happens when you try to start it with Xss = 160k
 
 ERROR [Thrift:14] 2012-05-22 14:42:40,479 AbstractCassandraDaemon.java (line 
 139) Fatal exception in thread Thread[Thrift:14,5,main]
 java.lang.StackOverflowError
 Solution
 ===
 Set -Xss to 256k





[jira] [Commented] (CASSANDRA-4275) Oracle Java 1.7 u4 does not allow Xss128k

2012-07-21 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419894#comment-13419894
 ] 

Jeremy Hanna commented on CASSANDRA-4275:
-

Fair enough.  A clear notification would be nice so that they know to update 
the value without checking the user list or IRC.  It sounds like Ed and others 
are starting to use JDK7 in production based on the needs of their 
organization, so it would be nice to have something documented about support 
for JDK7 and tweaking as you've mentioned.

 Oracle Java 1.7 u4 does not allow Xss128k
 -

 Key: CASSANDRA-4275
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4275
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.9, 1.1.0
Reporter: Edward Capriolo
Assignee: Sylvain Lebresne
 Fix For: 1.1.2

 Attachments: 4275.txt, trunk-cassandra-4275.1.patch.txt, 
 v1-0001-CASSANDRA-4275-Use-JVM-s-reported-minimum-stack-size-o.txt


 Problem: This happens when you try to start it with default Xss setting of 
 128k
 ===
 The stack size specified is too small, Specify at least 160k
 Error: Could not create the Java Virtual Machine.
 Error: A fatal exception has occurred. Program will exit.
 Solution
 ===
 Set -Xss to 256k
 Problem: This happens when you try to start it with Xss = 160k
 
 ERROR [Thrift:14] 2012-05-22 14:42:40,479 AbstractCassandraDaemon.java (line 
 139) Fatal exception in thread Thread[Thrift:14,5,main]
 java.lang.StackOverflowError
 Solution
 ===
 Set -Xss to 256k





[jira] [Commented] (CASSANDRA-4459) pig driver casts ints as bytearray

2012-07-24 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421502#comment-13421502
 ] 

Jeremy Hanna commented on CASSANDRA-4459:
-

fwiw - see https://issues.apache.org/jira/browse/PIG-2764 for the addition of 
BigInteger and BigDecimal as built-in pig data types.  Also, I'm not sure how 
much of an issue it is for users to use pig ints for now because I don't know 
how many users know that the cassandra IntegerType is actually a BigInteger and 
not just a regular Integer.  That's not to say that it's not dangerous to try 
to put a BigInteger value into an Integer type.  It's just that I don't know if 
it's common knowledge that Cassandra uses a BigInteger underneath.

 pig driver casts ints as bytearray
 --

 Key: CASSANDRA-4459
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4459
 Project: Cassandra
  Issue Type: Bug
 Environment: C* 1.1.2 embedded in DSE
Reporter: Cathy Daw
Assignee: Brandon Williams
 Fix For: 1.1.3

 Attachments: 4459.txt


 we seem to be auto-mapping C* int columns to bytearray in Pig, and farther 
 down I can't seem to find a way to cast that to int and do an average.  
 {code}
 grunt> cassandra_users = LOAD 'cassandra://cqldb/users' USING 
 CassandraStorage();
 grunt> dump cassandra_users;
 (bobhatter,(act,22),(fname,bob),(gender,m),(highSchool,Cal 
 High),(lname,hatter),(sat,500),(state,CA),{})
 (alicesmith,(act,27),(fname,alice),(gender,f),(highSchool,Tuscon 
 High),(lname,smith),(sat,650),(state,AZ),{})
  
 // notice sat and act columns are bytearray values 
 grunt> describe cassandra_users;
 cassandra_users: {key: chararray,act: (name: chararray,value: 
 bytearray),fname: (name: chararray,value: chararray),
 gender: (name: chararray,value: chararray),highSchool: (name: 
 chararray,value: chararray),lname: (name: chararray,value: chararray),
 sat: (name: chararray,value: bytearray),state: (name: chararray,value: 
 chararray),columns: {(name: chararray,value: chararray)}}
 grunt> users_by_state = GROUP cassandra_users BY state;
 grunt> dump users_by_state;
 ((state,AX),{(aoakley,(highSchool,Phoenix 
 High),(lname,Oakley),state,(act,22),(sat,500),(gender,m),(fname,Anne),{})})
 ((state,AZ),{(gjames,(highSchool,Tuscon 
 High),(lname,James),state,(act,24),(sat,650),(gender,f),(fname,Geronomo),{})})
 ((state,CA),{(philton,(highSchool,Beverly 
 High),(lname,Hilton),state,(act,37),(sat,220),(gender,m),(fname,Paris),{}),(jbrown,(highSchool,Cal
  High),(lname,Brown),state,(act,20),(sat,700),(gender,m),(fname,Jerry),{})})
 // Error - use explicit cast
 grunt> user_avg = FOREACH users_by_state GENERATE cassandra_users.state, 
 AVG(cassandra_users.sat);
 grunt> dump user_avg;
 2012-07-22 17:15:04,361 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1045: Could not infer the matching function for org.apache.pig.builtin.AVG as 
 multiple or none of them fit. Please use an explicit cast.
 // Unable to cast as int
 grunt user_avg = FOREACH users_by_state GENERATE cassandra_users.state, 
 AVG((int)cassandra_users.sat);
 grunt dump user_avg;
 2012-07-22 17:07:39,217 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1052: Cannot cast bag with schema sat: bag({name: chararray,value: 
 bytearray}) to int
 {code}
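The failing AVG above boils down to the sat/act values arriving as uninterpreted bytearrays, which no aggregate can consume until each value is explicitly converted. A Python analogy of that underlying problem (illustrative only, not Pig or Cassandra code):

```python
# Analogy: values come back as raw byte strings, so an aggregate cannot
# consume them until each one is explicitly converted to a number.
rows = [{"sat": b"500"}, {"sat": b"650"}]

# sum(r["sat"] for r in rows) would raise TypeError: bytes are not numbers.
avg = sum(int(r["sat"]) for r in rows) / len(rows)
print(avg)  # 575.0
```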
 *Seed data in CQL*
 {code}
 CREATE KEYSPACE cqldb with 
   strategy_class = 'org.apache.cassandra.locator.SimpleStrategy' 
   and strategy_options:replication_factor=3;  
 use cqldb;
 CREATE COLUMNFAMILY users (
   KEY text PRIMARY KEY, 
   fname text, lname text, gender varchar, 
   act int, sat int, highSchool text, state varchar);
 insert into users (KEY, fname, lname, gender, act, sat, highSchool, state)
 values ('gjames', 'Geronomo', 'James', 'f', 24, 650, 'Tuscon High', 'AZ');
 insert into users (KEY, fname, lname, gender, act, sat, highSchool, state)
 values ('aoakley', 'Anne', 'Oakley', 'm', 22, 500, 'Phoenix High', 'AX');
 insert into users (KEY, fname, lname, gender, act, sat, highSchool, state)
 values ('jbrown', 'Jerry', 'Brown', 'm', 20, 700, 'Cal High', 'CA');
 insert into users (KEY, fname, lname, gender, act, sat, highSchool, state)
 values ('philton', 'Paris', 'Hilton', 'm', 37, 220, 'Beverly High', 'CA');
 select * from users;
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3974) Per-CF TTL

2012-08-02 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427448#comment-13427448
 ] 

Jeremy Hanna commented on CASSANDRA-3974:
-

bq. If we're just going to have CF TTL being sugar for clients too lazy to 
apply what they want, then I'm not interested.

Also if that client happens to be Pig or Hive, there's not currently a way to 
set TTLs.  So in that case it's not laziness of the client.

A use case: I don't want to MapReduce over my giant archival column family.  So 
when ingesting data, I'll write to the archival column family and, in addition, to 
a column family with a default TTL (or however it's implemented), so that it holds 
just the data from the last 30 days.

 Per-CF TTL
 --

 Key: CASSANDRA-3974
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3974
 Project: Cassandra
  Issue Type: New Feature
Affects Versions: 1.2
Reporter: Jonathan Ellis
Assignee: Kirk True
Priority: Minor
 Fix For: 1.2

 Attachments: trunk-3974.txt, trunk-3974v2.txt, trunk-3974v3.txt, 
 trunk-3974v4.txt


 Per-CF TTL would allow compaction optimizations (drop an entire sstable's 
 worth of expired data) that we can't do with per-column.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-4549) Update the pig examples to include more recent pig/cassandra features

2012-08-16 Thread Jeremy Hanna (JIRA)
Jeremy Hanna created CASSANDRA-4549:
---

 Summary: Update the pig examples to include more recent 
pig/cassandra features
 Key: CASSANDRA-4549
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4549
 Project: Cassandra
  Issue Type: Task
  Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Jeremy Hanna
Priority: Minor


Now that there is support for a variety of Cassandra features from Pig (esp 
1.1+), it would be great to have some of them in the examples so that people can 
see how to use them.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-4562) Cli getting odd states for Currently building index

2012-08-21 Thread Jeremy Hanna (JIRA)
Jeremy Hanna created CASSANDRA-4562:
---

 Summary: Cli getting odd states for Currently building index
 Key: CASSANDRA-4562
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4562
 Project: Cassandra
  Issue Type: Bug
  Components: Core, Tools
Reporter: Jeremy Hanna
Priority: Minor


Whenever the cli outputs keyspace/column family data, if it's building an 
index, it will show the status of that build at the bottom of the output.  It 
looks like it's sometimes getting into a bad state.  One person reported seeing:
Currently building index index_name, completed d != java.lang.String

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-4563) Remove nodetool setcachecapcity

2012-08-21 Thread Jeremy Hanna (JIRA)
Jeremy Hanna created CASSANDRA-4563:
---

 Summary: Remove nodetool setcachecapcity
 Key: CASSANDRA-4563
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4563
 Project: Cassandra
  Issue Type: Task
  Components: Core
Affects Versions: 1.1.3
Reporter: Jeremy Hanna
Priority: Minor


nodetool setcachecapacity is now obsolete so it should be removed as it 
confuses users.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4562) Cli getting odd states for Currently building index

2012-08-21 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13439000#comment-13439000
 ] 

Jeremy Hanna commented on CASSANDRA-4562:
-

It was anecdotal, but it did appear when the user was running upgradesstables, 
which is different from building the indexes from scratch.  Maybe there's a 
clue in that.

 Cli getting odd states for Currently building index
 -

 Key: CASSANDRA-4562
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4562
 Project: Cassandra
  Issue Type: Bug
  Components: Core, Tools
Reporter: Jeremy Hanna
Priority: Minor

 Whenever the cli outputs keyspace/column family data, if it's building an 
 index, it will show the status of that build at the bottom of the output.  It 
 looks like it's sometimes getting into a bad state.  One person reported 
 seeing:
 Currently building index index_name, completed d != java.lang.String

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4562) Cli getting odd states for Currently building index

2012-08-21 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13439007#comment-13439007
 ] 

Jeremy Hanna commented on CASSANDRA-4562:
-

Heh, so I suppose I just meant to say that the building status gets shown in 
the cli output even when that's not what is happening.  So it doesn't have 
correct data and looks bad.  However, since the CLI is deprecated, I'm not sure 
it's worth fixing.

 Cli getting odd states for Currently building index
 -

 Key: CASSANDRA-4562
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4562
 Project: Cassandra
  Issue Type: Bug
  Components: Core, Tools
Reporter: Jeremy Hanna
Priority: Minor

 Whenever the cli outputs keyspace/column family data, if it's building an 
 index, it will show the status of that build at the bottom of the output.  It 
 looks like it's sometimes getting into a bad state.  One person reported 
 seeing:
 Currently building index index_name, completed d != java.lang.String

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-4571) Strange permament socket descriptors increasing leads to Too many open files

2012-08-23 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13440579#comment-13440579
 ] 

Jeremy Hanna commented on CASSANDRA-4571:
-

Tobias: is it possible to get the test case and the server setup to try to 
reproduce?  Heap dumps haven't proven very useful thus far.

 Strange permament socket descriptors increasing leads to Too many open files
 --

 Key: CASSANDRA-4571
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4571
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.1.2
 Environment: CentOS 5.8 Linux 2.6.18-308.13.1.el5 #1 SMP Tue Aug 21 
 17:10:18 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux. 
 java version 1.6.0_33
 Java(TM) SE Runtime Environment (build 1.6.0_33-b03)
 Java HotSpot(TM) 64-Bit Server VM (build 20.8-b03, mixed mode)
Reporter: Serg Shnerson
Priority: Critical

 On the two-node cluster a strange increase in socket descriptors was found. 
 lsof -n | grep java shows many rows like
 java   8380 cassandra  113r unix 0x8101a374a080
 938348482 socket
 java   8380 cassandra  114r unix 0x8101a374a080
 938348482 socket
 java   8380 cassandra  115r unix 0x8101a374a080
 938348482 socket
 java   8380 cassandra  116r unix 0x8101a374a080
 938348482 socket
 java   8380 cassandra  117r unix 0x8101a374a080
 938348482 socket
 java   8380 cassandra  118r unix 0x8101a374a080
 938348482 socket
 java   8380 cassandra  119r unix 0x8101a374a080
 938348482 socket
 java   8380 cassandra  120r unix 0x8101a374a080
 938348482 socket
  And the number of these rows is constantly increasing.  After about 24 hours 
 this situation leads to the error.
 We use the PHPCassa client.  Load is not so high (around ~50kb/s on write). 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (CASSANDRA-4582) in cqlsh the output of describe columnfamily doesn't convert to hex

2012-08-28 Thread Jeremy Hanna (JIRA)
Jeremy Hanna created CASSANDRA-4582:
---

 Summary: in cqlsh the output of describe columnfamily doesn't 
convert to hex
 Key: CASSANDRA-4582
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4582
 Project: Cassandra
  Issue Type: Bug
Reporter: Jeremy Hanna
Priority: Minor


When the comparator=blob, cqlsh is outputting the metadata for column names in 
utf8.  Instead it should output them in hex, because that is what allows its 
output to be used to re-create the column family.  Granted, it's going to 
be pretty unreadable for the user, but it will work to re-create the CF.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-4593) Reading the ByteBuffer key from a map job causes an infinite fetch loop

2012-08-30 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13445627#comment-13445627
 ] 

Jeremy Hanna commented on CASSANDRA-4593:
-

It may be worth adding to the MapReduce or Troubleshooting section of 
http://wiki.apache.org/cassandra/HadoopSupport.  We were bitten by something 
like this at a previous job and it was hard to track down.

 Reading the ByteBuffer key from a map job causes an infinite fetch loop
 ---

 Key: CASSANDRA-4593
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4593
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 1.1.2
Reporter: Ben Frank
 Attachments: cassandra-1.1-4593.txt


 Reading the ByteBuffer key from a map job empties the buffer. One of these 
 key buffers is later used in ColumnFamilyRecordReader to figure out the last 
 token that was received, then using that as a start point to fetch more rows. 
 With a now empty buffer, the token defaults to the start of the range and 
 thus the end of the data is never reached.
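A minimal Python sketch of the same failure mode (a Java ByteBuffer is stateful in the same way a stream is; the names here are illustrative, not Cassandra's actual classes):

```python
import io

# Reading the key consumes the buffer, just as relative ByteBuffer reads
# advance its position in the map job described above.
key_buf = io.BytesIO(b"row-key-7")
key = key_buf.read()
assert key == b"row-key-7"

# A later reader sees an empty buffer, so the "last token received"
# computed from it falls back to the start of the range.
assert key_buf.read() == b""

# Fix pattern: hand out an independent copy (ByteBuffer.duplicate() in Java)
# so that consuming one view leaves the original position untouched.
safe_copy = io.BytesIO(key_buf.getvalue())
assert safe_copy.read() == b"row-key-7"
```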

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Comment Edited] (CASSANDRA-4582) in cqlsh the output of describe columnfamily doesn't convert to hex

2012-09-06 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13450325#comment-13450325
 ] 

Jeremy Hanna edited comment on CASSANDRA-4582 at 9/7/12 3:05 PM:
-

Yeah. If it's that much work then don't worry about it. Sounds like it's 
already addressed in a newer version.

  was (Author: jeromatron):
Yeah. If its that much work then don't worry about it. Sounds like it's 
already addressed in a newer version.
  
 in cqlsh the output of describe columnfamily doesn't convert to hex
 ---

 Key: CASSANDRA-4582
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4582
 Project: Cassandra
  Issue Type: Bug
Reporter: Jeremy Hanna
Assignee: paul cannon
Priority: Minor
  Labels: cqlsh

 When the comparator=blob, cqlsh is outputting the metadata for column names 
 in utf8.  Instead it should output them in hex because that's what will make 
 it so its output can be used to re-create the column family.  Granted it's 
 going to be pretty unreadable for the user, but it will work to re-create the 
 CF.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-4582) in cqlsh the output of describe columnfamily doesn't convert to hex

2012-09-06 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13450325#comment-13450325
 ] 

Jeremy Hanna commented on CASSANDRA-4582:
-

Yeah. If its that much work then don't worry about it. Sounds like it's already 
addressed in a newer version.

 in cqlsh the output of describe columnfamily doesn't convert to hex
 ---

 Key: CASSANDRA-4582
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4582
 Project: Cassandra
  Issue Type: Bug
Reporter: Jeremy Hanna
Assignee: paul cannon
Priority: Minor
  Labels: cqlsh

 When the comparator=blob, cqlsh is outputting the metadata for column names 
 in utf8.  Instead it should output them in hex because that's what will make 
 it so its output can be used to re-create the column family.  Granted it's 
 going to be pretty unreadable for the user, but it will work to re-create the 
 CF.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (CASSANDRA-4350) cql cassandra version reporting is incorrect

2012-09-12 Thread Jeremy Hanna (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna resolved CASSANDRA-4350.
-

Resolution: Cannot Reproduce

I tried to reproduce with cassandra 1.0.10 as the cluster version and tried 
cqlsh from 1.0.8, 1.0.9, and 1.0.11.  All three when starting reported the 
correct version (1.0.10) of the instance it connected to.  Resolving this as 
cannot reproduce.

 cql cassandra version reporting is incorrect
 

 Key: CASSANDRA-4350
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4350
 Project: Cassandra
  Issue Type: Bug
Reporter: Jeremy Hanna
Assignee: paul cannon
Priority: Minor
  Labels: cql, cqlsh

 It looks like either the docs are wrong or the functionality is wrong.  The 
 docs for show version say:
 {quote}
 Shows the version and build of the connected Cassandra instance, as well as 
 the versions of the CQL spec and the Thrift protocol that the connected 
 Cassandra instance understands.
 {quote}
 On a cassandra node in the ring, I do nodetool -h localhost version and it 
 outputs the correct version (1.0.8).  From a remote node with 1.0.9 
 installed, I run nodetool -h same_node_in_ring version.  It outputs the 
 correct version.  However when I start cqlsh, it shows the remote node's 
 version (1.0.9).  Also when I use the 'show version;' command in cqlsh, it 
 also prints out 1.0.9.
 So either the docs are incorrect and it just outputs the version of the local 
 build or there's a bug in show version and the startup output and it should 
 really show the version of the connected node.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-4749) Possible problem with widerow in Pig URI

2012-10-11 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13474745#comment-13474745
 ] 

Jeremy Hanna commented on CASSANDRA-4749:
-

+1 to the changes, though don't we want to expose a way for them to set those 
variables with standard hadoop config, possibly namespaced with pig?  e.g. 
cassandra.pig.wide.row?

 Possible problem with widerow in Pig URI 
 -

 Key: CASSANDRA-4749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4749
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 1.1.5
 Environment: AWS running Centos 5.6 using Sun build 1.6.0_24-b07
Reporter: Will Oberman
Assignee: Brandon Williams
 Attachments: 4749.txt


 I don't have a good way to test this directly, but I'm concerned the Uri 
 parsing for widerows isn't going to work.  setLocation 
 1.) calls setLocationFromUri (which sets widerows to the Uri value)
 2.) sets widerows to a static value (which is defined as false)
 3.) sets widerows to the system setting if it exists.  
 That doesn't seem right...
 But setLocationFromUri also gets called from setStoreLocation, and I don't 
 really know the difference between setLocation and setStoreLocation in terms 
 of what is going on in terms of the integration between cassandra/pig/hadoop.
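 The three steps above can be sketched as follows (Python, purely illustrative 
 of the ordering concern, not the actual CassandraStorage code; all names are 
 hypothetical):

```python
def set_location(uri_widerows, system_widerows=None):
    """Mirrors the ordering described above; names are hypothetical."""
    widerows = uri_widerows          # 1) setLocationFromUri: take the URI value
    widerows = False                 # 2) overwrite with the static default
    if system_widerows is not None:
        widerows = system_widerows   # 3) overwrite with the system setting
    return widerows

# A widerows=true requested via the URI is silently discarded:
assert set_location(True) is False
```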

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-4749) Possible problem with widerow in Pig URI

2012-10-11 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13474751#comment-13474751
 ] 

Jeremy Hanna commented on CASSANDRA-4749:
-

I was looking at 1.1.5 that didn't yet have the URL location for Cassandra 
accepting the widerows or use_secondary.  That's in there in 1.1-branch.

in other words +1 :)

 Possible problem with widerow in Pig URI 
 -

 Key: CASSANDRA-4749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4749
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 1.1.5
 Environment: AWS running Centos 5.6 using Sun build 1.6.0_24-b07
Reporter: Will Oberman
Assignee: Brandon Williams
 Attachments: 4749.txt


 I don't have a good way to test this directly, but I'm concerned the Uri 
 parsing for widerows isn't going to work.  setLocation 
 1.) calls setLocationFromUri (which sets widerows to the Uri value)
 2.) sets widerows to a static value (which is defined as false)
 3.) sets widerows to the system setting if it exists.  
 That doesn't seem right...
 But setLocationFromUri also gets called from setStoreLocation, and I don't 
 really know the difference between setLocation and setStoreLocation in terms 
 of what is going on in terms of the integration between cassandra/pig/hadoop.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Comment Edited] (CASSANDRA-4749) Possible problem with widerow in Pig URI

2012-10-11 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13474751#comment-13474751
 ] 

Jeremy Hanna edited comment on CASSANDRA-4749 at 10/12/12 3:00 AM:
---

I was looking at 1.1.5 that didn't yet have the URL location for Cassandra 
accepting the widerows or use_secondary flags.  That's in there in the 1.1 
branch.

in other words +1 :)

  was (Author: jeromatron):
I was looking at 1.1.5 that didn't yet have the URL location for Cassandra 
accepting the widerows or use_secondary.  That's in there in 1.1-branch.

in other words +1 :)
  
 Possible problem with widerow in Pig URI 
 -

 Key: CASSANDRA-4749
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4749
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 1.1.5
 Environment: AWS running Centos 5.6 using Sun build 1.6.0_24-b07
Reporter: Will Oberman
Assignee: Brandon Williams
 Attachments: 4749.txt


 I don't have a good way to test this directly, but I'm concerned the Uri 
 parsing for widerows isn't going to work.  setLocation 
 1.) calls setLocationFromUri (which sets widerows to the Uri value)
 2.) sets widerows to a static value (which is defined as false)
 3.) sets widerows to the system setting if it exists.  
 That doesn't seem right...
 But setLocationFromUri also gets called from setStoreLocation, and I don't 
 really know the difference between setLocation and setStoreLocation in terms 
 of what is going on in terms of the integration between cassandra/pig/hadoop.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (CASSANDRA-4840) FD metadata in JMX retains nodes removed from the ring

2012-10-19 Thread Jeremy Hanna (JIRA)
Jeremy Hanna created CASSANDRA-4840:
---

 Summary: FD metadata in JMX retains nodes removed from the ring
 Key: CASSANDRA-4840
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4840
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.10
Reporter: Jeremy Hanna
Priority: Minor


After nodes are removed from the ring and no longer appear in any of the nodes' 
nodetool ring output, some of the dead nodes show up in the 
o.a.c.net.FailureDetector SimpleStates metadata.  Also, some of the JMX stats 
are updating for the removed nodes (ie RecentTimeoutsPerHost and 
ResponsePendingTasks).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows

2012-10-20 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13480886#comment-13480886
 ] 

Jeremy Hanna commented on CASSANDRA-4815:
-

I let Ed know that in 1.2 there was support for creating a table with only a 
primary key (thanks Patrick).  He did ask a good question - is CQL3 going to be 
relatively set in stone in 1.2?  If people implement to CQL3, that's not going 
to change is it?

 Make CQL work naturally with wide rows
 --

 Key: CASSANDRA-4815
 URL: https://issues.apache.org/jira/browse/CASSANDRA-4815
 Project: Cassandra
  Issue Type: Wish
Reporter: Edward Capriolo

 I find that CQL3 is quite obtuse and does not provide me a language useful 
 for accessing my data. First, lets point out how we should design Cassandra 
 data. 
 1) Denormalize
 2) Eliminate seeks
 3) Design for read
 4) optimize for blind writes
 So here is a schema that abides by these tried and tested rules that large 
 production users are employing today. 
 Say we have a table of movie objects:
 Movie
 Name
 Description
 - tags   (string)
 - credits composite(role string, name string )
 -1 likesToday
 -1 blacklisted
 The above structure is a movie; notice it holds a mix of static and dynamic 
 columns, but the overall number of columns is not very large (even if it 
 were larger this is OK as well). Notice this table is not just 
 a single one-to-many relationship; it has 1-to-1 data and it has two sets of 
 1-to-many data.
 The schema today is declared something like this:
 create column family movies
 with default_comparator=UTF8Type and
   column_metadata =
   [
 {column_name: blacklisted, validation_class: int},
 {column_name: likestoday, validation_class: long},
 {column_name: description, validation_class: UTF8Type}
   ];
 We should be able to insert data like this:
 set ['Cassandra Database, not looking for a seQL']['blacklisted']=1;
 set ['Cassandra Database, not looking for a seQL']['likesToday']=34;
 set ['Cassandra Database, not looking for a 
 seQL']['credits-dir']='director:asf';
 set ['Cassandra Database, not looking for a 
 seQL']['credits-jir]='jiraguy:bob';
 set ['Cassandra Database, not looking for a seQL']['tags-action']='';
 set ['Cassandra Database, not looking for a seQL']['tags-adventure']='';
 set ['Cassandra Database, not looking for a seQL']['tags-romance']='';
 set ['Cassandra Database, not looking for a seQL']['tags-programming']='';
 This is the correct way to do it. 1 seek to find all the information related 
 to a movie. As long as this row does
 not get large there is no reason to optimize by breaking data into other 
 column families. (Notice you can not transpose this
 because movies is two 1-to-many relationships of potentially different types)
 Lets look at the CQL3 way to do this design:
 First, contrary to the original design of cassandra, CQL does not like wide 
 rows. It also does not have a good way of dealing with dynamic rows together 
 with static rows either.
 You have two options:
 Option 1: lose all schema
 create table movies ( name string, column blob, value blob, primary 
 key(name)) with compact storage.
 This method is not so hot: we have now lost all our validators, and by the way 
 you have to physically shut down everything, rename files, and recreate your 
 schema if you want to inform cassandra that a current table should be 
 compact. This could at the very least be just a metadata change. Also, you can 
 not add column schema either.
 Option 2  Normalize (is even worse)
 create table movie (name String, description string, likestoday int, 
 blacklisted int);
 create table movecredits( name string, role string, personname string, 
 primary key(name,role) );
 create table movetags( name string, tag string, primary key (name,tag) );
 This is a terrible design: of the 4 key characteristics of how cassandra data 
 should be designed, it fails 3:
 It does not:
 1) Denormalize
 2) Eliminate seeks
 3) Design for read
 Why is Cassandra steering toward this course, by making a language that does 
 not understand wide rows?
 So what can be done? My suggestions: 
 Cassandra needs to lose the COMPACT STORAGE conversions. Each table needs a 
 virtual view that is compact storage with no work to migrate data and 
 recreate schemas. Every table should have a compact view for the schemaless, 
 or a simple query hint like /*transposed*/ should make this change.
 Metadata should be definable by regex. For example, all columns named tag* 
 are of type string.
 CQL should have the column[slice_start] .. column[slice_end] operator from 
 cql2. 
 CQL should support current users; users should not have to 
 switch between CQL versions, and possibly thrift, to work with wide rows. The 
 language should work for them even if 
 it is not expressly designed for 

[jira] Created: (CASSANDRA-2221) 'show create' commands on the CLI to export schema

2011-02-22 Thread Jeremy Hanna (JIRA)
'show create' commands on the CLI to export schema
--

 Key: CASSANDRA-2221
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2221
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
Reporter: Jeremy Hanna
Priority: Trivial


It would be nice to have 'show create' type of commands on the command-line so 
that it would generate the DDL for the schema.

A scenario that would make this useful is where a team works out a data model 
over time with a dev cluster.  They want to use parts of that schema for new 
clusters that they create, like a staging/prod cluster.  It would be very handy 
in this scenario to have some sort of export mechanism.

Another use case is for testing purposes - you want to replicate a problem.

We currently have schematool for import/export but that is deprecated and it 
exports into yaml.

This new feature would just be able to 'show' - or export if they want the 
entire keyspace - into a script or commands that could be used in a cli script. 
 It would need to be able to regenerate everything about the keyspace including 
indexes and metadata.
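As an illustration of what such a command might emit, a hypothetical cli session (the command name and exact output format here are assumptions, loosely modeled on the cli's existing create syntax):

```
[default@Keyspace1] show create keyspace Keyspace1;
create keyspace Keyspace1
  with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy'
  and strategy_options = [{replication_factor:1}];
create column family Users
  with comparator = UTF8Type
  and column_metadata = [{column_name: name, validation_class: UTF8Type}];
```

The point is that the output is itself valid cli input, so it can be replayed against a fresh staging/prod cluster.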

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (CASSANDRA-2255) ColumnFamilyOutputFormat drops mutations when batches fill up.

2011-02-28 Thread Jeremy Hanna (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna updated CASSANDRA-2255:


Attachment: (was: 2255-patch-2.txt)

 ColumnFamilyOutputFormat drops mutations when batches fill up.
 --

 Key: CASSANDRA-2255
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2255
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.7.2, 0.8
Reporter: Eldon Stegall
 Attachments: 0001_Stop_dropping_mutations.txt


 queue.poll() takes a mutation,
 but then the batch is already full,
 so the while loop exits, and the mutation we just got is dropped.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Updated: (CASSANDRA-2255) ColumnFamilyOutputFormat drops mutations when batches fill up.

2011-02-28 Thread Jeremy Hanna (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna updated CASSANDRA-2255:


Attachment: 2255-patch-2.txt

 ColumnFamilyOutputFormat drops mutations when batches fill up.
 --

 Key: CASSANDRA-2255
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2255
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.7.2, 0.8
Reporter: Eldon Stegall
 Attachments: 0001_Stop_dropping_mutations.txt, 2255-patch-2.txt


 queue.poll() takes a mutation,
 but then the batch is already full,
 so the while loop exits, and the mutation we just got is dropped.





[jira] Assigned: (CASSANDRA-2255) ColumnFamilyOutputFormat drops mutations when batches fill up.

2011-02-28 Thread Jeremy Hanna (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-2255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna reassigned CASSANDRA-2255:
---

Assignee: Jeremy Hanna

 ColumnFamilyOutputFormat drops mutations when batches fill up.
 --

 Key: CASSANDRA-2255
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2255
 Project: Cassandra
  Issue Type: Bug
  Components: Hadoop
Affects Versions: 0.7.2, 0.8
Reporter: Eldon Stegall
Assignee: Jeremy Hanna
 Attachments: 0001_Stop_dropping_mutations.txt, 2255-patch-2.txt


 queue.poll() takes a mutation,
 but then the batch is already full,
 so the while loop exits, and the mutation we just got is dropped.





[jira] Updated: (CASSANDRA-1828) Create a pig storefunc

2011-03-03 Thread Jeremy Hanna (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna updated CASSANDRA-1828:


Reviewer: jeromatron

 Create a pig storefunc
 --

 Key: CASSANDRA-1828
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1828
 Project: Cassandra
  Issue Type: New Feature
  Components: Contrib, Hadoop
Affects Versions: 0.7 beta 1
Reporter: Brandon Williams
Assignee: Brandon Williams
Priority: Minor
 Fix For: 0.7.4

 Attachments: 0001-add-storage-ability-to-pig-CassandraStorage.txt, 
 0002-Fix-build-bin-script.txt, 0003-StoreFunc_with_deletion.txt

   Original Estimate: 32h
  Remaining Estimate: 32h

 Now that we have a ColumnFamilyOutputFormat, we can write data back to 
 cassandra in mapreduce jobs, however we can only do this in java.  It would 
 be nice if pig could also output to cassandra.
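As a sketch, a Pig script using such a storefunc might look like this (the 
keyspace, column family names, and schema shown are illustrative only):

```
-- load rows from one column family, filter them, and write to another
rows = LOAD 'cassandra://Keyspace1/SourceCF' USING CassandraStorage()
       AS (key, columns: bag {T: tuple(name, value)});
big  = FILTER rows BY SIZE(columns) > 0;
STORE big INTO 'cassandra://Keyspace1/TargetCF' USING CassandraStorage();
```

The STORE side would go through ColumnFamilyOutputFormat under the hood, so 
mutations are batched and written back to Cassandra without any hand-written 
Java mapreduce job.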




