[jira] [Issue Comment Edited] (CASSANDRA-1497) Add input support for Hadoop Streaming

2011-09-03 Thread Brandyn White (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096766#comment-13096766
 ] 

Brandyn White edited comment on CASSANDRA-1497 at 9/3/11 10:02 PM:
---

Good point.  It'll be easier to update the Cassandra Hadoop API to support the 
old-style Hadoop interface.  After that we can add the Cassandra IO and 
command-line switches with a small patch.

  was (Author: bwhite):
Good point, it'll be easier to update the Cassandra Hadoop API than Hadoop 
streaming.
  
 Add input support for Hadoop Streaming
 --

 Key: CASSANDRA-1497
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1497
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Jeremy Hanna
 Attachments: 0001-An-updated-avro-based-input-streaming-solution.patch


 related to CASSANDRA-1368 - create similar functionality for input streaming.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-1497) Add input support for Hadoop Streaming

2011-04-18 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021284#comment-13021284
 ] 

Jeremy Hanna edited comment on CASSANDRA-1497 at 4/18/11 9:45 PM:
--

The patch that I had submitted moves everything about the RecordReader into an 
abstract class except the actual marshalling/unmarshalling of the data.  So it 
could be used to build a typed JSON implementation.  It would also let people 
write their own serialization mechanisms - say for Dumbo, a Python framework 
for Hadoop MapReduce.
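
The split described above can be sketched roughly as follows.  This is only an 
illustration in Python (the actual patch is Java), and every class and method 
name here is hypothetical rather than taken from the patch: the base class 
owns the iteration logic, and each subclass supplies only the marshalling step.

```python
import json
from abc import ABC, abstractmethod


class AbstractColumnFamilyReader(ABC):
    """Hypothetical base class: owns iteration, delegates marshalling."""

    def __init__(self, rows):
        # rows: iterable of (row_key, {column_name: value}) pairs, standing in
        # for the data a real reader would fetch from Cassandra.
        self._rows = rows

    def __iter__(self):
        # Shared logic lives here; subclasses never reimplement it.
        for key, columns in self._rows:
            yield self.marshal(key, columns)

    @abstractmethod
    def marshal(self, key, columns):
        """Convert one row into the record format handed to the job."""


class JsonColumnFamilyReader(AbstractColumnFamilyReader):
    def marshal(self, key, columns):
        # One JSON document per row; sort_keys keeps the output deterministic.
        return json.dumps({"key": key, "columns": columns}, sort_keys=True)


class TabSeparatedColumnFamilyReader(AbstractColumnFamilyReader):
    def marshal(self, key, columns):
        # key<TAB>name=value,... as a line-oriented record for a script.
        cols = ",".join(f"{name}={value}" for name, value in sorted(columns.items()))
        return f"{key}\t{cols}"
```

Adding another serialization (for example one aimed at Dumbo) would then mean 
writing one more subclass with its own marshal method, without touching the 
shared reader code.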

  was (Author: jeromatron):
The patch that I had submitted moves everything about the RecordReader into 
an abstract class except the actual marshalling/unmarshalling of the data.  So 
it could be used to build a typed json impl.  It would also make it so people 
could make their own serialization mechanisms - say for Dumbo, the python based 
MR programs.
  



[jira] Issue Comment Edited: (CASSANDRA-1497) Add input support for Hadoop Streaming

2010-10-20 Thread Stu Hood (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923292#action_12923292
 ] 

Stu Hood edited comment on CASSANDRA-1497 at 10/20/10 11:00 PM:


contrib/hadoop_streaming_input/bin/mapper.py
* Mentions the original source multiple times, and claims to be both a mapper 
and reducer
* I suspect that extract_text can be turned into a one-liner somehow

contrib/hadoop_streaming_input/bin/reducer.py
* Needs an Apache header

contrib/hadoop_streaming_input/[input/]README.txt
* Mentions -input: {{bin/streaming}} should fake the input, and explain why
* There is an extra copy of README.txt in an unused 'input' subdirectory

.../hadoop/ColumnFamilyRecordReader.java
* Indentation

.../hadoop/streaming/AvroResolver.java
* Updated javadoc

I looked a little bit into the immediate runtime failure, but didn't come to 
any conclusions. One suspicious aspect is that Streaming appears to use the 
result of Resolver.getInputWriterClass to write to both the mapper and reducer 
scripts: see 
http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/streaming/src/java/org/apache/hadoop/streaming/StreamJob.java?view=markup#l783

 Add input support for Hadoop Streaming
 --

 Key: CASSANDRA-1497
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1497
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Jeremy Hanna
 Fix For: 0.7.1

 Attachments: 0001-An-updated-avro-based-input-streaming-solution.patch


 related to CASSANDRA-1368 - create similar functionality for input streaming.




[jira] Issue Comment Edited: (CASSANDRA-1497) Add input support for Hadoop Streaming

2010-10-08 Thread Jeremy Hanna (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919305#action_12919305
 ] 

Jeremy Hanna edited comment on CASSANDRA-1497 at 10/8/10 11:25 AM:
---

In addition to the previous comment, I've also left the original 
ColumnFamilyRecordReader and ColumnFamilyInputFormat undeprecated.  So there 
will be three extensions of the abstract classes: CFRR/CFIF, TCFRR/TCFIF, and 
ACFRR/ACFIF.  The only code in the extension classes has to do with data type 
marshalling, so nothing is duplicated.

  was (Author: jeromatron):
In addition to the previous comment, I've also left the original 
ColumnFamilyRecordReader and ColumnFamilyInputFormat undeprecated.  So there 
will be 3 extensions of the abstract - CFRR/CFIF, TCFRR/TCFIF, and ACFRR/ACFIF. 
 The only code that is in the extension is in the extension classes has to do 
with data type marshalling, so it's not duplicated code.
  
 Add input support for Hadoop Streaming
 --

 Key: CASSANDRA-1497
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1497
 Project: Cassandra
  Issue Type: New Feature
  Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Jeremy Hanna
 Fix For: 0.7.0

 Attachments: 0001-1497-foundation-changes.patch


 related to CASSANDRA-1368 - create similar functionality for input streaming.
