[jira] [Issue Comment Edited] (CASSANDRA-1497) Add input support for Hadoop Streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096766#comment-13096766 ]

Brandyn White edited comment on CASSANDRA-1497 at 9/3/11 10:02 PM:
---

Good point. It'll be easier to update the Cassandra Hadoop API to support the old-style Hadoop interface. After that we can add in the Cassandra IO and command line switches with a small patch.

was (Author: bwhite):
Good point, it'll be easier to update the Cassandra Hadoop API than Hadoop streaming.

Add input support for Hadoop Streaming
--
Key: CASSANDRA-1497
URL: https://issues.apache.org/jira/browse/CASSANDRA-1497
Project: Cassandra
Issue Type: New Feature
Components: Hadoop
Reporter: Jeremy Hanna
Attachments: 0001-An-updated-avro-based-input-streaming-solution.patch

related to CASSANDRA-1368 - create similar functionality for input streaming.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-1497) Add input support for Hadoop Streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021284#comment-13021284 ]

Jeremy Hanna edited comment on CASSANDRA-1497 at 4/18/11 9:45 PM:
--

The patch that I had submitted moves everything about the RecordReader into an abstract class except the actual marshalling/unmarshalling of the data. So it could be used to build a typed json impl. It would also make it so people could make their own serialization mechanisms - say for Dumbo, a python way to do hadoop MR.

was (Author: jeromatron):
The patch that I had submitted moves everything about the RecordReader into an abstract class except the actual marshalling/unmarshalling of the data. So it could be used to build a typed json impl. It would also make it so people could make their own serialization mechanisms - say for Dumbo, the python based MR programs.

Add input support for Hadoop Streaming
--
Key: CASSANDRA-1497
URL: https://issues.apache.org/jira/browse/CASSANDRA-1497
Project: Cassandra
Issue Type: New Feature
Components: Hadoop
Reporter: Jeremy Hanna
Attachments: 0001-An-updated-avro-based-input-streaming-solution.patch

related to CASSANDRA-1368 - create similar functionality for input streaming.
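
The refactoring described in the comment above (shared RecordReader logic in an abstract class, with only the marshalling/unmarshalling left to subclasses) might look roughly like the sketch below. This is not the code from the attached patch: `AbstractRowReader`, `Utf8RowReader`, `HexRowReader`, and `ReaderDemo` are hypothetical names, and plain `byte[]` rows stand in for Cassandra's Hadoop types so the example runs without Hadoop on the classpath.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the shared reader: iteration lives here,
// and subclasses supply only the unmarshalling step.
abstract class AbstractRowReader<T> {
    private final Iterator<byte[]> rows;

    AbstractRowReader(List<byte[]> rawRows) {
        this.rows = rawRows.iterator();
    }

    // Shared logic: report whether more raw rows remain.
    boolean hasNext() {
        return rows.hasNext();
    }

    // Shared logic: fetch the next raw row and hand it to the subclass.
    T next() {
        return unmarshal(rows.next());
    }

    // The only piece each concrete reader has to implement.
    protected abstract T unmarshal(byte[] raw);
}

// One possible extension: decode each row as a UTF-8 string.
class Utf8RowReader extends AbstractRowReader<String> {
    Utf8RowReader(List<byte[]> rawRows) {
        super(rawRows);
    }

    @Override
    protected String unmarshal(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }
}

// Another possible extension: render each row as lowercase hex text,
// the kind of line-oriented form a streaming script could parse.
class HexRowReader extends AbstractRowReader<String> {
    HexRowReader(List<byte[]> rawRows) {
        super(rawRows);
    }

    @Override
    protected String unmarshal(byte[] raw) {
        StringBuilder sb = new StringBuilder();
        for (byte b : raw) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}

class ReaderDemo {
    // Drain a reader into a list, using only the shared methods.
    static <T> List<T> readAll(AbstractRowReader<T> reader) {
        List<T> out = new ArrayList<>();
        while (reader.hasNext()) {
            out.add(reader.next());
        }
        return out;
    }

    public static void main(String[] args) {
        List<byte[]> raw = Arrays.asList(
                "a".getBytes(StandardCharsets.UTF_8),
                "bc".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(new Utf8RowReader(raw)));
        System.out.println(readAll(new HexRowReader(raw)));
    }
}
```

Under this arrangement a new serialization (typed JSON, or something Dumbo-friendly) would only need another small subclass that overrides `unmarshal`; the iteration code is not duplicated.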
[jira] Issue Comment Edited: (CASSANDRA-1497) Add input support for Hadoop Streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923292#action_12923292 ]

Stu Hood edited comment on CASSANDRA-1497 at 10/20/10 11:00 PM:

contrib/hadoop_streaming_input/bin/mapper.py
* Mentions the original source multiple times, and claims to be both a mapper and reducer
* I suspect that extract_text can be turned into a one-liner somehow

contrib/hadoop_streaming_input/bin/reducer.py
* Needs an Apache header

contrib/hadoop_streaming_input/[input/]README.txt
* Mentions -input: {{bin/streaming}} should fake the input, and explain why
* There is an extra copy of README.txt in an unused 'input' subdirectory

.../hadoop/ColumnFamilyRecordReader.java
* Indentation

.../hadoop/streaming/AvroResolver.java
* Updated javadoc

I looked a little bit into the immediate runtime failure, but didn't come to any conclusions. One suspicious aspect is that Streaming appears to use the result of Resolver.getInputWriterClass to write to both the mapper and reducer scripts: see http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/streaming/src/java/org/apache/hadoop/streaming/StreamJob.java?view=markup#l783

was (Author: stuhood):
contrib/hadoop_streaming_input/bin/mapper.py
* Mentions the original source multiple times, and claims to be both a mapper and reducer
* I suspect that extract_text can be turned into a one-liner somehow

contrib/hadoop_streaming_input/bin/reducer.py
* Needs an Apache header

contrib/hadoop_streaming_input/[input/]README.txt
* Mentions -input: {{bin/streaming}} should fake the input, and explain why
* There is an extra copy of README.txt in an unused 'input' subdirectory

.../hadoop/ColumnFamilyRecordReader.java
* Indentation

.../hadoop/streaming/AvroResolver.java
* Updated javadoc

I looked a little bit into the immediate runtime failure, but didn't come to any conclusions. One suspicious aspect is that Streaming appears to use the result of Resolver.getInputWriterClass to write to both the mapper and reducer scripts: see http://svn.apache.org/viewvc/hadoop/mapreduce/trunk/src/contrib/streaming/src/java/org/apache/hadoop/streaming/StreamJob.java?view=markup#l783

Add input support for Hadoop Streaming
--
Key: CASSANDRA-1497
URL: https://issues.apache.org/jira/browse/CASSANDRA-1497
Project: Cassandra
Issue Type: New Feature
Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Jeremy Hanna
Fix For: 0.7.1
Attachments: 0001-An-updated-avro-based-input-streaming-solution.patch

related to CASSANDRA-1368 - create similar functionality for input streaming.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (CASSANDRA-1497) Add input support for Hadoop Streaming
[ https://issues.apache.org/jira/browse/CASSANDRA-1497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12919305#action_12919305 ]

Jeremy Hanna edited comment on CASSANDRA-1497 at 10/8/10 11:25 AM:
---

In addition to the previous comment, I've also left the original ColumnFamilyRecordReader and ColumnFamilyInputFormat undeprecated. So there will be 3 extensions of the abstract - CFRR/CFIF, TCFRR/TCFIF, and ACFRR/ACFIF. The only code that is in the extension classes has to do with data type marshalling, so it's not duplicated code.

was (Author: jeromatron):
In addition to the previous comment, I've also left the original ColumnFamilyRecordReader and ColumnFamilyInputFormat undeprecated. So there will be 3 extensions of the abstract - CFRR/CFIF, TCFRR/TCFIF, and ACFRR/ACFIF. The only code that is in the extension is in the extension classes has to do with data type marshalling, so it's not duplicated code.

Add input support for Hadoop Streaming
--
Key: CASSANDRA-1497
URL: https://issues.apache.org/jira/browse/CASSANDRA-1497
Project: Cassandra
Issue Type: New Feature
Components: Hadoop
Reporter: Jeremy Hanna
Assignee: Jeremy Hanna
Fix For: 0.7.0
Attachments: 0001-1497-foundation-changes.patch

related to CASSANDRA-1368 - create similar functionality for input streaming.