[jira] [Commented] (GIRAPH-1000) Multi Output support
[ https://issues.apache.org/jira/browse/GIRAPH-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379631#comment-14379631 ]

Lukas Nalezenec commented on GIRAPH-1000:
-----------------------------------------

FYI: It's already possible to write multiple outputs in Giraph using WorkerContext. It's not ideal, since you have to take care of failed tasks manually, but it works. See the file SimpleVertexWithWorkerContext.java in the giraph-examples project.

Multi Output support
--------------------

Key: GIRAPH-1000
URL: https://issues.apache.org/jira/browse/GIRAPH-1000
Project: Giraph
Issue Type: Improvement
Components: bsp, conf and scripts, graph
Affects Versions: 1.0.0, 1.1.0, 1.2.0-SNAPSHOT
Reporter: Alessio Arleo
Labels: features

Hadoop natively supports multiple outputs. The objective is to extend Giraph to support multiple output formats during a single Giraph run.
According to the official Hadoop API docs*, the pattern for taking advantage of multiple outputs is the following:
- Modify the job submission
- Modify the reducer class to write to the different declared outputs
Since Giraph jobs are executed as mappers, this approach (or at least its second part) is probably not feasible, so further investigation is necessary.
*https://hadoop.apache.org/docs/r1.2.1/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
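
A note on the "take care of failed tasks manually" point in the comment above: one common way to do this is to have each task attempt write to an attempt-specific temporary file and publish it under its final name only on success. The sketch below illustrates that idea with plain java.nio on a local directory. It is not what SimpleVertexWithWorkerContext.java does and it uses no Giraph or Hadoop API; the SideOutput class, its methods, and the file naming scheme are all hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not a Giraph or Hadoop class): one extra output file per worker. */
class SideOutput {
  private final Path tmp;     // attempt-specific file, invisible to readers of the final output
  private final Path target;  // final name, shared by all attempts of the same worker

  SideOutput(Path dir, String name, int workerIndex, int attempt) {
    this.target = dir.resolve(name + "-" + workerIndex);
    this.tmp = dir.resolve("_tmp-" + name + "-" + workerIndex + "-attempt" + attempt);
  }

  /** Writes all lines to the temporary file only. */
  void write(List<String> lines) throws IOException {
    Files.write(tmp, lines, StandardCharsets.UTF_8);
  }

  /** Called when the attempt succeeds: publish the file under its final name. */
  void commit() throws IOException {
    Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
  }

  /** Called when the attempt fails: discard the partial output. */
  void abort() throws IOException {
    Files.deleteIfExists(tmp);
  }
}

class SideOutputDemo {
  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("side-output");

    // Attempt 0 fails halfway: its partial output is never published.
    SideOutput failed = new SideOutput(dir, "stats", 0, 0);
    failed.write(Arrays.asList("partial"));
    failed.abort();

    // Attempt 1 succeeds and publishes the final file.
    SideOutput retry = new SideOutput(dir, "stats", 0, 1);
    retry.write(Arrays.asList("a", "b"));
    retry.commit();

    System.out.println(Files.readAllLines(dir.resolve("stats-0"), StandardCharsets.UTF_8));
  }
}
```

In a real job the directory would live on HDFS and the worker index and attempt number would come from the task context; both are passed in by hand here to keep the sketch self-contained.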
[jira] [Commented] (GIRAPH-1000) Multi Output support
[ https://issues.apache.org/jira/browse/GIRAPH-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379892#comment-14379892 ]

Sergey Edunov commented on GIRAPH-1000:
---------------------------------------

That would be a great addition to Giraph! I was thinking about it a while ago. It seems we can implement it in a way similar to the multiple input formats. See for example org.apache.giraph.io.formats.multi.MultiVertexInputFormat and the other classes in the same package. This is essentially a wrapper around a list of inputs that provides the same API as a single input format does. In the same way, we can have wrappers around VertexOutputFormat and EdgeOutputFormat that provide the same APIs, and then just plug them in.
We also need this feature, so I'll be happy to help.
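
The wrapper idea in the comment above can be sketched roughly as follows. This is a minimal illustration with no Giraph dependency: SketchVertexWriter, InMemoryWriter and MultiVertexWriter are hypothetical names made up for this sketch, not Giraph classes, and the real VertexOutputFormat and VertexWriter APIs have further methods that are left out here.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a per-vertex writer; NOT Giraph's VertexWriter API. */
interface SketchVertexWriter {
  void writeVertex(String vertexId, double value) throws IOException;
  void close() throws IOException;
}

/** Collects formatted lines in memory so the sketch runs without HDFS. */
class InMemoryWriter implements SketchVertexWriter {
  private final String separator;
  final List<String> lines = new ArrayList<>();
  boolean closed = false;

  InMemoryWriter(String separator) {
    this.separator = separator;
  }

  @Override
  public void writeVertex(String vertexId, double value) {
    lines.add(vertexId + separator + value);
  }

  @Override
  public void close() {
    closed = true;
  }
}

/** Wrapper that offers the single-writer interface while fanning out to several delegates. */
class MultiVertexWriter implements SketchVertexWriter {
  private final List<SketchVertexWriter> delegates;

  MultiVertexWriter(List<SketchVertexWriter> delegates) {
    this.delegates = new ArrayList<>(delegates);
  }

  @Override
  public void writeVertex(String vertexId, double value) throws IOException {
    for (SketchVertexWriter writer : delegates) {
      writer.writeVertex(vertexId, value);
    }
  }

  @Override
  public void close() throws IOException {
    // Close every delegate even if one fails, then rethrow the first failure.
    IOException first = null;
    for (SketchVertexWriter writer : delegates) {
      try {
        writer.close();
      } catch (IOException e) {
        if (first == null) {
          first = e;
        }
      }
    }
    if (first != null) {
      throw first;
    }
  }
}

class MultiOutputDemo {
  public static void main(String[] args) throws IOException {
    InMemoryWriter tsv = new InMemoryWriter("\t");
    InMemoryWriter csv = new InMemoryWriter(",");
    SketchVertexWriter multi = new MultiVertexWriter(Arrays.asList(tsv, csv));
    multi.writeVertex("v1", 0.5);
    multi.close();
    System.out.println(tsv.lines + " " + csv.lines);
  }
}
```

Because the wrapper implements the same interface as a single writer, calling code does not need to know how many outputs are configured, which is the property the comment points to in the multi input classes.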
[jira] [Commented] (GIRAPH-1000) Multi Output support
[ https://issues.apache.org/jira/browse/GIRAPH-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379707#comment-14379707 ]

Lukas Nalezenec commented on GIRAPH-1000:
-----------------------------------------

I have never used Hadoop MultipleOutputs in production. I evaluated it when it was new, but it was hard to unit test, so we decided to replace it in MapReduce with our own internal implementation. In my humble opinion, MultipleOutputs is badly designed. Just my two cents.
I think there is not much documentation on Giraph internals, so you have to read the source code. The code is well written and you will learn a lot. I don't know much about these parts of Giraph, but I will help you where I can.