[jira] [Commented] (GIRAPH-1000) Multi Output support

2015-03-25 Thread Lukas Nalezenec (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379631#comment-14379631
 ] 

Lukas Nalezenec commented on GIRAPH-1000:
-

FYI: 
It's already possible to write multiple outputs in Giraph using WorkerContext. 
It's not ideal, since you have to take care of failed tasks manually, but it works.

See the file SimpleVertexWithWorkerContext.java in the giraph-examples project.

 Multi Output support
 

 Key: GIRAPH-1000
 URL: https://issues.apache.org/jira/browse/GIRAPH-1000
 Project: Giraph
  Issue Type: Improvement
  Components: bsp, conf and scripts, graph
Affects Versions: 1.0.0, 1.1.0, 1.2.0-SNAPSHOT
Reporter: Alessio Arleo
  Labels: features

 Hadoop natively supports multiple outputs. The objective is to extend Giraph 
 to support multiple output formats during a single Giraph run.
 According to the official Hadoop apidocs*, the pattern for taking advantage of 
 multiple outputs is the following:
 - Modify the job submission
 - Modify the reducer class to write to the declared outputs
 Since Giraph jobs are executed as mappers, this approach (or at least its 
 second part) is probably not feasible, so further investigation is necessary.
 *https://hadoop.apache.org/docs/r1.2.1/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (GIRAPH-1000) Multi Output support

2015-03-25 Thread Sergey Edunov (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379892#comment-14379892
 ] 

Sergey Edunov commented on GIRAPH-1000:
---

That would be a great addition to Giraph! 
I was thinking about it a while ago. It seems we can implement it in a way 
similar to the multiple input format. See for example 
org.apache.giraph.io.formats.multi.MultiVertexInputFormat and the other classes 
in the same package. This is essentially a wrapper around a list of inputs 
that provides the same API as a single input format does. 
In the same way, we can have wrappers around VertexOutputFormat and 
EdgeOutputFormat that provide the same APIs, and then just plug them in.
We also need this feature, so I'll be happy to help.
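
A minimal sketch of the wrapper idea described above, in plain Java. It is only 
an illustration of the pattern: SimpleVertexWriter, MultiVertexWriter and 
InMemoryWriter are hypothetical names made up for this example and are not 
Giraph's actual VertexOutputFormat or VertexWriter API. A real implementation 
would also have to deal with Hadoop task contexts, output committers and 
per-format configuration, none of which is shown here.

```java
import java.util.ArrayList;
import java.util.List;

public class MultiOutputSketch {

  /** Hypothetical, simplified writer interface (NOT Giraph's real API). */
  public interface SimpleVertexWriter<I, V> {
    void writeVertex(I id, V value);

    void close();
  }

  /**
   * Wrapper that exposes the same interface as a single writer but
   * forwards every call to a list of delegate writers.
   */
  public static class MultiVertexWriter<I, V> implements SimpleVertexWriter<I, V> {
    private final List<SimpleVertexWriter<I, V>> delegates;

    public MultiVertexWriter(List<SimpleVertexWriter<I, V>> delegates) {
      // Defensive copy so later changes to the caller's list have no effect.
      this.delegates = new ArrayList<>(delegates);
    }

    @Override
    public void writeVertex(I id, V value) {
      for (SimpleVertexWriter<I, V> writer : delegates) {
        writer.writeVertex(id, value);
      }
    }

    @Override
    public void close() {
      for (SimpleVertexWriter<I, V> writer : delegates) {
        writer.close();
      }
    }
  }

  /** Toy delegate that keeps formatted lines in memory instead of writing files. */
  public static class InMemoryWriter implements SimpleVertexWriter<Long, Double> {
    public final List<String> lines = new ArrayList<>();
    public boolean closed = false;
    private final String separator;

    public InMemoryWriter(String separator) {
      this.separator = separator;
    }

    @Override
    public void writeVertex(Long id, Double value) {
      lines.add(id + separator + value);
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  public static void main(String[] args) {
    InMemoryWriter tabSeparated = new InMemoryWriter("\t");
    InMemoryWriter commaSeparated = new InMemoryWriter(",");

    List<SimpleVertexWriter<Long, Double>> delegates = new ArrayList<>();
    delegates.add(tabSeparated);
    delegates.add(commaSeparated);

    // The caller sees a single writer; both delegates receive each vertex.
    SimpleVertexWriter<Long, Double> multi = new MultiVertexWriter<>(delegates);
    multi.writeVertex(1L, 0.5);
    multi.close();

    System.out.println(tabSeparated.lines);
    System.out.println(commaSeparated.lines);
  }
}
```

Because the wrapper implements the same interface as a single writer, the code 
that calls it does not need to know how many outputs are configured.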



[jira] [Commented] (GIRAPH-1000) Multi Output support

2015-03-25 Thread Lukas Nalezenec (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379707#comment-14379707
 ] 

Lukas Nalezenec commented on GIRAPH-1000:
-

I have never used Hadoop MultipleOutputs. I evaluated it when it was new, but 
it was hard to unit test, so we decided to replace it in MapReduce with our own 
internal implementation. In my humble opinion, MultipleOutputs is badly 
designed. Just my two cents.

I think there is not much documentation on Giraph internals, so you have to read 
the source code. The code is well written and you will learn a lot. I don't know 
much about these parts of Giraph, but if I find something out I will help you.
