[jira] [Commented] (GIRAPH-45) Improve the way to keep outgoing messages

2011-11-27 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158131#comment-13158131
 ] 

Hyunsik Choi commented on GIRAPH-45:


Claudio,

Thanks for the nice suggestion. It seems like a cool idea. 
However, I am concerned about the platform dependency of leveldb. 
leveldb is written in C++, which may make distributing Giraph more of a 
burden.

What does everyone else think?

> Improve the way to keep outgoing messages
> -
>
> Key: GIRAPH-45
> URL: https://issues.apache.org/jira/browse/GIRAPH-45
> Project: Giraph
>  Issue Type: Improvement
>  Components: bsp
>Reporter: Hyunsik Choi
>Assignee: Hyunsik Choi
>
> As discussed in GIRAPH-12 (http://goo.gl/CE32U), I think that there is a 
> potential out-of-memory problem when the rate of message generation 
> is higher than the rate of message flushing (or the network bandwidth).
> To overcome this problem, we need a more eager strategy for message flushing or 
> some approach to spill messages to disk.
> The link below is Dmitriy's suggestion.
> https://issues.apache.org/jira/browse/GIRAPH-12?focusedCommentId=13116253&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13116253

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-77) Coordinator should expose a web interface with progress, vertex region assignments, etc.

2011-11-17 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152583#comment-13152583
 ] 

Hyunsik Choi commented on GIRAPH-77:


I also think that this feature is necessary because we will no longer depend on 
MapReduce once we port Giraph to YARN.

> Coordinator should expose a web interface with progress, vertex region 
> assignments, etc.
> 
>
> Key: GIRAPH-77
> URL: https://issues.apache.org/jira/browse/GIRAPH-77
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Jakob Homan
>
> It would be nice if the coordinator worker had a web interface that showed 
> progress, splits, etc. during job execution. Right now it would duplicate 
> information currently being exposed through task status, but with the move to 
> YARN, it will be a necessity.  It would be great if we could do this in a 
> modern way to avoid the screen-scraping, etc. currently used to get 
> information from most other Hadoop projects' web interfaces.  The coordinator 
> could announce its address at the beginning or via status updates.





[jira] [Commented] (GIRAPH-68) Implement a Graph Generator

2011-11-17 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-68?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152185#comment-13152185
 ] 

Hyunsik Choi commented on GIRAPH-68:


I missed the javadoc. I will reattach the patch with the javadoc included.

> Implement a Graph Generator
> ---
>
> Key: GIRAPH-68
> URL: https://issues.apache.org/jira/browse/GIRAPH-68
> Project: Giraph
>  Issue Type: New Feature
>  Components: benchmark
>Affects Versions: 0.70.0
>Reporter: Hyunsik Choi
>Assignee: Hyunsik Choi
> Attachments: GIRAPH-68_1.patch, GIRAPH-68_2.patch
>
>
> To provide users with benchmark environments and to deeply test the 
> input/output system of giraph, we need a graph generator. We will enable the 
> graph generator to generate various kinds of graph data sets by specifying a 
> VertexInputFormat and a VertexOutputFormat.





[jira] [Commented] (GIRAPH-92) Need outputformat for just vertex ID and value

2011-11-16 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-92?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151733#comment-13151733
 ] 

Hyunsik Choi commented on GIRAPH-92:


+1

> Need outputformat for just vertex ID and value
> --
>
> Key: GIRAPH-92
> URL: https://issues.apache.org/jira/browse/GIRAPH-92
> Project: Giraph
>  Issue Type: New Feature
>  Components: lib
>Affects Versions: 0.70.0
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Fix For: 0.70.0
>
> Attachments: GIRAPH-92.patch
>
>
> We should have a text output format that just spits out the vertex id and 
> value without its edges:
> {noformat}index.html 0.9423{noformat}
> This would be particularly helpful for further processing by, for instance, 
> Pig.
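
As a rough illustration of the line layout requested above, here is a minimal Java sketch. It is not the Giraph output format API: the class name `IdValueLine`, the method `format`, and the configurable separator are all hypothetical and exist only to show one "id, separator, value" record per line with no edges.

```java
/** Hypothetical helper, not part of Giraph: formats one vertex as "id<sep>value". */
public class IdValueLine {
    /**
     * Builds a single output line from a vertex id and its value,
     * leaving out the edges entirely.
     */
    public static String format(String vertexId, double value, String separator) {
        return vertexId + separator + value;
    }
}
```

With a space as the separator, `IdValueLine.format("index.html", 0.9423, " ")` yields the same shape as the sample line in the description; a tab separator would suit a downstream loader that splits on tabs.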





[jira] [Commented] (GIRAPH-45) Improve the way to keep outgoing messages

2011-11-15 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151038#comment-13151038
 ] 

Hyunsik Choi commented on GIRAPH-45:


I'm in another time zone, so I'm sad to have missed the lively discussion.

I see this problem as a choice between Giraph becoming slow but still working, 
and Giraph being unable to handle some problems or data when the volume of 
generated messages exceeds the memory capacity. As you mentioned, spilling data 
to disk is apparently the simplest way to solve this problem. In addition, this 
approach does not affect the usual cases if spilling starts only when memory is 
getting tight.

Anyway, has the discussion concluded as follows?
- Each worker sends outgoing messages in an eager manner (immediately or 
periodically).
- The receiving side spills incoming messages to disk only when memory is 
getting tight.
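
The receiving-side behaviour in the second point can be sketched in Java as follows. This is only an illustration under stated assumptions, not Giraph code: the class `SpillingMessageBuffer` is hypothetical, "memory is getting tight" is approximated by a fixed in-memory message count, and messages are assumed to be single-line strings so that they can be stored one per line in a temporary file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: keeps messages in memory and spills the overflow to disk. */
public class SpillingMessageBuffer {
    private final int maxInMemory;               // stand-in for a real memory check
    private final List<String> inMemory = new ArrayList<>();
    private final Path spillFile;
    private int spilledCount = 0;

    public SpillingMessageBuffer(int maxInMemory) {
        this.maxInMemory = maxInMemory;
        try {
            this.spillFile = Files.createTempFile("messages", ".spill");
            this.spillFile.toFile().deleteOnExit();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Stores a message in memory, or appends it to the spill file once memory is "tight". */
    public void add(String message) {
        if (inMemory.size() < maxInMemory) {
            inMemory.add(message);
            return;
        }
        try {
            Files.write(spillFile, (message + "\n").getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            spilledCount++;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Number of messages currently held on disk. */
    public int spilledCount() {
        return spilledCount;
    }

    /** Returns all buffered messages (memory first, then disk) and resets the buffer. */
    public List<String> drain() {
        List<String> all = new ArrayList<>(inMemory);
        try {
            if (spilledCount > 0) {
                all.addAll(Files.readAllLines(spillFile, StandardCharsets.UTF_8));
            }
            Files.write(spillFile, new byte[0]); // default options truncate the file
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        inMemory.clear();
        spilledCount = 0;
        return all;
    }
}
```

Because nothing touches the disk until the threshold is reached, the usual case where all messages fit in memory pays no spilling cost, which matches the argument made above.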


Avery,
I also agree that storing partitions to disk is a good way to mitigate the 
memory problem. Also, I think that both approaches are compatible and have 
different effects. Storing partitions is more efficient if the volume of graph 
data is very large. Later, if Giraph lets users choose among these options 
(i.e., spilling messages, storing partitions, or both), users can pick whichever 
suits their programs.

> Improve the way to keep outgoing messages
> -
>
> Key: GIRAPH-45
> URL: https://issues.apache.org/jira/browse/GIRAPH-45
> Project: Giraph
>  Issue Type: Improvement
>  Components: bsp
>Reporter: Hyunsik Choi
>Assignee: Hyunsik Choi
>
> As discussed in GIRAPH-12 (http://goo.gl/CE32U), I think that there is a 
> potential out-of-memory problem when the rate of message generation 
> is higher than the rate of message flushing (or the network bandwidth).
> To overcome this problem, we need a more eager strategy for message flushing or 
> some approach to spill messages to disk.
> The link below is Dmitriy's suggestion.
> https://issues.apache.org/jira/browse/GIRAPH-12?focusedCommentId=13116253&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13116253





[jira] [Commented] (GIRAPH-11) Improve the graph distribution of Giraph

2011-11-14 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150110#comment-13150110
 ] 

Hyunsik Choi commented on GIRAPH-11:


I also think that it's great at this moment.

+1

> Improve the graph distribution of Giraph
> 
>
> Key: GIRAPH-11
> URL: https://issues.apache.org/jira/browse/GIRAPH-11
> Project: Giraph
>  Issue Type: Improvement
>Affects Versions: 0.70.0
>Reporter: Avery Ching
>Assignee: Avery Ching
> Attachments: GIRAPH-11.2.diff, GIRAPH-11.3.diff, GIRAPH-11.4.diff, 
> GIRAPH-11.diff
>
>
> Currently, Giraph assumes that the data from the VertexInputFormat is sorted. 
>  If the user data is not sorted by the vertex id, they must first run a 
> MapReduce or Pig job to generate a sorted dataset.  This is often a bit 
> inconvenient.
> Giraph graph partitioning is currently range based and there are some 
> advantages and disadvantages of this approach.  The proposal of this JIRA 
> would be to allow for both range and hash based partitioning and provide more 
> flexibility to the user.
> Design goals for the graph distribution:
> * Allow vertices to be ordered or unordered
> * Ability to repartition
> * Select the partitioning scheme based on user needs (i.e. hash or range 
> based)
> * Ability to provide user-specific hints about partitions
> Hash-based partitioning
> * Good vertex balancing across ranges for random data
> * Bad at vertex id locality
> Range-based partitioning
> * Good at vertex id locality
> * Ability to split ranges easily
> * Can cause hotspots for hot ranges
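
The trade-off between the two schemes listed above can be made concrete with a small Java sketch. It is not taken from the GIRAPH-11 patch: the class `PartitionSketch`, both method names, and the use of sorted inclusive upper bounds to describe ranges are assumptions made only for illustration.

```java
import java.util.Arrays;

/** Hypothetical sketch contrasting hash-based and range-based partition lookup. */
public class PartitionSketch {
    /**
     * Hash-based: spreads ids evenly for random data, but neighbouring ids
     * usually land in different partitions (poor vertex id locality).
     */
    public static int hashPartition(long vertexId, int partitionCount) {
        return Math.floorMod(Long.hashCode(vertexId), partitionCount);
    }

    /**
     * Range-based: partition i holds the ids up to and including upperBounds[i]
     * that are above upperBounds[i - 1]. The array must be sorted ascending.
     * Ids above the last bound are clamped into the last partition.
     */
    public static int rangePartition(long vertexId, long[] upperBounds) {
        int pos = Arrays.binarySearch(upperBounds, vertexId);
        // A negative result encodes the insertion point as -(insertionPoint) - 1.
        int index = pos >= 0 ? pos : -pos - 1;
        return Math.min(index, upperBounds.length - 1);
    }
}
```

With bounds `{99, 199, 299}`, ids 0 to 99 share partition 0, so a scan over nearby ids stays local, while a burst of activity on those ids makes that one partition hot. Splitting a range amounts to inserting one more bound into the array.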





[jira] [Commented] (GIRAPH-11) Improve the graph distribution of Giraph

2011-11-14 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149485#comment-13149485
 ] 

Hyunsik Choi commented on GIRAPH-11:


You are welcome. However, the second patch still produces the following error:

{code}
hyunsik@code:~/Code/giraph/giraph-trunk$ patch -p0 < ~/Downloads/GIRAPH-11.2.diff
patching file pom.xml
patching file src/main/java/org/apache/giraph/benchmark/PseudoRandomVertexInputFormat.java
patching file src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java
patching file src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
patching file src/main/java/org/apache/giraph/comm/CommunicationsInterface.java
patching file src/main/java/org/apache/giraph/comm/RPCCommunications.java
patching file src/main/java/org/apache/giraph/comm/ServerInterface.java
patching file src/main/java/org/apache/giraph/comm/WorkerCommunications.java
patching file src/main/java/org/apache/giraph/examples/GeneratedVertexInputFormat.java
patching file src/main/java/org/apache/giraph/examples/GeneratedVertexReader.java
patching file src/main/java/org/apache/giraph/examples/MaxAggregator.java
patching file src/main/java/org/apache/giraph/examples/MinAggregator.java
patching file src/main/java/org/apache/giraph/examples/SimpleMutateGraphVertex.java
patching file src/main/java/org/apache/giraph/examples/SimpleSuperstepVertex.java
patching file src/main/java/org/apache/giraph/examples/SuperstepBalancer.java
patching file src/main/java/org/apache/giraph/examples/SuperstepHashPartitioner.java
patching file src/main/java/org/apache/giraph/examples/VerifyMessage.java
patching file src/main/java/org/apache/giraph/graph/AutoBalancer.java
patching file src/main/java/org/apache/giraph/graph/BasicVertex.java
patching file src/main/java/org/apache/giraph/graph/BasicVertexRangeBalancer.java
patching file src/main/java/org/apache/giraph/graph/BspService.java
patching file src/main/java/org/apache/giraph/graph/BspServiceMaster.java
patching file src/main/java/org/apache/giraph/graph/BspServiceWorker.java
patching file src/main/java/org/apache/giraph/graph/BspUtils.java
patching file src/main/java/org/apache/giraph/graph/GiraphJob.java
patching file src/main/java/org/apache/giraph/graph/GlobalStats.java
patching file src/main/java/org/apache/giraph/graph/GraphMapper.java
patching file src/main/java/org/apache/giraph/graph/GraphState.java
patching file src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleVertex.java
patching file src/main/java/org/apache/giraph/graph/MutableVertex.java
patching file src/main/java/org/apache/giraph/graph/StaticBalancer.java
patching file src/main/java/org/apache/giraph/graph/Vertex.java
patching file src/main/java/org/apache/giraph/graph/VertexEdgeCount.java
patching file src/main/java/org/apache/giraph/graph/VertexRange.java
patching file src/main/java/org/apache/giraph/graph/VertexRangeBalancer.java
patching file src/main/java/org/apache/giraph/graph/WorkerInfo.java
patching file src/main/java/org/apache/giraph/graph/partition/BasicPartitionOwner.java
patching file src/main/java/org/apache/giraph/graph/partition/GraphPartitioner.java
patching file src/main/java/org/apache/giraph/graph/partition/HashMasterPartitioner.java
patching file src/main/java/org/apache/giraph/graph/partition/HashPartitioner.java
patching file src/main/java/org/apache/giraph/graph/partition/HashRangePartitioner.java
patching file src/main/java/org/apache/giraph/graph/partition/HashRangeWorkerPartitioner.java
patching file src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java
patching file src/main/java/org/apache/giraph/graph/partition/MasterGraphPartitioner.java
patching file src/main/java/org/apache/giraph/graph/partition/Partition.java
patching file src/main/java/org/apache/giraph/graph/partition/PartitionBalancer.java
patching file src/main/java/org/apache/giraph/graph/partition/PartitionExchange.java
patching file src/main/java/org/apache/giraph/graph/partition/PartitionOwner.java
patching file src/main/java/org/apache/giraph/graph/partition/PartitionStats.java
patching file src/main/java/org/apache/giraph/graph/partition/PartitionUtils.java
patching file src/main/java/org/apache/giraph/graph/partition/RangeMasterPartitioner.java
patching file src/main/java/org/apache/giraph/graph/partition/RangePartitionOwner.java
patching file src/main/java/org/apache/giraph/graph/partition/RangePartitionStats.java
patching file src/main/java/org/apache/giraph/graph/partition/RangePartitioner.java
patching file src/main/java/org/apache/giraph/graph/partition/RangeSplitHint.java
patching file src/main/java/org/apache/giraph/graph/partition/RangeWorkerPartitioner.java
patching file src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java
patching file src/main/java/org/apache/giraph/utils/WritableUtils.java
patching file src/main/java/or

[jira] [Commented] (GIRAPH-11) Improve the graph distribution of Giraph

2011-11-13 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149471#comment-13149471
 ] 

Hyunsik Choi commented on GIRAPH-11:


Thank you for the rebase.

> Improve the graph distribution of Giraph
> 
>
> Key: GIRAPH-11
> URL: https://issues.apache.org/jira/browse/GIRAPH-11
> Project: Giraph
>  Issue Type: Improvement
>Affects Versions: 0.70.0
>Reporter: Avery Ching
>Assignee: Avery Ching
> Attachments: GIRAPH-11.diff
>
>
> Currently, Giraph assumes that the data from the VertexInputFormat is sorted. 
>  If the user data is not sorted by the vertex id, they must first run a 
> MapReduce or Pig job to generate a sorted dataset.  This is often a bit 
> inconvenient.
> Giraph graph partitioning is currently range based and there are some 
> advantages and disadvantages of this approach.  The proposal of this JIRA 
> would be to allow for both range and hash based partitioning and provide more 
> flexibility to the user.
> Design goals for the graph distribution:
> * Allow vertices to be ordered or unordered
> * Ability to repartition
> * Select the partitioning scheme based on user needs (i.e. hash or range 
> based)
> * Ability to provide user-specific hints about partitions
> Hash-based partitioning
> * Good vertex balancing across ranges for random data
> * Bad at vertex id locality
> Range-based partitioning
> * Good at vertex id locality
> * Ability to split ranges easily
> * Can cause hotspots for hot ranges





[jira] [Commented] (GIRAPH-79) Change the menu layout of the site

2011-11-13 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149438#comment-13149438
 ] 

Hyunsik Choi commented on GIRAPH-79:


Thank you so much :)

> Change the menu layout of the site
> --
>
> Key: GIRAPH-79
> URL: https://issues.apache.org/jira/browse/GIRAPH-79
> Project: Giraph
>  Issue Type: Task
>  Components: site
>Reporter: Hyunsik Choi
>Assignee: Hyunsik Choi
>  Labels: site
> Attachments: GIRAPH-79_1.patch, GIRAPH-79_2.patch
>
>
> The current site has the basic menu layout generated by the Maven site plugin.
> This layout is too restrictive to accommodate new content.
> I would like to suggest the following menu layout.
> http://people.apache.org/~hyunsik/giraph/site/index.html
> Although the layout includes most of the existing content, it has two additional 
> categories, Giraph and Documentation. I think that this layout is simpler and 
> makes it easy to add new content.
> Does anyone have any other suggestions?





[jira] [Commented] (GIRAPH-11) Improve the graph distribution of Giraph

2011-11-13 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149412#comment-13149412
 ] 

Hyunsik Choi commented on GIRAPH-11:


Avery, 

I'm sorry for delaying the review. I'm digging into your patch now. 
It looks great! Based on this work, we can consider more advanced graph 
partitioners based on the number of edge cuts across graph partitions.

I need about one more day to investigate further because the patch is somewhat 
complicated for me :) 

Besides, for a deeper review, I would like to run some tests and 
trace them. Your patch needs a rebase. Could you rebase it?

Thank you :)

> Improve the graph distribution of Giraph
> 
>
> Key: GIRAPH-11
> URL: https://issues.apache.org/jira/browse/GIRAPH-11
> Project: Giraph
>  Issue Type: Improvement
>Affects Versions: 0.70.0
>Reporter: Avery Ching
>Assignee: Avery Ching
> Attachments: GIRAPH-11.diff
>
>
> Currently, Giraph assumes that the data from the VertexInputFormat is sorted. 
>  If the user data is not sorted by the vertex id, they must first run a 
> MapReduce or Pig job to generate a sorted dataset.  This is often a bit 
> inconvenient.
> Giraph graph partitioning is currently range based and there are some 
> advantages and disadvantages of this approach.  The proposal of this JIRA 
> would be to allow for both range and hash based partitioning and provide more 
> flexibility to the user.
> Design goals for the graph distribution:
> * Allow vertices to be ordered or unordered
> * Ability to repartition
> * Select the partitioning scheme based on user needs (i.e. hash or range 
> based)
> * Ability to provide user-specific hints about partitions
> Hash-based partitioning
> * Good vertex balancing across ranges for random data
> * Bad at vertex id locality
> Range-based partitioning
> * Good at vertex id locality
> * Ability to split ranges easily
> * Can cause hotspots for hot ranges





[jira] [Commented] (GIRAPH-79) Change the menu layout of the site

2011-11-13 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149304#comment-13149304
 ] 

Hyunsik Choi commented on GIRAPH-79:


Gianmarco,

I misunderstood your comment. You did not mean removing the project reports.
I agree that the reports should be placed deeper in the site.

> Change the menu layout of the site
> --
>
> Key: GIRAPH-79
> URL: https://issues.apache.org/jira/browse/GIRAPH-79
> Project: Giraph
>  Issue Type: Task
>  Components: site
>Reporter: Hyunsik Choi
>  Labels: site
> Attachments: GIRAPH-79_1.patch
>
>
> The current site has the basic menu layout generated by the Maven site plugin.
> This layout is too restrictive to accommodate new content.
> I would like to suggest the following menu layout.
> http://people.apache.org/~hyunsik/giraph/site/index.html
> Although the layout includes most of the existing content, it has two additional 
> categories, Giraph and Documentation. I think that this layout is simpler and 
> makes it easy to add new content.
> Does anyone have any other suggestions?





[jira] [Commented] (GIRAPH-79) Change the menu layout of the site

2011-11-13 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149303#comment-13149303
 ] 

Hyunsik Choi commented on GIRAPH-79:


Initially, the project reports were included because we had little content.
However, I also think that the project reports can be removed if there are no 
objections.
Now we are getting new content. Most of the project reports can be replaced 
with the reports generated by Jenkins.


> Change the menu layout of the site
> --
>
> Key: GIRAPH-79
> URL: https://issues.apache.org/jira/browse/GIRAPH-79
> Project: Giraph
>  Issue Type: Task
>  Components: site
>Reporter: Hyunsik Choi
>  Labels: site
> Attachments: GIRAPH-79_1.patch
>
>
> The current site has the basic menu layout generated by the Maven site plugin.
> This layout is too restrictive to accommodate new content.
> I would like to suggest the following menu layout.
> http://people.apache.org/~hyunsik/giraph/site/index.html
> Although the layout includes most of the existing content, it has two additional 
> categories, Giraph and Documentation. I think that this layout is simpler and 
> makes it easy to add new content.
> Does anyone have any other suggestions?





[jira] [Commented] (GIRAPH-75) Create sections on how to get involved and how to generate patches on website

2011-11-13 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-75?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13149289#comment-13149289
 ] 

Hyunsik Choi commented on GIRAPH-75:


+1

The patch contains very useful information for newcomers to Giraph.

I would like to make a suggestion.
The index page is likely to end up with a mix of various content.
How about making a separate page for these sections?

> Create sections on how to get involved and how to generate patches on website
> -
>
> Key: GIRAPH-75
> URL: https://issues.apache.org/jira/browse/GIRAPH-75
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Attachments: GIRAPH-75.patch
>
>
> We've had several questions lately on how to get started. It would be good to 
> document this on the site.





[jira] [Commented] (GIRAPH-11) Improve the graph distribution of Giraph

2011-11-10 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148187#comment-13148187
 ] 

Hyunsik Choi commented on GIRAPH-11:


That's a huge patch :)
I have just started to explore your patch.
I will leave some comments (maybe tomorrow).

> Improve the graph distribution of Giraph
> 
>
> Key: GIRAPH-11
> URL: https://issues.apache.org/jira/browse/GIRAPH-11
> Project: Giraph
>  Issue Type: Improvement
>Affects Versions: 0.70.0
>Reporter: Avery Ching
>Assignee: Avery Ching
> Attachments: GIRAPH-11.diff
>
>
> Currently, Giraph assumes that the data from the VertexInputFormat is sorted. 
>  If the user data is not sorted by the vertex id, they must first run a 
> MapReduce or Pig job to generate a sorted dataset.  This is often a bit 
> inconvenient.
> Giraph graph partitioning is currently range based and there are some 
> advantages and disadvantages of this approach.  The proposal of this JIRA 
> would be to allow for both range and hash based partitioning and provide more 
> flexibility to the user.
> Design goals for the graph distribution:
> * Allow vertices to be ordered or unordered
> * Ability to repartition
> * Select the partitioning scheme based on user needs (i.e. hash or range 
> based)
> * Ability to provide user-specific hints about partitions
> Hash-based partitioning
> * Good vertex balancing across ranges for random data
> * Bad at vertex id locality
> Range-based partitioning
> * Good at vertex id locality
> * Ability to split ranges easily
> * Can cause hotspots for hot ranges





[jira] [Commented] (GIRAPH-64) Create VertexRunner to make it easier to run users' computations

2011-11-07 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-64?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13146023#comment-13146023
 ] 

Hyunsik Choi commented on GIRAPH-64:


In my case, 'mvn package' works, but 'mvn assembly:assembly' produces the error 
I mentioned above.

{code}
hyunsik@code:~$ mvn --version
Apache Maven 3.0.3 (r1075438; 2011-03-01 02:31:09+0900)
Maven home: /home/hyunsik/Local/maven-3
Java version: 1.6.0_26, vendor: Sun Microsystems Inc.
Java home: /usr/lib/jvm/java-6-sun-1.6.0.26/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.0.0-12-generic", arch: "amd64", family: "unix"
{code}

> Create VertexRunner to make it easier to run users' computations
> 
>
> Key: GIRAPH-64
> URL: https://issues.apache.org/jira/browse/GIRAPH-64
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Attachments: GIRAPH-64.patch
>
>
> Currently, if a user wants to implement a Giraph algorithm by extending 
> {{Vertex}} they must also write all the boilerplate around the {{Tool}} 
> interface and bundle it with the Giraph jar (or get Giraph on the classpath 
> and playing nice with the implementation).  For example, what is included in 
> the PageRankBenchmark and what Kohei has done: 
> https://github.com/smly/java-Giraph-LabelPropagation  It would be better if 
> we had perhaps a Vertex implementation to be subclassed that already had all 
> the standard Tooling included such that all one had to run would be (assuming 
> the Giraph jar was already on the classpath):
> {noformat}hadoop jar my-awesome-vertex.jar my.awesome.vertex -i jazz_input -o 
> jazz_output -if org.apache.giraph.lib.in.text.adjacency-list.LongDoubleDouble 
> -of org.apache.giraph.lib.out.text.adjacency-list.LongDoubleDouble{noformat} 
> This wouldn't work with every algorithm, but would be useful in a large 
> number of cases.





[jira] [Commented] (GIRAPH-64) Create VertexRunner to make it easier to run users' computations

2011-11-03 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-64?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13143253#comment-13143253
 ] 

Hyunsik Choi commented on GIRAPH-64:


The patch looks nice!
I have really wanted this feature!

However, when I applied the patch to trunk and ran 'mvn 
assembly:assembly', it failed with the following error:

{noformat}
[INFO] Compiling 14 source files to /home/hyunsik/Code/giraph/giraph-review/target/test-classes
[INFO] 
[INFO] --- maven-surefire-plugin:2.6:test (default-test) @ giraph ---
[INFO] Tests are skipped.
[INFO] 
[INFO] --- maven-jar-plugin:2.3.1:jar (default-jar) @ giraph ---
[INFO] Building jar: /home/hyunsik/Code/giraph/giraph-review/target/giraph-0.70.jar
[INFO] 
[INFO] --- maven-assembly-plugin:2.2:single (make-assembly) @ giraph ---
[INFO] Reading assembly descriptor: /home/hyunsik/Code/giraph/giraph-review/src/main/assembly/assembly.xml
[WARNING] NOTE: Currently, inclusion of module dependencies may produce unpredictable results if a version conflict occurs.
[INFO] Building tar : /home/hyunsik/Code/giraph/giraph-review/target/giraph-0.70-bin.tar.gz
[WARNING] Entry: giraph-0.70/src/test/java/org/apache/giraph/lib/TestLongDoubleDoubleAdjacencyListVertexInputFormat.java longer than 100 characters.
[WARNING] Resulting tar file can only be processed successfully by GNU compatible tar commands
[WARNING] Entry: giraph-0.70/src/test/java/org/apache/giraph/lib/TestTextDoubleDoubleAdjacencyListVertexInputFormat.java longer than 100 characters.
[INFO] 
[INFO] <<< maven-assembly-plugin:2.2:assembly (default-cli) @ giraph <<<
[INFO] 
[INFO] --- maven-assembly-plugin:2.2:assembly (default-cli) @ giraph ---
[INFO] 
[INFO] BUILD FAILURE
[INFO] 
[INFO] Total time: 11.818s
[INFO] Finished at: Fri Nov 04 00:36:35 KST 2011
[INFO] Final Memory: 21M/413M
[INFO] 
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin:2.2:assembly (default-cli) on project giraph: Error reading assemblies: No assembly descriptors found. -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
{noformat}

I'm using Maven 3.0.3 and Sun JDK 1.6.0_26.
I have looked into this problem, but I still don't know whether it is a bug or 
a mistake on my side.

Does anyone know what the problem is?

> Create VertexRunner to make it easier to run users' computations
> 
>
> Key: GIRAPH-64
> URL: https://issues.apache.org/jira/browse/GIRAPH-64
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Attachments: GIRAPH-64.patch
>
>
> Currently, if a user wants to implement a Giraph algorithm by extending 
> {{Vertex}} they must also write all the boilerplate around the {{Tool}} 
> interface and bundle it with the Giraph jar (or get Giraph on the classpath 
> and playing nice with the implementation).  For example, what is included in 
> the PageRankBenchmark and what Kohei has done: 
> https://github.com/smly/java-Giraph-LabelPropagation  It would be better if 
> we had perhaps a Vertex implementation to be subclassed that already had all 
> the standard Tooling included such that all one had to run would be (assuming 
> the Giraph jar was already on the classpath):
> {noformat}hadoop jar my-awesome-vertex.jar my.awesome.vertex -i jazz_input -o 
> jazz_output -if org.apache.giraph.lib.in.text.adjacency-list.LongDoubleDouble 
> -of org.apache.giraph.lib.out.text.adjacency-list.LongDoubleDouble{noformat} 
> This wouldn't work with every algorithm, but would be useful in a large 
> number of cases.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

2011-10-31 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13139976#comment-13139976
 ] 

Hyunsik Choi commented on GIRAPH-36:


Looks great!

Actually, I need more time to fully understand this patch.
As a first step, I ran the unit tests on a real Hadoop cluster running on 
localhost.
All tests passed!

> Ensure that subclassing BasicVertex is possible by user apps
> 
>
> Key: GIRAPH-36
> URL: https://issues.apache.org/jira/browse/GIRAPH-36
> Project: Giraph
>  Issue Type: Improvement
>  Components: graph
>Affects Versions: 0.70.0
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Blocker
> Fix For: 0.70.0
>
> Attachments: GIRAPH-36.diff
>
>
> Original assumptions in Giraph were that all users would subclass Vertex 
> (which extended MutableVertex extended BasicVertex).  Classes which wish to 
> have application specific data structures (ie. not a TreeMap>) 
> may need to extend either MutableVertex or BasicVertex.  Unfortunately 
> VertexRange extends ArrayList, and there are other places where the 
> assumption is that vertex classes are either Vertex, or at least 
> MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.





[jira] [Commented] (GIRAPH-67) Provide AdjacencyList InputFormat for Ids of Strings and double values

2011-10-27 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13138020#comment-13138020
 ] 

Hyunsik Choi commented on GIRAPH-67:


+1

:)



> Provide AdjacencyList InputFormat for Ids of Strings and double values
> --
>
> Key: GIRAPH-67
> URL: https://issues.apache.org/jira/browse/GIRAPH-67
> Project: Giraph
>  Issue Type: New Feature
>  Components: lib
>Affects Versions: 0.70.0
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Fix For: 0.70.0
>
> Attachments: GIRAPH-67-2.patch, GIRAPH-67.patch
>
>
> Playing with some more graphs, it'd be useful to have an adj list format 
> where the ids are strings, such as names.





[jira] [Commented] (GIRAPH-67) Provide AdjacencyList InputFormat for Ids of Strings and double values

2011-10-27 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-67?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13137927#comment-13137927
 ] 

Hyunsik Choi commented on GIRAPH-67:


The generic type parameter in the patch seems strange.
{code:java}
public class TextDoubleDoubleAdjacencyListVertexInputFormat
extends TextVertexInputFormat {
...
{code}

Should it be changed as follows?
{code:java}
public class TextDoubleDoubleAdjacencyListVertexInputFormat
extends TextVertexInputFormat {
...
{code}

> Provide AdjacencyList InputFormat for Ids of Strings and double values
> --
>
> Key: GIRAPH-67
> URL: https://issues.apache.org/jira/browse/GIRAPH-67
> Project: Giraph
>  Issue Type: New Feature
>  Components: lib
>Affects Versions: 0.70.0
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Fix For: 0.70.0
>
> Attachments: GIRAPH-67.patch
>
>
> Playing with some more graphs, it'd be useful to have an adj list format 
> where the ids are strings, such as names.





[jira] [Commented] (GIRAPH-56) Create a CSV TextOutputFormat

2011-10-27 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13137856#comment-13137856
 ] 

Hyunsik Choi commented on GIRAPH-56:


+1

It would be very useful for debugging and developing Giraph applications.

In addition to unit tests, I tested it with a simple graph generator, 
and it works well.

> Create a CSV TextOutputFormat
> -
>
> Key: GIRAPH-56
> URL: https://issues.apache.org/jira/browse/GIRAPH-56
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Jakob Homan
>Assignee: Jakob Homan
>  Labels: newbie
> Attachments: GIRAPH-56.patch
>
>
> Right now we've got an outputformat that spits out Base64-encoded text.  It 
> would be nice to have one that just did regular text, for testing or small graphs.





[jira] [Commented] (GIRAPH-13) Port Giraph to YARN

2011-10-27 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13136877#comment-13136877
 ] 

Hyunsik Choi commented on GIRAPH-13:


There are probably many prerequisites and difficult issues.
I'm willing to wait for your update :)

> Port Giraph to YARN
> ---
>
> Key: GIRAPH-13
> URL: https://issues.apache.org/jira/browse/GIRAPH-13
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Jakob Homan
>Assignee: Jakob Homan
>
> Now that YARN (aka MR2 aka MAPREDUCE-279) has been merged into the Hadoop 
> trunk, we should think about what it would take to separate out the graph 
> processing bits of Giraph from the MR1-specific code so as to take advantage 
> of the less-MR centric aspects of YARN, while still supporting both over the 
> medium term.





[jira] [Commented] (GIRAPH-13) Port Giraph to YARN

2011-10-24 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13134755#comment-13134755
 ] 

Hyunsik Choi commented on GIRAPH-13:


Jakob,

How is this issue progressing? I have little experience developing YARN 
applications.
If you share your progress or split this issue into sub-tasks, I can help 
a bit.

Thank you

> Port Giraph to YARN
> ---
>
> Key: GIRAPH-13
> URL: https://issues.apache.org/jira/browse/GIRAPH-13
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Jakob Homan
>Assignee: Jakob Homan
>
> Now that YARN (aka MR2 aka MAPREDUCE-279) has been merged into the Hadoop 
> trunk, we should think about what it would take to separate out the graph 
> processing bits of Giraph from the MR1-specific code so as to take advantage 
> of the less-MR centric aspects of YARN, while still supporting both over the 
> medium term.





[jira] [Commented] (GIRAPH-54) CommunicationsInterface shouldn't implement VersionedProtocol

2011-10-24 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-54?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13134753#comment-13134753
 ] 

Hyunsik Choi commented on GIRAPH-54:


+1

CommunicationsInterface is only used by peers to communicate with each other 
while a single Giraph job is running. So I think we don't need to consider 
compatibility across different Hadoop versions.

> CommunicationsInterface shouldn't implement VersionedProtocol
> -
>
> Key: GIRAPH-54
> URL: https://issues.apache.org/jira/browse/GIRAPH-54
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Jakob Homan
>
> Currently, CommunicationsInterface, the base interface for peer-peer 
> communication implements as part of its definition Hadoop's 
> VersionedProtocol, which is part of Hadoop's RPC stack.  Other RPC 
> implementations need to implement CommunicationsInterface, but shouldn't need 
> to implement an HRPC specific interface.





[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-10-04 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120604#comment-13120604
 ] 

Hyunsik Choi commented on GIRAPH-12:


Thank you for the review. I agree with your opinion. The virtual memory size 
seems very important in 32-bit JVMs. I only considered 64-bit JVMs and 
overlooked that point.

Anyway, this patch allows users to control the number of threads, which is more 
helpful in restricted environments (e.g., a 32-bit JVM).

> Investigate communication improvements
> --
>
> Key: GIRAPH-12
> URL: https://issues.apache.org/jira/browse/GIRAPH-12
> Project: Giraph
>  Issue Type: Improvement
>  Components: bsp
>Reporter: Avery Ching
>Assignee: Hyunsik Choi
>Priority: Minor
> Attachments: GIRAPH-12_1.patch, GIRAPH-12_2.patch, GIRAPH-12_3.patch
>
>
> Currently every worker will start up a thread to communicate with every other 
> workers.  Hadoop RPC is used for communication.  For instance if there are 
> 400 workers, each worker will create 400 threads.  This ends up using a lot 
> of memory, even with the option  
> -Dmapred.child.java.opts="-Xss64k".  
> It would be good to investigate using frameworks like Netty or custom roll 
> our own to improve this situation.  By moving away from Hadoop RPC, we would 
> also make compatibility of different Hadoop versions easier.





[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-28 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116961#comment-13116961
 ] 

Hyunsik Choi commented on GIRAPH-12:


Avery,

Thank you for your review. You are right. Runtime's totalMemory() and 
freeMemory() methods don't measure stack sizes. I confirmed this by testing the 
code below.

https://gist.github.com/1249761

I have looked for a way to measure the stack size of a Java application, but I 
could not find one. I'm still not sure how to show that thread stack memory 
is reduced by the thread pool approach. For now, your way seems to be the only 
method to prove it.

However, I'm curious to know how much overhead threads add in terms of memory 
consumption. Before trying your approach, I conducted some simple experiments.

I used the above source code to investigate the memory usage of threads. It 
was executed on a machine with an Intel i3, Ubuntu 11.10 (64-bit), and 8 GB of 
memory. I measured memory by using 'top'. 'top' shows several columns, 
including VIRT, RES, and SHR. We only need to focus on RES, the resident 
memory. RES includes all resident memory usage, such as heap and stack. I 
learned this from this page (http://goo.gl/JE7fD).
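
For readers who cannot open the gist, the following is a minimal sketch of this kind of probe. It is not the code from the gist: the class and method names are hypothetical, and it assumes Java 8 or later for the lambda syntax. It starts a number of idle threads and reports the heap usage that Runtime sees, which does not include thread stacks, so the process has to be inspected with 'top' while the threads are parked.

```java
import java.util.concurrent.CountDownLatch;

/** Illustrative probe only; names are hypothetical and not taken from the gist. */
final class ThreadMemoryProbe {

    /** Starts count idle daemon threads and returns the latch that releases them. */
    static CountDownLatch startIdleThreads(int count) {
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < count; i++) {
            Thread t = new Thread(() -> {
                try {
                    release.await(); // park until the latch is opened
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            t.setDaemon(true);
            t.start();
        }
        return release;
    }

    /** Heap in use as reported by Runtime; thread stacks are not part of this figure. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) throws InterruptedException {
        int count = args.length > 0 ? Integer.parseInt(args[0]) : 1000;
        long before = usedHeapBytes();
        CountDownLatch release = startIdleThreads(count);
        long after = usedHeapBytes();
        System.out.println("threads=" + count + " heapDeltaBytes=" + (after - before));
        // Keep the threads alive for a minute so VIRT and RES can be read with 'top'.
        Thread.sleep(60_000);
        release.countDown();
    }
}
```

Running it once without '-Xss' and once with, for example, '-Xss4096k' gives the two kinds of readings compared in this comment.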

First, I executed the above code with 1000 threads and without the JVM option 
'-Xss'. According to this page (http://goo.gl/sz2qM), the default stack size 
('-Xss') is 1024k on the 64-bit Linux JVM. After all threads were created, I 
executed 'top' to print the memory usage as follows:

1k threads with default thread stack size.
{noformat}
                       VIRT  RES  SHR
9163 hyunsik   20   0 3366m  30m 8296 S   18  0.4   0:01.52 java
{noformat}

2k threads with default thread stack size.
{noformat}
                        VIRT  RES  SHR
11223 hyunsik   20   0 4434m  46m 8340 S   40  0.6   0:04.11 java
{noformat}

With 1k and 2k threads, the program consumes only 30 and 46 megabytes, 
respectively. The memory usage of the threads is smaller than I expected. I 
wonder whether thread stack size is the main cause of the memory problem that 
we have faced.

Besides, the default stack size is 1024k, so the thread stack size does not 
seem to affect RES. I ran more tests with '-Xss' to investigate the thread 
stack size further.

1k threads with '-Xss4096k'.
{noformat}
28301 hyunsik   20   0 6380m  30m 8292 S   17  0.4   0:05.25 java
{noformat}

2k threads with '-Xss4096k'
{noformat}
29326 hyunsik   20   0 10.1g  46m 8300 S   38  0.6   0:03.42 java
{noformat}

VIRT is clearly affected by '-Xss', but RES is not. '-Xss' seems to be the 
maximum stack size of each thread, because it doesn't affect RES.
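
As a rough cross-check of that reading, the 'top' figures above can be turned into a per-thread estimate. The sketch below is only arithmetic on the numbers quoted in this comment. It treats the 'm' and 'g' suffixes from 'top' as MiB and GiB, and the class and method names are hypothetical.

```java
/** Back-of-the-envelope arithmetic on the 'top' readings; names are hypothetical. */
final class StackReservationEstimate {

    /** Growth per added thread, in MiB, between two 'top' readings given in MiB. */
    static double perThreadMiB(double mibWithFewerThreads, double mibWithMoreThreads, int addedThreads) {
        return (mibWithMoreThreads - mibWithFewerThreads) / addedThreads;
    }

    public static void main(String[] args) {
        int added = 1000; // going from 1k to 2k threads
        // VIRT with the default stack size: 3366m -> 4434m
        System.out.println("VIRT/thread, default -Xss: " + perThreadMiB(3366, 4434, added));
        // VIRT with -Xss4096k: 6380m -> 10.1g
        System.out.println("VIRT/thread, -Xss4096k:    " + perThreadMiB(6380, 10.1 * 1024, added));
        // RES is the same in both runs: 30m -> 46m
        System.out.println("RES/thread:                " + perThreadMiB(30, 46, added));
    }
}
```

VIRT grows by about 1.07 MiB per thread with the default setting and about 3.96 MiB per thread with '-Xss4096k', which is close to the configured stack sizes, while RES grows by about 0.016 MiB (roughly 16 KiB) per thread in both runs. This is consistent with the stack size being reserved in virtual memory and only the touched part becoming resident.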

What do you think about that?

> Investigate communication improvements
> --
>
> Key: GIRAPH-12
> URL: https://issues.apache.org/jira/browse/GIRAPH-12
> Project: Giraph
>  Issue Type: Improvement
>  Components: bsp
>Reporter: Avery Ching
>Assignee: Hyunsik Choi
>Priority: Minor
> Attachments: GIRAPH-12_1.patch, GIRAPH-12_2.patch
>
>
> Currently every worker will start up a thread to communicate with every other 
> workers.  Hadoop RPC is used for communication.  For instance if there are 
> 400 workers, each worker will create 400 threads.  This ends up using a lot 
> of memory, even with the option  
> -Dmapred.child.java.opts="-Xss64k".  
> It would be good to investigate using frameworks like Netty or custom roll 
> our own to improve this situation.  By moving away from Hadoop RPC, we would 
> also make compatibility of different Hadoop versions easier.





[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-28 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116960#comment-13116960
 ] 

Hyunsik Choi commented on GIRAPH-12:


Dmitriy,

Thank you for your comments. Regardless of the problem caused by thread stack 
size, those approaches look promising. In particular, spilling messages to disk 
looks necessary for Giraph to deal with really large graph data. Otherwise, 
out-of-memory errors may occur when the message generation rate is higher than 
the network bandwidth. I'll open a separate issue about this.

> Investigate communication improvements
> --
>
> Key: GIRAPH-12
> URL: https://issues.apache.org/jira/browse/GIRAPH-12
> Project: Giraph
>  Issue Type: Improvement
>  Components: bsp
>Reporter: Avery Ching
>Assignee: Hyunsik Choi
>Priority: Minor
> Attachments: GIRAPH-12_1.patch, GIRAPH-12_2.patch
>
>
> Currently every worker will start up a thread to communicate with every other 
> workers.  Hadoop RPC is used for communication.  For instance if there are 
> 400 workers, each worker will create 400 threads.  This ends up using a lot 
> of memory, even with the option  
> -Dmapred.child.java.opts="-Xss64k".  
> It would be good to investigate using frameworks like Netty or custom roll 
> our own to improve this situation.  By moving away from Hadoop RPC, we would 
> also make compatibility of different Hadoop versions easier.





[jira] [Commented] (GIRAPH-42) The MapReduce counter 'Sent Messages' doesn't work.

2011-09-28 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-42?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116592#comment-13116592
 ] 

Hyunsik Choi commented on GIRAPH-42:


I also like the third approach. The current implementation seems to aim at the 
third approach.

> The MapReduce counter 'Sent Messages' doesn't work.
> ---
>
> Key: GIRAPH-42
> URL: https://issues.apache.org/jira/browse/GIRAPH-42
> Project: Giraph
>  Issue Type: Bug
>  Components: bsp
>Reporter: Hyunsik Choi
>Priority: Minor
>
> The MapReduce counter 'Sent Messages' doesn't work. It always shows 0.
> {noformat}
> .
> .
> 11/09/28 10:51:22 INFO mapred.JobClient: Current workers=20
> 11/09/28 10:51:22 INFO mapred.JobClient: Current master task partition=0
> 11/09/28 10:51:22 INFO mapred.JobClient: Sent messages=0
> 11/09/28 10:51:22 INFO mapred.JobClient: Aggregate finished 
> vertices=60
> 11/09/28 10:51:22 INFO mapred.JobClient: Aggregate vertices=60
> .
> .
> {noformat}





[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-27 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116177#comment-13116177
 ] 

Hyunsik Choi commented on GIRAPH-12:


FYI, here is how I collected the Runtime memory log messages.

{noformat}
grep "totalMem" hadoop/logs/userlogs/job_201109281028_0007/ -r > orig_1.log

cat orig_1.log | awk '{print $6" "$7" "$8}' \
  | sed 's/totalMem=//g' \
  | sed 's/maxMem=//' \
  | sed 's/freeMem=//' \
  | awk '{total=total+$1; max=max+$2; free=free+$3} END {print total/NR,max/NR,free/NR}'
{noformat}
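
The same averaging can be sketched in Java for anyone who prefers not to depend on the field positions used by awk. This is only an illustration: it assumes that each relevant log line contains numeric 'totalMem=', 'maxMem=' and 'freeMem=' tokens, which is inferred from the commands above rather than from the actual log format, and the class name is hypothetical.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Averages totalMem/maxMem/freeMem values found in log lines; names are hypothetical. */
final class MemLogAverager {

    // Assumed token format: key=<number>, optionally with a decimal part.
    private static final Pattern[] PATTERNS = {
        Pattern.compile("totalMem=([0-9]+(?:\\.[0-9]+)?)"),
        Pattern.compile("maxMem=([0-9]+(?:\\.[0-9]+)?)"),
        Pattern.compile("freeMem=([0-9]+(?:\\.[0-9]+)?)")
    };

    /** Returns {avg totalMem, avg maxMem, avg freeMem} over lines that contain all three tokens. */
    static double[] average(List<String> lines) {
        double[] sums = new double[PATTERNS.length];
        int matched = 0;
        for (String line : lines) {
            double[] values = new double[PATTERNS.length];
            boolean complete = true;
            for (int i = 0; i < PATTERNS.length && complete; i++) {
                Matcher m = PATTERNS[i].matcher(line);
                if (m.find()) {
                    values[i] = Double.parseDouble(m.group(1));
                } else {
                    complete = false; // skip lines that lack one of the tokens
                }
            }
            if (!complete) {
                continue;
            }
            for (int i = 0; i < sums.length; i++) {
                sums[i] += values[i];
            }
            matched++;
        }
        if (matched == 0) {
            return sums; // all zeros when nothing matched
        }
        for (int i = 0; i < sums.length; i++) {
            sums[i] /= matched;
        }
        return sums;
    }
}
```

Unlike the awk version, which divides by the number of input lines, this sketch divides by the number of lines that actually contained all three values.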

> Investigate communication improvements
> --
>
> Key: GIRAPH-12
> URL: https://issues.apache.org/jira/browse/GIRAPH-12
> Project: Giraph
>  Issue Type: Improvement
>  Components: bsp
>Reporter: Avery Ching
>Assignee: Hyunsik Choi
>Priority: Minor
> Attachments: GIRAPH-12_1.patch, GIRAPH-12_2.patch
>
>
> Currently every worker will start up a thread to communicate with every other 
> workers.  Hadoop RPC is used for communication.  For instance if there are 
> 400 workers, each worker will create 400 threads.  This ends up using a lot 
> of memory, even with the option  
> -Dmapred.child.java.opts="-Xss64k".  
> It would be good to investigate using frameworks like Netty or custom roll 
> our own to improve this situation.  By moving away from Hadoop RPC, we would 
> also make compatibility of different Hadoop versions easier.





[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-27 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13116166#comment-13116166
 ] 

Hyunsik Choi commented on GIRAPH-12:


Avery,
Thank you for your comments. I decided to use Runtime. It seems to be enough to 
investigate this issue.

Again, I conducted a benchmark to measure memory consumption with 
RandomMessageBenchmark as follows:

{noformat}
hadoop jar giraph-0.70-jar-with-dependencies.jar 
org.apache.giraph.benchmark.RandomMessageBenchmark -e 2 -s 3 -w 20 -b 4 -n 150 
-V ${V} -v -f ${f}
{noformat}
Here, the 'f' option indicates the number of threads in the thread pool. I 
also changed the thread executor to use FixedThreadPool.
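
To illustrate the idea, here is a minimal sketch of flushing to many peers through a fixed-size pool instead of one thread per peer. It is not the code from the patch: the class and method names are hypothetical, the "flush" is a placeholder counter, and the pool is created with the standard Executors.newFixedThreadPool factory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only; not taken from the GIRAPH-12 patch. */
final class PooledPeerFlush {

    /** Runs one flush task per peer on poolSize threads and returns the number of completed flushes. */
    static int flushAll(int peers, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger flushed = new AtomicInteger();
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int peer = 0; peer < peers; peer++) {
                // Placeholder for "send the buffered messages to this peer".
                pending.add(pool.submit(() -> {
                    flushed.incrementAndGet();
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for every flush to finish
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while flushing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("flush task failed", e);
        } finally {
            pool.shutdown();
        }
        return flushed.get();
    }
}
```

With 400 peers and a pool of 20 threads, for example, `PooledPeerFlush.flushAll(400, 20)` still performs 400 flushes but never has more than 20 worker threads alive at once.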

I ran every experiment twice and took the average. You can see the results at 
the link below:
http://goo.gl/arP62

These experiments were conducted on two cluster nodes, each of which has 24 
cores and 64 GB of memory. They are connected to each other over 1 Gbps 
Ethernet. I measured the memory footprints from Runtime in GraphMapper, as 
Avery recommended.

In sum, the thread pool approach is better than the original approach in terms 
of processing time. I guess that this is because the thread pool approach 
reduces the context switching cost and narrows the synchronization area.

Unfortunately, however, the thread pool approach doesn't reduce memory 
consumption, which is the main focus of this issue. Rather, this approach needs 
slightly more memory, as shown in Figures 3 and 4. However, note the 
experiments with f = 5 and f = 20. In these experiments, the number of threads 
has only a small effect on memory consumption.

We still face the memory problem, so we may need to approach it from 
another angle.
I think that this problem may be mainly caused by the current message flushing 
strategy.

In the current implementation, outgoing messages are transmitted to other peers 
in only two cases:
1) When the number of outgoing messages for a specific peer exceeds a 
threshold (i.e., maxSize), the outgoing messages for that peer are transmitted 
to it.
2) When a superstep finishes, all remaining messages are flushed to the other 
peers.

A flush (case 2) is only triggered at the end of a superstep. During processing, 
message flushing depends only on case 1. This may not be effective 
because case 1 only considers the number of messages for each specific 
peer. It never takes the real memory occupation into account. If the 
destinations of outgoing messages are uniformly distributed, out of memory may 
occur before any 'case 1' flush is triggered.

To overcome this problem, we may need a more eager message flushing strategy or 
some approach to spill overflow messages to disk.
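
A minimal sketch of what a more eager strategy could look like is shown below. It is not Giraph code. The per-peer count threshold plays the role of maxSize from case 1, while the total-bytes threshold (maxTotalBytes) and all class and method names are hypothetical additions for illustration. Instead of sending anything, a flush only increments a counter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative buffer with a count-based and a size-based flush trigger; not Giraph code. */
final class OutgoingMessageBuffer {
    private final int maxMessagesPerPeer; // case 1 threshold (maxSize in the comment above)
    private final long maxTotalBytes;     // hypothetical memory-based threshold
    private final Map<Integer, List<byte[]>> perPeer = new HashMap<>();
    private long bufferedBytes;
    private int flushCount;

    OutgoingMessageBuffer(int maxMessagesPerPeer, long maxTotalBytes) {
        this.maxMessagesPerPeer = maxMessagesPerPeer;
        this.maxTotalBytes = maxTotalBytes;
    }

    void add(int peer, byte[] message) {
        List<byte[]> queue = perPeer.computeIfAbsent(peer, p -> new ArrayList<>());
        queue.add(message);
        bufferedBytes += message.length;
        if (queue.size() > maxMessagesPerPeer) {
            flushPeer(peer); // case 1: too many messages for one peer
        } else if (bufferedBytes > maxTotalBytes) {
            flushAll();      // eager flush: total buffered size is too large
        }
    }

    void flushPeer(int peer) {
        List<byte[]> queue = perPeer.remove(peer);
        if (queue == null || queue.isEmpty()) {
            return;
        }
        for (byte[] m : queue) {
            bufferedBytes -= m.length;
        }
        flushCount++; // a real implementation would send the queue to the peer here
    }

    /** Case 2: called at the end of a superstep, and by the size-based trigger. */
    void flushAll() {
        for (Integer peer : new ArrayList<>(perPeer.keySet())) {
            flushPeer(peer);
        }
    }

    long bufferedBytes() {
        return bufferedBytes;
    }

    int flushCount() {
        return flushCount;
    }
}
```

With uniformly distributed destinations, the count-based trigger alone may never fire, but the size-based trigger bounds the buffered bytes regardless of how the messages are spread across peers. Spilling to disk would be an alternative action at the same trigger point.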

Let me know what you think.

> Investigate communication improvements
> --
>
> Key: GIRAPH-12
> URL: https://issues.apache.org/jira/browse/GIRAPH-12
> Project: Giraph
>  Issue Type: Improvement
>  Components: bsp
>Reporter: Avery Ching
>Assignee: Hyunsik Choi
>Priority: Minor
> Attachments: GIRAPH-12_1.patch, GIRAPH-12_2.patch
>
>
> Currently every worker will start up a thread to communicate with every other 
> workers.  Hadoop RPC is used for communication.  For instance if there are 
> 400 workers, each worker will create 400 threads.  This ends up using a lot 
> of memory, even with the option  
> -Dmapred.child.java.opts="-Xss64k".  
> It would be good to investigate using frameworks like Netty or custom roll 
> our own to improve this situation.  By moving away from Hadoop RPC, we would 
> also make compatibility of different Hadoop versions easier.
