[jira] [Commented] (GIRAPH-185) Improve concurrency of putMsg / putMsgList

2012-04-25 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262278#comment-13262278
 ] 

Hyunsik Choi commented on GIRAPH-185:
-

If there is a trade-off between performance and memory consumption, memory 
consumption seems more important in the current Giraph implementation. I also 
agree that some benchmarks are necessary.

> Improve concurrency of putMsg / putMsgList
> --
>
> Key: GIRAPH-185
> URL: https://issues.apache.org/jira/browse/GIRAPH-185
> Project: Giraph
>  Issue Type: Improvement
>  Components: graph
>Affects Versions: 0.2.0
>Reporter: Bo Wang
>Assignee: Bo Wang
> Fix For: 0.2.0
>
> Attachments: GIRAPH-185.patch, GIRAPH-185.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> Currently in putMsg / putMsgList, a synchronized block protects all of 
> transientInMessages while a new message is added. This lock blocks other 
> concurrent calls to putMsg/putMsgList and increases response time. 
> We should use fine-grained locks to allow higher concurrency in message 
> communication.
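One way to get the finer-grained locking proposed here is to keep a concurrent map from destination vertex to a per-vertex synchronized list, so that two putMsg calls contend only when they target the same vertex. The sketch below is a minimal illustration under that assumption, not the GIRAPH-185 patch: the class name FineGrainedMessageStore and the helpers listFor and getMessages are hypothetical, only transientInMessages, putMsg and putMsgList come from the issue, and computeIfAbsent requires Java 8, which is newer than what Giraph targeted at the time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical message store sketching per-vertex locking: putMsg calls for
 * different destination vertices never contend on one shared lock.
 */
public class FineGrainedMessageStore<I, M> {
  // A concurrent map replaces the single monitor around the whole structure.
  private final ConcurrentMap<I, List<M>> transientInMessages =
      new ConcurrentHashMap<I, List<M>>();

  /** Atomically returns the message list for a vertex, creating it if absent. */
  private List<M> listFor(I vertex) {
    return transientInMessages.computeIfAbsent(
        vertex, k -> Collections.synchronizedList(new ArrayList<M>()));
  }

  /** Adds one message; only the destination vertex's list is locked. */
  public void putMsg(I vertex, M msg) {
    listFor(vertex).add(msg);
  }

  /** Adds a batch; addAll on a synchronized list takes that list's lock once. */
  public void putMsgList(I vertex, List<M> msgs) {
    listFor(vertex).addAll(msgs);
  }

  /** Returns a snapshot copy of the messages received for a vertex. */
  public List<M> getMessages(I vertex) {
    List<M> msgs = transientInMessages.get(vertex);
    if (msgs == null) {
      return Collections.emptyList();
    }
    synchronized (msgs) {  // copying a synchronized list must hold its lock
      return new ArrayList<M>(msgs);
    }
  }

  public static void main(String[] args) throws InterruptedException {
    FineGrainedMessageStore<Integer, Integer> store =
        new FineGrainedMessageStore<Integer, Integer>();
    Thread[] senders = new Thread[4];
    for (int t = 0; t < senders.length; t++) {
      senders[t] = new Thread(() -> {
        for (int i = 0; i < 1000; i++) {
          store.putMsg(i % 10, i);  // spread messages over 10 vertices
        }
      });
      senders[t].start();
    }
    for (Thread sender : senders) {
      sender.join();
    }
    // 4 threads x 100 messages each for vertex 0
    System.out.println(store.getMessages(0).size());  // prints 400
  }
}
```

Whether this beats the single lock in practice is what the benchmarks mentioned in the comment would need to show. The per-vertex lists also add some memory per destination vertex, which matters given the memory concern raised there.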

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Assigned] (GIRAPH-174) ConnectedComponentsVertex for loops can be replaced with for-each loops

2012-04-30 Thread Hyunsik Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyunsik Choi reassigned GIRAPH-174:
---

Assignee: Roman K

> ConnectedComponentsVertex for loops can be replaced with for-each loops
> ---
>
> Key: GIRAPH-174
> URL: https://issues.apache.org/jira/browse/GIRAPH-174
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Jakob Homan
>Assignee: Roman K
>Priority: Trivial
>  Labels: newbie
> Attachments: GIRAPH-174.patch
>
>
> {code}// First superstep is special, because we can simply look at the neighbors
> if (getSuperstep() == 0) {
>   for (Iterator<IntWritable> edges = iterator(); edges.hasNext();) {
>     int neighbor = edges.next().get();
>     if (neighbor < currentComponent) {
>       currentComponent = neighbor;
>     }
>   }
>   // Only need to send value if it is not the own id
>   if (currentComponent != getVertexValue().get()) {
>     setVertexValue(new IntWritable(currentComponent));
>     for (Iterator<IntWritable> edges = iterator(); edges.hasNext();) {
>       int neighbor = edges.next().get();
>       if (neighbor > currentComponent) {
>         sendMsg(new IntWritable(neighbor), getVertexValue());
>       }
>     }
>   }{code}
> Both of the for loops in this chunk from ConnectedComponentsVertex can be 
> replaced with for(IntWritable i : iterator()) loops to be more idiomatic.
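Java's enhanced for statement accepts only an array or an Iterable, not an Iterator, so a loop of the form for (IntWritable i : iterator()) compiles only if iterator() returns something Iterable. Assuming the vertex class itself implements Iterable over its neighbor ids (the unqualified iterator() call suggests this, but the excerpt does not confirm it), the loops can iterate over this instead. The self-contained sketch below shows the superstep-0 logic written that way; SketchVertex is hypothetical and uses Integer instead of Hadoop's IntWritable so it runs without Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Self-contained sketch of the superstep-0 logic using for-each loops. */
public class ForEachSketch {
  /** Hypothetical vertex, iterable over neighbor ids (Integer, not IntWritable). */
  static class SketchVertex implements Iterable<Integer> {
    private int value;  // current component id
    private final List<Integer> neighbors;
    private final List<Integer> messageTargets = new ArrayList<Integer>();

    SketchVertex(int id, List<Integer> neighbors) {
      this.value = id;  // each vertex starts in its own component
      this.neighbors = neighbors;
    }

    @Override
    public Iterator<Integer> iterator() {
      return neighbors.iterator();
    }

    int getValue() {
      return value;
    }

    List<Integer> getMessageTargets() {
      return messageTargets;
    }

    /** Stand-in for sendMsg: only records which vertex would be messaged. */
    private void sendMsg(int target) {
      messageTargets.add(target);
    }

    void firstSuperstep() {
      int currentComponent = value;
      for (int neighbor : this) {  // replaces the explicit Iterator loop
        if (neighbor < currentComponent) {
          currentComponent = neighbor;
        }
      }
      // Only need to send the value if it is not the vertex's own id.
      if (currentComponent != value) {
        value = currentComponent;
        for (int neighbor : this) {
          if (neighbor > currentComponent) {
            sendMsg(neighbor);
          }
        }
      }
    }
  }

  public static void main(String[] args) {
    SketchVertex v = new SketchVertex(5, Arrays.asList(3, 7, 9));
    v.firstSuperstep();
    System.out.println(v.getValue() + " -> " + v.getMessageTargets());  // 3 -> [7, 9]
  }
}
```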





[jira] [Commented] (GIRAPH-1) Initial code import

2011-08-11 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083871#comment-13083871
 ] 

Hyunsik Choi commented on GIRAPH-1:
---

The Apache Infra team has admin privileges on the Apache svn server.

Before commenting on this issue, I tried to load the svn dump into my local svn 
repository. However, it has been running for more than two hours. The dump file 
seems to contain tens of thousands of revisions.

{code}

<<< Started new transaction, based on original revision 33939

--- Committed new rev 36176 (loaded from original rev 33939) >>>

<<< Started new transaction, based on original revision 33940

--- Committed new rev 36177 (loaded from original rev 33940) >>>

{code}

> Initial code import
> ---
>
> Key: GIRAPH-1
> URL: https://issues.apache.org/jira/browse/GIRAPH-1
> Project: Giraph
>  Issue Type: Task
>Affects Versions: 0.1.0
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 0.1.0
>
>
> I did the initial code import from github.





[jira] [Commented] (GIRAPH-1) Initial code import

2011-08-11 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083911#comment-13083911
 ] 

Hyunsik Choi commented on GIRAPH-1:
---

The dump file looks wrong: it contains other projects instead of Giraph from 
GitHub.

@Avery, can you check the dump file?



> Initial code import
> ---
>
> Key: GIRAPH-1
> URL: https://issues.apache.org/jira/browse/GIRAPH-1
> Project: Giraph
>  Issue Type: Task
>Affects Versions: 0.1.0
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 0.1.0
>
>
> I did the initial code import from github.





[jira] [Commented] (GIRAPH-1) Initial code import

2011-08-11 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13083947#comment-13083947
 ] 

Hyunsik Choi commented on GIRAPH-1:
---

Avery,

OK. However, I think we need a more concise dump, because the numerous empty 
commit logs in the current dump file may be harmful to the Apache Incubator svn 
repository.

> Initial code import
> ---
>
> Key: GIRAPH-1
> URL: https://issues.apache.org/jira/browse/GIRAPH-1
> Project: Giraph
>  Issue Type: Task
>Affects Versions: 0.1.0
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 0.1.0
>
>
> I did the initial code import from github.





[jira] [Commented] (GIRAPH-1) Initial code import

2011-08-16 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-1?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085634#comment-13085634
 ] 

Hyunsik Choi commented on GIRAPH-1:
---

I just filed an issue asking the ASF Infra team to load the svn dump.
See INFRA-3855.

> Initial code import
> ---
>
> Key: GIRAPH-1
> URL: https://issues.apache.org/jira/browse/GIRAPH-1
> Project: Giraph
>  Issue Type: Task
>Affects Versions: 0.1.0
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 0.1.0
>
>
> I did the initial code import from github.





[jira] [Created] (GIRAPH-2) make the project homepage

2011-08-23 Thread Hyunsik Choi (JIRA)
make the project homepage
-

 Key: GIRAPH-2
 URL: https://issues.apache.org/jira/browse/GIRAPH-2
 Project: Giraph
  Issue Type: Task
Reporter: Hyunsik Choi


We need to make the project homepage at http://incubator.apache.org/giraph/.





[jira] [Commented] (GIRAPH-2) make the project homepage

2011-08-23 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089526#comment-13089526
 ] 

Hyunsik Choi commented on GIRAPH-2:
---

I'm sorry for replying to giraph-dev.

As you know, committers and contributors usually use the Confluence wiki to 
document technical issues and other project material. The project homepage 
usually contains static content such as an introduction to the project, how to 
get the source code, and so on.

We can follow this convention. One idea is to first build a simple project 
homepage with maven-site-plugin and then improve it as the project grows.

> make the project homepage
> -
>
> Key: GIRAPH-2
> URL: https://issues.apache.org/jira/browse/GIRAPH-2
> Project: Giraph
>  Issue Type: Task
>Reporter: Hyunsik Choi
>
> We need to make the project homepage at http://incubator.apache.org/giraph/.





[jira] [Commented] (GIRAPH-2) make the project homepage

2011-08-25 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091467#comment-13091467
 ] 

Hyunsik Choi commented on GIRAPH-2:
---

Looks good!

http://incubator.apache.org/guides/branding.html#disclaimers

According to the page above, an Incubator project site should contain 
*disclaimers* and the *incubator logo*. We should take that into account.

The "continuous integration", "dependency", and "plugin management" pages may be 
unnecessary because they are meaningless to most users and contributors.

What do you think?


> make the project homepage
> -
>
> Key: GIRAPH-2
> URL: https://issues.apache.org/jira/browse/GIRAPH-2
> Project: Giraph
>  Issue Type: Task
>Reporter: Hyunsik Choi
>Assignee: Jakob Homan
> Attachments: GIRAPH-2.patch
>
>
> We need to make the project homepage at http://incubator.apache.org/giraph/.





[jira] [Commented] (GIRAPH-3) Vertex:sentMsgToAllEdges should be sendMsg

2011-08-25 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-3?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091475#comment-13091475
 ] 

Hyunsik Choi commented on GIRAPH-3:
---

I agree that we should avoid abbreviations in method names. It would be good to 
address the naming convention in a separate issue.

I vote +1 for the patch.

> Vertex:sentMsgToAllEdges should be sendMsg
> --
>
> Key: GIRAPH-3
> URL: https://issues.apache.org/jira/browse/GIRAPH-3
> Project: Giraph
>  Issue Type: Bug
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Attachments: GIRAPH-3.patch
>
>
> The method Vertex.java:sentMsgToAllEdges() should be sendMsgToAllEdges()





[jira] [Commented] (GIRAPH-2) make the project homepage

2011-08-26 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092172#comment-13092172
 ] 

Hyunsik Choi commented on GIRAPH-2:
---

Jakob,

Great! I agree with your reasoning.
I also vote +1 for mvn3.


Avery,

It would be good to go back to 0.1 since Giraph has not been released yet.

> make the project homepage
> -
>
> Key: GIRAPH-2
> URL: https://issues.apache.org/jira/browse/GIRAPH-2
> Project: Giraph
>  Issue Type: Task
>Reporter: Hyunsik Choi
>Assignee: Jakob Homan
> Attachments: GIRAPH-2.patch, GIRAPH-2b.patch
>
>
> We need to make the project homepage at http://incubator.apache.org/giraph/.





[jira] [Created] (GIRAPH-9) Change Yahoo License Header to Apache License Header

2011-08-28 Thread Hyunsik Choi (JIRA)
Change Yahoo License Header to Apache License Header


 Key: GIRAPH-9
 URL: https://issues.apache.org/jira/browse/GIRAPH-9
 Project: Giraph
  Issue Type: Task
Reporter: Hyunsik Choi
Assignee: Hyunsik Choi
 Fix For: 0.1.0


All source files contain the Yahoo license header, as follows:
{noformat}
Licensed to Yahoo! under one or more contributor license agreements. 
...
{noformat}

These license headers should read as follows:
{noformat}
Licensed to the Apache Software Foundation (ASF) under one 
or more contributor license agreements.
...
{noformat}
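Headers like these can also be audited mechanically; tools such as Apache RAT exist for that purpose. As a minimal, hypothetical illustration, the sketch below walks a source tree and reports .java files whose first 20 lines do not contain the ASF header marker. The marker strings are taken from the header excerpts above; the 20-line window and the class name are assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical helper listing source files that lack the ASF license header. */
public class LicenseHeaderCheck {
  enum Header { YAHOO, APACHE, NONE }

  static final String YAHOO_MARKER = "Licensed to Yahoo! under one or more";
  static final String ASF_MARKER =
      "Licensed to the Apache Software Foundation (ASF) under one";
  /** Only the leading comment block is inspected (assumed 20 lines). */
  static final int HEADER_LINES = 20;

  /** Classifies a file by the first license marker found near its top. */
  static Header classify(List<String> lines) {
    int limit = Math.min(lines.size(), HEADER_LINES);
    for (int i = 0; i < limit; i++) {
      String line = lines.get(i);
      if (line.contains(YAHOO_MARKER)) {
        return Header.YAHOO;
      }
      if (line.contains(ASF_MARKER)) {
        return Header.APACHE;
      }
    }
    return Header.NONE;
  }

  public static void main(String[] args) throws IOException {
    Path root = Paths.get(args.length > 0 ? args[0] : ".");
    try (Stream<Path> paths = Files.walk(root)) {
      paths.filter(p -> p.toString().endsWith(".java")).forEach(p -> {
        try {
          Header header = classify(Files.readAllLines(p, StandardCharsets.UTF_8));
          if (header != Header.APACHE) {
            System.out.println(header + "\t" + p);  // files still needing a fix
          }
        } catch (IOException e) {
          System.err.println("cannot read " + p + ": " + e.getMessage());
        }
      });
    }
  }
}
```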





[jira] [Updated] (GIRAPH-9) Change Yahoo License Header to Apache License Header

2011-08-28 Thread Hyunsik Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-9?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyunsik Choi updated GIRAPH-9:
--

Attachment: GIRAPH-9.patch

I've attached the patch. It replaces the Yahoo license header with the Apache 
license header in all source files.

> Change Yahoo License Header to Apache License Header
> 
>
> Key: GIRAPH-9
> URL: https://issues.apache.org/jira/browse/GIRAPH-9
> Project: Giraph
>  Issue Type: Task
>Reporter: Hyunsik Choi
>Assignee: Hyunsik Choi
> Fix For: 0.1.0
>
> Attachments: GIRAPH-9.patch
>
>
> All source files contain the Yahoo license header, as follows:
> {noformat}
> Licensed to Yahoo! under one or more contributor license agreements. 
> ...
> {noformat}
> These license headers should read as follows:
> {noformat}
> Licensed to the Apache Software Foundation (ASF) under one 
> or more contributor license agreements.
> ...
> {noformat}





[jira] [Commented] (GIRAPH-9) Change Yahoo License Header to Apache License Header

2011-08-28 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-9?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092619#comment-13092619
 ] 

Hyunsik Choi commented on GIRAPH-9:
---

Unfortunately, I didn't know such a tool existed.
I changed them by hand :)

> Change Yahoo License Header to Apache License Header
> 
>
> Key: GIRAPH-9
> URL: https://issues.apache.org/jira/browse/GIRAPH-9
> Project: Giraph
>  Issue Type: Task
>Reporter: Hyunsik Choi
>Assignee: Hyunsik Choi
> Fix For: 0.1.0
>
> Attachments: GIRAPH-9.patch
>
>
> All source files contain the Yahoo license header, as follows:
> {noformat}
> Licensed to Yahoo! under one or more contributor license agreements. 
> ...
> {noformat}
> These license header should be as follows
> {noformat}
> Licensed to the Apache Software Foundation (ASF) under one 
> or more contributor license agreements.
> ...
> {noformat}





[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-08-28 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092622#comment-13092622
 ] 

Hyunsik Choi commented on GIRAPH-12:


Netty seems to be a good solution. Apache Avro now provides a Netty-based 
server.
If we use Avro as the RPC mechanism among workers, we could solve this problem 
easily.

> Investigate communication improvements
> --
>
> Key: GIRAPH-12
> URL: https://issues.apache.org/jira/browse/GIRAPH-12
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Avery Ching
>Priority: Minor
>
> Currently every worker starts up a thread to communicate with every other 
> worker. Hadoop RPC is used for communication. For instance, if there are 
> 400 workers, each worker will create 400 threads. This ends up using a lot 
> of memory, even with the option 
> -Dmapred.child.java.opts="-Xss64k".
> It would be good to investigate using frameworks like Netty, or rolling 
> our own, to improve this situation. By moving away from Hadoop RPC, we would 
> also make it easier to support different Hadoop versions.
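The memory cost follows from one thread per peer: at -Xss64k, 400 threads account for 400 × 64 KiB = 25 MiB of stack per worker, before any thread or connection bookkeeping. Netty avoids this with non-blocking I/O on a few event-loop threads. The sketch below illustrates only the narrower idea of decoupling thread count from cluster size, using a fixed-size java.util.concurrent pool to flush to 400 peers. It is not Netty, and PooledPeerSender and flushTo are hypothetical names, with flushTo standing in for sending one batch to one peer.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sender: a fixed pool flushes to every peer (single use). */
public class PooledPeerSender {
  private final ExecutorService pool;
  final AtomicInteger flushed = new AtomicInteger();
  final Set<String> threadNames = ConcurrentHashMap.newKeySet();

  PooledPeerSender(int poolSize) {
    pool = Executors.newFixedThreadPool(poolSize);
  }

  /** Stand-in for sending one batch of messages to one peer worker. */
  private void flushTo(int peerId) {
    // A real sender would look up the connection for peerId here.
    threadNames.add(Thread.currentThread().getName());
    flushed.incrementAndGet();
  }

  /** Submits one flush task per peer, then waits for all of them. */
  void flushAll(int numPeers) throws InterruptedException {
    for (int peer = 0; peer < numPeers; peer++) {
      final int peerId = peer;
      pool.execute(() -> flushTo(peerId));
    }
    pool.shutdown();  // accept no new tasks; submitted ones still run
    if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
      throw new IllegalStateException("flushes did not finish in time");
    }
  }

  public static void main(String[] args) throws InterruptedException {
    PooledPeerSender sender = new PooledPeerSender(8);
    sender.flushAll(400);
    System.out.println(sender.flushed.get() + " flushes using "
        + sender.threadNames.size() + " pool threads");
  }
}
```

A pool still ties up a thread for the duration of each synchronous RPC, so with Hadoop RPC it bounds memory at the cost of throughput; that trade-off is one reason an asynchronous framework such as Netty is attractive here.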





[jira] [Assigned] (GIRAPH-12) Investigate communication improvements

2011-08-29 Thread Hyunsik Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyunsik Choi reassigned GIRAPH-12:
--

Assignee: Hyunsik Choi

> Investigate communication improvements
> --
>
> Key: GIRAPH-12
> URL: https://issues.apache.org/jira/browse/GIRAPH-12
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Avery Ching
>Assignee: Hyunsik Choi
>Priority: Minor
>
> Currently every worker starts up a thread to communicate with every other 
> worker. Hadoop RPC is used for communication. For instance, if there are 
> 400 workers, each worker will create 400 threads. This ends up using a lot 
> of memory, even with the option 
> -Dmapred.child.java.opts="-Xss64k".
> It would be good to investigate using frameworks like Netty, or rolling 
> our own, to improve this situation. By moving away from Hadoop RPC, we would 
> also make it easier to support different Hadoop versions.





[jira] [Commented] (GIRAPH-13) Port Giraph to YARN

2011-08-29 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093321#comment-13093321
 ] 

Hyunsik Choi commented on GIRAPH-13:


I totally agree with you.

> Port Giraph to YARN
> ---
>
> Key: GIRAPH-13
> URL: https://issues.apache.org/jira/browse/GIRAPH-13
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Jakob Homan
>
> Now that YARN (aka MR2 aka MAPREDUCE-279) has been merged into the Hadoop 
> trunk, we should think about what it would take to separate out the graph 
> processing bits of Giraph from the MR1-specific code so as to take advantage 
> of the less MR-centric aspects of YARN, while still supporting both over the 
> medium term.





[jira] [Commented] (GIRAPH-14) Support for the Facebook Hadoop branch

2011-08-29 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093345#comment-13093345
 ] 

Hyunsik Choi commented on GIRAPH-14:


The link below covers Hudson:
http://wiki.apache.org/general/Hudson

I'll create a separate issue for it.

> Support for the Facebook Hadoop branch
> --
>
> Key: GIRAPH-14
> URL: https://issues.apache.org/jira/browse/GIRAPH-14
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Avery Ching
>Assignee: Avery Ching
>
> I've been working with Joe Xie on support to get Giraph running on the 
> Facebook Hadoop branch.  He verified today that the examples worked on their 
> cluster.  I need to clean up my changes a little, but otherwise, will submit 
> a cleaned up diff.  As a side note, does anyone know how we can get Hudson 
> support for Giraph?





[jira] [Created] (GIRAPH-15) Use of Jenkins for tests and builds

2011-08-29 Thread Hyunsik Choi (JIRA)
Use of Jenkins for tests and builds
---

 Key: GIRAPH-15
 URL: https://issues.apache.org/jira/browse/GIRAPH-15
 Project: Giraph
  Issue Type: Task
Reporter: Hyunsik Choi


We can use the Jenkins server (https://builds.apache.org/) for regular builds and 
tests. Using Jenkins requires a few setup steps.

Here is the FAQ about using Jenkins:
http://wiki.apache.org/general/Hudson








[jira] [Assigned] (GIRAPH-15) Use of Jenkins for tests and builds

2011-08-29 Thread Hyunsik Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-15?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyunsik Choi reassigned GIRAPH-15:
--

Assignee: Hyunsik Choi

> Use of Jenkins for tests and builds
> ---
>
> Key: GIRAPH-15
> URL: https://issues.apache.org/jira/browse/GIRAPH-15
> Project: Giraph
>  Issue Type: Task
>Reporter: Hyunsik Choi
>Assignee: Hyunsik Choi
>
> We can use the Jenkins server (https://builds.apache.org/) for regular builds and 
> tests. Using Jenkins requires a few setup steps.
> Here is the FAQ about using Jenkins:
> http://wiki.apache.org/general/Hudson





[jira] [Commented] (GIRAPH-15) Use of Jenkins for tests and builds

2011-08-29 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13093437#comment-13093437
 ] 

Hyunsik Choi commented on GIRAPH-15:


According to the FAQ (http://wiki.apache.org/general/Hudson), the PMC has to add 
committers to the hudson-jobadmin group as follows:
{noformat}
modify_appgroups.pl hudson-jobadmin --add=
{noformat}

Should we ask our mentors to run the above command?
Does anyone know?

> Use of Jenkins for tests and builds
> ---
>
> Key: GIRAPH-15
> URL: https://issues.apache.org/jira/browse/GIRAPH-15
> Project: Giraph
>  Issue Type: Task
>Reporter: Hyunsik Choi
>Assignee: Hyunsik Choi
>
> We can use the Jenkins server (https://builds.apache.org/) for regular builds and 
> tests. Using Jenkins requires a few setup steps.
> Here is the FAQ about using Jenkins:
> http://wiki.apache.org/general/Hudson





[jira] [Commented] (GIRAPH-15) Use of Jenkins for tests and builds

2011-08-31 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094584#comment-13094584
 ] 

Hyunsik Choi commented on GIRAPH-15:


Thanks for letting me know :)
I'll ask Chris about that.

> Use of Jenkins for tests and builds
> ---
>
> Key: GIRAPH-15
> URL: https://issues.apache.org/jira/browse/GIRAPH-15
> Project: Giraph
>  Issue Type: Task
>Reporter: Hyunsik Choi
>Assignee: Hyunsik Choi
>
> We can use the Jenkins server (https://builds.apache.org/) for regular builds and 
> tests. Using Jenkins requires a few setup steps.
> Here is the FAQ about using Jenkins:
> http://wiki.apache.org/general/Hudson





[jira] [Commented] (GIRAPH-19) Create a CHANGES.txt file

2011-08-31 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-19?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094732#comment-13094732
 ] 

Hyunsik Choi commented on GIRAPH-19:


agreed +1

> Create a CHANGES.txt file
> -
>
> Key: GIRAPH-19
> URL: https://issues.apache.org/jira/browse/GIRAPH-19
> Project: Giraph
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
>
> It is helpful to have a file that is updated with each change along with who 
> contributed and committed the patch.





[jira] [Commented] (GIRAPH-21) Revise CODE_CONVENTIONS

2011-08-31 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-21?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094757#comment-13094757
 ] 

Hyunsik Choi commented on GIRAPH-21:


I prefer 80 characters per line and 2-space indentation.

> Revise CODE_CONVENTIONS
> ---
>
> Key: GIRAPH-21
> URL: https://issues.apache.org/jira/browse/GIRAPH-21
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Avery Ching
>Priority: Minor
>
> Currently there is a CODE_CONVENTIONS file in the base path of Giraph.  It's 
> fairly sparse and we have been assuming an 80 char limit per line.  It's good 
> to have common conventions so that the code doesn't get too messy.  Does 
> anyone have any opinions on this now?  Probably best to tackle early and then 
> have something to follow.
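Conventions like the ones proposed in this thread are easier to keep if they are checked mechanically; Checkstyle's LineLength check is one established option. As a minimal, hypothetical sketch, the class below flags lines longer than 80 characters and lines indented with a tab, on the assumption that a 2-space indent implies spaces only. ConventionCheck and its method names are illustrative, not an existing Giraph tool.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical checker for the 80-column and space-indentation conventions. */
public class ConventionCheck {
  static final int MAX_COLUMNS = 80;

  /** Returns 1-based numbers of lines longer than MAX_COLUMNS characters. */
  static List<Integer> overlongLines(List<String> lines) {
    List<Integer> result = new ArrayList<Integer>();
    for (int i = 0; i < lines.size(); i++) {
      if (lines.get(i).length() > MAX_COLUMNS) {
        result.add(i + 1);
      }
    }
    return result;
  }

  /** Returns 1-based numbers of lines indented with a tab instead of spaces. */
  static List<Integer> tabIndentedLines(List<String> lines) {
    List<Integer> result = new ArrayList<Integer>();
    for (int i = 0; i < lines.size(); i++) {
      if (lines.get(i).startsWith("\t")) {
        result.add(i + 1);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    StringBuilder longLine = new StringBuilder();
    for (int i = 0; i < MAX_COLUMNS + 1; i++) {
      longLine.append('x');  // build an 81-character line
    }
    List<String> sample = Arrays.asList(
        "class A {", "  int ok;", "\tint tabbed;", longLine.toString(), "}");
    System.out.println("over 80: " + overlongLines(sample));  // over 80: [4]
    System.out.println("tabs: " + tabIndentedLines(sample));  // tabs: [3]
  }
}
```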





[jira] [Commented] (GIRAPH-24) Job-level statistics reports one superstep greater than workers

2011-08-31 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-24?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095033#comment-13095033
 ] 

Hyunsik Choi commented on GIRAPH-24:


+1, nice work.

On a separate note, we need to create JIRA components for each part of Giraph.

> Job-level statistics reports one superstep greater than workers
> ---
>
> Key: GIRAPH-24
> URL: https://issues.apache.org/jira/browse/GIRAPH-24
> Project: Giraph
>  Issue Type: Bug
>Reporter: Jakob Homan
>Assignee: Jakob Homan
> Attachments: GIRAPH-24.patch
>
>
> In {{BspServiceMaster::coordinateSuperstep()}} the {{superStepCounter}} is 
> incremented when the coordination begins, but since the counter starts at 
> zero, this has the job level statistic being at superstep {{n+1}} when the 
> workers are reporting that they are working on {{n}}.  This discrepancy 
> persists throughout the job.
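The off-by-one can be reproduced with a toy counter. In the hypothetical model below (not the actual BspServiceMaster code, and not necessarily how the GIRAPH-24 patch fixes it), coordinateBuggy increments before reporting, so the master shows n + 1 while workers run superstep n; coordinateFixed reports the current superstep first and advances afterwards.

```java
/** Hypothetical model of the GIRAPH-24 off-by-one; not BspServiceMaster itself. */
public class SuperstepCounterSketch {
  private long superStepCounter = 0;  // starts at zero, as in the issue

  /** Buggy order: bump first, so the job reports n + 1 while workers run n. */
  long coordinateBuggy() {
    superStepCounter++;
    return superStepCounter;  // value a job-level statistic would show
  }

  /** Fixed order: report the superstep being coordinated, then advance. */
  long coordinateFixed() {
    long reported = superStepCounter;
    superStepCounter++;
    return reported;
  }

  public static void main(String[] args) {
    SuperstepCounterSketch buggy = new SuperstepCounterSketch();
    SuperstepCounterSketch fixed = new SuperstepCounterSketch();
    for (long workerStep = 0; workerStep < 3; workerStep++) {
      System.out.println("workers=" + workerStep
          + " buggy=" + buggy.coordinateBuggy()
          + " fixed=" + fixed.coordinateFixed());
    }
  }
}
```

Running it prints `workers=0 buggy=1 fixed=0`, then `workers=1 buggy=2 fixed=1` and `workers=2 buggy=3 fixed=2`, which is the persistent one-step discrepancy described in the issue.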





[jira] [Commented] (GIRAPH-15) Use of Jenkins for tests and builds

2011-08-31 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095069#comment-13095069
 ] 

Hyunsik Choi commented on GIRAPH-15:


FYI, here is an update on this issue.

Some JIRA issues that discussed the hudson-jobadmin group say that 
bui...@apache.org maintains the group. 
I sent an email to bui...@apache.org and am waiting for a response.

> Use of Jenkins for tests and builds
> ---
>
> Key: GIRAPH-15
> URL: https://issues.apache.org/jira/browse/GIRAPH-15
> Project: Giraph
>  Issue Type: Task
>Reporter: Hyunsik Choi
>Assignee: Hyunsik Choi
>
> We can use the Jenkins server (https://builds.apache.org/) for regular builds and 
> tests. Using Jenkins requires a few setup steps.
> Here is the FAQ about using Jenkins:
> http://wiki.apache.org/general/Hudson





[jira] [Commented] (GIRAPH-15) Use of Jenkins for tests and builds

2011-09-01 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095768#comment-13095768
 ] 

Hyunsik Choi commented on GIRAPH-15:


Chris added me to the hudson-jobadmin group. 
I created INFRA-3900 to request a Hudson account.

> Use of Jenkins for tests and builds
> ---
>
> Key: GIRAPH-15
> URL: https://issues.apache.org/jira/browse/GIRAPH-15
> Project: Giraph
>  Issue Type: Task
>Reporter: Hyunsik Choi
>Assignee: Hyunsik Choi
>
> We can use the Jenkins server (https://builds.apache.org/) for regular builds and 
> tests. Using Jenkins requires a few setup steps.
> Here is the FAQ about using Jenkins:
> http://wiki.apache.org/general/Hudson





[jira] [Commented] (GIRAPH-15) Use of Jenkins for tests and builds

2011-09-02 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095890#comment-13095890
 ] 

Hyunsik Choi commented on GIRAPH-15:


I created a Jenkins job that builds and tests each commit:
https://builds.apache.org/job/Giraph-trunk-Commit/

This Jenkins job notifies us by updating the relevant JIRA issues.

If we need another build job, let me know. For example, some projects 
(e.g., Hadoop) use a pre-commit build 
(http://wiki.apache.org/general/PreCommitBuilds) that compiles and tests 
patches submitted by users. If we follow the review-then-commit process, this 
kind of job would be useful.

> Use of Jenkins for tests and builds
> ---
>
> Key: GIRAPH-15
> URL: https://issues.apache.org/jira/browse/GIRAPH-15
> Project: Giraph
>  Issue Type: Task
>Reporter: Hyunsik Choi
>Assignee: Hyunsik Choi
>
> We can use the Jenkins server (https://builds.apache.org/) for regular builds and 
> tests. Using Jenkins requires a few setup steps.
> Here is the FAQ about using Jenkins:
> http://wiki.apache.org/general/Hudson





[jira] [Resolved] (GIRAPH-15) Use of Jenkins for tests and builds

2011-09-06 Thread Hyunsik Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-15?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyunsik Choi resolved GIRAPH-15.


Resolution: Fixed

> Use of Jenkins for tests and builds
> ---
>
> Key: GIRAPH-15
> URL: https://issues.apache.org/jira/browse/GIRAPH-15
> Project: Giraph
>  Issue Type: Task
>Reporter: Hyunsik Choi
>Assignee: Hyunsik Choi
>
> We can use the Jenkins server (https://builds.apache.org/) for regular builds and 
> tests. Using Jenkins requires a few setup steps.
> Here is the FAQ about using Jenkins:
> http://wiki.apache.org/general/Hudson





[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-06 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098551#comment-13098551
 ] 

Hyunsik Choi commented on GIRAPH-12:


Jake,
Thank you for the recommendation :)

Avery,
Thank you for letting me know.


Here is an update on my progress on this issue.

Recently, I implemented and tested a lightweight RPC layer based on Netty and 
Protocol Buffers that resembles YarnRPC. It appears that an alternative RPC can 
give a performance gain.

Finagle is much more mature than my implementation, so it may be a better 
solution. I'll test both mine and Finagle, and I'll post the results as soon as 
the tests are complete.

> Investigate communication improvements
> --
>
> Key: GIRAPH-12
> URL: https://issues.apache.org/jira/browse/GIRAPH-12
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Avery Ching
>Assignee: Hyunsik Choi
>Priority: Minor
>
> Currently every worker starts up a thread to communicate with every other 
> worker. Hadoop RPC is used for communication. For instance, if there are 
> 400 workers, each worker will create 400 threads. This ends up using a lot 
> of memory, even with the option 
> -Dmapred.child.java.opts="-Xss64k".
> It would be good to investigate using frameworks like Netty, or rolling 
> our own, to improve this situation. By moving away from Hadoop RPC, we would 
> also make it easier to support different Hadoop versions.





[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-06 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098587#comment-13098587
 ] 

Hyunsik Choi commented on GIRAPH-12:


Jake,

Thank you for your help :)
While I'm trying Finagle, I'll ask you if I have any questions.
I'll upload a git branch with my test code soon :)


> Investigate communication improvements
> --
>
> Key: GIRAPH-12
> URL: https://issues.apache.org/jira/browse/GIRAPH-12
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Avery Ching
>Assignee: Hyunsik Choi
>Priority: Minor
>
> Currently every worker will start up a thread to communicate with every other 
> worker.  Hadoop RPC is used for communication.  For instance, if there are 
> 400 workers, each worker will create 400 threads.  This ends up using a lot 
> of memory, even with the option  
> -Dmapred.child.java.opts="-Xss64k".  
> It would be good to investigate using frameworks like Netty or rolling 
> our own to improve this situation.  By moving away from Hadoop RPC, we would 
> also make compatibility with different Hadoop versions easier.





[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored

2011-09-08 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13100811#comment-13100811
 ] 

Hyunsik Choi commented on GIRAPH-27:


Looks great!

I agree with the import reordering.
All unit tests pass.

+1




> Mutable static global state in Vertex.java should be refactored
> ---
>
> Key: GIRAPH-27
> URL: https://issues.apache.org/jira/browse/GIRAPH-27
> Project: Giraph
>  Issue Type: Improvement
>  Components: graph
>Affects Versions: 0.70.0
>Reporter: Jake Mannix
>Assignee: Jake Mannix
> Attachments: GIRAPH-27.patch, GIRAPH-27.patch
>
>
> Vertex.java has a bunch of static methods for getting/setting global graph 
> state (total number of vertices, edges, a reference to the GraphMapper, etc.). 
> This should be refactored into a GraphState object that every Vertex can hold 
> a reference to (yes, a tiny bit more memory per Vertex, but small in 
> comparison to what's already in there...).





[jira] [Created] (GIRAPH-29) Implement TextVertexInputFormat for text-format graph data

2011-09-10 Thread Hyunsik Choi (JIRA)
Implement TextVertexInputFormat for text-format graph data
--

 Key: GIRAPH-29
 URL: https://issues.apache.org/jira/browse/GIRAPH-29
 Project: Giraph
  Issue Type: New Feature
  Components: bsp
Reporter: Hyunsik Choi
Assignee: Hyunsik Choi
Priority: Minor
 Fix For: 0.70.0


Supporting text-format graph data would be nice. It is helpful for developing 
graph algorithms and debugging because text-format graph data are 
human-readable and enable users to easily write sample data sets. Furthermore, 
text-format data are exchangeable regardless of operating systems or 
programming languages.

So, we need a basic InputFormat to help users develop user-defined InputFormat 
classes for handling text-represented graph data sets.





[jira] [Commented] (GIRAPH-29) Implement TextVertexInputFormat for text-format graph data

2011-09-10 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-29?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102111#comment-13102111
 ] 

Hyunsik Choi commented on GIRAPH-29:


I'm sorry for my big mistake. I overlooked the org.apache.giraph.lib package.

I have a question. When a program uses TextVertexInputFormat, is the number of 
active workers determined by the number of blocks? How does Giraph behave when 
there are more blocks than numWorkers? Should the user set numWorkers by 
considering both the size of the input data and the number of blocks?



> Implement TextVertexInputFormat for text-format graph data
> --
>
> Key: GIRAPH-29
> URL: https://issues.apache.org/jira/browse/GIRAPH-29
> Project: Giraph
>  Issue Type: New Feature
>  Components: bsp
>Reporter: Hyunsik Choi
>Assignee: Hyunsik Choi
>Priority: Minor
> Fix For: 0.70.0
>
>
> Supporting text-format graph data would be nice. It is helpful for developing 
> graph algorithms and debugging because text-format graph data are 
> human-readable and enable users to easily write sample data sets. 
> Furthermore, text-format data are exchangeable regardless of operating 
> systems or programming languages.
> So, we need a basic InputFormat to help users develop user-defined 
> InputFormat classes for handling text-represented graph data sets.





[jira] [Resolved] (GIRAPH-29) Implement TextVertexInputFormat for text-format graph data

2011-09-10 Thread Hyunsik Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-29?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyunsik Choi resolved GIRAPH-29.


Resolution: Won't Fix

> Implement TextVertexInputFormat for text-format graph data
> --
>
> Key: GIRAPH-29
> URL: https://issues.apache.org/jira/browse/GIRAPH-29
> Project: Giraph
>  Issue Type: New Feature
>  Components: bsp
>Reporter: Hyunsik Choi
>Assignee: Hyunsik Choi
>Priority: Minor
> Fix For: 0.70.0
>
>
> Supporting text-format graph data would be nice. It is helpful for developing 
> graph algorithms and debugging because text-format graph data are 
> human-readable and enable users to easily write sample data sets. 
> Furthermore, text-format data are exchangeable regardless of operating 
> systems or programming languages.
> So, we need a basic InputFormat to help users develop user-defined 
> InputFormat classes for handling text-represented graph data sets.





[jira] [Commented] (GIRAPH-29) Implement TextVertexInputFormat for text-format graph data

2011-09-10 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-29?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102177#comment-13102177
 ] 

Hyunsik Choi commented on GIRAPH-29:


Thank you for your kind reply.

> Implement TextVertexInputFormat for text-format graph data
> --
>
> Key: GIRAPH-29
> URL: https://issues.apache.org/jira/browse/GIRAPH-29
> Project: Giraph
>  Issue Type: New Feature
>  Components: bsp
>Reporter: Hyunsik Choi
>Assignee: Hyunsik Choi
>Priority: Minor
> Fix For: 0.70.0
>
>
> Supporting text-format graph data would be nice. It is helpful for developing 
> graph algorithms and debugging because text-format graph data are 
> human-readable and enable users to easily write sample data sets. 
> Furthermore, text-format data are exchangeable regardless of operating 
> systems or programming languages.
> So, we need a basic InputFormat to help users develop user-defined 
> InputFormat classes for handling text-represented graph data sets.





[jira] [Updated] (GIRAPH-12) Investigate communication improvements

2011-09-11 Thread Hyunsik Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyunsik Choi updated GIRAPH-12:
---

Attachment: GIRAPH-12_1.patch

As Avery mentioned, in the current architecture each worker requires N threads 
to communicate with N remote peers. This can incur severe context-switching 
overhead (especially when all messages are flushed) and higher memory 
consumption. At first, I considered replacing the RPC system with another one. 
However, that is not a simple task, and I need more time.

Instead, I have considered an alternative: using a ThreadPoolExecutor to bound 
the number of active threads. When Giraph processes large graphs, its 
performance is usually bounded by network bandwidth rather than by the number 
of sending threads, so I think this approach would be effective. In addition, 
I tried to shrink the synchronized region in BasicRPCCommunicator (lines 
374-394), where large buffered messages are sent to specific peers.

I have attached the work-in-progress patch. I cannot access a real Hadoop 
cluster for one week, so I haven't tested this on a real cluster yet. However, 
all unit tests pass.

What do you think of this approach? Could you review it?

> Investigate communication improvements
> --
>
> Key: GIRAPH-12
> URL: https://issues.apache.org/jira/browse/GIRAPH-12
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Avery Ching
>Assignee: Hyunsik Choi
>Priority: Minor
> Attachments: GIRAPH-12_1.patch
>
>
> Currently every worker will start up a thread to communicate with every other 
> worker.  Hadoop RPC is used for communication.  For instance, if there are 
> 400 workers, each worker will create 400 threads.  This ends up using a lot 
> of memory, even with the option  
> -Dmapred.child.java.opts="-Xss64k".  
> It would be good to investigate using frameworks like Netty or rolling 
> our own to improve this situation.  By moving away from Hadoop RPC, we would 
> also make compatibility with different Hadoop versions easier.





[jira] [Updated] (GIRAPH-12) Investigate communication improvements

2011-09-11 Thread Hyunsik Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyunsik Choi updated GIRAPH-12:
---

Component/s: bsp

> Investigate communication improvements
> --
>
> Key: GIRAPH-12
> URL: https://issues.apache.org/jira/browse/GIRAPH-12
> Project: Giraph
>  Issue Type: Improvement
>  Components: bsp
>Reporter: Avery Ching
>Assignee: Hyunsik Choi
>Priority: Minor
> Attachments: GIRAPH-12_1.patch
>
>
> Currently every worker will start up a thread to communicate with every other 
> worker.  Hadoop RPC is used for communication.  For instance, if there are 
> 400 workers, each worker will create 400 threads.  This ends up using a lot 
> of memory, even with the option  
> -Dmapred.child.java.opts="-Xss64k".  
> It would be good to investigate using frameworks like Netty or rolling 
> our own to improve this situation.  By moving away from Hadoop RPC, we would 
> also make compatibility with different Hadoop versions easier.





[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-12 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102682#comment-13102682
 ] 

Hyunsik Choi commented on GIRAPH-12:


As with the current PeerThread, each PeerConnection initially gets one 
established RPC proxy. These connections are kept open for the whole run, so 
there is no repeated connection overhead.
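A minimal sketch of "connect once, reuse for the whole run", assuming a hypothetical `connect` function that opens a proxy for a peer address. `PeerConnectionCache` is an illustrative name, not a Giraph class; the only real API relied on is `ConcurrentHashMap.computeIfAbsent`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical cache: one long-lived proxy per peer, created on first use and reused. */
class PeerConnectionCache<P> {
  private final ConcurrentMap<String, P> proxies = new ConcurrentHashMap<>();
  private final Function<String, P> connect;

  PeerConnectionCache(Function<String, P> connect) {
    this.connect = connect;
  }

  /** Returns the cached proxy for a peer, connecting only on the first request. */
  P get(String peerAddress) {
    // ConcurrentHashMap.computeIfAbsent applies the function at most once per key,
    // even when several sender threads ask for the same peer concurrently.
    return proxies.computeIfAbsent(peerAddress, connect);
  }

  int openConnections() {
    return proxies.size();
  }

  public static void main(String[] args) {
    int[] connects = {0};
    PeerConnectionCache<String> cache = new PeerConnectionCache<>(address -> {
      connects[0]++; // counts how often a real connection would be opened
      return "proxy-for-" + address;
    });
    for (int superstep = 0; superstep < 3; superstep++) {
      cache.get("worker-1:30001");
      cache.get("worker-2:30002");
    }
    System.out.println(connects[0] + " connections opened for "
        + cache.openConnections() + " peers");
  }
}
```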

If you could test this code on Yahoo!'s clusters, I would appreciate your help. 
Next week I will have access to my lab's Hadoop cluster, and I'll run some 
tests there as well.

> Investigate communication improvements
> --
>
> Key: GIRAPH-12
> URL: https://issues.apache.org/jira/browse/GIRAPH-12
> Project: Giraph
>  Issue Type: Improvement
>  Components: bsp
>Reporter: Avery Ching
>Assignee: Hyunsik Choi
>Priority: Minor
> Attachments: GIRAPH-12_1.patch
>
>
> Currently every worker will start up a thread to communicate with every other 
> worker.  Hadoop RPC is used for communication.  For instance, if there are 
> 400 workers, each worker will create 400 threads.  This ends up using a lot 
> of memory, even with the option  
> -Dmapred.child.java.opts="-Xss64k".  
> It would be good to investigate using frameworks like Netty or rolling 
> our own to improve this situation.  By moving away from Hadoop RPC, we would 
> also make compatibility with different Hadoop versions easier.





[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-13 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104230#comment-13104230
 ] 

Hyunsik Choi commented on GIRAPH-12:


Sorry for the late response. I was on vacation on September 12-13.

Thank you for testing. As you pointed out, the current patch creates hotspots 
on the receiving side. I will add code to randomize flushes to mitigate the 
skew problem, along with some tweaks to improve performance.
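One way to randomize flushes, sketched under the assumption that the hotspot comes from every sender flushing its per-peer buffers in the same order (peer 0, then peer 1, and so on): each sender shuffles the order with its own seed, so at any moment different senders tend to target different receivers. This is an illustrative technique, not necessarily what the revised patch does; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper: per-sender random flush order to spread load across receivers. */
class RandomizedFlushOrder {
  /** Returns the peer ids 0..numPeers-1 shuffled with a sender-specific seed. */
  static List<Integer> flushOrder(int numPeers, long senderSeed) {
    List<Integer> order = new ArrayList<>();
    for (int peer = 0; peer < numPeers; peer++) {
      order.add(peer);
    }
    Collections.shuffle(order, new Random(senderSeed));
    return order;
  }

  /** Counts how many senders would hit each receiver first: a rough hotspot measure. */
  static int[] firstTargets(int numSenders, int numPeers, boolean randomized) {
    int[] hits = new int[numPeers];
    for (int sender = 0; sender < numSenders; sender++) {
      int first = randomized ? flushOrder(numPeers, sender).get(0) : 0;
      hits[first]++;
    }
    return hits;
  }

  public static void main(String[] args) {
    // Sequential order: every sender starts with peer 0.
    System.out.println(Arrays.toString(firstTargets(6, 6, false)));
    // Randomized order: first targets are typically spread over several peers.
    System.out.println(Arrays.toString(firstTargets(6, 6, true)));
  }
}
```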



> Investigate communication improvements
> --
>
> Key: GIRAPH-12
> URL: https://issues.apache.org/jira/browse/GIRAPH-12
> Project: Giraph
>  Issue Type: Improvement
>  Components: bsp
>Reporter: Avery Ching
>Assignee: Hyunsik Choi
>Priority: Minor
> Attachments: GIRAPH-12_1.patch
>
>
> Currently every worker will start up a thread to communicate with every other 
> worker.  Hadoop RPC is used for communication.  For instance, if there are 
> 400 workers, each worker will create 400 threads.  This ends up using a lot 
> of memory, even with the option  
> -Dmapred.child.java.opts="-Xss64k".  
> It would be good to investigate using frameworks like Netty or rolling 
> our own to improve this situation.  By moving away from Hadoop RPC, we would 
> also make compatibility with different Hadoop versions easier.





[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-14 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104340#comment-13104340
 ] 

Hyunsik Choi commented on GIRAPH-12:


Do you mean that we need a benchmark program to test the performance and 
scalability of message passing? If so, I'll add two benchmark programs, which 
send messages to peers with a random and a skewed distribution, respectively. 
I'll create a separate issue for this.
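A sketch of how such a benchmark could pick destination peers under a uniform or a skewed distribution. The Zipf-style weighting (peer k chosen with probability proportional to 1/(k+1)^s) is an assumption made for illustration; the thread does not say which skewed distribution the benchmarks would use, and `DestinationSampler` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical destination sampler: exponent 0 is uniform, larger exponents are more skewed. */
class DestinationSampler {
  private final double[] cumulative; // cumulative[k] = P(destination <= k)
  private final Random random;

  DestinationSampler(int numPeers, double exponent, long seed) {
    cumulative = new double[numPeers];
    double total = 0;
    for (int k = 0; k < numPeers; k++) {
      total += 1.0 / Math.pow(k + 1, exponent); // Zipf-style weight for peer k
      cumulative[k] = total;
    }
    for (int k = 0; k < numPeers; k++) {
      cumulative[k] /= total;
    }
    random = new Random(seed);
  }

  /** Draws one destination peer by inverting the cumulative distribution. */
  int nextPeer() {
    double u = random.nextDouble();
    int index = Arrays.binarySearch(cumulative, u);
    // A negative result encodes the insertion point: the first k with cumulative[k] > u.
    int peer = index >= 0 ? index : -index - 1;
    return Math.min(peer, cumulative.length - 1); // guard against rounding at the top end
  }

  /** Counts how many of `samples` messages each peer would receive. */
  static int[] histogram(int numPeers, double exponent, int samples, long seed) {
    DestinationSampler sampler = new DestinationSampler(numPeers, exponent, seed);
    int[] counts = new int[numPeers];
    for (int i = 0; i < samples; i++) {
      counts[sampler.nextPeer()]++;
    }
    return counts;
  }

  public static void main(String[] args) {
    System.out.println("uniform: " + Arrays.toString(histogram(6, 0.0, 60_000, 1L)));
    System.out.println("skewed:  " + Arrays.toString(histogram(6, 2.0, 60_000, 1L)));
  }
}
```

With exponent 2.0 and 6 peers, peer 0 should receive roughly two thirds of all messages, which is the kind of receiver-side hotspot a skewed benchmark is meant to provoke.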

Let me know what you think :)

> Investigate communication improvements
> --
>
> Key: GIRAPH-12
> URL: https://issues.apache.org/jira/browse/GIRAPH-12
> Project: Giraph
>  Issue Type: Improvement
>  Components: bsp
>Reporter: Avery Ching
>Assignee: Hyunsik Choi
>Priority: Minor
> Attachments: GIRAPH-12_1.patch
>
>
> Currently every worker will start up a thread to communicate with every other 
> worker.  Hadoop RPC is used for communication.  For instance, if there are 
> 400 workers, each worker will create 400 threads.  This ends up using a lot 
> of memory, even with the option  
> -Dmapred.child.java.opts="-Xss64k".  
> It would be good to investigate using frameworks like Netty or rolling 
> our own to improve this situation.  By moving away from Hadoop RPC, we would 
> also make compatibility with different Hadoop versions easier.





[jira] [Created] (GIRAPH-32) Implement benchmarks to evaluate the performance of message passing

2011-09-14 Thread Hyunsik Choi (JIRA)
Implement benchmarks to evaluate the performance of message passing 


 Key: GIRAPH-32
 URL: https://issues.apache.org/jira/browse/GIRAPH-32
 Project: Giraph
  Issue Type: Task
  Components: benchmark
Reporter: Hyunsik Choi
Assignee: Hyunsik Choi
 Fix For: 0.70.0


The message passing framework plays an important role in Giraph.
We need some benchmark programs to evaluate improvements to the message 
passing method.





[jira] [Created] (GIRAPH-33) Missing license header of GraphState.java

2011-09-14 Thread Hyunsik Choi (JIRA)
Missing license header of GraphState.java
-

 Key: GIRAPH-33
 URL: https://issues.apache.org/jira/browse/GIRAPH-33
 Project: Giraph
  Issue Type: Task
  Components: graph
Reporter: Hyunsik Choi
Priority: Trivial
 Fix For: 0.70.0


GraphState.java doesn't contain the Apache license header.





[jira] [Updated] (GIRAPH-33) Missing license header of GraphState.java

2011-09-14 Thread Hyunsik Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-33?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyunsik Choi updated GIRAPH-33:
---

Attachment: GIRAPH-33.patch

This patch adds the Apache license header.

> Missing license header of GraphState.java
> -
>
> Key: GIRAPH-33
> URL: https://issues.apache.org/jira/browse/GIRAPH-33
> Project: Giraph
>  Issue Type: Task
>  Components: graph
>Reporter: Hyunsik Choi
>Priority: Trivial
> Fix For: 0.70.0
>
> Attachments: GIRAPH-33.patch
>
>
> GraphState.java doesn't contain the Apache license header.





[jira] [Resolved] (GIRAPH-33) Missing license header of GraphState.java

2011-09-14 Thread Hyunsik Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-33?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyunsik Choi resolved GIRAPH-33.


Resolution: Fixed

This is a trivial fix.
I just committed it.

> Missing license header of GraphState.java
> -
>
> Key: GIRAPH-33
> URL: https://issues.apache.org/jira/browse/GIRAPH-33
> Project: Giraph
>  Issue Type: Task
>  Components: graph
>Reporter: Hyunsik Choi
>Priority: Trivial
> Fix For: 0.70.0
>
> Attachments: GIRAPH-33.patch
>
>
> GraphState.java doesn't contain the Apache license header.





[jira] [Commented] (GIRAPH-35) Modifying the site to indicated that Jake Mannix and Dmitriy Ryaboy are now Giraph committers

2011-09-15 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13105795#comment-13105795
 ] 

Hyunsik Choi commented on GIRAPH-35:


+1

Welcome, new committers :)

> Modifying the site to indicated that Jake Mannix and Dmitriy Ryaboy are now 
> Giraph committers
> -
>
> Key: GIRAPH-35
> URL: https://issues.apache.org/jira/browse/GIRAPH-35
> Project: Giraph
>  Issue Type: Task
>Reporter: Avery Ching
>Assignee: Avery Ching
> Attachments: GIRAPH-35.patch
>
>






[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-16 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107014#comment-13107014
 ] 

Hyunsik Choi commented on GIRAPH-12:


First of all, I'm sorry for delaying this work.

Jake,
I welcome your plan! We can compare both approaches and advise each other.
As you said, we can then choose the better one :)


> Investigate communication improvements
> --
>
> Key: GIRAPH-12
> URL: https://issues.apache.org/jira/browse/GIRAPH-12
> Project: Giraph
>  Issue Type: Improvement
>  Components: bsp
>Reporter: Avery Ching
>Assignee: Hyunsik Choi
>Priority: Minor
> Attachments: GIRAPH-12_1.patch
>
>
> Currently every worker will start up a thread to communicate with every other 
> worker.  Hadoop RPC is used for communication.  For instance, if there are 
> 400 workers, each worker will create 400 threads.  This ends up using a lot 
> of memory, even with the option  
> -Dmapred.child.java.opts="-Xss64k".  
> It would be good to investigate using frameworks like Netty or rolling 
> our own to improve this situation.  By moving away from Hadoop RPC, we would 
> also make compatibility with different Hadoop versions easier.





[jira] [Updated] (GIRAPH-32) Implement benchmarks to evaluate the performance of message passing

2011-09-17 Thread Hyunsik Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-32?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyunsik Choi updated GIRAPH-32:
---

Attachment: GIRAPH-32.patch

I have attached a patch for this issue.

The patch includes a benchmark class. In this benchmark, the compute function 
of each vertex sends a dummy message along every edge of the vertex. The intent 
is for the benchmark to send messages to random workers; since 
PseudoRandomVertexInputFormat already generates random edges, I used it.

The benchmark lets users set the message size in bytes and the number of 
messages sent per edge, because I think these are the basic factors for 
evaluating the behavior and performance of a message delivery system. Users can 
also adjust the number of edges per vertex, rather than the number of messages 
sent per edge, which makes the sending pattern either more spread out or more 
skewed.
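The parameters described above determine how much traffic one superstep generates. A back-of-the-envelope sketch, assuming every vertex has the same number of edges and ignoring per-message serialization overhead; the method and parameter names are illustrative, not the benchmark's actual option names.

```java
/** Hypothetical estimate of the traffic one benchmark superstep generates. */
class BenchmarkTraffic {
  /** Messages per superstep: each vertex sends messagesPerEdge messages along every edge. */
  static long messagesPerSuperstep(long vertices, int edgesPerVertex, int messagesPerEdge) {
    return vertices * edgesPerVertex * messagesPerEdge;
  }

  /** Payload bytes per superstep, ignoring headers and serialization overhead. */
  static long payloadBytesPerSuperstep(long vertices, int edgesPerVertex,
                                       int messagesPerEdge, int messageBytes) {
    return messagesPerSuperstep(vertices, edgesPerVertex, messagesPerEdge) * messageBytes;
  }

  public static void main(String[] args) {
    // Example values only: 1,000,000 vertices, 2 edges each, 3 messages per edge, 4 bytes each.
    long messages = messagesPerSuperstep(1_000_000L, 2, 3);
    long bytes = payloadBytesPerSuperstep(1_000_000L, 2, 3, 4);
    // prints "6000000 messages, 24000000 payload bytes per superstep"
    System.out.println(messages + " messages, " + bytes + " payload bytes per superstep");
  }
}
```

For example, 2 edges with 3 messages per edge and 6 edges with 1 message per edge produce the same number of messages per vertex, but the second setting spreads them over three times as many destinations.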

Could anyone review this?

> Implement benchmarks to evaluate the performance of message passing 
> 
>
> Key: GIRAPH-32
> URL: https://issues.apache.org/jira/browse/GIRAPH-32
> Project: Giraph
>  Issue Type: Task
>  Components: benchmark
>Reporter: Hyunsik Choi
>Assignee: Hyunsik Choi
> Fix For: 0.70.0
>
> Attachments: GIRAPH-32.patch
>
>
> The message passing framework plays an important role in Giraph.
> We need some benchmark programs to evaluate improvements to the 
> message passing method.





[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-17 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107061#comment-13107061
 ] 

Hyunsik Choi commented on GIRAPH-12:


(a note for sharing)

Graph mutation functions (e.g., addVertexRequest, addEdgeRequest, ...) directly 
invoke RPC functions. 
This approach incurs RPC round-trip overhead during processing. In particular, 
when many workers try to mutate vertices or edges, synchronization overhead may 
also occur on the receiving side, and this may become severe as the cluster 
grows.

If we change the graph mutation API to use asynchronous messages, it would be 
more efficient. If possible, graph mutation messages and value messages (i.e., 
sendMsg) could be integrated into a single message passing API.
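A minimal sketch of that idea, assuming a hypothetical per-peer outbox in which mutation requests and value messages share one queue and are shipped in batches, so one remote call carries many requests instead of paying a round trip per mutation. `WorkerRequest`, `ValueMessage`, `AddEdgeRequest`, `BatchedOutbox` and the `sendBatch` callback are illustrative, not Giraph's API. To keep the sketch short, `sendBatch` runs on the caller's thread; a real implementation would hand batches to sender threads to make the sends asynchronous.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Marker type for anything a worker sends to a peer (hypothetical). */
interface WorkerRequest {}

/** A value message, playing the role of sendMsg (hypothetical). */
final class ValueMessage implements WorkerRequest {
  final long targetVertex;
  final double value;

  ValueMessage(long targetVertex, double value) {
    this.targetVertex = targetVertex;
    this.value = value;
  }
}

/** A graph mutation, playing the role of addEdgeRequest (hypothetical). */
final class AddEdgeRequest implements WorkerRequest {
  final long sourceVertex;
  final long targetVertex;

  AddEdgeRequest(long sourceVertex, long targetVertex) {
    this.sourceVertex = sourceVertex;
    this.targetVertex = targetVertex;
  }
}

/** Buffers both kinds of requests per peer and sends them in batches. */
class BatchedOutbox {
  private final Map<Integer, List<WorkerRequest>> pending = new HashMap<>();
  private final int batchSize;
  private final BiConsumer<Integer, List<WorkerRequest>> sendBatch; // one remote call per batch
  private int remoteCalls = 0;

  BatchedOutbox(int batchSize, BiConsumer<Integer, List<WorkerRequest>> sendBatch) {
    this.batchSize = batchSize;
    this.sendBatch = sendBatch;
  }

  /** Queues a request; a remote call happens only when the peer's batch is full. */
  void enqueue(int peer, WorkerRequest request) {
    List<WorkerRequest> batch = pending.computeIfAbsent(peer, p -> new ArrayList<>());
    batch.add(request);
    if (batch.size() >= batchSize) {
      flush(peer);
    }
  }

  /** Sends whatever is buffered for one peer. */
  void flush(int peer) {
    List<WorkerRequest> batch = pending.remove(peer);
    if (batch != null && !batch.isEmpty()) {
      sendBatch.accept(peer, batch);
      remoteCalls++;
    }
  }

  /** End-of-superstep flush of every peer's remaining requests. */
  void flushAll() {
    for (Integer peer : new ArrayList<>(pending.keySet())) {
      flush(peer);
    }
  }

  int remoteCalls() {
    return remoteCalls;
  }
}
```

Ten mixed requests to one peer with a batch size of 4 cost 3 remote calls instead of 10.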

> Investigate communication improvements
> --
>
> Key: GIRAPH-12
> URL: https://issues.apache.org/jira/browse/GIRAPH-12
> Project: Giraph
>  Issue Type: Improvement
>  Components: bsp
>Reporter: Avery Ching
>Assignee: Hyunsik Choi
>Priority: Minor
> Attachments: GIRAPH-12_1.patch
>
>
> Currently every worker will start up a thread to communicate with every other 
> worker.  Hadoop RPC is used for communication.  For instance, if there are 
> 400 workers, each worker will create 400 threads.  This ends up using a lot 
> of memory, even with the option  
> -Dmapred.child.java.opts="-Xss64k".  
> It would be good to investigate using frameworks like Netty or rolling 
> our own to improve this situation.  By moving away from Hadoop RPC, we would 
> also make compatibility with different Hadoop versions easier.





[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-17 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107063#comment-13107063
 ] 

Hyunsik Choi commented on GIRAPH-12:


(a note for sharing)

In the current implementation, outgoing messages are sent to other peers on 
only two triggers:
1) When the number of outgoing messages for a specific peer exceeds a 
threshold (i.e., maxSize), the outgoing messages for that peer are transmitted 
to it.
2) When a superstep finishes, all remaining messages are flushed to the other 
peers.

In case 1, however, the current implementation only considers the number of 
messages, not their size. Outgoing messages reside in main memory until they 
are sent to other peers, so they are another significant consumer of main 
memory. It would be good to consider not only the number of messages but also 
their size.
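A minimal sketch of a per-peer buffer that flushes on either trigger, a message-count limit (the existing maxSize-style threshold) or a total-bytes limit, assuming messages are already serialized to byte arrays. `PeerBuffer` and its methods are hypothetical names, not Giraph classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-peer buffer that flushes on a message-count OR a byte-size threshold. */
class PeerBuffer {
  private final int maxMessages; // existing trigger: number of buffered messages (cf. maxSize)
  private final long maxBytes;   // proposed trigger: total serialized size
  private final List<byte[]> messages = new ArrayList<>();
  private long bufferedBytes = 0;

  PeerBuffer(int maxMessages, long maxBytes) {
    this.maxMessages = maxMessages;
    this.maxBytes = maxBytes;
  }

  /** Buffers a serialized message; returns the batch to send if a limit is reached, else null. */
  List<byte[]> add(byte[] serializedMessage) {
    messages.add(serializedMessage);
    bufferedBytes += serializedMessage.length;
    if (messages.size() >= maxMessages || bufferedBytes >= maxBytes) {
      return drain();
    }
    return null;
  }

  /** Removes and returns everything buffered, e.g. for the end-of-superstep flush. */
  List<byte[]> drain() {
    List<byte[]> batch = new ArrayList<>(messages);
    messages.clear();
    bufferedBytes = 0;
    return batch;
  }

  long bufferedBytes() {
    return bufferedBytes;
  }

  public static void main(String[] args) {
    PeerBuffer buffer = new PeerBuffer(1000, 1024);
    // Three 400-byte messages reach the byte limit long before the 1000-message limit.
    for (int i = 0; i < 3; i++) {
      List<byte[]> batch = buffer.add(new byte[400]);
      System.out.println("after message " + (i + 1) + ": "
          + (batch == null ? "buffered " + buffer.bufferedBytes() + " bytes"
                           : "flushed " + batch.size() + " messages"));
    }
  }
}
```

With a count-only trigger, a few large messages could sit in memory indefinitely until the superstep ends; the byte limit caps memory per peer regardless of message size.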

> Investigate communication improvements
> --
>
> Key: GIRAPH-12
> URL: https://issues.apache.org/jira/browse/GIRAPH-12
> Project: Giraph
>  Issue Type: Improvement
>  Components: bsp
>Reporter: Avery Ching
>Assignee: Hyunsik Choi
>Priority: Minor
> Attachments: GIRAPH-12_1.patch
>
>
> Currently every worker will start up a thread to communicate with every other 
> worker.  Hadoop RPC is used for communication.  For instance, if there are 
> 400 workers, each worker will create 400 threads.  This ends up using a lot 
> of memory, even with the option  
> -Dmapred.child.java.opts="-Xss64k".  
> It would be good to investigate using frameworks like Netty or rolling 
> our own to improve this situation.  By moving away from Hadoop RPC, we would 
> also make compatibility with different Hadoop versions easier.





[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-17 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107312#comment-13107312
 ] 

Hyunsik Choi commented on GIRAPH-12:


No problem :)

> Investigate communication improvements
> --
>
> Key: GIRAPH-12
> URL: https://issues.apache.org/jira/browse/GIRAPH-12
> Project: Giraph
>  Issue Type: Improvement
>  Components: bsp
>Reporter: Avery Ching
>Assignee: Hyunsik Choi
>Priority: Minor
> Attachments: GIRAPH-12_1.patch
>
>
> Currently every worker will start up a thread to communicate with every other 
> worker.  Hadoop RPC is used for communication.  For instance, if there are 
> 400 workers, each worker will create 400 threads.  This ends up using a lot 
> of memory, even with the option  
> -Dmapred.child.java.opts="-Xss64k".  
> It would be good to investigate using frameworks like Netty or rolling 
> our own to improve this situation.  By moving away from Hadoop RPC, we would 
> also make compatibility with different Hadoop versions easier.





[jira] [Updated] (GIRAPH-32) Implement benchmarks to evaluate the performance of message passing

2011-09-18 Thread Hyunsik Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-32?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyunsik Choi updated GIRAPH-32:
---

Attachment: GIRAPH-32_2.patch

Good idea! Depending on which InputFormat we use, we can choose the 
distribution of destination vertices.

I have attached a patch that fixes the coding convention issues.

> Implement benchmarks to evaluate the performance of message passing 
> 
>
> Key: GIRAPH-32
> URL: https://issues.apache.org/jira/browse/GIRAPH-32
> Project: Giraph
>  Issue Type: Task
>  Components: benchmark
>Reporter: Hyunsik Choi
>Assignee: Hyunsik Choi
> Fix For: 0.70.0
>
> Attachments: GIRAPH-32.patch, GIRAPH-32_2.patch
>
>
> The message passing framework plays an important role in Giraph.
> We need some benchmark programs to evaluate improvements to the 
> message passing method.





[jira] [Resolved] (GIRAPH-32) Implement benchmarks to evaluate the performance of message passing

2011-09-19 Thread Hyunsik Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-32?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyunsik Choi resolved GIRAPH-32.


Resolution: Fixed

Since this issue received a +1, I just committed it.

> Implement benchmarks to evaluate the performance of message passing 
> 
>
> Key: GIRAPH-32
> URL: https://issues.apache.org/jira/browse/GIRAPH-32
> Project: Giraph
>  Issue Type: Task
>  Components: benchmark
>Reporter: Hyunsik Choi
>Assignee: Hyunsik Choi
> Fix For: 0.70.0
>
> Attachments: GIRAPH-32.patch, GIRAPH-32_2.patch
>
>
> The message passing framework plays an important role in Giraph.
> We need some benchmark programs to evaluate improvements to the 
> message passing method.





[jira] [Updated] (GIRAPH-12) Investigate communication improvements

2011-09-21 Thread Hyunsik Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyunsik Choi updated GIRAPH-12:
---

Attachment: GIRAPH-12_2.patch

I have attached the second patch.

I benchmarked this patch using the benchmark from GIRAPH-32. The results are 
shown below. The patched version is slightly faster than the current 
implementation (about 4% lower total time, averaged over two runs each). As 
Avery mentioned, the patched version also makes the number of threads 
controllable, which is an improvement in itself.

Users can adjust the number of core threads and maximum threads using 
GiraphJob's constants, such as MSG_FLUSHER_CORE_SIZE and MSG_FLUSHER_MAX_SIZE. 
These settings can affect performance, so we may need to guide users in finding 
the best parameters.

However, this experiment may not be enough to evaluate the approach, because it 
was conducted on a small cluster.

*Results of the original version*
{noformat}
org.apache.giraph.benchmark.RandomMessageBenchmark -e 2 -s 3 -w 6 -b 4 -n 150 -V 30 -v

= 1st =
11/09/22 00:55:06 INFO mapred.JobClient: Total (milliseconds)=63096
11/09/22 00:55:06 INFO mapred.JobClient: Superstep 3 (milliseconds)=551
11/09/22 00:55:06 INFO mapred.JobClient: Setup (milliseconds)=1331
11/09/22 00:55:06 INFO mapred.JobClient: Shutdown (milliseconds)=1008
11/09/22 00:55:06 INFO mapred.JobClient: Vertex input superstep (milliseconds)=516
11/09/22 00:55:06 INFO mapred.JobClient: Superstep 0 (milliseconds)=16079
11/09/22 00:55:06 INFO mapred.JobClient: Superstep 2 (milliseconds)=25657
11/09/22 00:55:06 INFO mapred.JobClient: Superstep 1 (milliseconds)=17950

= 2nd =
11/09/22 00:58:13 INFO mapred.JobClient: Total (milliseconds)=62771
11/09/22 00:58:13 INFO mapred.JobClient: Superstep 3 (milliseconds)=600
11/09/22 00:58:13 INFO mapred.JobClient: Setup (milliseconds)=1290
11/09/22 00:58:13 INFO mapred.JobClient: Shutdown (milliseconds)=950
11/09/22 00:58:13 INFO mapred.JobClient: Vertex input superstep (milliseconds)=614
11/09/22 00:58:13 INFO mapred.JobClient: Superstep 0 (milliseconds)=15654
11/09/22 00:58:13 INFO mapred.JobClient: Superstep 2 (milliseconds)=25157
11/09/22 00:58:13 INFO mapred.JobClient: Superstep 1 (milliseconds)=18499
{noformat}

*Results of the patched version*
{noformat}
= 1st =
11/09/22 00:59:41 INFO mapred.JobClient: Total (milliseconds)=60068
11/09/22 00:59:41 INFO mapred.JobClient: Superstep 3 (milliseconds)=542
11/09/22 00:59:41 INFO mapred.JobClient: Setup (milliseconds)=1219
11/09/22 00:59:41 INFO mapred.JobClient: Shutdown (milliseconds)=1025
11/09/22 00:59:41 INFO mapred.JobClient: Vertex input superstep (milliseconds)=616
11/09/22 00:59:41 INFO mapred.JobClient: Superstep 0 (milliseconds)=15887
11/09/22 00:59:41 INFO mapred.JobClient: Superstep 2 (milliseconds)=23149
11/09/22 00:59:41 INFO mapred.JobClient: Superstep 1 (milliseconds)=17626

= 2nd =
11/09/22 01:01:05 INFO mapred.JobClient: Total (milliseconds)=60359
11/09/22 01:01:05 INFO mapred.JobClient: Superstep 3 (milliseconds)=510
11/09/22 01:01:05 INFO mapred.JobClient: Setup (milliseconds)=1399
11/09/22 01:01:05 INFO mapred.JobClient: Shutdown (milliseconds)=956
11/09/22 01:01:05 INFO mapred.JobClient: Vertex input superstep (milliseconds)=550
11/09/22 01:01:05 INFO mapred.JobClient: Superstep 0 (milliseconds)=16054
11/09/22 01:01:05 INFO mapred.JobClient: Superstep 2 (milliseconds)=23049
11/09/22 01:01:05 INFO mapred.JobClient: Superstep 1 (milliseconds)=17835
{noformat}
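To make "slightly faster" concrete, the small program below averages the two runs of each version, using only the numbers from the logs above. The class and method names are illustrative.

```java
import java.util.Locale;

public class BenchmarkSummary {
    /** Arithmetic mean of a set of millisecond timings. */
    static double mean(long... values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return (double) sum / values.length;
    }

    /** How much lower the patched mean is than the original mean, in percent. */
    static double reductionPercent(double original, double patched) {
        return 100.0 * (original - patched) / original;
    }

    public static void main(String[] args) {
        double origTotal = mean(63096, 62771);
        double patchedTotal = mean(60068, 60359);
        double origStep2 = mean(25657, 25157);
        double patchedStep2 = mean(23149, 23049);
        // Locale.ROOT keeps '.' as the decimal separator on every machine.
        System.out.printf(Locale.ROOT, "Total: %.1f -> %.1f ms (%.1f%% lower)%n",
            origTotal, patchedTotal, reductionPercent(origTotal, patchedTotal));
        System.out.printf(Locale.ROOT, "Superstep 2: %.1f -> %.1f ms (%.1f%% lower)%n",
            origStep2, patchedStep2, reductionPercent(origStep2, patchedStep2));
        // Prints:
        // Total: 62933.5 -> 60213.5 ms (4.3% lower)
        // Superstep 2: 25407.0 -> 23099.0 ms (9.1% lower)
    }
}
```

Most of the ~2.7 s difference in total time comes from Superstep 2 (~2.3 s), while Superstep 0 is about 104 ms slower on average in the patched runs. With only two runs per version on a small cluster, this is an indication rather than a conclusive measurement.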

> Investigate communication improvements
> --
>
> Key: GIRAPH-12
> URL: https://issues.apache.org/jira/browse/GIRAPH-12
> Project: Giraph
>  Issue Type: Improvement
>  Components: bsp
>Reporter: Avery Ching
>Assignee: Hyunsik Choi
>Priority: Minor
> Attachments: GIRAPH-12_1.patch, GIRAPH-12_2.patch
>
>
> Currently every worker will start up a thread to communicate with every other 
> worker.  Hadoop RPC is used for communication.  For instance, if there are 
> 400 workers, each worker will create 400 threads.  This ends up using a lot 
> of memory, even with the option  
> -Dmapred.child.java.opts="-Xss64k".  
> It would be good to investigate using frameworks like Netty or rolling 
> our own to improve this situation.  By moving away from Hadoop RPC, we would 
> also make compatibility with different Hadoop versions easier.
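A rough back-of-the-envelope sketch of why the thread-per-peer model scales badly: it counts only the stack space reserved by -Xss64k, so it is an upper bound on stack reservation rather than measured memory, and it ignores per-connection buffers. The 8-thread pool used for comparison is a hypothetical alternative, not a number from this issue.

```java
public class ThreadStackEstimate {
    /** KiB of stack reserved per worker for a given number of threads. */
    static long perWorkerStackKiB(int threadsPerWorker, int stackKiB) {
        return (long) threadsPerWorker * stackKiB;
    }

    public static void main(String[] args) {
        int workers = 400;  // cluster size from the issue description
        int stackKiB = 64;  // -Xss64k
        // One communication thread per peer, so threads per worker == workers.
        long perWorker = perWorkerStackKiB(workers, stackKiB);
        long clusterKiB = perWorker * workers;
        // Hypothetical alternative: a fixed pool of 8 I/O threads per worker,
        // independent of cluster size.
        long pooled = perWorkerStackKiB(8, stackKiB);
        System.out.println("stack per worker: " + perWorker + " KiB");           // 25600 KiB
        System.out.println("stack, all workers: " + clusterKiB / 1024 + " MiB");  // 10000 MiB
        System.out.println("stack per worker, 8-thread pool: " + pooled + " KiB"); // 512 KiB
    }
}
```

Because threads per worker grow with the number of workers, the cluster-wide thread count (160,000 here) grows quadratically with cluster size, whereas a fixed-size pool grows only linearly.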

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-37) Implement Netty-backed rpc solution

2011-09-21 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109682#comment-13109682
 ] 

Hyunsik Choi commented on GIRAPH-37:


Two weeks ago, in GIRAPH-12, I mentioned that I had tested an RPC system based on 
protobuf and Netty, and that I needed more time before uploading my progress. 
The link below is my ongoing work.

https://github.com/hyunsik/giraph-rpc

It is not complete. It needs more tests, more features such as Hadoop 
security, and better exception handling. However, I think it has 
the basic features. Since you appear to have started on this issue, I will not continue 
this work. I just hope the implementation is of some help to yours :)

> Implement Netty-backed rpc solution
> ---
>
> Key: GIRAPH-37
> URL: https://issues.apache.org/jira/browse/GIRAPH-37
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Jakob Homan
>Assignee: Jakob Homan
>
> GIRAPH-12 considered replacing the current Hadoop-based RPC method with 
> Netty, but went in another direction. I think there is still value in 
> this approach, and will also look at Finagle.





[jira] [Created] (GIRAPH-41) Change graph mutation RPC API to a kind of message

2011-09-21 Thread Hyunsik Choi (JIRA)
Change graph mutation RPC API to a kind of message
--

 Key: GIRAPH-41
 URL: https://issues.apache.org/jira/browse/GIRAPH-41
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Hyunsik Choi


Graph mutation functions (e.g., addVertexRequest, addEdgeRequest, ...) directly 
invoke RPC functions. 
During processing, these RPC calls may incur delays from TCP round-trip time 
and overhead from frequent RPC calls. In particular, when many 
workers try to mutate vertices and edges simultaneously, synchronization 
overhead may also occur on the receiving side, and this overhead may grow as the 
cluster size increases.

If we change the graph mutation API into a kind of message, it would be more 
efficient, since mutations would no longer need one RPC call each. If possible, 
the graph mutation message API and the data message API (i.e., sendMsg) could 
be integrated into one message-passing API.
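A minimal sketch of what "mutations as messages" could look like, assuming a single envelope type shared by data messages and mutation requests, per-worker buffers, and a transport that ships a whole batch in one call. All type names, the batch size, and the modulo partitioning are hypothetical; none of them are existing Giraph APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MutationMessageSketch {
    /** Common envelope for data messages and mutation requests. */
    public interface Message { long targetVertex(); }
    public record DataMessage(long targetVertex, double value) implements Message {}
    public record AddVertexRequest(long targetVertex) implements Message {}
    public record AddEdgeRequest(long targetVertex, long destVertex) implements Message {}

    /** Stand-in for the network layer: one call delivers a whole batch. */
    public interface Transport { void sendBatch(int worker, List<Message> batch); }

    private final int numWorkers;
    private final int batchSize;
    private final Transport transport;
    private final Map<Integer, List<Message>> buffers = new HashMap<>();

    public MutationMessageSketch(int numWorkers, int batchSize, Transport transport) {
        this.numWorkers = numWorkers;
        this.batchSize = batchSize;
        this.transport = transport;
    }

    /** Buffers any message (data or mutation); ships a batch when it is full. */
    public void send(Message m) {
        // Simple modulo partitioning, assumed here for illustration only.
        int worker = (int) Math.floorMod(m.targetVertex(), (long) numWorkers);
        List<Message> buf = buffers.computeIfAbsent(worker, w -> new ArrayList<>());
        buf.add(m);
        if (buf.size() >= batchSize) {
            flush(worker);
        }
    }

    /** Sends whatever is currently buffered for one worker. */
    public void flush(int worker) {
        List<Message> buf = buffers.remove(worker);
        if (buf != null && !buf.isEmpty()) {
            transport.sendBatch(worker, buf);
        }
    }

    /** Called at the end of a superstep so that no message is left behind. */
    public void flushAll() {
        for (Integer worker : new ArrayList<>(buffers.keySet())) {
            flush(worker);
        }
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        MutationMessageSketch sender =
            new MutationMessageSketch(1, 4, (worker, batch) -> batchSizes.add(batch.size()));
        for (long v = 0; v < 10; v++) {
            // Data messages and mutations share the same buffers.
            Message m = (v % 2 == 0) ? new AddEdgeRequest(v, v + 1) : new DataMessage(v, 1.0);
            sender.send(m);
        }
        sender.flushAll();
        System.out.println(batchSizes); // [4, 4, 2]: 3 network calls instead of 10
    }
}
```

The receiving worker would then apply the mutation requests from each batch in one place, which is where the synchronization overhead mentioned above could be reduced.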

What do you think about that?





[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-26 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114646#comment-13114646
 ] 

Hyunsik Choi commented on GIRAPH-12:


I'm sorry for the late response as well. I was out of town for personal reasons and 
have just returned home. 
The previous experiments were too simple; in fact, they cannot show 
any meaningful result. I'm sorry about that. As to question 3, this issue 
originated from memory usage, so I should have measured memory usage. 
I'll answer your 3 questions soon :)





[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-26 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114709#comment-13114709
 ] 

Hyunsik Choi commented on GIRAPH-12:


I have thought about question 3, that is, how we can measure memory usage 
while Giraph is running.

Probably the most basic way is to use Hadoop metrics 
(http://www.cloudera.com/blog/2009/03/hadoop-metrics/). However, this requires 
changing the _hadoop-metrics.properties_ file, which may not be allowed on most 
large clusters, e.g., the Yahoo! cluster that Avery can access. 

If that is not possible, we can implement a thread class that mimics Hadoop 
metrics: it periodically measures the JVM's memory usage and sends the 
measurements to a specific remote server.
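A minimal sketch of such a sampler, assuming the standard java.lang.management MXBeans are available (they need no hadoop-metrics.properties changes). The class name, the sampled fields, and the Consumer sink are hypothetical; the sink stands in for "send to a remote server", and the actual transport is left open.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

public class MemorySampler implements AutoCloseable {
    /** One measurement: heap usage plus the number of live threads. */
    public record Sample(long timestampMillis, long heapUsedBytes,
                         long heapCommittedBytes, int threadCount) {}

    private final ScheduledExecutorService scheduler;

    public MemorySampler(long periodMillis, Consumer<Sample> sink) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "memory-sampler");
            t.setDaemon(true); // never keeps the worker JVM alive on its own
            return t;
        });
        scheduler.scheduleAtFixedRate(() -> {
            try {
                MemoryUsage heap = memory.getHeapMemoryUsage();
                sink.accept(new Sample(System.currentTimeMillis(), heap.getUsed(),
                        heap.getCommitted(), threads.getThreadCount()));
            } catch (RuntimeException e) {
                // An exception escaping a periodic task cancels all later runs,
                // so a failed report is dropped instead of stopping sampling.
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        try (MemorySampler sampler = new MemorySampler(100, System.out::println)) {
            Thread.sleep(350); // roughly four samples, at about 0, 100, 200 and 300 ms
        }
    }
}
```

Sampling the live thread count alongside heap usage would also show directly how many communication threads each worker creates.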

What do you think about that?
