[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-10-04 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13120604#comment-13120604
 ] 

Hyunsik Choi commented on GIRAPH-12:


Thank you for review. I agree with your opinion. The virtual memory size seems 
very important in 32-bit JVMs. I only considered 64-bit JVMs. I overlooked that 
point.

Anyway, this patch allows users to control the number of threads. It is more 
helpful in restricted environment (e.g., 32-bit JVM).

 Investigate communication improvements
 --

 Key: GIRAPH-12
 URL: https://issues.apache.org/jira/browse/GIRAPH-12
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Avery Ching
Assignee: Hyunsik Choi
Priority: Minor
 Attachments: GIRAPH-12_1.patch, GIRAPH-12_2.patch, GIRAPH-12_3.patch


 Currently every worker will start up a thread to communicate with every other 
 workers.  Hadoop RPC is used for communication.  For instance if there are 
 400 workers, each worker will create 400 threads.  This ends up using a lot 
 of memory, even with the option  
 -Dmapred.child.java.opts=-Xss64k.  
 It would be good to investigate using frameworks like Netty or custom roll 
 our own to improve this situation.  By moving away from Hadoop RPC, we would 
 also make compatibility of different Hadoop versions easier.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-10-04 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13120607#comment-13120607
 ] 

Avery Ching commented on GIRAPH-12:
---

You should commit =).  There hasn't been any objection and I +1'd it.

 Investigate communication improvements
 --

 Key: GIRAPH-12
 URL: https://issues.apache.org/jira/browse/GIRAPH-12
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Avery Ching
Assignee: Hyunsik Choi
Priority: Minor
 Attachments: GIRAPH-12_1.patch, GIRAPH-12_2.patch, GIRAPH-12_3.patch


 Currently every worker will start up a thread to communicate with every other 
 workers.  Hadoop RPC is used for communication.  For instance if there are 
 400 workers, each worker will create 400 threads.  This ends up using a lot 
 of memory, even with the option  
 -Dmapred.child.java.opts=-Xss64k.  
 It would be good to investigate using frameworks like Netty or custom roll 
 our own to improve this situation.  By moving away from Hadoop RPC, we would 
 also make compatibility of different Hadoop versions easier.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-10-04 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13120639#comment-13120639
 ] 

Hudson commented on GIRAPH-12:
--

Integrated in Giraph-trunk-Commit #13 (See 
[https://builds.apache.org/job/Giraph-trunk-Commit/13/])
GIRAPH-12: Investigate communication improvements (hyunsik)

hyunsik : 
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVNview=revrev=1179029
Files : 
* /incubator/giraph/trunk/CHANGELOG
* 
/incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/RandomMessageBenchmark.java
* 
/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/ArrayListWritable.java
* 
/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/MsgList.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java


 Investigate communication improvements
 --

 Key: GIRAPH-12
 URL: https://issues.apache.org/jira/browse/GIRAPH-12
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Avery Ching
Assignee: Hyunsik Choi
Priority: Minor
 Fix For: 0.70.0

 Attachments: GIRAPH-12_1.patch, GIRAPH-12_2.patch, GIRAPH-12_3.patch


 Currently every worker will start up a thread to communicate with every other 
 workers.  Hadoop RPC is used for communication.  For instance if there are 
 400 workers, each worker will create 400 threads.  This ends up using a lot 
 of memory, even with the option  
 -Dmapred.child.java.opts=-Xss64k.  
 It would be good to investigate using frameworks like Netty or custom roll 
 our own to improve this situation.  By moving away from Hadoop RPC, we would 
 also make compatibility of different Hadoop versions easier.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-28 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116259#comment-13116259
 ] 

Avery Ching commented on GIRAPH-12:
---

I wonder if Runtime methods measure the stack sizes of the threads consuming 
all that memory?  My guess is no.  Since we're using less threads, we should 
have less stacks consuming all the minimum memory.  I agree that the heap 
memory won't change much in going to your approach.  I'm not sure how to prove 
that thread stack memory is being reduced if Runtime fails to capture this 
though.  

One crude way would simply be to increase the heap space with your test until 
you find a maximum heap size that can be used in your code and the original 
code.  If your code can reach a higher heap allocation, than that should prove 
a memory win (more memory can be used for the heap).  Here's some arguments to 
try out that approach if you're interested:

-Dmapred.child.java.opts=-Xms2750m -Xmx2750m

As you bump up the -Xms and -Xmx values simultaneously, eventually your job 
won't start and hopefully your changes enable a higher limit...

 Investigate communication improvements
 --

 Key: GIRAPH-12
 URL: https://issues.apache.org/jira/browse/GIRAPH-12
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Avery Ching
Assignee: Hyunsik Choi
Priority: Minor
 Attachments: GIRAPH-12_1.patch, GIRAPH-12_2.patch


 Currently every worker will start up a thread to communicate with every other 
 workers.  Hadoop RPC is used for communication.  For instance if there are 
 400 workers, each worker will create 400 threads.  This ends up using a lot 
 of memory, even with the option  
 -Dmapred.child.java.opts=-Xss64k.  
 It would be good to investigate using frameworks like Netty or custom roll 
 our own to improve this situation.  By moving away from Hadoop RPC, we would 
 also make compatibility of different Hadoop versions easier.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-28 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116960#comment-13116960
 ] 

Hyunsik Choi commented on GIRAPH-12:


Dmitriy,

Thank you for your comments. Regardless of the problem caused by thread stack 
size, those approaches look promising. Especially, spilling messages to disk 
looks necessary so that Giraph deals with really large graph data. Otherwise, 
out of memory may occur when the message generating rate are higher than 
network bandwidth. I'll open a separate issue about this.

 Investigate communication improvements
 --

 Key: GIRAPH-12
 URL: https://issues.apache.org/jira/browse/GIRAPH-12
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Avery Ching
Assignee: Hyunsik Choi
Priority: Minor
 Attachments: GIRAPH-12_1.patch, GIRAPH-12_2.patch


 Currently every worker will start up a thread to communicate with every other 
 workers.  Hadoop RPC is used for communication.  For instance if there are 
 400 workers, each worker will create 400 threads.  This ends up using a lot 
 of memory, even with the option  
 -Dmapred.child.java.opts=-Xss64k.  
 It would be good to investigate using frameworks like Netty or custom roll 
 our own to improve this situation.  By moving away from Hadoop RPC, we would 
 also make compatibility of different Hadoop versions easier.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-28 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116961#comment-13116961
 ] 

Hyunsik Choi commented on GIRAPH-12:


Avery,

Thank you for your review. You are right. Runtime's totalMem() and freeMem() 
methods doesn't measure stack sizes. I'm sure of it after testing the below 
code.

https://gist.github.com/1249761

I have looked for how to measure the stack size of a java application. I could 
not find about that. Still, I'm not sure how to show that thread stack memory 
is reduced by the thread pool approach. Now, your way seems a only method to 
prove them.

However, I'm curious to know how much thread overhead is in terms of memory 
consumption. Before I try your approach. I conducted some simple experiments.

I used the above source code to investigate the memory usage of threads. This 
is executed on a machine with intel i3, ubuntu 11.10 (64bit), and 8G memory. I 
measure their memory by using 'top'. 'top' shows several columns including VIRT 
and RES, and SHR. We only need to focus RES, resident memory. RES includes all 
resident memory usages, such as heap and stack. I could know this from this 
page (http://goo.gl/JE7fD).

Firstly, I executed the above code with 1000 threads and without a jvm option 
'-Xss'. Accoring to this page (http://goo.gl/sz2qM), the default stack size 
'Xss' is 1024k on the jvm of 64bit linux. After all threads are created, I 
executed 'top' to print the memory usages as follows:

1k threads with default thread stack size.
{noformat}
  VIRT   RES SHR
9163 hyunsik   20   0 3366m  30m 8296 S   18  0.4   0:01.52 java
{noformat}

2k threads with default thread stack size.
{noformat}
   VIRT   RES SHR
11223 hyunsik   20   0 4434m  46m 8340 S   40  0.6   0:04.11 java
{noformat}

With 1k and 2k threads, that program consumes only 30 and 46 mega bytes 
respectively. The memory usage of threads are smaller than I expected. I wonder 
if thread stack size is the main cause of the memory problem that we have faced.

Besides, the default stack size is 1024k. The thread stack size seems to not 
affect RES. I had more tests with 'Xss' in order to investigate more the thread 
stack size.

1k threads with '-Xss4096k'.
{noformat}
28301 hyunsik   20   0 6380m  30m 8292 S   17  0.4   0:05.25 java
{noformat}

2k threads with '-Xss4096k'
{noformat}
29326 hyunsik   20   0 10.1g  46m 8300 S   38  0.6   0:03.42 java
{noformat}

VIRT surely is affected by '-Xss', but RES is not. 'Xss' seems the maximum 
stack size of each thread because it doesn't affect RES.

What do you think about that?

 Investigate communication improvements
 --

 Key: GIRAPH-12
 URL: https://issues.apache.org/jira/browse/GIRAPH-12
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Avery Ching
Assignee: Hyunsik Choi
Priority: Minor
 Attachments: GIRAPH-12_1.patch, GIRAPH-12_2.patch


 Currently every worker will start up a thread to communicate with every other 
 workers.  Hadoop RPC is used for communication.  For instance if there are 
 400 workers, each worker will create 400 threads.  This ends up using a lot 
 of memory, even with the option  
 -Dmapred.child.java.opts=-Xss64k.  
 It would be good to investigate using frameworks like Netty or custom roll 
 our own to improve this situation.  By moving away from Hadoop RPC, we would 
 also make compatibility of different Hadoop versions easier.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-28 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13117011#comment-13117011
 ] 

Avery Ching commented on GIRAPH-12:
---

If the default stack size is 1 MB, then for instance if you have 1024 workers, 
you are talking about 1 GB just wasted for thread stack space per node.  The 
aggregate wasted memory would be 1 GB * 1024 = 1 TB, that's a lot of memory =).

The issue is that many clusters (including Yahoo!'s) have are running only 
32-bit JVMs.  So if you are using 1 GB just for stack space, you only get so 
much left for heap (graph + messages).  I think this should help quite a bit 
until GIRAPH-37 is taken on. 

Can you run the unittests against a real Hadoop instance as well?  Then I'd say 
+1, unless someone disagrees.

 Investigate communication improvements
 --

 Key: GIRAPH-12
 URL: https://issues.apache.org/jira/browse/GIRAPH-12
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Avery Ching
Assignee: Hyunsik Choi
Priority: Minor
 Attachments: GIRAPH-12_1.patch, GIRAPH-12_2.patch


 Currently every worker will start up a thread to communicate with every other 
 workers.  Hadoop RPC is used for communication.  For instance if there are 
 400 workers, each worker will create 400 threads.  This ends up using a lot 
 of memory, even with the option  
 -Dmapred.child.java.opts=-Xss64k.  
 It would be good to investigate using frameworks like Netty or custom roll 
 our own to improve this situation.  By moving away from Hadoop RPC, we would 
 also make compatibility of different Hadoop versions easier.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-27 Thread Hyunsik Choi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116166#comment-13116166
 ] 

Hyunsik Choi commented on GIRAPH-12:


Avery,
Thank you for your comments. I decided to use Runtime. It seems to be enough to 
investigate this issue.

Again, I conducted a benchmark to measure memory consumption with 
RandomMessageBenchmark as follows:

{noformat}
hadoop jar giraph-0.70-jar-with-dependencies.jar 
org.apache.giraph.benchmark.RandomMessageBenchmark -e 2 -s 3 -w 20 -b 4 -n 150 
-V ${V} -v -f ${f}
{noformat}
, where 'f' option indicates the number of threads of thread pool. And, I 
changed the the thread executor as FixedThreadPool class.

I conducted two times for every experiment and I got the average of them. You 
can see the results from the below link:
http://goo.gl/arP62

This experiments was conducted in two cluaster nodes, each of which has 24 
cores and 64GB mem. They are connected each other over 1Gbps ethernet. I 
measured the memory footprints from Runtime in GraphMapper as Avery recommended.

In sum, the thread pool approach is better than original approach in terms of 
processing times. I guess that this is because the thread pool approach reduces 
the context switching cost and narrow the synchronization area.

Unfortunately, however, the thread pool approach doesn't reduce the memory 
consumption. This is the main focus of this issue. Rather, this approach needs 
slightly more memory as shown in Figure 3 and 4. However, we need to note the 
experiments with f = 5 and f = 20. In these experiments, the number of threads 
has small effect on the memory consumption.

We have faced the memory problem. We may need to approach this problem from 
another aspect.
I think that this problem may be mainly caused by the current message flushing 
strategy.

In current implementation, outgoing messages are transmitted to other peers by 
only two cases:
1) When the number of outgoing messages for a specific peer exceeds the a 
threshold (i.e., maxSize), the outgoing messages for the peer are transmitted 
to the peer.
2) When one super step is finished, the entire messages are flushed to other 
peers.

Flush (case 2) is only triggered at the end of superstep. During processing, 
the message flushing only depends on the case 1. This may be not effective 
because the case 1 only consider the the number of messages for each specific 
peer. It never take account of the real memory occupation. If destinations of 
outgoing messages are uniform, out of memory may occur before any 'case 1' is 
triggered.

To overcome this problem, we may need more eager message flushing strategy or 
some approach to store overflow messages into disk.

Let me know what you think.

 Investigate communication improvements
 --

 Key: GIRAPH-12
 URL: https://issues.apache.org/jira/browse/GIRAPH-12
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Avery Ching
Assignee: Hyunsik Choi
Priority: Minor
 Attachments: GIRAPH-12_1.patch, GIRAPH-12_2.patch


 Currently every worker will start up a thread to communicate with every other 
 workers.  Hadoop RPC is used for communication.  For instance if there are 
 400 workers, each worker will create 400 threads.  This ends up using a lot 
 of memory, even with the option  
 -Dmapred.child.java.opts=-Xss64k.  
 It would be good to investigate using frameworks like Netty or custom roll 
 our own to improve this situation.  By moving away from Hadoop RPC, we would 
 also make compatibility of different Hadoop versions easier.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-26 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13114646#comment-13114646
 ] 

Hyunsik Choi commented on GIRAPH-12:


I'm sorry too for late response. I was out of town due to my personal work. I 
just come to home. 
The previous experiments are too simple. Actually, that experiment cannot show 
any meaningful result. I sorry for that. As to the question 3, this issue was 
originated from the memory usage.  I should have measured the memory usage. 
Sooner, I'll answer your 3 questions :)

 Investigate communication improvements
 --

 Key: GIRAPH-12
 URL: https://issues.apache.org/jira/browse/GIRAPH-12
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Avery Ching
Assignee: Hyunsik Choi
Priority: Minor
 Attachments: GIRAPH-12_1.patch, GIRAPH-12_2.patch


 Currently every worker will start up a thread to communicate with every other 
 workers.  Hadoop RPC is used for communication.  For instance if there are 
 400 workers, each worker will create 400 threads.  This ends up using a lot 
 of memory, even with the option  
 -Dmapred.child.java.opts=-Xss64k.  
 It would be good to investigate using frameworks like Netty or custom roll 
 our own to improve this situation.  By moving away from Hadoop RPC, we would 
 also make compatibility of different Hadoop versions easier.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-26 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13114709#comment-13114709
 ] 

Hyunsik Choi commented on GIRAPH-12:


I have thought about question 3. That is, how we can measure the memory usage 
while Giraph is running.

Probably, the most basic way is to use the hadoop metrics 
(http://www.cloudera.com/blog/2009/03/hadoop-metrics/). However, this way needs 
to change _hadoop-metrics.properties_ file. So, it may be restricted for most 
large clusters; e.g., Yahoo! cluster that Avery can access. 

If the above way is impossible, we can implement a thread class mimic to hadoop 
metric in order to measure the memory usage on JVM periodically and sends that 
to a specific remote server.

What do you think about that?

 Investigate communication improvements
 --

 Key: GIRAPH-12
 URL: https://issues.apache.org/jira/browse/GIRAPH-12
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Avery Ching
Assignee: Hyunsik Choi
Priority: Minor
 Attachments: GIRAPH-12_1.patch, GIRAPH-12_2.patch


 Currently every worker will start up a thread to communicate with every other 
 workers.  Hadoop RPC is used for communication.  For instance if there are 
 400 workers, each worker will create 400 threads.  This ends up using a lot 
 of memory, even with the option  
 -Dmapred.child.java.opts=-Xss64k.  
 It would be good to investigate using frameworks like Netty or custom roll 
 our own to improve this situation.  By moving away from Hadoop RPC, we would 
 also make compatibility of different Hadoop versions easier.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-26 Thread Avery Ching (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115239#comment-13115239
 ] 

Avery Ching commented on GIRAPH-12:
---

Are hadoop metrics better than simply using Runtime?  We do this here:

https://github.com/apache/giraph/blob/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java#L559

Or perhaps 
http://download.oracle.com/javase/6/docs/api/java/lang/management/package-summary.html?
  I haven't used it, but it's been suggested on stack overflow.

http://download.oracle.com/javase/6/docs/api/java/lang/management/MemoryMXBean.html



 Investigate communication improvements
 --

 Key: GIRAPH-12
 URL: https://issues.apache.org/jira/browse/GIRAPH-12
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Avery Ching
Assignee: Hyunsik Choi
Priority: Minor
 Attachments: GIRAPH-12_1.patch, GIRAPH-12_2.patch


 Currently every worker will start up a thread to communicate with every other 
 workers.  Hadoop RPC is used for communication.  For instance if there are 
 400 workers, each worker will create 400 threads.  This ends up using a lot 
 of memory, even with the option  
 -Dmapred.child.java.opts=-Xss64k.  
 It would be good to investigate using frameworks like Netty or custom roll 
 our own to improve this situation.  By moving away from Hadoop RPC, we would 
 also make compatibility of different Hadoop versions easier.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-25 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13114433#comment-13114433
 ] 

Avery Ching commented on GIRAPH-12:
---

Sorry Hyunsik, I am currently in between jobs and won't have access to a 
cluster for a little while.  My apologies for the testing delay.  I will 
updated this JIRA when I get access to a large cluster again.

 Investigate communication improvements
 --

 Key: GIRAPH-12
 URL: https://issues.apache.org/jira/browse/GIRAPH-12
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Avery Ching
Assignee: Hyunsik Choi
Priority: Minor
 Attachments: GIRAPH-12_1.patch, GIRAPH-12_2.patch


 Currently every worker will start up a thread to communicate with every other 
 workers.  Hadoop RPC is used for communication.  For instance if there are 
 400 workers, each worker will create 400 threads.  This ends up using a lot 
 of memory, even with the option  
 -Dmapred.child.java.opts=-Xss64k.  
 It would be good to investigate using frameworks like Netty or custom roll 
 our own to improve this situation.  By moving away from Hadoop RPC, we would 
 also make compatibility of different Hadoop versions easier.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-17 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13107061#comment-13107061
 ] 

Hyunsik Choi commented on GIRAPH-12:


(a note for sharing)

Graph mutation functions (e.g., addVertexRequest, addEdgeRequest..) directly 
invoke RPC functions. 
This approach incurs RPC round-trip overheads during processing. Especially 
when many workers try to mutate vertices or edges, synchronization overheads 
may also occur in receiving sides. It may be severe as the size of cluster 
increases.

If we change graph mutation API to asynchronous messages, it would be more 
efficient. If possible, graph mutation messages and value messages (i.e., 
sendMsg) can be integrated into one message passing API.

 Investigate communication improvements
 --

 Key: GIRAPH-12
 URL: https://issues.apache.org/jira/browse/GIRAPH-12
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Avery Ching
Assignee: Hyunsik Choi
Priority: Minor
 Attachments: GIRAPH-12_1.patch


 Currently every worker will start up a thread to communicate with every other 
 workers.  Hadoop RPC is used for communication.  For instance if there are 
 400 workers, each worker will create 400 threads.  This ends up using a lot 
 of memory, even with the option  
 -Dmapred.child.java.opts=-Xss64k.  
 It would be good to investigate using frameworks like Netty or custom roll 
 our own to improve this situation.  By moving away from Hadoop RPC, we would 
 also make compatibility of different Hadoop versions easier.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-17 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13107063#comment-13107063
 ] 

Hyunsik Choi commented on GIRAPH-12:


(a note for sharing)

In current implementation, outgoing messages are sent to other peers in only 
two triggers:
1) When the number of outgoing messages for a specific peer exceeds the a 
threshold (i.e., maxSize), the outgoing messages for the peer are transmitted 
to the peer.
2) When one super step is finished, the entire messages are flushed to other 
peers.

In the case 1, however, the current implementation only consider the number of 
messages instead of the size of messages. The outgoing messages reside in main 
memory until they are sent to other peers. It is another important factor to 
consume main memory. It would be good to consider not only the number of 
messages but also the size of messages.

 Investigate communication improvements
 --

 Key: GIRAPH-12
 URL: https://issues.apache.org/jira/browse/GIRAPH-12
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Avery Ching
Assignee: Hyunsik Choi
Priority: Minor
 Attachments: GIRAPH-12_1.patch


 Currently every worker will start up a thread to communicate with every other 
 workers.  Hadoop RPC is used for communication.  For instance if there are 
 400 workers, each worker will create 400 threads.  This ends up using a lot 
 of memory, even with the option  
 -Dmapred.child.java.opts=-Xss64k.  
 It would be good to investigate using frameworks like Netty or custom roll 
 our own to improve this situation.  By moving away from Hadoop RPC, we would 
 also make compatibility of different Hadoop versions easier.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-17 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13107189#comment-13107189
 ] 

Avery Ching commented on GIRAPH-12:
---

I am assigned?  Huh??

 Investigate communication improvements
 --

 Key: GIRAPH-12
 URL: https://issues.apache.org/jira/browse/GIRAPH-12
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Avery Ching
Assignee: Avery Ching
Priority: Minor
 Attachments: GIRAPH-12_1.patch


 Currently every worker will start up a thread to communicate with every other 
 workers.  Hadoop RPC is used for communication.  For instance if there are 
 400 workers, each worker will create 400 threads.  This ends up using a lot 
 of memory, even with the option  
 -Dmapred.child.java.opts=-Xss64k.  
 It would be good to investigate using frameworks like Netty or custom roll 
 our own to improve this situation.  By moving away from Hadoop RPC, we would 
 also make compatibility of different Hadoop versions easier.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-16 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13106842#comment-13106842
 ] 

Jake Mannix commented on GIRAPH-12:
---

Hey Hyunsik, if you're going to write a benchmark for the RPC stuff, that 
would be totally great.  I'd like to start playing around with trying Finagle 
in here, and we can compare notes on what kinds of techniques among both 
approaches work better, unless I'd be stepping on your toes by doing so...

 Investigate communication improvements
 --

 Key: GIRAPH-12
 URL: https://issues.apache.org/jira/browse/GIRAPH-12
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Avery Ching
Assignee: Hyunsik Choi
Priority: Minor
 Attachments: GIRAPH-12_1.patch


 Currently every worker will start up a thread to communicate with every other 
 workers.  Hadoop RPC is used for communication.  For instance if there are 
 400 workers, each worker will create 400 threads.  This ends up using a lot 
 of memory, even with the option  
 -Dmapred.child.java.opts=-Xss64k.  
 It would be good to investigate using frameworks like Netty or custom roll 
 our own to improve this situation.  By moving away from Hadoop RPC, we would 
 also make compatibility of different Hadoop versions easier.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-14 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13104340#comment-13104340
 ] 

Hyunsik Choi commented on GIRAPH-12:


You mean that we need some benchmark program to test the performance and 
scalability of message passing methods. If so, I'll add two benchmarking 
programs, which are sending messages to peers in random and skewed distribution 
respectively. For this, I'll create another issue.

Let me know what you think :)

 Investigate communication improvements
 --

 Key: GIRAPH-12
 URL: https://issues.apache.org/jira/browse/GIRAPH-12
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Avery Ching
Assignee: Hyunsik Choi
Priority: Minor
 Attachments: GIRAPH-12_1.patch


 Currently every worker will start up a thread to communicate with every other 
 workers.  Hadoop RPC is used for communication.  For instance if there are 
 400 workers, each worker will create 400 threads.  This ends up using a lot 
 of memory, even with the option  
 -Dmapred.child.java.opts=-Xss64k.  
 It would be good to investigate using frameworks like Netty or custom roll 
 our own to improve this situation.  By moving away from Hadoop RPC, we would 
 also make compatibility of different Hadoop versions easier.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-13 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103422#comment-13103422
 ] 

Avery Ching commented on GIRAPH-12:
---

Hyunsik, just to update, I grabbed your patch and it passed unittest on my 
machine.  Then I ran it on a cluster at Yahoo!.  

I didn't have time to make a messaging benchmark, so I ran PageRankBenchmark.  
I ran with 100 workers, 1 M vertices, 3 supersteps, and 10 edges per vertex.

Here are 2 runs with the original code:

11/09/13 07:02:08 INFO mapred.JobClient:   Giraph Timers
11/09/13 07:02:08 INFO mapred.JobClient: Total (milliseconds)=46709
11/09/13 07:02:08 INFO mapred.JobClient: Superstep 3 (milliseconds)=1682
11/09/13 07:02:08 INFO mapred.JobClient: Setup (milliseconds)=3228
11/09/13 07:02:08 INFO mapred.JobClient: Shutdown (milliseconds)=1223
11/09/13 07:02:08 INFO mapred.JobClient: Vertex input superstep 
(milliseconds)=3578
11/09/13 07:02:08 INFO mapred.JobClient: Superstep 0 (milliseconds)=16222
11/09/13 07:02:08 INFO mapred.JobClient: Superstep 2 (milliseconds)=12302
11/09/13 07:02:08 INFO mapred.JobClient: Superstep 1 (milliseconds)=8467

13 07:14:51 INFO mapred.JobClient:   Giraph Timers
11/09/13 07:14:51 INFO mapred.JobClient: Total (milliseconds)=51475
11/09/13 07:14:51 INFO mapred.JobClient: Superstep 3 (milliseconds)=1348
11/09/13 07:14:51 INFO mapred.JobClient: Setup (milliseconds)=7233
11/09/13 07:14:51 INFO mapred.JobClient: Shutdown (milliseconds)=884
11/09/13 07:14:51 INFO mapred.JobClient: Vertex input superstep 
(milliseconds)=3284
11/09/13 07:14:51 INFO mapred.JobClient: Superstep 0 (milliseconds)=22213
11/09/13 07:14:51 INFO mapred.JobClient: Superstep 2 (milliseconds)=8553
11/09/13 07:14:51 INFO mapred.JobClient: Superstep 1 (milliseconds)=7955


Here are 2 runs with your code:

11/09/13 07:06:56 INFO mapred.JobClient:   Giraph Timers
11/09/13 07:06:56 INFO mapred.JobClient: Total (milliseconds)=51935
11/09/13 07:06:56 INFO mapred.JobClient: Superstep 3 (milliseconds)=1150
11/09/13 07:06:56 INFO mapred.JobClient: Setup (milliseconds)=3338
11/09/13 07:06:56 INFO mapred.JobClient: Shutdown (milliseconds)=833
11/09/13 07:06:56 INFO mapred.JobClient: Vertex input superstep 
(milliseconds)=3401
11/09/13 07:06:56 INFO mapred.JobClient: Superstep 0 (milliseconds)=17297
11/09/13 07:06:56 INFO mapred.JobClient: Superstep 2 (milliseconds)=14384
11/09/13 07:06:56 INFO mapred.JobClient: Superstep 1 (milliseconds)=11528

11/09/13 07:12:09 INFO mapred.JobClient:   Giraph Timers
11/09/13 07:12:09 INFO mapred.JobClient: Total (milliseconds)=51985
11/09/13 07:12:09 INFO mapred.JobClient: Superstep 3 (milliseconds)=1362
11/09/13 07:12:09 INFO mapred.JobClient: Setup (milliseconds)=3776
11/09/13 07:12:09 INFO mapred.JobClient: Shutdown (milliseconds)=710
11/09/13 07:12:09 INFO mapred.JobClient: Vertex input superstep 
(milliseconds)=3771
11/09/13 07:12:09 INFO mapred.JobClient: Superstep 0 (milliseconds)=17741
11/09/13 07:12:09 INFO mapred.JobClient: Superstep 2 (milliseconds)=13068
11/09/13 07:12:09 INFO mapred.JobClient: Superstep 1 (milliseconds)=11551

In my limited testing, numbers aren't too different.  I also see that the 
connections are maintained throughout the application run as you mentioned.  So 
the only tradeoff is possibly the reduced parallelization of message sending 
(user chosen vs all threads).  I like the approach and think it's an 
improvement (controllable threads).  Perhaps the only comment is that regarding 
the following code block.

for(PeerConnection pc : peerConnections.values()) {
futures.add(executor.submit(new PeerFlushExecutor(pc)));
}

Probably would be good to randomize the PeerConnection objects to avoid 
hotspots on the receiving side?


 Investigate communication improvements
 --

 Key: GIRAPH-12
 URL: https://issues.apache.org/jira/browse/GIRAPH-12
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Avery Ching
Assignee: Hyunsik Choi
Priority: Minor
 Attachments: GIRAPH-12_1.patch


 Currently every worker will start up a thread to communicate with every other 
 workers.  Hadoop RPC is used for communication.  For instance if there are 
 400 workers, each worker will create 400 threads.  This ends up using a lot 
 of memory, even with the option  
 -Dmapred.child.java.opts=-Xss64k.  
 It would be good to investigate using frameworks like Netty or custom roll 
 our own to improve this situation.  By moving away from Hadoop RPC, we would 
 also make compatibility of different Hadoop versions easier.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: 

[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-13 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13104230#comment-13104230
 ] 

Hyunsik Choi commented on GIRAPH-12:


Sorry for late response. Actually, I was on vacation between September 12-13.

Thank you for your testing. As you pointed out, the current patch incurs 
hotspots on the receiving side. I will add code lines to randomize flushes to 
mitigate skewness problem and some tweaks to improve the performance.



 Investigate communication improvements
 --

 Key: GIRAPH-12
 URL: https://issues.apache.org/jira/browse/GIRAPH-12
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Avery Ching
Assignee: Hyunsik Choi
Priority: Minor
 Attachments: GIRAPH-12_1.patch


 Currently every worker will start up a thread to communicate with every other 
 workers.  Hadoop RPC is used for communication.  For instance if there are 
 400 workers, each worker will create 400 threads.  This ends up using a lot 
 of memory, even with the option  
 -Dmapred.child.java.opts=-Xss64k.  
 It would be good to investigate using frameworks like Netty or custom roll 
 our own to improve this situation.  By moving away from Hadoop RPC, we would 
 also make compatibility of different Hadoop versions easier.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-13 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13104270#comment-13104270
 ] 

Avery Ching commented on GIRAPH-12:
---

Sound great, hope you had a nice vacation. Perhaps if you have some extra time, 
could you draft up a message passing benchmark that could be useful to compare 
you final implementation against the original?

 Investigate communication improvements
 --

 Key: GIRAPH-12
 URL: https://issues.apache.org/jira/browse/GIRAPH-12
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Avery Ching
Assignee: Hyunsik Choi
Priority: Minor
 Attachments: GIRAPH-12_1.patch


 Currently every worker will start up a thread to communicate with every other 
 workers.  Hadoop RPC is used for communication.  For instance if there are 
 400 workers, each worker will create 400 threads.  This ends up using a lot 
 of memory, even with the option  
 -Dmapred.child.java.opts=-Xss64k.  
 It would be good to investigate using frameworks like Netty or custom roll 
 our own to improve this situation.  By moving away from Hadoop RPC, we would 
 also make compatibility of different Hadoop versions easier.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-12 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102682#comment-13102682
 ] 

Hyunsik Choi commented on GIRAPH-12:


Like the current PeerThread, initially each PeerConnection gets one established 
RPC proxy. These connections are kept during whole processing. So, there is no 
connection overhead. 

If you test this code on Yahoo!'s clusters, I'll appreciate your help. And, 
next week I can access to my lab's hadoop cluster. At that time, I'll also do 
some tests.

 Investigate communication improvements
 --

 Key: GIRAPH-12
 URL: https://issues.apache.org/jira/browse/GIRAPH-12
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Avery Ching
Assignee: Hyunsik Choi
Priority: Minor
 Attachments: GIRAPH-12_1.patch


 Currently every worker will start up a thread to communicate with every other 
 workers.  Hadoop RPC is used for communication.  For instance if there are 
 400 workers, each worker will create 400 threads.  This ends up using a lot 
 of memory, even with the option  
 -Dmapred.child.java.opts=-Xss64k.  
 It would be good to investigate using frameworks like Netty or custom roll 
 our own to improve this situation.  By moving away from Hadoop RPC, we would 
 also make compatibility of different Hadoop versions easier.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-11 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102410#comment-13102410
 ] 

Avery Ching commented on GIRAPH-12:
---

Hyunsik, this sounds good to be able to adjust the number of threads.  The 
tradeoff would be that perhaps some overhead of starting up and shutting down 
connections to the RPC servers?  I'll see if I can test this code on Yahoo!'s 
clusters.  I'll add a messaging benchmark with a configurable number of 
messages to random vertices and then do some tests.  What do you think?

 Investigate communication improvements
 --

 Key: GIRAPH-12
 URL: https://issues.apache.org/jira/browse/GIRAPH-12
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Avery Ching
Assignee: Hyunsik Choi
Priority: Minor
 Attachments: GIRAPH-12_1.patch


 Currently every worker will start up a thread to communicate with every other 
 workers.  Hadoop RPC is used for communication.  For instance if there are 
 400 workers, each worker will create 400 threads.  This ends up using a lot 
 of memory, even with the option  
 -Dmapred.child.java.opts=-Xss64k.  
 It would be good to investigate using frameworks like Netty or custom roll 
 our own to improve this situation.  By moving away from Hadoop RPC, we would 
 also make compatibility of different Hadoop versions easier.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-06 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13098512#comment-13098512
 ] 

Avery Ching commented on GIRAPH-12:
---

Jake from Twitter also recommended thinking about using Finagle.  His 
description:

A fault tolerant, protocol-agnostic RPC system based on Netty [which I see is 
already under consideration], written in scala, but with very mature java 
bindings too).  We use it internally at Twitter for clusters of mid-tier 
servers which have many dozens of machines talking to hundreds of other 
machines, without blowing up on thread-stack or using a gazillion threads.  
It's mavenized, so it's easy to try out.

 Investigate communication improvements
 --

 Key: GIRAPH-12
 URL: https://issues.apache.org/jira/browse/GIRAPH-12
 Project: Giraph
  Issue Type: Improvement
Reporter: Avery Ching
Assignee: Hyunsik Choi
Priority: Minor

 Currently every worker will start up a thread to communicate with every other 
 workers.  Hadoop RPC is used for communication.  For instance if there are 
 400 workers, each worker will create 400 threads.  This ends up using a lot 
 of memory, even with the option  
 -Dmapred.child.java.opts=-Xss64k.  
 It would be good to investigate using frameworks like Netty or custom roll 
 our own to improve this situation.  By moving away from Hadoop RPC, we would 
 also make compatibility of different Hadoop versions easier.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-08-28 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13092622#comment-13092622
 ] 

Hyunsik Choi commented on GIRAPH-12:


Netty seems to be good solution. Now, Apache Avro provides the netty-based 
server.
If we use Avro as a rpc mechanism among workers, we could solve this problem 
easily.

 Investigate communication improvements
 --

 Key: GIRAPH-12
 URL: https://issues.apache.org/jira/browse/GIRAPH-12
 Project: Giraph
  Issue Type: Improvement
Reporter: Avery Ching
Priority: Minor

 Currently every worker will start up a thread to communicate with every other 
 workers.  Hadoop RPC is used for communication.  For instance if there are 
 400 workers, each worker will create 400 threads.  This ends up using a lot 
 of memory, even with the option  
 -Dmapred.child.java.opts=-Xss64k.  
 It would be good to investigate using frameworks like Netty or custom roll 
 our own to improve this situation.  By moving away from Hadoop RPC, we would 
 also make compatibility of different Hadoop versions easier.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira