subject:"\[jira\] \[Commented\] $GIRAPH\-12$ Investigate communication improvements"

[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-10-04 Thread Hyunsik Choi (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13120604#comment-13120604
]

Hyunsik Choi commented on GIRAPH-12:

Thank you for review. I agree with your opinion. The virtual memory size seems
very important in 32-bit JVMs. I only considered 64-bit JVMs. I overlooked that
point.

Anyway, this patch allows users to control the number of threads. It is more
helpful in restricted environment (e.g., 32-bit JVM).

Investigate communication improvements
--

Key: GIRAPH-12
URL: https://issues.apache.org/jira/browse/GIRAPH-12
Project: Giraph
Issue Type: Improvement
Components: bsp
Reporter: Avery Ching
Assignee: Hyunsik Choi
Priority: Minor
Attachments: GIRAPH-12_1.patch, GIRAPH-12_2.patch, GIRAPH-12_3.patch

Currently every worker will start up a thread to communicate with every other
workers. Hadoop RPC is used for communication. For instance if there are
400 workers, each worker will create 400 threads. This ends up using a lot
of memory, even with the option
-Dmapred.child.java.opts=-Xss64k.
It would be good to investigate using frameworks like Netty or custom roll
our own to improve this situation. By moving away from Hadoop RPC, we would
also make compatibility of different Hadoop versions easier.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-28 Thread Avery Ching (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116259#comment-13116259
]

Avery Ching commented on GIRAPH-12:
---

I wonder if Runtime methods measure the stack sizes of the threads consuming
all that memory? My guess is no. Since we're using less threads, we should
have less stacks consuming all the minimum memory. I agree that the heap
memory won't change much in going to your approach. I'm not sure how to prove
that thread stack memory is being reduced if Runtime fails to capture this
though.

One crude way would simply be to increase the heap space with your test until
you find a maximum heap size that can be used in your code and the original
code. If your code can reach a higher heap allocation, than that should prove
a memory win (more memory can be used for the heap). Here's some arguments to
try out that approach if you're interested:

-Dmapred.child.java.opts=-Xms2750m -Xmx2750m

As you bump up the -Xms and -Xmx values simultaneously, eventually your job
won't start and hopefully your changes enable a higher limit...

Investigate communication improvements
--

[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-28 Thread Hyunsik Choi (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116960#comment-13116960
]

Hyunsik Choi commented on GIRAPH-12:

Dmitriy,

Thank you for your comments. Regardless of the problem caused by thread stack
size, those approaches look promising. Especially, spilling messages to disk
looks necessary so that Giraph deals with really large graph data. Otherwise,
out of memory may occur when the message generating rate are higher than
network bandwidth. I'll open a separate issue about this.

Investigate communication improvements
--

[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-28 Thread Hyunsik Choi (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116961#comment-13116961
]

Hyunsik Choi commented on GIRAPH-12:

Avery,

Thank you for your review. You are right. Runtime's totalMem() and freeMem()
methods doesn't measure stack sizes. I'm sure of it after testing the below
code.

https://gist.github.com/1249761

I have looked for how to measure the stack size of a java application. I could
not find about that. Still, I'm not sure how to show that thread stack memory
is reduced by the thread pool approach. Now, your way seems a only method to
prove them.

However, I'm curious to know how much thread overhead is in terms of memory
consumption. Before I try your approach. I conducted some simple experiments.

I used the above source code to investigate the memory usage of threads. This
is executed on a machine with intel i3, ubuntu 11.10 (64bit), and 8G memory. I
measure their memory by using 'top'. 'top' shows several columns including VIRT
and RES, and SHR. We only need to focus RES, resident memory. RES includes all
resident memory usages, such as heap and stack. I could know this from this
page (http://goo.gl/JE7fD).

Firstly, I executed the above code with 1000 threads and without a jvm option
'-Xss'. Accoring to this page (http://goo.gl/sz2qM), the default stack size
'Xss' is 1024k on the jvm of 64bit linux. After all threads are created, I
executed 'top' to print the memory usages as follows:

1k threads with default thread stack size.
{noformat}
VIRT RES SHR
9163 hyunsik 20 0 3366m 30m 8296 S 18 0.4 0:01.52 java
{noformat}

2k threads with default thread stack size.
{noformat}
VIRT RES SHR
11223 hyunsik 20 0 4434m 46m 8340 S 40 0.6 0:04.11 java
{noformat}

With 1k and 2k threads, that program consumes only 30 and 46 mega bytes
respectively. The memory usage of threads are smaller than I expected. I wonder
if thread stack size is the main cause of the memory problem that we have faced.

Besides, the default stack size is 1024k. The thread stack size seems to not
affect RES. I had more tests with 'Xss' in order to investigate more the thread
stack size.

1k threads with '-Xss4096k'.
{noformat}
28301 hyunsik 20 0 6380m 30m 8292 S 17 0.4 0:05.25 java
{noformat}

2k threads with '-Xss4096k'
{noformat}
29326 hyunsik 20 0 10.1g 46m 8300 S 38 0.6 0:03.42 java
{noformat}

VIRT surely is affected by '-Xss', but RES is not. 'Xss' seems the maximum
stack size of each thread because it doesn't affect RES.

What do you think about that?

Investigate communication improvements
--

[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-28 Thread Avery Ching (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13117011#comment-13117011
]

Avery Ching commented on GIRAPH-12:
---

If the default stack size is 1 MB, then for instance if you have 1024 workers,
you are talking about 1 GB just wasted for thread stack space per node. The
aggregate wasted memory would be 1 GB * 1024 = 1 TB, that's a lot of memory =).

The issue is that many clusters (including Yahoo!'s) have are running only
32-bit JVMs. So if you are using 1 GB just for stack space, you only get so
much left for heap (graph + messages). I think this should help quite a bit
until GIRAPH-37 is taken on.

Can you run the unittests against a real Hadoop instance as well? Then I'd say
+1, unless someone disagrees.

Investigate communication improvements
--

[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-27 Thread Hyunsik Choi (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116166#comment-13116166
]

Hyunsik Choi commented on GIRAPH-12:

Avery,
Thank you for your comments. I decided to use Runtime. It seems to be enough to
investigate this issue.

Again, I conducted a benchmark to measure memory consumption with
RandomMessageBenchmark as follows:

{noformat}
hadoop jar giraph-0.70-jar-with-dependencies.jar
org.apache.giraph.benchmark.RandomMessageBenchmark -e 2 -s 3 -w 20 -b 4 -n 150
-V ${V} -v -f ${f}
{noformat}
, where 'f' option indicates the number of threads of thread pool. And, I
changed the the thread executor as FixedThreadPool class.

I conducted two times for every experiment and I got the average of them. You
can see the results from the below link:
http://goo.gl/arP62

This experiments was conducted in two cluaster nodes, each of which has 24
cores and 64GB mem. They are connected each other over 1Gbps ethernet. I
measured the memory footprints from Runtime in GraphMapper as Avery recommended.

In sum, the thread pool approach is better than original approach in terms of
processing times. I guess that this is because the thread pool approach reduces
the context switching cost and narrow the synchronization area.

Unfortunately, however, the thread pool approach doesn't reduce the memory
consumption. This is the main focus of this issue. Rather, this approach needs
slightly more memory as shown in Figure 3 and 4. However, we need to note the
experiments with f = 5 and f = 20. In these experiments, the number of threads
has small effect on the memory consumption.

We have faced the memory problem. We may need to approach this problem from
another aspect.
I think that this problem may be mainly caused by the current message flushing
strategy.

In current implementation, outgoing messages are transmitted to other peers by
only two cases:
1) When the number of outgoing messages for a specific peer exceeds the a
threshold (i.e., maxSize), the outgoing messages for the peer are transmitted
to the peer.
2) When one super step is finished, the entire messages are flushed to other
peers.

Flush (case 2) is only triggered at the end of superstep. During processing,
the message flushing only depends on the case 1. This may be not effective
because the case 1 only consider the the number of messages for each specific
peer. It never take account of the real memory occupation. If destinations of
outgoing messages are uniform, out of memory may occur before any 'case 1' is
triggered.

To overcome this problem, we may need more eager message flushing strategy or
some approach to store overflow messages into disk.

Let me know what you think.

Investigate communication improvements
--

[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-26 Thread Hyunsik Choi (JIRA)

[
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13114646#comment-13114646
]

Hyunsik Choi commented on GIRAPH-12:

I'm sorry too for late response. I was out of town due to my personal work. I
just come to home.
The previous experiments are too simple. Actually, that experiment cannot show
any meaningful result. I sorry for that. As to the question 3, this issue was
originated from the memory usage. I should have measured the memory usage.
Sooner, I'll answer your 3 questions :)

Investigate communication improvements
--

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-26 Thread Hyunsik Choi (JIRA)

[
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13114709#comment-13114709
]

Hyunsik Choi commented on GIRAPH-12:

I have thought about question 3. That is, how we can measure the memory usage
while Giraph is running.

Probably, the most basic way is to use the hadoop metrics
(http://www.cloudera.com/blog/2009/03/hadoop-metrics/). However, this way needs
to change _hadoop-metrics.properties_ file. So, it may be restricted for most
large clusters; e.g., Yahoo! cluster that Avery can access.

If the above way is impossible, we can implement a thread class mimic to hadoop
metric in order to measure the memory usage on JVM periodically and sends that
to a specific remote server.

What do you think about that?

Investigate communication improvements
--

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-26 Thread Avery Ching (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13115239#comment-13115239
]

Avery Ching commented on GIRAPH-12:
---

Are hadoop metrics better than simply using Runtime? We do this here:

https://github.com/apache/giraph/blob/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java#L559

Or perhaps
http://download.oracle.com/javase/6/docs/api/java/lang/management/package-summary.html?
I haven't used it, but it's been suggested on stack overflow.

http://download.oracle.com/javase/6/docs/api/java/lang/management/MemoryMXBean.html

Investigate communication improvements
--

[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-17 Thread Hyunsik Choi (JIRA)

[
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13107061#comment-13107061
]

Hyunsik Choi commented on GIRAPH-12:

(a note for sharing)

Graph mutation functions (e.g., addVertexRequest, addEdgeRequest..) directly
invoke RPC functions.
This approach incurs RPC round-trip overheads during processing. Especially
when many workers try to mutate vertices or edges, synchronization overheads
may also occur in receiving sides. It may be severe as the size of cluster
increases.

If we change graph mutation API to asynchronous messages, it would be more
efficient. If possible, graph mutation messages and value messages (i.e.,
sendMsg) can be integrated into one message passing API.

Investigate communication improvements
--

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-17 Thread Hyunsik Choi (JIRA)

[
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13107063#comment-13107063
]

Hyunsik Choi commented on GIRAPH-12:

(a note for sharing)

In current implementation, outgoing messages are sent to other peers in only
two triggers:
1) When the number of outgoing messages for a specific peer exceeds the a
threshold (i.e., maxSize), the outgoing messages for the peer are transmitted
to the peer.
2) When one super step is finished, the entire messages are flushed to other
peers.

In the case 1, however, the current implementation only consider the number of
messages instead of the size of messages. The outgoing messages reside in main
memory until they are sent to other peers. It is another important factor to
consume main memory. It would be good to consider not only the number of
messages but also the size of messages.

Investigate communication improvements
--

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-17 Thread Avery Ching (JIRA)


[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13107189#comment-13107189
 ] 

Avery Ching commented on GIRAPH-12:
---

I am assigned?  Huh??

 Investigate communication improvements
 --

 Key: GIRAPH-12
 URL: https://issues.apache.org/jira/browse/GIRAPH-12
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Avery Ching
Assignee: Avery Ching
Priority: Minor
 Attachments: GIRAPH-12_1.patch


 Currently every worker will start up a thread to communicate with every other 
 workers.  Hadoop RPC is used for communication.  For instance if there are 
 400 workers, each worker will create 400 threads.  This ends up using a lot 
 of memory, even with the option  
 -Dmapred.child.java.opts=-Xss64k.  
 It would be good to investigate using frameworks like Netty or custom roll 
 our own to improve this situation.  By moving away from Hadoop RPC, we would 
 also make compatibility of different Hadoop versions easier.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-16 Thread Jake Mannix (JIRA)


[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13106842#comment-13106842
 ] 

Jake Mannix commented on GIRAPH-12:
---

Hey Hyunsik, if you're going to write a benchmark for the RPC stuff, that 
would be totally great.  I'd like to start playing around with trying Finagle 
in here, and we can compare notes on what kinds of techniques among both 
approaches work better, unless I'd be stepping on your toes by doing so...

 Investigate communication improvements
 --

 Key: GIRAPH-12
 URL: https://issues.apache.org/jira/browse/GIRAPH-12
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Avery Ching
Assignee: Hyunsik Choi
Priority: Minor
 Attachments: GIRAPH-12_1.patch


 Currently every worker will start up a thread to communicate with every other 
 workers.  Hadoop RPC is used for communication.  For instance if there are 
 400 workers, each worker will create 400 threads.  This ends up using a lot 
 of memory, even with the option  
 -Dmapred.child.java.opts=-Xss64k.  
 It would be good to investigate using frameworks like Netty or custom roll 
 our own to improve this situation.  By moving away from Hadoop RPC, we would 
 also make compatibility of different Hadoop versions easier.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-14 Thread Hyunsik Choi (JIRA)


[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13104340#comment-13104340
 ] 

Hyunsik Choi commented on GIRAPH-12:


You mean that we need some benchmark program to test the performance and 
scalability of message passing methods. If so, I'll add two benchmarking 
programs, which are sending messages to peers in random and skewed distribution 
respectively. For this, I'll create another issue.

Let me know what you think :)

 Investigate communication improvements
 --

 Key: GIRAPH-12
 URL: https://issues.apache.org/jira/browse/GIRAPH-12
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Avery Ching
Assignee: Hyunsik Choi
Priority: Minor
 Attachments: GIRAPH-12_1.patch


 Currently every worker will start up a thread to communicate with every other 
 workers.  Hadoop RPC is used for communication.  For instance if there are 
 400 workers, each worker will create 400 threads.  This ends up using a lot 
 of memory, even with the option  
 -Dmapred.child.java.opts=-Xss64k.  
 It would be good to investigate using frameworks like Netty or custom roll 
 our own to improve this situation.  By moving away from Hadoop RPC, we would 
 also make compatibility of different Hadoop versions easier.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-13 Thread Avery Ching (JIRA)


[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103422#comment-13103422
 ] 

Avery Ching commented on GIRAPH-12:
---

Hyunsik, just to update, I grabbed your patch and it passed unittest on my 
machine.  Then I ran it on a cluster at Yahoo!.  

I didn't have time to make a messaging benchmark, so I ran PageRankBenchmark.  
I ran with 100 workers, 1 M vertices, 3 supersteps, and 10 edges per vertex.

Here are 2 runs with the original code:

11/09/13 07:02:08 INFO mapred.JobClient:   Giraph Timers
11/09/13 07:02:08 INFO mapred.JobClient: Total (milliseconds)=46709
11/09/13 07:02:08 INFO mapred.JobClient: Superstep 3 (milliseconds)=1682
11/09/13 07:02:08 INFO mapred.JobClient: Setup (milliseconds)=3228
11/09/13 07:02:08 INFO mapred.JobClient: Shutdown (milliseconds)=1223
11/09/13 07:02:08 INFO mapred.JobClient: Vertex input superstep 
(milliseconds)=3578
11/09/13 07:02:08 INFO mapred.JobClient: Superstep 0 (milliseconds)=16222
11/09/13 07:02:08 INFO mapred.JobClient: Superstep 2 (milliseconds)=12302
11/09/13 07:02:08 INFO mapred.JobClient: Superstep 1 (milliseconds)=8467

13 07:14:51 INFO mapred.JobClient:   Giraph Timers
11/09/13 07:14:51 INFO mapred.JobClient: Total (milliseconds)=51475
11/09/13 07:14:51 INFO mapred.JobClient: Superstep 3 (milliseconds)=1348
11/09/13 07:14:51 INFO mapred.JobClient: Setup (milliseconds)=7233
11/09/13 07:14:51 INFO mapred.JobClient: Shutdown (milliseconds)=884
11/09/13 07:14:51 INFO mapred.JobClient: Vertex input superstep 
(milliseconds)=3284
11/09/13 07:14:51 INFO mapred.JobClient: Superstep 0 (milliseconds)=22213
11/09/13 07:14:51 INFO mapred.JobClient: Superstep 2 (milliseconds)=8553
11/09/13 07:14:51 INFO mapred.JobClient: Superstep 1 (milliseconds)=7955


Here are 2 runs with your code:

11/09/13 07:06:56 INFO mapred.JobClient:   Giraph Timers
11/09/13 07:06:56 INFO mapred.JobClient: Total (milliseconds)=51935
11/09/13 07:06:56 INFO mapred.JobClient: Superstep 3 (milliseconds)=1150
11/09/13 07:06:56 INFO mapred.JobClient: Setup (milliseconds)=3338
11/09/13 07:06:56 INFO mapred.JobClient: Shutdown (milliseconds)=833
11/09/13 07:06:56 INFO mapred.JobClient: Vertex input superstep 
(milliseconds)=3401
11/09/13 07:06:56 INFO mapred.JobClient: Superstep 0 (milliseconds)=17297
11/09/13 07:06:56 INFO mapred.JobClient: Superstep 2 (milliseconds)=14384
11/09/13 07:06:56 INFO mapred.JobClient: Superstep 1 (milliseconds)=11528

11/09/13 07:12:09 INFO mapred.JobClient:   Giraph Timers
11/09/13 07:12:09 INFO mapred.JobClient: Total (milliseconds)=51985
11/09/13 07:12:09 INFO mapred.JobClient: Superstep 3 (milliseconds)=1362
11/09/13 07:12:09 INFO mapred.JobClient: Setup (milliseconds)=3776
11/09/13 07:12:09 INFO mapred.JobClient: Shutdown (milliseconds)=710
11/09/13 07:12:09 INFO mapred.JobClient: Vertex input superstep 
(milliseconds)=3771
11/09/13 07:12:09 INFO mapred.JobClient: Superstep 0 (milliseconds)=17741
11/09/13 07:12:09 INFO mapred.JobClient: Superstep 2 (milliseconds)=13068
11/09/13 07:12:09 INFO mapred.JobClient: Superstep 1 (milliseconds)=11551

In my limited testing, numbers aren't too different.  I also see that the 
connections are maintained throughout the application run as you mentioned.  So 
the only tradeoff is possibly the reduced parallelization of message sending 
(user chosen vs all threads).  I like the approach and think it's an 
improvement (controllable threads).  Perhaps the only comment is that regarding 
the following code block.

for(PeerConnection pc : peerConnections.values()) {
futures.add(executor.submit(new PeerFlushExecutor(pc)));
}

Probably would be good to randomize the PeerConnection objects to avoid 
hotspots on the receiving side?


 Investigate communication improvements
 --

 Key: GIRAPH-12
 URL: https://issues.apache.org/jira/browse/GIRAPH-12
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Avery Ching
Assignee: Hyunsik Choi
Priority: Minor
 Attachments: GIRAPH-12_1.patch


 Currently every worker will start up a thread to communicate with every other 
 workers.  Hadoop RPC is used for communication.  For instance if there are 
 400 workers, each worker will create 400 threads.  This ends up using a lot 
 of memory, even with the option  
 -Dmapred.child.java.opts=-Xss64k.  
 It would be good to investigate using frameworks like Netty or custom roll 
 our own to improve this situation.  By moving away from Hadoop RPC, we would 
 also make compatibility of different Hadoop versions easier.

--
This message is automatically generated by JIRA.
For more information on JIRA, see:

[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-12 Thread Hyunsik Choi (JIRA)


[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102682#comment-13102682
 ] 

Hyunsik Choi commented on GIRAPH-12:


Like the current PeerThread, initially each PeerConnection gets one established 
RPC proxy. These connections are kept during whole processing. So, there is no 
connection overhead. 

If you test this code on Yahoo!'s clusters, I'll appreciate your help. And, 
next week I can access to my lab's hadoop cluster. At that time, I'll also do 
some tests.

 Investigate communication improvements
 --

 Key: GIRAPH-12
 URL: https://issues.apache.org/jira/browse/GIRAPH-12
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Avery Ching
Assignee: Hyunsik Choi
Priority: Minor
 Attachments: GIRAPH-12_1.patch


 Currently every worker will start up a thread to communicate with every other 
 workers.  Hadoop RPC is used for communication.  For instance if there are 
 400 workers, each worker will create 400 threads.  This ends up using a lot 
 of memory, even with the option  
 -Dmapred.child.java.opts=-Xss64k.  
 It would be good to investigate using frameworks like Netty or custom roll 
 our own to improve this situation.  By moving away from Hadoop RPC, we would 
 also make compatibility of different Hadoop versions easier.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-08-28 Thread Hyunsik Choi (JIRA)


[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13092622#comment-13092622
 ] 

Hyunsik Choi commented on GIRAPH-12:


Netty seems to be good solution. Now, Apache Avro provides the netty-based 
server.
If we use Avro as a rpc mechanism among workers, we could solve this problem 
easily.

 Investigate communication improvements
 --

 Key: GIRAPH-12
 URL: https://issues.apache.org/jira/browse/GIRAPH-12
 Project: Giraph
  Issue Type: Improvement
Reporter: Avery Ching
Priority: Minor

 Currently every worker will start up a thread to communicate with every other 
 workers.  Hadoop RPC is used for communication.  For instance if there are 
 400 workers, each worker will create 400 threads.  This ends up using a lot 
 of memory, even with the option  
 -Dmapred.child.java.opts=-Xss64k.  
 It would be good to investigate using frameworks like Netty or custom roll 
 our own to improve this situation.  By moving away from Hadoop RPC, we would 
 also make compatibility of different Hadoop versions easier.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-12) Investigate communication improvements

[jira] [Commented] (GIRAPH-12) Investigate communication improvements

[jira] [Commented] (GIRAPH-12) Investigate communication improvements

[jira] [Commented] (GIRAPH-12) Investigate communication improvements

[jira] [Commented] (GIRAPH-12) Investigate communication improvements

[jira] [Commented] (GIRAPH-12) Investigate communication improvements

[jira] [Commented] (GIRAPH-12) Investigate communication improvements

[jira] [Commented] (GIRAPH-12) Investigate communication improvements

[jira] [Commented] (GIRAPH-12) Investigate communication improvements

[jira] [Commented] (GIRAPH-12) Investigate communication improvements

[jira] [Commented] (GIRAPH-12) Investigate communication improvements

[jira] [Commented] (GIRAPH-12) Investigate communication improvements

[jira] [Commented] (GIRAPH-12) Investigate communication improvements

[jira] [Commented] (GIRAPH-12) Investigate communication improvements

[jira] [Commented] (GIRAPH-12) Investigate communication improvements

[jira] [Commented] (GIRAPH-12) Investigate communication improvements

[jira] [Commented] (GIRAPH-12) Investigate communication improvements

17 matches

Site Navigation

Mail list logo

Footer information