[jira] [Updated] (GIRAPH-32) Implement benchmarks to evaluate the performance of message passing

2011-09-17 Thread Hyunsik Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-32?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyunsik Choi updated GIRAPH-32:
---

Attachment: GIRAPH-32.patch

I have attached a patch for this issue.

This patch adds a benchmark class. In this benchmark, the compute function of 
each vertex sends a meaningless message along every edge of that vertex. 
My intent is for the benchmark to send messages to random workers, so I reuse 
PseudoRandomVertexInputFormat, which already generates random edges.

The benchmark lets users set the message size in bytes and the number of 
messages sent per edge, because I think these are the basic factors for 
evaluating the behavior and performance of a message delivery system. Users 
can also adjust the number of edges per vertex instead of the number of 
messages sent per edge, which makes the sending pattern either more spread 
out or more skewed.
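
As a rough illustration of these knobs, here is a minimal, framework-free sketch in Java. It is not the class in GIRAPH-32.patch and does not use the Giraph API; the class name, method names, and parameters are all hypothetical. It only models one superstep and counts the resulting traffic.

```java
import java.util.Random;

/** Hypothetical stand-alone model of the traffic pattern; not the class in the patch. */
class MessageBenchmarkSketch {

    /** Bytes one vertex sends in a single superstep. */
    static long bytesSentPerVertex(int edgesPerVertex, int messagesPerEdge, int messageBytes) {
        return (long) edgesPerVertex * messagesPerEdge * messageBytes;
    }

    /**
     * Simulates one superstep in which every vertex sends messagesPerEdge
     * messages along each of its edges. Edge targets are drawn uniformly at
     * random, standing in for randomly generated edges.
     * Returns the number of messages received by each vertex.
     */
    static long[] simulateSuperstep(int vertices, int edgesPerVertex,
                                    int messagesPerEdge, long seed) {
        Random random = new Random(seed);
        long[] received = new long[vertices];
        for (int v = 0; v < vertices; v++) {
            for (int e = 0; e < edgesPerVertex; e++) {
                int target = random.nextInt(vertices);
                received[target] += messagesPerEdge;
            }
        }
        return received;
    }

    /** Sums the per-vertex counts. */
    static long total(long[] counts) {
        long sum = 0;
        for (long c : counts) {
            sum += c;
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] received = simulateSuperstep(1000, 4, 2, 42L);
        System.out.println("messages delivered: " + total(received));
        System.out.println("bytes sent per vertex: " + bytesSentPerVertex(4, 2, 16));
    }
}
```

In this model, raising messagesPerEdge while lowering edgesPerVertex keeps the total message count the same but concentrates each sender's traffic on fewer targets.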

Can anyone review this?

> Implement benchmarks to evaluate the performance of message passing 
> 
>
> Key: GIRAPH-32
> URL: https://issues.apache.org/jira/browse/GIRAPH-32
> Project: Giraph
>  Issue Type: Task
>  Components: benchmark
>Reporter: Hyunsik Choi
>Assignee: Hyunsik Choi
> Fix For: 0.70.0
>
> Attachments: GIRAPH-32.patch
>
>
> The message passing framework plays an important role in Giraph.
> We need some benchmark programs to evaluate improvements to the 
> message passing method.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-17 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107061#comment-13107061
 ] 

Hyunsik Choi commented on GIRAPH-12:


(a note for sharing)

Graph mutation functions (e.g., addVertexRequest, addEdgeRequest) directly 
invoke RPC functions. 
This approach incurs RPC round-trip overhead during processing. In particular, 
when many workers try to mutate vertices or edges, synchronization overhead 
may also occur on the receiving side. It may become severe as the size of the 
cluster increases.

If we changed the graph mutation API to use asynchronous messages, it would be 
more efficient. If possible, graph mutation messages and value messages (i.e., 
sendMsg) could be integrated into one message passing API.
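
One way to picture the suggestion is a single outbox that queues mutation requests and value messages together and ships them in batches. The sketch below is only an illustration under that assumption: the method names sendMsg, addVertexRequest, and addEdgeRequest are borrowed from the note above, but their signatures, the Envelope type, and the flush behavior are hypothetical and do not reflect the Giraph code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: mutations and value messages share one asynchronous outbox. */
class UnifiedOutboxSketch {

    enum Kind { VALUE, ADD_VERTEX, ADD_EDGE }

    /** One queued item; the payload is an opaque string to keep the sketch short. */
    static final class Envelope {
        final Kind kind;
        final long targetVertex;
        final String payload;

        Envelope(Kind kind, long targetVertex, String payload) {
            this.kind = kind;
            this.targetVertex = targetVertex;
            this.payload = payload;
        }
    }

    private final List<Envelope> pending = new ArrayList<>();
    private int batches = 0;

    /** Queues a value message instead of sending it immediately. */
    void sendMsg(long targetVertex, String value) {
        pending.add(new Envelope(Kind.VALUE, targetVertex, value));
    }

    /** Queues a vertex creation request; no blocking round trip happens here. */
    void addVertexRequest(long vertexId) {
        pending.add(new Envelope(Kind.ADD_VERTEX, vertexId, ""));
    }

    /** Queues an edge creation request addressed to the source vertex. */
    void addEdgeRequest(long sourceVertex, long destVertex) {
        pending.add(new Envelope(Kind.ADD_EDGE, sourceVertex, Long.toString(destVertex)));
    }

    /** Hands back everything queued so far as one batch (a real system would transmit it). */
    List<Envelope> flush() {
        List<Envelope> batch = new ArrayList<>(pending);
        pending.clear();
        if (!batch.isEmpty()) {
            batches++;
        }
        return batch;
    }

    /** Number of non-empty batches shipped so far. */
    int batchesSent() {
        return batches;
    }

    public static void main(String[] args) {
        UnifiedOutboxSketch outbox = new UnifiedOutboxSketch();
        outbox.sendMsg(1L, "rank=0.5");
        outbox.addVertexRequest(2L);
        outbox.addEdgeRequest(1L, 2L);
        System.out.println("items in batch: " + outbox.flush().size());
        System.out.println("batches sent: " + outbox.batchesSent());
    }
}
```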

> Investigate communication improvements
> --
>
> Key: GIRAPH-12
> URL: https://issues.apache.org/jira/browse/GIRAPH-12
> Project: Giraph
>  Issue Type: Improvement
>  Components: bsp
>Reporter: Avery Ching
>Assignee: Hyunsik Choi
>Priority: Minor
> Attachments: GIRAPH-12_1.patch
>
>
> Currently every worker will start up a thread to communicate with every other 
> worker.  Hadoop RPC is used for communication.  For instance, if there are 
> 400 workers, each worker will create 400 threads.  This ends up using a lot 
> of memory, even with the option  
> -Dmapred.child.java.opts="-Xss64k".  
> It would be good to investigate using frameworks like Netty or rolling 
> our own to improve this situation.  By moving away from Hadoop RPC, we would 
> also make compatibility with different Hadoop versions easier.





[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-17 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107063#comment-13107063
 ] 

Hyunsik Choi commented on GIRAPH-12:


(a note for sharing)

In the current implementation, outgoing messages are sent to other peers on 
only two triggers:
1) When the number of outgoing messages for a specific peer exceeds a 
threshold (i.e., maxSize), the outgoing messages for that peer are transmitted 
to it.
2) When a superstep finishes, all remaining messages are flushed to the other 
peers.

In case 1, however, the current implementation considers only the number of 
messages, not their size. The outgoing messages reside in main 
memory until they are sent to other peers, so their size is another important 
factor in main memory consumption. It would be good to consider not only the 
number of messages but also the size of messages.
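
A minimal sketch of the proposed behavior, assuming a per-peer buffer that flushes when either the message count or the buffered byte size exceeds its threshold. The class, field, and method names are hypothetical; only the idea of a count threshold (maxSize) comes from the note above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-peer outgoing buffer with count and byte-size thresholds. */
class OutgoingBufferSketch {
    private final int maxMessages;
    private final long maxBytes;
    private final List<byte[]> buffered = new ArrayList<>();
    private long bufferedBytes = 0;
    private int flushes = 0;

    OutgoingBufferSketch(int maxMessages, long maxBytes) {
        this.maxMessages = maxMessages;
        this.maxBytes = maxBytes;
    }

    /**
     * Buffers one message and flushes if either threshold is exceeded.
     * Returns true if this call triggered a flush.
     */
    boolean add(byte[] message) {
        buffered.add(message);
        bufferedBytes += message.length;
        if (buffered.size() > maxMessages || bufferedBytes > maxBytes) {
            flush();
            return true;
        }
        return false;
    }

    /** Also intended to be called at the end of a superstep to drain what is left. */
    void flush() {
        if (buffered.isEmpty()) {
            return;
        }
        // A real implementation would transmit the buffered messages to the peer here.
        buffered.clear();
        bufferedBytes = 0;
        flushes++;
    }

    long bufferedBytes() {
        return bufferedBytes;
    }

    int flushCount() {
        return flushes;
    }

    public static void main(String[] args) {
        OutgoingBufferSketch buffer = new OutgoingBufferSketch(100, 1024);
        System.out.println("flushed after 600 bytes: " + buffer.add(new byte[600]));
        System.out.println("flushed after 1200 bytes: " + buffer.add(new byte[600]));
    }
}
```

With only a count threshold, two 600-byte messages would sit in memory until 100 messages had accumulated; the byte threshold bounds the memory held per peer regardless of message size.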

> Investigate communication improvements
> --
>
> Key: GIRAPH-12
> URL: https://issues.apache.org/jira/browse/GIRAPH-12
> Project: Giraph
>  Issue Type: Improvement
>  Components: bsp
>Reporter: Avery Ching
>Assignee: Hyunsik Choi
>Priority: Minor
> Attachments: GIRAPH-12_1.patch





[jira] [Created] (GIRAPH-37) Implement Netty-backed rpc solution

2011-09-17 Thread Jakob Homan (JIRA)
Implement Netty-backed rpc solution
---

 Key: GIRAPH-37
 URL: https://issues.apache.org/jira/browse/GIRAPH-37
 Project: Giraph
  Issue Type: New Feature
Reporter: Jakob Homan
Assignee: Jakob Homan


GIRAPH-12 considered replacing the current Hadoop-based RPC method with Netty, 
but went in another direction. I think there is still value in this 
approach, and will also look at Finagle.





[jira] [Commented] (GIRAPH-37) Implement Netty-backed rpc solution

2011-09-17 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107165#comment-13107165
 ] 

Jake Mannix commented on GIRAPH-37:
---

We should make sure we don't all work on the same thing (note the discussion at 
the end of GIRAPH-12) - two at a time might be fine, but half of the developers 
all on RPC might be excessive.  Do you want to take this one?  I was going to 
go in and try and implement a Finagle-based solution, as it's already an async 
RPC-system on top of Netty, but if you're already going to look at this, I can 
drop what I was doing and work on something else.

> Implement Netty-backed rpc solution
> ---
>
> Key: GIRAPH-37
> URL: https://issues.apache.org/jira/browse/GIRAPH-37
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Jakob Homan
>Assignee: Jakob Homan





[jira] [Assigned] (GIRAPH-12) Investigate communication improvements

2011-09-17 Thread Jake Mannix (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jake Mannix reassigned GIRAPH-12:
-

Assignee: Avery Ching  (was: Hyunsik Choi)

> Investigate communication improvements
> --
>
> Key: GIRAPH-12
> URL: https://issues.apache.org/jira/browse/GIRAPH-12
> Project: Giraph
>  Issue Type: Improvement
>  Components: bsp
>Reporter: Avery Ching
>Assignee: Avery Ching
>Priority: Minor
> Attachments: GIRAPH-12_1.patch





[jira] [Commented] (GIRAPH-32) Implement benchmarks to evaluate the performance of message passing

2011-09-17 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-32?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107188#comment-13107188
 ] 

Avery Ching commented on GIRAPH-32:
---

+1. My only minor comment is 'for(' -> 'for (' to match the other code 
conventions.  I imagine that this benchmark will evolve over time (e.g., to 
allow Jake's power law distributed input (GIRAPH-26) to be chosen as input 
instead of the random edges).  But looks good to me!  Hopefully it will help 
you guys in your communication testing.

> Implement benchmarks to evaluate the performance of message passing 
> 
>
> Key: GIRAPH-32
> URL: https://issues.apache.org/jira/browse/GIRAPH-32
> Project: Giraph
>  Issue Type: Task
>  Components: benchmark
>Reporter: Hyunsik Choi
>Assignee: Hyunsik Choi
> Fix For: 0.70.0
>
> Attachments: GIRAPH-32.patch





[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-17 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107189#comment-13107189
 ] 

Avery Ching commented on GIRAPH-12:
---

I am assigned?  Huh??

> Investigate communication improvements
> --
>
> Key: GIRAPH-12
> URL: https://issues.apache.org/jira/browse/GIRAPH-12
> Project: Giraph
>  Issue Type: Improvement
>  Components: bsp
>Reporter: Avery Ching
>Assignee: Avery Ching
>Priority: Minor
> Attachments: GIRAPH-12_1.patch





[jira] [Commented] (GIRAPH-35) Modifying the site to indicated that Jake Mannix and Dmitriy Ryaboy are now Giraph committers

2011-09-17 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107195#comment-13107195
 ] 

Avery Ching commented on GIRAPH-35:
---

Just to document what I did, after the pom.xml changes, I had to run

mvn site (generates the actual site documentation)

Then I ran the svn commands to check in the new site documentation (as indicated 
above).  There must be a better way than what I did, but I didn't get any 
comments =).

Then I had to 

ssh people.apache.org
cd /www/incubator.apache.org/giraph
svn update

And the site is viewable now at http://incubator.apache.org/giraph/

> Modifying the site to indicated that Jake Mannix and Dmitriy Ryaboy are now 
> Giraph committers
> -
>
> Key: GIRAPH-35
> URL: https://issues.apache.org/jira/browse/GIRAPH-35
> Project: Giraph
>  Issue Type: Task
>Reporter: Avery Ching
>Assignee: Avery Ching
> Attachments: GIRAPH-35.patch
>
>






[jira] [Assigned] (GIRAPH-12) Investigate communication improvements

2011-09-17 Thread Jake Mannix (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jake Mannix reassigned GIRAPH-12:
-

Assignee: Hyunsik Choi  (was: Avery Ching)

Sorry, my 4-year old clicked when I was looking at this ticket.  Didn't notice 
that it managed to make an actual assignment, reverting!

> Investigate communication improvements
> --
>
> Key: GIRAPH-12
> URL: https://issues.apache.org/jira/browse/GIRAPH-12
> Project: Giraph
>  Issue Type: Improvement
>  Components: bsp
>Reporter: Avery Ching
>Assignee: Hyunsik Choi
>Priority: Minor
> Attachments: GIRAPH-12_1.patch





[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

2011-09-17 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107281#comment-13107281
 ] 

Jake Mannix commented on GIRAPH-36:
---

Initial thoughts:

  VertexReader defines a "next(MutableVertex vertex)" method, which does the 
sensible thing of filling in the vertex from the HDFS block, and because it 
takes a vertex object and messes with it, it's natural that the vertex be 
"required" to be a MutableVertex.

  But of course this implies that *everything* be a MutableVertex, because if 
you can't be read in by a VertexReader, where do you get instantiated at all?  
If BasicVertex implements Writable, we could always readFields() data in, but 
not allow mutation, but this seems like it would interfere with the way 
VertexReader allows users to read straight from Text, etc.  This would allow 
VertexList to extend an ArrayList of BasicVertex instead of an ArrayList of 
Vertex, at the same time.

Anyone have any thoughts/ideas?  Are we wedded to making VertexReader 
implementations deal with MutableVertex, or can we swap them to handle Writable 
BasicVertex?

> Ensure that subclassing BasicVertex is possible by user apps
> 
>
> Key: GIRAPH-36
> URL: https://issues.apache.org/jira/browse/GIRAPH-36
> Project: Giraph
>  Issue Type: Improvement
>  Components: graph
>Affects Versions: 0.70.0
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Blocker
> Fix For: 0.70.0
>
>
> Original assumptions in Giraph were that all users would subclass Vertex 
> (which extended MutableVertex, which extended BasicVertex).  Classes which 
> wish to have application-specific data structures (i.e. not a TreeMap) 
> may need to extend either MutableVertex or BasicVertex.  Unfortunately 
> VertexRange extends an ArrayList of Vertex, and there are other places where 
> the assumption is that vertex classes are either Vertex, or at least 
> MutableVertex.
> Let's make sure the internal APIs allow for BasicVertex to be the base class.





[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

2011-09-17 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107282#comment-13107282
 ] 

Jake Mannix commented on GIRAPH-36:
---

In fact, thinking about VertexReader further, it seems its entire API is a 
little backwards.  Why are we *passing in* instantiated Vertices, and filling 
them in?  Shouldn't they effectively be "iterators" over the InputSplit?

> Ensure that subclassing BasicVertex is possible by user apps
> 
>
> Key: GIRAPH-36
> URL: https://issues.apache.org/jira/browse/GIRAPH-36
> Project: Giraph
>  Issue Type: Improvement
>  Components: graph
>Affects Versions: 0.70.0
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Blocker
> Fix For: 0.70.0





[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

2011-09-17 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107286#comment-13107286
 ] 

Avery Ching commented on GIRAPH-36:
---

The reason for the current VertexReader API was to match the old Hadoop 
RecordReader API and make it natural for folks to move to vertices instead of 
keys and values.  The old Hadoop RecordReader API is

org.apache.hadoop.mapred.RecordReader

boolean next(K key, V value) throws IOException;

and the current VertexReader API is 

boolean next(MutableVertex vertex)
throws IOException, InterruptedException;

That being said, the new Hadoop RecordReader API is different:

org.apache.hadoop.mapreduce.RecordReader

boolean nextKeyValue() throws IOException, InterruptedException;
KEYIN getCurrentKey() throws IOException, InterruptedException;
VALUEIN getCurrentValue() throws IOException, InterruptedException;

It's probably easier to follow that (especially regarding your points).  Given 
it's a user facing API we should get a few more opinions on it though.  I 
imagine the change would be something closer to:

boolean nextVertex() throws IOException, InterruptedException;
BasicVertex getCurrentVertex() throws IOException, InterruptedException;

As far as the questions about BasicVertex and MutableVertex, the general idea 
would be that BasicVertex would be a safer interface to use whenever possible.  
However, the Vertex class hierarchy has evolved and I wouldn't mind changing it 
since it's not really as useful as it should be.  In general, we should only 
provide the interfaces necessary for each method to ensure we (or the users) 
can't do something stupid.  So probably a (nearly) immutable interface for 
storage, one for the user to access their methods, etc...
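
To make the proposed shape concrete, here is a small stand-alone sketch of a reader that iterates over its input and hands back the current vertex. The method names nextVertex and getCurrentVertex come from the proposal above; SimpleVertex, SimpleVertexReader, LineVertexReader, and the tab-separated "id, value" line format are assumptions made for illustration and are not Giraph or Hadoop types.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of an iterator-style vertex reader; not the Giraph interface. */
class VertexReaderSketch {

    /** Stand-in for a read-only vertex view. */
    interface SimpleVertex {
        long getVertexId();
        double getVertexValue();
    }

    /** Reader shaped like the new-style RecordReader: advance, then fetch. */
    interface SimpleVertexReader {
        boolean nextVertex() throws IOException, InterruptedException;
        SimpleVertex getCurrentVertex() throws IOException, InterruptedException;
    }

    /** Reads vertices from lines of the form "id<TAB>value". */
    static final class LineVertexReader implements SimpleVertexReader {
        private final Iterator<String> lines;
        private SimpleVertex current;

        LineVertexReader(List<String> lines) {
            this.lines = lines.iterator();
        }

        @Override
        public boolean nextVertex() {
            if (!lines.hasNext()) {
                current = null;
                return false;
            }
            String[] fields = lines.next().split("\t");
            final long id = Long.parseLong(fields[0]);
            final double value = Double.parseDouble(fields[1]);
            // The reader creates the vertex; the caller never passes one in to be filled.
            current = new SimpleVertex() {
                @Override public long getVertexId() { return id; }
                @Override public double getVertexValue() { return value; }
            };
            return true;
        }

        @Override
        public SimpleVertex getCurrentVertex() {
            return current;
        }
    }

    /** Typical consumer loop over any reader. */
    static double sumValues(SimpleVertexReader reader) throws IOException, InterruptedException {
        double sum = 0;
        while (reader.nextVertex()) {
            sum += reader.getCurrentVertex().getVertexValue();
        }
        return sum;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        SimpleVertexReader reader = new LineVertexReader(Arrays.asList("1\t0.5", "2\t1.5"));
        System.out.println("sum of vertex values: " + sumValues(reader));
    }
}
```

Because the reader constructs the vertex itself, the caller only ever needs the read-only view, which is the property the BasicVertex discussion is after.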



> Ensure that subclassing BasicVertex is possible by user apps
> 
>
> Key: GIRAPH-36
> URL: https://issues.apache.org/jira/browse/GIRAPH-36
> Project: Giraph
>  Issue Type: Improvement
>  Components: graph
>Affects Versions: 0.70.0
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Blocker
> Fix For: 0.70.0





[jira] [Commented] (GIRAPH-37) Implement Netty-backed rpc solution

2011-09-17 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107308#comment-13107308
 ] 

Jakob Homan commented on GIRAPH-37:
---

Yeah, if no one else has started this, I'd like to begin.  Seeing as GIRAPH-12 
didn't end with this solution, I started playing around on the flight back 
from London and plan on working on this this week, now that my vacation is 
over.  It's a blocker for some things we're trying to do with Giraph at the 
moment.

> Implement Netty-backed rpc solution
> ---
>
> Key: GIRAPH-37
> URL: https://issues.apache.org/jira/browse/GIRAPH-37
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Jakob Homan
>Assignee: Jakob Homan





[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-17 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107312#comment-13107312
 ] 

Hyunsik Choi commented on GIRAPH-12:


No problem :)

> Investigate communication improvements
> --
>
> Key: GIRAPH-12
> URL: https://issues.apache.org/jira/browse/GIRAPH-12
> Project: Giraph
>  Issue Type: Improvement
>  Components: bsp
>Reporter: Avery Ching
>Assignee: Hyunsik Choi
>Priority: Minor
> Attachments: GIRAPH-12_1.patch





[jira] [Updated] (GIRAPH-11) Improve the graph distribution of Giraph

2011-09-17 Thread Avery Ching (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-11?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Avery Ching updated GIRAPH-11:
--

Affects Version/s: 0.70.0

> Improve the graph distribution of Giraph
> 
>
> Key: GIRAPH-11
> URL: https://issues.apache.org/jira/browse/GIRAPH-11
> Project: Giraph
>  Issue Type: Improvement
>Affects Versions: 0.70.0
>Reporter: Avery Ching
>Assignee: Avery Ching
>
> Currently, Giraph assumes that the data from the VertexInputFormat is sorted. 
>  If the user data is not sorted by the vertex id, they must first run a 
> MapReduce or Pig job to generate a sorted dataset.  This is often a bit 
> inconvenient.
> Giraph graph partitioning is currently range based and there are some 
> advantages and disadvantages of this approach.  The proposal of this JIRA 
> would be to allow for both range and hash based partitioning and provide more 
> flexibility to the user.
> Design goals for the graph distribution:
> * Allow vertices to be ordered or unordered
> * Ability to repartition
> * Select the partitioning scheme based on user needs (i.e. hash or range 
> based)
> * Ability to provide user-specific hints about partitions
> Hash-based partitioning
> * Good vertex balancing across ranges for random data
> * Bad at vertex id locality
> Range-based partitioning
> * Good at vertex id locality
> * Ability to split ranges easily
> * Can cause hotspots for hot ranges
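
The trade-off listed above can be seen in a few lines of Java. This is only a sketch under simple assumptions (non-negative long vertex ids, equal-width ranges over a known maximum id); the class and method names are hypothetical and this is not Giraph's partitioning code.

```java
/** Hypothetical comparison of hash-based and range-based partition assignment. */
class PartitionerSketch {

    /** Hash-based: spreads ids evenly for random data, but neighbors in id space scatter. */
    static int hashPartition(long vertexId, int partitions) {
        return Math.floorMod(Long.hashCode(vertexId), partitions);
    }

    /**
     * Range-based: ids in [0, maxVertexId] are split into contiguous,
     * equal-width ranges, so nearby ids land in the same partition.
     */
    static int rangePartition(long vertexId, long maxVertexId, int partitions) {
        // Ceiling of (maxVertexId + 1) / partitions.
        long width = (maxVertexId + partitions) / partitions;
        return (int) (vertexId / width);
    }

    public static void main(String[] args) {
        int partitions = 4;
        long maxVertexId = 99;
        for (long id = 10; id <= 12; id++) {
            System.out.println("id " + id
                + " hash=" + hashPartition(id, partitions)
                + " range=" + rangePartition(id, maxVertexId, partitions));
        }
    }
}
```

Consecutive ids stay together under the range scheme, which gives locality but means a hot range of ids loads a single partition; the hash scheme spreads them at the cost of that locality.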





[jira] [Commented] (GIRAPH-37) Implement Netty-backed rpc solution

2011-09-17 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107334#comment-13107334
 ] 

Jake Mannix commented on GIRAPH-37:
---

Cool, you planning on trying Finagle?  It seems like it could save a lot of 
work in comparison to doing something totally custom on top of Netty (maven 
repo here: http://maven.twttr.com/com/twitter/finagle/1.9.0/ for the "whole 
thing", or smaller slices, like finagle-thrift, here: 
http://maven.twttr.com/com/twitter/finagle-thrift/1.9.0/ ).

> Implement Netty-backed rpc solution
> ---
>
> Key: GIRAPH-37
> URL: https://issues.apache.org/jira/browse/GIRAPH-37
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Jakob Homan
>Assignee: Jakob Homan





[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

2011-09-17 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107339#comment-13107339
 ] 

Jake Mannix commented on GIRAPH-36:
---

Yeah, having it return the current vertex sounds good, I guess.  There's 
still something nagging at me about the way Writables are being used: Giraph is 
*different* from Hadoop: there's a persistent, in-memory data structure being 
built here, where there *isn't* in Hadoop.  Regardless of how we read the data, 
or send the data over the wire, or write it to disk, we're also hanging onto 
it.  I wonder if we need to make the abstraction around that more clear?

Maybe simply solving the title of this JIRA ticket would do the trick, which 
would at a minimum require that BasicVertex implement Writable, and other than 
that, it could work with VertexReader APIs of either flavor.

I think I can try working on this ticket without monkeying with the 
VertexReader API, but I won't know until I start unravelling this ball of 
string a bit.

> Ensure that subclassing BasicVertex is possible by user apps
> 
>
> Key: GIRAPH-36
> URL: https://issues.apache.org/jira/browse/GIRAPH-36
> Project: Giraph
>  Issue Type: Improvement
>  Components: graph
>Affects Versions: 0.70.0
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Blocker
> Fix For: 0.70.0





[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

2011-09-17 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107350#comment-13107350
 ] 

Avery Ching commented on GIRAPH-36:
---

There is an out-of-core part as well though (checkpointing).  Forcing the types 
(I, V, E, M) to implement Writable seemed like a nice easy way for Hadoop users 
to jump into Giraph.  Also, it provides us a lot of reusable objects 
(IntWritable, FloatWritable, DoubleWritable, etc.).  I am open to other ideas 
of course.  Let me know your thoughts.
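
For readers less familiar with the Writable pattern, the sketch below shows why it suits checkpointing: a type that can write itself to a DataOutput and read itself back from a DataInput can be saved and restored with very little code. SimpleWritable is a local stand-in that mirrors the two methods of Hadoop's Writable interface so the example runs without Hadoop; VertexState and the checkpoint and restore helpers are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical sketch of Writable-style serialization used for a checkpoint round trip. */
class WritableCheckpointSketch {

    /** Local stand-in mirroring the write/readFields pair of Hadoop's Writable. */
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    /** Minimal vertex state: an id and a value. */
    static final class VertexState implements SimpleWritable {
        long id;
        double value;

        VertexState(long id, double value) {
            this.id = id;
            this.value = value;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeLong(id);
            out.writeDouble(value);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            id = in.readLong();
            value = in.readDouble();
        }
    }

    /** Serializes the state to bytes, as a checkpoint writer might before storing it. */
    static byte[] checkpoint(SimpleWritable state) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            state.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Overwrites the given object's fields from previously checkpointed bytes. */
    static void restore(SimpleWritable state, byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            state.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] saved = checkpoint(new VertexState(7L, 0.25));
        VertexState restored = new VertexState(0L, 0.0);
        restore(restored, saved);
        System.out.println("restored id=" + restored.id + " value=" + restored.value);
    }
}
```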

> Ensure that subclassing BasicVertex is possible by user apps
> 
>
> Key: GIRAPH-36
> URL: https://issues.apache.org/jira/browse/GIRAPH-36
> Project: Giraph
>  Issue Type: Improvement
>  Components: graph
>Affects Versions: 0.70.0
>Reporter: Jake Mannix
>Assignee: Jake Mannix
>Priority: Blocker
> Fix For: 0.70.0
