[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-12 Thread Hyunsik Choi (JIRA)

[ https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102682#comment-13102682 ]

Hyunsik Choi commented on GIRAPH-12:


Like the current PeerThread, each PeerConnection initially gets one established
RPC proxy, and these connections are kept open for the whole computation. So
there is no repeated connection overhead.
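
A minimal Java sketch of the connection-reuse idea described above. `RpcProxy` and `ConnectionCache` are hypothetical, and this `PeerConnection` is only loosely modeled on the one in the patch (its constructor and `send` method are assumptions). The point is that each peer's proxy is created once and reused for every later send, so no connection setup is repeated per superstep.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an RPC proxy; not a real Hadoop interface.
interface RpcProxy {
    void putMsg(long vertexId, String msg);
}

// Hypothetical per-peer wrapper that holds one proxy for the whole job.
final class PeerConnection {
    private final RpcProxy proxy;

    PeerConnection(RpcProxy proxy) {
        this.proxy = proxy;
    }

    void send(long vertexId, String msg) {
        proxy.putMsg(vertexId, msg);
    }
}

final class ConnectionCache {
    private final Map<String, PeerConnection> connections = new HashMap<>();
    private int proxiesCreated = 0;

    // Returns the cached connection for a peer, creating its proxy only on first use.
    PeerConnection get(String peerAddress) {
        return connections.computeIfAbsent(peerAddress, addr -> {
            proxiesCreated++;
            // A real implementation would open the RPC proxy to addr here.
            return new PeerConnection((vertexId, msg) -> { });
        });
    }

    int proxiesCreated() {
        return proxiesCreated;
    }
}
```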

If you can test this code on Yahoo!'s clusters, I'd appreciate your help. Next
week I will have access to my lab's Hadoop cluster, and I'll run some tests
then as well.

 Investigate communication improvements
 --

 Key: GIRAPH-12
 URL: https://issues.apache.org/jira/browse/GIRAPH-12
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Avery Ching
Assignee: Hyunsik Choi
Priority: Minor
 Attachments: GIRAPH-12_1.patch


 Currently every worker starts a thread to communicate with every other
 worker, and Hadoop RPC is used for the communication.  For instance, if there
 are 400 workers, each worker creates 400 threads.  This uses a lot of memory,
 even with the option
 -Dmapred.child.java.opts=-Xss64k.
 It would be good to investigate frameworks like Netty, or rolling our own,
 to improve this situation.  Moving away from Hadoop RPC would also make
 compatibility with different Hadoop versions easier.
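
To make the scaling problem concrete, here is a hedged Java sketch of one general alternative to a thread per peer: a fixed-size pool from `java.util.concurrent` flushes each peer's buffered messages, so the thread count stays constant as the number of workers grows. `PooledFlusher` and `flushToPeer` are hypothetical placeholders; a Netty-based design would go further and use non-blocking I/O rather than blocking calls on pooled threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class PooledFlusher {
    private final ExecutorService pool;
    private final AtomicInteger flushedPeers = new AtomicInteger();

    // The pool size is fixed up front and does not depend on the number of peers.
    PooledFlusher(int threads) {
        pool = Executors.newFixedThreadPool(threads);
    }

    // Hypothetical placeholder for sending one peer's buffered messages.
    private void flushToPeer(int peer) {
        flushedPeers.incrementAndGet();
    }

    // Submits one flush task per peer and waits for all of them to finish.
    void flushAll(int numPeers) {
        List<Future<?>> pending = new ArrayList<>();
        for (int peer = 0; peer < numPeers; peer++) {
            final int target = peer;
            pending.add(pool.submit(() -> flushToPeer(target)));
        }
        try {
            for (Future<?> f : pending) {
                f.get(); // rethrows any failure from a flush task
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("flush failed", e);
        }
    }

    int flushedPeers() {
        return flushedPeers.get();
    }

    void shutdown() {
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With 400 peers and a pool of 8, the worker runs 8 sender threads instead of 400.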

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Incubator report is due

2011-09-12 Thread Owen O'Malley
I'd propose:

Giraph

Giraph is a large-scale, fault-tolerant, Bulk Synchronous Parallel
(BSP)-based graph
processing framework that runs on Hadoop. Giraph entered the incubator
in August 2011.

Project developments:

* Project website created.
* Confluence wiki created.
* Accounts were created for two of the committers.
* Project is entirely on Apache infrastructure.

Next steps:
* Adding new committers.
* Making a release.
* One of the initial committers still hasn't filed an ICLA. We either
need him to move forward or remove him.


[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored

2011-09-12 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102916#comment-13102916 ]

Hudson commented on GIRAPH-27:
--

Integrated in Giraph-trunk-Commit #4 (See 
[https://builds.apache.org/job/Giraph-trunk-Commit/4/])
GIRAPH-27: Fixed type parameter errors in Hudson (aching).

aching : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1169863
Files : 
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/Vertex.java


 Mutable static global state in Vertex.java should be refactored
 ---

 Key: GIRAPH-27
 URL: https://issues.apache.org/jira/browse/GIRAPH-27
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Affects Versions: 0.70.0
Reporter: Jake Mannix
Assignee: Jake Mannix
 Attachments: GIRAPH-27.patch, GIRAPH-27.patch


 Vertex.java has a bunch of static methods for getting/setting global graph
 state (total number of vertices, edges, a reference to the GraphMapper, etc.).
 This should be refactored into a GraphState object that every Vertex holds
 a reference to (yes, a tiny bit more memory per Vertex, but small in
 comparison to what's already in there...).
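
A hedged sketch of the proposed refactoring. `GraphState` is the name from the issue, but its fields and accessors below, and the `SketchVertex` class, are assumptions for illustration. The state lives in an instance shared by reference, so two jobs (for example two unit tests) in the same JVM no longer overwrite each other's totals the way static fields allow.

```java
// Per-job global state, shared by reference instead of held in static fields.
final class GraphState {
    private long superstep;
    private long numVertices;
    private long numEdges;

    long getSuperstep() { return superstep; }
    void setSuperstep(long superstep) { this.superstep = superstep; }
    long getNumVertices() { return numVertices; }
    void setNumVertices(long numVertices) { this.numVertices = numVertices; }
    long getNumEdges() { return numEdges; }
    void setNumEdges(long numEdges) { this.numEdges = numEdges; }
}

// Hypothetical vertex base class: one extra reference per vertex.
abstract class SketchVertex {
    private GraphState graphState;

    void setGraphState(GraphState graphState) { this.graphState = graphState; }

    // Former static getters now read from this vertex's own GraphState.
    long getSuperstep() { return graphState.getSuperstep(); }
    long getNumVertices() { return graphState.getNumVertices(); }
    long getNumEdges() { return graphState.getNumEdges(); }

    abstract void compute();
}
```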


[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored

2011-09-12 Thread Avery Ching (JIRA)

[ https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102986#comment-13102986 ]

Avery Ching commented on GIRAPH-27:
---

Oh, it's very annoying.  This problem has bitten me many times. =)


[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses

2011-09-12 Thread Jake Mannix (JIRA)

[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1310#comment-1310 ]

Jake Mannix commented on GIRAPH-28:
---

Ok, so I went ahead and made a 'straw-man' refactoring branch (on GitHub: 
https://github.com/jakemannix/giraph/tree/vertex_map_refactor ), removing the 
getDestEdgeMap() method, and having BasicVertex implement Iterable, as well as 
the random-access read method getEdgeValue(targetVertexId).

I got it passing tests, but ran into a few things we may want to consider:

First, testing for the existence of a target vertex is no longer possible:
getEdgeValue(targetVertexId) returns the *value* associated with the edge.
Edges are allowed to have null values and still denote a connection between the
source and target vertex, right?  Maybe we should just have an
Edge<I, E> getEdge(I targetVertexId) method instead?

Second, and far less important, we need getNumOutEdges(), because clients
often want to know the out-degree of the vertex, and they used to call
getDestEdgeMap().size().

Third, for the same reason that getEdgeValue() can return ambiguous nulls,
removeEdge(), used as a boolean, can trick the caller into thinking there was
no connection to the target (because removeEdge() returned null), when really
it's because I was trying to be clever and return the *value*, which could be
null.  Having removeEdge() return the actual Edge fixes this.
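
The ambiguity in the first and third points can be reproduced with a plain `java.util.TreeMap`, whose `get` and `remove` both return `null` for a missing key and for a key mapped to `null`. The sketch below is hypothetical (`SketchEdge` and `EdgeMapVertex` are not Giraph classes); it shows how returning an edge object, rather than the bare value, removes the ambiguity, and it adds the `getNumOutEdges()` accessor.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical minimal edge; Giraph's own Edge class is also a Writable.
final class SketchEdge<I, E> {
    final I destVertexId;
    final E edgeValue;

    SketchEdge(I destVertexId, E edgeValue) {
        this.destVertexId = destVertexId;
        this.edgeValue = edgeValue;
    }
}

final class EdgeMapVertex<I extends Comparable<I>, E> {
    private final SortedMap<I, E> edges = new TreeMap<>();

    void addEdge(I target, E value) { edges.put(target, value); }

    // Ambiguous: null means either "no edge" or "edge with a null value".
    E getEdgeValue(I target) { return edges.get(target); }

    // Unambiguous: null means only "no edge to target".
    SketchEdge<I, E> getEdge(I target) {
        if (!edges.containsKey(target)) {
            return null;
        }
        return new SketchEdge<>(target, edges.get(target));
    }

    // Out-degree, replacing getDestEdgeMap().size().
    int getNumOutEdges() { return edges.size(); }

    // Returns the removed edge, or null if there was no edge to target.
    SketchEdge<I, E> removeEdge(I target) {
        if (!edges.containsKey(target)) {
            return null;
        }
        return new SketchEdge<>(target, edges.remove(target));
    }
}
```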

I'll open another ticket for this stuff, as patching this patch seems a bit 
silly.

 Introduce new primitive-specific MutableVertex subclasses
 -

 Key: GIRAPH-28
 URL: https://issues.apache.org/jira/browse/GIRAPH-28
 Project: Giraph
  Issue Type: New Feature
  Components: graph
Affects Versions: 0.70.0
Reporter: Jake Mannix
Assignee: Jake Mannix
 Attachments: GIRAPH-28.diff, GIRAPH-28.diff


 As discussed on the list,
 MutableVertex<LongWritable, DoubleWritable, FloatWritable, DoubleWritable> (for
 example) could be highly optimized in its memory footprint if the vertex and
 edge data were held in a form which minimized Java object usage.
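
As an illustration of what minimizing Java object usage could mean for the edges of the example above (long target ids, float edge values), here is a hypothetical edge store, `LongFloatEdges` (not a Giraph class), that keeps target ids and edge values in two primitive arrays instead of one map entry plus boxed Writables per edge. It is a sketch only: it keeps insertion order and does linear lookups.

```java
import java.util.Arrays;

// Hypothetical primitive-backed edge storage: two arrays, no per-edge objects.
final class LongFloatEdges {
    private long[] targets = new long[4];
    private float[] values = new float[4];
    private int size = 0;

    void addEdge(long target, float value) {
        if (size == targets.length) {
            // Grow both arrays together, doubling capacity.
            targets = Arrays.copyOf(targets, size * 2);
            values = Arrays.copyOf(values, size * 2);
        }
        targets[size] = target;
        values[size] = value;
        size++;
    }

    int getNumOutEdges() { return size; }

    boolean hasEdge(long target) { return indexOf(target) >= 0; }

    // Returns the edge value; throws if there is no such edge.
    float getEdgeValue(long target) {
        int i = indexOf(target);
        if (i < 0) {
            throw new IllegalArgumentException("no edge to " + target);
        }
        return values[i];
    }

    // Linear scan; a sorted variant could use binary search instead.
    private int indexOf(long target) {
        for (int i = 0; i < size; i++) {
            if (targets[i] == target) {
                return i;
            }
        }
        return -1;
    }
}
```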


[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses

2011-09-12 Thread Avery Ching (JIRA)

[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103338#comment-13103338 ]

Avery Ching commented on GIRAPH-28:
---

Jake, I agree with all 3 of your points.  The only question I have is about
implementing Iterable.  Will the user expect to get back a sorted iterator or a
non-sorted one?  I'm guessing we'll lose some functionality here?  I don't
think we would want to add a switch of sorts to choose.  The other approach we
had discussed was the methods

Iterator<Edge<I, E>> getOutEdgeIterator();
Iterator<Edge<I, E>> getSortedOutEdgeIterator();

as opposed to Iterable.
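
A hedged sketch of the two-method alternative. The method names come from the comment above; `SketchEdge` and `TwoIteratorVertex` are hypothetical, and storing edges in insertion order is an assumption. The plain iterator costs nothing extra, and only callers that ask for sorted order pay for a sort.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical minimal edge type.
final class SketchEdge<I, E> {
    final I destVertexId;
    final E edgeValue;

    SketchEdge(I destVertexId, E edgeValue) {
        this.destVertexId = destVertexId;
        this.edgeValue = edgeValue;
    }
}

// Hypothetical vertex that offers both orders explicitly.
final class TwoIteratorVertex<I extends Comparable<I>, E> {
    // Edges kept in insertion order; nothing is sorted on insert.
    private final List<SketchEdge<I, E>> edges = new ArrayList<>();

    void addEdge(I target, E value) {
        edges.add(new SketchEdge<>(target, value));
    }

    // Cheap: insertion order, no sorting.
    Iterator<SketchEdge<I, E>> getOutEdgeIterator() {
        return edges.iterator();
    }

    // Sorts a copy by target vertex id on every call: O(d log d).
    Iterator<SketchEdge<I, E>> getSortedOutEdgeIterator() {
        List<SketchEdge<I, E>> copy = new ArrayList<>(edges);
        copy.sort((a, b) -> a.destVertexId.compareTo(b.destVertexId));
        return copy.iterator();
    }
}
```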


[jira] [Created] (GIRAPH-31) Hide the SortedMap<I, Edge<I, E>> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods

2011-09-12 Thread Jake Mannix (JIRA)
Hide the SortedMap<I, Edge<I, E>> in Vertex from client visibility (impl.
detail), replace with appropriate accessor methods
---

 Key: GIRAPH-31
 URL: https://issues.apache.org/jira/browse/GIRAPH-31
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Affects Versions: 0.70.0
Reporter: Jake Mannix
Assignee: Jake Mannix


As discussed on the list, and on GIRAPH-28, the SortedMap<I, Edge<I, E>> is an
implementation detail which need not be exposed to application developers.
They need to iterate over the edges, possibly access them one by one, and
remove them (in the Mutable case), but they don't need the SortedMap, and
creating primitive-optimized BasicVertex implementations is hampered by the
fact that clients expect this Map to exist.
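
A hedged sketch of what the replacement accessors could look like. `VertexEdges`, `TreeMapEdges` and `EdgeClients` are hypothetical names, and the method set is limited to what this issue lists (iteration, one-by-one access, out-degree, removal). Client code such as `sumEdgeValues` depends only on the interface, so a primitive-array implementation could be swapped in without clients noticing.

```java
import java.util.Iterator;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical client-facing API: no SortedMap in sight.
interface VertexEdges<I, E> extends Iterable<I> {
    int getNumOutEdges();
    E getEdgeValue(I target);
    E removeEdge(I target);
}

// One possible implementation; the map stays a private detail.
final class TreeMapEdges<I extends Comparable<I>, E> implements VertexEdges<I, E> {
    private final SortedMap<I, E> edges = new TreeMap<>();

    void addEdge(I target, E value) { edges.put(target, value); }

    @Override public int getNumOutEdges() { return edges.size(); }
    @Override public E getEdgeValue(I target) { return edges.get(target); }
    @Override public E removeEdge(I target) { return edges.remove(target); }
    @Override public Iterator<I> iterator() { return edges.keySet().iterator(); }
}

final class EdgeClients {
    // Written against the interface only, so any implementation works.
    static double sumEdgeValues(VertexEdges<Long, Double> edges) {
        double sum = 0;
        for (Long target : edges) {
            sum += edges.getEdgeValue(target);
        }
        return sum;
    }
}
```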


[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses

2011-09-12 Thread Jake Mannix (JIRA)

[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103339#comment-13103339 ]

Jake Mannix commented on GIRAPH-28:
---

I'm suggesting that iterator() always be sorted.  A SortedMap's keySet(),
values() and entrySet() views implement Iterable (by way of Collection), and
the iterators they return are always in sorted order.  We'd have BasicVertex
do the same thing.
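
A hedged sketch of a vertex that implements `Iterable` by delegating to the `keySet()` view of a `TreeMap`, which iterates in ascending key order regardless of insertion order. `SortedVertex` is a hypothetical name, not a Giraph class.

```java
import java.util.Iterator;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical vertex whose iterator() is always in ascending target-id order.
final class SortedVertex<I extends Comparable<I>, E> implements Iterable<I> {
    private final SortedMap<I, E> edges = new TreeMap<>();

    void addEdge(I target, E value) { edges.put(target, value); }

    E getEdgeValue(I target) { return edges.get(target); }

    // keySet() of a SortedMap iterates in ascending key order.
    @Override
    public Iterator<I> iterator() {
        return edges.keySet().iterator();
    }
}
```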


[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses

2011-09-12 Thread Jake Mannix (JIRA)

[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103341#comment-13103341 ]

Jake Mannix commented on GIRAPH-28:
---

Also, to contradict my 1st and 3rd points, Dmitriy pointed out (in an
out-of-band chat) that if we don't want to expose Edge to the user, because
a) we don't want to store Edge objects in memory (his test showed that even
switching TreeMap<I, Edge<I, E>> to TreeMap<I, E> reduced memory usage by a
fair amount), and b) we don't want to instantiate tons of useless objects by
creating them lazily, we could instead keep getEdgeValue() and removeEdge()
as they were, and add a boolean hasEdge(I targetVertexId) to test for
connection.

Then you get everything you need without exposing the Edge class (which only 
gets used internally for its Writable nature):

if (vertex.hasEdge(targetVertexId)) {
  E value = vertex.getEdgeValue(targetVertexId);  // may be null for a null-valued edge
  vertex.removeEdge(targetVertexId);
}

etc...
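
A self-contained version of the snippet above, under stated assumptions: `ValueMapVertex` is a hypothetical stand-in that stores only a `TreeMap<I, E>` (no Edge objects), `hasEdge` is the proposed method implemented with `containsKey`, and `HasEdgeExample.detach` is a hypothetical wrapper around the test-read-remove pattern. Because existence is checked separately, an edge whose value is `null` is still detected.

```java
import java.util.List;
import java.util.TreeMap;

// Hypothetical vertex that stores edge values directly, with no Edge objects.
final class ValueMapVertex<I extends Comparable<I>, E> {
    private final TreeMap<I, E> edges = new TreeMap<>();

    void addEdge(I target, E value) { edges.put(target, value); }

    // Proposed existence test: true even when the edge value is null.
    boolean hasEdge(I target) { return edges.containsKey(target); }

    E getEdgeValue(I target) { return edges.get(target); }

    E removeEdge(I target) { return edges.remove(target); }

    int getNumOutEdges() { return edges.size(); }
}

final class HasEdgeExample {
    // The pattern from the comment: test, read, then remove.
    static <I extends Comparable<I>, E> boolean detach(
            ValueMapVertex<I, E> vertex, I targetVertexId, List<E> removedValues) {
        if (vertex.hasEdge(targetVertexId)) {
            E value = vertex.getEdgeValue(targetVertexId); // may legitimately be null
            vertex.removeEdge(targetVertexId);
            removedValues.add(value);
            return true;
        }
        return false;
    }
}
```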


[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses

2011-09-12 Thread Avery Ching (JIRA)

[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103343#comment-13103343 ]

Avery Ching commented on GIRAPH-28:
---

Would we guarantee that iterator() would always be sorted?  For instance, what 
about something like TinyVertex?  A pair of arrays would be expensive to keep 
sorted...

Or are you suggesting that whether iterator is sorted or not depends on the 
implementation?  

I agree that implementing Iterable is pretty convenient for users.  Perhaps 
have both (implement Iterable and the two other methods)?  This would allow 
applications based on TinyVertex to be optimized when not requiring sorting.
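
To make the cost concern concrete: keeping a pair of parallel arrays sorted means every insertion does a binary search and then copies and shifts both arrays, which is O(degree) per insert. The sketch below is hypothetical (`SortedArrayEdges` is not the actual TinyVertex) and uses only `Arrays.binarySearch` and `System.arraycopy` from the JDK.

```java
import java.util.Arrays;

// Hypothetical sorted parallel-array edge store (long ids, float values).
final class SortedArrayEdges {
    private long[] targets = new long[0];
    private float[] values = new float[0];

    // O(log d) search plus O(d) copy and shift: the cost of staying sorted.
    void addEdge(long target, float value) {
        int pos = Arrays.binarySearch(targets, target);
        if (pos >= 0) {
            values[pos] = value; // existing edge: overwrite its value
            return;
        }
        int insertAt = -(pos + 1); // binarySearch encodes the insertion point
        long[] newTargets = new long[targets.length + 1];
        float[] newValues = new float[values.length + 1];
        // Copy the prefix, leave a gap at insertAt, then copy the shifted suffix.
        System.arraycopy(targets, 0, newTargets, 0, insertAt);
        System.arraycopy(values, 0, newValues, 0, insertAt);
        System.arraycopy(targets, insertAt, newTargets, insertAt + 1, targets.length - insertAt);
        System.arraycopy(values, insertAt, newValues, insertAt + 1, values.length - insertAt);
        newTargets[insertAt] = target;
        newValues[insertAt] = value;
        targets = newTargets;
        values = newValues;
    }

    long[] targetsInOrder() { return targets.clone(); }
}
```

An unsorted store only appends, so applications that never need order avoid this cost entirely.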


[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses

2011-09-12 Thread Avery Ching (JIRA)

[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103350#comment-13103350 ]

Avery Ching commented on GIRAPH-28:
---

boolean hasEdge(I targetVertexId) is fine as an alternative.  For the iterators,
though, we are returning an iterator of type Edge<I, E>, so we will have to
create those Edge<I, E> objects on the fly for some implementations.  I suppose
hasEdge() also adds a bit of work, since callers must check before adding or
removing, as opposed to getting back the full edge.  I'd lean slightly toward
the Edge<I, E> methods for consistency and less code.  But I can be out-voted. =)
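
A hedged sketch of the on-the-fly cost described above: an implementation that stores edges in primitive arrays has to allocate a fresh edge object on every `next()` if its iterator hands out edge objects. `ArrayEdge` (a non-generic stand-in for `Edge<I, E>`) and `ArrayBackedEdges` are hypothetical names.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical immutable edge handed out by the iterator.
final class ArrayEdge {
    final long destVertexId;
    final float edgeValue;

    ArrayEdge(long destVertexId, float edgeValue) {
        this.destVertexId = destVertexId;
        this.edgeValue = edgeValue;
    }
}

final class ArrayBackedEdges implements Iterable<ArrayEdge> {
    private final long[] targets;
    private final float[] values;

    ArrayBackedEdges(long[] targets, float[] values) {
        this.targets = targets;
        this.values = values;
    }

    @Override
    public Iterator<ArrayEdge> iterator() {
        return new Iterator<ArrayEdge>() {
            private int i = 0;

            @Override
            public boolean hasNext() { return i < targets.length; }

            // Allocates one new edge object per call: safe to keep, but not free.
            @Override
            public ArrayEdge next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                ArrayEdge e = new ArrayEdge(targets[i], values[i]);
                i++;
                return e;
            }
        };
    }
}
```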




[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses

2011-09-12 Thread Jake Mannix (JIRA)

[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103357#comment-13103357 ]

Jake Mannix commented on GIRAPH-28:
---

The alternative to Iterable<Edge<I, E>> is Iterable<I>, returning only the
target vertices, and you can call getEdgeValue(targetVertexId) on any of these
if you need it.  Again, many algorithms will simply do something like

for (I targetId : vertex) {
  sendMsg(targetId, someFunction(baseMsg, getEdgeValue(targetId)));
}

which is maybe a little nicer looking (or at least not uglier) than:

for (Edge<I, E> edge : vertex) {
  sendMsg(edge.getVertexId(), someFunction(baseMsg, edge.getValue()));
}

And then there are no Edge objects hanging around.
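
The first loop can be made self-contained as below. `sendMsg` and `someFunction` are hypothetical stand-ins for the messaging call and per-edge function in the comment (here they record the message in a map and scale the base message by the edge weight), and `TargetIterableVertex` is an assumed name for a vertex that implements `Iterable` over its target ids only.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical vertex: iterates over target ids only; values looked up on demand.
final class TargetIterableVertex implements Iterable<Long> {
    private final TreeMap<Long, Double> edges = new TreeMap<>();
    final Map<Long, Double> outbox = new HashMap<>();

    void addEdge(long target, double weight) { edges.put(target, weight); }

    Double getEdgeValue(Long target) { return edges.get(target); }

    @Override
    public Iterator<Long> iterator() { return edges.keySet().iterator(); }

    // Hypothetical stand-in for the real messaging call.
    void sendMsg(Long targetId, double msg) { outbox.put(targetId, msg); }

    // Hypothetical per-edge function: scale the base message by the edge weight.
    static double someFunction(double baseMsg, double edgeValue) {
        return baseMsg * edgeValue;
    }

    // The loop from the comment; no Edge objects are created.
    void sendToAll(double baseMsg) {
        for (Long targetId : this) {
            sendMsg(targetId, someFunction(baseMsg, getEdgeValue(targetId)));
        }
    }
}
```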

Alternatively, Edge could act just like a typical Writable, and the
Iterator<Edge<I, E>> iterates over the *same* Edge object, setting different
values on it as next() is called.
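
A hedged sketch of that alternative: one mutable edge is overwritten on every `next()`, much as Hadoop commonly reuses Writable instances. `ReusableEdge` and `ReusingEdges` are hypothetical names. No per-edge allocation happens, but every reference returned by `next()` points at the same object.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical mutable edge, overwritten in place by the iterator.
final class ReusableEdge {
    long destVertexId;
    float edgeValue;
}

final class ReusingEdges implements Iterable<ReusableEdge> {
    private final long[] targets;
    private final float[] values;

    ReusingEdges(long[] targets, float[] values) {
        this.targets = targets;
        this.values = values;
    }

    @Override
    public Iterator<ReusableEdge> iterator() {
        final ReusableEdge shared = new ReusableEdge(); // one object per iteration
        return new Iterator<ReusableEdge>() {
            private int i = 0;

            @Override
            public boolean hasNext() { return i < targets.length; }

            // Overwrites and returns the same instance on every call.
            @Override
            public ReusableEdge next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.destVertexId = targets[i];
                shared.edgeValue = values[i];
                i++;
                return shared;
            }
        };
    }
}
```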



[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses

2011-09-12 Thread Jake Mannix (JIRA)

[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103363#comment-13103363 ]

Jake Mannix commented on GIRAPH-28:
---

As for sorting, I'd imagine that assuming it always returns a sorted iterator
is fine, but yes, some implementations I can imagine might not want to do that.
I'd lean against having multiple iterators until we know they are needed, and
maybe just document the implementations that return unsorted iterators so that
things don't get messed up?

Vertex subclasses are where the algorithms are implemented, right?  So a
Vertex knows whether it has a sorted iterator or not... the only question would
be: are there generic methods implemented in things like BspServiceWorker or
GraphMapper that would need a sorted iterator?  Currently there are no such
places that I can see.  Without such cases, we could easily leave Vertex
implementations to decide whether they need to return sorted iterators.


[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses

2011-09-12 Thread Dmitriy V. Ryaboy (JIRA)

[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103365#comment-13103365 ]

Dmitriy V. Ryaboy commented on GIRAPH-28:
-

I'd caution against the approach of using a MutatorIterator (that's my name for 
that pattern. Like it? :)).
It's effective, but leads to extremely confusing bugs when people try to do 
things like take the first three edges, etc. Presenting a familiar interface 
but providing a tricky unintuitive implementation is not super friendly to 
developers; I don't think we want people to have to study the API to such an 
extent they have to know these details.
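
The failure mode described above is easy to reproduce. In the hedged sketch below (hypothetical `MutableEdge` and `MutatorIteratorPitfall`), collecting the first three edges from a reusing iterator yields three references to the same object, all holding the last edge read; copying each edge before keeping it avoids the problem, at the cost of the allocation the pattern was meant to save.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical mutable edge reused by the iterator.
final class MutableEdge {
    long destVertexId;

    MutableEdge copy() {
        MutableEdge c = new MutableEdge();
        c.destVertexId = destVertexId;
        return c;
    }
}

final class MutatorIteratorPitfall {
    // Returns an iterator that overwrites one shared MutableEdge per next().
    static Iterator<MutableEdge> reusingIterator(long[] targets) {
        final MutableEdge shared = new MutableEdge();
        return new Iterator<MutableEdge>() {
            private int i = 0;

            @Override
            public boolean hasNext() { return i < targets.length; }

            @Override
            public MutableEdge next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.destVertexId = targets[i++];
                return shared;
            }
        };
    }

    // Keeps the first n edges; copyEach decides whether to defensively copy.
    static List<MutableEdge> firstN(Iterator<MutableEdge> it, int n, boolean copyEach) {
        List<MutableEdge> kept = new ArrayList<>();
        while (kept.size() < n && it.hasNext()) {
            MutableEdge e = it.next();
            kept.add(copyEach ? e.copy() : e);
        }
        return kept;
    }
}
```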


[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses

2011-09-12 Thread Dmitriy V. Ryaboy (JIRA)

[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103369#comment-13103369 ]

Dmitriy V. Ryaboy commented on GIRAPH-28:
-

This:

bq. Alternatively, Edge could act just like a typical Writable, and the
Iterator<Edge<I, E>> iterates over the same Edge object setting different
values on it as next() is called.
