[jira] [Commented] (GIRAPH-12) Investigate communication improvements
[ https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102682#comment-13102682 ]

Hyunsik Choi commented on GIRAPH-12:

Like the current PeerThread, each PeerConnection initially gets one established RPC proxy. These connections are kept open for the whole of processing, so there is no connection overhead.

I would appreciate your help if you can test this code on Yahoo!'s clusters. Next week I will have access to my lab's Hadoop cluster, and I'll run some tests then as well.

Investigate communication improvements
--
Key: GIRAPH-12
URL: https://issues.apache.org/jira/browse/GIRAPH-12
Project: Giraph
Issue Type: Improvement
Components: bsp
Reporter: Avery Ching
Assignee: Hyunsik Choi
Priority: Minor
Attachments: GIRAPH-12_1.patch

Currently every worker starts up a thread to communicate with every other worker. Hadoop RPC is used for communication. For instance, if there are 400 workers, each worker will create 400 threads. This ends up using a lot of memory, even with the option -Dmapred.child.java.opts=-Xss64k. It would be good to investigate using a framework like Netty, or rolling our own, to improve this situation. By moving away from Hadoop RPC, we would also make compatibility across different Hadoop versions easier.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
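To illustrate the thread-count concern in GIRAPH-12, here is a minimal sketch of one possible direction: a small fixed pool of sender threads sharing one long-lived connection per peer. It is not the code in GIRAPH-12_1.patch. PeerPoolSketch, its PeerConnection stand-in, and the lazy opening of connections are all assumptions made for illustration; no Hadoop RPC or Netty calls are used.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class PeerPoolSketch {
    /** Hypothetical stand-in for an established RPC proxy to one peer. */
    static final class PeerConnection {
        final int peerId;
        final AtomicInteger sent = new AtomicInteger();
        PeerConnection(int peerId) { this.peerId = peerId; }
        void send(String message) { sent.incrementAndGet(); }
    }

    private final Map<Integer, PeerConnection> connections = new ConcurrentHashMap<>();
    private final AtomicInteger connectionsOpened = new AtomicInteger();
    private final ExecutorService senders;

    PeerPoolSketch(int senderThreads) {
        // A fixed number of threads, independent of the number of peers.
        senders = Executors.newFixedThreadPool(senderThreads);
    }

    /** Queues a message; the peer's connection is opened once and then reused. */
    void send(int peerId, String message) {
        senders.execute(() -> connections.computeIfAbsent(peerId, id -> {
            connectionsOpened.incrementAndGet();
            return new PeerConnection(id);
        }).send(message));
    }

    int connectionsOpened() { return connectionsOpened.get(); }

    /** Stops accepting work and waits for queued sends to finish. */
    void shutdown() {
        senders.shutdown();
        try {
            senders.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        PeerPoolSketch pool = new PeerPoolSketch(4);
        for (int round = 0; round < 3; round++) {
            for (int peer = 0; peer < 400; peer++) {
                pool.send(peer, "superstep-" + round);
            }
        }
        pool.shutdown();
        System.out.println("connections opened: " + pool.connectionsOpened());
    }
}
```

With 400 peers and three rounds of sends, the sketch still opens only 400 connections and runs only four sender threads; whether such a design performs well on a real cluster would need the kind of testing discussed in the comment.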
Re: Incubator report is due
I'd propose:

Giraph

Giraph is a large-scale, fault-tolerant, Bulk Synchronous Parallel (BSP)-based graph processing framework that runs on Hadoop. Giraph entered the incubator in August 2011.

Project developments:
* Project website created.
* Confluence wiki created.
* Accounts were created for two of the committers.
* Project is entirely on Apache infrastructure.

Next steps:
* Adding new committers.
* Making a release.
* One of the initial committers still hasn't filed an ICLA. Either he needs to move forward, or we need to remove him.
[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored
[ https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102916#comment-13102916 ]

Hudson commented on GIRAPH-27:
--
Integrated in Giraph-trunk-Commit #4 (See [https://builds.apache.org/job/Giraph-trunk-Commit/4/])
GIRAPH-27: Fixed type parameter errors in Hudson (aching).

aching : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1169863
Files :
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/Vertex.java

Mutable static global state in Vertex.java should be refactored
---
Key: GIRAPH-27
URL: https://issues.apache.org/jira/browse/GIRAPH-27
Project: Giraph
Issue Type: Improvement
Components: graph
Affects Versions: 0.70.0
Reporter: Jake Mannix
Assignee: Jake Mannix
Attachments: GIRAPH-27.patch, GIRAPH-27.patch

Vertex.java has a bunch of static methods for getting/setting global graph state (total number of vertices, edges, a reference to the GraphMapper, etc). This should be refactored into a GraphState object, to which every Vertex can hold a reference (yes, a tiny bit more memory per Vertex, but in comparison to what's already in there...)
[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored
[ https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102986#comment-13102986 ]

Avery Ching commented on GIRAPH-27:
---
Oh, it's very annoying. This problem has bitten me many times. =)
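The refactoring proposed in GIRAPH-27 can be sketched roughly as follows. This is an illustration only, not the attached patch: the field names and accessors on GraphState and the simplified Vertex class are assumptions, and the real GraphState would also carry things like the GraphMapper reference.

```java
final class GraphStateSketch {
    /** Hypothetical holder for values that were previously static on Vertex. */
    static final class GraphState {
        private long numVertices;
        private long numEdges;

        long getNumVertices() { return numVertices; }
        long getNumEdges() { return numEdges; }

        GraphState setNumVertices(long n) { numVertices = n; return this; }
        GraphState setNumEdges(long n) { numEdges = n; return this; }
    }

    /** Simplified vertex: one extra reference instead of static global state. */
    static final class Vertex {
        private final GraphState graphState;

        Vertex(GraphState graphState) { this.graphState = graphState; }

        long getNumVertices() { return graphState.getNumVertices(); }
        long getNumEdges() { return graphState.getNumEdges(); }
    }

    public static void main(String[] args) {
        // Two graphs in one JVM no longer overwrite each other's totals.
        Vertex a = new Vertex(new GraphState().setNumVertices(10).setNumEdges(30));
        Vertex b = new Vertex(new GraphState().setNumVertices(5).setNumEdges(8));
        System.out.println(a.getNumVertices() + " vs " + b.getNumVertices());
    }
}
```

The cost is the one reference per Vertex that the issue description mentions; the benefit is that tests or jobs sharing a JVM no longer see each other's state.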
[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1310#comment-1310 ]

Jake Mannix commented on GIRAPH-28:
---
Ok, so I went ahead and made a 'straw-man' refactoring branch (on GitHub: https://github.com/jakemannix/giraph/tree/vertex_map_refactor ), removing the getDestEdgeMap() method, and having BasicVertex implement Iterable, as well as the random-access read method getEdgeValue(targetVertexId).

I got it passing tests, but ran into a few things we may want to consider:

First, testing for the existence of an edge to a target vertex is no longer possible: getEdgeValue(targetVertexId) returns the *value* associated with the edge. Edges are allowed to have null values and still denote a connection between the source and target vertex, right? Maybe we should just have an Edge<I, E> getEdge(I targetVertexId) method instead?

Secondly, and far less importantly, we need to have getNumOutEdges(), because clients often want to know the out-degree of the vertex, and they used to call getDestEdgeMap().size().

Thirdly, for the same reason that getEdgeValue() can return superfluous nulls, removeEdge(), when its result is used as a boolean, can trick the caller into thinking there was no connection to the target (because removeEdge() returned null), when really it's because I was trying to be clever and return the *value*, which could be null. Having removeEdge() return the actual Edge fixes this.

I'll open another ticket for this stuff, as patching this patch seems a bit silly.
Introduce new primitive-specific MutableVertex subclasses
-
Key: GIRAPH-28
URL: https://issues.apache.org/jira/browse/GIRAPH-28
Project: Giraph
Issue Type: New Feature
Components: graph
Affects Versions: 0.70.0
Reporter: Jake Mannix
Assignee: Jake Mannix
Attachments: GIRAPH-28.diff, GIRAPH-28.diff

As discussed on the list, MutableVertex<LongWritable,DoubleWritable,FloatWritable,DoubleWritable> (for example) could be highly optimized in its memory footprint if the vertex and edge data were held in a form which minimized Java object usage.
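The null ambiguity raised in the comment above can be reproduced with a plain java.util.TreeMap, which permits null values. This sketch only demonstrates the problem; it assumes Long target ids and Double edge values in place of Giraph's Writable types, and it is not taken from the straw-man branch.

```java
import java.util.TreeMap;

final class NullEdgeValueSketch {
    /** Edge map with one valued edge (to 1) and one null-valued edge (to 2). */
    static TreeMap<Long, Double> sampleEdges() {
        TreeMap<Long, Double> edges = new TreeMap<>();
        edges.put(1L, 0.5);
        edges.put(2L, null);
        return edges;
    }

    public static void main(String[] args) {
        TreeMap<Long, Double> edges = sampleEdges();

        // Both lookups yield null, although only target 3 has no edge.
        System.out.println("value for 2: " + edges.get(2L));
        System.out.println("value for 3: " + edges.get(3L));

        // An explicit existence test tells the two cases apart.
        System.out.println("edge to 2 exists: " + edges.containsKey(2L));
        System.out.println("edge to 3 exists: " + edges.containsKey(3L));

        // remove() also returns null here, even though an edge was removed.
        System.out.println("removed value for 2: " + edges.remove(2L));
        System.out.println("edge to 2 exists now: " + edges.containsKey(2L));
    }
}
```

Any accessor that returns only the value inherits this behaviour, which is why the thread goes on to weigh returning the whole Edge against adding a separate existence check.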
[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103338#comment-13103338 ]

Avery Ching commented on GIRAPH-28:
---
Jake, I agree with all 3 of your points. The only question I have is regarding implementing Iterable. Will the user expect to get back a sorted iterator or a non-sorted one? I'm guessing we'll lose some functionality here? I don't think we would want to add a switch of sorts to choose. The other approach we had discussed was the methods

  Iterator<Edge<I, E>> getOutEdgeIterator();
  Iterator<Edge<I, E>> getSortedOutEdgeIterator();

as opposed to Iterable.
[jira] [Created] (GIRAPH-31) Hide the SortedMap<I, Edge<I,E>> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods
Hide the SortedMap<I, Edge<I,E>> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods
---
Key: GIRAPH-31
URL: https://issues.apache.org/jira/browse/GIRAPH-31
Project: Giraph
Issue Type: Improvement
Components: graph
Affects Versions: 0.70.0
Reporter: Jake Mannix
Assignee: Jake Mannix

As discussed on the list, and on GIRAPH-28, the SortedMap<I, Edge<I,E>> is an implementation detail which need not be exposed to application developers. They need to iterate over the edges, possibly access them one by one, and remove them (in the Mutable case), but they don't need the SortedMap, and creating primitive-optimized BasicVertex implementations is hampered by the fact that clients expect this Map to exist.
[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103339#comment-13103339 ]

Jake Mannix commented on GIRAPH-28:
---
I'm suggesting that iterator() always be sorted. The collection views of a SortedMap (keySet(), values(), entrySet()) are Iterable, and the iterators they return are always in sorted key order. We'd have BasicVertex do the same thing.
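As a small check of the ordering behaviour being relied on here: java.util.TreeMap iterates its key view in ascending key order regardless of insertion order. The sketch assumes Long target ids and Double edge values; iterationOrder() is a hypothetical helper, not a Giraph method.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

final class SortedIterationSketch {
    /** Copies the target ids in the order the map's key view iterates them. */
    static List<Long> iterationOrder(SortedMap<Long, Double> edges) {
        return new ArrayList<>(edges.keySet());
    }

    public static void main(String[] args) {
        SortedMap<Long, Double> edges = new TreeMap<>();
        edges.put(42L, 1.0);
        edges.put(7L, 2.0);
        edges.put(19L, 3.0);
        System.out.println(iterationOrder(edges)); // prints [7, 19, 42]
    }
}
```

A BasicVertex backed by such a map gets sorted iteration for free; implementations backed by other structures would have to pay for it separately.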
[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103341#comment-13103341 ]

Jake Mannix commented on GIRAPH-28:
---
Also, to contradict my 1st and 3rd points, Dmitriy pointed out (in an out-of-band chat) that we may not want to expose Edge to the user, because a) we don't want to store it in memory (his test showed that even switching TreeMap<I, Edge<I,E>> to TreeMap<I, E> reduced memory usage by a fair amount), and b) we don't want to have to instantiate tons of useless objects by lazily creating them. In that case we could instead just keep getEdgeValue() and removeEdge() as they were, but also add a boolean hasEdge(I targetVertexId) to test for connection. Then you get everything you need without exposing the Edge class (which only gets used internally for its Writable nature):

  if (vertex.hasEdge(targetVertexId)) {
    E value = vertex.getEdgeValue(targetVertexId);
    vertex.removeEdge(targetVertexId);
  }

etc...
[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103343#comment-13103343 ]

Avery Ching commented on GIRAPH-28:
---
Would we guarantee that iterator() would always be sorted? For instance, what about something like TinyVertex? A pair of arrays would be expensive to keep sorted... Or are you suggesting that whether iterator is sorted or not depends on the implementation?

I agree that implementing Iterable is pretty convenient for users. Perhaps have both (implement Iterable and the two other methods)? This would allow applications based on TinyVertex to be optimized when not requiring sorting.
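To make the cost concern concrete, here is a minimal sketch of a vertex that stores its out-edges in a pair of primitive arrays. TinyVertexSketch is a hypothetical stand-in, not the TinyVertex mentioned above, and it assumes long target ids and double edge values.

```java
import java.util.Arrays;

final class TinyVertexSketch {
    private long[] targets = new long[4];
    private double[] values = new double[4];
    private int size;

    /** Appends an edge; amortized constant time, no ordering maintained. */
    void addEdge(long target, double value) {
        if (size == targets.length) {
            targets = Arrays.copyOf(targets, size * 2);
            values = Arrays.copyOf(values, size * 2);
        }
        targets[size] = target;
        values[size] = value;
        size++;
    }

    /** Target ids in insertion order; a plain copy, no sorting. */
    long[] targetIds() {
        return Arrays.copyOf(targets, size);
    }

    /** Target ids in ascending order; pays for a sort on every call. */
    long[] sortedTargetIds() {
        long[] copy = Arrays.copyOf(targets, size);
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        TinyVertexSketch vertex = new TinyVertexSketch();
        vertex.addEdge(5L, 0.1);
        vertex.addEdge(1L, 0.2);
        vertex.addEdge(3L, 0.3);
        System.out.println(Arrays.toString(vertex.targetIds()));
        System.out.println(Arrays.toString(vertex.sortedTargetIds()));
    }
}
```

Offering both accessors, as suggested above, lets an application that does not care about order skip the sort entirely.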
[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103350#comment-13103350 ]

Avery Ching commented on GIRAPH-28:
---
boolean hasEdge(I targetVertexId) is fine as an alternative. For the iterators though, we are returning an iterator of type Edge<I, E>, so we will have to create those Edge<I, E> objects on the fly for some implementations. I suppose that hasEdge() adds a bit of work though, to check before adding or removing, as opposed to returning the full edge. I think I'd lean slightly toward the Edge<I, E> methods for consistency and reduced code. But I can be out-voted. =)
[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103357#comment-13103357 ]

Jake Mannix commented on GIRAPH-28:
---
The alternative to Iterable<Edge<I, E>> is Iterable<I>, returning only the target vertices, and you can call getEdgeValue(targetVertexId) on any of these if you need it. Again, many algorithms will simply do something like

  for (I targetId : vertex) {
    sendMsg(targetId, someFunction(baseMsg, getEdgeValue(targetId)));
  }

which is maybe a little nicer looking (or at least not uglier) than:

  for (Edge<I, E> edge : vertex) {
    sendMsg(edge.getVertexId(), someFunction(baseMsg, edge.getValue()));
  }

And then there are no Edge objects hanging around.

Alternatively, Edge could act just like a typical Writable, and the Iterator<Edge<I, E>> iterates over the *same* Edge object, setting different values on it as next() is called.
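A self-contained version of the first loop above might look like the following sketch. The Vertex class here is a simplified stand-in that assumes Long ids and Double values stored in a TreeMap; messages() is a hypothetical helper that collects strings instead of calling Giraph's sendMsg(), and multiplication stands in for someFunction().

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.TreeMap;

final class TargetIdIterationSketch {
    /** Vertex that exposes only target ids when iterated; no Edge objects. */
    static final class Vertex implements Iterable<Long> {
        private final TreeMap<Long, Double> edges = new TreeMap<>();

        void addEdge(long targetId, double value) { edges.put(targetId, value); }

        Double getEdgeValue(long targetId) { return edges.get(targetId); }

        @Override
        public Iterator<Long> iterator() { return edges.keySet().iterator(); }
    }

    /** Builds "target:payload" strings as a stand-in for sendMsg(). */
    static List<String> messages(Vertex vertex, double baseMsg) {
        List<String> out = new ArrayList<>();
        for (Long targetId : vertex) {
            out.add(targetId + ":" + (baseMsg * vertex.getEdgeValue(targetId)));
        }
        return out;
    }

    public static void main(String[] args) {
        Vertex vertex = new Vertex();
        vertex.addEdge(2L, 0.5);
        vertex.addEdge(1L, 2.0);
        System.out.println(messages(vertex, 10.0)); // prints [1:20.0, 2:5.0]
    }
}
```

The trade-off is one extra lookup per edge in getEdgeValue(), in exchange for never materializing an Edge object during iteration.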
[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103363#comment-13103363 ]

Jake Mannix commented on GIRAPH-28:
---
As for sorting, I'd imagine that assuming it always returns a sorted iterator is fine, but yes, I can imagine some implementations that might not want to do that. I'd lean against having multiple iterators until it was known that they were needed, and maybe just document the implementations which return unsorted iterators so that things don't get messed up?

Vertex subclasses are where the algorithms are implemented, right? So a Vertex knows whether it has a sorted iterator or not... the only question would be: are there generic methods implemented in things like BspServiceWorker or GraphMapper which would be expected to need a sorted iterator? Currently there are no such places that I can see. Without such cases, we could easily leave Vertex implementations to decide whether they need to return sorted iterators or not.
[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103365#comment-13103365 ]

Dmitriy V. Ryaboy commented on GIRAPH-28:
-
I'd caution against the approach of using a MutatorIterator (that's my name for that pattern. Like it? :)). It's effective, but leads to extremely confusing bugs when people try to do things like take the first three edges, etc. Presenting a familiar interface but providing a tricky, unintuitive implementation is not super friendly to developers; I don't think we want people to have to study the API to such an extent that they have to know these details.
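The "take the first three edges" bug can be reproduced with a short sketch. Edge, reusingIterator() and firstThreeTargets() below are hypothetical and use primitive long/double fields rather than Giraph's Writable types; the iterator deliberately refills one shared object on each next() call.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class ReusedEdgeSketch {
    /** Hypothetical mutable edge holder. */
    static final class Edge {
        long targetId;
        double value;
    }

    /** Iterator that refills one shared Edge on every next() call. */
    static Iterator<Edge> reusingIterator(long[] targets, double[] values) {
        return new Iterator<Edge>() {
            private final Edge shared = new Edge();
            private int index;

            @Override
            public boolean hasNext() { return index < targets.length; }

            @Override
            public Edge next() {
                if (!hasNext()) throw new NoSuchElementException();
                shared.targetId = targets[index];
                shared.value = values[index];
                index++;
                return shared;
            }
        };
    }

    /** Naive caller: keeps references to the first three edges, reads them later. */
    static List<Long> firstThreeTargets(Iterator<Edge> edges) {
        List<Edge> kept = new ArrayList<>();
        while (edges.hasNext() && kept.size() < 3) {
            kept.add(edges.next());
        }
        List<Long> ids = new ArrayList<>();
        for (Edge edge : kept) {
            ids.add(edge.targetId);
        }
        return ids;
    }

    public static void main(String[] args) {
        Iterator<Edge> edges = reusingIterator(
            new long[] {10L, 20L, 30L, 40L}, new double[] {1.0, 2.0, 3.0, 4.0});
        System.out.println(firstThreeTargets(edges)); // prints [30, 30, 30]
    }
}
```

The caller expected [10, 20, 30]; because every kept reference points at the same object, it sees only the last value written. Callers would have to copy each edge before advancing, which is exactly the kind of API detail the comment argues users should not need to know.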
[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103369#comment-13103369 ]

Dmitriy V. Ryaboy commented on GIRAPH-28:
-
This:

bq. Alternatively, Edge could act just like a typical Writable, and the Iterator<Edge<I, E>> iterates over the same Edge object, setting different values on it as next() is called.