Incubator report is due
Hey Guys, Our Giraph Incubator report is due: http://wiki.apache.org/incubator/September2011 If I don't see any activity in the next 24 hours, I'll add a stub report. Cheers, Chris ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: chris.a.mattm...@nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++
[jira] [Commented] (GIRAPH-12) Investigate communication improvements
[ https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102682#comment-13102682 ] Hyunsik Choi commented on GIRAPH-12: Like the current PeerThread, each PeerConnection initially gets one established RPC proxy. These connections are kept open for the whole job, so there is no connection overhead. If you can test this code on Yahoo!'s clusters, I'd appreciate the help. Next week I will have access to my lab's Hadoop cluster, and I'll run some tests there as well. > Investigate communication improvements > -- > > Key: GIRAPH-12 > URL: https://issues.apache.org/jira/browse/GIRAPH-12 > Project: Giraph > Issue Type: Improvement > Components: bsp >Reporter: Avery Ching >Assignee: Hyunsik Choi >Priority: Minor > Attachments: GIRAPH-12_1.patch > > > Currently every worker will start up a thread to communicate with every other > worker. Hadoop RPC is used for communication. For instance if there are > 400 workers, each worker will create 400 threads. This ends up using a lot > of memory, even with the option > -Dmapred.child.java.opts="-Xss64k". > It would be good to investigate using frameworks like Netty or rolling > our own to improve this situation. By moving away from Hadoop RPC, we would > also make compatibility with different Hadoop versions easier. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
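One way to picture the alternative to a thread per peer is a small fixed pool that services all peers. The sketch below is only an illustration under that assumption: it is not code from the GIRAPH-12 patch, it does not use Hadoop RPC or Netty, and the class and method names (PooledPeerSender, distinctThreadsUsed) are made up for the example. It simulates one send per peer and reports how many threads actually did the work.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration only; not code from the GIRAPH-12 patch. */
final class PooledPeerSender {

    /**
     * Simulates one send to each of peerCount peers on a pool of at most
     * poolSize threads, and returns how many distinct threads did the work.
     */
    static int distinctThreadsUsed(int peerCount, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        for (int peer = 0; peer < peerCount; peer++) {
            // A real sender would write to the peer's open connection here.
            pool.execute(() -> threadNames.add(Thread.currentThread().getName()));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("sends did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return threadNames.size();
    }
}
```

With 400 simulated peers and a pool of 8, the number of threads stays bounded by the pool size rather than growing with the number of workers.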
Re: Incubator report is due
I'd propose: Giraph Giraph is a large-scale, fault-tolerant, Bulk Synchronous Parallel (BSP)-based graph processing framework that runs on Hadoop. Giraph entered the incubator in August 2011. Project developments: * Project website created. * Confluence wiki created. * Accounts were created for two of the committers. * Project is entirely on Apache infrastructure. Next steps: * Adding new committers. * Making a release. * One of the initial committers still hasn't filed an ICLA. We either need him to move forward or remove him.
Re: Incubator report is due
Sounds good to me. I'll reach out to Arun and see if he can fill out the ICLA. Avery On Sep 12, 2011, at 8:44 AM, Owen O'Malley wrote: > I'd propose: > > Giraph > > Giraph is a large-scale, fault-tolerant, Bulk Synchronous Parallel > (BSP)-based graph > processing framework that runs on Hadoop. Giraph entered the incubator > in August 2011. > > Project developments: > > * Project website created. > * Confluence wiki created. > * Accounts were created for two of the committers. > * Project is entirely on Apache infrastructure. > > Next steps: > * Adding new committers. > * Making a release. > * One of the initial committers still hasn't filed an ICLA. We either > need him to move forward or remove him.
[jira] [Updated] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix updated GIRAPH-28: -- Attachment: GIRAPH-28.diff newly regenerated patch, should apply cleanly against trunk. Tests still pass (yay), but still not terribly clean code. I would argue that until we nail down the getOutpuEdgeMap() API, this patch should be held off on merging. But let's play with it more, and I think I'll turn Dmitriy's TinyVertex into another option in here, as it's a common use-case, and even smaller still than the current primitive container class. > Introduce new primitive-specific MutableVertex subclasses > - > > Key: GIRAPH-28 > URL: https://issues.apache.org/jira/browse/GIRAPH-28 > Project: Giraph > Issue Type: New Feature > Components: graph >Affects Versions: 0.70.0 >Reporter: Jake Mannix >Assignee: Jake Mannix > Attachments: GIRAPH-28.diff, GIRAPH-28.diff > > > As discussed on the list, > MutableVertex (for > example) could be highly optimized in its memory footprint if the vertex and > edge data were held in a form which minimized Java object usage. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored
[ https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102916#comment-13102916 ] Hudson commented on GIRAPH-27: -- Integrated in Giraph-trunk-Commit #4 (See [https://builds.apache.org/job/Giraph-trunk-Commit/4/]) GIRAPH-27: Fixed type parameter errors in Hudson (aching). aching : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1169863 Files : * /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/Vertex.java > Mutable static global state in Vertex.java should be refactored > --- > > Key: GIRAPH-27 > URL: https://issues.apache.org/jira/browse/GIRAPH-27 > Project: Giraph > Issue Type: Improvement > Components: graph >Affects Versions: 0.70.0 >Reporter: Jake Mannix >Assignee: Jake Mannix > Attachments: GIRAPH-27.patch, GIRAPH-27.patch > > > Vertex.java has a bunch of static methods for getting/setting global graph > state (total number of vertices, edges, a reference to the GraphMapper, etc). > Refactoring this into a GraphState object, which every Vertex can hold onto > a reference to (yes, a tiny bit more memory per Vertex, but in comparison to > what's already in there...) -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
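The refactoring described in the issue, replacing static getters and setters with a shared state object, could look roughly like the sketch below. This is an assumption-laden illustration, not the attached GIRAPH-27 patch: the field names, the fluent setters, and the SketchVertex class are hypothetical, and the GraphMapper reference mentioned in the issue is left out to keep the example self-contained.

```java
/** Hypothetical sketch of job-wide graph state held in an object instead of statics. */
final class GraphState {
    private long numVertices;
    private long numEdges;

    long getNumVertices() { return numVertices; }
    long getNumEdges() { return numEdges; }

    GraphState setNumVertices(long numVertices) {
        this.numVertices = numVertices;
        return this;
    }

    GraphState setNumEdges(long numEdges) {
        this.numEdges = numEdges;
        return this;
    }
}

/** Minimal vertex that reads global state through a reference, not a static field. */
class SketchVertex {
    private final GraphState graphState;

    SketchVertex(GraphState graphState) {
        this.graphState = graphState;
    }

    long getNumVertices() { return graphState.getNumVertices(); }
    long getNumEdges() { return graphState.getNumEdges(); }
}
```

Each vertex pays for one extra reference, and two jobs (or two tests) in the same JVM no longer share mutable state by accident.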
[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored
[ https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102937#comment-13102937 ] Jake Mannix commented on GIRAPH-27: --- Why are you using redundant type parameters? Type inference gets you the type for free, and it compiles fine without it, and is unambiguous and more compact...
[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored
[ https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102970#comment-13102970 ] Avery Ching commented on GIRAPH-27: --- Jake, please see https://builds.apache.org/job/Giraph-trunk-Commit/3/console to see the errors without explicit types:
[ERROR] /x1/jenkins/jenkins-slave/workspace/Giraph-trunk-Commit/trunk/src/main/java/org/apache/giraph/graph/Vertex.java:[145,38] type parameters of I cannot be determined; no unique maximal instance exists for type variable I with upper bounds I,org.apache.hadoop.io.WritableComparable
[ERROR] /x1/jenkins/jenkins-slave/workspace/Giraph-trunk-Commit/trunk/src/main/java/org/apache/giraph/graph/Vertex.java:[150,42] type parameters of V cannot be determined; no unique maximal instance exists for type variable V with upper bounds V,org.apache.hadoop.io.Writable
[ERROR] /x1/jenkins/jenkins-slave/workspace/Giraph-trunk-Commit/trunk/src/main/java/org/apache/giraph/graph/Vertex.java:[163,43] type parameters of M cannot be determined; no unique maximal instance exists for type variable M with upper bounds M,org.apache.hadoop.io.Writable
This seems to be a known issue (http://stackoverflow.com/questions/2640060/inconsistency-between-sun-jre-javac-and-eclipse-java-compiler). After the changes (build #4), it passed.
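For readers unfamiliar with the "explicit types" being discussed: when a generic method's type variable appears only in its return type, the call site can name the type argument explicitly instead of relying on inference. The sketch below shows only that syntax. It is not the code in Vertex.java, the names Factories, create, and IdHolder are hypothetical, and it makes no attempt to reproduce the compiler error quoted above (current compilers accept the inferred form as well).

```java
import java.util.function.Supplier;

/** Hypothetical helper: a factory whose type variable appears only in the return type. */
final class Factories {
    @SuppressWarnings("unchecked")
    static <T extends Comparable<?>> T create(Supplier<? extends Comparable<?>> source) {
        // Unchecked cast: the caller chooses T, much like a reflection-based factory.
        return (T) source.get();
    }
}

/** Generic class that needs an instance of its own type parameter I. */
class IdHolder<I extends Comparable<?>> {
    I newId(Supplier<? extends Comparable<?>> source) {
        // Explicit type argument: the compiler is told T = I instead of inferring it.
        return Factories.<I>create(source);
    }
}
```

The `Factories.<I>create(...)` form is more verbose, but it leaves nothing for the compiler to infer, which is why it behaves the same across compilers.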
[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored
[ https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102984#comment-13102984 ] Jake Mannix commented on GIRAPH-27: --- oh that's super annoying. I'll have to configure my IDE to make sure it is "bug-compatible" with Sun JRE. :\
[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored
[ https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102986#comment-13102986 ] Avery Ching commented on GIRAPH-27: --- Oh, it's very annoying. This problem has bitten me many times. =)
Re: Incubator report is due
+1 from me. I added it (and signed off from me and owen -- took the liberty to sign owen since he wrote it ;) ) to here: http://wiki.apache.org/incubator/September2011 Cheers, Chris On Sep 12, 2011, at 11:44 AM, Owen O'Malley wrote: > I'd propose: > > Giraph > > Giraph is a large-scale, fault-tolerant, Bulk Synchronous Parallel > (BSP)-based graph > processing framework that runs on Hadoop. Giraph entered the incubator > in August 2011. > > Project developments: > > * Project website created. > * Confluence wiki created. > * Accounts were created for two of the committers. > * Project is entirely on Apache infrastructure. > > Next steps: > * Adding new committers. > * Making a release. > * One of the initial committers still hasn't filed an ICLA. We either > need him to move forward or remove him.
[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1310#comment-1310 ] Jake Mannix commented on GIRAPH-28: --- Ok, so I went ahead and made a 'straw-man' refactoring branch (on GitHub: https://github.com/jakemannix/giraph/tree/vertex_map_refactor ), removing the getDestEdgeMap() method and having BasicVertex implement Iterable, as well as adding the random-access read method getEdgeValue(targetVertexId). I got it passing tests, but ran into a few things we may want to consider:

First, testing for the existence of a target vertex is no longer possible: getEdgeValue(targetVertexId) returns the *value* associated with the edge. Edges are allowed to have null values and still denote a connection between the source and target vertex, right? Maybe we should just have an Edge getEdge(I targetVertexId) method instead?

Secondly, and far less importantly, we need to have getNumOutEdges(), because clients often want to know the out-degree of the vertex, and they used to call getDestEdgeMap().size().

Thirdly, for the same reason that getEdgeValue() can return superfluous nulls, removeEdge(), used as a boolean, can trick the caller into thinking there was no connection to the target (because removeEdge() returned null), when really it's because I was trying to be clever and return the *value*, which could be null. Having removeEdge() return the actual Edge fixes this.

I'll open another ticket for this stuff, as patching this patch seems a bit silly.
[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103338#comment-13103338 ] Avery Ching commented on GIRAPH-28: --- Jake, I agree with all 3 of your points. The only question I have is regarding implementing Iterable. Will the user expect to get back a sorted iterator or a non-sorted one? I'm guessing we'll lose some functionality here? I don't think we would want to add a "switch" of sorts to choose. The other way we had discussed was the methods Iterator<Edge<I,E>> getOutEdgeIterator(); Iterator<Edge<I,E>> getSortedOutEdgeIterator(); as opposed to Iterable.
[jira] [Created] (GIRAPH-31) Hide the SortedMap<I, Edge<I,E>> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods
Hide the SortedMap<I, Edge<I,E>> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods --- Key: GIRAPH-31 URL: https://issues.apache.org/jira/browse/GIRAPH-31 Project: Giraph Issue Type: Improvement Components: graph Affects Versions: 0.70.0 Reporter: Jake Mannix Assignee: Jake Mannix As discussed on the list, and on GIRAPH-28, the SortedMap<I, Edge<I,E>> is an implementation detail which need not be exposed to application developers - they need to iterate over the edges, and possibly access them one-by-one, and remove them (in the Mutable case), but they don't need the SortedMap, and creating primitive-optimized BasicVertex implementations is hampered by the fact that clients expect this Map to exist. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103339#comment-13103339 ] Jake Mannix commented on GIRAPH-28: --- I'm suggesting that iterator() be always sorted. The collection views of a SortedMap (keySet(), values(), entrySet()) are Iterable, and the iterators they return are always in the sorted key order. We'd have BasicVertex do the same thing.
[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103341#comment-13103341 ] Jake Mannix commented on GIRAPH-28: --- Also, to contradict my 1st and 3rd points, Dmitriy pointed out (in an out-of-band chat) that if we don't want to expose Edge to the user, because a) we don't want to store it in memory (his test showed that even switching TreeMap<I, Edge<I,E>> to TreeMap<I, E> reduced memory usage by a fair amount), and b) we don't want to instantiate tons of useless objects by lazily creating them, we could instead just keep getEdgeValue() and removeEdge() as they were, but also add a boolean hasEdge(I targetVertexId) to test for connection. Then you get everything you need without exposing the Edge class (which only gets used internally for its Writable nature):

    if (vertex.hasEdge(targetVertexId)) {
        E value = vertex.getEdgeValue(targetVertexId);
        vertex.removeEdge(targetVertexId);
    }

etc...
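The ambiguity that hasEdge() resolves is easy to show with a map from target id to edge value. The sketch below is a stand-in under stated assumptions, not the Giraph Vertex class: NullableEdgeStore is a hypothetical name, and a plain TreeMap<I, E> is assumed as the backing store. It demonstrates that an edge with a null value and a missing edge look identical through getEdgeValue() and removeEdge(), while hasEdge() tells them apart.

```java
import java.util.TreeMap;

/** Hypothetical stand-in for a vertex's out-edges: target id mapped directly to edge value. */
final class NullableEdgeStore<I extends Comparable<I>, E> {
    private final TreeMap<I, E> edges = new TreeMap<>();

    /** Adds the edge, or replaces its value if the edge already exists. */
    void addEdge(I targetId, E value) { edges.put(targetId, value); }

    /** True if an edge to targetId exists, even when its value is null. */
    boolean hasEdge(I targetId) { return edges.containsKey(targetId); }

    /** Returns the edge value; null means either "no edge" or "edge with null value". */
    E getEdgeValue(I targetId) { return edges.get(targetId); }

    /** Removes the edge and returns its old value, with the same null ambiguity. */
    E removeEdge(I targetId) { return edges.remove(targetId); }

    int getNumOutEdges() { return edges.size(); }
}
```

Callers who care about the distinction call hasEdge() first; callers who do not can ignore it and pay nothing extra.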
[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103343#comment-13103343 ] Avery Ching commented on GIRAPH-28: --- Would we guarantee that iterator() would always be sorted? For instance, what about something like TinyVertex? A pair of arrays would be expensive to keep sorted... Or are you suggesting that whether iterator is sorted or not depends on the implementation? I agree that implementing Iterable is pretty convenient for users. Perhaps have both (implement Iterable and the two other methods)? This would allow applications based on TinyVertex to be optimized when not requiring sorting.
[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103350#comment-13103350 ] Avery Ching commented on GIRAPH-28: --- boolean hasEdge(I targetVertexId) is fine as an alternative. For the iterators though, we are returning an iterator of type Edge, so we will have to create those Edge objects on the fly for some implementations. I suppose that hasEdge() adds a bit of work, though, to check before adding or removing, as opposed to returning the full edge. I think I'd lean slightly toward the Edge methods for consistency and reduced code. But I can be out-voted. =)
[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103357#comment-13103357 ] Jake Mannix commented on GIRAPH-28: --- The alternative to Iterable<Edge<I,E>> is Iterable<I>, returning only the target vertices, and you can call getEdgeValue(targetVertexId) on any of these if you need it. Again, many algorithms will simply do something like

    for (I targetId : vertex) {
        sendMsg(targetId, someFunction(baseMsg, getEdgeValue(targetId)));
    }

which is maybe a little nicer looking (or at least not uglier) than:

    for (Edge<I,E> edge : vertex) {
        sendMsg(edge.getVertexId(), someFunction(baseMsg, edge.getValue()));
    }

And then there are no Edge objects hanging around. Alternatively, Edge could act just like a typical Writable, and the Iterator<Edge<I,E>> iterates over the *same* Edge object, setting different values on it as next() is called.
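A vertex that is Iterable over target ids, with edge values fetched on demand, might be sketched as follows. This is an illustration under assumptions rather than the proposed BasicVertex: IdIterableVertex and WeightSum are hypothetical names, the backing TreeMap is an assumption, and summing edge weights stands in for the sendMsg() loop since no messaging API is available here.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.TreeMap;

/** Hypothetical vertex that iterates over target vertex ids only; no Edge objects are created. */
final class IdIterableVertex<I extends Comparable<I>, E> implements Iterable<I> {
    private final TreeMap<I, E> edges = new TreeMap<>();

    void addEdge(I targetId, E value) { edges.put(targetId, value); }

    E getEdgeValue(I targetId) { return edges.get(targetId); }

    /** Ids come back in sorted order because the backing map is a TreeMap. */
    @Override
    public Iterator<I> iterator() {
        // Read-only view so that iteration cannot remove edges behind the vertex's back.
        return Collections.unmodifiableSet(edges.keySet()).iterator();
    }
}

final class WeightSum {
    /** Stand-in for a compute step: visits every out-edge and looks up its value by id. */
    static double totalWeight(IdIterableVertex<Long, Double> vertex) {
        double total = 0.0;
        for (Long targetId : vertex) {
            total += vertex.getEdgeValue(targetId);
        }
        return total;
    }
}
```

The cost of this shape is one extra lookup per edge inside the loop, in exchange for never materializing per-edge objects.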
[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103354#comment-13103354 ] Dmitriy V. Ryaboy commented on GIRAPH-28: - Technically you shouldn't *have* to use hasEdge when adding and removing if you don't care. removeEdge() can return null ambiguously (value was null, or no such edge existed), and if you care, you can use hasEdge(), but if you don't, you don't. addEdge() can be an upsert.
[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103363#comment-13103363 ] Jake Mannix commented on GIRAPH-28: --- As for sorting, I'd imagine that assuming it always returns a sorted iterator is fine, but yes, some implementations I can imagine might not want to do that. I'd lean against having multiple iterators until it was known that they were needed, and maybe just document the ones which return nonsorted ones so that things don't get messed up? Vertex subclasses are where the "algorithms" are implemented, right? So a Vertex knows whether it has a sorted iterator or not... the only question would be: are there generic methods implemented in things like BspServiceWorker, or GraphMapper, which would be expected to need to do things to a sorted iterator? Currently there are no such places that I can see. Without such cases, we could easily leave Vertex implementations to decide whether they needed to return sorted iterators or not.
[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103365#comment-13103365 ] Dmitriy V. Ryaboy commented on GIRAPH-28: - I'd caution against the approach of using a MutatorIterator (that's my name for that pattern. Like it? :)). It's effective, but leads to extremely confusing bugs when people try to do things like take the first three edges, etc. Presenting a familiar interface but providing a tricky unintuitive implementation is not super friendly to developers; I don't think we want people to have to study the API to such an extent they have to know these details.
[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103367#comment-13103367 ] Jake Mannix commented on GIRAPH-28: --- What do you mean by the "MutatorIterator" pattern? Not being clear about whether it's sorted or not? Or forcing iterator() to always be sorted? Or something else, about the X-Men series?
[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103369#comment-13103369 ] Dmitriy V. Ryaboy commented on GIRAPH-28: - This: "Alternatively, Edge could act just like a typical Writable, and the Iterator<Edge<I,E>> iterates over the same Edge object setting different values on it as next() is called."
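The "take the first three edges" bug mentioned in this thread can be reproduced with a few lines. The sketch below is a deliberately minimal, hypothetical version of an iterator that reuses one object (MutableEdge, ReusingEdgeIterator, and ReuseDemo are invented names, and arrays of primitives are an assumed backing store); it is not Giraph's Edge class.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical mutable edge holder that the iterator overwrites on every step. */
final class MutableEdge {
    long targetId;
    double value;
}

/** Iterator that hands out the same MutableEdge instance each time next() is called. */
final class ReusingEdgeIterator implements Iterator<MutableEdge> {
    private final long[] targets;
    private final double[] values;
    private final MutableEdge shared = new MutableEdge();
    private int position;

    ReusingEdgeIterator(long[] targets, double[] values) {
        this.targets = targets;
        this.values = values;
    }

    @Override
    public boolean hasNext() { return position < targets.length; }

    @Override
    public MutableEdge next() {
        if (!hasNext()) throw new NoSuchElementException();
        shared.targetId = targets[position];
        shared.value = values[position];
        position++;
        return shared; // same object every time; earlier references now see new data
    }
}

final class ReuseDemo {
    /** Naive "keep the first n edges": stores references and reads them afterwards. */
    static List<Long> firstTargetsNaive(Iterator<MutableEdge> edges, int n) {
        List<MutableEdge> kept = new ArrayList<>();
        while (edges.hasNext() && kept.size() < n) {
            kept.add(edges.next());
        }
        List<Long> ids = new ArrayList<>();
        for (MutableEdge edge : kept) {
            ids.add(edge.targetId);
        }
        return ids;
    }
}
```

Given targets 10, 20, 30, 40 and n = 3, the naive method returns [30, 30, 30] instead of [10, 20, 30], because all three stored references point at the one shared object. Copying the fields out inside the loop avoids the problem, but only if the caller knows the iterator behaves this way.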
[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses
[ https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103372#comment-13103372 ] Jake Mannix commented on GIRAPH-28: --- ah, yes. That can be a nasty pit of snakes for new developers, no matter how commonly it's found in Hadoop-land. So I'll put in my vote for Iterable<I>, with your offline suggestion of boolean hasSortedIterator() (defaulting in BasicVertex to "return true;", overrideable in subclasses). And I'll put in a patch on GIRAPH-31 so my code'll be where my mouth is (and we can continue this discussion on a ticket with a shorter thread [so far]).
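The hasSortedIterator() idea, a default of true in the base class that array-backed subclasses override, could be sketched like this. Everything here is hypothetical and simplified: SketchBasicVertex, TreeBackedVertex, ArrayBackedVertex, and SortedAccess are invented names, ids are fixed to Long, and edge values are omitted. It is not the GIRAPH-31 patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.TreeSet;

/** Hypothetical base class: iteration order is sorted unless a subclass says otherwise. */
abstract class SketchBasicVertex implements Iterable<Long> {
    boolean hasSortedIterator() { return true; }
}

/** Keeps targets in a TreeSet, so the inherited default (sorted) is correct. */
final class TreeBackedVertex extends SketchBasicVertex {
    private final TreeSet<Long> targets = new TreeSet<>();

    void addEdge(long target) { targets.add(target); }

    @Override
    public Iterator<Long> iterator() {
        return Collections.unmodifiableSet(targets).iterator();
    }
}

/** Keeps targets in insertion order in a primitive array and declares itself unsorted. */
final class ArrayBackedVertex extends SketchBasicVertex {
    private long[] targets = new long[4];
    private int size;

    void addEdge(long target) {
        if (size == targets.length) {
            targets = Arrays.copyOf(targets, size * 2);
        }
        targets[size++] = target;
    }

    @Override
    boolean hasSortedIterator() { return false; }

    @Override
    public Iterator<Long> iterator() {
        return new Iterator<Long>() {
            private int position;

            @Override
            public boolean hasNext() { return position < size; }

            @Override
            public Long next() {
                if (!hasNext()) throw new NoSuchElementException();
                return targets[position++];
            }
        };
    }
}

final class SortedAccess {
    /** Generic caller that needs sorted ids and sorts only when the vertex does not. */
    static List<Long> sortedTargets(SketchBasicVertex vertex) {
        List<Long> ids = new ArrayList<>();
        for (Long id : vertex) {
            ids.add(id);
        }
        if (!vertex.hasSortedIterator()) {
            Collections.sort(ids);
        }
        return ids;
    }
}
```

Code that needs ordering checks the flag once, and compact array-backed vertices are not forced to keep their arrays sorted.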
[jira] [Updated] (GIRAPH-31) Hide the SortedMap<I, Edge<I,E>> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods
[ https://issues.apache.org/jira/browse/GIRAPH-31?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jake Mannix updated GIRAPH-31: -- Attachment: GIRAPH-31.diff This patch is also an attempt at hiding the Edge class from external-facing APIs.
[jira] [Commented] (GIRAPH-31) Hide the SortedMap<I, Edge<I,E>> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods
[ https://issues.apache.org/jira/browse/GIRAPH-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103399#comment-13103399 ] Jake Mannix commented on GIRAPH-31: --- But I didn't do anything about the uses of the Edge class in VertexChanges, VertexMutations, or hence MutableVertex#addEdgeRequest(). Not sure about their uses, so those are left alone.