Incubator report is due

2011-09-12 Thread Mattmann, Chris A (388J)
Hey Guys,

Our Giraph Incubator report is due:

http://wiki.apache.org/incubator/September2011

If I don't see any activity in the next 24 hours, I'll add a 
stub report.

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++



[jira] [Commented] (GIRAPH-12) Investigate communication improvements

2011-09-12 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-12?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102682#comment-13102682
 ] 

Hyunsik Choi commented on GIRAPH-12:


Like the current PeerThread, each PeerConnection initially gets one established 
RPC proxy. These connections are kept open for the whole of processing, so there 
is no connection overhead. 

If you could test this code on Yahoo!'s clusters, I'd appreciate the help. Next 
week I will have access to my lab's Hadoop cluster, and I'll run some tests 
then as well.

> Investigate communication improvements
> --
>
> Key: GIRAPH-12
> URL: https://issues.apache.org/jira/browse/GIRAPH-12
> Project: Giraph
>  Issue Type: Improvement
>  Components: bsp
>Reporter: Avery Ching
>Assignee: Hyunsik Choi
>Priority: Minor
> Attachments: GIRAPH-12_1.patch
>
>
> Currently every worker will start up a thread to communicate with every other 
> worker.  Hadoop RPC is used for communication.  For instance, if there are 
> 400 workers, each worker will create 400 threads.  This ends up using a lot 
> of memory, even with the option  
> -Dmapred.child.java.opts="-Xss64k".  
> It would be good to investigate using a framework like Netty, or rolling 
> our own, to improve this situation.  Moving away from Hadoop RPC would 
> also make compatibility with different Hadoop versions easier.
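
To put the thread count above in perspective, here is a rough sketch of the 
stack reservation alone. It assumes one thread per peer worker, as the ticket 
describes, and compares -Xss64k with a 1 MB default stack (an assumption: 
1 MB is common on 64-bit JVMs but is platform dependent). Per-thread RPC 
buffers and other state are not counted, and the class and method names are 
illustrative only.

```java
class ThreadStackEstimate {
    /** Bytes reserved for communication-thread stacks on a single worker. */
    static long stackBytes(int workers, long stackBytesPerThread) {
        // One communication thread per peer worker, as described in the ticket.
        return (long) workers * stackBytesPerThread;
    }

    public static void main(String[] args) {
        int workers = 400;
        long small = 64L * 1024;      // -Xss64k
        long assumed = 1024L * 1024;  // assumed 1 MB default stack
        long mb = 1024L * 1024;
        System.out.println("-Xss64k: " + stackBytes(workers, small) / mb + " MB per worker");
        System.out.println("1 MB:    " + stackBytes(workers, assumed) / mb + " MB per worker");
        System.out.println("threads across the job: " + (long) workers * workers);
    }
}
```

The thread count across the whole job grows quadratically with the number of 
workers, which is the part a shared connection pool or an event-driven 
framework would address.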

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Incubator report is due

2011-09-12 Thread Owen O'Malley
I'd propose:

Giraph

Giraph is a large-scale, fault-tolerant, Bulk Synchronous Parallel
(BSP)-based graph
processing framework that runs on Hadoop. Giraph entered the incubator
in August 2011.

Project developments:

* Project website created.
* Confluence wiki created.
* Accounts were created for two of the committers.
* Project is entirely on Apache infrastructure.

Next steps:
* Adding new committers.
* Making a release.
* One of the initial committers still hasn't filed an ICLA. We either
need him to move forward or remove him.


Re: Incubator report is due

2011-09-12 Thread Avery Ching
Sounds good to me.  I'll reach out to Arun and see if he can fill out the 
ICLA.

Avery

On Sep 12, 2011, at 8:44 AM, Owen O'Malley wrote:

> I'd propose:
> 
> Giraph
> 
> Giraph is a large-scale, fault-tolerant, Bulk Synchronous Parallel
> (BSP)-based graph
> processing framework that runs on Hadoop. Giraph entered the incubator
> in August 2011.
> 
> Project developments:
> 
> * Project website created.
> * Confluence wiki created.
> * Accounts were created for two of the committers.
> * Project is entirely on Apache infrastructure.
> 
> Next steps:
> * Adding new committers.
> * Making a release.
> * One of the initial committers still hasn't filed an ICLA. We either
> need him to move forward or remove him.



[jira] [Updated] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses

2011-09-12 Thread Jake Mannix (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jake Mannix updated GIRAPH-28:
--

Attachment: GIRAPH-28.diff

Newly regenerated patch; it should apply cleanly against trunk.  Tests still 
pass (yay), but the code is still not terribly clean.  I would argue that until 
we nail down the getOutEdgeMap() API, merging this patch should be held off.  

But let's play with it more, and I think I'll turn Dmitriy's TinyVertex into 
another option in here, as it's a common use-case, and even smaller still than 
the current primitive container class.

> Introduce new primitive-specific MutableVertex subclasses
> -
>
> Key: GIRAPH-28
> URL: https://issues.apache.org/jira/browse/GIRAPH-28
> Project: Giraph
>  Issue Type: New Feature
>  Components: graph
>Affects Versions: 0.70.0
>Reporter: Jake Mannix
>Assignee: Jake Mannix
> Attachments: GIRAPH-28.diff, GIRAPH-28.diff
>
>
> As discussed on the list, 
> MutableVertex (for 
> example) could be highly optimized in its memory footprint if the vertex and 
> edge data were held in a form which minimized Java object usage.





[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored

2011-09-12 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102916#comment-13102916
 ] 

Hudson commented on GIRAPH-27:
--

Integrated in Giraph-trunk-Commit #4 (See 
[https://builds.apache.org/job/Giraph-trunk-Commit/4/])
GIRAPH-27: Fixed type parameter errors in Hudson (aching).

aching : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1169863
Files : 
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/Vertex.java


> Mutable static global state in Vertex.java should be refactored
> ---
>
> Key: GIRAPH-27
> URL: https://issues.apache.org/jira/browse/GIRAPH-27
> Project: Giraph
>  Issue Type: Improvement
>  Components: graph
>Affects Versions: 0.70.0
>Reporter: Jake Mannix
>Assignee: Jake Mannix
> Attachments: GIRAPH-27.patch, GIRAPH-27.patch
>
>
> Vertex.java has a bunch of static methods for getting/setting global graph 
> state (total number of vertices, edges, a reference to the GraphMapper, etc). 
>  Refactoring this into a GraphState object, which every Vertex can hold onto 
> a reference to (yes, a tiny bit more memory per Vertex, but in comparison to 
> what's already in there...)
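
A minimal sketch of the refactoring the description proposes. All names here 
(GraphState's fields, SketchVertex, the constructor shapes) are hypothetical 
and are not taken from the attached patch; the point is only that each vertex 
reads graph-wide values through an object reference instead of static fields.

```java
class GraphStateSketch {
    /** Hypothetical holder for values that previously lived in static fields. */
    static class GraphState {
        private final long numVertices;
        private final long numEdges;

        GraphState(long numVertices, long numEdges) {
            this.numVertices = numVertices;
            this.numEdges = numEdges;
        }

        long getNumVertices() { return numVertices; }
        long getNumEdges() { return numEdges; }
    }

    /** Each vertex holds one extra reference rather than reading globals. */
    static class SketchVertex {
        private final GraphState graphState;

        SketchVertex(GraphState graphState) { this.graphState = graphState; }

        long getNumVertices() { return graphState.getNumVertices(); }
        long getNumEdges() { return graphState.getNumEdges(); }
    }

    public static void main(String[] args) {
        // Two independent graphs can coexist in one JVM, which shared
        // mutable static state would not allow.
        SketchVertex a = new SketchVertex(new GraphState(10L, 20L));
        SketchVertex b = new SketchVertex(new GraphState(3L, 2L));
        System.out.println(a.getNumVertices() + " vs " + b.getNumVertices());
    }
}
```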





[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored

2011-09-12 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102937#comment-13102937
 ] 

Jake Mannix commented on GIRAPH-27:
---

Why are you using redundant type parameters?  Type inference gets you the type 
for free, and it compiles fine without it, and is unambiguous and more 
compact...





[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored

2011-09-12 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102970#comment-13102970
 ] 

Avery Ching commented on GIRAPH-27:
---

Jake, please see

https://builds.apache.org/job/Giraph-trunk-Commit/3/console

[ERROR] 
/x1/jenkins/jenkins-slave/workspace/Giraph-trunk-Commit/trunk/src/main/java/org/apache/giraph/graph/Vertex.java:[145,38]
 type parameters of I cannot be determined; no unique maximal instance 
exists for type variable I with upper bounds 
I,org.apache.hadoop.io.WritableComparable
[ERROR] 
/x1/jenkins/jenkins-slave/workspace/Giraph-trunk-Commit/trunk/src/main/java/org/apache/giraph/graph/Vertex.java:[150,42]
 type parameters of V cannot be determined; no unique maximal instance 
exists for type variable V with upper bounds V,org.apache.hadoop.io.Writable
[ERROR] 
/x1/jenkins/jenkins-slave/workspace/Giraph-trunk-Commit/trunk/src/main/java/org/apache/giraph/graph/Vertex.java:[163,43]
 type parameters of M cannot be determined; no unique maximal instance 
exists for type variable M with upper bounds M,org.apache.hadoop.io.Writable

to see the errors without explicit types.  This seems to be a known issue 
(http://stackoverflow.com/questions/2640060/inconsistency-between-sun-jre-javac-and-eclipse-java-compiler).

After the changes (build #4), it passed.
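
For readers unfamiliar with the workaround, here is a small sketch of what an 
explicit type argument looks like. It assumes the failing calls resembled a 
generic static method whose type variable appears only in the return type; 
the class and method names are hypothetical and not taken from Vertex.java.

```java
class TypeWitnessSketch<I extends Comparable<I>> {
    /** T appears only in the return type, so the call site must supply it. */
    @SuppressWarnings("unchecked")
    static <T extends Comparable<T>> T instantiate(Object prototype) {
        return (T) prototype;
    }

    I inferred(Object prototype) {
        // Relies on inference from the return context. Current compilers
        // accept this; the build log above shows an older javac rejecting
        // calls of this general shape.
        return instantiate(prototype);
    }

    I explicit(Object prototype) {
        // Explicit type argument: nothing is left for the compiler to infer.
        return TypeWitnessSketch.<I>instantiate(prototype);
    }

    public static void main(String[] args) {
        TypeWitnessSketch<String> sketch = new TypeWitnessSketch<String>();
        System.out.println(sketch.explicit("abc"));
        System.out.println(sketch.inferred("abc"));
    }
}
```

The explicit form is more verbose, but it compiles the same way under 
compilers that disagree about inference.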






[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored

2011-09-12 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102984#comment-13102984
 ] 

Jake Mannix commented on GIRAPH-27:
---

oh that's super annoying.  I'll have to configure my IDE to make sure it is 
"bug-compatible" with Sun JRE. :\





[jira] [Commented] (GIRAPH-27) Mutable static global state in Vertex.java should be refactored

2011-09-12 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-27?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102986#comment-13102986
 ] 

Avery Ching commented on GIRAPH-27:
---

Oh, it's very annoying.  This problem has bitten me many times. =)





Re: Incubator report is due

2011-09-12 Thread Mattmann, Chris A (388J)
+1 from me.

I added it (and signed off for me and Owen -- I took the liberty of signing for 
Owen since he wrote it ;) ) here:

http://wiki.apache.org/incubator/September2011

Cheers,
Chris

On Sep 12, 2011, at 11:44 AM, Owen O'Malley wrote:

> I'd propose:
> 
> Giraph
> 
> Giraph is a large-scale, fault-tolerant, Bulk Synchronous Parallel
> (BSP)-based graph
> processing framework that runs on Hadoop. Giraph entered the incubator
> in August 2011.
> 
> Project developments:
> 
> * Project website created.
> * Confluence wiki created.
> * Accounts were created for two of the committers.
> * Project is entirely on Apache infrastructure.
> 
> Next steps:
> * Adding new committers.
> * Making a release.
> * One of the initial committers still hasn't filed an ICLA. We either
> need him to move forward or remove him.





[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses

2011-09-12 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1310#comment-1310
 ] 

Jake Mannix commented on GIRAPH-28:
---

Ok, so I went ahead and made a 'straw-man' refactoring branch (on GitHub: 
https://github.com/jakemannix/giraph/tree/vertex_map_refactor ), removing the 
getDestEdgeMap() method, and having BasicVertex implement Iterable, as well as 
the random-access read method getEdgeValue(targetVertexId).

I got it passing tests, but ran into a few things we may want to consider:

First, testing for the existence of a target vertex is no longer possible: 
getEdgeValue(targetVertexId) returns the *value* associated with the edge.  
Edges are allowed to have null values and still denote a connection between the 
source and target vertex, right?  Maybe we should just have an Edge<I, E> 
getEdge(I targetVertexId) method instead?

Secondly, and far less importantly, we need to have getNumOutEdges(), because 
clients often want to know the out-degree of the vertex, and they used to call 
getDestEdgeMap().size().

Thirdly, for the same reason that getEdgeValue() can return superfluous nulls, 
removeEdge(), when used as a boolean, can trick the caller into thinking there 
was no connection to the target (because removeEdge() returned null), when 
really it's because I was trying to be clever and return the *value*, which 
could be null.  Having removeEdge() return the actual Edge fixes this.

I'll open another ticket for this stuff, as patching this patch seems a bit 
silly.





[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses

2011-09-12 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103338#comment-13103338
 ] 

Avery Ching commented on GIRAPH-28:
---

Jake, I agree with all 3 of your points.  The only question I have is regarding 
implementing Iterable.  Will the user expect to get back a sorted iterator or a 
non-sorted one?  I'm guessing we'll lose some functionality here?  I don't 
think we would want to add a "switch" of sorts to choose.  The other way we had 
discussed was the methods

Iterator<Edge<I, E>> getOutEdgeIterator();
Iterator<Edge<I, E>> getSortedOutEdgeIterator();

as opposed to Iterable.





[jira] [Created] (GIRAPH-31) Hide the SortedMap<I, Edge<I, E>> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods

2011-09-12 Thread Jake Mannix (JIRA)
Hide the SortedMap<I, Edge<I, E>> in Vertex from client visibility (impl. 
detail), replace with appropriate accessor methods
---

 Key: GIRAPH-31
 URL: https://issues.apache.org/jira/browse/GIRAPH-31
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Affects Versions: 0.70.0
Reporter: Jake Mannix
Assignee: Jake Mannix


As discussed on the list, and on GIRAPH-28, the SortedMap<I, Edge<I, E>> is an 
implementation detail which need not be exposed to application developers - 
they need to iterate over the edges, and possibly access them one-by-one, and 
remove them (in the Mutable case), but they don't need the SortedMap, and 
creating primitive-optimized BasicVertex implementations is hampered by the 
fact that clients expect this Map to exist.





[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses

2011-09-12 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103339#comment-13103339
 ] 

Jake Mannix commented on GIRAPH-28:
---

I'm suggesting that iterator() be always sorted.  SortedMap implements Iterable 
(by way of Collection), and the iterator it returns is always in the sorted 
order.  We'd have BasicVertex do the same thing.
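
One detail worth noting when reading the suggestion above: in java.util, 
SortedMap itself does not extend Iterable, but its keySet(), values() and 
entrySet() views do, and they iterate in ascending key order. A sketch of a 
vertex-like class delegating to such a view follows; the class name and the 
Long/Float types are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

class SortedEdgeIteration implements Iterable<Long> {
    // target vertex id -> edge value
    private final SortedMap<Long, Float> edges = new TreeMap<Long, Float>();

    void addEdge(long targetId, float value) {
        edges.put(targetId, value);
    }

    @Override
    public Iterator<Long> iterator() {
        // The key view of a SortedMap iterates in ascending key order.
        return edges.keySet().iterator();
    }

    /** Collects the target ids in iteration order. */
    static List<Long> targetsOf(SortedEdgeIteration vertex) {
        List<Long> out = new ArrayList<Long>();
        for (Long id : vertex) {
            out.add(id);
        }
        return out;
    }

    public static void main(String[] args) {
        SortedEdgeIteration v = new SortedEdgeIteration();
        v.addEdge(30L, 0.3f);
        v.addEdge(10L, 0.1f);
        v.addEdge(20L, 0.2f);
        System.out.println(targetsOf(v)); // prints [10, 20, 30]
    }
}
```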





[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses

2011-09-12 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103341#comment-13103341
 ] 

Jake Mannix commented on GIRAPH-28:
---

Also, to contradict my 1st and 3rd points, Dmitriy pointed out (in an 
out-of-band chat) that we may not want to expose Edge to the user at all, 
because a) we don't want to store it in memory (his test showed that even 
switching TreeMap<I, Edge<I, E>> to TreeMap<I, E> reduced memory usage by a 
fair amount), and b) we don't want to have to instantiate tons of useless 
objects by lazily creating them.  We could instead just keep getEdgeValue() 
and removeEdge() as they were, but also add a boolean hasEdge(I 
targetVertexId) to test for a connection.  

Then you get everything you need without exposing the Edge class (which only 
gets used internally for its Writable nature):

if(vertex.hasEdge(targetVertexId)) { 
  E value = vertex.getEdgeValue(targetVertexId);
  vertex.removeEdge(targetVertexId);
}

etc...
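
A self-contained sketch of how those accessors could sit on top of a plain 
TreeMap<I, E>, with no Edge objects stored. The class name EdgeValueVertex is 
hypothetical; the method names follow the ones used in this thread. It shows 
why hasEdge() is needed once null edge values are allowed.

```java
import java.util.TreeMap;

class EdgeValueVertex<I extends Comparable<I>, E> {
    // target vertex id -> edge value (values may be null)
    private final TreeMap<I, E> edges = new TreeMap<I, E>();

    /** Adds the edge, or overwrites its value if it already exists. */
    void addEdge(I targetVertexId, E value) {
        edges.put(targetVertexId, value);
    }

    boolean hasEdge(I targetVertexId) {
        return edges.containsKey(targetVertexId);
    }

    /** Returns null both for a missing edge and for an edge with a null value. */
    E getEdgeValue(I targetVertexId) {
        return edges.get(targetVertexId);
    }

    /** Returns the removed value, with the same null ambiguity. */
    E removeEdge(I targetVertexId) {
        return edges.remove(targetVertexId);
    }

    int getNumOutEdges() {
        return edges.size();
    }

    public static void main(String[] args) {
        EdgeValueVertex<Long, Float> vertex = new EdgeValueVertex<Long, Float>();
        vertex.addEdge(7L, null); // connected, but no value
        System.out.println(vertex.getEdgeValue(7L) + " " + vertex.hasEdge(7L));
        System.out.println(vertex.getEdgeValue(8L) + " " + vertex.hasEdge(8L));
    }
}
```

Both getEdgeValue() calls return null here; only hasEdge() tells the 
connected vertex apart from the missing one.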





[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses

2011-09-12 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103343#comment-13103343
 ] 

Avery Ching commented on GIRAPH-28:
---

Would we guarantee that iterator() would always be sorted?  For instance, what 
about something like TinyVertex?  A pair of arrays would be expensive to keep 
sorted...

Or are you suggesting that whether iterator is sorted or not depends on the 
implementation?  

I agree that implementing Iterable is pretty convenient for users.  Perhaps 
have both (implement Iterable and the two other methods)?  This would allow 
applications based on TinyVertex to be optimized when not requiring sorting.





[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses

2011-09-12 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103350#comment-13103350
 ] 

Avery Ching commented on GIRAPH-28:
---

boolean hasEdge(I targetVertexId) is fine as an alternative.  For the iterators 
though, we are returning an iterator of type Edge<I, E>, so we will have to 
create those Edge objects on the fly for some implementations.  I suppose 
that hasEdge() adds a bit of work, though, to check before adding or removing, 
as opposed to returning the full edge.  I think I'd lean slightly toward the 
Edge methods for consistency and reduced code.  But I can be out-voted. =)







[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses

2011-09-12 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103357#comment-13103357
 ] 

Jake Mannix commented on GIRAPH-28:
---

The alternative to Iterable<Edge<I, E>> is Iterable<I>, returning only the 
target vertices, and you can call getEdgeValue(targetVertexId) on any of these 
if you need it.  Again, many algorithms will simply do something like

for(I targetId : vertex) {
  sendMsg(targetId, someFunction(baseMsg, getEdgeValue(targetId)));
}

which is maybe a little nicer looking (or at least not uglier) than:

for(Edge<I, E> edge : vertex) {
  sendMsg(edge.getVertexId(), someFunction(baseMsg, edge.getValue()));
}

And then there are no Edge objects hanging around.

Alternatively, Edge could act just like a typical Writable, and the 
Iterator<Edge<I, E>> iterates over the *same* Edge object, setting different 
values on it as next() is called.






[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses

2011-09-12 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103354#comment-13103354
 ] 

Dmitriy V. Ryaboy commented on GIRAPH-28:
-

Technically you shouldn't *have* to use hasEdge when adding and removing if you 
don't care. removeEdge() can return null ambiguously (value was null, or no 
such edge existed), and if you care, you can use hasEdge(), but if you don't, 
you don't. addEdge() can be an upsert.





[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses

2011-09-12 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103363#comment-13103363
 ] 

Jake Mannix commented on GIRAPH-28:
---

As for sorting, I'd imagine that assuming it always returns a sorted iterator 
is fine, but yes, some implementations I can imagine might not want to do that. 
 I'd lean against having multiple iterators until it was known that they were 
needed, and maybe just document the ones which return nonsorted ones so that 
things don't get messed up? 

Vertex subclasses are where the "algorithms" are implemented, right?  So a 
Vertex knows whether it has a sorted iterator or not... the only question would 
be: are there generic methods implemented in things like BspServiceWorker, or 
GraphMapper, which would be expected to need to do things to a sorted iterator? 
 Currently there are no such places that I can see.   Without such cases, we 
could easily leave Vertex implementations to decide whether they needed to 
return sorted iterators or not.





[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses

2011-09-12 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103365#comment-13103365
 ] 

Dmitriy V. Ryaboy commented on GIRAPH-28:
-

I'd caution against the approach of using a MutatorIterator (that's my name for 
that pattern. Like it? :)).
It's effective, but leads to extremely confusing bugs when people try to do 
things like take the first three edges, etc. Presenting a familiar interface 
but providing a tricky unintuitive implementation is not super friendly to 
developers; I don't think we want people to have to study the API to such an 
extent they have to know these details.
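
The kind of bug being described can be reproduced in a few lines. This is an 
illustrative sketch only: MutableEdge and ReusingIterator are made-up names, 
not Giraph classes. The iterator hands back the same object on every call to 
next(), so a caller that keeps "the first three edges" ends up with three 
references to one object.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class ReusedEdgeIteratorPitfall {
    static class MutableEdge {
        long targetId;
        float value;
    }

    /** Overwrites and returns one shared MutableEdge on every next(). */
    static class ReusingIterator implements Iterator<MutableEdge> {
        private final long[] ids;
        private final float[] values;
        private final MutableEdge shared = new MutableEdge();
        private int pos = 0;

        ReusingIterator(long[] ids, float[] values) {
            this.ids = ids;
            this.values = values;
        }

        @Override
        public boolean hasNext() {
            return pos < ids.length;
        }

        @Override
        public MutableEdge next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            shared.targetId = ids[pos];
            shared.value = values[pos];
            pos++;
            return shared;
        }
    }

    /** Keeps up to three edges from the iterator, then reads their target ids. */
    static List<Long> firstThreeTargets(Iterator<MutableEdge> it) {
        List<MutableEdge> kept = new ArrayList<>();
        while (it.hasNext() && kept.size() < 3) {
            kept.add(it.next());
        }
        List<Long> targets = new ArrayList<>();
        for (MutableEdge edge : kept) {
            targets.add(edge.targetId);
        }
        return targets;
    }

    public static void main(String[] args) {
        long[] ids = {1L, 2L, 3L, 4L};
        float[] values = {0.1f, 0.2f, 0.3f, 0.4f};
        // One might expect [1, 2, 3]; the shared object holds the last value read.
        System.out.println(firstThreeTargets(new ReusingIterator(ids, values))); // prints [3, 3, 3]
    }
}
```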





[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses

2011-09-12 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103367#comment-13103367
 ] 

Jake Mannix commented on GIRAPH-28:
---

What do you mean by the "MutatorIterator" pattern?  Not being clear about 
whether it's sorted or not?  Or forcing iterator() to always be sorted?  Or 
something else, about the X-Men series?





[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses

2011-09-12 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103369#comment-13103369
 ] 

Dmitriy V. Ryaboy commented on GIRAPH-28:
-

This:

bq. Alternatively, Edge could act just like a typical Writable, and the 
Iterator<Edge<I, E>> iterates over the same Edge object setting different 
values on it as next() is called.
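
A minimal Java sketch of the reuse pattern quoted above, and of how it can 
surprise a caller who keeps the references returned by next(). SimpleEdge, 
ReusingEdgeIterator and collectNaively are hypothetical names made up for 
illustration; they are not Giraph classes, and the primitive-array storage is 
only an assumption about how such an iterator might be backed.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical sketch; SimpleEdge is a stand-in, not Giraph's Edge class. */
class ReusedEdgeDemo {

    /** Mutable edge holder, in the style of a Hadoop Writable. */
    static final class SimpleEdge {
        long targetId;
        float weight;

        @Override
        public String toString() {
            return targetId + ":" + weight;
        }
    }

    /** Walks parallel primitive arrays, refilling one shared SimpleEdge. */
    static final class ReusingEdgeIterator implements Iterator<SimpleEdge> {
        private final long[] targets;
        private final float[] weights;
        private final SimpleEdge shared = new SimpleEdge();
        private int pos = 0;

        ReusingEdgeIterator(long[] targets, float[] weights) {
            this.targets = targets;
            this.weights = weights;
        }

        @Override
        public boolean hasNext() {
            return pos < targets.length;
        }

        @Override
        public SimpleEdge next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            // No allocation: overwrite the fields of the shared instance.
            shared.targetId = targets[pos];
            shared.weight = weights[pos];
            pos++;
            return shared;
        }
    }

    /** Naive caller: stores the references returned by next(). */
    static List<SimpleEdge> collectNaively(Iterator<SimpleEdge> it) {
        List<SimpleEdge> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        long[] targets = {1L, 2L, 3L};
        float[] weights = {0.5f, 1.5f, 2.5f};
        List<SimpleEdge> edges =
            collectNaively(new ReusingEdgeIterator(targets, weights));
        // All three list slots hold the same object, so every entry
        // shows the values of the last edge visited.
        System.out.println(edges);
    }
}
```

The iterator itself is cheap on memory, but any caller that holds on to an 
edge past the next call to next() has to copy it first.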






[jira] [Commented] (GIRAPH-28) Introduce new primitive-specific MutableVertex subclasses

2011-09-12 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-28?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103372#comment-13103372
 ] 

Jake Mannix commented on GIRAPH-28:
---

ah, yes.  That can be a nasty pit of snakes for new developers, no matter how 
commonly it's found in Hadoop-land.

So I'll put in my vote for Iterable, with your offline suggestion of boolean 
hasSortedIterator() (defaulting in BasicVertex to "return true;", overrideable 
in subclasses).

And I'll put in a patch on GIRAPH-31 so my code'll be where my mouth is (and we 
can continue this discussion on a ticket with a shorter thread [so far]).
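
A rough Java sketch of what the Iterable plus hasSortedIterator() idea could 
look like. Everything here is an assumption for illustration: SketchVertex 
stands in for BasicVertex, the element type (target vertex ids as Long) is a 
guess, and the two subclasses are invented. It is not the code in the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical sketch; none of these types are Giraph's. */
class SortedIteratorDemo {

    /** Stand-in for BasicVertex: iterable over target vertex ids. */
    abstract static class SketchVertex implements Iterable<Long> {
        /** Default promise: iteration is in sorted order. */
        public boolean hasSortedIterator() {
            return true;
        }
    }

    /** Keeps ids in a TreeSet, so the default promise holds. */
    static final class SortedVertex extends SketchVertex {
        private final TreeSet<Long> targets = new TreeSet<>();

        SortedVertex(Long... ids) {
            targets.addAll(Arrays.asList(ids));
        }

        @Override
        public Iterator<Long> iterator() {
            return targets.iterator();
        }
    }

    /** Keeps insertion order, so it overrides the flag to say so. */
    static final class InsertionOrderVertex extends SketchVertex {
        private final Set<Long> targets = new LinkedHashSet<>();

        InsertionOrderVertex(Long... ids) {
            targets.addAll(Arrays.asList(ids));
        }

        @Override
        public boolean hasSortedIterator() {
            return false;
        }

        @Override
        public Iterator<Long> iterator() {
            return targets.iterator();
        }
    }

    /** Caller needing sorted ids: sorts only when the vertex does not promise it. */
    static List<Long> sortedTargets(SketchVertex vertex) {
        List<Long> out = new ArrayList<>();
        for (Long id : vertex) {
            out.add(id);
        }
        if (!vertex.hasSortedIterator()) {
            Collections.sort(out);
        }
        return out;
    }
}
```

With this shape, code that depends on ordering checks the flag once instead 
of relying on an undocumented property of the underlying collection.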






[jira] [Updated] (GIRAPH-31) Hide the SortedMap<I, Edge<I,E>> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods

2011-09-12 Thread Jake Mannix (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-31?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jake Mannix updated GIRAPH-31:
--

Attachment: GIRAPH-31.diff

This patch is also an attempt at hiding the Edge class from external-facing 
APIs.
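
To make the goal concrete, here is a small Java sketch of what hiding the map 
behind accessor methods could look like. The interface and method names 
(EdgeAccess, hasEdge, getEdgeValue, getNumOutEdges) and both implementations 
are hypothetical and chosen for illustration; they are not taken from the 
attached patch. Edge removal is left out to keep the sketch short.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical sketch; these names are illustrative, not Giraph's API. */
class HiddenEdgeMapDemo {

    /** Client-facing accessors: no Map type appears in any signature. */
    interface EdgeAccess extends Iterable<Long> {
        boolean hasEdge(long targetId);

        /** Returns null when there is no edge to targetId. */
        Float getEdgeValue(long targetId);

        int getNumOutEdges();
    }

    /** Keeps a SortedMap, but only as a private detail. */
    static final class MapBackedVertex implements EdgeAccess {
        private final SortedMap<Long, Float> edges = new TreeMap<>();

        MapBackedVertex(long[] targets, float[] values) {
            for (int i = 0; i < targets.length; i++) {
                edges.put(targets[i], values[i]);
            }
        }

        @Override public boolean hasEdge(long targetId) { return edges.containsKey(targetId); }
        @Override public Float getEdgeValue(long targetId) { return edges.get(targetId); }
        @Override public int getNumOutEdges() { return edges.size(); }
        @Override public Iterator<Long> iterator() { return edges.keySet().iterator(); }
    }

    /** Primitive arrays; assumes targets are sorted ascending without duplicates. */
    static final class ArrayBackedVertex implements EdgeAccess {
        private final long[] targets;
        private final float[] values;

        ArrayBackedVertex(long[] sortedTargets, float[] values) {
            this.targets = sortedTargets.clone();
            this.values = values.clone();
        }

        @Override public boolean hasEdge(long targetId) {
            return Arrays.binarySearch(targets, targetId) >= 0;
        }

        @Override public Float getEdgeValue(long targetId) {
            int i = Arrays.binarySearch(targets, targetId);
            return i >= 0 ? values[i] : null;
        }

        @Override public int getNumOutEdges() { return targets.length; }

        @Override public Iterator<Long> iterator() {
            return new Iterator<Long>() {
                private int pos = 0;

                @Override public boolean hasNext() { return pos < targets.length; }

                @Override public Long next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    return targets[pos++];
                }
            };
        }
    }

    /** Client code written only against the accessors. */
    static float sumOfEdgeValues(EdgeAccess vertex) {
        float sum = 0f;
        for (long target : vertex) {
            sum += vertex.getEdgeValue(target);
        }
        return sum;
    }
}
```

Because sumOfEdgeValues never sees a Map, the same client code runs against 
either implementation, which is the property a primitive-optimized vertex 
needs.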

> Hide the SortedMap<I, Edge<I,E>> in Vertex from client visibility (impl. 
> detail), replace with appropriate accessor methods
> ---
>
> Key: GIRAPH-31
> URL: https://issues.apache.org/jira/browse/GIRAPH-31
> Project: Giraph
> Issue Type: Improvement
> Components: graph
> Affects Versions: 0.70.0
> Reporter: Jake Mannix
> Assignee: Jake Mannix
> Attachments: GIRAPH-31.diff
>
>
> As discussed on the list, and on GIRAPH-28, the SortedMap<I, Edge<I,E>> is an 
> implementation detail which need not be exposed to application developers - 
> they need to iterate over the edges, and possibly access them one-by-one, and 
> remove them (in the Mutable case), but they don't need the SortedMap, and 
> creating primitive-optimized BasicVertex implementations is hampered by the 
> fact that clients expect this Map to exist.





[jira] [Commented] (GIRAPH-31) Hide the SortedMap<I, Edge<I,E>> in Vertex from client visibility (impl. detail), replace with appropriate accessor methods

2011-09-12 Thread Jake Mannix (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-31?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13103399#comment-13103399
 ] 

Jake Mannix commented on GIRAPH-31:
---

But I didn't do anything about the uses of the Edge class in VertexChanges, 
VertexMutations, or hence MutableVertex#addEdgeRequest().  Not sure about their 
uses, so those are left alone.

