[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-04-25 Thread Brian Femiano (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13261728#comment-13261728
 ] 

Brian Femiano commented on GIRAPH-153:
--

Updated the contrib Confluence wiki entry for clarity.

 HBase/Accumulo Input and Output formats
 ---

 Key: GIRAPH-153
 URL: https://issues.apache.org/jira/browse/GIRAPH-153
 Project: Giraph
  Issue Type: New Feature
  Components: bsp
Affects Versions: 0.1.0
 Environment: Single host OSX 10.6.8 2.2 GHz Intel i7, 8 GB
Reporter: Brian Femiano
 Attachments: GIRAPH-153.1.patch, GIRAPH-153.patch


 Four abstract classes that wrap their respective delegate input/output
 formats for easy hooks into vertex input format subclasses. I've included
 some sample programs that show two very simple graph algorithms. I have a
 graph generator that builds out a very simple directed structure, starting
 with a few 'root' nodes. Root nodes are defined as nodes which are not listed
 as a child anywhere in the graph.

 Algorithm 1) AccumuloRootMarker.java -- Accumulo as read/write source.
 Every vertex starts thinking it's a root. At superstep 0, send a message down
 to each child as a non-root notification. After superstep 1, only root nodes
 will have never been messaged.

 Algorithm 2) TableRootMarker -- HBase as read/write source. Expands on A1 by
 bundling the notification logic followed by root node propagation. Once we've
 marked the appropriate nodes as roots, tell every child which roots it can be
 traced back to via one or more spanning trees. This will take N + 2
 supersteps where N is the maximum number of hops from any root to any leaf,
 plus 2 supersteps for the initial root flagging.

 I've included all relevant code plus DistributedCacheHelper.java for
 recursive cache file and archive searches. It is more Hadoop centric than
 Giraph, but these jobs use it so I figured why not commit here.

 These have been tested through local JobRunner, pseudo-distributed on the
 aforementioned hardware, and full distributed on EC2. More details in the
 comments.
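
 For illustration only, the two root-marking algorithms described above can be
 sketched as a single-process simulation in plain Java. This is not the code
 from the attached patches and does not use the Giraph, Accumulo, or HBase
 APIs; the class name RootMarkerSketch, its methods, and the sample graph are
 all hypothetical. Each loop iteration in propagateRoots stands in for one
 superstep, and in this sketch a root also records itself.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy, single-process simulation of the root-marking idea (hypothetical, not Giraph API). */
class RootMarkerSketch {

  /** Small directed graph as parent -> children; "a" and "e" are never listed as a child. */
  static Map<String, List<String>> sampleGraph() {
    Map<String, List<String>> g = new TreeMap<>();
    g.put("a", List.of("b", "c"));
    g.put("b", List.of("d"));
    g.put("e", List.of("d"));
    return g;
  }

  /** Algorithm 1: a vertex keeps its root flag only if it was never messaged. */
  static Set<String> findRoots(Map<String, List<String>> children) {
    // Every vertex (parents and children alike) starts out believing it is a root.
    Set<String> roots = new TreeSet<>(children.keySet());
    for (List<String> kids : children.values()) roots.addAll(kids);
    // Superstep 0: each vertex sends a non-root notification to each of its children.
    Set<String> messaged = new HashSet<>();
    for (List<String> kids : children.values()) messaged.addAll(kids);
    // Superstep 1: every vertex that received a notification clears its root flag.
    roots.removeAll(messaged);
    return roots;
  }

  /** Algorithm 2: push each root's id down the graph so every vertex learns its roots. */
  static Map<String, Set<String>> propagateRoots(Map<String, List<String>> children) {
    Map<String, Set<String>> rootsOf = new TreeMap<>();
    Map<String, Set<String>> inbox = new TreeMap<>();
    for (String root : findRoots(children)) {
      inbox.computeIfAbsent(root, k -> new TreeSet<>()).add(root);
    }
    // One iteration per simulated superstep; stop once no messages are in flight.
    while (!inbox.isEmpty()) {
      Map<String, Set<String>> nextInbox = new TreeMap<>();
      for (Map.Entry<String, Set<String>> e : inbox.entrySet()) {
        Set<String> known = rootsOf.computeIfAbsent(e.getKey(), k -> new TreeSet<>());
        Set<String> fresh = new TreeSet<>(e.getValue());
        fresh.removeAll(known);
        if (fresh.isEmpty()) continue; // nothing new learned, so nothing to forward
        known.addAll(fresh);
        for (String child : children.getOrDefault(e.getKey(), List.of())) {
          nextInbox.computeIfAbsent(child, k -> new TreeSet<>()).addAll(fresh);
        }
      }
      inbox = nextInbox;
    }
    return rootsOf;
  }

  public static void main(String[] args) {
    Map<String, List<String>> g = sampleGraph();
    System.out.println("roots: " + findRoots(g));
    System.out.println("roots per vertex: " + propagateRoots(g));
  }
}
```

 In the sample graph, "d" is reachable from both roots, so it ends up tagged
 with both "a" and "e". Forwarding only newly learned root ids keeps the loop
 from running forever if the input happens to contain a cycle.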

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-185) Improve concurrency of putMsg / putMsgList

2012-04-25 Thread Claudio Martella (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13261774#comment-13261774
 ] 

Claudio Martella commented on GIRAPH-185:
-

ConcurrentLinkedQueue should be faster than a synchronized block, since add(),
which appends to the tail of the queue, is just a CAS operation on the tail
pointer. ArrayList should also be slower at adding elements in any case,
because it has to expand its backing array and copy the contents whenever the
allocated memory is exhausted.
Iteration could indeed be a bit slower than with an ArrayList because of cache
locality.

The memory overhead of each queue entry is indeed something that should be
investigated. Worst case, one might think of copying the ConcurrentLinkedQueue
implementation and removing the prev pointer, which we don't need.
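
To make the CAS-based append concrete, here is a minimal sketch of a lock-free
queue that supports only add(). It follows the general Michael-Scott enqueue
pattern and is not the JDK's ConcurrentLinkedQueue source, which is more
refined. The class name CasQueueSketch and the helper concurrentAdds are
hypothetical names chosen for this example.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Simplified lock-free queue with add() only, to illustrate a CAS-based append. */
class CasQueueSketch<T> {

  /** A node holds only the payload and a link to the next node. */
  private static final class Node<T> {
    final T item;
    final AtomicReference<Node<T>> next = new AtomicReference<>();
    Node(T item) { this.item = item; }
  }

  // Sentinel node; it is never removed because this sketch has no poll().
  private final Node<T> head = new Node<>(null);
  private final AtomicReference<Node<T>> tail = new AtomicReference<>(head);

  /** Appends without taking a lock: retry the CAS until this node is linked in. */
  void add(T item) {
    Node<T> node = new Node<>(item);
    while (true) {
      Node<T> last = tail.get();
      Node<T> next = last.next.get();
      if (next == null) {
        // 'last' really is the final node: try to link the new node after it.
        if (last.next.compareAndSet(null, node)) {
          // Best-effort tail update; if it fails, another thread already advanced it.
          tail.compareAndSet(last, node);
          return;
        }
      } else {
        // The tail pointer lags behind; help advance it and retry.
        tail.compareAndSet(last, next);
      }
    }
  }

  /** Counts elements by walking the links; intended for use after writers finish. */
  int size() {
    int n = 0;
    for (Node<T> cur = head.next.get(); cur != null; cur = cur.next.get()) n++;
    return n;
  }

  /** Hypothetical helper: adds from several threads and returns the final size. */
  static int concurrentAdds(int threads, int perThread) {
    CasQueueSketch<Integer> queue = new CasQueueSketch<>();
    Thread[] workers = new Thread[threads];
    for (int t = 0; t < threads; t++) {
      workers[t] = new Thread(() -> {
        for (int i = 0; i < perThread; i++) queue.add(i);
      });
      workers[t].start();
    }
    for (Thread w : workers) {
      try {
        w.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return queue.size();
  }

  public static void main(String[] args) {
    System.out.println(concurrentAdds(4, 100_000));
  }
}
```

A writer that loses the CAS simply retries, so no thread ever blocks waiting
for a lock and no element is dropped.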

 Improve concurrency of putMsg / putMsgList
 --

 Key: GIRAPH-185
 URL: https://issues.apache.org/jira/browse/GIRAPH-185
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Affects Versions: 0.2.0
Reporter: Bo Wang
Assignee: Bo Wang
 Fix For: 0.2.0

 Attachments: GIRAPH-185.patch, GIRAPH-185.patch

   Original Estimate: 2h
  Remaining Estimate: 2h

 Currently in putMsg / putMsgList, a synchronized block is used to protect
 the whole transientInMessages when adding a new message. This lock blocks
 other concurrent calls to putMsg/putMsgList and increases the response time.
 We should use fine-grained locks to allow high concurrency in message
 communication.
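
 One possible shape for the finer-grained approach, sketched under
 assumptions: a concurrent map from vertex id to a per-vertex concurrent
 queue, so writers targeting different vertices never contend on a single
 lock. This is not the attached patch. The class MessageStoreSketch, the
 field inMessages, and the method signatures are hypothetical, and the real
 putMsg / putMsgList signatures in Giraph may differ.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical message store: one concurrent queue per destination vertex. */
class MessageStoreSketch<I, M> {

  // Stand-in for transientInMessages; no global lock guards this map.
  private final ConcurrentMap<I, Queue<M>> inMessages = new ConcurrentHashMap<>();

  /** Adds one message; the queue for a vertex is created atomically on first use. */
  void putMsg(I vertexId, M msg) {
    inMessages.computeIfAbsent(vertexId, id -> new ConcurrentLinkedQueue<>()).add(msg);
  }

  /** Adds several messages; each element is appended safely, but not as one atomic batch. */
  void putMsgList(I vertexId, List<M> msgs) {
    inMessages.computeIfAbsent(vertexId, id -> new ConcurrentLinkedQueue<>()).addAll(msgs);
  }

  /** Number of messages currently queued for a vertex. */
  int messageCount(I vertexId) {
    Queue<M> queue = inMessages.get(vertexId);
    return queue == null ? 0 : queue.size();
  }

  /** Hypothetical helper: many threads write to the same vertex; returns its message count. */
  static int concurrentPuts(int threads, int perThread) {
    MessageStoreSketch<Long, String> store = new MessageStoreSketch<>();
    Thread[] workers = new Thread[threads];
    for (int t = 0; t < threads; t++) {
      workers[t] = new Thread(() -> {
        for (int i = 0; i < perThread; i++) store.putMsg(42L, "msg-" + i);
      });
      workers[t].start();
    }
    for (Thread w : workers) {
      try {
        w.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return store.messageCount(42L);
  }

  public static void main(String[] args) {
    System.out.println(concurrentPuts(4, 10_000));
  }
}
```

 Whether this actually beats the coarse lock depends on contention and on the
 per-entry memory cost of the queue nodes, which is what the comments on this
 issue go on to discuss.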





[jira] [Commented] (GIRAPH-185) Improve concurrency of putMsg / putMsgList

2012-04-25 Thread Claudio Martella (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13261787#comment-13261787
 ] 

Claudio Martella commented on GIRAPH-185:
-

Actually, I checked the source and there is no prev pointer; each Node has just
a pointer to the payload and a pointer to the next node. The memory overhead
should be small.





[jira] [Commented] (GIRAPH-185) Improve concurrency of putMsg / putMsgList

2012-04-25 Thread Bo Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13261830#comment-13261830
 ] 

Bo Wang commented on GIRAPH-185:


I checked the source and found the same thing. I think LinkedList should be OK
in terms of space. ArrayList also has to keep empty space in the array for
future insertions. Should we close this issue?





[jira] [Commented] (GIRAPH-185) Improve concurrency of putMsg / putMsgList

2012-04-25 Thread Claudio Martella (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13261854#comment-13261854
 ] 

Claudio Martella commented on GIRAPH-185:
-

Personally, I'd like to see some benchmarking on this issue. If we commit this, 
we should have an idea of the impact.
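
As a rough starting point for such a measurement, the sketch below times
concurrent adds into a ConcurrentLinkedQueue against adds into an ArrayList
guarded by a synchronized block. It is a naive System.nanoTime() loop, not a
proper harness, so JIT warm-up and GC can skew the numbers; the class name
AddBenchmarkSketch, the thread count, and the element count are arbitrary
assumptions, and it does not exercise Giraph's actual putMsg path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

/** Naive timing comparison of lock-free adds versus adds under a single lock. */
class AddBenchmarkSketch {

  /** Runs 'threads' writers, each adding 'perThread' items; returns elapsed nanoseconds. */
  static long timeWriters(int threads, int perThread, Consumer<Integer> add) {
    List<Thread> workers = new ArrayList<>();
    for (int t = 0; t < threads; t++) {
      workers.add(new Thread(() -> {
        for (int i = 0; i < perThread; i++) add.accept(i);
      }));
    }
    long start = System.nanoTime();
    for (Thread w : workers) w.start();
    for (Thread w : workers) {
      try {
        w.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return System.nanoTime() - start;
  }

  /** One round; returns {queue size, list size, lock-free nanos, locked nanos}. */
  static long[] runOnce(int threads, int perThread) {
    Queue<Integer> queue = new ConcurrentLinkedQueue<>();
    List<Integer> list = new ArrayList<>();
    long lockFree = timeWriters(threads, perThread, queue::add);
    long locked = timeWriters(threads, perThread, value -> {
      synchronized (list) {
        list.add(value);
      }
    });
    return new long[] {queue.size(), list.size(), lockFree, locked};
  }

  public static void main(String[] args) {
    // Repeat a few rounds so that later rounds run on warmed-up code.
    for (int round = 0; round < 5; round++) {
      long[] r = runOnce(4, 200_000);
      System.out.printf("round %d: lock-free %d us, synchronized %d us%n",
          round, r[2] / 1_000, r[3] / 1_000);
    }
  }
}
```

The timings vary by machine and JVM, so they are only a first indication. A
fuller benchmark would also record heap usage, since per-entry memory
overhead is the other concern raised on this issue.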





[jira] [Commented] (GIRAPH-185) Improve concurrency of putMsg / putMsgList

2012-04-25 Thread Avery Ching (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13261906#comment-13261906
 ] 

Avery Ching commented on GIRAPH-185:


I agree that a benchmark should be done, although I expect the impact to be 
very small.  We should at least show it's not slower. =)





[jira] [Commented] (GIRAPH-185) Improve concurrency of putMsg / putMsgList

2012-04-25 Thread Hyunsik Choi (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262278#comment-13262278
 ] 

Hyunsik Choi commented on GIRAPH-185:
-

If there is a trade-off between performance and memory consumption, memory
consumption seems more important in the current Giraph implementation. Also,
I agree that some benchmarks are necessary.
