[jira] [Commented] (GIRAPH-43) GiraphJob: Counter Limit is not set

2011-09-30 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-43?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13118635#comment-13118635
 ] 

Jakob Homan commented on GIRAPH-43:
---

Thinking about this some more, 120 is a pretty low limit, particularly since 
this new limitation was created after people were creating more than a million 
counters per job.  I'd probably suggest adding a note to the documentation that, 
when using Hadoop 0.20.203 or later, users should make sure the limit is raised 
to a reasonable value.  I'll open a new JIRA for it.

 GiraphJob: Counter Limit is not set
 ---

 Key: GIRAPH-43
 URL: https://issues.apache.org/jira/browse/GIRAPH-43
 Project: Giraph
  Issue Type: Bug
  Components: graph
Affects Versions: 0.70.0
Reporter: Severin Corsten
Priority: Minor

 I have got a problem regarding Counters: 
 For testing and benchmarking I added a lot of timers, but now the task fails 
 with this Exception:
 org.apache.hadoop.mapred.Counters$CountersExceededException: Error: Exceeded 
 limits on number of counters - Counters=120 Limit=120
 I don't understand why this happens. Within GiraphJob the counter limit is 
 set to 512.
 In the cluster configuration there is no value specified. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-49) Create automated multi-build test

2011-10-07 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-49?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123307#comment-13123307
 ] 

Jakob Homan commented on GIRAPH-49:
---

If we can't get to it, I'm not sure it belongs in the Apache source, in that 
case.  I'm not sure how good of an idea it is to include code not intended to 
be used by the open Apache community...

 Create automated multi-build test
 -

 Key: GIRAPH-49
 URL: https://issues.apache.org/jira/browse/GIRAPH-49
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan

 At the moment, we're supporting three flavors of Hadoop: non-secure, secure 
 and fb's distro via a painful munging process.  As part of the patch 
 submission process, it would be nice if Hudson would test the patch against 
 all three of these automatically, verifying all the usual requirements 
 against each distro.





[jira] [Commented] (GIRAPH-52) There should be a scheme to limit the counter

2011-10-13 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13127235#comment-13127235
 ] 

Jakob Homan commented on GIRAPH-52:
---

If you'd like to come up with a way around the limit, this would be the correct 
JIRA.  Perhaps just output statistics for the last n iterations? Or every m-th 
iteration, skipping enough to stay under the limit?  One could check for the 
presence of the conf value with reasonable certainty that if it's not there 
this behavior would be appropriate (this isn't foolproof since the value could 
be set on the server but not the client, but this would work 99% of the time 
and an error message, if it doesn't, could suggest that possibility).
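
The "every m-th iteration" idea could be sketched roughly as follows. This is only an illustration, not Giraph code: the class and method names are hypothetical, and it assumes that an estimate of the superstep count is available and that each reported superstep consumes a fixed number of counters.

```java
/** Hypothetical helper: decides which supersteps publish their own counters. */
final class CounterStride {
    private CounterStride() {}

    /**
     * Smallest m such that reporting every m-th superstep stays within the
     * counter limit (assumes a fixed counter cost per reported superstep).
     */
    static long stride(long expectedSupersteps, long counterLimit,
                       long countersPerSuperstep) {
        if (expectedSupersteps <= 0 || countersPerSuperstep <= 0) {
            throw new IllegalArgumentException("arguments must be positive");
        }
        long reportable = counterLimit / countersPerSuperstep;
        if (reportable <= 0) {
            throw new IllegalArgumentException("limit too low for one superstep");
        }
        // Ceiling division: expectedSupersteps / reportable, rounded up.
        return (expectedSupersteps + reportable - 1) / reportable;
    }

    /** True if this superstep (counted from 0) should publish counters. */
    static boolean shouldReport(long superstep, long stride) {
        return superstep % stride == 0;
    }
}
```

Since the number of supersteps is not known up front, the estimate would have to be conservative; a job that runs longer than expected could still hit the limit.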

 There should be a scheme to limit the counter
 -

 Key: GIRAPH-52
 URL: https://issues.apache.org/jira/browse/GIRAPH-52
 Project: Giraph
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.70.0
Reporter: Zhiwei Gu
 Fix For: 0.70.0


 For Hadoop versions above 0.20.203.0, the cluster-wide configuration 
 mapreduce.job.counters.limit cannot be overridden. Because the number of 
 superstep iterations is not deterministic, a job might run several hundred or 
 even thousands of supersteps, and the limit will always kill the job. This 
 limits the usage of Giraph and is too bad.





[jira] [Commented] (GIRAPH-52) There should be a scheme to limit the counter

2011-10-13 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-52?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13127239#comment-13127239
 ] 

Jakob Homan commented on GIRAPH-52:
---

In my comment above, I meant "last" as in final, not previous.

 There should be a scheme to limit the counter
 -

 Key: GIRAPH-52
 URL: https://issues.apache.org/jira/browse/GIRAPH-52
 Project: Giraph
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.70.0
Reporter: Zhiwei Gu
 Fix For: 0.70.0


 For Hadoop versions above 0.20.203.0, the cluster-wide configuration 
 mapreduce.job.counters.limit cannot be overridden. Because the number of 
 superstep iterations is not deterministic, a job might run several hundred or 
 even thousands of supersteps, and the limit will always kill the job. This 
 limits the usage of Giraph and is too bad.





[jira] [Commented] (GIRAPH-56) Create a CSV TextOutputFormat

2011-10-25 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13135553#comment-13135553
 ] 

Jakob Homan commented on GIRAPH-56:
---

I'm thinking along the lines of Arun's comment, as the equivalent output format 
for GIRAPH-62.

 Create a CSV TextOutputFormat
 -

 Key: GIRAPH-56
 URL: https://issues.apache.org/jira/browse/GIRAPH-56
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
  Labels: newbie

 Right now we've got an outputformat that spits out Base64-encoded text.  It 
 would be nice to have one that just did regular text, for testing or small graphs.





[jira] [Commented] (GIRAPH-13) Port Giraph to YARN

2011-10-25 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-13?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13135554#comment-13135554
 ] 

Jakob Homan commented on GIRAPH-13:
---

This is coming along, but there are a lot of pre-req issues that need to be 
done first.  I'll start creating issues and linking them here.  Grab any you'd 
like.  GIRAPH-37 and its related issues are definitely blockers for this effort.

 Port Giraph to YARN
 ---

 Key: GIRAPH-13
 URL: https://issues.apache.org/jira/browse/GIRAPH-13
 Project: Giraph
  Issue Type: New Feature
Reporter: Jakob Homan
Assignee: Jakob Homan

 Now that YARN (aka MR2 aka MAPREDUCE-279) has been merged into the Hadoop 
 trunk, we should think about what it would take to separate out the graph 
 processing bits of Giraph from the MR1-specific code so as to take advantage 
 of the less-MR centric aspects of YARN, while still supporting both over the 
 medium term.





[jira] [Commented] (GIRAPH-66) Add presentations section to website

2011-10-26 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-66?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13136505#comment-13136505
 ] 

Jakob Homan commented on GIRAPH-66:
---

No problem. Do you want to upload the HW to slideshare and link to that 
instead?  It'd probably be good to link to the video from the summit, though.

 Add presentations section to website
 

 Key: GIRAPH-66
 URL: https://issues.apache.org/jira/browse/GIRAPH-66
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Assignee: Jakob Homan
 Attachments: 2011.10.14.hortonworks.pptx, GIRAPH-66.patch


 We should list presentations we and others are giving on Giraph on the 
 website, to provide a central place for newcomers to find resources.





[jira] [Commented] (GIRAPH-66) Add presentations section to website

2011-10-26 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-66?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13136548#comment-13136548
 ] 

Jakob Homan commented on GIRAPH-66:
---

bq. Seems like a confluence wiki would be a better place.
My goal is to make it as easy as possible for people to stumble upon the 
website and come up to speed as quickly as possible.  In the longer term, a 
wiki is probably reasonable, although I find Apache wikis are usually not too 
well maintained.  

I'm thinking it may be reasonable to relax the RTC process for website updates, 
though.  There's a lot of ceremony required for quick changes.  In Hadoop, we 
don't require JIRAs for website changes.  What do others think?

 Add presentations section to website
 

 Key: GIRAPH-66
 URL: https://issues.apache.org/jira/browse/GIRAPH-66
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Assignee: Jakob Homan
 Attachments: 2011.10.14.hortonworks.pptx, GIRAPH-66-2.patch, 
 GIRAPH-66.patch


 We should list presentations we and others are giving on Giraph on the 
 website, to provide a central place for newcomers to find resources.





[jira] [Commented] (GIRAPH-66) Add presentations section to website

2011-10-26 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-66?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13136588#comment-13136588
 ] 

Jakob Homan commented on GIRAPH-66:
---

bq. We don't need to make a release to make the site changes go out, do we? 
These things are decoupled?
Not that I know of.  One of the first things incubator projects do is to set up 
a website, so I assume not.

 Add presentations section to website
 

 Key: GIRAPH-66
 URL: https://issues.apache.org/jira/browse/GIRAPH-66
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Assignee: Jakob Homan
 Attachments: 2011.10.14.hortonworks.pptx, GIRAPH-66-2.patch, 
 GIRAPH-66.patch


 We should list presentations we and others are giving on Giraph on the 
 website, to provide a central place for newcomers to find resources.





[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

2011-11-02 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142573#comment-13142573
 ] 

Jakob Homan commented on GIRAPH-36:
---

Great work, Jake. One thing I noticed while reading through Vertex is a muddling 
of the state the Vertex is responsible for on a per-application basis and the 
state Giraph manages for the vertex.  I think we may be ill-served by 
inheritance here and should instead rely on composition to hold this state.  
I'm thinking of messages in/out, edge state and mutation, facilities for 
sending messages, current superstep, etc. Would it be better in the long term 
to move these out to a context object (a la MR)?  This would simplify Vertex 
significantly, make it much easier to test (by mocking out the dependency) and 
future-proof the evolution of Vertex as there will be fewer moving parts to 
keep track of.

Does changing compute and {pre|post}{Superstep|Application} to take their external 
state as parameters sound like a good approach to explore?
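
To make the composition idea concrete, here is a minimal sketch. None of these types exist in Giraph; SketchContext, MaxValueSketchVertex and RecordingSketchContext are hypothetical names, and the messaging model is reduced to doubles sent to all neighbors.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical context: framework-managed state handed to the vertex. */
interface SketchContext {
    long superstep();
    Iterable<Double> incomingMessages();
    void sendToAllNeighbors(double message);
    void voteToHalt();
}

/** Application logic holds only its own value; the rest arrives via the context. */
final class MaxValueSketchVertex {
    private double value;

    MaxValueSketchVertex(double initial) {
        this.value = initial;
    }

    double value() {
        return value;
    }

    void compute(SketchContext ctx) {
        // In superstep 0 every vertex announces its value once.
        boolean changed = ctx.superstep() == 0;
        for (double message : ctx.incomingMessages()) {
            if (message > value) {
                value = message;
                changed = true;
            }
        }
        if (changed) {
            ctx.sendToAllNeighbors(value);
        } else {
            ctx.voteToHalt();
        }
    }
}

/** Hand-written test double that records what the vertex did. */
final class RecordingSketchContext implements SketchContext {
    private final long superstep;
    private final List<Double> inbox;
    final List<Double> sent = new ArrayList<>();
    boolean halted;

    RecordingSketchContext(long superstep, List<Double> inbox) {
        this.superstep = superstep;
        this.inbox = inbox;
    }

    public long superstep() { return superstep; }
    public Iterable<Double> incomingMessages() { return inbox; }
    public void sendToAllNeighbors(double message) { sent.add(message); }
    public void voteToHalt() { halted = true; }
}
```

With this shape, a test can construct a vertex, pass it a RecordingSketchContext, and inspect the recorded sends without touching any framework state.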

 Ensure that subclassing BasicVertex is possible by user apps
 

 Key: GIRAPH-36
 URL: https://issues.apache.org/jira/browse/GIRAPH-36
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Affects Versions: 0.70.0
Reporter: Jake Mannix
Assignee: Jake Mannix
Priority: Blocker
 Fix For: 0.70.0

 Attachments: GIRAPH-36.diff, GIRAPH-36.diff, GIRAPH-36.diff, 
 GIRAPH-36.diff.warnings


 Original assumptions in Giraph were that all users would subclass Vertex 
 (which extended MutableVertex extended BasicVertex).  Classes which wish to 
 have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) 
 may need to extend either MutableVertex or BasicVertex.  Unfortunately 
 VertexRange extends ArrayListVertex, and there are other places where the 
 assumption is that vertex classes are either Vertex, or at least 
 MutableVertex.
 Let's make sure the internal APIs allow for BasicVertex to be the base class.





[jira] [Commented] (GIRAPH-64) Create VertexRunner to make it easier to run users' computations

2011-11-02 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-64?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142608#comment-13142608
 ] 

Jakob Homan commented on GIRAPH-64:
---

Not that I know of.  I'm allergic to rb, but I've never seen one.  In Hive land 
we have to manually create an instance for each issue.  I just tried to create 
one for this and, while Giraph exists as a target, I get a 500 exception 
(something broke! - gee, that's helpful).

 Create VertexRunner to make it easier to run users' computations
 

 Key: GIRAPH-64
 URL: https://issues.apache.org/jira/browse/GIRAPH-64
 Project: Giraph
  Issue Type: New Feature
Reporter: Jakob Homan
Assignee: Jakob Homan
 Attachments: GIRAPH-64.patch


 Currently, if a user wants to implement a Giraph algorithm by extending 
 {{Vertex}} they must also write all the boilerplate around the {{Tool}} 
 interface and bundle it with the Giraph jar (or get Giraph on the classpath 
 and playing nice with the implementation).  For example, what is included in 
 the PageRankBenchmark and what Kohei has done: 
 https://github.com/smly/java-Giraph-LabelPropagation  It would be better if 
 we had perhaps a Vertex implementation to be subclassed that already had all 
 the standard Tooling included such that all one had to run would be (assuming 
 the Giraph jar was already on the classpath):
 {noformat}hadoop jar my-awesome-vertex.jar my.awesome.vertex -i jazz_input -o 
 jazz_output -if org.apache.giraph.lib.in.text.adjacency-list.LongDoubleDouble 
 -of org.apache.giraph.lib.out.text.adjacency-list.LongDoubleDouble{noformat} 
 This wouldn't work with every algorithm, but would be useful in a large 
 number of cases.
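
 As a rough illustration of the tooling such a runner would need, the four flags in the example command could be parsed as below. This is a standard-library-only sketch with hypothetical names (RunnerArgsSketch, parse); it is not the proposed VertexRunner and does not use Hadoop's Tool interface.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical option parser for a generic vertex runner. */
final class RunnerArgsSketch {
    private RunnerArgsSketch() {}

    /** Parses "-flag value" pairs for the flags -i, -o, -if and -of. */
    static Map<String, String> parse(String[] args) {
        Map<String, String> options = new HashMap<>();
        for (int i = 0; i < args.length; i += 2) {
            String flag = args[i];
            boolean known = flag.equals("-i") || flag.equals("-o")
                    || flag.equals("-if") || flag.equals("-of");
            if (!known) {
                throw new IllegalArgumentException("Unknown option: " + flag);
            }
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("Missing value for " + flag);
            }
            // Store without the leading dash, e.g. "if" -> input format name.
            options.put(flag.substring(1), args[i + 1]);
        }
        return options;
    }
}
```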





[jira] [Commented] (GIRAPH-36) Ensure that subclassing BasicVertex is possible by user apps

2011-11-02 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13142615#comment-13142615
 ] 

Jakob Homan commented on GIRAPH-36:
---

bq. Can GIRAPH-47 be a start for what you have in mind?
Yeah, that's close to what I'm hoping for.  I'll comment there. 

bq. I can certainly imagine this being very nice and clean.
Inheritance is going to be brittle over the long term, so the cleaner we can 
make this now, the easier it will be to attract a user base confident in 
investing in the new platform.  This is particularly true given how much we're 
at the mercy of Java's poor generic system; users already have to grok things 
like DoubleDoubleFloat in type names.

 Ensure that subclassing BasicVertex is possible by user apps
 

 Key: GIRAPH-36
 URL: https://issues.apache.org/jira/browse/GIRAPH-36
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Affects Versions: 0.70.0
Reporter: Jake Mannix
Assignee: Jake Mannix
Priority: Blocker
 Fix For: 0.70.0

 Attachments: GIRAPH-36.diff, GIRAPH-36.diff, GIRAPH-36.diff, 
 GIRAPH-36.diff.warnings


 Original assumptions in Giraph were that all users would subclass Vertex 
 (which extended MutableVertex extended BasicVertex).  Classes which wish to 
 have application specific data structures (ie. not a TreeMap<I, Edge<I,E>>) 
 may need to extend either MutableVertex or BasicVertex.  Unfortunately 
 VertexRange extends ArrayListVertex, and there are other places where the 
 assumption is that vertex classes are either Vertex, or at least 
 MutableVertex.
 Let's make sure the internal APIs allow for BasicVertex to be the base class.





[jira] [Commented] (GIRAPH-47) Export Worker's Context/State to vertices through pre/post/Application/Superstep

2011-11-06 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13145133#comment-13145133
 ] 

Jakob Homan commented on GIRAPH-47:
---

This looks good but I have some comments when not typing on glass. Avery - did 
you do a git clean before applying? Sounds like you still have the new files 
from 64 in the tree.

 Export Worker's Context/State to vertices through 
 pre/post/Application/Superstep
 

 Key: GIRAPH-47
 URL: https://issues.apache.org/jira/browse/GIRAPH-47
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Affects Versions: 0.70.0
Reporter: Claudio Martella
Assignee: Claudio Martella
 Attachments: GIRAPH-47.diff, GIRAPH-47.diff


 It would be quite useful for vertices to reach some worker-related 
 information stored, e.g., in the GraphState class.
 This information could be exported as a parameter to 
 pre/post/Application/Superstep like this:
 public void preApplication(Configurable workerObject);
 public void postApplication(Configurable workerObject);
 public void preSuperstep(Configurable workerObject);
 public void postSuperstep(Configurable workerObject);
 public Configurable getWorkerObject();
 Another possibility is to add a Context inner class to BasicVertex to store 
 this information.





[jira] [Commented] (GIRAPH-47) Export Worker's Context/State to vertices through pre/post/Application/Superstep

2011-11-08 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13146340#comment-13146340
 ] 

Jakob Homan commented on GIRAPH-47:
---

Avery, can you attach the final diff that you committed to the tree, for 
bookkeeping purposes?

 Export Worker's Context/State to vertices through 
 pre/post/Application/Superstep
 

 Key: GIRAPH-47
 URL: https://issues.apache.org/jira/browse/GIRAPH-47
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Affects Versions: 0.70.0
Reporter: Claudio Martella
Assignee: Claudio Martella
 Attachments: GIRAPH-47.diff, GIRAPH-47.diff, GIRAPH-47.diff


 It would be quite useful for vertices to reach some worker-related 
 information stored, e.g., in the GraphState class.
 This information could be exported as a parameter to 
 pre/post/Application/Superstep like this:
 public void preApplication(Configurable workerObject);
 public void postApplication(Configurable workerObject);
 public void preSuperstep(Configurable workerObject);
 public void postSuperstep(Configurable workerObject);
 public Configurable getWorkerObject();
 Another possibility is to add a Context inner class to BasicVertex to store 
 this information.





[jira] [Commented] (GIRAPH-63) Typo in PageRankBenchmark

2011-11-11 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-63?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13148592#comment-13148592
 ] 

Jakob Homan commented on GIRAPH-63:
---

Ah, thanks.  Actually, for some reason, JIRA hides this option with its latest 
upgrade. On More Actions there's an Attach File option that will upload the 
patch file itself.  Then we can download this and apply it to the tree.  Looks 
good though, from what was pasted in!

 Typo in PageRankBenchmark
 -

 Key: GIRAPH-63
 URL: https://issues.apache.org/jira/browse/GIRAPH-63
 Project: Giraph
  Issue Type: Bug
Reporter: Jakob Homan
Assignee: Shaunak Kashyap
Priority: Trivial
  Labels: newbie
 Attachments: GIRAPH-63.diff


 {code:title=PageRankBenchmark.java|borderStyle=solid}
 if (!cmd.hasOption('s')) {
   System.out.println("Need to set the number of supesteps (-s)");
   return -1;
 }{code}
 supesteps -> supersteps





[jira] [Commented] (GIRAPH-63) Typo in PageRankBenchmark

2011-11-11 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-63?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13148598#comment-13148598
 ] 

Jakob Homan commented on GIRAPH-63:
---

Perfect, thanks! +1 on the patch.  Will commit shortly.

 Typo in PageRankBenchmark
 -

 Key: GIRAPH-63
 URL: https://issues.apache.org/jira/browse/GIRAPH-63
 Project: Giraph
  Issue Type: Bug
Reporter: Jakob Homan
Assignee: Shaunak Kashyap
Priority: Trivial
  Labels: newbie
 Attachments: GIRAPH-63.diff


 {code:title=PageRankBenchmark.java|borderStyle=solid}
 if (!cmd.hasOption('s')) {
   System.out.println("Need to set the number of supesteps (-s)");
   return -1;
 }{code}
 supesteps -> supersteps





[jira] [Commented] (GIRAPH-75) Create sections on how to get involved and how to generate patches on website

2011-11-13 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-75?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13149431#comment-13149431
 ] 

Jakob Homan commented on GIRAPH-75:
---

bq. How about making a separate page for these sections?
Sounds good. I'll comment on GIRAPH-79.

 Create sections on how to get involved and how to generate patches on website
 -

 Key: GIRAPH-75
 URL: https://issues.apache.org/jira/browse/GIRAPH-75
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.70.0

 Attachments: GIRAPH-75-2.patch, GIRAPH-75.patch


 We've had several questions lately on how to get started. It would be good to 
 document this on the site.





[jira] [Commented] (GIRAPH-51) Provide unit testing tool for Giraph algorithms

2011-11-14 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13149805#comment-13149805
 ] 

Jakob Homan commented on GIRAPH-51:
---

Looks great.  A few comments:
* It may make sense to move InternalVertexRunner to the src tree rather than 
test tree, since it's a user-facing class rather than something for Giraph's 
internal testing.  I can imagine us generating a separate test jar soon and 
we'd want this class in the regular jar we ship to end users.
* Is it necessary to specify the input and output formats and to write data out 
to the file system? In general a vertex implementation should be able to work 
with reasonable vertices from any input source (part of the GIRAPH-64 work). If 
the internal vertex runner just fed the values into the compute method we'd 
save file io and coupling of specific formats.
* Can you add javadoc for the public methods?
* It looks like the ZooKeeper exceptions are probably race conditions.  I see 
similar ones during regular test execution.  It would be nice to remove the 
need for ZooKeeper on these types of tests: if one is spinning up ZK, it's not 
really a unit test any more, and it should be possible to test vertex 
implementations without it.  One should be able to just feed input state 
(vertices, edges, superstep #, etc.) and verify the output state without ever 
actually spinning up any of the distributed infrastructure.  But that's 
probably best done in another JIRA.  I don't think the ZK exceptions are 
something to be concerned about.
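
As a sketch of what ZooKeeper-free testing could look like, the harness below runs a compute function over an in-memory graph for a fixed number of supersteps. Everything here is an assumption for illustration: SuperstepComputeFn and InMemorySuperstepSketch are hypothetical names, values and messages are plain longs, and every vertex sends its value along every out-edge in every superstep.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical user logic: new vertex value from current value and inbox. */
interface SuperstepComputeFn {
    long apply(long current, List<Long> inbox);
}

/** Hypothetical in-memory harness: no Hadoop, ZooKeeper, or file I/O. */
final class InMemorySuperstepSketch {
    private InMemorySuperstepSketch() {}

    static Map<Long, Long> run(Map<Long, Long> initialValues,
                               Map<Long, List<Long>> edges,
                               SuperstepComputeFn fn,
                               int supersteps) {
        List<Long> none = Collections.emptyList();
        Map<Long, Long> values = new HashMap<>(initialValues);
        Map<Long, List<Long>> inboxes = new HashMap<>();
        for (int step = 0; step < supersteps; step++) {
            // Messages sent in this superstep are delivered in the next one.
            Map<Long, List<Long>> nextInboxes = new HashMap<>();
            for (Map.Entry<Long, Long> entry : values.entrySet()) {
                Long id = entry.getKey();
                long updated = fn.apply(entry.getValue(),
                        inboxes.getOrDefault(id, none));
                entry.setValue(updated);
                for (Long target : edges.getOrDefault(id, none)) {
                    nextInboxes.computeIfAbsent(target, k -> new ArrayList<>())
                            .add(updated);
                }
            }
            inboxes = nextInboxes;
        }
        return values;
    }
}
```

A test would pass in a small graph, a compute function, and a superstep count, then compare the returned map against the expected values.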

 Provide unit testing tool for Giraph algorithms
 ---

 Key: GIRAPH-51
 URL: https://issues.apache.org/jira/browse/GIRAPH-51
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
 Attachments: GIRAPH-51.patch


 It would be nice to have a little tool, similar to MRUnit, that would allow 
 Giraph application writers to quickly unit test their algorithms.  The tool 
 could take a Vertex implementation, a set of input and expected output and 
 verify that after the specified number of supersteps, we've gotten what we 
 expect.





[jira] [Commented] (GIRAPH-11) Improve the graph distribution of Giraph

2011-11-14 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13150095#comment-13150095
 ] 

Jakob Homan commented on GIRAPH-11:
---

I think this is ready to go.  Avery, just out of curiosity, beyond the MR 
unittests, have you run any test vertices on this?  

 Improve the graph distribution of Giraph
 

 Key: GIRAPH-11
 URL: https://issues.apache.org/jira/browse/GIRAPH-11
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.70.0
Reporter: Avery Ching
Assignee: Avery Ching
 Attachments: GIRAPH-11.2.diff, GIRAPH-11.3.diff, GIRAPH-11.4.diff, 
 GIRAPH-11.diff


 Currently, Giraph assumes that the data from the VertexInputFormat is sorted. 
  If the user data is not sorted by the vertex id, they must first run a 
 MapReduce or Pig job to generate a sorted dataset.  This is often a bit 
 inconvenient.
 Giraph graph partitioning is currently range based and there are some 
 advantages and disadvantages of this approach.  The proposal of this JIRA 
 would be to allow for both range and hash based partitioning and provide more 
 flexibility to the user.
 Design goals for the graph distribution:
 * Allow vertices to be ordered or unordered
 * Ability to repartition
 * Select the partitioning scheme based on user needs (i.e. hash or range 
 based)
 * Ability to provide user-specific hints about partitions
 Hash-based partitioning
 * Good vertex balancing across ranges for random data
 * Bad at vertex id locality
 Range-based partitioning
 * Good at vertex id locality
 * Ability to split ranges easily
 * Can cause hotspots for hot ranges





[jira] [Commented] (GIRAPH-11) Improve the graph distribution of Giraph

2011-11-14 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-11?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13150099#comment-13150099
 ] 

Jakob Homan commented on GIRAPH-11:
---

+1

 Improve the graph distribution of Giraph
 

 Key: GIRAPH-11
 URL: https://issues.apache.org/jira/browse/GIRAPH-11
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.70.0
Reporter: Avery Ching
Assignee: Avery Ching
 Attachments: GIRAPH-11.2.diff, GIRAPH-11.3.diff, GIRAPH-11.4.diff, 
 GIRAPH-11.diff


 Currently, Giraph assumes that the data from the VertexInputFormat is sorted. 
  If the user data is not sorted by the vertex id, they must first run a 
 MapReduce or Pig job to generate a sorted dataset.  This is often a bit 
 inconvenient.
 Giraph graph partitioning is currently range based and there are some 
 advantages and disadvantages of this approach.  The proposal of this JIRA 
 would be to allow for both range and hash based partitioning and provide more 
 flexibility to the user.
 Design goals for the graph distribution:
 * Allow vertices to be ordered or unordered
 * Ability to repartition
 * Select the partitioning scheme based on user needs (i.e. hash or range 
 based)
 * Ability to provide user-specific hints about partitions
 Hash-based partitioning
 * Good vertex balancing across ranges for random data
 * Bad at vertex id locality
 Range-based partitioning
 * Good at vertex id locality
 * Ability to split ranges easily
 * Can cause hotspots for hot ranges
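
 The trade-off between the two schemes can be illustrated with a minimal Java
 sketch.  The class and method names (PartitionSketch, hashPartition,
 rangePartition) are hypothetical and are not Giraph's partitioning API; vertex
 ids are assumed to be longs and the range boundaries are assumed to be a
 sorted array of inclusive upper bounds.

```java
import java.util.Arrays;

// Hypothetical illustration only; this is not the Giraph partitioning API.
final class PartitionSketch {
    // Hash-based: spreads ids evenly for random data, but neighbouring ids
    // usually end up on different partitions (poor vertex id locality).
    static int hashPartition(long vertexId, int partitionCount) {
        return Math.floorMod(Long.hashCode(vertexId), partitionCount);
    }

    // Range-based: partition i owns the ids up to and including upperBounds[i]
    // that are not owned by an earlier partition.  Ids above the last bound
    // fall into the last partition.  A popular range becomes a hotspot.
    static int rangePartition(long vertexId, long[] upperBounds) {
        int pos = Arrays.binarySearch(upperBounds, vertexId);
        int index = pos >= 0 ? pos : -pos - 1; // insertion point when not found
        return Math.min(index, upperBounds.length - 1);
    }
}
```

 With bounds {99, 199, 299}, ids 10 and 11 share a range partition, while
 hashPartition with four partitions places them on different ones.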





[jira] [Commented] (GIRAPH-87) Simplify boolean expression in BspService::checkpointFrequencyMet

2011-11-15 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-87?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150679#comment-13150679
 ] 

Jakob Homan commented on GIRAPH-87:
---

correct.  This optimization would be to keep the first if statement and then 
return the result of evaluating the else if argument directly ({{return 
(((superstep - ...}})

 Simplify boolean expression in BspService::checkpointFrequencyMet
 -

 Key: GIRAPH-87
 URL: https://issues.apache.org/jira/browse/GIRAPH-87
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
  Labels: newbie

 {noformat}if (superstep < firstCheckpoint) {
 return false;
 } else if (((superstep - firstCheckpoint) % checkpointFrequency) == 
 0) {
 return true;
 } else {
 return false;
 }{noformat}
 can be simplified to just return the result of the else if evaluation.
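
 A minimal sketch of the simplified form, assuming the three values are
 passed in as parameters (the real method in BspService may read them from
 its own state, so this signature is an assumption).  The first guard has to
 stay: in Java, (superstep - firstCheckpoint) % checkpointFrequency can be 0
 for a superstep before the first checkpoint, for example 0, 2 and 2.

```java
// Sketch only; the names mirror the snippet above, the signature is assumed.
final class CheckpointSketch {
    // Assumes checkpointFrequency > 0.
    static boolean checkpointFrequencyMet(long superstep,
                                          long firstCheckpoint,
                                          long checkpointFrequency) {
        if (superstep < firstCheckpoint) {
            return false; // too early, regardless of the modulo result
        }
        // Return the former "else if" condition directly.
        return ((superstep - firstCheckpoint) % checkpointFrequency) == 0;
    }
}
```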





[jira] [Commented] (GIRAPH-85) Simplify return expression in RPCCommunications::getRPCProxy

2011-11-15 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150683#comment-13150683
 ] 

Jakob Homan commented on GIRAPH-85:
---

For this newbie issue the idea is just to simplify the repeated pattern 
{noformat}CommunicationsInterface proxy = code to get proxy;
return proxy;{noformat}
to the equivalent
{noformat}return code to get proxy;{noformat}, removing the temporary 
variable.

 Simplify return expression in RPCCommunications::getRPCProxy
 

 Key: GIRAPH-85
 URL: https://issues.apache.org/jira/browse/GIRAPH-85
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
  Labels: newbie

 Twice in RPCCommunications::getRPCProxy a local variable, proxy, is created 
 and immediately returned.  We can simplify this to just return the value.





[jira] [Commented] (GIRAPH-45) Improve the way to keep outgoing messages

2011-11-15 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150732#comment-13150732
 ] 

Jakob Homan commented on GIRAPH-45:
---

Yeah, this would involve changing the "save all the messages up and send them 
at the end of the superstep" model to a more streaming model for outgoing 
messages, which I think may provide more throughput overall.  If we send 
messages as they are generated (or in bunches intermittently), we'll decrease 
the time spent shuffling them at the end of step.  This is important as the 
shuffle is a peer-to-peer process with every worker potentially sending 
something to every other worker and, in a chatty computation, may put pressure 
on the network.

I'm skeptical how often combiners will be both possible and actually 
implemented.  If a combiner is not defined, is there any benefit at all to not 
immediately sending out the non-local messages? One approach is when memory 
starts getting tight (or on a regular schedule), run the combiner and then ship 
out non-local-bound messages to their workers.  Similarly, the receiving worker 
can run its combiner on the messages as they come in, spilling when necessary.  
It will probably be quite expensive to un-spill from disk to re-run the combiner, 
assuming the combiner managed to have an effect.
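
The streaming approach described above could be sketched roughly as follows.
Everything here is hypothetical and not Giraph code: the class name, the
per-worker message-count threshold, the use of long-valued messages, and the
sum combiner applied as messages are buffered are all assumptions made for the
illustration.  The sender callback stands in for the actual network transfer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical sketch of eager (streaming) message flushing; not Giraph code.
final class StreamingOutboxSketch {
    private final int flushThreshold;
    // Receives (workerId, combined messages keyed by target vertex id).
    private final BiConsumer<Integer, Map<Long, Long>> sender;
    // workerId -> (target vertex id -> combined message value)
    private final Map<Integer, Map<Long, Long>> buffers = new HashMap<>();
    // workerId -> number of messages buffered since the last flush
    private final Map<Integer, Integer> pendingCounts = new HashMap<>();

    StreamingOutboxSketch(int flushThreshold,
                          BiConsumer<Integer, Map<Long, Long>> sender) {
        this.flushThreshold = flushThreshold;
        this.sender = sender;
    }

    // Buffers one message, combining (summing) it with earlier messages for
    // the same target vertex, and ships the batch once the threshold is hit.
    void send(int workerId, long targetVertex, long value) {
        buffers.computeIfAbsent(workerId, w -> new HashMap<>())
               .merge(targetVertex, value, Long::sum);
        int pending = pendingCounts.merge(workerId, 1, Integer::sum);
        if (pending >= flushThreshold) {
            flush(workerId);
        }
    }

    // Ships whatever is buffered for one worker.
    void flush(int workerId) {
        Map<Long, Long> batch = buffers.remove(workerId);
        pendingCounts.remove(workerId);
        if (batch != null && !batch.isEmpty()) {
            sender.accept(workerId, batch);
        }
    }

    // Called at the end of the superstep to ship the remainder.
    void flushAll() {
        for (Integer workerId : new ArrayList<>(buffers.keySet())) {
            flush(workerId);
        }
    }
}
```

A memory-pressure trigger or a timer could replace the simple count threshold;
without a combiner the buffer would hold a list of messages per vertex instead
of a single combined value.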

 Improve the way to keep outgoing messages
 -

 Key: GIRAPH-45
 URL: https://issues.apache.org/jira/browse/GIRAPH-45
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Hyunsik Choi
Assignee: Hyunsik Choi

 As discussed in GIRAPH-12 (http://goo.gl/CE32U), I think that there is a 
 potential out-of-memory problem when the rate of message generation 
 is higher than the rate of message flush (or network bandwidth).
 To overcome this problem, we need a more eager strategy for message flushing or 
 some approach to spill messages to disk.
 The link below is Dmitriy's suggestion.
 https://issues.apache.org/jira/browse/GIRAPH-12?focusedCommentId=13116253&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13116253





[jira] [Commented] (GIRAPH-45) Improve the way to keep outgoing messages

2011-11-15 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150755#comment-13150755
 ] 

Jakob Homan commented on GIRAPH-45:
---

@Jake
I'm suggesting that messages should only be spilled to disk on their target 
worker's local disk.  Spilling to disk would still be the main way to relieve memory 
pressure (along with combiners, of course).  The question is: should you be 
responsible for spilling messages that you won't be processing in the next 
superstep? If so, we're spending more time writing and reading the messages on 
the originator (and then, possibly, writing and reading again on the 
destination worker).  
bq. But now this is a much more complex computation model: to send out 
messages, you typically need to know what you've received in the previous step.
I'm afraid I don't follow.  This is still consistent with the BSP model. In a 
compute iteration, I can send a message to Z and it can either be held until 
everyone else is done computing, or immediately sent to the worker responsible 
for Z.  Either way it gets stored in a container until the next superstep 
starts.  No actual processing is done on the message itself.

 Improve the way to keep outgoing messages
 -

 Key: GIRAPH-45
 URL: https://issues.apache.org/jira/browse/GIRAPH-45
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Hyunsik Choi
Assignee: Hyunsik Choi

 As discussed in GIRAPH-12 (http://goo.gl/CE32U), I think that there is a 
 potential out-of-memory problem when the rate of message generation 
 is higher than the rate of message flush (or network bandwidth).
 To overcome this problem, we need a more eager strategy for message flushing or 
 some approach to spill messages to disk.
 The link below is Dmitriy's suggestion.
 https://issues.apache.org/jira/browse/GIRAPH-12?focusedCommentId=13116253&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13116253





[jira] [Commented] (GIRAPH-45) Improve the way to keep outgoing messages

2011-11-15 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150757#comment-13150757
 ] 

Jakob Homan commented on GIRAPH-45:
---

bq. I'm not sure whether sending messages in a streamy way would actually 
diminish any kind of memory pressure.
Right. As above, the total amount of bytes that would have to be stored 
*somewhere* would be the same (absent any advantage we get from having more 
combine-able messages in one place to be combined).  We'd be potentially saving 
(very slow) disk bandwidth by putting them on the network immediately where, 
depending on the destination's memory situation, it may or may not make sense 
to spill to disk, to combine, etc.

 Improve the way to keep outgoing messages
 -

 Key: GIRAPH-45
 URL: https://issues.apache.org/jira/browse/GIRAPH-45
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Hyunsik Choi
Assignee: Hyunsik Choi

 As discussed in GIRAPH-12 (http://goo.gl/CE32U), I think that there is a 
 potential out-of-memory problem when the rate of message generation 
 is higher than the rate of message flush (or network bandwidth).
 To overcome this problem, we need a more eager strategy for message flushing or 
 some approach to spill messages to disk.
 The link below is Dmitriy's suggestion.
 https://issues.apache.org/jira/browse/GIRAPH-12?focusedCommentId=13116253&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13116253





[jira] [Commented] (GIRAPH-45) Improve the way to keep outgoing messages

2011-11-15 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150847#comment-13150847
 ] 

Jakob Homan commented on GIRAPH-45:
---

yes, it does. I'm trying to learn from the mistakes of the past, not repeat 
them. :)

 Improve the way to keep outgoing messages
 -

 Key: GIRAPH-45
 URL: https://issues.apache.org/jira/browse/GIRAPH-45
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Hyunsik Choi
Assignee: Hyunsik Choi

 As discussed in GIRAPH-12 (http://goo.gl/CE32U), I think that there is a 
 potential out-of-memory problem when the rate of message generation 
 is higher than the rate of message flush (or network bandwidth).
 To overcome this problem, we need a more eager strategy for message flushing or 
 some approach to spill messages to disk.
 The link below is Dmitriy's suggestion.
 https://issues.apache.org/jira/browse/GIRAPH-12?focusedCommentId=13116253&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13116253





[jira] [Commented] (GIRAPH-88) Message count not updated properly after GIRAPH-11

2011-11-15 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-88?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13150858#comment-13150858
 ] 

Jakob Homan commented on GIRAPH-88:
---

+1

 Message count not updated properly after GIRAPH-11
 --

 Key: GIRAPH-88
 URL: https://issues.apache.org/jira/browse/GIRAPH-88
 Project: Giraph
  Issue Type: Bug
Reporter: Avery Ching
 Attachments: GIRAPH-88.diff


 Email from s...@apache.org
 Hi,
 I updated to the latest trunk (after the GIRAPH-11 commit) and wanted to
 continue to work on GIRAPH-51 where I use a small toy graph to test
 SimpleShortestPathVertex.
 Unfortunately my code did not work anymore and I guess I tracked it down
 to the fact that vertices that voted to halt are not reactivated anymore when
 new messages arrive.
 In SimpleShortestPathVertex every vertex always votes to halt and only
 gets reactivated when a shorter path to it has been found. However my
 test run always finished after superstep 0.
 I don't know too much about Giraph's internals yet, but my guess is that
 the number of sent messages is not tracked correctly anymore. Therefore
 Giraph finishes the algorithm (as all vertices voted to halt) although
 there should still be messages in the pipeline.
 I think I tracked it down to this behavior:
 GraphMapper declares a variable workerSentMessages = 0 and never
 increases it. This variable is given to
 BspServiceWorker.finishSuperstep() which writes it to zookeeper and uses
 it to compute the GlobalStats afterwards, which are used to decide
 whether a new superstep has to be scheduled. As it has never been
 increased, the algorithm will always stop when all vertices voted to halt.
 It would be great if someone could confirm/disprove this speculation and
 help me to continue work on GIRAPH-51
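
 The speculation above comes down to the halting check of the BSP model,
 sketched here under assumptions.  The class and method names are
 hypothetical; the actual GlobalStats and BspServiceWorker code is not shown
 in this thread, so this is only an illustration of why a sent-message
 counter stuck at 0 would end the job early.

```java
// Hypothetical sketch of the BSP halting check; not Giraph's master code.
final class HaltCheckSketch {
    // Another superstep is needed while any vertex is still active or any
    // message was sent, because a message reactivates a halted vertex.
    static boolean needsAnotherSuperstep(long activeVertices, long sentMessages) {
        return activeVertices > 0 || sentMessages > 0;
    }
}
```

 If every vertex has voted to halt and three messages were really sent, the
 check should say "continue"; if the counter is never increased it reports 0
 and the same check says "stop", matching the run that finished after
 superstep 0.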





[jira] [Commented] (GIRAPH-91) Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)

2011-11-16 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-91?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151600#comment-13151600
 ] 

Jakob Homan commented on GIRAPH-91:
---

can you attach the patch to the jira, for non-rb review?

 Large-memory improvements (Memory reduced vertex implementation, fast 
 failure, added settings) 
 ---

 Key: GIRAPH-91
 URL: https://issues.apache.org/jira/browse/GIRAPH-91
 Project: Giraph
  Issue Type: Improvement
Reporter: Avery Ching
Assignee: Avery Ching

 Current vertex implementation uses a HashMap for storing the edges, which is 
 quite memory heavy for large graphs.  The default settings in Giraph need to 
 be improved for large graphs and heaps of 20G.
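
 To illustrate why a HashMap of edges is memory heavy: each edge costs an
 entry object plus boxed key and value objects.  One possible alternative,
 sketched below, keeps the edges in parallel primitive arrays sorted by target
 id and looks them up with a binary search.  This is only an assumption about
 what a memory-reduced vertex might look like, not a description of the
 attached patch; the class name and the long/float types are hypothetical.

```java
import java.util.Arrays;

// Hypothetical array-backed edge storage; not the Giraph vertex implementation.
final class CompactEdgesSketch {
    private final long[] targetIds;   // sorted ascending
    private final float[] edgeValues; // edgeValues[i] belongs to targetIds[i]

    CompactEdgesSketch(long[] sortedTargetIds, float[] values) {
        if (sortedTargetIds.length != values.length) {
            throw new IllegalArgumentException("ids and values differ in length");
        }
        this.targetIds = sortedTargetIds.clone();
        this.edgeValues = values.clone();
    }

    boolean hasEdge(long targetId) {
        return Arrays.binarySearch(targetIds, targetId) >= 0;
    }

    // Returns the edge value, or defaultValue when there is no such edge.
    float edgeValue(long targetId, float defaultValue) {
        int i = Arrays.binarySearch(targetIds, targetId);
        return i >= 0 ? edgeValues[i] : defaultValue;
    }

    int edgeCount() {
        return targetIds.length;
    }
}
```

 The trade-off is that lookups become O(log n) and edge mutation requires
 copying the arrays, which may or may not be acceptable for a given algorithm.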





[jira] [Commented] (GIRAPH-93) Hive input / output format

2011-11-16 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151805#comment-13151805
 ] 

Jakob Homan commented on GIRAPH-93:
---

Do you mean RCFile specifically?  Hive can handle data in any format there's a 
serde for.  I've been meaning to open a jira for handling Avro-encoded data as 
well (and possibly specifying a graph schema for it).  For directly loading 
tables in/out of Hive, it may be better to target HCatalog, as that will also 
give access to Pig (and whatever else HCatalog eventually supports)

 Hive input / output format
 --

 Key: GIRAPH-93
 URL: https://issues.apache.org/jira/browse/GIRAPH-93
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Avery Ching

 It would be great to be able to load/store data from/to Hive tables.  





[jira] [Commented] (GIRAPH-76) Refactor worker logic from GraphMapper

2011-11-16 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-76?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13151860#comment-13151860
 ] 

Jakob Homan commented on GIRAPH-76:
---

I've not.  Please go for it.

 Refactor worker logic from GraphMapper
 --

 Key: GIRAPH-76
 URL: https://issues.apache.org/jira/browse/GIRAPH-76
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Reporter: Jakob Homan

 The plumbing around executing vertices is hosted within the mapper, but could 
 be extracted to its own class and executed from the Mapper directly.  This 
 would ease testing and make it easier to host in the new YARN infrastructure. 
  There's nothing mapper specific about this code.





[jira] [Commented] (GIRAPH-86) Simplify boolean expressions in ZooKeeperExt::createExt

2011-11-17 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-86?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152328#comment-13152328
 ] 

Jakob Homan commented on GIRAPH-86:
---

+1.  I've committed this.  Not sure why you ended up with a reference to Hadoop 
0.21.  That's an unstable, unsupported version.  I can't find a reference to it 
in our pom.  In terms of the rat check, rather than making rat ignore the file 
we should really just fix GIRAPH-20, which is triggering this.  Thanks for the 
contribution, Attila!

 Simplify boolean expressions in ZooKeeperExt::createExt
 ---

 Key: GIRAPH-86
 URL: https://issues.apache.org/jira/browse/GIRAPH-86
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.70.0
Reporter: Jakob Homan
Assignee: Attila Csordas
  Labels: newbie
 Fix For: 0.70.0

 Attachments: GIRAPH-86.patch, pom.diff


 In ZooKeeperExt::createExt there are two instances of {{recursive==false}} 
 that can be simplified to !recursive.





[jira] [Commented] (GIRAPH-77) Coordinator should expose a web interface with progress, vertex region assignments, etc.

2011-11-17 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152348#comment-13152348
 ] 

Jakob Homan commented on GIRAPH-77:
---

I've got a bit of code, but enough to block progress.  I can see this as 
separate work from 76. I was looking at starting up a webserver on the 
coordinator to track all the info it's normally difficult to get during job 
execution. If this comes up as part of 76, go for it. 

 Coordinator should expose a web interface with progress, vertex region 
 assignments, etc.
 

 Key: GIRAPH-77
 URL: https://issues.apache.org/jira/browse/GIRAPH-77
 Project: Giraph
  Issue Type: New Feature
Reporter: Jakob Homan

 It would be nice if the coordinator worker had a web interface that showed 
 progress, splits, etc. during job execution. Right now it would duplicate 
 information currently being exposed through task status, but with the move to 
 YARN, it will be a necessity.  It would be great if we could do this in a 
 modern way to avoid the screen-scraping, etc. currently used to get 
 information from most other Hadoop projects' web interfaces.  The coordinator 
 could announce its address at the beginning or via status updates.





[jira] [Commented] (GIRAPH-83) Is Vertex correct yet?

2011-11-17 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152356#comment-13152356
 ] 

Jakob Homan commented on GIRAPH-83:
---

bq. I think it should probably actually be called Vertex, because everything 
is a BasicVertex currently, so it makes sense instead to say everything is 
a Vertex.
Absolutely agreed.

bq. We can factor off lots of stuff into other classes, but the question comes 
down to how does the user writing their algorithm get access to them? How is it 
all wired together? You want compute() to get passed some state that you have 
right when you need it, instead of either going with inheritance or 
composition? That could be nice, I think, as long as we package it all up into 
a minimal set of *Context-like objects to carry around.
Correct, this is what I'm getting at.

bq. In what way are the out edges of a vertex managed by the framework 
currently?
In that Vertex is responsible for maintaining the destEdgeMap for an 
implementation of Vertex, rather than implementers having to do this 
themselves.  For each compute invocation, the vertex shouldn't assume anything 
about its outgoing edges, as they may have been mutated since the last call.

 Is Vertex correct yet?
 --

 Key: GIRAPH-83
 URL: https://issues.apache.org/jira/browse/GIRAPH-83
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan

 I'm seeing a number of people run into oddities with Vertex and am thinking 
 we may not have it quite correct yet...





[jira] [Commented] (GIRAPH-45) Improve the way to keep outgoing messages

2011-11-17 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152360#comment-13152360
 ] 

Jakob Homan commented on GIRAPH-45:
---

On a side note, is it worth considering messages to be immutable (or provide a 
separate annotation for these)? This would help with message de-duplication, 
which could be a significant help in some algorithms.  One would only need to 
keep one copy of the message going to a particular worker, regardless of the 
number of vertices it is bound for.
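
If messages are immutable, the de-duplication could look roughly like the
sketch below: each distinct payload is stored once per destination worker,
together with the list of vertex ids it is bound for.  The class and method
names are hypothetical, and a String stands in for an immutable message type;
none of this is existing Giraph code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of per-worker message de-duplication; not Giraph code.
final class DedupOutboxSketch {
    // workerId -> (payload -> target vertex ids).  Safe only because the
    // payload is immutable and can therefore be shared between targets.
    private final Map<Integer, Map<String, List<Long>>> byWorker = new HashMap<>();

    void send(int workerId, long targetVertex, String payload) {
        byWorker.computeIfAbsent(workerId, w -> new HashMap<>())
                .computeIfAbsent(payload, p -> new ArrayList<>())
                .add(targetVertex);
    }

    // Number of distinct payloads that would go over the wire to this worker.
    int storedPayloads(int workerId) {
        Map<String, List<Long>> payloads = byWorker.get(workerId);
        return payloads == null ? 0 : payloads.size();
    }

    // Copy of the target vertex ids for one payload on one worker.
    List<Long> targets(int workerId, String payload) {
        Map<String, List<Long>> payloads = byWorker.get(workerId);
        List<Long> ids = payloads == null ? null : payloads.get(payload);
        return ids == null ? new ArrayList<>() : new ArrayList<>(ids);
    }
}
```

Sending the same payload to vertices 10 and 11 on worker 1 stores it once with
two target ids, so the saving grows with the fan-out of each message.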

 Improve the way to keep outgoing messages
 -

 Key: GIRAPH-45
 URL: https://issues.apache.org/jira/browse/GIRAPH-45
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Hyunsik Choi
Assignee: Hyunsik Choi

 As discussed in GIRAPH-12 (http://goo.gl/CE32U), I think that there is a 
 potential out-of-memory problem when the rate of message generation 
 is higher than the rate of message flush (or network bandwidth).
 To overcome this problem, we need a more eager strategy for message flushing or 
 some approach to spill messages to disk.
 The link below is Dmitriy's suggestion.
 https://issues.apache.org/jira/browse/GIRAPH-12?focusedCommentId=13116253&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13116253





[jira] [Commented] (GIRAPH-83) Is Vertex correct yet?

2011-11-17 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-83?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152383#comment-13152383
 ] 

Jakob Homan commented on GIRAPH-83:
---

I'm saying we should be responsible for maintaining it (since we have to mutate 
it), but that _maybe_ it shouldn't be in Vertex itself, just to have a cleaner 
delineation. But Avery makes a good point and I'm not completely sold on this 
aspect myself.   How many different memory efficient implementations of Vertex 
can we expect to have?

 Is Vertex correct yet?
 --

 Key: GIRAPH-83
 URL: https://issues.apache.org/jira/browse/GIRAPH-83
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan

 I'm seeing a number of people run into oddities with Vertex and am thinking 
 we may not have it quite correct yet...





[jira] [Commented] (GIRAPH-77) Coordinator should expose a web interface with progress, vertex region assignments, etc.

2011-11-17 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152386#comment-13152386
 ] 

Jakob Homan commented on GIRAPH-77:
---

s/enough/not enough/g. oy.

 Coordinator should expose a web interface with progress, vertex region 
 assignments, etc.
 

 Key: GIRAPH-77
 URL: https://issues.apache.org/jira/browse/GIRAPH-77
 Project: Giraph
  Issue Type: New Feature
Reporter: Jakob Homan

 It would be nice if the coordinator worker had a web interface that showed 
 progress, splits, etc. during job execution. Right now it would duplicate 
 information currently being exposed through task status, but with the move to 
 YARN, it will be a necessity.  It would be great if we could do this in a 
 modern way to avoid the screen-scraping, etc. currently used to get 
 information from most other Hadoop projects' web interfaces.  The coordinator 
 could announce its address at the beginning or via status updates.





[jira] [Commented] (GIRAPH-98) Add Claudio Martella to site

2011-11-18 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-98?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13152967#comment-13152967
 ] 

Jakob Homan commented on GIRAPH-98:
---

Cool. Actually, we don't need to do the instructions from 35 anymore. Check the 
wiki for the latest way to update the site: 
https://cwiki.apache.org/confluence/display/GIRAPH/Committer+notes

 Add Claudio Martella to site
 

 Key: GIRAPH-98
 URL: https://issues.apache.org/jira/browse/GIRAPH-98
 Project: Giraph
  Issue Type: Task
Reporter: Claudio Martella
 Attachments: GIRAPH-98.diff








[jira] [Commented] (GIRAPH-84) Simplify boolean expressions in BspRecordReader

2011-11-20 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-84?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153951#comment-13153951
 ] 

Jakob Homan commented on GIRAPH-84:
---

The patch *must* be attached to the JIRA.  We need the little icon that says 
the contributor has given it to Apache.  Reviewboard is optional; the patch 
should always be attached to the JIRA first.

 Simplify boolean expressions in BspRecordReader
 ---

 Key: GIRAPH-84
 URL: https://issues.apache.org/jira/browse/GIRAPH-84
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Assignee: Shaunak Kashyap
  Labels: newbie

 Twice in BspRecordReader boolean expressions are evaluated with == and can be 
 simplified to just one liners or variable evaluation.





[jira] [Commented] (GIRAPH-99) Make AdjacencyListVertexReader and its constructor public

2011-11-22 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155234#comment-13155234
 ] 

Jakob Homan commented on GIRAPH-99:
---

+1. Looks good.

 Make AdjacencyListVertexReader and its constructor public
 -

 Key: GIRAPH-99
 URL: https://issues.apache.org/jira/browse/GIRAPH-99
 Project: Giraph
  Issue Type: Wish
  Components: lib
Reporter: Kohei Ozaki
Priority: Minor
  Labels: patch
 Attachments: GIRAPH-99.diff


 Hi,
 I'd like to write a class inherited from AdjacencyListVertexReader
 to make a library using Giraph (like git.io/ALVR),
 but AdjacencyListVertexReader is a private class and its constructor is 
 private.
 I guess making it public is useful to handle a more complex input format
 specified by the data structure of algorithms.





[jira] [Commented] (GIRAPH-45) Improve the way to keep outgoing messages

2011-11-28 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158734#comment-13158734
 ] 

Jakob Homan commented on GIRAPH-45:
---

It's certainly an intriguing idea to go with something like leveldb. This is 
obviously an area for lots of exploration and experimentation. As such, 
probably the best idea is to make the interface pluggable and keep a 
good-enough Java version as default.  It's probably time for a giraph-site.xml 
file to track these configuration possibilities.

 Improve the way to keep outgoing messages
 -

 Key: GIRAPH-45
 URL: https://issues.apache.org/jira/browse/GIRAPH-45
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Hyunsik Choi
Assignee: Hyunsik Choi

 As discussed in GIRAPH-12 (http://goo.gl/CE32U), I think that there is a 
 potential out-of-memory problem when the rate of message generation 
 is higher than the rate of message flush (or network bandwidth).
 To overcome this problem, we need a more eager strategy for message flushing or 
 some approach to spill messages to disk.
 The link below is Dmitriy's suggestion.
 https://issues.apache.org/jira/browse/GIRAPH-12?focusedCommentId=13116253&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13116253





[jira] [Commented] (GIRAPH-51) Provide unit testing tool for Giraph algorithms

2011-11-28 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158743#comment-13158743
 ] 

Jakob Homan commented on GIRAPH-51:
---

+1.

 Provide unit testing tool for Giraph algorithms
 ---

 Key: GIRAPH-51
 URL: https://issues.apache.org/jira/browse/GIRAPH-51
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Assignee: Sebastian Schelter
 Attachments: GIRAPH-51-2.patch, GIRAPH-51-3.patch, GIRAPH-51.patch


 It would be nice to have a little tool, similar to MRUnit, that would allow 
 Giraph application writers to quickly unit test their algorithms.  The tool 
 could take a Vertex implementation, a set of input and expected output and 
 verify that after the specified number of supersteps, we've gotten what we 
 expect.





[jira] [Commented] (GIRAPH-100) Data input sampling and testing improvements

2011-12-01 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161213#comment-13161213
 ] 

Jakob Homan commented on GIRAPH-100:


Please avoid formatting changes as part of code change patches.  They blow up 
the size of the patch and introduce a lot of "what's the difference between 
these lines? Did anything change that needs to be reviewed?" questions.  For 
instance, most of the changes in SimpleCheckpointVertex appear to be spurious.

* What's the point of the changes in TextVertexInputFormat method visibility? 
Are they related to this patch?
* We're throwing a lot of Stringly typed exceptions. For more robust error 
handling and recovery, it may be good to strongly type these instead.
* re: SuperstepHashPartitionerFactory. Moving it out of test and into the 
example directory seems a bit counterproductive to me.  It's a pathological 
implementation; wouldn't it be better to provide a more useful example, rather 
than one that's explicitly not meant to be used?





 Data input sampling and testing improvements
 

 Key: GIRAPH-100
 URL: https://issues.apache.org/jira/browse/GIRAPH-100
 Project: Giraph
  Issue Type: New Feature
  Components: graph
Reporter: Avery Ching
Assignee: Avery Ching
 Attachments: GIRAPH-100.patch


 It would be really nice to help debug an application by limiting the input 
 data (% of input splits, max vertices per input split).  Also, it would be 
 nice for the workers to provide a little more debugging info on how far along 
 they are with processing the input data.





[jira] [Commented] (GIRAPH-100) Data input sampling and testing improvements

2011-12-01 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161236#comment-13161236
 ] 

Jakob Homan commented on GIRAPH-100:


bq. Which exceptions are you referring to?
{noformat}+throw new IllegalStateException(
+    "prepareSuperstep: Impossible that this worker " +
+    service.getWorkerInfo() + " was sent " +
+    entry.getValue().size() + " message(s) with " +
+    "vertex id " + entry.getKey() +
+    " when it does not own this partition.  It should " +
+    "have gone to partition owner " +
+    service.getVertexPartitionOwner(entry.getKey()) +
+    ".  The partition owners are " +
+    service.getPartitionOwners());{noformat}
{noformat}+throw new IllegalStateException(
+    "prepareSuperstep: Impossible to not remove " +
+    vertex);{noformat}
{noformat}+throw new IllegalStateException(
+    "coordinateSuperstep: Worker failed during input split " +
+    "(currently not supported)");{noformat}
{noformat}+throw new IllegalStateException(
+    "barrierOnWorkerList: KeeperException - " +
+    "Couldn't get " + workerInfoHealthyPath, e);{noformat}
{noformat}+throw new IllegalStateException(
+    "loadVertices: KeeperException on " +
+    inputSplitFinishedPath, e);{noformat}
etc. These are all specific types of exceptions being wrapped in 
IllegalStateException.  We'll likely want to catch and handle them later in an 
effort to be more robust. It'd be better if these existed as their own types, 
so we don't have to try to tease out the details later.
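As a rough illustration of what "their own types" could look like, here is a minimal sketch. GiraphException is a name floated elsewhere in this thread; the two subtypes and their fields are hypothetical and only meant to show how a caller can branch on the type instead of parsing a message string.

```java
/** Base type for Giraph failures (the name is suggested elsewhere in the thread). */
class GiraphException extends RuntimeException {
  GiraphException(String message, Throwable cause) {
    super(message, cause);
  }
}

/** Hypothetical: a worker received messages for a partition it does not own. */
class MisroutedMessageException extends GiraphException {
  final String vertexId;
  final String expectedOwner;

  MisroutedMessageException(String vertexId, String expectedOwner) {
    super("Messages for vertex " + vertexId
        + " should have gone to partition owner " + expectedOwner, null);
    this.vertexId = vertexId;
    this.expectedOwner = expectedOwner;
  }
}

/** Hypothetical: the coordination service (e.g. ZooKeeper) could not be read. */
class CoordinationException extends GiraphException {
  CoordinationException(String message, Throwable cause) {
    super(message, cause);
  }
}

final class TypedExceptionDemo {
  /** Picks a recovery action from the exception type alone. */
  static String handle(Runnable step) {
    try {
      step.run();
      return "ok";
    } catch (MisroutedMessageException e) {
      return "reroute to " + e.expectedOwner;
    } catch (CoordinationException e) {
      return "retry";
    }
  }
}
```

With everything wrapped in IllegalStateException, the same handler would have to inspect the message text to tell the two cases apart.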
bq. Can we do that in another issue? I agree that it isn't a good example, but 
it's still a good test since it guarantees partition movement between workers.
I have trouble putting something that we agree is a bad example into the 
example directory. The issue is that it's not actually a unit test, since it 
involves Hadoop.  That makes it an integration test.  The best answer is to 
have integration tests in their own directory (and either bundled with the main 
jar or a separate integration test directory).  Since this verifies important 
behavior, the basic test itself should run without Hadoop, via mocking, and the 
ability to run it as an integration test under a real Hadoop maintained.

 Data input sampling and testing improvements
 

 Key: GIRAPH-100
 URL: https://issues.apache.org/jira/browse/GIRAPH-100
 Project: Giraph
  Issue Type: New Feature
  Components: graph
Reporter: Avery Ching
Assignee: Avery Ching
 Attachments: GIRAPH-100.patch


 It would be really nice to help debug an application by limiting the input 
 data (% of input splits, max vertices per input split).  Also, it would be 
 nice for the workers to provide a little more debugging info on how far along 
 they are with processing the input data.





[jira] [Commented] (GIRAPH-100) Data input sampling and testing improvements

2011-12-01 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161264#comment-13161264
 ] 

Jakob Homan commented on GIRAPH-100:


bq. In the future, I'll file separate issues for formatting cleanup.
Great. This also gives us a steady supply of newbie JIRAs, since the latest 
batch is almost used up.

bq. We should file another JIRA to create a GiraphException and the various 
types I suppose. Or do you want me to do it in this JIRA?
Either in this JIRA, or put the current ones in with FIXME/TODO annotations so 
we can go back and fix them easily, and then immediately open a new JIRA.

bq. but not sure how mocking can verify the behavior in this case.
It may end up being a challenge, but it's a strong guard against building up a 
huge number of integration tests, calling them unit tests and then having tests 
that run for four, six or nine hours (see: every other Hadoop ecosystem 
project).  Being able to swap out the backing dependency from a mock to a real 
Hadoop cluster is a great way to test quickly (ie, often) as well as test 
reality (ie, against a real cluster).  I'll take a look at making sure we have 
infrastructure that is amenable to this.

bq. we should file a separate JIRA for separating the tests into unittests 
(mocking, individual class tests) and integration tests, but integration tests 
can still be run locally and/or remote.
Can we go ahead and create test/integration as part of this JIRA and put 
SuperstepHashPartitionerFactory there? That way it doesn't go into the 
inappropriate examples directory but can still be bundled as part of the jar.  
The remaining partitioning can probably be done as part of GIRAPH-22.


 Data input sampling and testing improvements
 

 Key: GIRAPH-100
 URL: https://issues.apache.org/jira/browse/GIRAPH-100
 Project: Giraph
  Issue Type: New Feature
  Components: graph
Reporter: Avery Ching
Assignee: Avery Ching
 Attachments: GIRAPH-100.patch


 It would be really nice to help debug an application by limiting the input 
 data (% of input splits, max vertices per input split).  Also, it would be 
 nice for the workers to provide a little more debugging info on how far along 
 they are with processing the input data.





[jira] [Commented] (GIRAPH-57) Add new RPC call (putVertexIdMessagesList) to batch putMsgList RPCs together

2011-12-15 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170605#comment-13170605
 ] 

Jakob Homan commented on GIRAPH-57:
---

Can we post the final patch, along with the "I give this to Apache" button?

 Add new RPC call (putVertexIdMessagesList) to batch putMsgList RPCs together
 

 Key: GIRAPH-57
 URL: https://issues.apache.org/jira/browse/GIRAPH-57
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Assignee: Avery Ching
 Attachments: GIRAPH-57.diff


 Right now messages are sent to a vertex one at a time.  It would be good to 
 have a putMsgs call that could send messages to multiple vertices (all hosted 
 on the same worker).  We'd save a huge number of individual RPC calls at the 
 expense of having smaller calls with larger payloads.
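The batching idea can be sketched roughly as follows. This is an illustration only: MessageBatcher, Pending and the modulo placement rule in workerOf are hypothetical and do not reflect how Giraph actually assigns vertices to workers. The point is that messages are grouped by destination worker first, so each worker receives one call carrying many messages.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MessageBatcher {
  /** One message waiting to be delivered to a vertex. */
  static final class Pending {
    final long vertexId;
    final String message;

    Pending(long vertexId, String message) {
      this.vertexId = vertexId;
      this.message = message;
    }
  }

  /** Hypothetical placement rule: vertex id modulo the number of workers. */
  static int workerOf(long vertexId, int workers) {
    return (int) Math.floorMod(vertexId, (long) workers);
  }

  /** Groups pending messages so that each worker needs a single RPC. */
  static Map<Integer, List<Pending>> batchByWorker(List<Pending> pending, int workers) {
    Map<Integer, List<Pending>> batches = new HashMap<>();
    for (Pending p : pending) {
      batches.computeIfAbsent(workerOf(p.vertexId, workers), k -> new ArrayList<>()).add(p);
    }
    return batches;
  }
}
```

With five pending messages and two workers, this yields two batches instead of five individual calls.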





[jira] [Commented] (GIRAPH-111) Refactor I/O to be independent of Map/Reduce

2011-12-20 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173537#comment-13173537
 ] 

Jakob Homan commented on GIRAPH-111:


bq. I'm not clear on why this is necessary.
I agree.  Hadoop's file formats, etc. are designed to be exceedingly forgiving 
and flexible as to the underlying storage mechanism.  Can you point to where 
they're lacking for Mesos' case?

bq. We could also copy out the relevant Hadoop I/O classes (InputFormat, 
OutputFormat, etc) into Giraph, rename their packages, and begin reworking them 
in an appropriate way to better suit Giraph.
-1.  Therein lies madness.


 Refactor I/O to be independent of Map/Reduce
 

 Key: GIRAPH-111
 URL: https://issues.apache.org/jira/browse/GIRAPH-111
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Reporter: Ed Kohlwey

 The I/O mechanisms should probably be abstracted entirely from Map/Reduce in 
 order to support making Giraph an independent framework.





[jira] [Commented] (GIRAPH-110) Add guide to setup the enviroment for running the unit tests in a pseudo-distributed hadoop instance

2011-12-20 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173631#comment-13173631
 ] 

Jakob Homan commented on GIRAPH-110:


Sorry to be late on this one, but I'd been meaning to ask if we should retire 
most of the README content in favor of the site documentation?  The content 
between the two was originally duplicated and is starting to drift...

 Add guide to setup the enviroment for running the unit tests in a 
 pseudo-distributed hadoop instance
 

 Key: GIRAPH-110
 URL: https://issues.apache.org/jira/browse/GIRAPH-110
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.70.0
Reporter: Sebastian Schelter
Assignee: Sebastian Schelter
Priority: Minor
 Fix For: 0.70.0

 Attachments: GIRAPH-110.2.patch, GIRAPH-110.patch


 Giraph should provide a small guide for setting up the local environment to 
 run the unit tests in a pseudo-distributed hadoop instance as there are some 
 non-obvious hurdles to take.





[jira] [Commented] (GIRAPH-113) Change cast to Vertex used in prepareSuperstep() to BasicVertex

2011-12-20 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173921#comment-13173921
 ] 

Jakob Homan commented on GIRAPH-113:


+1 (grumbling about GIRAPH-83)

 Change cast to Vertex used in prepareSuperstep() to BasicVertex
 ---

 Key: GIRAPH-113
 URL: https://issues.apache.org/jira/browse/GIRAPH-113
 Project: Giraph
  Issue Type: Bug
Reporter: Yuanyuan Tian
Assignee: Avery Ching
Priority: Minor
 Attachments: GIRAPH-113.patch


 Hi,
 I decided to use LongDoubleFloatDoubleVertex in a graph algorithm because it 
 uses more compact and efficient Mahout collections. However, I ran into an 
 error when running the algorithm:
 java.lang.ClassCastException: org.apache.giraph.graph.LongDoubleFloatDoubleVertex cannot be cast to org.apache.giraph.graph.Vertex
     at org.apache.giraph.comm.BasicRPCCommunications.prepareSuperstep(BasicRPCCommunications.java:1016)
     at org.apache.giraph.graph.BspServiceWorker.startSuperstep(BspServiceWorker.java:843)
     at org.apache.giraph.graph.GraphMapper.map(GraphMapper.java:569)
     at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:728)
     ... 7 more
 Basically, the problem is that in BasicRPCCommunications.prepareSuperstep(), 
 LongDoubleFloatDoubleVertex instances are cast to Vertex in the following code 
 fragment. But LongDoubleFloatDoubleVertex inherits from BasicVertex instead 
 of Vertex.
 if (vertex != null) {
   ((MutableVertex<I, V, E, M>) vertex).setVertexId(vertexIndex);
   partition.putVertex((Vertex<I, V, E, M>) vertex);
 } else if (originalVertex != null) {
   partition.removeVertex(originalVertex.getVertexId());
 }
 I made a simple change: cast LongDoubleFloatDoubleVertex to BasicVertex. The 
 problem went away, and the algorithm finished without any error. But I am not 
 sure whether this change has any implications for other parts of the code. So, 
 I hope to get some comments from the Giraph developers.
 if (vertex != null) {
   ((MutableVertex<I, V, E, M>) vertex).setVertexId(vertexIndex);
   partition.putVertex((BasicVertex<I, V, E, M>) vertex);
 } else if (originalVertex != null) {
   partition.removeVertex(originalVertex.getVertexId());
 }
 }
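The failure can be reproduced in isolation with a stripped-down hierarchy. The classes below are simplified stand-ins that only borrow the names from the report (no generics, no Giraph dependency), and the sibling relationship between Vertex and LongDoubleFloatDoubleVertex is an assumption based on the description above.

```java
/** Simplified stand-in for the common base class. */
abstract class BasicVertex {
  abstract String name();
}

/** Simplified stand-in for the general-purpose vertex. */
class Vertex extends BasicVertex {
  @Override
  String name() {
    return "Vertex";
  }
}

/** Simplified stand-in: extends BasicVertex directly, so it is not a Vertex. */
class LongDoubleFloatDoubleVertex extends BasicVertex {
  @Override
  String name() {
    return "LongDoubleFloatDoubleVertex";
  }
}

final class CastDemo {
  /** Returns true if casting to the sibling type Vertex throws at runtime. */
  static boolean castToVertexFails(BasicVertex v) {
    try {
      Vertex ignored = (Vertex) v;
      return false;
    } catch (ClassCastException e) {
      return true;
    }
  }

  /** Using the shared base type needs no downcast and therefore cannot fail. */
  static String describe(BasicVertex v) {
    return v.name();
  }
}
```

The cast compiles because the static type is the shared base class, but it fails at runtime for any subclass outside the Vertex branch, which is why widening the cast target to BasicVertex resolves the error.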





[jira] [Commented] (GIRAPH-119) VertexCombiner should work on Iterable<M> instead of List<M>

2012-01-06 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181522#comment-13181522
 ] 

Jakob Homan commented on GIRAPH-119:


+1.  The change log isn't usually modified as part of the patch, but as part of 
the commit, although I don't see a reason it would hurt, except perhaps 
conflicts in the file? 

 VertexCombiner should work on Iterable<M> instead of List<M>
 

 Key: GIRAPH-119
 URL: https://issues.apache.org/jira/browse/GIRAPH-119
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Affects Versions: 0.70.0
Reporter: Claudio Martella
Assignee: Claudio Martella
 Attachments: GIRAPH-119.diff


 Currently VertexCombiner expects a List<M>. It should be refactored to 
 Iterable<M> to sync with the Iterable-based BasicVertex message logic.
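A small sketch of why the wider parameter type helps: a combiner written against Iterable accepts lists, queues, or any lazily produced sequence without copying. SumCombinerSketch is a hypothetical helper, not the actual VertexCombiner signature.

```java
final class SumCombinerSketch {
  /** Folds all messages for one vertex into a single value. */
  static double combine(Iterable<Double> messages) {
    double sum = 0.0;
    for (double m : messages) {
      sum += m;
    }
    return sum;
  }
}
```

Any caller that already holds a List can still pass it unchanged, since List extends Iterable.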





[jira] [Commented] (GIRAPH-122) Roll version back to 0.1

2012-01-09 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182721#comment-13182721
 ] 

Jakob Homan commented on GIRAPH-122:


yep, looks like I could.  

 Roll version back to 0.1
 

 Key: GIRAPH-122
 URL: https://issues.apache.org/jira/browse/GIRAPH-122
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.1.0

 Attachments: GIRAPH-122.patch


 Per the vote on the list, we're going to roll Giraph back to 0.1.





[jira] [Commented] (GIRAPH-123) the wiki is not publicly accessible

2012-01-11 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184167#comment-13184167
 ] 

Jakob Homan commented on GIRAPH-123:


I've opened INFRA-4318 to have this fixed.  I'll close this when that gets 
resolved.

 the wiki is not publicly accessible
 ---

 Key: GIRAPH-123
 URL: https://issues.apache.org/jira/browse/GIRAPH-123
 Project: Giraph
  Issue Type: Bug
  Components: documentation
Reporter: André Kelpe
Priority: Minor

 When I try to read the documentation on the wiki I end up on a login screen. 
 Can you please make the wiki open for the public.





[jira] [Commented] (GIRAPH-126) Use Collections.emptyList() in BasicRPCCommunications.java

2012-01-18 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188608#comment-13188608
 ] 

Jakob Homan commented on GIRAPH-126:


yeah, good catch:
{noformat}scala> list3.add(42)
java.lang.UnsupportedOperationException
    at java.util.AbstractList.add(AbstractList.java:131)
    at java.util.AbstractList.add(AbstractList.java:91)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at RequestResult$.<init>(<console>:9)
    at RequestResult$.<clinit>(<console>)
    at RequestResult$scala_repl_result(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at scala.tools.nsc.Interpreter$Request$$anonfun$loadAndRun$1$$anonfun$apply$18.apply(Interpreter.scala:981)
    at scala.tools.nsc.Interpreter$Request$$anonfun$loadAndRun$1$$anonfun$apply$...
{noformat}
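The same behavior in plain Java, as a minimal sketch. It assumes the list in the REPL session was an immutable empty list such as the one returned by Collections.emptyList(); the message does not show how list3 was created. EmptyListDemo and messagesFor are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class EmptyListDemo {
  /** Returns true if the list rejects additions. */
  static boolean addFails(List<Integer> list) {
    try {
      list.add(42);
      return false;
    } catch (UnsupportedOperationException e) {
      return true;
    }
  }

  /** Hands out the shared immutable empty list when a vertex has no messages. */
  static List<Integer> messagesFor(List<Integer> stored) {
    return stored == null ? Collections.<Integer>emptyList() : stored;
  }
}
```

The shared empty list avoids one allocation per message-less vertex, but any code path that later tries to append to it will throw, so callers have to treat the returned list as read-only.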

 Use Collections.emptyList() in BasicRPCCommunications.java
 --

 Key: GIRAPH-126
 URL: https://issues.apache.org/jira/browse/GIRAPH-126
 Project: Giraph
  Issue Type: Improvement
Reporter: André Kelpe
Assignee: André Kelpe
Priority: Minor
 Attachments: GIRAPH-126.patch, GIRAPH-126.patch, GIRAPH-126.patch


 I am doing some tests with giraph and I am having some memory problems. While 
 I was browsing through the codebase I saw that you are allocating a new 
 ArrayList (which has an underlying array of 10 elements) for each Vertex, 
 that has no Messages to be delivered. That's a waste of memory and time. This 
 patch replaces it with the EMPTY_LIST of the Collections utility class.





[jira] [Commented] (GIRAPH-126) Use Collections.emptyList() in BasicRPCCommunications.java

2012-01-18 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188666#comment-13188666
 ] 

Jakob Homan commented on GIRAPH-126:


bq. I think we are moving to guava. Much nicer in my opinion.
+1


 Use Collections.emptyList() in BasicRPCCommunications.java
 --

 Key: GIRAPH-126
 URL: https://issues.apache.org/jira/browse/GIRAPH-126
 Project: Giraph
  Issue Type: Improvement
Reporter: André Kelpe
Assignee: André Kelpe
Priority: Minor
 Attachments: GIRAPH-126.patch, GIRAPH-126.patch, GIRAPH-126.patch


 I am doing some tests with giraph and I am having some memory problems. While 
 I was browsing through the codebase I saw that you are allocating a new 
 ArrayList (which has an underlying array of 10 elements) for each Vertex, 
 that has no Messages to be delivered. That's a waste of memory and time. This 
 patch replaces it with the EMPTY_LIST of the Collections utility class.





[jira] [Commented] (GIRAPH-129) enable creation of javadoc and sources jars

2012-01-24 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192430#comment-13192430
 ] 

Jakob Homan commented on GIRAPH-129:


In contrast to {{mvn javadoc:jar}} and {{mvn source:jar}}: one can call those 
directly, but with this change one gets them each time one builds the regular jar.

 enable creation of javadoc and sources jars
 ---

 Key: GIRAPH-129
 URL: https://issues.apache.org/jira/browse/GIRAPH-129
 Project: Giraph
  Issue Type: Improvement
  Components: build
Affects Versions: 0.1.0
Reporter: André Kelpe
Assignee: André Kelpe
Priority: Minor
 Attachments: GIRAPH-129.patch


 It is pretty useful to enable the creation of javadoc and sources jars during 
 the build, so that people using IDEs like eclipse can easily jump into the 
 code.





[jira] [Commented] (GIRAPH-129) enable creation of javadoc and sources jars

2012-01-25 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193282#comment-13193282
 ] 

Jakob Homan commented on GIRAPH-129:


ok, sounds good.  +1 on the patch.  I'll commit it.

 enable creation of javadoc and sources jars
 ---

 Key: GIRAPH-129
 URL: https://issues.apache.org/jira/browse/GIRAPH-129
 Project: Giraph
  Issue Type: Improvement
  Components: build
Affects Versions: 0.1.0
Reporter: André Kelpe
Assignee: André Kelpe
Priority: Minor
 Attachments: GIRAPH-129.patch


 It is pretty useful to enable the creation of javadoc and sources jars during 
 the build, so that people using IDEs like eclipse can easily jump into the 
 code.





[jira] [Commented] (GIRAPH-129) enable creation of javadoc and sources jars

2012-01-25 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193308#comment-13193308
 ] 

Jakob Homan commented on GIRAPH-129:


bq. (P.S.: Shouldn't the version of giraph be something like 0.1-SNAPSHOT? That 
would make it easier to introduce releases via the maven-release plugin later 
on.)
Yep.  Wanna spin up a quick patch?

 enable creation of javadoc and sources jars
 ---

 Key: GIRAPH-129
 URL: https://issues.apache.org/jira/browse/GIRAPH-129
 Project: Giraph
  Issue Type: Improvement
  Components: build
Affects Versions: 0.1.0
Reporter: André Kelpe
Assignee: André Kelpe
Priority: Minor
 Fix For: 0.1.0

 Attachments: GIRAPH-129.patch


 It is pretty useful to enable the creation of javadoc and sources jars during 
 the build, so that people using IDEs like eclipse can easily jump into the 
 code.





[jira] [Commented] (GIRAPH-131) enable creation of test-jars to simplify testing in downstream projects

2012-01-27 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195258#comment-13195258
 ] 

Jakob Homan commented on GIRAPH-131:


+1.  Tested patch and verified all the tests and infrastructure are now in the 
new jar.  Adding -SNAPSHOT makes a few more files break the 100-char path limit 
and we get more warnings, but this is expected.

 enable creation of test-jars to simplify testing in downstream projects
 ---

 Key: GIRAPH-131
 URL: https://issues.apache.org/jira/browse/GIRAPH-131
 Project: Giraph
  Issue Type: Improvement
Reporter: André Kelpe
Priority: Minor
 Attachments: GIRAPH-131.patch


 Attached patch enables the creation of test-jars, which are the tests 
 packaged in a separate jar file. This makes it possible to use the 
 super-useful test infrastructure in MockUtils in downstream projects. If you 
 add the patch, you will get a ${giraph.version}-tests.jar, which can be used 
 for downstream testing like this:
 <dependency>
   <groupId>org.apache.giraph</groupId>
   <artifactId>giraph</artifactId>
   <version>${giraph.version}</version>
   <type>test-jar</type>
   <scope>test</scope>
 </dependency>
 P.S.: The patch also resets the version to 0.1-SNAPSHOT as discussed in 
 GIRAPH-129





[jira] [Commented] (GIRAPH-128) RPC port from BasicRPCCommunications should be only a starting port, and retried

2012-01-27 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195347#comment-13195347
 ] 

Jakob Homan commented on GIRAPH-128:


Any reason the question about mocks/extending the class wasn't addressed?

 RPC port from BasicRPCCommunications should be only a starting port, and 
 retried
 

 Key: GIRAPH-128
 URL: https://issues.apache.org/jira/browse/GIRAPH-128
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.1.0
Reporter: Avery Ching
Assignee: Avery Ching
 Attachments: GIRAPH-128.2.patch, GIRAPH-128.3.patch


 Currently Giraph uses a basic port + the task partition to get the RPC port.  
 This doesn't work well when there are multiple Giraph jobs running 
 simultaneously in the same Hadoop cluster (port conflict).  At the same time, 
 it is nice to use this simple algorithm because it makes it very easy to 
 debug problems (you can find the troublesome mapper from the RPC port name).  
 I will be proposing a simple scheme to retry with another port.  I will round 
 the total number of mappers up to the nearest power of 10 (let's call that 
 number Z).  Then I will increment the port number by Z, retrying up to 20 
 times.  If you have enough ports, this scheme would guarantee that up to 20 
 mappers / node would be supported.  It should be sufficient for most 
 clusters.  At the same time, we still maintain the easy debugging method 
 since it's still easy to figure out the mapper partition from the port 
 (port % Z = map partition). 
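The arithmetic of the proposed scheme can be sketched as below. PortRetrySketch and its method names are hypothetical; the sketch only computes candidate ports and does not attempt to bind sockets. Note that the shortcut "port % Z" gives the partition directly only when the base port is itself a multiple of Z, so the helper subtracts the base port first.

```java
final class PortRetrySketch {
  /** Smallest power of 10 that is >= totalMappers (the Z in the description). */
  static int roundUpToPowerOfTen(int totalMappers) {
    int z = 1;
    while (z < totalMappers) {
      z *= 10;
    }
    return z;
  }

  /** Port to try on the given attempt (0 = first try); each retry adds Z. */
  static int candidatePort(int basePort, int taskPartition, int totalMappers, int attempt) {
    return basePort + taskPartition + attempt * roundUpToPowerOfTen(totalMappers);
  }

  /** Recovers the mapper partition from a port chosen by candidatePort. */
  static int partitionOf(int port, int basePort, int totalMappers) {
    return (port - basePort) % roundUpToPowerOfTen(totalMappers);
  }
}
```

For example, with a base port of 30000, 37 mappers (so Z = 100) and task partition 5, the first three candidates are 30005, 30105 and 30205, and each of them maps back to partition 5.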





[jira] [Commented] (GIRAPH-128) RPC port from BasicRPCCommunications should be only a starting port, and retried

2012-01-27 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195382#comment-13195382
 ] 

Jakob Homan commented on GIRAPH-128:


Great, thanks.  +1.

 RPC port from BasicRPCCommunications should be only a starting port, and 
 retried
 

 Key: GIRAPH-128
 URL: https://issues.apache.org/jira/browse/GIRAPH-128
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.1.0
Reporter: Avery Ching
Assignee: Avery Ching
 Attachments: GIRAPH-128.2.patch, GIRAPH-128.3.patch, 
 GIRAPH-128.4.patch


 Currently Giraph uses a basic port + the task partition to get the RPC port.  
 This doesn't work well when there are multiple Giraph jobs running 
 simultaneously in the same Hadoop cluster (port conflict).  At the same time, 
 it is nice to use this simple algorithm because it makes it very easy to 
 debug problems (you can find the troublesome mapper from the RPC port name).  
 I will be proposing a simple scheme to retry with another port.  I will round 
 the total number of mappers up to the nearest power of 10 (let's call that 
 number Z).  Then I will increment the port number by Z, retrying up to 20 
 times.  If you have enough ports, this scheme would guarantee that up to 20 
 mappers / node would be supported.  It should be sufficient for most 
 clusters.  At the same time, we still maintain the easy debugging method 
 since it's still easy to figure out the mapper partition from the port 
 (port % Z = map partition). 





[jira] [Commented] (GIRAPH-136) Erorr message for bin/giraph could be improved

2012-02-02 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199482#comment-13199482
 ] 

Jakob Homan commented on GIRAPH-136:


@Avery - how does this one look?

 Erorr message for bin/giraph could be improved
 --

 Key: GIRAPH-136
 URL: https://issues.apache.org/jira/browse/GIRAPH-136
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.1.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.2.0

 Attachments: GIRAPH-136-b.patch, GIRAPH-136.patch


 Currently when one just runs bin/giraph without the required jar, the message 
 isn't very helpful:
 {noformat}[tardis giraph-0.1]$ bin/giraph
 Can't find user jar to execute.{noformat}
 It would be better to have a more in-depth message explaining Giraph and what 
 is expected.





[jira] [Commented] (GIRAPH-139) Change PageRankBenchmark to be accessible via bin/giraph

2012-02-08 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13203958#comment-13203958
 ] 

Jakob Homan commented on GIRAPH-139:


How about I add back in the main and run as deprecated, leave it in for 
developers, and change the wiki to use bin/giraph for the example, with an eye 
to removing it as soon as the example jar is set up?

 Change PageRankBenchmark to be accessible via bin/giraph
 

 Key: GIRAPH-139
 URL: https://issues.apache.org/jira/browse/GIRAPH-139
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.2.0

 Attachments: GIRAPH-139.patch


 Currently the PageRankBenchmark has its own main and tool implementation and 
 is difficult to access from the bin/giraph script.  It would be better if 
 everything were accessible via bin/giraph.  The benchmark is particularly 
 problematic because it uses inner classes for its two actual Vertex 
 implementations, which have to be specified on the command line as their 
 .class name(ie 
 org.apache.giraph.benchmark.PageRankBenchmark$PageRankHashMapVertex) rather 
 than just with dots, as one would expect.
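
 The naming wrinkle mentioned above is general Java behaviour rather than 
 anything Giraph-specific, and a small stand-alone sketch shows it. The class 
 names here are invented for illustration; only Class.forName, getName and 
 getCanonicalName are real JDK calls.

```java
// Hypothetical demo of binary ($) versus dotted names for nested classes.
public class InnerClassNameSketch {
    /** Stand-in for a vertex implementation declared inside another class. */
    public static class NestedVertex { }

    /** Returns true if Class.forName can resolve the given name. */
    public static boolean loadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // getName() gives the binary name, which uses '$' for nested classes.
        String binaryName = NestedVertex.class.getName();
        // getCanonicalName() gives the dotted form one would write in source code.
        String dottedName = NestedVertex.class.getCanonicalName();
        System.out.println(binaryName + " loadable: " + loadable(binaryName));
        System.out.println(dottedName + " loadable: " + loadable(dottedName));
    }
}
```

 Only the binary name resolves through Class.forName, which is why the inner 
 vertex classes have to be given with the $ form.  On a POSIX shell the name 
 also needs quoting, since an unquoted $PageRankHashMapVertex would be treated 
 as a variable expansion.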





[jira] [Commented] (GIRAPH-146) Maven is running the tests twice during builds

2012-02-09 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205131#comment-13205131
 ] 

Jakob Homan commented on GIRAPH-146:


From a run of {{mvn site:site}}.  Other targets have this too.
{noformat}
 grep -n -A 10 "T E S T S" huh.txt
152: T E S T S
153
154-Running org.apache.giraph.examples.SimpleShortestPathVertexTest
155-12/02/09 17:12:09 INFO server.ZooKeeperServerMain: Starting server
156-12/02/09 17:12:09 INFO server.ZooKeeperServer: Server 
environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27 GMT
157-12/02/09 17:12:09 INFO server.ZooKeeperServer: Server 
environment:host.name=jhoman-mn.linkedin.biz
158-12/02/09 17:12:09 INFO server.ZooKeeperServer: Server 
environment:java.version=1.6.0_22
159-12/02/09 17:12:09 INFO server.ZooKeeperServer: Server 
environment:java.vendor=Apple Inc.
160-12/02/09 17:12:09 INFO server.ZooKeeperServer: Server 
environment:java.home=/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home
161-12/02/09 17:12:09 INFO server.ZooKeeperServer: Server 
environment:java.class.path=/Users/jhoman/huh/huh2/g142/target/test-classes:/Users/jhoman/huh/huh2/g142/target/classes:/Users/jhoman/.m2/repository/junit/junit/3.8.1/junit-3.8.1.jar:/Users/jhoman/.m2/repository/org/apache/hadoop/hadoop-core/0.20.203.0/hadoop-core-0.20.203.0.jar:/Users/jhoman/.m2/repository/xmlenc/xmlenc/0.52/xmlenc-0.52.jar:/Users/jhoman/.m2/repository/commons-httpclient/commons-httpclient/3.0.1/commons-httpclient-3.0.1.jar:/Users/jhoman/.m2/repository/commons-logging/commons-logging/1.0.3/commons-logging-1.0.3.jar:/Users/jhoman/.m2/repository/commons-codec/commons-codec/1.4/commons-codec-1.4.jar:/Users/jhoman/.m2/repository/org/apache/commons/commons-math/2.1/commons-math-2.1.jar:/Users/jhoman/.m2/repository/commons-configuration/commons-configuration/1.6/commons-configuration-1.6.jar:/Users/jhoman/.m2/repository/commons-collections/commons-collections/3.2.1/commons-collections-3.2.1.jar:/Users/jhoman/.m2/repository/commons-lang/commons-lang/2.4/commons-lang-2.4.jar:/Users/jhoman/.m2/repository/commons-digester/commons-digester/1.8/commons-digester-1.8.jar:/Users/jhoman/.m2/repository/commons-beanutils/commons-beanutils/1.7.0/commons-beanutils-1.7.0.jar:/Users/jhoman/.m2/repository/commons-beanutils/commons-beanutils-core/1.8.0/commons-beanutils-core-1.8.0.jar:/Users/jhoman/.m2/repository/commons-net/commons-net/1.4.1/commons-net-1.4.1.jar:/Users/jhoman/.m2/repository/org/mortbay/jetty/jetty/6.1.26/jetty-6.1.26.jar:/Users/jhoman/.m2/repository/org/mortbay/jetty/servlet-api/2.5-20081211/servlet-api-2.5-20081211.jar:/Users/jhoman/.m2/repository/org/mortbay/jetty/jetty-util/6.1.26/jetty-util-6.1.26.jar:/Users/jhoman/.m2/repository/tomcat/jasper-runtime/5.5.12/jasper-runtime-5.5.12.jar:/Users/jhoman/.m2/repository/tomcat/jasper-compiler/5.5.12/jasper-compiler-5.5.12.jar:/Users/jhoman/.m2/repository/org/mortbay/jetty/jsp-api-2.1/6.1.14/jsp-api-2.1-6.1.14.jar:/Users/jhoman/.m2/repository/org/mortbay/jetty/servlet-api-2.5/6.1.14/servlet-api-2.5-6.1.14.jar
:/Users/jhoman/.m2/repository/org/mortbay/jetty/jsp-2.1/6.1.14/jsp-2.1-6.1.14.jar:/Users/jhoman/.m2/repository/ant/ant/1.6.5/ant-1.6.5.jar:/Users/jhoman/.m2/repository/commons-el/commons-el/1.0/commons-el-1.0.jar:/Users/jhoman/.m2/repository/net/java/dev/jets3t/jets3t/0.7.1/jets3t-0.7.1.jar:/Users/jhoman/.m2/repository/net/sf/kosmosfs/kfs/0.3/kfs-0.3.jar:/Users/jhoman/.m2/repository/hsqldb/hsqldb/1.8.0.10/hsqldb-1.8.0.10.jar:/Users/jhoman/.m2/repository/oro/oro/2.0.8/oro-2.0.8.jar:/Users/jhoman/.m2/repository/org/eclipse/jdt/core/3.1.1/core-3.1.1.jar:/Users/jhoman/.m2/repository/org/codehaus/jackson/jackson-core-asl/1.8.0/jackson-core-asl-1.8.0.jar:/Users/jhoman/.m2/repository/org/apache/mahout/mahout-collections/1.0/mahout-collections-1.0.jar:/Users/jhoman/.m2/repository/com/google/guava/guava/r09/guava-r09.jar:/Users/jhoman/.m2/repository/org/codehaus/jackson/jackson-mapper-asl/1.8.0/jackson-mapper-asl-1.8.0.jar:/Users/jhoman/.m2/repository/org/apache/zookeeper/zookeeper/3.3.3/zookeeper-3.3.3.jar:/Users/jhoman/.m2/repository/log4j/log4j/1.2.15/log4j-1.2.15.jar:/Users/jhoman/.m2/repository/javax/mail/mail/1.4/mail-1.4.jar:/Users/jhoman/.m2/repository/javax/activation/activation/1.1/activation-1.1.jar:/Users/jhoman/.m2/repository/jline/jline/0.9.94/jline-0.9.94.jar:/Users/jhoman/.m2/repository/org/apache/commons/commons-io/1.3.2/commons-io-1.3.2.jar:/Users/jhoman/.m2/repository/commons-cli/commons-cli/1.2/commons-cli-1.2.jar:/Users/jhoman/.m2/repository/net/iharder/base64/2.3.8/base64-2.3.8.jar:/Users/jhoman/.m2/repository/org/json/json/20090211/json-20090211.jar:/Users/jhoman/.m2/repository/org/mockito/mockito-all/1.8.5/mockito-all-1.8.5.jar:
162-12/02/09 17:12:09 INFO server.ZooKeeperServer: Server 

[jira] [Commented] (GIRAPH-146) Maven is running the tests twice during builds

2012-02-09 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205137#comment-13205137
 ] 

Jakob Homan commented on GIRAPH-146:


This might be a hiccup on my side.  The double run from site is to generate the 
test coverage data, and I can't get a second run now on package.  I'll keep 
poking it.

 Maven is running the tests twice during builds
 --

 Key: GIRAPH-146
 URL: https://issues.apache.org/jira/browse/GIRAPH-146
 Project: Giraph
  Issue Type: Bug
  Components: build
Reporter: Jakob Homan

 I had a feeling the build time had jumped significantly... 





[jira] [Commented] (GIRAPH-148) giraph-site.xml needs Apache header

2012-02-10 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13205986#comment-13205986
 ] 

Jakob Homan commented on GIRAPH-148:


This is copied from hdfs-site.xml (because I'm a lazy, lazy man), so it's 
known-good in Apache and xml.  Does the formatting matter?

 giraph-site.xml needs Apache header
 ---

 Key: GIRAPH-148
 URL: https://issues.apache.org/jira/browse/GIRAPH-148
 Project: Giraph
  Issue Type: Bug
  Components: conf and scripts
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.2.0

 Attachments: GIRAPH-148.patch


 I forgot to add the license to the conf file and now rat is failing...





[jira] [Commented] (GIRAPH-40) Adding checkstyle enforcement of Giraph code conventions

2012-02-13 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207185#comment-13207185
 ] 

Jakob Homan commented on GIRAPH-40:
---

bq. The below examples are what Checkstyle wants to have us do.
So does that mean code not in that hideous style will be flagged by Checkstyle? 
I'm confused by the next example you posted, which says Checkstyle won't 
enforce indenting post line wrap...


 Adding checkstyle enforcement of Giraph code conventions
 

 Key: GIRAPH-40
 URL: https://issues.apache.org/jira/browse/GIRAPH-40
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Avery Ching
Priority: Minor
 Attachments: GIRAPH-40.patch


 Now that we have some code conventions (see GIRAPH-21), we should enforce 
 them with a maven checkstyle plugin.





[jira] [Commented] (GIRAPH-40) Adding checkstyle enforcement of Giraph code conventions

2012-02-13 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207195#comment-13207195
 ] 

Jakob Homan commented on GIRAPH-40:
---

bq. So for the first example, we need to follow that format, or else checkstyle 
will mark it an error.
Blech. -0.9... That's a big change from what we agreed on earlier.  Can that 
particular check be turned off?

 Adding checkstyle enforcement of Giraph code conventions
 

 Key: GIRAPH-40
 URL: https://issues.apache.org/jira/browse/GIRAPH-40
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Avery Ching
Priority: Minor
 Attachments: GIRAPH-40.patch


 Now that we have some code conventions (see GIRAPH-21), we should enforce 
 them with a maven checkstyle plugin.





[jira] [Commented] (GIRAPH-40) Adding checkstyle enforcement of Giraph code conventions

2012-02-13 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13207205#comment-13207205
 ] 

Jakob Homan commented on GIRAPH-40:
---

OK.  If we can fix it later, it'll be less traumatic than the patch coming 
today since it'll just apply to method signatures...

 Adding checkstyle enforcement of Giraph code conventions
 

 Key: GIRAPH-40
 URL: https://issues.apache.org/jira/browse/GIRAPH-40
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Avery Ching
Priority: Minor
 Attachments: GIRAPH-40.patch


 Now that we have some code conventions (see GIRAPH-21), we should enforce 
 them with a maven checkstyle plugin.





[jira] [Commented] (GIRAPH-40) Adding checkstyle enforcement of Giraph code conventions

2012-02-15 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208731#comment-13208731
 ] 

Jakob Homan commented on GIRAPH-40:
---

ok, looks good to me.  +1.  However, since no one can do a full review of this 
patch, I'd like another committer to +1 it as well before committing.  This 
helps us to explain away not actually doing a full review.

 Adding checkstyle enforcement of Giraph code conventions
 

 Key: GIRAPH-40
 URL: https://issues.apache.org/jira/browse/GIRAPH-40
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Avery Ching
Priority: Minor
 Attachments: GIRAPH-40.2.patch, GIRAPH-40.patch, GIRAPH-40.patch


 Now that we have some code conventions (see GIRAPH-21), we should enforce 
 them with a maven checkstyle plugin.





[jira] [Commented] (GIRAPH-40) Adding checkstyle enforcement of Giraph code conventions

2012-02-15 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208750#comment-13208750
 ] 

Jakob Homan commented on GIRAPH-40:
---

Actually, I have a concern:
 Compiles will now fail if checkstyle guidelines are not met.
I tested this and it's true.  This means that if you have an extra space in an 
if statement, you can't compile, even if you're planning to clean up the code 
later.  This is going to be a huge problem.  During development the code has to 
*always* pass checkstyle, not just when submitting a patch.  Is there a way to 
turn this off for compile and just run checkstyle during a specific run? This 
would mean that it would be up to the submitter and committer to verify 
correctness, exactly as is required currently with rat... I have to withdraw my 
+1.

 Adding checkstyle enforcement of Giraph code conventions
 

 Key: GIRAPH-40
 URL: https://issues.apache.org/jira/browse/GIRAPH-40
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Avery Ching
Priority: Minor
 Attachments: GIRAPH-40.2.patch, GIRAPH-40.patch, GIRAPH-40.patch


 Now that we have some code conventions (see GIRAPH-21), we should enforce 
 them with a maven checkstyle plugin.





[jira] [Commented] (GIRAPH-40) Adding checkstyle enforcement of Giraph code conventions

2012-02-15 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208786#comment-13208786
 ] 

Jakob Homan commented on GIRAPH-40:
---

Thanks.  +1 on latest patch, while still hoping to get another committer to 
take a look.

 Adding checkstyle enforcement of Giraph code conventions
 

 Key: GIRAPH-40
 URL: https://issues.apache.org/jira/browse/GIRAPH-40
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Avery Ching
Priority: Minor
 Attachments: GIRAPH-40.2.patch, GIRAPH-40.3.patch, GIRAPH-40.patch, 
 GIRAPH-40.patch


 Now that we have some code conventions (see GIRAPH-21), we should enforce 
 them with a maven checkstyle plugin.





[jira] [Commented] (GIRAPH-147) Add Blueprints Tinkerpop support

2012-02-15 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208902#comment-13208902
 ] 

Jakob Homan commented on GIRAPH-147:


I'd be reluctant to add the blueprints support at that deep of a level; it 
would be better to have a vertex and edge combo that implements the blueprints 
model higher up.  I'm reluctant to commit to another project at that 
fundamental of a position in our definitions.

 Add Blueprints Tinkerpop support
 

 Key: GIRAPH-147
 URL: https://issues.apache.org/jira/browse/GIRAPH-147
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Priority: Minor

 Got this issue on the old Giraph GitHub (deprecated).  Moving it here.
 jeffg2k opened this issue 2 hours ago
 Hoping that Giraph might add TinkerPop Blueprint support. :)





[jira] [Commented] (GIRAPH-87) Simplify boolean expression in BspService::checkpointFrequencyMet

2012-02-24 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-87?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13215955#comment-13215955
 ] 

Jakob Homan commented on GIRAPH-87:
---

Looks good except it fails checkstyle:
{noformat}<file name="/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/BspService.java">
<error line="587" severity="error" message="Line matches the illegal pattern &apos;Trailing whitespace&apos;." source="com.puppycrawl.tools.checkstyle.checks.RegexpCheck"/>
<error line="587" column="5" severity="error" message="&apos;}&apos; should be on the same line." source="com.puppycrawl.tools.checkstyle.checks.blocks.RightCurlyCheck"/>
<error line="588" severity="error" message="Line matches the illegal pattern &apos;Trailing whitespace&apos;." source="com.puppycrawl.tools.checkstyle.checks.RegexpCheck"/>
</file>{noformat}
Kill the trailing spaces and move the else to the same line and we're good to 
go.

 Simplify boolean expression in BspService::checkpointFrequencyMet
 -

 Key: GIRAPH-87
 URL: https://issues.apache.org/jira/browse/GIRAPH-87
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Eli Reisman
  Labels: newbie
 Attachments: GIRAPH-87.patch


 {noformat}if (superstep < firstCheckpoint) {
   return false;
 } else if (((superstep - firstCheckpoint) % checkpointFrequency) == 0) {
   return true;
 } else {
   return false;
 }{noformat}
 can be simplified to just return the result of the else if evaluation.
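
 A rough sketch of the simplification, reading the first comparison as 
 superstep < firstCheckpoint.  The class and method names are invented and 
 this is not the attached patch.  One caveat the sketch makes explicit: the 
 first guard cannot simply be dropped, because Java's % yields 0 for negative 
 exact multiples, so the guard is folded into the returned expression instead.

```java
// Hypothetical before/after comparison; not the code from GIRAPH-87.patch.
public class CheckpointFrequencySketch {
    /** The if / else if / else form quoted in the issue. */
    public static boolean original(long superstep, long firstCheckpoint,
                                   long checkpointFrequency) {
        if (superstep < firstCheckpoint) {
            return false;
        } else if (((superstep - firstCheckpoint) % checkpointFrequency) == 0) {
            return true;
        } else {
            return false;
        }
    }

    /** Same result as a single boolean expression. */
    public static boolean simplified(long superstep, long firstCheckpoint,
                                     long checkpointFrequency) {
        // The >= check is kept: (1 - 3) % 2 == 0 in Java, yet superstep 1 is
        // before the first checkpoint and must not count.
        return superstep >= firstCheckpoint
            && ((superstep - firstCheckpoint) % checkpointFrequency) == 0;
    }
}
```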





[jira] [Commented] (GIRAPH-85) Simplify return expression in RPCCommunications::getRPCProxy

2012-03-23 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237139#comment-13237139
 ] 

Jakob Homan commented on GIRAPH-85:
---

Let's go ahead and add the suppresswarnings.  Eli, can you update the patch and 
re-upload? Thanks.

 Simplify return expression in RPCCommunications::getRPCProxy
 

 Key: GIRAPH-85
 URL: https://issues.apache.org/jira/browse/GIRAPH-85
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Jakob Homan
  Labels: newbie
 Fix For: 0.2.0

 Attachments: GIRAPH-85.patch, GIRAPH-85.patch


 Twice in RPCCommunications::getRPCProxy a local variable, proxy, is created 
 and immediately returned.  We can simplify this to just return the value.
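
 A generic sketch of this kind of cleanup, using made-up names rather than the 
 real getRPCProxy code.  It also shows one general Java detail that may be 
 relevant when an unchecked cast is involved (an observation about the 
 language, not something stated in this issue): @SuppressWarnings can sit on a 
 local variable declaration but not on a return statement, so removing the 
 local moves the annotation up to the method.

```java
// Hypothetical illustration; the real method and types in Giraph differ.
public class ReturnDirectlySketch {
    /** Before: a local is created only to be returned on the next line. */
    public static <T> T verbose(Object raw) {
        @SuppressWarnings("unchecked")
        T proxy = (T) raw;
        return proxy;
    }

    /** After: the value is returned directly; the annotation moves to the method. */
    @SuppressWarnings("unchecked")
    public static <T> T direct(Object raw) {
        return (T) raw;
    }
}
```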





[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-03-25 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13237959#comment-13237959
 ] 

Jakob Homan commented on GIRAPH-153:


bq. I'm concerned with how fat the jar becomes once the HBase core files are 
coalesced into the Giraph jar. 
This is a great effort, but will have to be done in some other way than just 
including a direct dependency on hbase into Giraph.  Lots of sites already have 
a different HBase installed and this will just cause headaches for them.  
Alternatively, for those sites that don't use HBase (and may not want it on 
their clusters) shipping these jars as part of Giraph isn't a viable option.  
Basically, making Giraph depend on HBase is a non-starter.

Can maven modules help us out here? Can we have a separate artifact, 
giraph-hbase-formats.jar or something, that we publish and that those who want 
this functionality can pull in?  That jar can depend on both hbase and giraph 
with no extra requirement on either of those projects.

 HBase/Accumulo Input and Output formats
 ---

 Key: GIRAPH-153
 URL: https://issues.apache.org/jira/browse/GIRAPH-153
 Project: Giraph
  Issue Type: New Feature
  Components: bsp
Affects Versions: 0.1.0
 Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
Reporter: Brian Femiano

 Four abstract classes that wrap their respective delegate input/output 
 formats for
 easy hooks into vertex input format subclasses. I've included some sample 
 programs that show two very simple graph
 algorithms. I have a graph generator that builds out a very simple directed 
 structure, starting with a few 'root' nodes.
 Root nodes are defined as nodes which are not listed as a child anywhere in 
 the graph. 
 Algorithm 1) AccumuloRootMarker.java  -- Accumulo as read/write source. 
 Every vertex starts thinking it's a root. At superstep 0, send a message down 
 to each
 child as a non-root notification. After superstep 1, only root nodes will 
 have never been messaged. 
 Algorithm 2) TableRootMarker -- HBase as read/write source. Expands on A1 by 
 bundling the notification logic followed by root node propagation. Once we've 
 marked the appropriate nodes as roots, tell every child which roots it can be 
 traced back to via one or more spanning trees. This will take N + 2 
 supersteps where N is the maximum number of hops from any root to any leaf, 
 plus 2 supersteps for the initial root flagging. 
 I've included all relevant code plus DistributedCacheHelper.java for 
 recursive cache file and archive searches. It is more hadoop centric than 
 giraph, but these jobs use it so I figured why not commit here. 
 These have been tested through local JobRunner, pseudo-distributed on the 
 aforementioned hardware, and full distributed on EC2. More details in the 
 comments.
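
 The two algorithms described above can be approximated outside Giraph with a 
 plain in-memory simulation.  This is a sketch under stated assumptions: it 
 does not use the Giraph, HBase or Accumulo APIs, the class and method names 
 are invented, and the graph is just a map from each vertex to its children.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical single-process simulation of the root-marking idea.
public class RootMarkerSketch {
    /** Algorithm 1: every vertex messages its children; vertices never messaged are roots. */
    public static Set<String> findRoots(Map<String, List<String>> children) {
        Set<String> messaged = new HashSet<>();
        for (List<String> kids : children.values()) {
            messaged.addAll(kids);
        }
        Set<String> roots = new TreeSet<>(children.keySet());
        roots.removeAll(messaged);
        return roots;
    }

    /** Algorithm 2: roots push their ids downward, one loop pass per superstep. */
    public static Map<String, Set<String>> propagateRoots(Map<String, List<String>> children) {
        Map<String, Set<String>> rootsOf = new HashMap<>();
        Map<String, Set<String>> inbox = new HashMap<>();
        for (String root : findRoots(children)) {
            inbox.put(root, new TreeSet<>(Collections.singleton(root)));
        }
        while (!inbox.isEmpty()) {
            Map<String, Set<String>> nextInbox = new HashMap<>();
            for (Map.Entry<String, Set<String>> delivery : inbox.entrySet()) {
                String vertex = delivery.getKey();
                Set<String> known = rootsOf.computeIfAbsent(vertex, k -> new TreeSet<>());
                Set<String> fresh = new TreeSet<>(delivery.getValue());
                fresh.removeAll(known);
                if (fresh.isEmpty()) {
                    continue; // nothing new to forward; this also stops cycles
                }
                known.addAll(fresh);
                for (String child : children.getOrDefault(vertex, Collections.<String>emptyList())) {
                    nextInbox.computeIfAbsent(child, k -> new TreeSet<>()).addAll(fresh);
                }
            }
            inbox = nextInbox;
        }
        return rootsOf;
    }
}
```

 In this sketch a root lists itself as its own root, and vertices that no root 
 can reach do not appear in the result at all.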





[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-04-02 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13244972#comment-13244972
 ] 

Jakob Homan commented on GIRAPH-153:


bq. I have a subproject 'giraph-formats-contrib'
This sounds like a good name as we can also stash the Hive work Avery has done 
there.

bq. Note this is not a maven submodule that builds as a dependency. It's 
entirely standalone. 
What are the advantages of this approach compared to a maven submodule (keeping 
in mind that I'm a Maven moron)? 

 HBase/Accumulo Input and Output formats
 ---

 Key: GIRAPH-153
 URL: https://issues.apache.org/jira/browse/GIRAPH-153
 Project: Giraph
  Issue Type: New Feature
  Components: bsp
Affects Versions: 0.1.0
 Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
Reporter: Brian Femiano

 Four abstract classes that wrap their respective delegate input/output 
 formats for
 easy hooks into vertex input format subclasses. I've included some sample 
 programs that show two very simple graph
 algorithms. I have a graph generator that builds out a very simple directed 
 structure, starting with a few 'root' nodes.
 Root nodes are defined as nodes which are not listed as a child anywhere in 
 the graph. 
 Algorithm 1) AccumuloRootMarker.java  -- Accumulo as read/write source. 
 Every vertex starts thinking it's a root. At superstep 0, send a message down 
 to each
 child as a non-root notification. After superstep 1, only root nodes will 
 have never been messaged. 
 Algorithm 2) TableRootMarker -- HBase as read/write source. Expands on A1 by 
 bundling the notification logic followed by root node propagation. Once we've 
 marked the appropriate nodes as roots, tell every child which roots it can be 
 traced back to via one or more spanning trees. This will take N + 2 
 supersteps where N is the maximum number of hops from any root to any leaf, 
 plus 2 supersteps for the initial root flagging. 
 I've included all relevant code plus DistributedCacheHelper.java for 
 recursive cache file and archive searches. It is more hadoop centric than 
 giraph, but these jobs use it so I figured why not commit here. 
 These have been tested through local JobRunner, pseudo-distributed on the 
 aforementioned hardware, and full distributed on EC2. More details in the 
 comments.





[jira] [Commented] (GIRAPH-168) Simplify munge directive usage with new munge flag HADOOP_SECURE (rather than HADOOP_FACEBOOK) and remove usage of HADOOP

2012-04-04 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13246478#comment-13246478
 ] 

Jakob Homan commented on GIRAPH-168:


My understanding was that the RPC changes FB had made were backports of changes 
that are in later versions, so I'm not sure if OldRPC is the correct 
description.  Also, within the Hadoop world there's not really talk of old 
versus new RPC (except for the PB-based stuff, which will make this really 
confusing...).  Hadoop security is API-incompatible with Hadoop non-security 
(due to changes in UGI) and FB's distro is insecure and API incompatible due to 
new APIs backported from more modern versions.

 Simplify munge directive usage with new munge flag HADOOP_SECURE (rather than 
 HADOOP_FACEBOOK) and remove usage of HADOOP
 -

 Key: GIRAPH-168
 URL: https://issues.apache.org/jira/browse/GIRAPH-168
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Eugene Koontz
Assignee: Eugene Koontz
 Attachments: GIRAPH-168.patch, GIRAPH-168.patch, GIRAPH-168.patch


 This JIRA relates to the mail thread here: 
 http://mail-archives.apache.org/mod_mbox/incubator-giraph-dev/201203.mbox/browser
 Currently we check for the munge flags HADOOP, HADOOP_FACEBOOK and 
 HADOOP_NON_SECURE when using munge in a few places. Hopefully we can 
 eliminate usage of munge in the future, but until then, we can mitigate the 
 complexity by consolidating the number of flags checked. This JIRA renames 
 HADOOP_FACEBOOK to HADOOP_SECURE, and removes usages of HADOOP, to handle the 
 same conditional compilation requirements. It also makes it easier to add 
 more maven profiles so that we can easily increase our hadoop version 
 coverage.
 This patch modifies the existing hadoop_facebook profile to use the new 
 HADOOP_SECURE munge flag, rather than HADOOP_FACEBOOK.
 It also adds a new hadoop maven profile, hadoop_trunk, which also sets 
 HADOOP_SECURE. 
 Finally, it adds a default profile, hadoop_0.20.203. This is needed so that 
 we can specify its dependencies separately from hadoop_trunk, because the 
 hadoop dependencies have changed between trunk and 0.205.0 - the former 
 requires hadoop-common, hadoop-mapreduce-client-core, and 
 hadoop-mapreduce-client-common, whereas the latter requires hadoop-core. 
 With this patch, the following passes:
 {code}
 mvn clean verify && mvn -Phadoop_trunk clean verify && mvn -Phadoop_0.20.203 clean verify
 {code}
 Current problems: 
 * I left in place the usage of HADOOP_NON_SECURE, but note that the profile 
 that uses this is hadoop_non_secure, which fails to compile on trunk: 
 https://issues.apache.org/jira/browse/GIRAPH-167 .
 * I couldn't get -Phadoop_facebook to work; does this work outside of 
 Facebook?
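
 For readers unfamiliar with munge, the conditional compilation discussed 
 above is driven by specially formatted comments.  The sketch below uses the 
 if[...] / else[...] / end[...] comment form as best recalled from the munge 
 tool; treat the exact directive syntax as an assumption and check it against 
 the Giraph sources.  The class, method and return values are invented.

```java
// Hypothetical example of a munge-guarded method; not taken from Giraph.
public class MungeFlagSketch {
    /**
     * When the HADOOP_NON_SECURE flag is passed to munge, the first branch is
     * uncommented and the second is dropped. Compiled as-is (unmunged), the
     * first branch is just a comment and only the second one runs.
     */
    public static String rpcFlavor() {
        /*if[HADOOP_NON_SECURE]
        return "non-secure";
        else[HADOOP_NON_SECURE]*/
        return "secure";
        /*end[HADOOP_NON_SECURE]*/
    }
}
```

 Every extra flag multiplies the number of such guarded regions that have to 
 be kept consistent, which is the complexity this issue is trying to reduce.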





[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-04-04 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13246489#comment-13246489
 ] 

Jakob Homan commented on GIRAPH-153:


Sounds good to me as well.  I'm fine with devs having to build/test against 
this subproject/module; this ensures we don't get out of synch with our 
adapters.  My main goal is to make sure anyone wanting just Giraph doesn't need 
the hbase/accumulo stuff and it sounds like this does that.  Thanks for the 
hard work, Brian.

 HBase/Accumulo Input and Output formats
 ---

 Key: GIRAPH-153
 URL: https://issues.apache.org/jira/browse/GIRAPH-153
 Project: Giraph
  Issue Type: New Feature
  Components: bsp
Affects Versions: 0.1.0
 Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
Reporter: Brian Femiano

 Four abstract classes that wrap their respective delegate input/output 
 formats for
 easy hooks into vertex input format subclasses. I've included some sample 
 programs that show two very simple graph
 algorithms. I have a graph generator that builds out a very simple directed 
 structure, starting with a few 'root' nodes.
 Root nodes are defined as nodes which are not listed as a child anywhere in 
 the graph. 
 Algorithm 1) AccumuloRootMarker.java  -- Accumulo as read/write source. 
 Every vertex starts thinking it's a root. At superstep 0, send a message down 
 to each
 child as a non-root notification. After superstep 1, only root nodes will 
 have never been messaged. 
 Algorithm 2) TableRootMarker -- HBase as read/write source. Expands on A1 by 
 bundling the notification logic followed by root node propagation. Once we've 
 marked the appropriate nodes as roots, tell every child which roots it can be 
 traced back to via one or more spanning trees. This will take N + 2 
 supersteps where N is the maximum number of hops from any root to any leaf, 
 plus 2 supersteps for the initial root flagging. 
 I've included all relevant code plus DistributedCacheHelper.java for 
 recursive cache file and archive searches. It is more hadoop centric than 
 giraph, but these jobs use it so I figured why not commit here. 
 These have been tested through the local JobRunner, pseudo-distributed on the 
 aforementioned hardware, and fully distributed on EC2. More details in the 
 comments.
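
 The root-marking idea in Algorithm 1 can be sketched without Giraph or 
 Accumulo as a plain Java simulation. The class name RootMarkerSketch, the 
 adjacency-map input, and the method name findRoots are assumptions made for 
 illustration; this is not the AccumuloRootMarker code from the patch.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-alone simulation; not the AccumuloRootMarker from the patch.
final class RootMarkerSketch {
    // children maps each vertex id to the ids of its children.
    // Assumption: every vertex appears as a key, even if it has no children.
    static Set<String> findRoots(Map<String, List<String>> children) {
        // Superstep 0: every vertex assumes it is a root and messages its children.
        Set<String> messaged = new HashSet<>();
        for (List<String> kids : children.values()) {
            messaged.addAll(kids);
        }
        // Superstep 1: any vertex that received a message clears its root flag,
        // so only vertices that were never messaged remain.
        Set<String> roots = new TreeSet<>(children.keySet());
        roots.removeAll(messaged);
        return roots;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = Map.of(
            "a", List.of("c"),
            "b", List.of("c", "d"),
            "c", List.of("d"),
            "d", List.of());
        // Prints the vertices that are never listed as a child.
        System.out.println(findRoots(graph));
    }
}
```

 In a real job the two loops would be the two supersteps of a vertex compute 
 method; here they are collapsed into set operations to keep the sketch short.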





[jira] [Commented] (GIRAPH-168) Simplify munge directive usage with new munge flag HADOOP_SECURE (rather than HADOOP_FACEBOOK) and remove usage of HADOOP

2012-04-04 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13246496#comment-13246496
 ] 

Jakob Homan commented on GIRAPH-168:


bq. except for the PB-based stuff
Where PB = ProtocolBuffers and != FB because this isn't quite confusing enough.

 Simplify munge directive usage with new munge flag HADOOP_SECURE (rather than 
 HADOOP_FACEBOOK) and remove usage of HADOOP
 -

 Key: GIRAPH-168
 URL: https://issues.apache.org/jira/browse/GIRAPH-168
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Eugene Koontz
Assignee: Eugene Koontz
 Attachments: GIRAPH-168.patch, GIRAPH-168.patch, GIRAPH-168.patch


 This JIRA relates to the mail thread here: 
 http://mail-archives.apache.org/mod_mbox/incubator-giraph-dev/201203.mbox/browser
 Currently we check for the munge flags HADOOP, HADOOP_FACEBOOK and 
 HADOOP_NON_SECURE when using munge in a few places. Hopefully we can 
 eliminate usage of munge in the future, but until then, we can mitigate the 
 complexity by consolidating the number of flags checked. This JIRA renames 
 HADOOP_FACEBOOK to HADOOP_SECURE, and removes usages of HADOOP, to handle the 
 same conditional compilation requirements. It also makes it easier to add 
 more maven profiles so that we can easily increase our hadoop version 
 coverage.
 This patch modifies the existing hadoop_facebook profile to use the new 
 HADOOP_SECURE munge flag, rather than HADOOP_FACEBOOK.
 It also adds a new hadoop maven profile, hadoop_trunk, which also sets 
 HADOOP_SECURE. 
 Finally, it adds a default profile, hadoop_0.20.203. This is needed so that 
 we can specify its dependencies separately from hadoop_trunk, because the 
 hadoop dependencies have changed between trunk and 0.205.0 - the former 
 requires hadoop-common, hadoop-mapreduce-client-core, and 
 hadoop-mapreduce-client-common, whereas the latter requires hadoop-core. 
 With this patch, the following passes:
 {code}
 mvn clean verify
 mvn -Phadoop_trunk clean verify
 mvn -Phadoop_0.20.203 clean verify
 {code}
 Current problems: 
 * I left in place the usage of HADOOP_NON_SECURE, but note that the profile 
 that uses this is hadoop_non_secure, which fails to compile on trunk: 
 https://issues.apache.org/jira/browse/GIRAPH-167 .
 * I couldn't get -Phadoop_facebook to work; does this work outside of 
 Facebook?





[jira] [Commented] (GIRAPH-77) Coordinator should expose a web interface with progress, vertex region assignments, etc.

2012-04-04 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13246594#comment-13246594
 ] 

Jakob Homan commented on GIRAPH-77:
---

bq. Do you or Jakob have a favorite stack to do that?
Nope.  My code was using Scalatra as a learning exercise (and a trojan horse to 
get Scala into the project) and I was liking it a lot.  That may be worth 
taking a look at.

 Coordinator should expose a web interface with progress, vertex region 
 assignments, etc.
 

 Key: GIRAPH-77
 URL: https://issues.apache.org/jira/browse/GIRAPH-77
 Project: Giraph
  Issue Type: New Feature
Reporter: Jakob Homan

 It would be nice if the coordinator worker had a web interface that showed 
 progress, splits, etc. during job execution. Right now it would duplicate 
 information currently being exposed through task status, but with the move to 
 YARN, it will be a necessity.  It would be great if we could do this in a 
 modern way to avoid the screen-scraping, etc. currently used to get 
 information from most other Hadoop projects' web interfaces.  The coordinator 
 could announce its address at the beginning or via status updates.
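
 As a rough illustration of the kind of endpoint described above, here is a 
 minimal sketch using the HTTP server bundled with the JDK 
 (com.sun.net.httpserver). The class name CoordinatorStatusSketch, the /status 
 path, and the JSON shape are all assumptions; nothing here is Giraph code, and 
 a real implementation might well use a different stack.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch, not Giraph code: serves the current superstep as JSON
// so that clients do not have to screen-scrape an HTML page.
final class CoordinatorStatusSketch {
    final AtomicLong superstep = new AtomicLong();
    private final HttpServer server;

    // Machine-readable status body; the field name is an assumption.
    static String statusJson(long superstep) {
        return "{\"superstep\":" + superstep + "}";
    }

    // Passing port 0 lets the operating system pick a free port.
    CoordinatorStatusSketch(int port) throws IOException {
        server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/status", exchange -> {
            byte[] body = statusJson(superstep.get()).getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
    }

    // The port the coordinator could announce at startup or in a status update.
    int port() {
        return server.getAddress().getPort();
    }

    void stop() {
        server.stop(0);
    }
}
```

 Binding to port 0 and reporting the chosen port afterwards matches the idea 
 of the coordinator announcing its own address rather than relying on a fixed, 
 preconfigured one.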





[jira] [Commented] (GIRAPH-77) Coordinator should expose a web interface with progress, vertex region assignments, etc.

2012-04-04 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247019#comment-13247019
 ] 

Jakob Homan commented on GIRAPH-77:
---

The code I've got is a bunch of messing with Scalatra and a few lines to bring 
in a new server per worker, but it's probably gone out of date.  It's not worth 
your time really.  I've got experience with integrating Scala into Java 
projects via Maven.  Let me spin up a quick patch to demonstrate that, probably 
in the next day or so.

 Coordinator should expose a web interface with progress, vertex region 
 assignments, etc.
 

 Key: GIRAPH-77
 URL: https://issues.apache.org/jira/browse/GIRAPH-77
 Project: Giraph
  Issue Type: New Feature
Reporter: Jakob Homan

 It would be nice if the coordinator worker had a web interface that showed 
 progress, splits, etc. during job execution. Right now it would duplicate 
 information currently being exposed through task status, but with the move to 
 YARN, it will be a necessity.  It would be great if we could do this in a 
 modern way to avoid the screen-scraping, etc. currently used to get 
 information from most other Hadoop projects' web interfaces.  The coordinator 
 could announce its address at the beginning or via status updates.





[jira] [Commented] (GIRAPH-85) Simplify return expression in RPCCommunications::getRPCProxy

2012-04-07 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249339#comment-13249339
 ] 

Jakob Homan commented on GIRAPH-85:
---

bq. I would like to throw the idea out there that assigning to the proxy and 
other variables for a moment DOES have a clarity benefit
Generally, I agree with you in all cases except for
{noformat}X x = z.getX();
return x;{noformat}
which is what we've got here.  Anything more complicated like
{noformat}X x = z.getFoo();{noformat}
or
{noformat}X x = z.getX()/2;{noformat}
is probably worth keeping by itself.  The patch looks good, but we need to have 
you bless its inclusion into Apache. Can you re-upload #3, with the Apache 
button checked?  Thanks.  I'll commit it thereafter.
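
For anyone following along, the distinction above can be shown as compilable 
Java. The class, interface, and method names below are hypothetical stand-ins 
and not the actual RPCCommunications code; the sketch only contrasts the 
assign-then-return pattern with a direct return.

```java
// Hypothetical stand-ins; the real method is RPCCommunications::getRPCProxy.
final class ReturnSimplificationSketch {
    interface Source {
        String getProxy();
    }

    // Before: a local that is assigned and then immediately returned.
    static String verbose(Source z) {
        String proxy = z.getProxy();
        return proxy;
    }

    // After: return the expression directly; behavior is unchanged.
    static String direct(Source z) {
        return z.getProxy();
    }

    // A local can still earn its keep when it names a derived value.
    static int halfLength(Source z) {
        int half = z.getProxy().length() / 2;
        return half;
    }
}
```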

 Simplify return expression in RPCCommunications::getRPCProxy
 

 Key: GIRAPH-85
 URL: https://issues.apache.org/jira/browse/GIRAPH-85
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Eli Reisman
  Labels: newbie
 Fix For: 0.2.0

 Attachments: GIRAPH-85-3.patch, GIRAPH-85.patch, GIRAPH-85.patch


 Twice in RPCCommunications::getRPCProxy a local variable, proxy, is created 
 and immediately returned.  We can simplify this to just return the value.





[jira] [Commented] (GIRAPH-85) Simplify return expression in RPCCommunications::getRPCProxy

2012-04-09 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1324#comment-1324
 ] 

Jakob Homan commented on GIRAPH-85:
---

+1. I've committed this.  Thanks, Eli!


 Simplify return expression in RPCCommunications::getRPCProxy
 

 Key: GIRAPH-85
 URL: https://issues.apache.org/jira/browse/GIRAPH-85
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Eli Reisman
  Labels: newbie
 Fix For: 0.2.0

 Attachments: GIRAPH-85-3.patch, GIRAPH-85-3.patch, GIRAPH-85.patch, 
 GIRAPH-85.patch


 Twice in RPCCommunications::getRPCProxy a local variable, proxy, is created 
 and immediately returned.  We can simplify this to just return the value.





[jira] [Commented] (GIRAPH-182) Provide SequenceFileVertexOutputFormat as an available OutputFormat

2012-04-11 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251884#comment-13251884
 ] 

Jakob Homan commented on GIRAPH-182:


Hey Pradeep. Thanks for the contribution.
Review:
* Apache prohibits author tags to ensure that all the code is viewed as the 
whole community's responsibility.
* SimpleSequenceFileVertexOutputFormat: We've thus far had the convention of 
using the type names in the in/outputformats. This is a bit verbose and may not 
be the right approach, but it's probably best to keep it in this patch.  Also 
can you provide javadoc for it?
* SequenceFileVertexOutputFormat: Any reason not to use the more standard M 
type variable? Some Javadoc for the class would be nice here too.
* Is it possible to add a unit test just to verify we get out from the file 
what we put in?




 Provide SequenceFileVertexOutputFormat as an available OutputFormat
 ---

 Key: GIRAPH-182
 URL: https://issues.apache.org/jira/browse/GIRAPH-182
 Project: Giraph
  Issue Type: New Feature
  Components: lib
Reporter: Pradeep Gollakota
Assignee: Pradeep Gollakota
Priority: Minor
 Attachments: GIRAPH-182-1.patch


 SequenceFiles are heavily used in Hadoop. We should provide 
 SequenceFileVertexOutputFormat. Since SequenceFileVertexInputFormat is 
 already provided, it makes sense to also provide a mirroring OutputFormat.





[jira] [Commented] (GIRAPH-184) Upgrade to junit4

2012-04-17 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256115#comment-13256115
 ] 

Jakob Homan commented on GIRAPH-184:


We can dramatically shrink this patch with static imports to make this type of 
change unnecessary:
{code}-assertFalse(ComparisonUtils.equal(one, two));
-assertFalse(ComparisonUtils.equal(two, one));
+Assert.assertFalse(ComparisonUtils.equal(one, two));
+Assert.assertFalse(ComparisonUtils.equal(two, one));{code}
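
To spell out the mechanism: with JUnit 4, a line such as 
"import static org.junit.Assert.assertFalse;" lets the existing unqualified 
assertFalse(...) calls compile without being rewritten. The sketch below shows 
the same language feature using only the JDK, on the assumption that JUnit is 
not on the classpath; java.util.Objects.requireNonNull stands in for the 
assertion method, and the class name is hypothetical.

```java
import static java.util.Objects.requireNonNull;

// Hypothetical demo class; Objects.requireNonNull stands in for Assert.assertFalse.
final class StaticImportSketch {
    // With the static import, the call site needs no class prefix.
    static String unqualified(String s) {
        return requireNonNull(s);
    }

    // Without it, every call site must name the class explicitly.
    static String qualified(String s) {
        return java.util.Objects.requireNonNull(s);
    }
}
```

Both methods call the same function, so the only difference is how much of each 
call site a patch has to touch.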

 Upgrade to junit4
 -

 Key: GIRAPH-184
 URL: https://issues.apache.org/jira/browse/GIRAPH-184
 Project: Giraph
  Issue Type: Bug
Reporter: Devaraj K
 Attachments: GIRAPH-184-1.patch, GIRAPH-184.patch


 Presently Giraph uses JUnit 3.8.1. We can upgrade to JUnit 4





[jira] [Commented] (GIRAPH-180) Publish SNAPSHOTs and released artifacts in the Maven repository

2012-04-18 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256660#comment-13256660
 ] 

Jakob Homan commented on GIRAPH-180:


bq. The only question I would have though is would we publish different jars 
for every version of hadoop?
Yep. If this can be automated, it may be a reasonable thing to do. If not, 
we're probably better off spending the effort kicking our munging habit.

 Publish SNAPSHOTs and released artifacts in the Maven repository
 

 Key: GIRAPH-180
 URL: https://issues.apache.org/jira/browse/GIRAPH-180
 Project: Giraph
  Issue Type: Improvement
  Components: build
Affects Versions: 0.1.0
Reporter: Paolo Castagna
Priority: Minor
   Original Estimate: 4h
  Remaining Estimate: 4h

 Currently Giraph uses Maven to drive its build.
 However, neither Maven artifacts nor SNAPSHOTs are published in the Apache 
 Maven repository or Maven Central.
 It would be useful to have Apache Giraph artifacts and SNAPSHOTs published 
 so that people can use Giraph without compiling it themselves.
 Right now users can check out Giraph, mvn install it and use this for their 
 dependency:
 <dependency>
   <groupId>org.apache.giraph</groupId>
   <artifactId>giraph</artifactId>
   <version>0.2-SNAPSHOT</version>
 </dependency>
 So, it's not that bad, but it can be better. :-)





[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-04-18 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256836#comment-13256836
 ] 

Jakob Homan commented on GIRAPH-153:


bq. Per Keith Turner's comments in HAMA-153 would it make more sense to host 
this submodule on github?
I've spent lots of time doing this with the Avro connector for Hive and wish I 
hadn't.  It's quite easy for the connector code to drift from the main code and 
have users bear the brunt of the impact.

bq. I prefer to have it with Giraph directly. Anyone else?
+1. If these connectors should exist (and I think they should), they should 
work all the time and be maintained.  The best way to ensure this is to host 
them inside one or the other project and since Giraph would sit above HBase (or 
MR), we should host them.  This way the connectors get tested all the time with 
the rest of the code. If there comes a time when we don't have the ability or 
support to keep them maintained, then I'd recommend just deleting them entirely 
from the tree, on the assumption that releasing poorly maintained, 
non-compatible or buggy code is worse than no code at all.  Of course, I doubt 
this will happen and instead expect we'll always have a volunteer with 
hbase/accumulo knowledge to keep the code up to date.

 HBase/Accumulo Input and Output formats
 ---

 Key: GIRAPH-153
 URL: https://issues.apache.org/jira/browse/GIRAPH-153
 Project: Giraph
  Issue Type: New Feature
  Components: bsp
Affects Versions: 0.1.0
 Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
Reporter: Brian Femiano
 Attachments: GIRAPH-153.patch


 Four abstract classes that wrap their respective delegate input/output 
 formats for
 easy hooks into vertex input format subclasses. I've included some sample 
 programs that show two very simple graph
 algorithms. I have a graph generator that builds out a very simple directed 
 structure, starting with a few 'root' nodes.
 Root nodes are defined as nodes which are not listed as a child anywhere in 
 the graph. 
 Algorithm 1) AccumuloRootMarker.java  -- Accumulo as read/write source. 
 Every vertex starts thinking it's a root. At superstep 0, send a message down 
 to each
 child as a non-root notification. After superstep 1, only root nodes will 
 have never been messaged. 
 Algorithm 2) TableRootMarker -- HBase as read/write source. Expands on A1 by 
 bundling the notification logic followed by root node propagation. Once we've 
 marked the appropriate nodes as roots, tell every child which roots it can be 
 traced back to via one or more spanning trees. This will take N + 2 
 supersteps where N is the maximum number of hops from any root to any leaf, 
 plus 2 supersteps for the initial root flagging. 
 I've included all relevant code plus DistributedCacheHelper.java for 
 recursive cache file and archive searches. It is more hadoop centric than 
 giraph, but these jobs use it so I figured why not commit here. 
 These have been tested through the local JobRunner, pseudo-distributed on the 
 aforementioned hardware, and fully distributed on EC2. More details in the 
 comments.
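
 The root propagation step of Algorithm 2 can be approximated in plain Java. 
 This sketch assumes the roots have already been flagged, and it replaces 
 Giraph's synchronized supersteps with a simple work queue, so it shows the 
 message flow rather than the superstep count. RootPropagationSketch and 
 reachableRoots are hypothetical names; this is not the TableRootMarker code 
 from the patch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-alone simulation; not the TableRootMarker from the patch.
final class RootPropagationSketch {
    // children: vertex id -> child ids. Assumption: every vertex appears as a key.
    // roots: the vertices already flagged as roots.
    // Returns, for each vertex, the set of roots it can be traced back to.
    static Map<String, Set<String>> reachableRoots(
            Map<String, List<String>> children, Set<String> roots) {
        Map<String, Set<String>> known = new HashMap<>();
        for (String v : children.keySet()) {
            known.put(v, new TreeSet<>());
        }
        // Each queue entry is a message {target vertex, root id} awaiting delivery.
        Deque<String[]> inbox = new ArrayDeque<>();
        for (String root : roots) {
            for (String child : children.get(root)) {
                inbox.add(new String[] {child, root});
            }
        }
        while (!inbox.isEmpty()) {
            String[] msg = inbox.remove();
            // Forward a root id only the first time a vertex learns of it,
            // which also guarantees termination if the graph has cycles.
            if (known.get(msg[0]).add(msg[1])) {
                for (String child : children.get(msg[0])) {
                    inbox.add(new String[] {child, msg[1]});
                }
            }
        }
        return known;
    }
}
```

 In the BSP version each pass over the inbox would correspond to one 
 superstep, which is where the N in the estimate above comes from.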





[jira] [Commented] (GIRAPH-184) Upgrade to junit4

2012-04-18 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256874#comment-13256874
 ] 

Jakob Homan commented on GIRAPH-184:


+1.  There are a couple other changes in terms of simplifying {{boolean == 
true}}, but that's fine.

 Upgrade to junit4
 -

 Key: GIRAPH-184
 URL: https://issues.apache.org/jira/browse/GIRAPH-184
 Project: Giraph
  Issue Type: Bug
Reporter: Devaraj K
 Attachments: GIRAPH-184-1.patch, GIRAPH-184-2.patch, GIRAPH-184.patch


 Presently Giraph uses JUnit 3.8.1. We can upgrade to JUnit 4
