Please welcome our newest committer and PMC member, Eugene!

2012-05-01 Thread Jakob Homan
I'm happy to announce that the Giraph PMC has voted Eugene Koontz in
as a committer and PMC member.  Eugene has been pitching in with great
patches that have been very useful, such as helping us sort out our
terrifying munging situation (GIRAPH-168).

Welcome aboard, Eugene!

-Jakob


[jira] [Resolved] (GIRAPH-23) Giraph causes capacity scheduler to report crazy statistics

2012-04-23 Thread Jakob Homan (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-23?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan resolved GIRAPH-23.
---

Resolution: Not A Problem

We tracked this down to the mapred.job.map.memory.mb and 
mapred.job.reduce.memory.mb settings, which have to be set for 1.0 but not for 
0.20.  Setting them causes the 0.20 JobTracker to go a bit crazy while the job 
is running, but this is a Hadoop problem, not a Giraph one.

 Giraph causes capacity scheduler to report crazy statistics
 ---

 Key: GIRAPH-23
 URL: https://issues.apache.org/jira/browse/GIRAPH-23
 Project: Giraph
  Issue Type: Bug
 Environment: Hadoop 20.2, non-secure with capacity scheduler
Reporter: Jakob Homan

 Not sure why, but all our Giraph jobs create crazy values for the scheduler 
 in terms of number of mappers:
 {noformat}51 running map tasks using -52224 map slots, 0 running reduce tasks 
 using 0 reduce slots. {noformat}
 and this trickles out to the whole cluster:
 {noformat}Used capacity: -58229 (-12468.7% of Capacity){noformat}
 These numbers don't appear to affect the job, and they correct themselves a 
 short time after the Giraph job finishes running.
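
A small arithmetic check on the figures quoted above, offered as an observation rather than a diagnosis: the negative slot count divides evenly by the number of running map tasks. The class and method names are made up for illustration, and the even split across tasks is an assumption.

```java
final class SlotArithmeticSketch {
    /** Slots charged per task, assuming the charge is spread evenly (an assumption). */
    static int slotsPerTask(int reportedSlots, int runningTasks) {
        if (reportedSlots % runningTasks != 0) {
            throw new IllegalArgumentException("slots do not divide evenly across tasks");
        }
        return reportedSlots / runningTasks;
    }

    public static void main(String[] args) {
        // Figures taken from the scheduler output in this report.
        System.out.println(slotsPerTask(-52224, 51));
    }
}
```

Each task being charged the same round figure would be consistent with a per-task memory setting feeding the slot accounting, though the report itself does not confirm the mechanism.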

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (GIRAPH-180) Publish SNAPSHOTs and released artifacts in the Maven repository

2012-04-18 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256660#comment-13256660
 ] 

Jakob Homan commented on GIRAPH-180:


bq. The only question I would have though is would we publish different jars 
for every version of hadoop?
Yep. If this can be automated, it may be a reasonable thing to do. If not, 
we're probably better off spending the effort kicking our munging habit.

 Publish SNAPSHOTs and released artifacts in the Maven repository
 

 Key: GIRAPH-180
 URL: https://issues.apache.org/jira/browse/GIRAPH-180
 Project: Giraph
  Issue Type: Improvement
  Components: build
Affects Versions: 0.1.0
Reporter: Paolo Castagna
Priority: Minor
   Original Estimate: 4h
  Remaining Estimate: 4h

 Currently Giraph uses Maven to drive its build.
 However, neither Maven artifacts nor SNAPSHOTs are published in the Apache 
 Maven repository or Maven Central.
 It would be useful to have Apache Giraph artifacts and SNAPSHOTs published 
 so that people can use Giraph without compiling it themselves.
 Right now users can check out Giraph, mvn install it and use this for their 
 dependency:
 <dependency>
   <groupId>org.apache.giraph</groupId>
   <artifactId>giraph</artifactId>
   <version>0.2-SNAPSHOT</version>
 </dependency>
 So, it's not that bad, but it can be better. :-)





[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-04-18 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256836#comment-13256836
 ] 

Jakob Homan commented on GIRAPH-153:


bq. Per Keith Turner's comments in HAMA-153 would it make more sense to host 
this submodule on github?
I've spent lots of time doing this with the Avro connector for Hive and wish I 
hadn't.  It's quite easy for the connector code to drift from the main code and 
have users bear the brunt of the impact.

bq. I prefer to have it with Giraph directly. Anyone else?
+1. If these connectors should exist (and I think they should), they should 
work all the time and be maintained.  The best way to ensure this is to host 
them inside one or the other project and since Giraph would sit above HBase (or 
MR), we should host them.  This way the connectors get tested all the time with 
the rest of the code. If there comes a time when we don't have the ability or 
support to keep them maintained, then I'd recommend just deleting them entirely 
from the tree, on the assumption that releasing poorly maintained, 
non-compatible or buggy code is worse than no code at all.  Of course, I doubt 
this will happen and instead expect we'll always have a volunteer with 
hbase/accumulo knowledge to keep the code up to date.

 HBase/Accumulo Input and Output formats
 ---

 Key: GIRAPH-153
 URL: https://issues.apache.org/jira/browse/GIRAPH-153
 Project: Giraph
  Issue Type: New Feature
  Components: bsp
Affects Versions: 0.1.0
 Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
Reporter: Brian Femiano
 Attachments: GIRAPH-153.patch


 Four abstract classes that wrap their respective delegate input/output 
 formats for
 easy hooks into vertex input format subclasses. I've included some sample 
 programs that show two very simple graph
 algorithms. I have a graph generator that builds out a very simple directed 
 structure, starting with a few 'root' nodes.
 Root nodes are defined as nodes which are not listed as a child anywhere in 
 the graph. 
 Algorithm 1) AccumuloRootMarker.java  -- Accumulo as read/write source. 
 Every vertex starts thinking it's a root. At superstep 0, send a message down 
 to each
 child as a non-root notification. After superstep 1, only root nodes will 
 have never been messaged. 
 Algorithm 2) TableRootMarker -- HBase as read/write source. Expands on A1 by 
 bundling the notification logic followed by root node propagation. Once we've 
 marked the appropriate nodes as roots, tell every child which roots it can be 
 traced back to via one or more spanning trees. This will take N + 2 
 supersteps where N is the maximum number of hops from any root to any leaf, 
 plus 2 supersteps for the initial root flagging. 
 I've included all relevant code plus DistributedCacheHelper.java for 
 recursive cache file and archive searches. It is more hadoop centric than 
 giraph, but these jobs use it so I figured why not commit here. 
 These have been tested through local JobRunner, pseudo-distributed on the 
 aforementioned hardware, and full distributed on EC2. More details in the 
 comments.
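
The root-flagging step of Algorithm 1 can be sketched in plain Java. This is a simulation of the two supersteps described above, not the Giraph API or the code in the attached patch; the class name, the method name and the adjacency-map representation are all assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class RootMarkerSketch {
    /**
     * children maps every vertex id to the ids of its children;
     * every vertex must appear as a key (leaves map to an empty list).
     */
    static Set<String> findRoots(Map<String, List<String>> children) {
        // Superstep 0: every vertex assumes it is a root and messages each child.
        Set<String> messaged = new HashSet<>();
        for (List<String> kids : children.values()) {
            messaged.addAll(kids);
        }
        // Superstep 1: any vertex that received a message drops its root flag.
        Set<String> roots = new TreeSet<>(children.keySet());
        roots.removeAll(messaged);
        return roots;
    }
}
```

Only vertices that never appear in anyone's child list survive, which matches the definition of a root given in the description.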





[jira] [Commented] (GIRAPH-184) Upgrade to junit4

2012-04-18 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256874#comment-13256874
 ] 

Jakob Homan commented on GIRAPH-184:


+1.  There are a couple other changes in terms of simplifying {{boolean == 
true}}, but that's fine.

 Upgrade to junit4
 -

 Key: GIRAPH-184
 URL: https://issues.apache.org/jira/browse/GIRAPH-184
 Project: Giraph
  Issue Type: Bug
Reporter: Devaraj K
 Attachments: GIRAPH-184-1.patch, GIRAPH-184-2.patch, GIRAPH-184.patch


 Presently Giraph uses JUnit 3.8.1. We can upgrade to JUnit 4





Re: [DISCUSS] Giraph Graduation (was Re: Giraph status (Was: [Incubator Wiki] Update of April2012 by OwenOmalley))

2012-04-18 Thread Jakob Homan
It's been a week and we definitely have consensus towards graduation.
Let's get started on the vote and other tasks.  I'd recommend
nominating Avery for the first chair; he's been the main driver and
tireless in handling user questions.  I'd also recommend rotating the
chair once a year to make sure there's a wide field of experience.  Do
we need the resolution before starting the vote?

On Sat, Apr 14, 2012 at 12:44 AM, Ashish <paliwalash...@gmail.com> wrote:
> On Fri, Apr 13, 2012 at 11:02 PM, Eugene Koontz <ekoo...@hiro-tan.org> wrote:
>> +1
>> Looking forward to seeing Giraph grow!
>
>
> +1


[jira] [Commented] (GIRAPH-184) Upgrade to junit4

2012-04-17 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256115#comment-13256115
 ] 

Jakob Homan commented on GIRAPH-184:


We can dramatically shrink this patch with static imports to make this type of 
change unnecessary:
{code}-assertFalse(ComparisonUtils.equal(one, two));
-assertFalse(ComparisonUtils.equal(two, one));
+Assert.assertFalse(ComparisonUtils.equal(one, two));
+Assert.assertFalse(ComparisonUtils.equal(two, one));{code}

 Upgrade to junit4
 -

 Key: GIRAPH-184
 URL: https://issues.apache.org/jira/browse/GIRAPH-184
 Project: Giraph
  Issue Type: Bug
Reporter: Devaraj K
 Attachments: GIRAPH-184-1.patch, GIRAPH-184.patch


 Presently Giraph uses JUnit 3.8.1. We can upgrade to JUnit 4





[jira] [Commented] (GIRAPH-182) Provide SequenceFileVertexOutputFormat as an available OutputFormat

2012-04-11 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251884#comment-13251884
 ] 

Jakob Homan commented on GIRAPH-182:


Hey Pradeep. Thanks for the contribution.
Review:
* Apache prohibits author tags to ensure that all the code is viewed as the 
whole community's responsibility.
* SimpleSequenceFileVertexOutputFormat: We've thus far had the convention of 
using the type names in the in/outputformats. This is a bit verbose and may not 
be the right approach, but it's probably best to keep it in this patch.  Also 
can you provide javadoc for it?
* SequenceFileVertexOutputFormat: Any reason not to use the more standard M 
type variable? Some Javadoc for the class would be nice here too.
* Is it possible to add a unit test just to verify we get out from the file 
what we put in?




 Provide SequenceFileVertexOutputFormat as an available OutputFormat
 ---

 Key: GIRAPH-182
 URL: https://issues.apache.org/jira/browse/GIRAPH-182
 Project: Giraph
  Issue Type: New Feature
  Components: lib
Reporter: Pradeep Gollakota
Assignee: Pradeep Gollakota
Priority: Minor
 Attachments: GIRAPH-182-1.patch


 SequenceFile's are heavily used in Hadoop. We should provide 
 SequenceFileVertexOutputFormat. Since SequenceFileVertexInputFormat is 
 already provided, it makes sense to also provide a mirroring OutputFormat





[jira] [Resolved] (GIRAPH-85) Simplify return expression in RPCCommunications::getRPCProxy

2012-04-09 Thread Jakob Homan (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-85?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan resolved GIRAPH-85.
---

Resolution: Fixed

 Simplify return expression in RPCCommunications::getRPCProxy
 

 Key: GIRAPH-85
 URL: https://issues.apache.org/jira/browse/GIRAPH-85
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Eli Reisman
  Labels: newbie
 Fix For: 0.2.0

 Attachments: GIRAPH-85-3.patch, GIRAPH-85-3.patch, GIRAPH-85.patch, 
 GIRAPH-85.patch


 Twice in RPCCommunications::getRPCProxy a local variable, proxy, is created 
 and immediately returned.  We can simplify this to just return the value.





[jira] [Commented] (GIRAPH-85) Simplify return expression in RPCCommunications::getRPCProxy

2012-04-09 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1324#comment-1324
 ] 

Jakob Homan commented on GIRAPH-85:
---

+1. I've committed this.  Thanks, Eli!


 Simplify return expression in RPCCommunications::getRPCProxy
 

 Key: GIRAPH-85
 URL: https://issues.apache.org/jira/browse/GIRAPH-85
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Eli Reisman
  Labels: newbie
 Fix For: 0.2.0

 Attachments: GIRAPH-85-3.patch, GIRAPH-85-3.patch, GIRAPH-85.patch, 
 GIRAPH-85.patch


 Twice in RPCCommunications::getRPCProxy a local variable, proxy, is created 
 and immediately returned.  We can simplify this to just return the value.





[jira] [Created] (GIRAPH-172) Javadoc for BasicVertex:compute link to compute is broken

2012-04-07 Thread Jakob Homan (Created) (JIRA)
Javadoc for BasicVertex:compute link to compute is broken
-

 Key: GIRAPH-172
 URL: https://issues.apache.org/jira/browse/GIRAPH-172
 Project: Giraph
  Issue Type: Bug
Reporter: Jakob Homan
Priority: Trivial


In BasicVertex the JavaDoc link to #compute can't be resolved:
{code} /**
   * Release unnecessary resources (will be called after vertex returns from
   * {@link #compute()})
   */
  abstract void releaseResources();{code}





[jira] [Created] (GIRAPH-173) BspCase:getNumWorkers javadoc refers to non-existent parameter

2012-04-07 Thread Jakob Homan (Created) (JIRA)
BspCase:getNumWorkers javadoc refers to non-existent parameter
--

 Key: GIRAPH-173
 URL: https://issues.apache.org/jira/browse/GIRAPH-173
 Project: Giraph
  Issue Type: Bug
Reporter: Jakob Homan
Priority: Trivial


{code}  /**
   * Get the number of workers used in the BSP application
   *
   * @param numProcs number of processes to use
   */
  public int getNumWorkers() {
    return numWorkers;
  }{code}
numProcs is a lie...
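
One possible correction, sketched under the assumption that the intended tag was `@return` rather than `@param`. The enclosing class and its constructor are hypothetical stand-ins so the snippet compiles on its own; only the getter and its Javadoc mirror the report.

```java
final class BspCaseSketch {
    private final int numWorkers;

    // Hypothetical constructor, present only to make the sketch self-contained.
    BspCaseSketch(int numWorkers) {
        this.numWorkers = numWorkers;
    }

    /**
     * Get the number of workers used in the BSP application
     *
     * @return number of workers
     */
    public int getNumWorkers() {
        return numWorkers;
    }
}
```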





[jira] [Created] (GIRAPH-176) BasicRPCCommunications has unnecessary cast of Vertex

2012-04-07 Thread Jakob Homan (Created) (JIRA)
BasicRPCCommunications has unnecessary cast of Vertex
-

 Key: GIRAPH-176
 URL: https://issues.apache.org/jira/browse/GIRAPH-176
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Priority: Minor


BasicRPCCommunications.java, 1224:
{code}  BasicVertex<I, V, E, M> vertex =
      vertexResolver.resolve(vertexIndex,
                             originalVertex,
                             vertexMutations,
                             messages);{code}
and then a few lines later at 1248:
{code}partition.putVertex((BasicVertex<I, V, E, M>) vertex);{code}
vertex gets cast to its own type. This cast can be removed.





[jira] [Created] (GIRAPH-177) SimplePageRankVertex has two redundant casts

2012-04-07 Thread Jakob Homan (Created) (JIRA)
SimplePageRankVertex has two redundant casts


 Key: GIRAPH-177
 URL: https://issues.apache.org/jira/browse/GIRAPH-177
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Priority: Trivial


{code}DoubleWritable maxPagerank =
    (DoubleWritable) maxAggreg.getAggregatedValue();
LOG.info("aggregatedMaxPageRank=" + maxPagerank.get());
DoubleWritable minPagerank =
    (DoubleWritable) minAggreg.getAggregatedValue();
LOG.info("aggregatedMinPageRank=" + minPagerank.get());{code}
Both MinAggregator and MaxAggregator are already parameterized on 
DoubleWritable, so it's not necessary to cast their functions' results.
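
A minimal sketch of why the casts are redundant. The `Aggregator` interface and `MaxAggregator` class below are simplified stand-ins written for this example, with `Double` in place of `DoubleWritable`; they are not the Giraph classes themselves.

```java
final class AggregatorCastSketch {
    /** Hypothetical stand-in for an aggregator parameterized on its value type. */
    interface Aggregator<A> {
        void aggregate(A value);
        A getAggregatedValue();
    }

    /** Hypothetical max aggregator fixed to Double. */
    static final class MaxAggregator implements Aggregator<Double> {
        private double max = Double.NEGATIVE_INFINITY;

        @Override
        public void aggregate(Double value) {
            max = Math.max(max, value);
        }

        @Override
        public Double getAggregatedValue() {
            return max;
        }
    }

    static Double maxOf(double... values) {
        MaxAggregator maxAggreg = new MaxAggregator();
        for (double v : values) {
            maxAggreg.aggregate(v);
        }
        // No cast needed: the declared return type is already Double.
        return maxAggreg.getAggregatedValue();
    }
}
```

Because the type parameter is bound in the class declaration, the compiler already knows the result type, and a cast to that same type does no work.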





[jira] [Created] (GIRAPH-179) BspServiceMaster's PathFilter can be simplified

2012-04-07 Thread Jakob Homan (Created) (JIRA)
BspServiceMaster's PathFilter can be simplified
---

 Key: GIRAPH-179
 URL: https://issues.apache.org/jira/browse/GIRAPH-179
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Priority: Trivial


{code}  /**
   * Only get the finalized checkpoint files
   */
  public static class FinalizedCheckpointPathFilter implements PathFilter {
    @Override
    public boolean accept(Path path) {
      if (path.getName().endsWith(
          BspService.CHECKPOINT_FINALIZED_POSTFIX)) {
        return true;
      }
      return false;
    }
  }{code}
we can simplify this, eliminating the if statement and just returning the 
result of {{endsWith()}}
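
The suggested simplification could look roughly like this. To keep the sketch free of Hadoop dependencies it works on a plain file name instead of a `Path`, and the suffix value is a placeholder: the real value of `BspService.CHECKPOINT_FINALIZED_POSTFIX` is not shown in this report.

```java
final class CheckpointFilterSketch {
    /** Placeholder value, assumed for illustration only. */
    static final String CHECKPOINT_FINALIZED_POSTFIX = ".finalized";

    /** Original shape: branch on the condition, then return a literal. */
    static boolean acceptVerbose(String name) {
        if (name.endsWith(CHECKPOINT_FINALIZED_POSTFIX)) {
            return true;
        }
        return false;
    }

    /** Simplified shape: return the condition itself. */
    static boolean accept(String name) {
        return name.endsWith(CHECKPOINT_FINALIZED_POSTFIX);
    }
}
```

Both methods return the same value for every input; the second simply drops the branch.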





Re: On helping new contributors pitch in quickly...

2012-04-07 Thread Jakob Homan
Sorry, took a couple days to get some time, but have now created 8 new
newbie JIRAs.  This should be enough for our new contributors to each
do a couple to get the hang of contributing to Giraph.  Thanks
Paolo for the reminder!
-Jakob


On Thu, Apr 5, 2012 at 11:43 AM, Dan Brickley <dan...@danbri.org> wrote:
> On 5 April 2012 17:05, Avery Ching <ach...@apache.org> wrote:
>> Dan, you're definitely right that this has been mentioned a few times.  The
>> multigraph issue is one part of it, but a helper VertexInputFormat (and
>> maybe VertexOutputFormat) would certainly still help as you mention.  Can
>> you please open a JIRA (and help if you have time)?
>
> Here you go: https://issues.apache.org/jira/browse/GIRAPH-170
>
> I've tried to summarise discussion from here and elsewhere.
>
> Dan


[jira] [Commented] (GIRAPH-85) Simplify return expression in RPCCommunications::getRPCProxy

2012-04-07 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13249339#comment-13249339
 ] 

Jakob Homan commented on GIRAPH-85:
---

bq. I would like to throw the idea out there that assigning to the proxy and 
other variables for a moment DOES have a clarity benefit
Generally, I agree with you in all cases except for
{noformat}X x = z.getX()
return x;{noformat}
which is what we've got here.  Anything more complicated like
{noformat}X x = z.getFoo(){noformat}
or
{noformat}X x = z.getX()/2{noformat}
is probably worth keeping by itself.  The patch looks good, but we need to have 
you bless its inclusion into Apache. Can you re-upload #3, with the Apache 
button checked?  Thanks.  I'll commit it thereafter.
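
The distinction drawn above can be sketched as follows. `Z` and the method names are hypothetical; this is not the `getRPCProxy` code from the patch.

```java
final class ReturnSketch {
    /** Hypothetical collaborator with a trivial getter. */
    static final class Z {
        int getX() {
            return 42;
        }
    }

    // Before: the local only repeats the getter's name, so it adds nothing.
    static int before(Z z) {
        int x = z.getX();
        return x;
    }

    // After: return the value directly.
    static int after(Z z) {
        return z.getX();
    }

    // A computed value is the kind of case where a named local can still help.
    static int halved(Z z) {
        int half = z.getX() / 2;
        return half;
    }
}
```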

 Simplify return expression in RPCCommunications::getRPCProxy
 

 Key: GIRAPH-85
 URL: https://issues.apache.org/jira/browse/GIRAPH-85
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Eli Reisman
  Labels: newbie
 Fix For: 0.2.0

 Attachments: GIRAPH-85-3.patch, GIRAPH-85.patch, GIRAPH-85.patch


 Twice in RPCCommunications::getRPCProxy a local variable, proxy, is created 
 and immediately returned.  We can simplify this to just return the value.





[jira] [Commented] (GIRAPH-168) Simplify munge directive usage with new munge flag HADOOP_SECURE (rather than HADOOP_FACEBOOK) and remove usage of HADOOP

2012-04-04 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246478#comment-13246478
 ] 

Jakob Homan commented on GIRAPH-168:


My understanding was that the RPC changes FB had made were backports of changes 
that are in later versions, so I'm not sure if OldRPC is the correct 
description.  Also, within the Hadoop world there's not really talk of old 
versus new RPC (except for the PB-based stuff, which will make this really 
confusing...).  Hadoop security is API-incompatible with Hadoop non-security 
(due to changes in UGI) and FB's distro is insecure and API incompatible due to 
new APIs backported from more modern versions.

 Simplify munge directive usage with new munge flag HADOOP_SECURE (rather than 
 HADOOP_FACEBOOK) and remove usage of HADOOP
 -

 Key: GIRAPH-168
 URL: https://issues.apache.org/jira/browse/GIRAPH-168
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Eugene Koontz
Assignee: Eugene Koontz
 Attachments: GIRAPH-168.patch, GIRAPH-168.patch, GIRAPH-168.patch


 This JIRA relates to the mail thread here: 
 http://mail-archives.apache.org/mod_mbox/incubator-giraph-dev/201203.mbox/browser
 Currently we check for the munge flags HADOOP, HADOOP_FACEBOOK and 
 HADOOP_NON_SECURE when using munge in a few places. Hopefully we can 
 eliminate usage of munge in the future, but until then, we can mitigate the 
 complexity by consolidating the number of flags checked. This JIRA renames 
 HADOOP_FACEBOOK to HADOOP_SECURE, and removes usages of HADOOP, to handle the 
 same conditional compilation requirements. It also makes it easier to add 
 more maven profiles so that we can easily increase our hadoop version 
 coverage.
 This patch modifies the existing hadoop_facebook profile to use the new 
 HADOOP_SECURE munge flag, rather than HADOOP_FACEBOOK.
 It also adds a new hadoop maven profile, hadoop_trunk, which also sets 
 HADOOP_SECURE. 
 Finally, it adds a default profile, hadoop_0.20.203. This is needed so that 
 we can specify its dependencies separately from hadoop_trunk, because the 
 hadoop dependencies have changed between trunk and 0.205.0 - the former 
 requires hadoop-common, hadoop-mapreduce-client-core, and 
 hadoop-mapreduce-client-common, whereas the latter requires hadoop-core. 
 With this patch, the following passes:
 {code}
 mvn clean verify && mvn -Phadoop_trunk clean verify && mvn -Phadoop_0.20.203 clean verify
 {code}
 Current problems: 
 * I left in place the usage of HADOOP_NON_SECURE, but note that the profile 
 that uses this is hadoop_non_secure, which fails to compile on trunk: 
 https://issues.apache.org/jira/browse/GIRAPH-167 .
 * I couldn't get -Phadoop_facebook to work; does this work outside of 
 Facebook?
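
For readers unfamiliar with munge, here is a rough sketch of a flag-guarded method. It assumes the comment-directive syntax of the munge preprocessor (`if[FLAG]`, `else[FLAG]` and `end[FLAG]` placed inside block comments); the class, the method and the returned strings are hypothetical and do not come from the patch.

```java
final class MungeSketch {
    static String hadoopFlavor() {
        /*if[HADOOP_NON_SECURE]
        return "non-secure";
        else[HADOOP_NON_SECURE]*/
        return "secure";
        /*end[HADOOP_NON_SECURE]*/
    }

    public static void main(String[] args) {
        // Without preprocessing, the directives are ordinary comments.
        System.out.println(hadoopFlavor());
    }
}
```

Compiled as-is, the directives are plain comments and the method takes the secure branch. A build profile that defines the flag would have the preprocessor swap in the other branch before compilation, which is why every extra flag multiplies the number of source variants to keep working.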





[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-04-04 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246489#comment-13246489
 ] 

Jakob Homan commented on GIRAPH-153:


Sounds good to me as well.  I'm fine with devs having to build/test against 
this subproject/module; this ensures we don't get out of sync with our 
adapters.  My main goal is to make sure anyone wanting just Giraph doesn't need 
the hbase/accumulo stuff and it sounds like this does that.  Thanks for the 
hard work, Brian.

 HBase/Accumulo Input and Output formats
 ---

 Key: GIRAPH-153
 URL: https://issues.apache.org/jira/browse/GIRAPH-153
 Project: Giraph
  Issue Type: New Feature
  Components: bsp
Affects Versions: 0.1.0
 Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
Reporter: Brian Femiano

 Four abstract classes that wrap their respective delegate input/output 
 formats for
 easy hooks into vertex input format subclasses. I've included some sample 
 programs that show two very simple graph
 algorithms. I have a graph generator that builds out a very simple directed 
 structure, starting with a few 'root' nodes.
 Root nodes are defined as nodes which are not listed as a child anywhere in 
 the graph. 
 Algorithm 1) AccumuloRootMarker.java  -- Accumulo as read/write source. 
 Every vertex starts thinking it's a root. At superstep 0, send a message down 
 to each
 child as a non-root notification. After superstep 1, only root nodes will 
 have never been messaged. 
 Algorithm 2) TableRootMarker -- HBase as read/write source. Expands on A1 by 
 bundling the notification logic followed by root node propagation. Once we've 
 marked the appropriate nodes as roots, tell every child which roots it can be 
 traced back to via one or more spanning trees. This will take N + 2 
 supersteps where N is the maximum number of hops from any root to any leaf, 
 plus 2 supersteps for the initial root flagging. 
 I've included all relevant code plus DistributedCacheHelper.java for 
 recursive cache file and archive searches. It is more hadoop centric than 
 giraph, but these jobs use it so I figured why not commit here. 
 These have been tested through local JobRunner, pseudo-distributed on the 
 aforementioned hardware, and full distributed on EC2. More details in the 
 comments.





[jira] [Commented] (GIRAPH-168) Simplify munge directive usage with new munge flag HADOOP_SECURE (rather than HADOOP_FACEBOOK) and remove usage of HADOOP

2012-04-04 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246496#comment-13246496
 ] 

Jakob Homan commented on GIRAPH-168:


bq. except for the PB-based stuff
Where PB = ProtocolBuffers and != FB because this isn't quite confusing enough.

 Simplify munge directive usage with new munge flag HADOOP_SECURE (rather than 
 HADOOP_FACEBOOK) and remove usage of HADOOP
 -

 Key: GIRAPH-168
 URL: https://issues.apache.org/jira/browse/GIRAPH-168
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Eugene Koontz
Assignee: Eugene Koontz
 Attachments: GIRAPH-168.patch, GIRAPH-168.patch, GIRAPH-168.patch


 This JIRA relates to the mail thread here: 
 http://mail-archives.apache.org/mod_mbox/incubator-giraph-dev/201203.mbox/browser
 Currently we check for the munge flags HADOOP, HADOOP_FACEBOOK and 
 HADOOP_NON_SECURE when using munge in a few places. Hopefully we can 
 eliminate usage of munge in the future, but until then, we can mitigate the 
 complexity by consolidating the number of flags checked. This JIRA renames 
 HADOOP_FACEBOOK to HADOOP_SECURE, and removes usages of HADOOP, to handle the 
 same conditional compilation requirements. It also makes it easier to add 
 more maven profiles so that we can easily increase our hadoop version 
 coverage.
 This patch modifies the existing hadoop_facebook profile to use the new 
 HADOOP_SECURE munge flag, rather than HADOOP_FACEBOOK.
 It also adds a new hadoop maven profile, hadoop_trunk, which also sets 
 HADOOP_SECURE. 
 Finally, it adds a default profile, hadoop_0.20.203. This is needed so that 
 we can specify its dependencies separately from hadoop_trunk, because the 
 hadoop dependencies have changed between trunk and 0.205.0 - the former 
 requires hadoop-common, hadoop-mapreduce-client-core, and 
 hadoop-mapreduce-client-common, whereas the latter requires hadoop-core. 
 With this patch, the following passes:
 {code}
 mvn clean verify && mvn -Phadoop_trunk clean verify && mvn -Phadoop_0.20.203 clean verify
 {code}
 Current problems: 
 * I left in place the usage of HADOOP_NON_SECURE, but note that the profile 
 that uses this is hadoop_non_secure, which fails to compile on trunk: 
 https://issues.apache.org/jira/browse/GIRAPH-167 .
 * I couldn't get -Phadoop_facebook to work; does this work outside of 
 Facebook?





[jira] [Commented] (GIRAPH-77) Coordinator should expose a web interface with progress, vertex region assignments, etc.

2012-04-04 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13246594#comment-13246594
 ] 

Jakob Homan commented on GIRAPH-77:
---

bq. Do you or Jakob have a favorite stack to do that?
Nope.  My code was using Scalatra as a learning exercise (and a trojan horse to 
get Scala into the project) and I was liking it a lot.  That may be worth 
taking a look at.

 Coordinator should expose a web interface with progress, vertex region 
 assignments, etc.
 

 Key: GIRAPH-77
 URL: https://issues.apache.org/jira/browse/GIRAPH-77
 Project: Giraph
  Issue Type: New Feature
Reporter: Jakob Homan

 It would be nice if the coordinator worker had a web interface that showed 
 progress, splits, etc. during job execution. Right now it would duplicate 
 information currently being exposed through task status, but with the move to 
 YARN, it will be a necessity.  It would be great if we could do this in a 
 modern way to avoid the screen-scraping, etc. currently used to get 
 information from most other Hadoop project's web interfaces.  The coordinator 
 could announce its address at the beginning or via status updates.





Re: On helping new contributors pitch in quickly...

2012-04-04 Thread Jakob Homan
Ack!, I suck.  Sorry.  I hadn't realized we'd gone through most of
them, which itself is a good thing.  I'll get some new ones added
first thing in the morning.  Sorry.
-Jakob


On Wed, Apr 4, 2012 at 9:45 PM, Paolo Castagna
castagna.li...@googlemail.com wrote:
 
 To help new contributors pitch in quickly, we maintain a set of JIRAs [1] that
 focus on getting new contributors started with the mechanics of generating a
 patch (downloading the source, changing a couple lines, creating a patch,
 verifying its correctness, uploading it to JIRA and working with the
 community) rather than deep technical issues within Giraph itself. These are
 good issues with which to join the community.
 

 This is nice, good idea indeed.

 Put more issues there (even if, at the moment, there does not seem to be much
 simple stuff around that will get people started). Things such as porting
 Giraph to YARN or a new RPC layer are a bit scary for those just starting
 (like me). :-)

 Perhaps another option is to increase the number of examples. You already have
 a few interesting ones; do you have one or two ideas for a couple of examples
 which could be added to Giraph?

 Paolo

  [1] http://bit.ly/newbie_apache_giraph_issues


Re: Giraph as Whirr service, see WHIRR-530

2012-04-04 Thread Jakob Homan
This is interesting.  Whirr can already spin up Hadoop MR clusters,
which can then run the Giraph jobs.  Once Giraph is bootstrapped onto
YARN, this will make more sense as a Whirr service.

On Wed, Apr 4, 2012 at 9:43 PM, Avery Ching ach...@apache.org wrote:
 I don't use Whirr...I haven't heard it mentioned on this forum yet.  Anyone?

 Avery


 On 4/4/12 9:30 PM, Paolo Castagna wrote:

 Hi,
 seen this?

   WHIRR-530 - Add Giraph as a service
   https://issues.apache.org/jira/browse/WHIRR-530

 This could be quite useful for users who want to give Giraph a spin on
 cloud
 infrastructure, just for testing or to run a few small experiments.
 My experience with Whirr on small 10-20 node clusters has been quite
 positive.
 Less so for larger clusters, but that is more a problem/limit with the cloud
 provider than with Whirr itself, I think.

 Whirr makes it extremely easy and pleasant to deploy stuff on demand.

 ... and Whirr already supports YARN:
 https://issues.apache.org/jira/browse/WHIRR-391

 Are any Giraph developers/users here also Whirr users?

 Paolo




[jira] [Commented] (GIRAPH-77) Coordinator should expose a web interface with progress, vertex region assignments, etc.

2012-04-04 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13247019#comment-13247019
 ] 

Jakob Homan commented on GIRAPH-77:
---

The code I've got is a bunch of messing with Scalatra and a few lines to bring 
in a new server per worker, but it's probably gone out of date.  It's not worth 
your time really.  I've got experience with integrating Scala into Java 
projects via Maven.  Let me spin up a quick patch to demonstrate that, probably 
in the next day or so.

 Coordinator should expose a web interface with progress, vertex region 
 assignments, etc.
 

 Key: GIRAPH-77
 URL: https://issues.apache.org/jira/browse/GIRAPH-77
 Project: Giraph
  Issue Type: New Feature
Reporter: Jakob Homan

 It would be nice if the coordinator worker had a web interface that showed 
 progress, splits, etc. during job execution. Right now it would duplicate 
 information currently being exposed through task status, but with the move to 
 YARN, it will be a necessity.  It would be great if we could do this in a 
 modern way to avoid the screen-scraping, etc. currently used to get 
 information from most other Hadoop projects' web interfaces.  The coordinator 
 could announce its address at the beginning or via status updates.





Re: Status report

2012-04-03 Thread Jakob Homan
 Is it worth mentioning the UC Irvine connection?
... ? Is that the low-budget sequel to the classic Gene Hackman film?

On Mon, Apr 2, 2012 at 10:20 PM, Avery Ching ach...@apache.org wrote:
 Looks good to me as well.

 Avery


 On 4/2/12 10:17 PM, Owen O'Malley wrote:

 That looks great, Jakob. I've put that into the wiki for now until we
 have further edits.

 -- Owen




Re: Status report

2012-04-02 Thread Jakob Homan
I'll do it tonight.

On Mon, Apr 2, 2012 at 4:14 PM, Owen O'Malley omal...@apache.org wrote:
 All,
  We need a status report for the last quarter by Wednesday. Anyone
 want to take the first shot at it?

 -- Owen


[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-04-02 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13244972#comment-13244972
 ] 

Jakob Homan commented on GIRAPH-153:


bq. I have a subproject 'giraph-formats-contrib'
This sounds like a good name as we can also stash the Hive work Avery has done 
there.

bq. Note this is not a maven submodule that builds as a dependency. It's 
entirely standalone. 
What are the advantages of this approach compared to a maven submodule (keeping 
in mind that I'm a Maven moron)? 

 HBase/Accumulo Input and Output formats
 ---

 Key: GIRAPH-153
 URL: https://issues.apache.org/jira/browse/GIRAPH-153
 Project: Giraph
  Issue Type: New Feature
  Components: bsp
Affects Versions: 0.1.0
 Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
Reporter: Brian Femiano

 Four abstract classes that wrap their respective delegate input/output 
 formats for
 easy hooks into vertex input format subclasses. I've included some sample 
 programs that show two very simple graph
 algorithms. I have a graph generator that builds out a very simple directed 
 structure, starting with a few 'root' nodes.
 Root nodes are defined as nodes which are not listed as a child anywhere in 
 the graph. 
 Algorithm 1) AccumuloRootMarker.java  -- Accumulo as read/write source. 
 Every vertex starts thinking it's a root. At superstep 0, send a message down 
 to each
 child as a non-root notification. After superstep 1, only root nodes will 
 have never been messaged. 
 Algorithm 2) TableRootMarker -- HBase as read/write source. Expands on A1 by 
 bundling the notification logic followed by root node propagation. Once we've 
 marked the appropriate nodes as roots, tell every child which roots it can be 
 traced back to via one or more spanning trees. This will take N + 2 
 supersteps where N is the maximum number of hops from any root to any leaf, 
 plus 2 supersteps for the initial root flagging. 
 I've included all relevant code plus DistributedCacheHelper.java for 
 recursive cache file and archive searches. It is more hadoop centric than 
 giraph, but these jobs use it so I figured why not commit here. 
 These have been tested through local JobRunner, pseudo-distributed on the 
 aforementioned hardware, and full distributed on EC2. More details in the 
 comments.
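
 To make Algorithm 1 concrete, here is a plain-Java simulation of the two
 supersteps on an in-memory graph. It is only a sketch of the idea described
 above, not the code in AccumuloRootMarker.java: the class name and the
 map-of-children representation are assumptions, and every vertex (including
 leaves) is assumed to appear as a key in the map.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for the root-marking idea; not Giraph code. */
final class RootMarkerSketch {
  /** children maps each vertex id to the ids of the vertices it points to. */
  static Set<Integer> findRoots(Map<Integer, List<Integer>> children) {
    // Superstep 0: every vertex assumes it is a root and sends a
    // "you are not a root" message down to each of its children.
    Set<Integer> messaged = new HashSet<>();
    for (List<Integer> targets : children.values()) {
      messaged.addAll(targets);
    }
    // Superstep 1: a vertex that received any message drops its root flag,
    // so only vertices that were never messaged remain roots.
    Set<Integer> roots = new TreeSet<>(children.keySet());
    roots.removeAll(messaged);
    return roots;
  }
}
```

 For a graph with edges 1->2, 1->3, 2->4 and 5->4, this returns the vertices 1
 and 5, the only ones never listed as a child.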





[jira] [Commented] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-03-25 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237959#comment-13237959
 ] 

Jakob Homan commented on GIRAPH-153:


bq. I'm concerned with how fat the jar becomes once the HBase core files are 
coalesced into the Giraph jar. 
This is a great effort, but will have to be done in some other way than just 
including a direct dependency on hbase into Giraph.  Lots of sites already have 
a different HBase installed and this will just cause headaches for them.  
Alternatively, for those sites that don't use HBase (and may not want it on 
their clusters), shipping these jars as part of Giraph isn't a viable option.  
Basically, making Giraph depend on HBase is a non-starter.

Can maven modules help us out here? Can we have a separate artifact, 
giraph-hbase-formats.jar or something, that we publish and that those who want 
this functionality can pull in?  That jar can depend on both hbase and giraph 
with no extra requirement on either of those projects.

 HBase/Accumulo Input and Output formats
 ---

 Key: GIRAPH-153
 URL: https://issues.apache.org/jira/browse/GIRAPH-153
 Project: Giraph
  Issue Type: New Feature
  Components: bsp
Affects Versions: 0.1.0
 Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
Reporter: Brian Femiano

 Four abstract classes that wrap their respective delegate input/output 
 formats for
 easy hooks into vertex input format subclasses. I've included some sample 
 programs that show two very simple graph
 algorithms. I have a graph generator that builds out a very simple directed 
 structure, starting with a few 'root' nodes.
 Root nodes are defined as nodes which are not listed as a child anywhere in 
 the graph. 
 Algorithm 1) AccumuloRootMarker.java  -- Accumulo as read/write source. 
 Every vertex starts thinking it's a root. At superstep 0, send a message down 
 to each
 child as a non-root notification. After superstep 1, only root nodes will 
 have never been messaged. 
 Algorithm 2) TableRootMarker -- HBase as read/write source. Expands on A1 by 
 bundling the notification logic followed by root node propagation. Once we've 
 marked the appropriate nodes as roots, tell every child which roots it can be 
 traced back to via one or more spanning trees. This will take N + 2 
 supersteps where N is the maximum number of hops from any root to any leaf, 
 plus 2 supersteps for the initial root flagging. 
 I've included all relevant code plus DistributedCacheHelper.java for 
 recursive cache file and archive searches. It is more hadoop centric than 
 giraph, but these jobs use it so I figured why not commit here. 
 These have been tested through local JobRunner, pseudo-distributed on the 
 aforementioned hardware, and full distributed on EC2. More details in the 
 comments.





[jira] [Commented] (GIRAPH-85) Simplify return expression in RPCCommunications::getRPCProxy

2012-03-23 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237139#comment-13237139
 ] 

Jakob Homan commented on GIRAPH-85:
---

Let's go ahead and add the SuppressWarnings.  Eli, can you update the patch and 
re-upload? Thanks.

 Simplify return expression in RPCCommunications::getRPCProxy
 

 Key: GIRAPH-85
 URL: https://issues.apache.org/jira/browse/GIRAPH-85
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Jakob Homan
  Labels: newbie
 Fix For: 0.2.0

 Attachments: GIRAPH-85.patch, GIRAPH-85.patch


 Twice in RPCCommunications::getRPCProxy a local variable, proxy, is created 
 and immediately returned.  We can simplify this to just return the value.
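
 The shape of the change, sketched on a made-up lookup method rather than the
 real RPCCommunications::getRPCProxy (whose body is not quoted in this
 issue). The registry map and method names are hypothetical, and the unchecked
 cast is only my guess at why a SuppressWarnings annotation came up in the
 comments.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration; not the Giraph method. */
final class ReturnDirectlySketch {
  private static final Map<String, Object> PROXIES = new HashMap<>();

  static void register(String name, Object proxy) {
    PROXIES.put(name, proxy);
  }

  // Before: a local variable is created only to be returned on the next line.
  @SuppressWarnings("unchecked")
  static <T> T lookupVerbose(String name) {
    T proxy = (T) PROXIES.get(name);
    return proxy;
  }

  // After: return the expression directly. The annotation on the method
  // still covers the unchecked cast.
  @SuppressWarnings("unchecked")
  static <T> T lookup(String name) {
    return (T) PROXIES.get(name);
  }
}
```

 Both methods behave identically; the second just drops the throwaway local.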





[jira] [Assigned] (GIRAPH-160) Vertex reader that reads adjacency lists with no vertex and edge values associated

2012-03-23 Thread Jakob Homan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan reassigned GIRAPH-160:
--

Assignee: Dionysios Logothetis

 Vertex reader that reads adjacency lists with no vertex and edge values 
 associated
 --

 Key: GIRAPH-160
 URL: https://issues.apache.org/jira/browse/GIRAPH-160
 Project: Giraph
  Issue Type: New Feature
  Components: lib
Affects Versions: 0.1.0
Reporter: Dionysios Logothetis
Assignee: Dionysios Logothetis
Priority: Minor
  Labels: features
 Fix For: 0.2.0

 Attachments: GIRAPH-160.patch


 A very common format of graphs is adjacency lists with no values associated 
 to edges or vertices. For instance a line in the input can be of the type:
 1 2 3
 which represents a vertex with id 1 that has edges to vertices 2 and 3 with 
 no values associated.
 I've created a vertex reader named AdjacencyListVertexReader which is 
 essentially a copy of the AdjacencyListVertexReader modified to handle this 
 format. It's an abstract class and subclasses can override the 
 defaultVertexValue() and defaultEdgeValue() methods to provide default values 
 for vertices and edges correspondingly (otherwise values are initialized to 
 null).
 I've also created an example subclass.
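
 For illustration, parsing one line of the value-free format could look like
 the sketch below. This is not the attached reader: the class name is
 hypothetical, ids are assumed to be whitespace-separated longs, and the
 default vertex and edge values from the description are left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for lines such as "1 2 3"; not the patch's reader. */
final class AdjacencyLineSketch {
  final long vertexId;
  final List<Long> targets;

  private AdjacencyLineSketch(long vertexId, List<Long> targets) {
    this.vertexId = vertexId;
    this.targets = targets;
  }

  /** The first token is the vertex id; every later token is an edge target. */
  static AdjacencyLineSketch parse(String line) {
    String[] tokens = line.trim().split("\\s+");
    // A blank or non-numeric line fails here with NumberFormatException.
    long id = Long.parseLong(tokens[0]);
    List<Long> targets = new ArrayList<>();
    for (int i = 1; i < tokens.length; i++) {
      targets.add(Long.parseLong(tokens[i]));
    }
    return new AdjacencyLineSketch(id, targets);
  }
}
```

 Parsing "1 2 3" yields vertex 1 with targets 2 and 3; a line holding a single
 id yields a vertex with no edges.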





Re: How to contribute page

2012-03-15 Thread Jakob Homan
That's fine.  Can we update the site to point to the wiki (and
harmonize the content), so we don't have duplicate, soon-to-diverge
information?  If so, I'll try to do this pretty soon.

On Wed, Mar 14, 2012 at 11:37 PM, Avery Ching ach...@apache.org wrote:
 Main differences are the 'mvn verify' and running single-node unit tests.
  It's easier for us to manage on confluence compared to maintaining the site
 =).

 Avery


 On 3/14/12 11:59 AM, Jakob Homan wrote:

 This page looks very similar in content to the Generating Patches and
 Getting Involved sections on the main site:
 https://incubator.apache.org/giraph/  Are there any significant
 differences?

 On Wed, Mar 14, 2012 at 10:25 AM, Sebastian Schelter s...@apache.org
  wrote:

 I added the 'Be involved' part from Mahout's [1] 'How to contribute'
 page. Maybe we could even copy a little more from there :)

 Best,
 Sebastian

 [1] https://cwiki.apache.org/MAHOUT/how-to-contribute.html

 On 14.03.2012 17:39, Avery Ching wrote:

 Yes, that is thanks to Sebastian.  We should probably make that another
 confluence page though based on his notes.  Anyone want to do it? =)

 Avery

 On 3/14/12 7:43 AM, Benjamin Heitmann wrote:

 On 14 Mar 2012, at 07:08, Avery Ching wrote:

 I just added a How to contribute page.

 https://cwiki.apache.org/confluence/display/GIRAPH/How+to+Contribute

 Thanks for setting up this page!

 Also, the link about running giraph's unit test in pseudo distributed
 mode [1] is very interesting.



 [1]
 http://ssc.io/running-giraphs-unit-tests-in-pseudo-distributed-mode/




[jira] [Assigned] (GIRAPH-87) Simplify boolean expression in BspService::checkpointFrequencyMet

2012-02-24 Thread Jakob Homan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-87?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan reassigned GIRAPH-87:
-

Assignee: Eli Reisman

 Simplify boolean expression in BspService::checkpointFrequencyMet
 -

 Key: GIRAPH-87
 URL: https://issues.apache.org/jira/browse/GIRAPH-87
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Eli Reisman
  Labels: newbie

 {noformat}if (superstep < firstCheckpoint) {
     return false;
 } else if (((superstep - firstCheckpoint) % checkpointFrequency) == 0) {
     return true;
 } else {
     return false;
 }{noformat}
 can be simplified to just return the result of the else if evaluation.
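
 One way the simplified method could read, as a standalone sketch. The
 surrounding class and the parameter list are assumptions, since the real
 method lives in BspService and reads its own fields. Note that the first
 guard has to survive in some form: for a superstep before firstCheckpoint the
 difference is negative, and in Java a negative multiple of the frequency
 still has remainder 0.

```java
/** Hypothetical standalone version; the real method is in BspService. */
final class CheckpointSketch {
  /** Assumes checkpointFrequency is non-zero. */
  static boolean checkpointFrequencyMet(
      long superstep, long firstCheckpoint, long checkpointFrequency) {
    // Return the boolean expression directly instead of branching
    // to "return true" and "return false".
    return superstep >= firstCheckpoint
        && ((superstep - firstCheckpoint) % checkpointFrequency) == 0;
  }
}
```

 With firstCheckpoint = 1 and checkpointFrequency = 2, supersteps 1, 3, 5, ...
 return true and all others return false, matching the original if/else
 chain.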





[jira] [Commented] (GIRAPH-87) Simplify boolean expression in BspService::checkpointFrequencyMet

2012-02-24 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-87?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215955#comment-13215955
 ] 

Jakob Homan commented on GIRAPH-87:
---

Looks good except it fails checkstyle:
{noformat}<file name="/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/BspService.java">
<error line="587" severity="error" message="Line matches the illegal pattern 'Trailing whitespace'." source="com.puppycrawl.tools.checkstyle.checks.RegexpCheck"/>
<error line="587" column="5" severity="error" message="'}' should be on the same line." source="com.puppycrawl.tools.checkstyle.checks.blocks.RightCurlyCheck"/>
<error line="588" severity="error" message="Line matches the illegal pattern 'Trailing whitespace'." source="com.puppycrawl.tools.checkstyle.checks.RegexpCheck"/>
</file>{noformat}
Kill the trailing spaces and move the else to the same line and we're good to 
go.

 Simplify boolean expression in BspService::checkpointFrequencyMet
 -

 Key: GIRAPH-87
 URL: https://issues.apache.org/jira/browse/GIRAPH-87
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Eli Reisman
  Labels: newbie
 Attachments: GIRAPH-87.patch


 {noformat}if (superstep < firstCheckpoint) {
     return false;
 } else if (((superstep - firstCheckpoint) % checkpointFrequency) == 0) {
     return true;
 } else {
     return false;
 }{noformat}
 can be simplified to just return the result of the else if evaluation.





[jira] [Commented] (GIRAPH-40) Adding checkstyle enforcement of Giraph code conventions

2012-02-15 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208731#comment-13208731
 ] 

Jakob Homan commented on GIRAPH-40:
---

ok, looks good to me.  +1.  However, since no one can do a full review of this 
patch, I'd like another committer to +1 it as well before committing.  This 
helps us to explain away not actually doing a full review.

 Adding checkstyle enforcement of Giraph code conventions
 

 Key: GIRAPH-40
 URL: https://issues.apache.org/jira/browse/GIRAPH-40
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Avery Ching
Priority: Minor
 Attachments: GIRAPH-40.2.patch, GIRAPH-40.patch, GIRAPH-40.patch


 Now that we have some code conventions (see GIRAPH-21), we should enforce 
 them with a maven checkstyle plugin.





[jira] [Commented] (GIRAPH-40) Adding checkstyle enforcement of Giraph code conventions

2012-02-15 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208750#comment-13208750
 ] 

Jakob Homan commented on GIRAPH-40:
---

Actually, I have a concern:
 Compiles will now fail if checkstyle guidelines are not met.
I tested this and it's true.  This means that if you have an extra space in an 
if statement, you can't compile, even if you're planning to clean up the code 
later.  This is going to be a huge problem.  During development the code has to 
*always* pass checkstyle, not just when submitting a patch.  Is there a way to 
turn this off for compile and just run checkstyle during a specific run? This 
would mean that it would be up to the submitter and committer to verify 
correctness, exactly as is required currently with rat... I have to withdraw my 
+1.

 Adding checkstyle enforcement of Giraph code conventions
 

 Key: GIRAPH-40
 URL: https://issues.apache.org/jira/browse/GIRAPH-40
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Avery Ching
Priority: Minor
 Attachments: GIRAPH-40.2.patch, GIRAPH-40.patch, GIRAPH-40.patch


 Now that we have some code conventions (see GIRAPH-21), we should enforce 
 them with a maven checkstyle plugin.





[jira] [Commented] (GIRAPH-40) Adding checkstyle enforcement of Giraph code conventions

2012-02-15 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208786#comment-13208786
 ] 

Jakob Homan commented on GIRAPH-40:
---

Thanks.  +1 on latest patch, while still hoping to get another committer to 
take a look.

 Adding checkstyle enforcement of Giraph code conventions
 

 Key: GIRAPH-40
 URL: https://issues.apache.org/jira/browse/GIRAPH-40
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Avery Ching
Priority: Minor
 Attachments: GIRAPH-40.2.patch, GIRAPH-40.3.patch, GIRAPH-40.patch, 
 GIRAPH-40.patch


 Now that we have some code conventions (see GIRAPH-21), we should enforce 
 them with a maven checkstyle plugin.





[jira] [Commented] (GIRAPH-147) Add Blueprints Tinkerpop support

2012-02-15 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13208902#comment-13208902
 ] 

Jakob Homan commented on GIRAPH-147:


I'd be reluctant to add the blueprints support at that deep of a level; it 
would be better to have a vertex and edge combo that implements the blueprints 
model higher up.  I'm reluctant to commit to another project at that 
fundamental of a position in our definitions.

 Add Blueprints Tinkerpop support
 

 Key: GIRAPH-147
 URL: https://issues.apache.org/jira/browse/GIRAPH-147
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Priority: Minor

 Got this issue on the old Giraph GitHub (deprecated).  Moving it here.
 jeffg2k opened this issue 2 hours ago
 Hoping that Giraph might add TinkerPop Blueprint support. :)





[jira] [Commented] (GIRAPH-40) Adding checkstyle enforcement of Giraph code conventions

2012-02-13 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207185#comment-13207185
 ] 

Jakob Homan commented on GIRAPH-40:
---

bq. The below examples are what Checkstyle wants to have us do.
So does that mean code not in that hideous style will be flagged by Checkstyle? 
I'm confused by the next example you posted, which says Checkstyle won't 
enforce indenting post line wrap...


 Adding checkstyle enforcement of Giraph code conventions
 

 Key: GIRAPH-40
 URL: https://issues.apache.org/jira/browse/GIRAPH-40
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Avery Ching
Priority: Minor
 Attachments: GIRAPH-40.patch


 Now that we have some code conventions (see GIRAPH-21), we should enforce 
 them with a maven checkstyle plugin.





[jira] [Commented] (GIRAPH-40) Adding checkstyle enforcement of Giraph code conventions

2012-02-13 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207195#comment-13207195
 ] 

Jakob Homan commented on GIRAPH-40:
---

bq. So for the first example, we need to follow that format, or else checkstyle 
will mark it an error.
Blech. -0.9... That's a big change from what we agreed on earlier.  Can that 
particular check be turned off?

 Adding checkstyle enforcement of Giraph code conventions
 

 Key: GIRAPH-40
 URL: https://issues.apache.org/jira/browse/GIRAPH-40
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Avery Ching
Priority: Minor
 Attachments: GIRAPH-40.patch


 Now that we have some code conventions (see GIRAPH-21), we should enforce 
 them with a maven checkstyle plugin.





[jira] [Commented] (GIRAPH-40) Adding checkstyle enforcement of Giraph code conventions

2012-02-13 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-40?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207205#comment-13207205
 ] 

Jakob Homan commented on GIRAPH-40:
---

OK.  If we can fix it later, it'll be less traumatic than the patch coming 
today since it'll just apply to method signatures...

 Adding checkstyle enforcement of Giraph code conventions
 

 Key: GIRAPH-40
 URL: https://issues.apache.org/jira/browse/GIRAPH-40
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Avery Ching
Priority: Minor
 Attachments: GIRAPH-40.patch


 Now that we have some code conventions (see GIRAPH-21), we should enforce 
 them with a maven checkstyle plugin.





[jira] [Updated] (GIRAPH-148) giraph-site.xml needs Apache header

2012-02-13 Thread Jakob Homan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated GIRAPH-148:
---

Attachment: GIRAPH-148-b.patch

Here's one copied and pasted from our pom.xml

 giraph-site.xml needs Apache header
 ---

 Key: GIRAPH-148
 URL: https://issues.apache.org/jira/browse/GIRAPH-148
 Project: Giraph
  Issue Type: Bug
  Components: conf and scripts
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.2.0

 Attachments: GIRAPH-148-b.patch, GIRAPH-148.patch


 I forgot to add the license to the conf file and now rat is failing...





[jira] [Updated] (GIRAPH-148) giraph-site.xml needs Apache header

2012-02-10 Thread Jakob Homan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated GIRAPH-148:
---

Attachment: GIRAPH-148.patch

Quick patch...

 giraph-site.xml needs Apache header
 ---

 Key: GIRAPH-148
 URL: https://issues.apache.org/jira/browse/GIRAPH-148
 Project: Giraph
  Issue Type: Bug
  Components: conf and scripts
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.2.0

 Attachments: GIRAPH-148.patch


 I forgot to add the license to the conf file and now rat is failing...





[jira] [Updated] (GIRAPH-148) giraph-site.xml needs Apache header

2012-02-10 Thread Jakob Homan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated GIRAPH-148:
---

Summary: giraph-site.xml needs Apache header  (was: giraph-site.xml needs 
Apache head)

 giraph-site.xml needs Apache header
 ---

 Key: GIRAPH-148
 URL: https://issues.apache.org/jira/browse/GIRAPH-148
 Project: Giraph
  Issue Type: Bug
  Components: conf and scripts
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.2.0

 Attachments: GIRAPH-148.patch


 I forgot to add the license to the conf file and now rat is failing...





[jira] [Commented] (GIRAPH-148) giraph-site.xml needs Apache header

2012-02-10 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205986#comment-13205986
 ] 

Jakob Homan commented on GIRAPH-148:


This is copied from hdfs-site.xml (because I'm a lazy, lazy man), so it's 
known-good in Apache and xml.  Does the formatting matter?

 giraph-site.xml needs Apache header
 ---

 Key: GIRAPH-148
 URL: https://issues.apache.org/jira/browse/GIRAPH-148
 Project: Giraph
  Issue Type: Bug
  Components: conf and scripts
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.2.0

 Attachments: GIRAPH-148.patch


 I forgot to add the license to the conf file and now rat is failing...





[jira] [Updated] (GIRAPH-145) Change partition request log level to debug rather than info

2012-02-09 Thread Jakob Homan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated GIRAPH-145:
---

Attachment: GIRAPH-145.patch

Quick patch to go down to debug level.  Verified with tests and cluster run.

 Change partition request log level to debug rather than info
 

 Key: GIRAPH-145
 URL: https://issues.apache.org/jira/browse/GIRAPH-145
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.2.0

 Attachments: GIRAPH-145.patch


 {code:title=BasicRPCCommunications.java|borderStyle=solid}
 if (LOG.isInfoEnabled()) {
     LOG.info("sendPartitionReq: Sending to " + rpcProxy.getName() +
              " " + addr + " from " + workerInfo +
              ", with partition " + partition);
 }{code}
 is too chatty.  We're seeing thousands and thousands of these lines for larger 
 graphs.  This should be at debug level...
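
 A minimal sketch of the requested change, guarding the message behind the debug level instead of info. The snippet above uses a log4j-style API (LOG.isInfoEnabled()); the sketch below swaps in java.util.logging only so that it runs without extra jars, with Level.FINE standing in for debug. The class and method names are hypothetical and are not taken from the patch.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for the logging call in BasicRPCCommunications.
final class PartitionRequestLogging {
    private static final Logger LOG =
        Logger.getLogger(PartitionRequestLogging.class.getName());

    // Lets a caller (or a test) choose the active level explicitly.
    static void setLevel(Level level) {
        LOG.setLevel(level);
    }

    // Builds the same message text as the original info-level call.
    static String describe(String proxyName, String addr,
                           String workerInfo, int partition) {
        return "sendPartitionReq: Sending to " + proxyName +
               " " + addr + " from " + workerInfo +
               ", with partition " + partition;
    }

    // Logs at FINE (the debug analogue) and reports whether the
    // message was handed to the logger at all.
    static boolean logSend(String proxyName, String addr,
                           String workerInfo, int partition) {
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(describe(proxyName, addr, workerInfo, partition));
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        setLevel(Level.INFO);   // typical production setting: message skipped
        System.out.println(logSend("worker-1", "host:30001", "worker-0", 3));
        setLevel(Level.FINE);   // debugging session: message passed on
        System.out.println(logSend("worker-1", "host:30001", "worker-0", 3));
    }
}
```

 Keeping the level check around the call also avoids building the concatenated string when the level is disabled, which matters when the line would otherwise be produced once per partition request.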





[jira] [Updated] (GIRAPH-142) _hadoopBsp should be prefixable via configuration

2012-02-09 Thread Jakob Homan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated GIRAPH-142:
---

Attachment: GIRAPH-142.patch

Patch to add new config value, giraph.zkBaseZNode, that is the top-level for 
all giraph-created content on the ZK server.  New unit test.  Verified on 
running cluster as well.

 _hadoopBsp should be prefixable via configuration
 -

 Key: GIRAPH-142
 URL: https://issues.apache.org/jira/browse/GIRAPH-142
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.1.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.2.0

 Attachments: GIRAPH-142.patch


 In multitenant ZooKeeper clusters, it would be good to be able to specify 
 the base directory that's created for the _hadoopBsp znodes.  This would also 
 fix the issue we have with creating that directory in the source root during 
 tests.
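
 As a rough illustration of the idea, the sketch below prefixes the _hadoopBsp znode with a configurable base path. Only the key name giraph.zkBaseZNode comes from the comment above; the class, the use of java.util.Properties in place of a Hadoop Configuration, and the normalization rules are assumptions made for the example, not the behavior of the attached patch.

```java
import java.util.Properties;

// Hypothetical helper; not part of Giraph.
final class ZkBasePath {
    // Key name taken from the patch description.
    static final String BASE_ZNODE_KEY = "giraph.zkBaseZNode";
    // Directory under which the BSP znodes are created.
    static final String BSP_DIR = "/_hadoopBsp";

    // Returns the full znode path for the BSP root, honoring an
    // optional base path from the configuration.
    static String bspRoot(Properties conf) {
        String base = conf.getProperty(BASE_ZNODE_KEY, "").trim();
        // Drop trailing slashes so "/teamA/" and "/teamA" behave the same.
        while (base.endsWith("/")) {
            base = base.substring(0, base.length() - 1);
        }
        // ZooKeeper paths are absolute, so a non-empty base must start with "/".
        if (!base.isEmpty() && !base.startsWith("/")) {
            throw new IllegalArgumentException(
                BASE_ZNODE_KEY + " must be an absolute path: " + base);
        }
        return base + BSP_DIR;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(bspRoot(conf));          // no prefix configured
        conf.setProperty(BASE_ZNODE_KEY, "/teamA/giraph");
        System.out.println(bspRoot(conf));          // prefixed for one tenant
    }
}
```

 With an empty setting the path stays at the ZooKeeper root, so existing deployments would be unaffected under this scheme, while tests could point the base at a scratch location.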





[jira] [Commented] (GIRAPH-146) Maven is running the tests twice during builds

2012-02-09 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205131#comment-13205131
 ] 

Jakob Homan commented on GIRAPH-146:


From a run of {{mvn site:site}}.  Other targets have this too.
{noformat}
 grep -n -A 10 "T E S T S" huh.txt
152: T E S T S
153
154-Running org.apache.giraph.examples.SimpleShortestPathVertexTest
155-12/02/09 17:12:09 INFO server.ZooKeeperServerMain: Starting server
156-12/02/09 17:12:09 INFO server.ZooKeeperServer: Server 
environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011 22:27 GMT
157-12/02/09 17:12:09 INFO server.ZooKeeperServer: Server 
environment:host.name=jhoman-mn.linkedin.biz
158-12/02/09 17:12:09 INFO server.ZooKeeperServer: Server 
environment:java.version=1.6.0_22
159-12/02/09 17:12:09 INFO server.ZooKeeperServer: Server 
environment:java.vendor=Apple Inc.
160-12/02/09 17:12:09 INFO server.ZooKeeperServer: Server 
environment:java.home=/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home
161-12/02/09 17:12:09 INFO server.ZooKeeperServer: Server 
environment:java.class.path=/Users/jhoman/huh/huh2/g142/target/test-classes:/Users/jhoman/huh/huh2/g142/target/classes:/Users/jhoman/.m2/repository/junit/junit/3.8.1/junit-3.8.1.jar:/Users/jhoman/.m2/repository/org/apache/hadoop/hadoop-core/0.20.203.0/hadoop-core-0.20.203.0.jar:/Users/jhoman/.m2/repository/xmlenc/xmlenc/0.52/xmlenc-0.52.jar:/Users/jhoman/.m2/repository/commons-httpclient/commons-httpclient/3.0.1/commons-httpclient-3.0.1.jar:/Users/jhoman/.m2/repository/commons-logging/commons-logging/1.0.3/commons-logging-1.0.3.jar:/Users/jhoman/.m2/repository/commons-codec/commons-codec/1.4/commons-codec-1.4.jar:/Users/jhoman/.m2/repository/org/apache/commons/commons-math/2.1/commons-math-2.1.jar:/Users/jhoman/.m2/repository/commons-configuration/commons-configuration/1.6/commons-configuration-1.6.jar:/Users/jhoman/.m2/repository/commons-collections/commons-collections/3.2.1/commons-collections-3.2.1.jar:/Users/jhoman/.m2/repository/commons-lang/commons-lang/2.4/commons-lang-2.4.jar:/Users/jhoman/.m2/repository/commons-digester/commons-digester/1.8/commons-digester-1.8.jar:/Users/jhoman/.m2/repository/commons-beanutils/commons-beanutils/1.7.0/commons-beanutils-1.7.0.jar:/Users/jhoman/.m2/repository/commons-beanutils/commons-beanutils-core/1.8.0/commons-beanutils-core-1.8.0.jar:/Users/jhoman/.m2/repository/commons-net/commons-net/1.4.1/commons-net-1.4.1.jar:/Users/jhoman/.m2/repository/org/mortbay/jetty/jetty/6.1.26/jetty-6.1.26.jar:/Users/jhoman/.m2/repository/org/mortbay/jetty/servlet-api/2.5-20081211/servlet-api-2.5-20081211.jar:/Users/jhoman/.m2/repository/org/mortbay/jetty/jetty-util/6.1.26/jetty-util-6.1.26.jar:/Users/jhoman/.m2/repository/tomcat/jasper-runtime/5.5.12/jasper-runtime-5.5.12.jar:/Users/jhoman/.m2/repository/tomcat/jasper-compiler/5.5.12/jasper-compiler-5.5.12.jar:/Users/jhoman/.m2/repository/org/mortbay/jetty/jsp-api-2.1/6.1.14/jsp-api-2.1-6.1.14.jar:/Users/jhoman/.m2/repository/org/mortbay/jetty/servlet-api-2.5/6.1.14/servlet-api-2.5-6.1.14.jar
:/Users/jhoman/.m2/repository/org/mortbay/jetty/jsp-2.1/6.1.14/jsp-2.1-6.1.14.jar:/Users/jhoman/.m2/repository/ant/ant/1.6.5/ant-1.6.5.jar:/Users/jhoman/.m2/repository/commons-el/commons-el/1.0/commons-el-1.0.jar:/Users/jhoman/.m2/repository/net/java/dev/jets3t/jets3t/0.7.1/jets3t-0.7.1.jar:/Users/jhoman/.m2/repository/net/sf/kosmosfs/kfs/0.3/kfs-0.3.jar:/Users/jhoman/.m2/repository/hsqldb/hsqldb/1.8.0.10/hsqldb-1.8.0.10.jar:/Users/jhoman/.m2/repository/oro/oro/2.0.8/oro-2.0.8.jar:/Users/jhoman/.m2/repository/org/eclipse/jdt/core/3.1.1/core-3.1.1.jar:/Users/jhoman/.m2/repository/org/codehaus/jackson/jackson-core-asl/1.8.0/jackson-core-asl-1.8.0.jar:/Users/jhoman/.m2/repository/org/apache/mahout/mahout-collections/1.0/mahout-collections-1.0.jar:/Users/jhoman/.m2/repository/com/google/guava/guava/r09/guava-r09.jar:/Users/jhoman/.m2/repository/org/codehaus/jackson/jackson-mapper-asl/1.8.0/jackson-mapper-asl-1.8.0.jar:/Users/jhoman/.m2/repository/org/apache/zookeeper/zookeeper/3.3.3/zookeeper-3.3.3.jar:/Users/jhoman/.m2/repository/log4j/log4j/1.2.15/log4j-1.2.15.jar:/Users/jhoman/.m2/repository/javax/mail/mail/1.4/mail-1.4.jar:/Users/jhoman/.m2/repository/javax/activation/activation/1.1/activation-1.1.jar:/Users/jhoman/.m2/repository/jline/jline/0.9.94/jline-0.9.94.jar:/Users/jhoman/.m2/repository/org/apache/commons/commons-io/1.3.2/commons-io-1.3.2.jar:/Users/jhoman/.m2/repository/commons-cli/commons-cli/1.2/commons-cli-1.2.jar:/Users/jhoman/.m2/repository/net/iharder/base64/2.3.8/base64-2.3.8.jar:/Users/jhoman/.m2/repository/org/json/json/20090211/json-20090211.jar:/Users/jhoman/.m2/repository/org/mockito/mockito-all/1.8.5/mockito-all-1.8.5.jar:
162-12/02/09 17:12:09 INFO server.ZooKeeperServer: Server 
environment:java.library.path=.:/Library/Java/Extensions:/System

[jira] [Commented] (GIRAPH-146) Maven is running the tests twice during builds

2012-02-09 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205137#comment-13205137
 ] 

Jakob Homan commented on GIRAPH-146:


This might be a hiccup on my side.  The double run from site is to generate the 
test coverage data, and I can't get a second run now on package.  I'll keep 
poking it.

 Maven is running the tests twice during builds
 --

 Key: GIRAPH-146
 URL: https://issues.apache.org/jira/browse/GIRAPH-146
 Project: Giraph
  Issue Type: Bug
  Components: build
Reporter: Jakob Homan

 I had a feeling the build time had jumped significantly... 





[jira] [Commented] (GIRAPH-139) Change PageRankBenchmark to be accessible via bin/giraph

2012-02-08 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13203958#comment-13203958
 ] 

Jakob Homan commented on GIRAPH-139:


How about I add back in the main and run as deprecated, leave it in for 
developers, and change the wiki to use bin/giraph for the example, with an eye 
to removing it as soon as the example jar is set up?

 Change PageRankBenchmark to be accessible via bin/giraph
 

 Key: GIRAPH-139
 URL: https://issues.apache.org/jira/browse/GIRAPH-139
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.2.0

 Attachments: GIRAPH-139.patch


 Currently the PageRankBenchmark has its own main and tool implementation and 
 is difficult to access from the bin/giraph script.  It would be better if 
 everything were accessible via bin/giraph.  The benchmark is particularly 
 problematic because it uses inner classes for its two actual Vertex 
 implementations, which have to be specified on the command line as their 
 .class name (i.e., 
 org.apache.giraph.benchmark.PageRankBenchmark$PageRankHashMapVertex) rather 
 than just with dots, as one would expect.
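
 The difference between the two spellings can be seen in a small, self-contained sketch. NestedNameDemo and Inner are hypothetical names, not Giraph classes; the point is only that the JVM's binary name for a nested class uses a dollar sign, and that is the form Class.forName accepts.

```java
// Hypothetical classes used only to show how nested class names are spelled.
final class NestedNameDemo {
    // Plays the role of a Vertex implementation nested in a benchmark class.
    static class Inner { }

    // Binary name: what the class loader expects (Outer$Inner).
    static String binaryName() {
        return Inner.class.getName();
    }

    // Canonical name: what appears in source code (Outer.Inner).
    static String canonicalName() {
        return Inner.class.getCanonicalName();
    }

    // Reports whether a class can be loaded reflectively under the given name.
    static boolean loadable(String name) {
        try {
            Class.forName(name);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(binaryName() + " loadable: " + loadable(binaryName()));
        System.out.println(canonicalName() + " loadable: " + loadable(canonicalName()));
    }
}
```

 A command-line tool that passes the user's string straight to reflective class loading therefore needs the dollar-sign form, which is why moving the Vertex implementations to top-level classes would make the command line less surprising.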





[jira] [Updated] (GIRAPH-143) Add support for giraph to have a conf file

2012-02-08 Thread Jakob Homan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated GIRAPH-143:
---

Component/s: conf and scripts

 Add support for giraph to have a conf file
 --

 Key: GIRAPH-143
 URL: https://issues.apache.org/jira/browse/GIRAPH-143
 Project: Giraph
  Issue Type: New Feature
  Components: conf and scripts
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.2.0

 Attachments: GIRAPH-143.patch


 Currently one must provide all the Giraph-specific config values either via 
 the command line or snuck into another project's conf file.  Any 
 self-respecting Hadoop ecosystem project should have its own conf file.





[jira] [Updated] (GIRAPH-136) Error message for bin/giraph could be improved

2012-02-03 Thread Jakob Homan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated GIRAPH-136:
---

Summary: Error message for bin/giraph could be improved  (was: Erorr 
message for bin/giraph could be improved)

 Error message for bin/giraph could be improved
 --

 Key: GIRAPH-136
 URL: https://issues.apache.org/jira/browse/GIRAPH-136
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.1.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.2.0

 Attachments: GIRAPH-136-b.patch, GIRAPH-136.patch


 Currently when one just runs bin/giraph without the required jar, the message 
 isn't very helpful:
 {noformat}[tardis giraph-0.1]$ bin/giraph
 Can't find user jar to execute.{noformat}
 It would be better to have a more in-depth message explaining Giraph and what 
 is expected.





[RESULT] [VOTE] Release Giraph 0.1-incubating (rc0)

2012-02-03 Thread Jakob Homan
Woohoo.  Vote passes:
PPMC +1s x 4: Avery, Hyunsik, Jake, Claudio
Mentors +1s x 1: Owen
Peanut gallery +1s x 1: Harsh

Hadoop versions tested (not technically part of vote, but nice to
know): 20.2, 1.0 and FB Distro.

Will start a new vote on Incubator for their OK.

Thanks, everybody.


Re: [VOTE] Release Giraph 0.1-incubating (rc0)

2012-02-02 Thread Jakob Homan
Are you +1ing the release, or just the idea of having a source release
in general?

The vote ends tomorrow, so it would be great if the committers and
mentors could take a look...


On Thu, Feb 2, 2012 at 2:18 PM, Avery Ching ach...@apache.org wrote:
 +1.
 I'm fine with this.

 Avery


 On 1/31/12 8:45 PM, Jakob Homan wrote:

 I think these concerns preclude the entire idea of a release.

 As mentioned above, we're releasing a tag (a specific svn revision).
 That is what the release is.  Both src .tar.gz and binary files are
 courtesies.

 A release should be something that users can use as a dependency. . .like
 a maven coordinate.

 A source release in no way prevents us from creating jars of the
 release and adding them to Apache's maven repo.  In fact, we can't add
 a jar until we have a release.

 I think you guys should wait until you have made these decisions

 If you would like to assist with moving away from the munging, there
 is an open JIRA to do so.  Any effort would be appreciated.

 To address the issues of binaries, could we release multiple binaries of
 Giraph that coincide with the different versions of Hadoop?

 Adding in external dependencies for a binary release (and even just
 for a source release with jars that couldn't be brought in via
 maven/sbt) caused significant delay recently for Kafka.  I'd like to
 avoid that here.  Also, since we intend to release early and often,
 there's no reason we can't follow up with a 0.2 in short order - there
 are going to be a lot of patches in the next few weeks.


 On Tue, Jan 31, 2012 at 8:17 PM, Avery Ching ach...@apache.org wrote:

 To address the issues of binaries, could we release multiple binaries of
 Giraph that coincide with the different versions of Hadoop?


 On 1/31/12 7:44 PM, David Garcia wrote:

 I think these concerns preclude the entire idea of a release.  A release
 should be something that users can use as a dependency. . .like a maven
 coordinate.  I think you guys should wait until you have made these
 decisions. . .and then cut a binary.

 On 1/31/12 5:36 PM, Jakob Homan jgho...@gmail.com wrote:

 Giraphers-
 I've created a candidate for our first release. It's a source release
 without a binary for two reasons: first, there's still discussion
 going on about what needs to be done for the NOTICE and LICENSE files
 for projects that bring in transitive dependencies to the binary
 release

 (http://www.mail-archive.com/general@incubator.apache.org/msg32693.html)
 and second because we're still munging our binary against three types
 of Hadoop, which would mean we'd need to release three different
 binary artifacts, which seems suboptimal.  Hopefully both of these
 issues will be addressed by 0.2.

 I've tested the release against an unsecure 20.2 cluster.  It'd be
 great to test it against other configurations.  Note that we're voting
 on the tag; the files are provided as a convenience.

 Release notes:


 http://people.apache.org/~jghoman/giraph-0.1.0-incubating-rc0/RELEASE_NOTES.html

 Release artifacts:
 http://people.apache.org/~jghoman/giraph-0.1.0-incubating-rc0/

 Corresponding svn tag:
 http://svn.apache.org/repos/asf/incubator/giraph/tags/release-0.1-rc0/

 Our signing keys (my key doesn't seem to be being picked up by
 http://people.apache.org/keys/group/giraph.asc):
 http://svn.apache.org/repos/asf/incubator/giraph/KEYS

 The vote runs for 72 hours, until Friday 4pm PST.  After a successful
 vote here, Incubator will vote on the release as well.

 Thanks,
 Jakob





[jira] [Commented] (GIRAPH-136) Erorr message for bin/giraph could be improved

2012-02-02 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13199482#comment-13199482
 ] 

Jakob Homan commented on GIRAPH-136:


@Avery - how does this one look?

 Erorr message for bin/giraph could be improved
 --

 Key: GIRAPH-136
 URL: https://issues.apache.org/jira/browse/GIRAPH-136
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.1.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.2.0

 Attachments: GIRAPH-136-b.patch, GIRAPH-136.patch


 Currently when one just runs bin/giraph without the required jar, the message 
 isn't very helpful:
 {noformat}[tardis giraph-0.1]$ bin/giraph
 Can't find user jar to execute.{noformat}
 It would be better to have a more in-depth message explaining Giraph and what 
 is expected.





[jira] [Updated] (GIRAPH-136) Erorr message for bin/giraph could be improved

2012-02-01 Thread Jakob Homan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated GIRAPH-136:
---

Attachment: GIRAPH-136-b.patch

Here's a version that tries to be a bit smarter.  If there's no lib directory, 
it checks for a target directory (if target doesn't exist, it exits) and loads 
the giraph jar from there and sets the classpath via maven (as described above).

This will work for dev environments with a hadoop instance.  Invariably, this 
won't work for someone and will need to be modified more, but that's how these 
scripts end up becoming so convoluted.

 Erorr message for bin/giraph could be improved
 --

 Key: GIRAPH-136
 URL: https://issues.apache.org/jira/browse/GIRAPH-136
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.1.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.2.0

 Attachments: GIRAPH-136-b.patch, GIRAPH-136.patch


 Currently when one just runs bin/giraph without the required jar, the message 
 isn't very helpful:
 {noformat}[tardis giraph-0.1]$ bin/giraph
 Can't find user jar to execute.{noformat}
 It would be better to have a more in-depth message explaining Giraph and what 
 is expected.





[jira] [Updated] (GIRAPH-134) Fix NOTICE file for release

2012-01-31 Thread Jakob Homan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated GIRAPH-134:
---

Summary: Fix NOTICE file for release  (was: Fix NOTICE and LICENSE files)

 Fix NOTICE file for release
 ---

 Key: GIRAPH-134
 URL: https://issues.apache.org/jira/browse/GIRAPH-134
 Project: Giraph
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.1.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.1.0

 Attachments: GIRAPH-134.patch


 Currently both the LICENSE and NOTICE file are out of compliance for an 
 Apache release.  





[jira] [Created] (GIRAPH-136) Erorr message for bin/giraph could be improved

2012-01-31 Thread Jakob Homan (Created) (JIRA)
Erorr message for bin/giraph could be improved
--

 Key: GIRAPH-136
 URL: https://issues.apache.org/jira/browse/GIRAPH-136
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan


Currently when one just runs bin/giraph without the required jar, the message 
isn't very helpful:
{noformat}[tardis giraph-0.1]$ bin/giraph
Can't find user jar to execute.{noformat}
It would be better to have a more in-depth message explaining Giraph and what 
is expected.






Re: svn commit: r1238775 - /incubator/giraph/branches/branch-0.1/

2012-01-31 Thread Jakob Homan
Yeah, I'll do that.  Quite a few projects have these steps automated
via Maven, but that's way beyond my maven-fu.  Perhaps Andre would be
able to help with this?


On Tue, Jan 31, 2012 at 12:05 PM, Avery Ching ach...@apache.org wrote:
 Thanks again for doing the release Jakob.  It would be awesome if you could
 keep documentation on the steps you are taking so future releases will be
 easy.

 Avery


 On 1/31/12 11:54 AM, jgho...@apache.org wrote:

 Author: jghoman
 Date: Tue Jan 31 19:54:50 2012
 New Revision: 1238775

 URL: http://svn.apache.org/viewvc?rev=1238775&view=rev
 Log:
 Branching from trunk at r1238773 for 0.1 release.

 Added:
     incubator/giraph/branches/branch-0.1/
       - copied from r1238774, incubator/giraph/trunk/




Re: [VOTE] Release Giraph 0.1-incubating (rc0)

2012-01-31 Thread Jakob Homan
 I think these concerns preclude the entire idea of a release.
As mentioned above, we're releasing a tag (a specific svn revision).
That is what the release is.  Both src .tar.gz and binary files are
courtesies.

A release should be something that users can use as a dependency. . .like a 
maven coordinate.
A source release in no way prevents us from creating jars of the
release and adding them to Apache's maven repo.  In fact, we can't add
a jar until we have a release.

 I think you guys should wait until you have made these decisions
If you would like to assist with moving away from the munging, there
is an open JIRA to do so.  Any effort would be appreciated.

 To address the issues of binaries, could we release multiple binaries of 
 Giraph that coincide with the different versions of Hadoop?
Adding in external dependencies for a binary release (and even just
for a source release with jars that couldn't be brought in via
maven/sbt) caused significant delay recently for Kafka.  I'd like to
avoid that here.  Also, since we intend to release early and often,
there's no reason we can't follow up with a 0.2 in short order - there
are going to be a lot of patches in the next few weeks.


On Tue, Jan 31, 2012 at 8:17 PM, Avery Ching ach...@apache.org wrote:
 To address the issues of binaries, could we release multiple binaries of
 Giraph that coincide with the different versions of Hadoop?


 On 1/31/12 7:44 PM, David Garcia wrote:

 I think these concerns preclude the entire idea of a release.  A release
 should be something that users can use as a dependency. . .like a maven
 coordinate.  I think you guys should wait until you have made these
 decisions. . .and then cut a binary.

 On 1/31/12 5:36 PM, Jakob Homan jgho...@gmail.com wrote:

 Giraphers-
 I've created a candidate for our first release. It's a source release
 without a binary for two reasons: first, there's still discussion
 going on about what needs to be done for the NOTICE and LICENSE files
 for projects that bring in transitive dependencies to the binary
 release
 (http://www.mail-archive.com/general@incubator.apache.org/msg32693.html)
 and second because we're still munging our binary against three types
 of Hadoop, which would mean we'd need to release three different
 binary artifacts, which seems suboptimal.  Hopefully both of these
 issues will be addressed by 0.2.

 I've tested the release against an unsecure 20.2 cluster.  It'd be
 great to test it against other configurations.  Note that we're voting
 on the tag; the files are provided as a convenience.

 Release notes:

 http://people.apache.org/~jghoman/giraph-0.1.0-incubating-rc0/RELEASE_NOTES.html

 Release artifacts:
 http://people.apache.org/~jghoman/giraph-0.1.0-incubating-rc0/

 Corresponding svn tag:
 http://svn.apache.org/repos/asf/incubator/giraph/tags/release-0.1-rc0/

 Our signing keys (my key doesn't seem to be being picked up by
 http://people.apache.org/keys/group/giraph.asc):
 http://svn.apache.org/repos/asf/incubator/giraph/KEYS

 The vote runs for 72 hours, until Friday 4pm PST.  After a successful
 vote here, Incubator will vote on the release as well.

 Thanks,
 Jakob




[jira] [Created] (GIRAPH-134) Fix NOTICE and LICENSE files

2012-01-30 Thread Jakob Homan (Created) (JIRA)
Fix NOTICE and LICENSE files


 Key: GIRAPH-134
 URL: https://issues.apache.org/jira/browse/GIRAPH-134
 Project: Giraph
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.1.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.1.0


Currently both the LICENSE and NOTICE file are out of compliance for an Apache 
release.  





[jira] [Updated] (GIRAPH-134) Fix NOTICE and LICENSE files

2012-01-30 Thread Jakob Homan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated GIRAPH-134:
---

Attachment: GIRAPH-134.patch

LICENSE is actually ok for a source release, but NOTICE needs to be made 
minimal (see KAFKA-219 and associated incubator discussion list).  For the 
binary release, we'll add transitive dependencies via the maven external 
release plugin, so that'll be another JIRA.

 Fix NOTICE and LICENSE files
 

 Key: GIRAPH-134
 URL: https://issues.apache.org/jira/browse/GIRAPH-134
 Project: Giraph
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.1.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.1.0

 Attachments: GIRAPH-134.patch


 Currently both the LICENSE and NOTICE file are out of compliance for an 
 Apache release.  





[jira] [Commented] (GIRAPH-131) enable creation of test-jars to simplify testing in downstream projects

2012-01-27 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195258#comment-13195258
 ] 

Jakob Homan commented on GIRAPH-131:


+1.  Tested patch and verified all the tests and infrastructure are now in the 
new jar.  Adding -SNAPSHOT makes a few more files break the 100-char path limit 
and we get more warnings, but this is expected.

 enable creation of test-jars to simplify testing in downstream projects
 ---

 Key: GIRAPH-131
 URL: https://issues.apache.org/jira/browse/GIRAPH-131
 Project: Giraph
  Issue Type: Improvement
Reporter: André Kelpe
Priority: Minor
 Attachments: GIRAPH-131.patch


 Attached patch enables the creation of test-jars, which are the tests 
 packaged in a separate jar file. This makes it possible to use the 
 super-useful test infrastructure in MockUtils in downstream projects. If you 
 add the patch, you will get a ${giraph.version}-tests.jar, which can be used 
 for downstream testing like this:
 <dependency>
   <groupId>org.apache.giraph</groupId>
   <artifactId>giraph</artifactId>
   <version>${giraph.version}</version>
   <type>test-jar</type>
   <scope>test</scope>
 </dependency>
 P.S.: The patch also resets the version to 0.1-SNAPSHOT as discussed in 
 GIRAPH-129





[jira] [Resolved] (GIRAPH-131) enable creation of test-jars to simplify testing in downstream projects

2012-01-27 Thread Jakob Homan (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan resolved GIRAPH-131.


   Resolution: Fixed
Fix Version/s: 0.1.0
 Assignee: André Kelpe

I've committed this.  Resolving as fixed.  Thanks, André!

 enable creation of test-jars to simplify testing in downstream projects
 ---

 Key: GIRAPH-131
 URL: https://issues.apache.org/jira/browse/GIRAPH-131
 Project: Giraph
  Issue Type: Improvement
Reporter: André Kelpe
Assignee: André Kelpe
Priority: Minor
 Fix For: 0.1.0

 Attachments: GIRAPH-131.patch


 Attached patch enables the creation of test-jars, which are the tests 
 packaged in a separate jar file. This makes it possible to use the 
 super-useful test infrastructure in MockUtils in downstream projects. If you 
 add the patch, you will get a ${giraph.version}-tests.jar, which can be used 
 for downstream testing like this:
 <dependency>
   <groupId>org.apache.giraph</groupId>
   <artifactId>giraph</artifactId>
   <version>${giraph.version}</version>
   <type>test-jar</type>
   <scope>test</scope>
 </dependency>
 P.S.: The patch also resets the version to 0.1-SNAPSHOT as discussed in 
 GIRAPH-129





[jira] [Commented] (GIRAPH-128) RPC port from BasicRPCCommunications should be only a starting port, and retried

2012-01-27 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195347#comment-13195347
 ] 

Jakob Homan commented on GIRAPH-128:


Any reason the question about mocks/extending the class wasn't addressed?

 RPC port from BasicRPCCommunications should be only a starting port, and 
 retried
 

 Key: GIRAPH-128
 URL: https://issues.apache.org/jira/browse/GIRAPH-128
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.1.0
Reporter: Avery Ching
Assignee: Avery Ching
 Attachments: GIRAPH-128.2.patch, GIRAPH-128.3.patch


 Currently Giraph uses a basic port + the task partition to get the RPC port.  
 This doesn't work well for when there are multiple Giraph jobs running 
 simultaneously in the same Hadoop cluster (port conflict).  At the same time, 
 it is nice to use this simple algorithm because it makes it very easy to 
 debug problems (you can find the troublesome mapper from the RPC port name).  
 I will be proposing a simple scheme to retry with another port.  I will round 
 the total number of mappers up to the nearest power of 10 (let's that that 
 number Z).  Then I will increment the port number by Z, retrying up to 20 
 tries.  If you have enough ports, this scheme would guarantee that up to 20 
 mappers / node would be supported.  It should be sufficient for most 
 clusters.  At the same time, we still maintain the easy debugging method 
 since it's still easy to figure out the mapper partition from the port 
 (port % Z = map partition). 
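
The retry scheme described above can be sketched roughly as follows. This is only an illustration, not the code from the attached patches: the class and method names are hypothetical, and it assumes the base port is a multiple of Z, since otherwise port % Z would not equal the map partition.

```java
import java.io.IOException;
import java.net.ServerSocket;

/** Hypothetical sketch of the proposed port retry scheme; not Giraph code. */
final class RpcPortRetrySketch {
    /** Maximum number of ports to try, as proposed in the description. */
    static final int MAX_TRIES = 20;

    /** Smallest power of 10 that is >= numMappers (the "Z" of the description). */
    static int roundUpToPowerOfTen(int numMappers) {
        if (numMappers > 1000000000) {
            throw new IllegalArgumentException("too many mappers: " + numMappers);
        }
        int z = 1;
        while (z < numMappers) {
            z *= 10;
        }
        return z;
    }

    /** Port to try on the given attempt (0-based): base + partition + attempt * Z. */
    static int candidatePort(int basePort, int taskPartition, int z, int attempt) {
        return basePort + taskPartition + attempt * z;
    }

    /** Tries up to MAX_TRIES candidate ports and returns the first socket that binds. */
    static ServerSocket bindWithRetry(int basePort, int taskPartition, int numMappers)
            throws IOException {
        int z = roundUpToPowerOfTen(numMappers);
        IOException last = null;
        for (int attempt = 0; attempt < MAX_TRIES; attempt++) {
            int port = candidatePort(basePort, taskPartition, z, attempt);
            try {
                return new ServerSocket(port);
            } catch (IOException e) {
                last = e; // port in use (or otherwise unavailable); try the next one
            }
        }
        throw new IOException("No free port after " + MAX_TRIES + " tries", last);
    }
}
```

With 51 mappers, Z is 100, so partition 7 on a base port of 30000 would try 30007, 30107, 30207 and so on, and every candidate still satisfies port % 100 == 7.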





[jira] [Commented] (GIRAPH-128) RPC port from BasicRPCCommunications should be only a starting port, and retried

2012-01-27 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13195382#comment-13195382
 ] 

Jakob Homan commented on GIRAPH-128:


Great, thanks.  +1.

 RPC port from BasicRPCCommunications should be only a starting port, and 
 retried
 

 Key: GIRAPH-128
 URL: https://issues.apache.org/jira/browse/GIRAPH-128
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.1.0
Reporter: Avery Ching
Assignee: Avery Ching
 Attachments: GIRAPH-128.2.patch, GIRAPH-128.3.patch, 
 GIRAPH-128.4.patch


 Currently Giraph uses a basic port + the task partition to get the RPC port.  
 This doesn't work well for when there are multiple Giraph jobs running 
 simultaneously in the same Hadoop cluster (port conflict).  At the same time, 
 it is nice to use this simple algorithm because it makes it very easy to 
 debug problems (you can find the troublesome mapper from the RPC port name).  
 I will be proposing a simple scheme to retry with another port.  I will round 
 the total number of mappers up to the nearest power of 10 (let's call that 
 number Z).  Then I will increment the port number by Z, retrying up to 20 
 tries.  If you have enough ports, this scheme would guarantee that up to 20 
 mappers / node would be supported.  It should be sufficient for most 
 clusters.  At the same time, we still maintain the easy debugging method 
 since it's still easy to figure out the mapper partition from the port 
 (port % Z = map partition). 





[jira] [Commented] (GIRAPH-129) enable creation of javadoc and sources jars

2012-01-25 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193282#comment-13193282
 ] 

Jakob Homan commented on GIRAPH-129:


ok, sounds good.  +1 on the patch.  I'll commit it.

 enable creation of javadoc and sources jars
 ---

 Key: GIRAPH-129
 URL: https://issues.apache.org/jira/browse/GIRAPH-129
 Project: Giraph
  Issue Type: Improvement
  Components: build
Affects Versions: 0.1.0
Reporter: André Kelpe
Assignee: André Kelpe
Priority: Minor
 Attachments: GIRAPH-129.patch


 It is pretty useful to enable the creation of javadoc and sources jars during 
 the build, so that people using IDEs like eclipse can easily jump into the 
 code.





[jira] [Resolved] (GIRAPH-129) enable creation of javadoc and sources jars

2012-01-25 Thread Jakob Homan (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan resolved GIRAPH-129.


   Resolution: Fixed
Fix Version/s: 0.1.0

I've committed this.  Resolving as fixed.  Thanks for the contribution, André!

 enable creation of javadoc and sources jars
 ---

 Key: GIRAPH-129
 URL: https://issues.apache.org/jira/browse/GIRAPH-129
 Project: Giraph
  Issue Type: Improvement
  Components: build
Affects Versions: 0.1.0
Reporter: André Kelpe
Assignee: André Kelpe
Priority: Minor
 Fix For: 0.1.0

 Attachments: GIRAPH-129.patch


 It is pretty useful to enable the creation of javadoc and sources jars during 
 the build, so that people using IDEs like eclipse can easily jump into the 
 code.





[jira] [Commented] (GIRAPH-129) enable creation of javadoc and sources jars

2012-01-25 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193308#comment-13193308
 ] 

Jakob Homan commented on GIRAPH-129:


bq. (P.S.: Shouldn't the version of giraph be something like 0.1-SNAPSHOT? That 
would make it easier to introduce releases via the maven-release plugin later 
on.)
Yep.  Wanna spin up a quick patch?

 enable creation of javadoc and sources jars
 ---

 Key: GIRAPH-129
 URL: https://issues.apache.org/jira/browse/GIRAPH-129
 Project: Giraph
  Issue Type: Improvement
  Components: build
Affects Versions: 0.1.0
Reporter: André Kelpe
Assignee: André Kelpe
Priority: Minor
 Fix For: 0.1.0

 Attachments: GIRAPH-129.patch


 It is pretty useful to enable the creation of javadoc and sources jars during 
 the build, so that people using IDEs like eclipse can easily jump into the 
 code.





[jira] [Created] (GIRAPH-130) Fix Javadoc warnings

2012-01-24 Thread Jakob Homan (Created) (JIRA)
Fix Javadoc warnings


 Key: GIRAPH-130
 URL: https://issues.apache.org/jira/browse/GIRAPH-130
 Project: Giraph
  Issue Type: Bug
Reporter: Jakob Homan
Priority: Minor


We've accumulated a fair number of javadoc warnings recently:
{noformat}[WARNING] Javadoc Warnings
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146:
 warning - Tag @link: reference not found: GraphPartitioner
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:129:
 warning - @param argument superstep is not a parameter name.
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146:
 warning - Tag @link: reference not found: GraphPartitioner
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/CommunicationsInterface.java:84:
 warning - @param argument vertexIndex is not a parameter name.
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/CommunicationsInterface.java:84:
 warning - @param argument msgList is not a parameter name.
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/VertexIdMessagesList.java:32:
 warning - Tag @link: reference not found: VertexIdMessage
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/VertexCombiner.java:46:
 warning - Tag @link: reference not found: messages
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/VertexCombiner.java:46:
 warning - Tag @link: reference not found: messages
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/AggregatorWriter.java:60:
 warning - @param argument map is not a parameter name.
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146:
 warning - Tag @link: reference not found: GraphPartitioner
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146:
 warning - Tag @link: reference not found: GraphPartitioner
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/GiraphJob.java:432:
 warning - @param argument graphPartitionerClass is not a parameter name.
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/VertexCombiner.java:46:
 warning - Tag @link: reference not found: messages
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/partition/MasterGraphPartitioner.java:62:
 warning - Tag @link: reference not found: GraphPartitioner
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/partition/MasterGraphPartitioner.java:62:
 warning - Tag @link: reference not found: GraphPartitioner
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/partition/MasterGraphPartitioner.java:62:
 warning - @param argument availableWorkerInfos is not a parameter name.
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/partition/PartitionBalancer.java:176:
 warning - @param argument allPartitionStatsList is not a parameter name.
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/VertexIdMessagesList.java:32:
 warning - Tag @link: reference not found: VertexIdMessage
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/VertexIdMessagesList.java:32:
 warning - Tag @link: reference not found: VertexIdMessage
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146:
 warning - Tag @link: reference not found: GraphPartitioner
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/VertexIdMessagesList.java:32:
 warning - Tag @link: reference not found: VertexIdMessage
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146:
 warning - Tag @link: reference not found: GraphPartitioner
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/VertexIdMessagesList.java:32:
 warning - Tag @link: reference not found: VertexIdMessage
{noformat}

It would be good to fix these.
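
Warnings of these two kinds usually mean that a @param tag names something that is not (or is no longer) in the method signature, or that a {@link} target cannot be resolved from the file. As a rough illustration only, using a hypothetical class rather than any of the Giraph files listed above, a Javadoc block that avoids both looks like this:

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical example class; not part of Giraph. */
final class JavadocSketch {
    /**
     * Sums the message values destined for one vertex.
     *
     * Each @param tag below matches a parameter name in the signature, and
     * the link target is fully qualified so it resolves without an import.
     *
     * @param vertexIndex index of the vertex receiving the messages
     * @param msgList messages to add up; see {@link java.util.List}
     * @return the sum of all values in {@code msgList}
     */
    static int sum(int vertexIndex, List<Integer> msgList) {
        int total = 0;
        for (int value : msgList) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(0, Arrays.asList(1, 2, 3)));
    }
}
```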





[jira] [Commented] (GIRAPH-129) enable creation of javadoc and sources jars

2012-01-24 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192430#comment-13192430
 ] 

Jakob Homan commented on GIRAPH-129:


In contrast to {{mvn javadoc:jar}} and {{mvn source:jar}}.  One can call those 
directly; with this change, one gets them each time one builds the regular jar.

 enable creation of javadoc and sources jars
 ---

 Key: GIRAPH-129
 URL: https://issues.apache.org/jira/browse/GIRAPH-129
 Project: Giraph
  Issue Type: Improvement
  Components: build
Affects Versions: 0.1.0
Reporter: André Kelpe
Assignee: André Kelpe
Priority: Minor
 Attachments: GIRAPH-129.patch


 It is pretty useful to enable the creation of javadoc and sources jars during 
 the build, so that people using IDEs like eclipse can easily jump into the 
 code.





[jira] [Assigned] (GIRAPH-126) Use Collections.emptyList() in BasicRPCCommunications.java

2012-01-18 Thread Jakob Homan (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan reassigned GIRAPH-126:
--

Assignee: André Kelpe

 Use Collections.emptyList() in BasicRPCCommunications.java
 --

 Key: GIRAPH-126
 URL: https://issues.apache.org/jira/browse/GIRAPH-126
 Project: Giraph
  Issue Type: Improvement
Reporter: André Kelpe
Assignee: André Kelpe
Priority: Minor
 Attachments: GIRAPH-126.patch


 I am doing some tests with giraph and I am having some memory problems. While 
 I was browsing through the codebase I saw that you are allocating a new 
 ArrayList (which has an underlying array of 10 elements) for each Vertex, 
 that has no Messages to be delivered. That's a waste of memory and time. This 
 patch replaces it with the EMPTY_LIST of the Collections utility class.





[jira] [Commented] (GIRAPH-126) Use Collections.emptyList() in BasicRPCCommunications.java

2012-01-18 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188608#comment-13188608
 ] 

Jakob Homan commented on GIRAPH-126:


yeah, good catch:
{noformat}scala> list3.add(42)
java.lang.UnsupportedOperationException
at java.util.AbstractList.add(AbstractList.java:131)
at java.util.AbstractList.add(AbstractList.java:91)
at .<init>(<console>:7)
at .<clinit>(<console>)
at RequestResult$.<init>(<console>:9)
at RequestResult$.<clinit>(<console>)
at RequestResult$scala_repl_result(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
scala.tools.nsc.Interpreter$Request$$anonfun$loadAndRun$1$$anonfun$apply$18.apply(Interpreter.scala:981)
at 
scala.tools.nsc.Interpreter$Request$$anonfun$loadAndRun$1$$anonfun$apply$...
{noformat}
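
The same behaviour can be reproduced in plain Java. The sketch below assumes that list3 in the session above came from Collections.emptyList() (or EMPTY_LIST), which the trace suggests but does not show; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical demo of why the shared empty list must never be handed to code that adds to it. */
final class EmptyListSketch {
    /** Returns true if the list refuses add() with UnsupportedOperationException. */
    static boolean rejectsAdd(List<Integer> list) {
        try {
            list.add(42);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Shared immutable instance: no per-vertex allocation, but read-only.
        List<Integer> shared = Collections.emptyList();
        // Fresh ArrayList: allocates per call, but callers may add to it.
        List<Integer> fresh = new ArrayList<Integer>();
        System.out.println("emptyList rejects add: " + rejectsAdd(shared));
        System.out.println("ArrayList rejects add: " + rejectsAdd(fresh));
    }
}
```

So the memory saving is safe only where callers treat the returned message list as read-only.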

 Use Collections.emptyList() in BasicRPCCommunications.java
 --

 Key: GIRAPH-126
 URL: https://issues.apache.org/jira/browse/GIRAPH-126
 Project: Giraph
  Issue Type: Improvement
Reporter: André Kelpe
Assignee: André Kelpe
Priority: Minor
 Attachments: GIRAPH-126.patch, GIRAPH-126.patch, GIRAPH-126.patch


 I am doing some tests with giraph and I am having some memory problems. While 
 I was browsing through the codebase I saw that you are allocating a new 
 ArrayList (which has an underlying array of 10 elements) for each Vertex, 
 that has no Messages to be delivered. That's a waste of memory and time. This 
 patch replaces it with the EMPTY_LIST of the Collections utility class.





[jira] [Commented] (GIRAPH-126) Use Collections.emptyList() in BasicRPCCommunications.java

2012-01-18 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13188666#comment-13188666
 ] 

Jakob Homan commented on GIRAPH-126:


bq. I think we are moving to guava. Much nicer in my opinion.
+1


 Use Collections.emptyList() in BasicRPCCommunications.java
 --

 Key: GIRAPH-126
 URL: https://issues.apache.org/jira/browse/GIRAPH-126
 Project: Giraph
  Issue Type: Improvement
Reporter: André Kelpe
Assignee: André Kelpe
Priority: Minor
 Attachments: GIRAPH-126.patch, GIRAPH-126.patch, GIRAPH-126.patch


 I am doing some tests with giraph and I am having some memory problems. While 
 I was browsing through the codebase I saw that you are allocating a new 
 ArrayList (which has an underlying array of 10 elements) for each Vertex, 
 that has no Messages to be delivered. That's a waste of memory and time. This 
 patch replaces it with the EMPTY_LIST of the Collections utility class.





Re: on the semantics of the combiner

2012-01-13 Thread Jakob Homan
 be
 implemented within a
 message object (still reducing the number of messages to 1 or 0)
 I suppose
 that in some simple cases (i.e. grouping), it might be easier by
 doing it in
 the combiner as you both have mentioned?  The only thing I
 suppose I'm
 concerned about is letting users do something that is not optimal.
   Generally, expanding messages is not what you want your
 combiner to do.
   Also, since grouping behavior can be implemented in the message
 object, it
 forces users to avoid shooting themselves in the foot.

 Good discussion (it's making me really think about this)!

 Avery


 On 1/10/12 10:32 AM, Claudio Martella wrote:
 Ok, now i see where you're going. I guess that the thing here is
 that
 the combiner would act like (on its behalf) D, and to do so
 concretely it would probably need some local data related to D
 (edges
 values? vertexvalue?).
 I also think that k > n is also possible in principle and we
 could let
 the user decide whether to use this power or not, once/if we agree
 that letting the user send k messages in the combiner is useful
 (and
 the grouping behavior shown by the label propagation example
 should do
 so).

 On Tue, Jan 10, 2012 at 7:04 PM, Jakob Homan jgho...@gmail.com wrote:
 Those two messages would have gone to D, been expanded to, say, 4,
 which would then have been sent to, say, M.  This would
 save the
 sending of the two to D and send the 4 directly to M.  I'm not
 saying
 it's a great example, but it is legal.  This is of course assuming
 that combiners can generate messages bound for vertices other
 than the
 original destination, which I don't know if that has even been
 discussed.

 On Tue, Jan 10, 2012 at 9:49 AM, Claudio Martella
 claudio.marte...@gmail.com    wrote:
 i'm not sure i understand what you'd save here. if the two
 messages
 were going to be expanded to k messages on the destination
 worker D,
 but you expand them on W, you end up sending k messages
 instead of 2.
 right?

 On Tue, Jan 10, 2012 at 6:26 PM, Jakob Homan jgho...@gmail.com wrote:
 it doesn't have to be expand, k, the number of elements
 returned by
 the combiner, can still be smaller than n,
 Right.  Grouping would be the most common case.  It would also be
 possible for k to be greater than n.  For instance, consider two
 messages, both generated on the same worker (W) by two different
 vertices,
 both bound for another vertex, Z.  A combiner on W could get
 both of
 these messages, do some work on them, as it would have
 knowledge of
 both, and generate some arbitrary number of messages bound
 for other
 vertices (thus saving the shuffle/transfer of the original
 messages).


 On Tue, Jan 10, 2012 at 12:08 AM, Claudio Martella
 claudio.marte...@gmail.com    wrote:
 it doesn't have to be expand, k, the number of elements
 returned by
 the combiner, can still be smaller than n, the size of the
 messages
 parameter. as a first example, you can imagine your vertex
 receiving
 semantically-different classes/types of messages, and you
 can imagine
 willing to be summarizing them in different messages, i.e.
 if your
 messages come along with labels or just simply by the source
 vertex,
 if required by the algorithm, think of label propagation to
 have just
 an example, or some sort of labeled-pagerank.

 On Tue, Jan 10, 2012 at 3:05 AM, Avery Ching ach...@apache.org wrote:
 I agree that CA doesn't require it, however, I can't think
 of why I
 would
 want to use a combiner to expand the number of messages.
 Can you?

 Avery


 On 1/9/12 3:57 PM, Jakob Homan wrote:
 In my opinion that means reducing to a single message or
 none at
 all.
 CA doesn't require this, however.  Hadoop's combiner
 interface, for
 instance, doesn't require a single  or no value to be
 returned; it
 has
 the same interface as a reducer, zero or more values.  Would
 adapting
 the semantics of Giraph's combiner to return a list of
 messages
 (possibly empty) make it more useful?

 On Mon, Jan 9, 2012 at 3:21 PM, Claudio Martella
 claudio.marte...@gmail.com      wrote:
 Yes, what you say is completely reasonable, you
 convinced me :)

 On Mon, Jan 9, 2012 at 11:28 PM, Avery Ching ach...@apache.org wrote:
 Combiners should be commutative and associative.  In my
 opinion
 that
 means
 reducing to a single message or none at all.  Can you
 think of a
 case
 when
 more than 1 message should be returned from a combiner?
 I know
 that
 returning null isn't preferable in general, but I think
 that
 functionality
 (returning no messages), is nice to have and isn't a
 huge amount
 of work
 on
 our side.

 Avery


 On 1/9/12 12:13 PM, Claudio Martella wrote:
 To clarify, I was not discussing the possibility for
 combine to
 return
 null. I see why it would be useful, given that combine
 returns M,
 there's no other way to let combiner ask not to send
 any message,
 although i agree with Jakob, I also believe returning
 null should
 be
 avoided but only used, roughly, as an init value for a
 reference/pointer

[jira] [Commented] (GIRAPH-123) the wiki is not publicly accessible

2012-01-11 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13184167#comment-13184167
 ] 

Jakob Homan commented on GIRAPH-123:


I've opened INFRA-4318 to have this fixed.  I'll close this when that gets 
resolved.

 the wiki is not publicly accessible
 ---

 Key: GIRAPH-123
 URL: https://issues.apache.org/jira/browse/GIRAPH-123
 Project: Giraph
  Issue Type: Bug
  Components: documentation
Reporter: André Kelpe
Priority: Minor

 When I try to read the documentation on the wiki I end up on a login screen. 
 Can you please make the wiki open for the public.





Re: Call for Submission Berlin Buzzwords 2012all for Submission Berlin Buzzwords - http://berlinbuzzwords.de

2012-01-11 Thread Jakob Homan
I'm planning on submitting one.

On Wed, Jan 11, 2012 at 12:26 PM, Sebastian Schelter s...@apache.org wrote:
 Forwarding Simon's call for Berlin Buzzwords.

 Does anybody plan to give a talk about Giraph at Buzzwords? I'll
 definitely be at the conference as I'm living in Berlin. We should
 also try to organize a Giraph meeting in the evening maybe together
 with the Mahout people.

 Best,
 Sebastian


 -- Forwarded message --
 From: Simon Willnauer simon.willna...@googlemail.com
 Date: 2012/1/11
 Subject: Call for Submission Berlin Buzzwords 2012all for Submission
 Berlin Buzzwords - http://berlinbuzzwords.de
 To: java-user java-u...@lucene.apache.org, d...@lucene.apache.org,
 solr-u...@lucene.apache.org, mahout-...@lucene.apache.org,
 lucy-...@incubator.apache.org, lucy-u...@incubator.apache.org,
 mapreduce-u...@hadoop.apache.org, hdfs-u...@hadoop.apache.org,
 hdfs-...@hadoop.apache.org, mapreduce-...@hadoop.apache.org,
 gene...@lucene.apache.org


 Call for Submission Berlin Buzzwords 2012 - Search, Store, Scale  --
 June 4 / 5. 2012

 The event will comprise presentations on scalable data processing. We
 invite you to submit talks on the topics:
  * IR / Search - Lucene, Solr, katta, ElasticSearch or comparable solutions
  * NoSQL - like CouchDB, MongoDB, Jackrabbit, HBase and others
  * Hadoop - Hadoop itself, MapReduce, Cascading or Pig and relatives

 Related topics not explicitly listed above are more than welcome. We are
 looking for presentations on the implementation of the systems
 themselves, technical talks,
 real world applications and case studies.

 Important Dates (all dates in GMT +2)
  * Submission deadline: March 11th 2012, 23:59 MEZ
  * Notification of accepted speakers: April 6th, 2012, MEZ
  * Publication of final schedule: April 13th, 2012
  * Conference: June 4/5. 2012

 High quality, technical submissions are called for, ranging from
 principles to practice. We are looking for real world use cases,
 background on the architecture of specific projects and a deep dive
 into architectures built on top of e.g. Hadoop clusters.

 To submit your proposal please register to our website [1] and log in
 [2] once you received the confirmation email. Once this is done you
 can submit your proposal here [3]; please do so no later than March
 11th, 2012. Acceptance notifications will be sent out soon after the
 submission deadline. Please include your name, bio and email, the
 title of the talk, a brief abstract in English language. Please
 indicate whether you want to give a lightning (10min), short (20min)
 or long (40min) presentation and indicate the level of experience with
 the topic your audience should have (e.g. whether your talk will be
 suitable for newbies or is targeted for experienced users.) If you'd
 like to pitch your brand new product in your talk, please let us know
 as well -
 there will be extra space for presenting new ideas, awesome products
 and great new projects.

 The presentation format is short. We will be enforcing the schedule 
 rigorously.

 If you are interested in sponsoring the event (e.g. we would be happy
 to provide videos after the event, free drinks for attendees as well
 as an after-show party), please contact us.

 Follow @berlinbuzzwords on Twitter for updates. Tickets, news on the
 conference, and the final schedule will be published at
 http://berlinbuzzwords.de.

 Program Committee Chairs:

  *  Isabel Drost (Nokia & Apache Mahout)
  *  Jan Lehnardt (CouchBase & Apache CouchDB)
  *  Simon Willnauer (SearchWorkings & Apache Lucene)
  *  Grant Ingersoll (Lucid Imagination & Apache Lucene)
  *  Owen O’Malley (Yahoo Inc. & Apache Hadoop)
  *  Jim Webber (Neo Technology & Neo4j)
  *  Sean Treadway (Soundcloud)


 Please re-distribute this CfP to people who might be interested.

 Contact us at:

 newthinking communications GmbH
 Schönhauser Allee 6/7
 10119 Berlin,
 Germany
 Julia Gemählich j...@newthinking.de
 Isabel Drost i...@newthinking.de
 Simon Willnauer sim...@apache.org
  +49(0)30-9210 596

 [1] http://berlinbuzzwords.de/user/register
 [2] http://berlinbuzzwords.de/user
 [3] http://berlinbuzzwords.de/node/add/session


[jira] [Resolved] (GIRAPH-123) the wiki is not publicly accessible

2012-01-11 Thread Jakob Homan (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan resolved GIRAPH-123.


Resolution: Fixed
  Assignee: Jakob Homan

INFRA has done this for us.  Let us know if you run into any issues with it.  
Resolving.

 the wiki is not publicly accessible
 ---

 Key: GIRAPH-123
 URL: https://issues.apache.org/jira/browse/GIRAPH-123
 Project: Giraph
  Issue Type: Bug
  Components: documentation
Reporter: André Kelpe
Assignee: Jakob Homan
Priority: Minor

 When I try to read the documentation on the wiki I end up on a login screen. 
 Can you please make the wiki open for the public.





Re: on the semantics of the combiner

2012-01-10 Thread Jakob Homan
 it doesn't have to be expand, k, the number of elements returned by
 the combiner, can still be smaller than n,
Right.  Grouping would be the most common case.  It would also be
possible for k to be greater than n.  For instance, consider two messages,
both generated on the same worker (W) by two different vertices,
both bound for another vertex, Z.  A combiner on W could get both of
these messages, do some work on them, as it would have knowledge of
both, and generate some arbitrary number of messages bound for other
vertices (thus saving the shuffle/transfer of the original messages).
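
As a rough illustration of the grouping case, here is a sketch of a combiner that turns n labeled messages into k <= n messages, one per label. It does not use Giraph's VertexCombiner interface, which returns a single message at the time of this thread; LabeledMessage and the combine() signature are hypothetical, and summing per label is just one possible reduction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a combiner that returns a list of messages instead of one. */
final class GroupingCombinerSketch {
    /** Made-up message type: a value tagged with a label. */
    static final class LabeledMessage {
        final String label;
        final double value;

        LabeledMessage(String label, double value) {
            this.label = label;
            this.value = value;
        }
    }

    /** Combines n messages into one message per distinct label (k <= n), summing values. */
    static List<LabeledMessage> combine(List<LabeledMessage> messages) {
        // LinkedHashMap keeps labels in first-seen order, so output is deterministic.
        Map<String, Double> sums = new LinkedHashMap<String, Double>();
        for (LabeledMessage m : messages) {
            Double current = sums.get(m.label);
            sums.put(m.label, current == null ? m.value : current + m.value);
        }
        List<LabeledMessage> combined = new ArrayList<LabeledMessage>(sums.size());
        for (Map.Entry<String, Double> e : sums.entrySet()) {
            combined.add(new LabeledMessage(e.getKey(), e.getValue()));
        }
        return combined;
    }

    public static void main(String[] args) {
        List<LabeledMessage> out = combine(Arrays.asList(
            new LabeledMessage("a", 1.0),
            new LabeledMessage("b", 2.0),
            new LabeledMessage("a", 3.0)));
        for (LabeledMessage m : out) {
            System.out.println(m.label + " " + m.value);
        }
    }
}
```

An empty input yields an empty list, which covers the "send no message" case without returning null.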


On Tue, Jan 10, 2012 at 12:08 AM, Claudio Martella
claudio.marte...@gmail.com wrote:
 it doesn't have to be expand, k, the number of elements returned by
 the combiner, can still be smaller than n, the size of the messages
 parameter. as a first example, you can imagine your vertex receiving
 semantically-different classes/types of messages, and you can imagine
 willing to be summarizing them in different messages, i.e. if your
 messages come along with labels or just simply by the source vertex,
 if required by the algorithm, think of label propagation to have just
 an example, or some sort of labeled-pagerank.

 On Tue, Jan 10, 2012 at 3:05 AM, Avery Ching ach...@apache.org wrote:
 I agree that CA doesn't require it, however, I can't think of why I would
 want to use a combiner to expand the number of messages.  Can you?

 Avery


 On 1/9/12 3:57 PM, Jakob Homan wrote:

 In my opinion that means reducing to a single message or none at all.

 CA doesn't require this, however.  Hadoop's combiner interface, for
 instance, doesn't require a single  or no value to be returned; it has
 the same interface as a reducer, zero or more values.  Would adapting
 the semantics of Giraph's combiner to return a list of messages
 (possibly empty) make it more useful?

 On Mon, Jan 9, 2012 at 3:21 PM, Claudio Martella
 claudio.marte...@gmail.com  wrote:

 Yes, what you say is completely reasonable, you convinced me :)

 On Mon, Jan 9, 2012 at 11:28 PM, Avery Ching ach...@apache.org wrote:

 Combiners should be commutative and associative.  In my opinion that
 means
 reducing to a single message or none at all.  Can you think of a case
 when
 more than 1 message should be returned from a combiner?  I know that
 returning null isn't preferable in general, but I think that
 functionality
 (returning no messages), is nice to have and isn't a huge amount of work
 on
 our side.

 Avery


 On 1/9/12 12:13 PM, Claudio Martella wrote:

 To clarify, I was not discussing the possibility for combine to return
 null. I see why it would be useful, given that combine returns M,
 there's no other way to let combiner ask not to send any message,
 although i agree with Jakob, I also believe returning null should be
 avoided but only used, roughly, as an init value for a
 reference/pointer.
 Perhaps, we could, but i'm just thinking out loud here, let combine()
 return Iterable<M>, basically letting it define what to combine to
 ({0, 1, k } messages). It would be a powerful extension to the model,
 but maybe it's too much.

 As far as the size of the messages parameter, I agree with you that 0
 messages gives nothing to combine and it would be somehow awkward, it
 was more a matter of synching it with the other methods getting the
 messages parameter.
 Probably, having a more clear javadoc will do the job here.

 What do you think?

 On Mon, Jan 9, 2012 at 8:42 PM, Jakob Homan jgho...@gmail.com wrote:

 I'm not a big fan of returning null as it adds extra complexity to the
 calling code (null checks, or not, since people usually will forget
 them).  Avery is correct that combiners are application specific.  Is
 it conceivable that one would want to write a combiner that returned
 something for an input of no parameters, ie combining the empty list
 doesn't return the empty list?  I imagine for most combiners,
 combining a single message would result in that message.

 On Mon, Jan 9, 2012 at 11:28 AM, Avery Ching ach...@apache.org wrote:

 The javadoc for VertexCombiner#combine() is

  /**
   * Combines message values for a particular vertex index.
   *
   * @param vertexIndex Index of the vertex getting these messages
   * @param msgList List of the messages to be combined
   * @return Message that is combined from {@link MsgList} or null if
 no
   *         message it to be sent
   * @throws IOException
   */

 I think we are somewhat vague on what a combiner can return to
 support
 various use cases.  A combiner should be particular to a particular
 compute() algorithm.  I think it should be legal to return null from
 a
 combiner, in that case, no message should be sent to that vertex.

 It seems like it would be an overhead to call a combiner when there are 0
 messages.  I can't see a case where that would be useful.  Perhaps we should
 change the javadoc to ensure that msgList must contain at least one message
 for combine() to be called.

Re: on the semantics of the combiner

2012-01-10 Thread Jakob Homan
Those two messages would have gone to D, been expanded to, say, 4,
which would then have been sent to, say, M.  This would save the
sending of the two to D and send the 4 directly to M.  I'm not saying
it's a great example, but it is legal.  This is of course assuming
that combiners can generate messages bound for vertices other than the
original destination, and I don't know whether that has even been
discussed.

On Tue, Jan 10, 2012 at 9:49 AM, Claudio Martella
claudio.marte...@gmail.com wrote:
 i'm not sure i understand what you'd save here. if the two messages
 were going to be expanded to k messages on the destination worker D,
 but you expand them on W, you end up sending k messages instead of 2.
 right?

 On Tue, Jan 10, 2012 at 6:26 PM, Jakob Homan jgho...@gmail.com wrote:
 it doesn't have to expand; k, the number of elements returned by
 the combiner, can still be smaller than n,
 Right.  Grouping would be the most common case.  It would be possible
 for k to be greater than n, as well.  For instance, consider two messages,
 both generated on the same worker (W) by two different vertices,
 both bound for another vertex, Z.  A combiner on W could get both of
 these messages, do some work on them, as it would have knowledge of
 both, and generate some arbitrary number of messages bound for other
 vertices (thus saving the shuffle/transfer of the original messages).


 On Tue, Jan 10, 2012 at 12:08 AM, Claudio Martella
 claudio.marte...@gmail.com wrote:
 it doesn't have to expand; k, the number of elements returned by
 the combiner, can still be smaller than n, the size of the messages
 parameter. as a first example, imagine your vertex receiving
 semantically different classes/types of messages that you want to
 summarize into separate messages, e.g. grouped by label or simply by
 source vertex, if the algorithm requires it; think of label
 propagation, to give just one example, or some sort of labeled PageRank.

 On Tue, Jan 10, 2012 at 3:05 AM, Avery Ching ach...@apache.org wrote:
 I agree that CA doesn't require it, however, I can't think of why I would
 want to use a combiner to expand the number of messages.  Can you?

 Avery


 On 1/9/12 3:57 PM, Jakob Homan wrote:

 In my opinion that means reducing to a single message or none at all.

 CA doesn't require this, however.  Hadoop's combiner interface, for
 instance, doesn't require a single  or no value to be returned; it has
 the same interface as a reducer, zero or more values.  Would adapting
 the semantics of Giraph's combiner to return a list of messages
 (possibly empty) make it more useful?

 On Mon, Jan 9, 2012 at 3:21 PM, Claudio Martella
 claudio.marte...@gmail.com  wrote:

 Yes, what you say is completely reasonable, you convinced me :)

 On Mon, Jan 9, 2012 at 11:28 PM, Avery Ching ach...@apache.org wrote:

 Combiners should be commutative and associative.  In my opinion that
 means
 reducing to a single message or none at all.  Can you think of a case
 when
 more than 1 message should be returned from a combiner?  I know that
 returning null isn't preferable in general, but I think that
 functionality
 (returning no messages), is nice to have and isn't a huge amount of work
 on
 our side.

 Avery


 On 1/9/12 12:13 PM, Claudio Martella wrote:

 To clarify, I was not discussing the possibility for combine to return
 null. I see why it would be useful, given that combine returns M,
 there's no other way to let combiner ask not to send any message,
 although i agree with Jakob, I also believe returning null should be
 avoided but only used, roughly, as an init value for a
 reference/pointer.
 Perhaps, we could, but i'm just thinking out loud here, let combine()
 return Iterable<M>, basically letting it define what to combine to
 ({0, 1, k } messages). It would be a powerful extension to the model,
 but maybe it's too much.

 As far as the size of the messages parameter, I agree with you that 0
 messages gives nothing to combine and it would be somehow awkward, it
 was more a matter of synching it with the other methods getting the
 messages parameter.
 Probably, having a more clear javadoc will do the job here.

 What do you think?

 On Mon, Jan 9, 2012 at 8:42 PM, Jakob Homan jgho...@gmail.com wrote:

 I'm not a big fan of returning null as it adds extra complexity to the
 calling code (null checks, or not, since people usually will forget
 them).  Avery is correct that combiners are application specific.  Is
 it conceivable that one would want to write a combiner that returned
 something for an input of no parameters, ie combining the empty list
 doesn't return the empty list?  I imagine for most combiners,
 combining a single message would result in that message.

 On Mon, Jan 9, 2012 at 11:28 AM, Avery Ching ach...@apache.org wrote:

 The javadoc for VertexCombiner#combine() is

  /**
   * Combines message values for a particular vertex index.
   *
   * @param

Re: on the semantics of the combiner

2012-01-10 Thread Jakob Homan
 would
 want to use a combiner to expand the number of messages.  Can you?

 Avery


 On 1/9/12 3:57 PM, Jakob Homan wrote:

 In my opinion that means reducing to a single message or none at
 all.

 CA doesn't require this, however.  Hadoop's combiner interface, for
 instance, doesn't require a single  or no value to be returned; it
 has
 the same interface as a reducer, zero or more values.  Would
 adapting
 the semantics of Giraph's combiner to return a list of messages
 (possibly empty) make it more useful?

 On Mon, Jan 9, 2012 at 3:21 PM, Claudio Martella
 claudio.marte...@gmail.com    wrote:

 Yes, what you say is completely reasonable, you convinced me :)

 On Mon, Jan 9, 2012 at 11:28 PM, Avery Ching ach...@apache.org wrote:

 Combiners should be commutative and associative.  In my opinion
 that
 means
 reducing to a single message or none at all.  Can you think of a
 case
 when
 more than 1 message should be returned from a combiner?  I know
 that
 returning null isn't preferable in general, but I think that
 functionality
 (returning no messages), is nice to have and isn't a huge amount
 of work
 on
 our side.

 Avery


 On 1/9/12 12:13 PM, Claudio Martella wrote:

 To clarify, I was not discussing the possibility for combine to
 return
 null. I see why it would be useful, given that combine returns M,
 there's no other way to let combiner ask not to send any message,
 although i agree with Jakob, I also believe returning null should
 be
 avoided but only used, roughly, as an init value for a
 reference/pointer.
 Perhaps, we could, but i'm just thinking out loud here, let
 combine()
 return Iterable<M>, basically letting it define what to combine
 to
 ({0, 1, k } messages). It would be a powerful extension to the
 model,
 but maybe it's too much.

 As far as the size of the messages parameter, I agree with you
 that 0
 messages gives nothing to combine and it would be somehow
 awkward, it
 was more a matter of synching it with the other methods getting
 the
 messages parameter.
 Probably, having a more clear javadoc will do the job here.

 What do you think?

 On Mon, Jan 9, 2012 at 8:42 PM, Jakob Homan jgho...@gmail.com wrote:

 I'm not a big fan of returning null as it adds extra complexity
 to the
 calling code (null checks, or not, since people usually will
 forget
 them).  Avery is correct that combiners are application
 specific.  Is
 it conceivable that one would want to write a combiner that
 returned
 something for an input of no parameters, ie combining the empty
 list
 doesn't return the empty list?  I imagine for most combiners,
 combining a single message would result in that message.

 On Mon, Jan 9, 2012 at 11:28 AM, Avery Ching ach...@apache.org wrote:

 The javadoc for VertexCombiner#combine() is

  /**
   * Combines message values for a particular vertex index.
   *
   * @param vertexIndex Index of the vertex getting these
 messages
   * @param msgList List of the messages to be combined
   * @return Message that is combined from {@link MsgList} or
 null if
 no
   *         message it to be sent
   * @throws IOException
   */

 I think we are somewhat vague on what a combiner can return to
 support
 various use cases.  A combiner should be particular to a
 particular
 compute() algorithm.  I think it should be legal to return null
 from
 a
 combiner, in that case, no message should be sent to that
 vertex.

 It seems like it would be an overhead to call a combiner when
 there
 are
 0
 messages.  I can't see a case where that would be useful.
  Perhaps we
 should
 change the javadoc to ensure that msgList must contain at least
 one
 message
 to have combine() being called.

 Avery


 On 1/9/12 5:37 AM, Claudio Martella wrote:

 Hi Sebastian,

 yes, that was my point, I agree completely with you.
 Fixing my test was not the issue, my question was whether we
 want to
 define explicitly the semantics of this scenario.
 Personally, I believe the combiner should be ready to receive
 0
 messages, as it's the case of BasicVertex::initialize(),
 putMessages()
 and compute(), and act accordingly.

 In the particular example, I believe the SimpleSumCombiner is
 bugged.
 It's true that the sum of no values is 0, but it's also true
 that
 the
 null return semantics of combine() is more suitable for this
 exact
 situation.


 On Mon, Jan 9, 2012 at 2:21 PM, Sebastian Schelter s...@apache.org wrote:

 I think we currently implicitly assume that there is at least
 one
 element in the Iterable passed to the combiner. The messaging
 code invokes the combiner only if at least one message for the
 target
 vertex
 has been sent.

 However, we should not rely on implicit implementation
 details but
 explicitly specify the semantics of combiners.

 --sebastian

 On 09.01.2012 13:29, Claudio Martella wrote:

 Hello list,

 for GIRAPH-45 I'm touching the incoming messages and hit an
 interesting problem with the combiner semantics.
 currently, my code fails testBspCombiner for the following

Re: on the semantics of the combiner

2012-01-10 Thread Jakob Homan
A composite object would essentially be a wrapper around a list and
introduce the need for all vertices to be ready to extract that list
at all times.  For instance, a combiner passed 10 messages may be able
to combine 7 of them but do nothing with the other three, leaving four
messages.  If we allow zero or one return elements, the combiner would
have to create a composite object with a list of those four messages,
whereas if we return a list, it just skips that step and returns the
four messages.  Additionally, the receiving vertex would have to
handle the possibility of a composite object every time even though
the combiner may or may not have been run during the superstep, or
even included in that job (since combiners are optional to the job
itself).  It would be better if one could write a Giraph application
that was completely agnostic of whether or not a combiner was
included.
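
A minimal Java sketch of that last point, under stated assumptions: the class and method names below are hypothetical and are not Giraph's API. The combiner stand-in merges only some of its input (7 of 10 messages here, leaving four), and the vertex-side code sees a plain Iterable either way, so it produces the same result whether or not the combiner ran.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names come from Giraph.
class CombinerAgnosticSketch {
    // Stand-in for a combiner that can merge only some messages: values
    // below 100 are summed into one message, the rest pass through as-is.
    static List<Integer> combinePartially(Iterable<Integer> messages) {
        List<Integer> out = new ArrayList<>();
        int mergedSum = 0;
        boolean mergedAny = false;
        for (int m : messages) {
            if (m < 100) {
                mergedSum += m;
                mergedAny = true;
            } else {
                out.add(m);
            }
        }
        if (mergedAny) {
            out.add(mergedSum);
        }
        return out;
    }

    // Stand-in for the vertex side: it only ever sees an Iterable of plain
    // messages, so it does not care whether a combiner ran.
    static int sumIncoming(Iterable<Integer> messages) {
        int total = 0;
        for (int m : messages) {
            total += m;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> raw = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 100, 200, 300);
        List<Integer> combined = combinePartially(raw);
        System.out.println(combined.size() + " messages after combining");
        System.out.println(sumIncoming(raw) == sumIncoming(combined));
    }
}
```

With a single-message return type, the four leftover messages would instead have to be wrapped in one composite object that every vertex must know how to unpack.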

On Tue, Jan 10, 2012 at 12:00 PM, Claudio Martella
claudio.marte...@gmail.com wrote:
 I believe the argument about not letting users shoot themselves in the
 foot doesn't stand :) Once you give them any API they have the power to do
 anything wrong, as they already can with Giraph (or anything else, for that
 matter), by designing an algorithm wrongly (which is what a wrong combiner
 would amount to). It's definitely true that a composite object could do the
 grouping (List<Group>), but I thought we were talking about simplifying
 life for users :). I think it would be more flexible (for the present and
 for the future) and also more elegant, but not necessarily a must (although
 it'd come practically for free).

 Very cool discussion.

 On Tue, Jan 10, 2012 at 8:30 PM, Jakob Homan jgho...@gmail.com wrote:
 Combiners can only modify the messages sent to a single vertex, so they 
 can't send messages to other vertices.
 Yeah, the more I've thought about this, the more problematic it would
 be.  These new messages may be generated upon arrival at the
 destination vertex (since combiners can be run on the receiving vertex
 before processing as well).  When would they be forwarded to their new
 destinations at that point?  It would be possible to get into a
 feedback loop of messages jumping around before a superstep could ever
 actually be done.

 That being said, our inability to think of a good application doesn't
 mean there won't be one in the future, and it's probably better to be
 more flexible than try to impose what appears optimal now.  The
 benefit of forcing 0 or 1 message from a combiner seems less than the
 flexibility of allowing another list of messages (which may or may not
 be the same number of elements as the original, less than, or even
 more than).

Good discussion (it's making me really think about this)!
 Agreed.


 On Tue, Jan 10, 2012 at 11:23 AM, Avery Ching ach...@apache.org wrote:
 The general idea of combiners is to reduce the number of messages sent.
  Combiners are purely an optimization and the application should work
 correctly without it (since it's never guaranteed to actually be called).
  Combiners can only modify the messages sent to a single vertex, so they
 can't send messages to other vertices.  Any other work (i.e. sending
 messages) should be done by the vertex in the compute() method.

 While I think that grouping behavior could actually be implemented within a
 message object (still reducing the number of messages to 1 or 0) I suppose
 that in some simple cases (i.e. grouping), it might be easier by doing it in
 the combiner as you both have mentioned?  The only thing I suppose I'm
 concerned about is letting users do something that is not optimal.
  Generally, expanding messages is not what you want your combiner to do.
  Also, since grouping behavior can be implemented in the message object, it
 forces users to avoid shooting themselves in the foot.

 Good discussion (it's making me really think about this)!

 Avery


 On 1/10/12 10:32 AM, Claudio Martella wrote:

 Ok, now i see where you're going. I guess that the thing here is that
 the combiner would act like (on its behalf) D, and to do so
 concretely it would probably need some local data related to D (edges
 values? vertexvalue?).
 I also think that k > n is also possible in principle and we could let
 the user decide whether to use this power or not, once/if we agree
 that letting the user send k messages in the combiner is useful (and
 the grouping behavior shown by the label propagation example should do
 so).

 On Tue, Jan 10, 2012 at 7:04 PM, Jakob Homan jgho...@gmail.com wrote:

 Those two messages would have gone to D, been expanded to, say, 4,
 which would then have been sent to, say, M.  This would save the
 sending of the two to D and send the 4 directly to M.  I'm not saying
 it's a great example, but it is legal.  This is of course assuming
 that combiners can generate messages bound for vertices other than the
 original destination, which I don't know if that has even been
 discussed.

 On Tue, Jan 10, 2012

[jira] [Created] (GIRAPH-122) Roll version back to 0.1

2012-01-09 Thread Jakob Homan (Created) (JIRA)
Roll version back to 0.1


 Key: GIRAPH-122
 URL: https://issues.apache.org/jira/browse/GIRAPH-122
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Assignee: Jakob Homan


Per the vote on the list, we're going to roll Giraph back to 0.1.





[jira] [Commented] (GIRAPH-122) Roll version back to 0.1

2012-01-09 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182721#comment-13182721
 ] 

Jakob Homan commented on GIRAPH-122:


yep, looks like I could.  

 Roll version back to 0.1
 

 Key: GIRAPH-122
 URL: https://issues.apache.org/jira/browse/GIRAPH-122
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.1.0

 Attachments: GIRAPH-122.patch


 Per the vote on the list, we're going to roll Giraph back to 0.1.





Re: on the semantics of the combiner

2012-01-09 Thread Jakob Homan
I'm not a big fan of returning null as it adds extra complexity to the
calling code (null checks, or not, since people usually will forget
them).  Avery is correct that combiners are application specific.  Is
it conceivable that one would want to write a combiner that returned
something for an input of no parameters, ie combining the empty list
doesn't return the empty list?  I imagine for most combiners,
combining a single message would result in that message.
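
A rough Java sketch of the two return conventions as seen from the calling code. The method names are hypothetical, chosen only for illustration, and this is not the framework's actual delivery path.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: the two return conventions seen from the caller.
class NullVersusEmptySketch {
    // Convention 1: the combiner returns a single message or null.
    // The caller has to remember the null check before delivering.
    static List<Integer> deliverNullable(Integer combined) {
        List<Integer> inbox = new ArrayList<>();
        if (combined != null) {
            inbox.add(combined);
        }
        return inbox;
    }

    // Convention 2: the combiner returns a (possibly empty) list.
    // The caller delivers whatever it gets; there is no special case.
    static List<Integer> deliverList(List<Integer> combined) {
        return new ArrayList<>(combined);
    }

    public static void main(String[] args) {
        System.out.println(deliverNullable(null));
        System.out.println(deliverList(Collections.<Integer>emptyList()));
    }
}
```

Both calls in main end up with an empty inbox; the difference is only in which side carries the special case.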

On Mon, Jan 9, 2012 at 11:28 AM, Avery Ching ach...@apache.org wrote:
 The javadoc for VertexCombiner#combine() is

  /**
   * Combines message values for a particular vertex index.
   *
   * @param vertexIndex Index of the vertex getting these messages
   * @param msgList List of the messages to be combined
   * @return Message that is combined from {@link MsgList} or null if no
   *         message is to be sent
   * @throws IOException
   */

 I think we are somewhat vague on what a combiner can return to support
 various use cases.  A combiner should be particular to a particular
 compute() algorithm.  I think it should be legal to return null from a
 combiner, in that case, no message should be sent to that vertex.

 It seems like it would be an overhead to call a combiner when there are 0
 messages.  I can't see a case where that would be useful.  Perhaps we should
 change the javadoc to ensure that msgList must contain at least one message
 for combine() to be called.

 Avery


 On 1/9/12 5:37 AM, Claudio Martella wrote:

 Hi Sebastian,

 yes, that was my point, I agree completely with you.
 Fixing my test was not the issue, my question was whether we want to
 define explicitly the semantics of this scenario.
 Personally, I believe the combiner should be ready to receive 0
 messages, as is the case for BasicVertex::initialize(), putMessages()
 and compute(), and act accordingly.

 In the particular example, I believe the SimpleSumCombiner is buggy.
 It's true that the sum of no values is 0, but it's also true that the
 null return semantics of combine() is more suitable for this exact
 situation.


 On Mon, Jan 9, 2012 at 2:21 PM, Sebastian Schelter s...@apache.org wrote:

 I think we currently implicitly assume that there is at least one
 element in the Iterable passed to the combiner. The messaging code
 invokes the combiner only if at least one message for the target vertex
 has been sent.

 However, we should not rely on implicit implementation details but
 explicitly specify the semantics of combiners.

 --sebastian

 On 09.01.2012 13:29, Claudio Martella wrote:

 Hello list,

 for GIRAPH-45 I'm touching the incoming messages and hit an
 interesting problem with the combiner semantics.
 currently, my code fails testBspCombiner for the following reason:

 SimpleSumCombiner::compute() returns a value even if there are no
 messages in the iterator (in this case it returns 0) and for this
 reason the vertices get activated at each superstep.

 At each superstep, under-the-hood, I pass the combiner for each vertex
 an Iterable, which can be empty:

     public Iterable<M> getMessages(I vertexId) {
       Iterable<M> messages = inMessages.getMessages(vertexId);
       if (combiner != null) {
               M combinedMsg;
               try {
                       // messages may be empty here, which is the case in question
                       combinedMsg = combiner.combine(vertexId, messages);
               } catch (IOException e) {
                       throw new RuntimeException("could not combine", e);
               }
               if (combinedMsg != null) {
                       List<M> tmp = new ArrayList<M>(1);
                       tmp.add(combinedMsg);
                       messages = tmp;
               } else {
                       // null from the combiner means no message is delivered
                       messages = new ArrayList<M>(0);
               }
       }
       return messages;
     }

 The Iterable returned by this method is passed to
 basicVertex.putMessages() right before the compute().
 Now, the question is: who's wrong? The combiner code that returns a
 sum of 0 over no values, or the framework that calls the combiner with
 0 messages?
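
One possible reading of the combiner-side answer, sketched in Java under stated assumptions: the class below is a hypothetical stand-in, not Giraph's SimpleSumCombiner, and its signature is simplified. It returns null for an empty Iterable, so no message is delivered and the vertex is not activated by a spurious 0.

```java
import java.util.Arrays;
import java.util.Collections;

// Hypothetical stand-in for a sum combiner; not Giraph's real class.
class SumCombinerSketch {
    // Returns the sum of the messages, or null when there is nothing to
    // combine, so the caller can skip delivery instead of sending a 0.
    static Integer combine(long vertexId, Iterable<Integer> messages) {
        Integer sum = null;
        for (int m : messages) {
            sum = (sum == null) ? m : sum + m;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(combine(1L, Arrays.asList(1, 2, 3)));
        System.out.println(combine(1L, Collections.<Integer>emptyList()));
    }
}
```

The alternative answer, having the framework never call combine() with zero messages, would make the empty-input branch unreachable rather than wrong.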








Re: on the semantics of the combiner

2012-01-09 Thread Jakob Homan
 In my opinion that means reducing to a single message or none at all.
CA doesn't require this, however.  Hadoop's combiner interface, for
instance, doesn't require a single  or no value to be returned; it has
the same interface as a reducer, zero or more values.  Would adapting
the semantics of Giraph's combiner to return a list of messages
(possibly empty) make it more useful?
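
To make the proposal concrete, here is a hedged Java sketch of what a list-returning combiner could look like. The ListCombiner interface, the Labeled message type, and BY_LABEL are all assumptions made up for illustration; they are not part of Giraph or Hadoop. The example groups messages by a label and sums the weights, so n messages in yield one message per distinct label out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a list-returning combiner; all names are assumptions.
class ListCombinerSketch {
    interface ListCombiner<I, M> {
        // Returns zero or more messages for the given vertex.
        List<M> combine(I vertexId, Iterable<M> messages);
    }

    // A message carrying a label and a weight.
    static final class Labeled {
        final String label;
        final int weight;

        Labeled(String label, int weight) {
            this.label = label;
            this.weight = weight;
        }
    }

    // Groups by label and sums the weights, keeping first-seen label order.
    static final ListCombiner<Long, Labeled> BY_LABEL = (vertexId, messages) -> {
        Map<String, Integer> sums = new LinkedHashMap<>();
        for (Labeled m : messages) {
            sums.merge(m.label, m.weight, Integer::sum);
        }
        List<Labeled> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : sums.entrySet()) {
            out.add(new Labeled(e.getKey(), e.getValue()));
        }
        return out;
    };

    public static void main(String[] args) {
        List<Labeled> in = Arrays.asList(
            new Labeled("a", 1), new Labeled("b", 2), new Labeled("a", 3));
        for (Labeled m : BY_LABEL.combine(7L, in)) {
            System.out.println(m.label + "=" + m.weight);
        }
    }
}
```

An empty input simply produces an empty list, which covers the "no message" case without null.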

On Mon, Jan 9, 2012 at 3:21 PM, Claudio Martella
claudio.marte...@gmail.com wrote:
 Yes, what you say is completely reasonable, you convinced me :)

 On Mon, Jan 9, 2012 at 11:28 PM, Avery Ching ach...@apache.org wrote:
 Combiners should be commutative and associative.  In my opinion that means
 reducing to a single message or none at all.  Can you think of a case when
 more than 1 message should be returned from a combiner?  I know that
 returning null isn't preferable in general, but I think that functionality
 (returning no messages), is nice to have and isn't a huge amount of work on
 our side.

 Avery


 On 1/9/12 12:13 PM, Claudio Martella wrote:

 To clarify, I was not discussing the possibility for combine to return
 null. I see why it would be useful, given that combine returns M,
 there's no other way to let combiner ask not to send any message,
 although i agree with Jakob, I also believe returning null should be
 avoided but only used, roughly, as an init value for a
 reference/pointer.
 Perhaps, we could, but i'm just thinking out loud here, let combine()
 return Iterable<M>, basically letting it define what to combine to
 ({0, 1, k } messages). It would be a powerful extension to the model,
 but maybe it's too much.

 As far as the size of the messages parameter, I agree with you that 0
 messages gives nothing to combine and it would be somehow awkward, it
 was more a matter of synching it with the other methods getting the
 messages parameter.
 Probably, having a more clear javadoc will do the job here.

 What do you think?

 On Mon, Jan 9, 2012 at 8:42 PM, Jakob Homan jgho...@gmail.com wrote:

 I'm not a big fan of returning null as it adds extra complexity to the
 calling code (null checks, or not, since people usually will forget
 them).  Avery is correct that combiners are application specific.  Is
 it conceivable that one would want to write a combiner that returned
 something for an input of no parameters, ie combining the empty list
 doesn't return the empty list?  I imagine for most combiners,
 combining a single message would result in that message.

 On Mon, Jan 9, 2012 at 11:28 AM, Avery Ching ach...@apache.org wrote:

 The javadoc for VertexCombiner#combine() is

  /**
   * Combines message values for a particular vertex index.
   *
   * @param vertexIndex Index of the vertex getting these messages
   * @param msgList List of the messages to be combined
   * @return Message that is combined from {@link MsgList} or null if no
   *         message it to be sent
   * @throws IOException
   */

 I think we are somewhat vague on what a combiner can return to support
 various use cases.  A combiner should be particular to a particular
 compute() algorithm.  I think it should be legal to return null from a
 combiner, in that case, no message should be sent to that vertex.

 It seems like it would be an overhead to call a combiner when there are
 0
 messages.  I can't see a case where that would be useful.  Perhaps we
 should
 change the javadoc to ensure that msgList must contain at least one
 message
 to have combine() being called.

 Avery


 On 1/9/12 5:37 AM, Claudio Martella wrote:

 Hi Sebastian,

 yes, that was my point, I agree completely with you.
 Fixing my test was not the issue, my question was whether we want to
 define explicitly the semantics of this scenario.
 Personally, I believe the combiner should be ready to receive 0
 messages, as it's the case of BasicVertex::initialize(), putMessages()
 and compute(), and act accordingly.

 In the particular example, I believe the SimpleSumCombiner is bugged.
 It's true that the sum of no values is 0, but it's also true that the
 null return semantics of combine() is more suitable for this exact
 situation.


 On Mon, Jan 9, 2012 at 2:21 PM, Sebastian Schelter s...@apache.org wrote:

 I think we currently implicitly assume that there is at least one
 element in the Iterable passed to the combiner. The messaging code
 invokes the combiner only if at least one message for the target
 vertex
 has been sent.

 However, we should not rely on implicit implementation details but
 explicitly specify the semantics of combiners.

 --sebastian

 On 09.01.2012 13:29, Claudio Martella wrote:

 Hello list,

 for GIRAPH-45 I'm touching the incoming messages and hit an
 interesting problem with the combiner semantics.
 currently, my code fails testBspCombiner for the following reason:

 SimpleSumCombiner::compute() returns a value even if there are no
 messages in the iterator (in this case it returns 0) and for this
 reason the vertices get activated at each 

[jira] [Commented] (GIRAPH-119) VertexCombiner should work on Iterable<M> instead of List<M>

2012-01-06 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181522#comment-13181522
 ] 

Jakob Homan commented on GIRAPH-119:


+1.  The change log isn't usually modified as part of the patch, but as part of 
the commit, although I don't see a reason it would hurt, except perhaps 
conflicts in the file? 

 VertexCombiner should work on Iterable<M> instead of List<M>
 

 Key: GIRAPH-119
 URL: https://issues.apache.org/jira/browse/GIRAPH-119
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Affects Versions: 0.70.0
Reporter: Claudio Martella
Assignee: Claudio Martella
 Attachments: GIRAPH-119.diff


 Currently VertexCombiner expects a List<M>. It should be refactored to 
 Iterable<M> to sync with the Iterable-based BasicVertex message logic.





Re: [jira] [Commented] (GIRAPH-119) VertexCombiner should work on Iterable<M> instead of List<M>

2012-01-06 Thread Jakob Homan
Maybe if the changelog got modified (unlikely, I know).  It doesn't hurt
anything.  Also, non-committer contributors shouldn't modify the log,
since they don't know who may commit their code (i.e. "bob via
joe").

On Fri, Jan 6, 2012 at 11:07 AM, Claudio Martella
claudio.marte...@gmail.com wrote:
 Ok, I didn't know. I usually do it as it requires 0-knowledge by a second 
 hand.

 Btw, why would it conflict?

 On Fri, Jan 6, 2012 at 8:05 PM, Jakob Homan (Commented) (JIRA)
 j...@apache.org wrote:

    [ 
 https://issues.apache.org/jira/browse/GIRAPH-119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181522#comment-13181522
  ]

 Jakob Homan commented on GIRAPH-119:
 

 +1.  The change log isn't usually modified as part of the patch, but as part 
 of the commit, although I don't see a reason it would hurt, except perhaps 
 conflicts in the file?

 VertexCombiner should work on Iterable<M> instead of List<M>
 

                 Key: GIRAPH-119
                 URL: https://issues.apache.org/jira/browse/GIRAPH-119
             Project: Giraph
          Issue Type: Improvement
          Components: graph
    Affects Versions: 0.70.0
            Reporter: Claudio Martella
            Assignee: Claudio Martella
         Attachments: GIRAPH-119.diff


 Currently VertexCombiner expects a List<M>. It should be refactored to 
 Iterable<M> to sync with the Iterable-based BasicVertex message logic.






 --
    Claudio Martella
    claudio.marte...@gmail.com


Re: Added stub Incubator report for January 2012

2012-01-04 Thread Jakob Homan
+1

On Wed, Jan 4, 2012 at 11:37 AM, Owen O'Malley o...@hortonworks.com wrote:
 On Wed, Jan 4, 2012 at 10:54 AM, Avery Ching ach...@apache.org wrote:

 Thanks Chris.  Looks good to me.


 It looks good to me too.

 -- Owen




[jira] [Commented] (GIRAPH-111) Refactor I/O to be independent of Map/Reduce

2011-12-20 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173537#comment-13173537
 ] 

Jakob Homan commented on GIRAPH-111:


bq. I'm not clear on why this is necessary.
I agree.  Hadoop's file formats, etc. are designed to be exceedingly forgiving 
and flexible as to the underlying storage mechanism.  Can you point to where 
they're lacking for Mesos' case?

bq. We could also copy out the relevant Hadoop I/O classes (InputFormat, 
OutputFormat, etc) into Giraph, rename their packages, and begin reworking them 
in an appropriate way to better suit Giraph.
-1.  Therein lies madness.


 Refactor I/O to be independent of Map/Reduce
 

 Key: GIRAPH-111
 URL: https://issues.apache.org/jira/browse/GIRAPH-111
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Reporter: Ed Kohlwey

 The I/O mechanisms should probably be abstracted entirely from Map/Reduce in 
 order to support making Giraph an independent framework.





[jira] [Commented] (GIRAPH-110) Add guide to setup the enviroment for running the unit tests in a pseudo-distributed hadoop instance

2011-12-20 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173631#comment-13173631
 ] 

Jakob Homan commented on GIRAPH-110:


Sorry to be late on this one, but I'd been meaning to ask if we should retire 
most of the README content in favor of the site documentation?  The content 
between the two was originally duplicated and is starting to drift...

 Add guide to setup the enviroment for running the unit tests in a 
 pseudo-distributed hadoop instance
 

 Key: GIRAPH-110
 URL: https://issues.apache.org/jira/browse/GIRAPH-110
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.70.0
Reporter: Sebastian Schelter
Assignee: Sebastian Schelter
Priority: Minor
 Fix For: 0.70.0

 Attachments: GIRAPH-110.2.patch, GIRAPH-110.patch


 Giraph should provide a small guide for setting up the local environment to 
 run the unit tests in a pseudo-distributed hadoop instance as there are some 
 non-obvious hurdles to take.





[jira] [Commented] (GIRAPH-113) Change cast to Vertex used in prepareSuperstep() to BasicVertex

2011-12-20 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173921#comment-13173921
 ] 

Jakob Homan commented on GIRAPH-113:


+1 (grumbling about GIRAPH-83)

 Change cast to Vertex used in prepareSuperstep() to BasicVertex
 ---

 Key: GIRAPH-113
 URL: https://issues.apache.org/jira/browse/GIRAPH-113
 Project: Giraph
  Issue Type: Bug
Reporter: Yuanyuan Tian
Assignee: Avery Ching
Priority: Minor
 Attachments: GIRAPH-113.patch


 Hi,
 I decided to use LongDoubleFloatDoubleVertex in a graph algorithm because it 
 uses more compact and efficient mahout collections. However I run into an 
 error when running the algorithm:
 java.lang.ClassCastException: 
 org.apache.giraph.graph.LongDoubleFloatDoubleVertex cannot be cast to 
 org.apache.giraph.graph.Vertex
 at 
 org.apache.giraph.comm.BasicRPCCommunications.prepareSuperstep(BasicRPCCommunications.java:1016)
 at 
 org.apache.giraph.graph.BspServiceWorker.startSuperstep(BspServiceWorker.java:843)
 at org.apache.giraph.graph.GraphMapper.map(GraphMapper.java:569)
 at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:728)
 ... 7 more
 Basically, the problem is that in BasicRPCCommunications.prepareSuperStep(), 
 the LongDoubleFloatDoubleVertex are cast to Vertex in the following code 
 fragment. But LongDoubleFloatDoubleVertex inherits from BasicVertex instead 
 of Vertex.
 if (vertex != null) {
   ((MutableVertex<I, V, E, M>) vertex).setVertexId(vertexIndex);
   partition.putVertex((Vertex<I, V, E, M>) vertex);
 } else if (originalVertex != null) {
   partition.removeVertex(originalVertex.getVertexId());
 }
 I did a simple change: cast LongDoubleFloatDoubleVertex to BasicVertex. The 
 problem went away, and the algorithm finished without any error. But I am not 
 sure this change has any implication to other parts of the code. So, I hope 
 to get some comments from the Giraph developers.
 if (vertex != null) {
   ((MutableVertex<I, V, E, M>) vertex).setVertexId(vertexIndex);
   partition.putVertex((BasicVertex<I, V, E, M>) vertex);
 } else if (originalVertex != null) {
   partition.removeVertex(originalVertex.getVertexId());
 }
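
The failure mode described above can be reproduced in isolation. The sketch below is not Giraph code: the classes are simplified, hypothetical stand-ins with the generics dropped, and the assumption that Vertex and LongDoubleFloatDoubleVertex are siblings under BasicVertex is inferred from the report, not checked against the source tree.

```java
// Simplified stand-ins for the types named in the report (hypothetical,
// no Giraph dependency). Only the inheritance shape matters here.
abstract class BasicVertex {
    abstract String name();
}

// Assumed sibling of LongDoubleFloatDoubleVertex under BasicVertex.
class Vertex extends BasicVertex {
    String name() { return "Vertex"; }
}

class LongDoubleFloatDoubleVertex extends BasicVertex {
    String name() { return "LongDoubleFloatDoubleVertex"; }
}

class CastDemo {
    // Mirrors the original cast: fails for any BasicVertex that is not a Vertex.
    static boolean castToVertexFails(BasicVertex v) {
        try {
            Vertex ignored = (Vertex) v;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // Mirrors the proposed change: cast only as far as the common base type.
    static String useAsBasicVertex(Object v) {
        return ((BasicVertex) v).name();
    }

    public static void main(String[] args) {
        BasicVertex v = new LongDoubleFloatDoubleVertex();
        System.out.println(castToVertexFails(v));
        System.out.println(useAsBasicVertex(v));
    }
}
```

Casting to the narrowest type that the call site actually needs keeps the code working for every implementation of the base type, which is the effect of the one-line change proposed in the report.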





[jira] [Commented] (GIRAPH-57) Add new RPC call (putVertexIdMessagesList) to batch putMsgList RPCs together

2011-12-15 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-57?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170605#comment-13170605
 ] 

Jakob Homan commented on GIRAPH-57:
---

Can we post the final patch, along with the "I give this to Apache" button?

 Add new RPC call (putVertexIdMessagesList) to batch putMsgList RPCs together
 

 Key: GIRAPH-57
 URL: https://issues.apache.org/jira/browse/GIRAPH-57
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Assignee: Avery Ching
 Attachments: GIRAPH-57.diff


 Right now messages are sent to a vertex one at a time.  It would be good to 
 have a putMsgs call that could send messages to multiple vertices (all hosted 
 on the same worker).  We'd save a huge number of individual RPC calls at the 
 expense of having smaller calls with larger payloads.
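
The batching idea can be sketched without any RPC layer. The class and method names below are hypothetical and messages are plain strings for brevity; the sketch only shows grouping per-vertex messages so that a single call could carry all of them to one worker.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MessageBatcher {
    // Groups (vertexId, message) pairs by destination vertex so that one
    // call can carry the messages for many vertices on the same worker.
    static Map<Long, List<String>> batch(long[] vertexIds, String[] messages) {
        if (vertexIds.length != messages.length) {
            throw new IllegalArgumentException("vertexIds and messages differ in length");
        }
        Map<Long, List<String>> batched = new LinkedHashMap<>();
        for (int i = 0; i < vertexIds.length; i++) {
            batched.computeIfAbsent(vertexIds[i], id -> new ArrayList<>()).add(messages[i]);
        }
        return batched;
    }

    public static void main(String[] args) {
        // Three individual sends collapse into one payload with two entries.
        Map<Long, List<String>> payload =
            batch(new long[] {1L, 2L, 1L}, new String[] {"a", "b", "c"});
        System.out.println(payload);
    }
}
```

The trade-off is the one stated in the issue: fewer calls, each with a larger payload.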





[jira] [Commented] (GIRAPH-100) Data input sampling and testing improvements

2011-12-01 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161213#comment-13161213
 ] 

Jakob Homan commented on GIRAPH-100:


Please avoid formatting changes as part of code change patches.  They blow up 
the size of the patch and introduce a lot of "what's the difference between 
these lines? Did anything change that needs to be reviewed?" questions.  For 
instance, most of the changes in SimpleCheckpointVertex appear to be spurious.

* What's the point of the changes in TextVertexInputFormat method visibility? 
Are they related to this patch?
* We're throwing a lot of Stringly typed exceptions. For more robust error 
handling and recovery, it may be good to strongly type these instead.
* re: SuperstepHashPartitionerFactory. Moving it out of test and into the 
example directory seems a bit counterproductive to me.  It's a pathological 
implementation; wouldn't it be better to provide a more useful example, rather 
than one that's explicitly not meant to be used?





 Data input sampling and testing improvements
 

 Key: GIRAPH-100
 URL: https://issues.apache.org/jira/browse/GIRAPH-100
 Project: Giraph
  Issue Type: New Feature
  Components: graph
Reporter: Avery Ching
Assignee: Avery Ching
 Attachments: GIRAPH-100.patch


 It would be really nice to help debug an application by limiting the input 
 data (% of input splits, max vertices per input split).  Also, it would be 
 nice for the workers to provide a little more debugging info on how far along 
 they are with processing the input data.





[jira] [Commented] (GIRAPH-100) Data input sampling and testing improvements

2011-12-01 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161236#comment-13161236
 ] 

Jakob Homan commented on GIRAPH-100:


bq. Which exceptions are you referring to?
{noformat}+throw new IllegalStateException(
+"prepareSuperstep: Impossible that this worker " +
+service.getWorkerInfo() + " was sent " +
+entry.getValue().size() + " message(s) with " +
+"vertex id " + entry.getKey() +
+" when it does not own this partition.  It should " +
+"have gone to partition owner " +
+service.getVertexPartitionOwner(entry.getKey()) +
+".  The partition owners are " +
+service.getPartitionOwners());{noformat}
{noformat}+throw new IllegalStateException(
+"prepareSuperstep: Impossible to not remove " +
+vertex);{noformat}
{noformat}+throw new IllegalStateException(
+"coordinateSuperstep: Worker failed during input split " +
+"(currently not supported)");{noformat}
{noformat}+throw new IllegalStateException(
+"barrierOnWorkerList: KeeperException - " +
+"Couldn't get " + workerInfoHealthyPath, e);{noformat}
{noformat}+throw new IllegalStateException(
+"loadVertices: KeeperException on " +
+inputSplitFinishedPath, e);{noformat}
etc. These are all specific types of exceptions being wrapped in 
IllegalStateException.  We'll likely want to catch and handle them later in an 
effort to be more robust. It'd be better if these existed as their own types, 
so we don't have to try to tease out the details later.
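
As a rough illustration of what "their own types" could look like, here is a minimal sketch. Every class name in it is hypothetical and nothing here is taken from the patch; it only shows how a small hierarchy lets a caller pick a recovery strategy by type instead of parsing message strings.

```java
// Hypothetical base type for framework errors (name is illustrative only).
class GiraphException extends RuntimeException {
    GiraphException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical: the coordination service could not be reached or queried.
class CoordinationException extends GiraphException {
    CoordinationException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical: a worker died in a phase where recovery is not supported.
class WorkerFailedException extends GiraphException {
    WorkerFailedException(String message, Throwable cause) {
        super(message, cause);
    }
}

class TypedExceptionDemo {
    static void loadVertices(boolean coordinationFails) {
        if (coordinationFails) {
            throw new CoordinationException(
                "loadVertices: coordination service error",
                new RuntimeException("connection lost"));
        }
    }

    // The caller chooses a strategy from the exception type alone.
    static String handle(boolean coordinationFails) {
        try {
            loadVertices(coordinationFails);
            return "ok";
        } catch (CoordinationException e) {
            return "retry";
        } catch (GiraphException e) {
            return "abort";
        }
    }

    public static void main(String[] args) {
        System.out.println(handle(false));
        System.out.println(handle(true));
    }
}
```

With a bare IllegalStateException the same caller would have to inspect the message text to tell a retryable coordination failure from a fatal one.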
bq. Can we do that in another issue? I agree that it isn't a good example, but 
it's still a good test since it guarantees partition movement between workers.
I have trouble putting something that we agree is a bad example into the 
example directory. The issue is that it's not actually a unit test, since it 
involves Hadoop.  That makes it an integration test.  The best answer is to 
have integration tests in their own directory (and either bundled with the main 
jar or a separate integration test directory).  Since this verifies important 
behavior, the basic test itself should run without Hadoop, via mocking, while 
keeping the ability to run it as an integration test under a real Hadoop.

 Data input sampling and testing improvements
 

 Key: GIRAPH-100
 URL: https://issues.apache.org/jira/browse/GIRAPH-100
 Project: Giraph
  Issue Type: New Feature
  Components: graph
Reporter: Avery Ching
Assignee: Avery Ching
 Attachments: GIRAPH-100.patch


 It would be really nice to help debug an application by limiting the input 
 data (% of input splits, max vertices per input split).  Also, it would be 
 nice for the workers to provide a little more debugging info on how far along 
 they are with processing the input data.





[jira] [Commented] (GIRAPH-100) Data input sampling and testing improvements

2011-12-01 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13161264#comment-13161264
 ] 

Jakob Homan commented on GIRAPH-100:


bq. In the future, I'll file separate issues for formatting cleanup.
Great. This also gives us a steady supply of newbie JIRAs, since the latest 
batch is almost used up.

bq. We should file another JIRA to create a GiraphException and the various 
types I suppose. Or do you want me to do it in this JIRA?
Either in this JIRA, or put the current ones in with FIXME/TODO annotations so 
we can go back and fix them easily, and then immediately open a new JIRA.

bq. but not sure how mocking can verify the behavior in this case.
It may end up being a challenge, but it's a strong guard against building up a 
huge number of integration tests, calling them unit tests and then having tests 
that run for four, six or nine hours (see: every other Hadoop ecosystem 
project).  Being able to swap out the backing dependency from a mock to a real 
Hadoop cluster is a great way to test quickly (ie, often) as well as test 
reality (ie, against a real cluster).  I'll take a look at making sure we have 
infrastructure that is amenable to this.
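
One way to picture the swap is a narrow interface between the code under test and its backend. The sketch below is hypothetical throughout (SplitStore, InMemorySplitStore and SplitTracker are invented names, not existing classes): a unit test passes the in-memory fake, while an integration test would pass an implementation backed by a real cluster, which is not shown.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical seam between the logic under test and its backing service.
interface SplitStore {
    void markFinished(String splitId);
    boolean isFinished(String splitId);
}

// In-memory fake for fast unit tests; a cluster-backed implementation of
// the same interface would be used by integration tests.
class InMemorySplitStore implements SplitStore {
    private final Set<String> finished = new HashSet<>();

    public void markFinished(String splitId) {
        finished.add(splitId);
    }

    public boolean isFinished(String splitId) {
        return finished.contains(splitId);
    }
}

// The logic being tested only sees the interface.
class SplitTracker {
    private final SplitStore store;

    SplitTracker(SplitStore store) {
        this.store = store;
    }

    // Marks each split finished once and returns how many were newly marked.
    int finishAll(String... splitIds) {
        int newlyFinished = 0;
        for (String id : splitIds) {
            if (!store.isFinished(id)) {
                store.markFinished(id);
                newlyFinished++;
            }
        }
        return newlyFinished;
    }
}

class SplitTrackerDemo {
    public static void main(String[] args) {
        SplitTracker tracker = new SplitTracker(new InMemorySplitStore());
        System.out.println(tracker.finishAll("a", "b", "a"));
    }
}
```

The same test body can then run in milliseconds against the fake and, unchanged, against a real backend when that is worth the time.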

bq. we should file a separate JIRA for separating the tests into unittests 
(mocking, individual class tests) and integration tests, but integration tests 
can still be run locally and/or remote.
Can we go ahead and create test/integration as part of this JIRA and put 
SuperstepHashPartitionerFactory there? That way it doesn't go into the 
inappropriate examples directory but can still be bundled as part of the jar.  
The remaining partitioning can probably be done as part of GIRAPH-22.


 Data input sampling and testing improvements
 

 Key: GIRAPH-100
 URL: https://issues.apache.org/jira/browse/GIRAPH-100
 Project: Giraph
  Issue Type: New Feature
  Components: graph
Reporter: Avery Ching
Assignee: Avery Ching
 Attachments: GIRAPH-100.patch


 It would be really nice to help debug an application by limiting the input 
 data (% of input splits, max vertices per input split).  Also, it would be 
 nice for the workers to provide a little more debugging info on how far along 
 they are with processing the input data.





[jira] [Commented] (GIRAPH-45) Improve the way to keep outgoing messages

2011-11-28 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-45?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158734#comment-13158734
 ] 

Jakob Homan commented on GIRAPH-45:
---

It's certainly an intriguing idea to go with something like leveldb. This is 
obviously an area for lots of exploration and experimentation. As such, 
probably the best idea is to make the interface pluggable and keep a 
good-enough Java version as default.  It's probably time for a giraph-site.xml 
file to track these configuration possibilities.
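
A minimal sketch of what "pluggable with a Java default" might look like follows. Everything in it is an assumption made for illustration: the interface, the default implementation, and the configuration key are invented, and java.util.Properties stands in for whatever configuration object would actually be used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical plug-in point for holding outgoing messages.
interface OutgoingMessageStore {
    void add(String message);
    int size();
}

// Hypothetical "good enough" pure-Java default.
class InMemoryMessageStore implements OutgoingMessageStore {
    private final List<String> messages = new ArrayList<>();

    public void add(String message) {
        messages.add(message);
    }

    public int size() {
        return messages.size();
    }
}

class MessageStoreFactory {
    // Hypothetical configuration key; not an existing setting.
    static final String STORE_CLASS_KEY = "giraph.outgoingMessageStoreClass";

    // Instantiates the configured class, falling back to the Java default.
    static OutgoingMessageStore create(Properties conf) {
        String className =
            conf.getProperty(STORE_CLASS_KEY, InMemoryMessageStore.class.getName());
        try {
            return (OutgoingMessageStore)
                Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                "Cannot create message store " + className, e);
        }
    }

    public static void main(String[] args) {
        OutgoingMessageStore store = create(new Properties());
        store.add("hello");
        System.out.println(store.getClass().getSimpleName() + " " + store.size());
    }
}
```

An alternative backend, disk-backed or otherwise, would then only need to implement the interface and be named under the key.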

 Improve the way to keep outgoing messages
 -

 Key: GIRAPH-45
 URL: https://issues.apache.org/jira/browse/GIRAPH-45
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Reporter: Hyunsik Choi
Assignee: Hyunsik Choi

 As discussed in GIRAPH-12(http://goo.gl/CE32U), I think that there is a 
 potential problem to cause out of memory when the rate of message generation 
 is higher than the rate of message flush (or network bandwidth).
 To overcome this problem, we need a more eager strategy for message flushing or 
 some approach to spill messages to disk.
 The below link is Dmitriy's suggestion.
 https://issues.apache.org/jira/browse/GIRAPH-12?focusedCommentId=13116253&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13116253





[jira] [Commented] (GIRAPH-51) Provide unit testing tool for Giraph algorithms

2011-11-28 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-51?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13158743#comment-13158743
 ] 

Jakob Homan commented on GIRAPH-51:
---

+1.

 Provide unit testing tool for Giraph algorithms
 ---

 Key: GIRAPH-51
 URL: https://issues.apache.org/jira/browse/GIRAPH-51
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Assignee: Sebastian Schelter
 Attachments: GIRAPH-51-2.patch, GIRAPH-51-3.patch, GIRAPH-51.patch


 It would be nice to have a little tool, similar to MRUnit, that would allow 
 Giraph application writers to quickly unit test their algorithms.  The tool 
 could take a Vertex implementation, a set of input and expected output and 
 verify that after the specified number of supersteps, we've gotten what we 
 expect.





[jira] [Commented] (GIRAPH-99) Make AdjacencyListVertexReader and its constructor public

2011-11-22 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13155234#comment-13155234
 ] 

Jakob Homan commented on GIRAPH-99:
---

+1. Looks good.

 Make AdjacencyListVertexReader and its constructor public
 -

 Key: GIRAPH-99
 URL: https://issues.apache.org/jira/browse/GIRAPH-99
 Project: Giraph
  Issue Type: Wish
  Components: lib
Reporter: Kohei Ozaki
Priority: Minor
  Labels: patch
 Attachments: GIRAPH-99.diff


 Hi,
 I'd like to write a class inherited from AdjacencyListVertexReader
 to make a library using Giraph (like git.io/ALVR),
 but AdjacencyListVertexReader is a private class and its constructor is 
 private.
 I guess making it public is useful to handle a more complex input format
 specified by the data structure of algorithms.





[jira] [Resolved] (GIRAPH-99) Make AdjacencyListVertexReader and its constructor public

2011-11-22 Thread Jakob Homan (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-99?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan resolved GIRAPH-99.
---

   Resolution: Fixed
Fix Version/s: 0.70.0
 Assignee: Kohei Ozaki

I've committed this. Resolving as fixed. Thanks for the contribution, Kohei!

 Make AdjacencyListVertexReader and its constructor public
 -

 Key: GIRAPH-99
 URL: https://issues.apache.org/jira/browse/GIRAPH-99
 Project: Giraph
  Issue Type: Wish
  Components: lib
Reporter: Kohei Ozaki
Assignee: Kohei Ozaki
Priority: Minor
  Labels: patch
 Fix For: 0.70.0

 Attachments: GIRAPH-99.diff


 Hi,
 I'd like to write a class inherited from AdjacencyListVertexReader
 to make a library using Giraph (like git.io/ALVR),
 but AdjacencyListVertexReader is a private class and its constructor is 
 private.
 I guess making it public is useful to handle a more complex input format
 specified by the data structure of algorithms.





[jira] [Commented] (GIRAPH-84) Simplify boolean expressions in BspRecordReader

2011-11-20 Thread Jakob Homan (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-84?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13153951#comment-13153951
 ] 

Jakob Homan commented on GIRAPH-84:
---

The patch *must* be attached to the JIRA.  We need the little icon that says 
the contributor has given it to Apache.  Reviewboard is optional; the patch 
should always be attached to the JIRA first.

 Simplify boolean expressions in BspRecordReader
 ---

 Key: GIRAPH-84
 URL: https://issues.apache.org/jira/browse/GIRAPH-84
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Assignee: Shaunak Kashyap
  Labels: newbie

 Twice in BspRecordReader boolean expressions are evaluated with == and can be 
 simplified to just one liners or variable evaluation.
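
For readers unfamiliar with the pattern, here is a generic before/after sketch. It is not the BspRecordReader code; the class and method names are made up to show the kind of simplification meant.

```java
class BooleanSimplification {
    // Before: comparing a boolean with == and branching to return a literal.
    static boolean verbose(boolean seenRecord) {
        if (seenRecord == false) {
            return true;
        } else {
            return false;
        }
    }

    // After: the same result as a one-line expression.
    static boolean simplified(boolean seenRecord) {
        return !seenRecord;
    }

    public static void main(String[] args) {
        System.out.println(verbose(true) == simplified(true));
        System.out.println(verbose(false) == simplified(false));
    }
}
```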




