[jira] [Created] (GIRAPH-187) SequenceFileVertexInputFormat has WritableComparable as a bounded type for I

2012-04-18 Thread Jan van der Lugt (Created) (JIRA)
SequenceFileVertexInputFormat has WritableComparable as a bounded type for I
---

 Key: GIRAPH-187
 URL: https://issues.apache.org/jira/browse/GIRAPH-187
 Project: Giraph
  Issue Type: Bug
  Components: lib
Affects Versions: 0.2.0
Reporter: Jan van der Lugt
Priority: Minor


This is the first JIRA issue I have ever filed, so please let me know if I'm 
not doing this right.
Basically, SequenceFileVertexInputFormat has WritableComparable<I> as a 
bounded type for I, while the Hadoop serializable data types implement the 
raw type WritableComparable. I suspect this is why TextVertexInputFormat has 
only the raw WritableComparable as a bounded type for I, together with a 
SuppressWarnings("rawtypes") annotation. I think SequenceFileVertexInputFormat 
should follow the same style; otherwise it's not possible to use, for example, 
IntWritable as the vertex id type in a SequenceFileVertexInputFormat.
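A minimal sketch of why a self-referential bound rejects such types. To stay self-contained it uses hypothetical stand-ins: `Key<T>` plays the role of `WritableComparable<T>`, and `RawIntKey` plays an id type that implements it raw, as the report describes for Hadoop's own types. `StrictFormat` and `LenientFormat` are illustrative names, not Giraph classes.

```java
import java.util.List;

// Hypothetical stand-in for Hadoop's WritableComparable<T>.
interface Key<T> extends Comparable<T> { }

// Hypothetical id type implementing the interface raw (no type argument).
@SuppressWarnings("rawtypes")
class RawIntKey implements Key {
  final int value;

  RawIntKey(int value) {
    this.value = value;
  }

  @Override
  public int compareTo(Object other) {
    return Integer.compare(value, ((RawIntKey) other).value);
  }
}

// Self-referential bound: "new StrictFormat<RawIntKey>()" does not compile,
// because RawIntKey is a raw Key rather than a Key<RawIntKey>.
class StrictFormat<I extends Key<I>> { }

// Raw bound, in the style described for TextVertexInputFormat: accepts RawIntKey.
@SuppressWarnings("rawtypes")
class LenientFormat<I extends Key> {
  @SuppressWarnings("unchecked")
  I smallest(List<I> ids) {
    I best = ids.get(0);
    for (I id : ids) {
      // Unchecked call to the raw compareTo(Object), hence the annotation.
      if (id.compareTo(best) < 0) {
        best = id;
      }
    }
    return best;
  }
}
```

A type declared as `implements Key<MyKey>` satisfies both bounds; the raw bound trades some compile-time type safety (and an unchecked warning) for accepting raw implementations.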

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (GIRAPH-186) Improve concurrency of putVertexList

2012-04-17 Thread Bo Wang (Created) (JIRA)
Improve concurrency of putVertexList


 Key: GIRAPH-186
 URL: https://issues.apache.org/jira/browse/GIRAPH-186
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Affects Versions: 0.2.0
Reporter: Bo Wang
 Fix For: 0.2.0


It's pretty similar to GIRAPH-185. The whole inPartitionVertexMap is locked 
whenever putVertexList is called. We should allow multiple calls that add 
different partitions to the same worker to proceed at the same time.
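One way to avoid locking the whole map, sketched under assumptions: `PartitionVertexStore`, its methods, and the integer partition ids are hypothetical, and the sketch uses `ConcurrentHashMap.computeIfAbsent` (Java 8+), which is newer than the JDK of this report.

```java
import java.util.Collection;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch; PartitionVertexStore is not a Giraph class.
class PartitionVertexStore<V> {
  // One thread-safe queue per partition id instead of one map-wide lock.
  private final ConcurrentHashMap<Integer, ConcurrentLinkedQueue<V>> byPartition =
      new ConcurrentHashMap<>();

  void putVertexList(int partitionId, Collection<V> vertices) {
    // computeIfAbsent creates each partition's queue exactly once; calls for
    // different partitions are not serialized behind a single map-wide lock.
    byPartition
        .computeIfAbsent(partitionId, id -> new ConcurrentLinkedQueue<>())
        .addAll(vertices);
  }

  int size(int partitionId) {
    ConcurrentLinkedQueue<V> vertices = byPartition.get(partitionId);
    return vertices == null ? 0 : vertices.size();
  }
}
```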






[jira] [Created] (GIRAPH-185) Improve concurrency of putMsg / putMsgList

2012-04-17 Thread Bo Wang (Created) (JIRA)
Improve concurrency of putMsg / putMsgList
--

 Key: GIRAPH-185
 URL: https://issues.apache.org/jira/browse/GIRAPH-185
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Affects Versions: 0.2.0
Reporter: Bo Wang
 Fix For: 0.2.0


Currently in putMsg / putMsgList, a synchronized block is used to protect the 
whole transientInMessages when adding a new message. This lock serializes 
concurrent calls to putMsg/putMsgList and increases response time. We should 
use fine-grained locks to allow high concurrency in message communication.
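Lock striping is one common form of fine-grained locking. The sketch below is an assumption about how it could apply here, not Giraph's implementation: `StripedMessageStore` is a hypothetical class that spreads per-vertex message lists over a fixed number of independently locked stripes, so callers contend only when their destination ids fall in the same stripe.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of lock striping; StripedMessageStore is not a Giraph class.
class StripedMessageStore<I, M> {
  private final Object[] locks;
  private final List<Map<I, List<M>>> stripes;

  StripedMessageStore(int numStripes) {
    locks = new Object[numStripes];
    stripes = new ArrayList<>(numStripes);
    for (int i = 0; i < numStripes; i++) {
      locks[i] = new Object();
      stripes.add(new HashMap<>());
    }
  }

  private int stripeFor(I vertexId) {
    // floorMod keeps the index non-negative for negative hash codes.
    return Math.floorMod(vertexId.hashCode(), locks.length);
  }

  void putMsg(I vertexId, M msg) {
    int s = stripeFor(vertexId);
    synchronized (locks[s]) {  // only callers hitting the same stripe wait
      stripes.get(s).computeIfAbsent(vertexId, k -> new ArrayList<>()).add(msg);
    }
  }

  void putMsgList(I vertexId, List<M> msgs) {
    int s = stripeFor(vertexId);
    synchronized (locks[s]) {
      stripes.get(s).computeIfAbsent(vertexId, k -> new ArrayList<>()).addAll(msgs);
    }
  }

  /** Returns a copy of the messages for a vertex (empty if none). */
  List<M> getMessages(I vertexId) {
    int s = stripeFor(vertexId);
    synchronized (locks[s]) {
      List<M> msgs = stripes.get(s).get(vertexId);
      if (msgs == null) {
        return new ArrayList<>();
      }
      return new ArrayList<>(msgs);
    }
  }
}
```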





[jira] [Created] (GIRAPH-184) Upgrade to junit4

2012-04-14 Thread Devaraj K (Created) (JIRA)
Upgrade to junit4
-

 Key: GIRAPH-184
 URL: https://issues.apache.org/jira/browse/GIRAPH-184
 Project: Giraph
  Issue Type: Bug
Reporter: Devaraj K


Presently Giraph uses JUnit 3.8.1. We can upgrade to JUnit 4.





[jira] [Created] (GIRAPH-183) Add Claudio's FOSDEM presentation (slides and video) to the site

2012-04-13 Thread Claudio Martella (Created) (JIRA)
Add Claudio's FOSDEM presentation (slides and video) to the site


 Key: GIRAPH-183
 URL: https://issues.apache.org/jira/browse/GIRAPH-183
 Project: Giraph
  Issue Type: Improvement
  Components: site
Reporter: Claudio Martella
Assignee: Claudio Martella
Priority: Trivial


Presentation: 
http://prezi.com/9ake_klzwrga/apache-giraph-distributed-graph-processing-in-the-cloud/
Video: http://www.youtube.com/watch?v=3ZrqPEIPRe4, 
http://www.youtube.com/watch?v=BmRaejKGeDM





[jira] [Created] (GIRAPH-182) Provide SequenceFileVertexOutputFormat as an available OutputFormat

2012-04-10 Thread Pradeep Gollakota (Created) (JIRA)
Provide SequenceFileVertexOutputFormat as an available OutputFormat
---

 Key: GIRAPH-182
 URL: https://issues.apache.org/jira/browse/GIRAPH-182
 Project: Giraph
  Issue Type: New Feature
  Components: lib
Reporter: Pradeep Gollakota
Priority: Minor


SequenceFiles are heavily used in Hadoop. We should provide a 
SequenceFileVertexOutputFormat: since SequenceFileVertexInputFormat is already 
provided, it makes sense to also provide a mirroring OutputFormat.





[jira] [Created] (GIRAPH-181) Add Hadoop 1.0 profile to pom.xml

2012-04-10 Thread Eugene Koontz (Created) (JIRA)
Add Hadoop 1.0 profile to pom.xml
-

 Key: GIRAPH-181
 URL: https://issues.apache.org/jira/browse/GIRAPH-181
 Project: Giraph
  Issue Type: Improvement
  Components: build
Affects Versions: 0.2.0
Reporter: Eugene Koontz
Assignee: Eugene Koontz
 Fix For: 0.2.0


Hadoop 1.0.x is now considered the "current stable version" of Hadoop, 
according to http://hadoop.apache.org/common/releases.html#Download .

This JIRA is to add support within Giraph's maven profile for the 1.0.x Hadoop 
release. 







[jira] [Created] (GIRAPH-180) Publish SNAPSHOTs and released artifacts in the Maven repository

2012-04-10 Thread Paolo Castagna (Created) (JIRA)
Publish SNAPSHOTs and released artifacts in the Maven repository


 Key: GIRAPH-180
 URL: https://issues.apache.org/jira/browse/GIRAPH-180
 Project: Giraph
  Issue Type: Improvement
  Components: build
Affects Versions: 0.1.0
Reporter: Paolo Castagna
Priority: Minor


Currently Giraph uses Maven to drive its build.
However, neither Maven artifacts nor SNAPSHOTs are published in the Apache 
Maven repository or Maven Central.
It would be useful to publish Apache Giraph artifacts and SNAPSHOTs so that 
people can use Giraph without compiling it themselves.





[jira] [Created] (GIRAPH-179) BspServiceMaster's PathFilter can be simplified

2012-04-07 Thread Jakob Homan (Created) (JIRA)
BspServiceMaster's PathFilter can be simplified
---

 Key: GIRAPH-179
 URL: https://issues.apache.org/jira/browse/GIRAPH-179
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Priority: Trivial


{code}  /**
   * Only get the finalized checkpoint files
   */
  public static class FinalizedCheckpointPathFilter implements PathFilter {
    @Override
    public boolean accept(Path path) {
      if (path.getName().endsWith(
          BspService.CHECKPOINT_FINALIZED_POSTFIX)) {
        return true;
      }
      return false;
    }
  }{code}
we can simplify this, eliminating the if statement and just returning the 
result of {{endsWith()}}
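The simplified filter might look like the following. To stay self-contained it uses a hypothetical `NameFilter` interface in place of Hadoop's `PathFilter` and a file name instead of a `Path`; the `.finalized` value of the suffix constant is also an assumption.

```java
// Hypothetical stand-in for Hadoop's PathFilter, taking a file name instead of a Path.
interface NameFilter {
  boolean accept(String fileName);
}

class CheckpointFilters {
  // Assumed value, for illustration only.
  static final String CHECKPOINT_FINALIZED_POSTFIX = ".finalized";

  /** Only accept finalized checkpoint files. */
  static class FinalizedCheckpointNameFilter implements NameFilter {
    @Override
    public boolean accept(String fileName) {
      // Return the boolean result of endsWith() directly; no if statement needed.
      return fileName.endsWith(CHECKPOINT_FINALIZED_POSTFIX);
    }
  }
}
```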





[jira] [Created] (GIRAPH-178) TestPredicate lock has lots of boolean expressions to be simplified

2012-04-07 Thread Jakob Homan (Created) (JIRA)
TestPredicate lock has lots of boolean expressions to be simplified
---

 Key: GIRAPH-178
 URL: https://issues.apache.org/jira/browse/GIRAPH-178
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Priority: Trivial


TestPredicateLock.java has several instances of 
{code}assertTrue(gotPredicate == false);{code} (or {{== true}}) that can be 
simplified to more idiomatic Java.





[jira] [Created] (GIRAPH-177) SimplePageRankVertex has two redundant casts

2012-04-07 Thread Jakob Homan (Created) (JIRA)
SimplePageRankVertex has two redundant casts


 Key: GIRAPH-177
 URL: https://issues.apache.org/jira/browse/GIRAPH-177
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Priority: Trivial


{code}DoubleWritable maxPagerank =
(DoubleWritable) maxAggreg.getAggregatedValue();
LOG.info("aggregatedMaxPageRank=" + maxPagerank.get());
DoubleWritable minPagerank =
(DoubleWritable) minAggreg.getAggregatedValue();
LOG.info("aggregatedMinPageRank=" + minPagerank.get());{code}
Both MinAggregator and MaxAggregator are already parameterized on 
DoubleWritable, so it's not necessary to cast their functions' results.
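A self-contained sketch of why the casts are redundant, using hypothetical simplified stand-ins (`Aggregator<A>`, `MaxAggregator`, and `Double` instead of `DoubleWritable`). Assuming `maxAggreg` is declared with its concrete aggregator type, `getAggregatedValue()` already returns the type argument; only a variable of raw type `Aggregator` would need a cast.

```java
// Hypothetical, simplified stand-in for Giraph's Aggregator interface.
interface Aggregator<A> {
  void aggregate(A value);

  A getAggregatedValue();
}

// Hypothetical max aggregator parameterized on Double.
class MaxAggregator implements Aggregator<Double> {
  private double max = Double.NEGATIVE_INFINITY;

  @Override
  public void aggregate(Double value) {
    max = Math.max(max, value);
  }

  @Override
  public Double getAggregatedValue() {
    // The return type is fixed by the type argument, so callers need no cast.
    return max;
  }
}
```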





[jira] [Created] (GIRAPH-176) BasicRPCCommunications has unnecessary cast of Vertex

2012-04-07 Thread Jakob Homan (Created) (JIRA)
BasicRPCCommunications has unnecessary cast of Vertex
-

 Key: GIRAPH-176
 URL: https://issues.apache.org/jira/browse/GIRAPH-176
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Priority: Minor


BasicRPCCommunications.java, 1224:
{code}  BasicVertex vertex =
  vertexResolver.resolve(vertexIndex,
  originalVertex,
  vertexMutations,
  messages);{code}
and then a few lines later at 1248:
{code}partition.putVertex((BasicVertex) vertex);{code}
vertex gets cast to its own type. This cast can be removed.





[jira] [Created] (GIRAPH-175) Replace manual array copy to utility method call

2012-04-07 Thread Jakob Homan (Created) (JIRA)
Replace manual array copy to utility method call


 Key: GIRAPH-175
 URL: https://issues.apache.org/jira/browse/GIRAPH-175
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Priority: Trivial


{code}  String[] zkJavaOptsArray = zkJavaOptsString.split(" ");
  if (zkJavaOptsArray != null) {
    for (String javaOpt : zkJavaOptsArray) {
      commandList.add(javaOpt);
    }
  }{code}
Rather than writing the loop ourselves, we could call Collections.addAll, 
which is simpler and likely faster (though speed hardly matters with such a 
small array). Mainly, it is cleaner.
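A sketch of the suggested change. `ZkCommandBuilder.appendJavaOpts` is a hypothetical wrapper around the two relevant lines; note that `String.split` never returns `null`, so the null check in the original can be dropped as well.

```java
import java.util.Collections;
import java.util.List;

class ZkCommandBuilder {
  // Hypothetical helper; the name and signature are for illustration only.
  static List<String> appendJavaOpts(List<String> commandList, String zkJavaOptsString) {
    // String.split never returns null, so no null check is required.
    Collections.addAll(commandList, zkJavaOptsString.split(" "));
    return commandList;
  }
}
```

Like the original loop, `split(" ")` yields empty tokens when options are separated by more than one space.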





[jira] [Created] (GIRAPH-174) ConnectedComponentsVertex for loops can be replaced with for-each loops

2012-04-07 Thread Jakob Homan (Created) (JIRA)
ConnectedComponentsVertex for loops can be replaced with for-each loops
---

 Key: GIRAPH-174
 URL: https://issues.apache.org/jira/browse/GIRAPH-174
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Priority: Trivial


{code}// First superstep is special, because we can simply look at the neighbors
if (getSuperstep() == 0) {
  for (Iterator<IntWritable> edges = iterator(); edges.hasNext();) {
    int neighbor = edges.next().get();
    if (neighbor < currentComponent) {
      currentComponent = neighbor;
    }
  }
  // Only need to send value if it is not the own id
  if (currentComponent != getVertexValue().get()) {
    setVertexValue(new IntWritable(currentComponent));
    for (Iterator<IntWritable> edges = iterator();
        edges.hasNext();) {
      int neighbor = edges.next().get();
      if (neighbor > currentComponent) {
        sendMsg(new IntWritable(neighbor), getVertexValue());
      }
    }
  }{code}
Both of the for loops in this chunk from ConnectedComponentsVertex can be 
replaced with for(IntWritable i : iterator()) loops to be more idiomatic.
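Java's enhanced `for` statement requires an array or an `Iterable`, so `for (IntWritable i : iterator())` would not compile if `iterator()` returns an `Iterator`; iterating over the vertex itself works when the vertex implements `Iterable`. The sketch below illustrates this with a hypothetical, simplified `SketchVertex` (plain `int` ids instead of `IntWritable`, sent messages recorded in a list, vertex value initialized to its id) whose first-superstep logic mirrors the snippet.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a vertex that is Iterable over its neighbor ids.
class SketchVertex implements Iterable<Integer> {
  final int id;
  int value;
  final List<Integer> neighbors;
  final List<int[]> sent = new ArrayList<>();  // recorded (target, message) pairs

  SketchVertex(int id, List<Integer> neighbors) {
    this.id = id;
    this.value = id;
    this.neighbors = neighbors;
  }

  @Override
  public Iterator<Integer> iterator() {
    return neighbors.iterator();
  }

  void sendMsg(int target, int msg) {
    sent.add(new int[] {target, msg});
  }

  /** First-superstep logic from the snippet, rewritten with for-each loops. */
  void firstSuperstep() {
    int currentComponent = value;
    for (int neighbor : this) {
      if (neighbor < currentComponent) {
        currentComponent = neighbor;
      }
    }
    // Only send if the component differs from the vertex's own id.
    if (currentComponent != value) {
      value = currentComponent;
      for (int neighbor : this) {
        if (neighbor > currentComponent) {
          sendMsg(neighbor, value);
        }
      }
    }
  }
}
```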





[jira] [Created] (GIRAPH-172) Javadoc for BasicVertex:compute link to compute is broken

2012-04-07 Thread Jakob Homan (Created) (JIRA)
Javadoc for BasicVertex:compute link to compute is broken
-

 Key: GIRAPH-172
 URL: https://issues.apache.org/jira/browse/GIRAPH-172
 Project: Giraph
  Issue Type: Bug
Reporter: Jakob Homan
Priority: Trivial


In BasicVertex the JavaDoc link to #compute can't be resolved:
{code} /**
   * Release unnecessary resources (will be called after vertex returns from
   * {@link #compute()})
   */
  abstract void releaseResources();{code}





[jira] [Created] (GIRAPH-173) BspCase:getNumWorkers javadoc refers to non-existent parameter

2012-04-07 Thread Jakob Homan (Created) (JIRA)
BspCase:getNumWorkers javadoc refers to non-existent parameter
--

 Key: GIRAPH-173
 URL: https://issues.apache.org/jira/browse/GIRAPH-173
 Project: Giraph
  Issue Type: Bug
Reporter: Jakob Homan
Priority: Trivial


{code}  /**
   * Get the number of workers used in the BSP application
   *
   * @param numProcs number of processes to use
   */
  public int getNumWorkers() {
return numWorkers;
  }{code}
numProcs is a lie...
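With the stray `@param` tag replaced by an `@return` tag, the Javadoc matches the method. The enclosing `BspCaseSketch` class and its constructor are hypothetical scaffolding so the snippet compiles on its own.

```java
// Hypothetical scaffolding around the corrected method.
class BspCaseSketch {
  private final int numWorkers;

  BspCaseSketch(int numWorkers) {
    this.numWorkers = numWorkers;
  }

  /**
   * Get the number of workers used in the BSP application
   *
   * @return number of workers
   */
  public int getNumWorkers() {
    return numWorkers;
  }
}
```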





[jira] [Created] (GIRAPH-171) total time in MasterThread.run() is calculated incorrectly

2012-04-05 Thread Eugene Koontz (Created) (JIRA)
total time in MasterThread.run() is calculated incorrectly
--

 Key: GIRAPH-171
 URL: https://issues.apache.org/jira/browse/GIRAPH-171
 Project: Giraph
  Issue Type: Bug
Reporter: Eugene Koontz
Assignee: Eugene Koontz
 Attachments: GIRAPH-171.patch

While running PageRankBenchmark, I was seeing in the output:

{{graph.MasterThread(172): total: Took 1.3336739262910001E9 seconds.}}

This was because currently, in {{MasterThread.run()}}, we have:

{code}
LOG.info("total: Took " +
    ((System.currentTimeMillis() / 1000.0d) -
    setupSecs) + " seconds.");
{code}

but it should be:

{code}
LOG.info("total: Took " +
    ((System.currentTimeMillis() - startMillis) /
    1000.0d) + " seconds.");
{code}
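The reported figure, about 1.33E9 seconds, is roughly the number of seconds since the Unix epoch in April 2012. That is what the buggy expression yields if `setupSecs` holds a duration rather than a timestamp (an inference from the output). The hypothetical helpers below contrast the two expressions using explicit timestamps so they can be checked without a clock.

```java
// Hypothetical helpers; only startMillis and setupSecs come from the report.
final class ElapsedTime {
  private ElapsedTime() { }

  /** Correct: difference of two wall-clock readings, converted to seconds. */
  static double elapsedSecs(long startMillis, long nowMillis) {
    return (nowMillis - startMillis) / 1000.0d;
  }

  /** Buggy form: an absolute timestamp in seconds minus a duration. */
  static double buggyTotalSecs(double setupSecs, long nowMillis) {
    return (nowMillis / 1000.0d) - setupSecs;
  }
}
```

For measuring elapsed time, `System.nanoTime()` is generally preferred over `System.currentTimeMillis()` because it is not affected by wall-clock adjustments.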





[jira] [Created] (GIRAPH-170) Workflow for loading RDF graph data into Giraph

2012-04-05 Thread Dan Brickley (Created) (JIRA)
Workflow for loading RDF graph data into Giraph
---

 Key: GIRAPH-170
 URL: https://issues.apache.org/jira/browse/GIRAPH-170
 Project: Giraph
  Issue Type: New Feature
Reporter: Dan Brickley
Priority: Minor


W3C RDF provides a family of Web standards for exchanging graph-based data. RDF 
uses sets of simple binary relationships, labeling nodes and links with Web 
identifiers (URIs). Many public datasets are available as RDF, including the 
"Linked Data" cloud (see http://richard.cyganiak.de/2007/10/lod/ ). Many such 
datasets are listed at http://thedatahub.org/

RDF has several standard exchange syntaxes. The oldest is RDF/XML. A simple 
line-oriented format is N-Triples. A format aligned with RDF's SPARQL query 
language is Turtle. Apache Jena and Any23 provide software to handle all these; 
http://incubator.apache.org/jena/ http://incubator.apache.org/any23/

This JIRA leaves open the strategy for loading RDF data into Giraph. There are 
various possibilities, including exploiting intermediate Hadoop-friendly 
stores, pre-processing with e.g. Pig-based tools into a more Giraph-friendly 
form, or writing custom loaders. Even a HOWTO document or implementor notes 
here would be an advance on the current state of the art. The Blueprints graph 
API (Gremlin etc.) has also been aligned with various RDF data sources.

Related topics: multigraphs (https://issues.apache.org/jira/browse/GIRAPH-141) 
touch on this issue, since we currently can't easily represent fully general 
RDF graphs, in which two nodes might be connected by more than one typed edge. 
Even without multigraphs it ought to be possible to bring RDF-sourced data 
into Giraph, e.g. for an application that is only interested in, say, the 
Movies + People subset of a big RDF collection.

From Avery in email: "a helper VertexInputFormat (and maybe 
VertexOutputFormat) would certainly [despite GIRAPH-141] still help"
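As one possible starting point for a custom loader, the hypothetical `NTriplesAdjacency` helper below (not a Giraph, Jena, or Any23 API) turns N-Triples lines into per-subject adjacency lists. It is deliberately minimal: it only accepts triples whose subject, predicate, and object are all IRIs and skips literals, blank nodes, and escapes, which a real loader would delegate to a proper parser such as Jena's. Keeping the predicate with each edge preserves parallel typed edges between the same pair of nodes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; handles only IRI-only N-Triples lines.
final class NTriplesAdjacency {
  private static final Pattern IRI_TRIPLE =
      Pattern.compile("^\\s*<([^>]*)>\\s+<([^>]*)>\\s+<([^>]*)>\\s*\\.\\s*$");

  private NTriplesAdjacency() { }

  /** Maps each subject IRI to its outgoing (predicate, object) edges. */
  static Map<String, List<String[]>> parse(List<String> lines) {
    Map<String, List<String[]>> adjacency = new LinkedHashMap<>();
    for (String line : lines) {
      Matcher m = IRI_TRIPLE.matcher(line);
      if (!m.matches()) {
        continue;  // comment, blank line, literal object, or blank node
      }
      adjacency.computeIfAbsent(m.group(1), k -> new ArrayList<>())
          .add(new String[] {m.group(2), m.group(3)});
    }
    return adjacency;
  }
}
```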







[jira] [Created] (GIRAPH-169) How to close all child when a job finished?

2012-03-23 Thread Jianfeng Qian (Created) (JIRA)
How to close all child when a job finished?
---

 Key: GIRAPH-169
 URL: https://issues.apache.org/jira/browse/GIRAPH-169
 Project: Giraph
  Issue Type: Improvement
  Components: mapreduce
Affects Versions: 0.2.0
 Environment: SLES 11 x64, JDK 1.6, Hadoop 0.20.205.0, 1 master and 8 
slaves
Reporter: Jianfeng Qian
Priority: Minor


I ran PageRank on Hadoop 0.20.205.0. When the job finished, the child 
processes on the slaves didn't quit immediately; sometimes they never quit and 
I had to kill them.





[jira] [Created] (GIRAPH-168) Simplify munge directive usage with new munge flag HADOOP_SECURE rather than HADOOP_FACEBOOK and HADOOP_NON_SECURE

2012-03-21 Thread Eugene Koontz (Created) (JIRA)
Simplify munge directive usage with new munge flag HADOOP_SECURE rather than 
HADOOP_FACEBOOK and HADOOP_NON_SECURE
--

 Key: GIRAPH-168
 URL: https://issues.apache.org/jira/browse/GIRAPH-168
 Project: Giraph
  Issue Type: Improvement
Reporter: Eugene Koontz


This JIRA relates to the mail thread here: 

http://mail-archives.apache.org/mod_mbox/incubator-giraph-dev/201203.mbox/browser

Currently we check for the munge flags HADOOP and HADOOP_FACEBOOK and 
HADOOP_NON_SECURE when using munge in a few places. Hopefully we can eliminate 
usage of munge in the future, but until then, we can mitigate the complexity by 
consolidating the number of flags checked. This JIRA proposes a single flag, 
HADOOP_SECURE, to handle the same conditional compilation requirements. It also 
makes it easier to add more maven profiles so that we can easily increase our 
hadoop version coverage.

This patch modifies the existing hadoop_facebook profile to use the new 
HADOOP_SECURE munge flag, rather than HADOOP_FACEBOOK.

It also adds a new hadoop maven profile, hadoop_trunk, which also sets 
HADOOP_SECURE. 

Finally, it adds a default profile, hadoop_0.20.203. This is needed so that we 
can specify its dependencies separately from hadoop_trunk, because the hadoop 
dependencies have changed between trunk and 0.205.0 - the former requires 
hadoop-common, hadoop-mapreduce-client-core, and 
hadoop-mapreduce-client-common, whereas the latter requires hadoop-core. 

With this patch, the following passes:

{code}
mvn clean verify && mvn -Phadoop_trunk clean verify && mvn -Phadoop_0.20.203 clean verify
{code}

Current problems: 

* I left in place the usage of HADOOP_NON_SECURE, but note that the profile 
that uses this is hadoop_non_secure, which fails to compile on trunk: 
https://issues.apache.org/jira/browse/GIRAPH-167 .

* I couldn't get -Phadoop_facebook to work; does this work outside of Facebook?







[jira] [Created] (GIRAPH-167) mvn -Phadoop_non_secure clean verify fails

2012-03-21 Thread Eugene Koontz (Created) (JIRA)
mvn -Phadoop_non_secure clean verify fails
--

 Key: GIRAPH-167
 URL: https://issues.apache.org/jira/browse/GIRAPH-167
 Project: Giraph
  Issue Type: Bug
Reporter: Eugene Koontz


The {{hadoop_non_secure}} profile, which uses hadoop 0.20.2, is failing to 
compile:

{code}
[ERROR] COMPILATION ERROR : 
[INFO] -
[ERROR] /Users/ekoontz/giraph/target/munged/main/org/apache/giraph/graph/partition/RangePartitionOwner.java:[26,27] package org.apache.hadoop.io does not exist
[ERROR] /Users/ekoontz/giraph/target/munged/main/org/apache/giraph/graph/partition/BasicPartitionOwner.java:[26,29] package org.apache.hadoop.conf does not exist
[ERROR] /Users/ekoontz/giraph/target/munged/main/org/apache/giraph/graph/partition/BasicPartitionOwner.java:[27,29] package org.apache.hadoop.conf does not exist
[ERROR] /Users/ekoontz/giraph/target/munged/main/org/apache/giraph/graph/partition/PartitionOwner.java:[22,27] package org.apache.hadoop.io does not exist
[ERROR] /Users/ekoontz/giraph/target/munged/main/org/apache/giraph/graph/partition/PartitionOwner.java:[27,40] cannot find symbol
symbol: class Writable
{code}

(more error messages follow)





[jira] [Created] (GIRAPH-166) add '*.patch' to list of files that Apache Rat ignores

2012-03-21 Thread Eugene Koontz (Created) (JIRA)
add '*.patch' to list of files that Apache Rat ignores
--

 Key: GIRAPH-166
 URL: https://issues.apache.org/jira/browse/GIRAPH-166
 Project: Giraph
  Issue Type: Improvement
Reporter: Eugene Koontz
Priority: Trivial
 Attachments: GIRAPH-166.patch

Apache Rat will complain about "too many files without licenses" if it finds 
any *.patch files in your working directory. Rat should ignore these since they 
are temp files that aren't included in the distribution.





[jira] [Created] (GIRAPH-165) checkstyle error: 'conf'hides a field' on line 154 of GraphRunner

2012-03-21 Thread Eugene Koontz (Created) (JIRA)
checkstyle error: 'conf'hides a field' on line 154 of GraphRunner
-

 Key: GIRAPH-165
 URL: https://issues.apache.org/jira/browse/GIRAPH-165
 Project: Giraph
  Issue Type: Bug
Reporter: Eugene Koontz
Priority: Minor


full checkstyle error is 
{code}



{code}





[jira] [Created] (GIRAPH-164) fix 5 "Line is longer than 80 characters" style errors in GiraphRunner

2012-03-21 Thread Eugene Koontz (Created) (JIRA)
fix 5 "Line is longer than 80 characters" style errors in GiraphRunner
--

 Key: GIRAPH-164
 URL: https://issues.apache.org/jira/browse/GIRAPH-164
 Project: Giraph
  Issue Type: Bug
Reporter: Eugene Koontz
Priority: Trivial


{code}

  
  
  
  

{code}





[jira] [Created] (GIRAPH-163) bin/giraph script overwrites CLASSPATH if "dev environment" detected (this also removes USER_JAR from CLASSPATH)

2012-03-21 Thread Benjamin Heitmann (Created) (JIRA)
bin/giraph script overwrites CLASSPATH if "dev environment" detected (this also 
removes USER_JAR from CLASSPATH)


 Key: GIRAPH-163
 URL: https://issues.apache.org/jira/browse/GIRAPH-163
 Project: Giraph
  Issue Type: Improvement
  Components: conf and scripts
Affects Versions: 0.1.0, 0.2.0
 Environment: current trunk of giraph, after running "mvn compile" (as 
advised in the quick start guide). 
Also Hadoop 1.0.1 was used. 
Reporter: Benjamin Heitmann


If no ./lib dir is present, then the bin/giraph script assumes it is running in 
a "dev environment". 
In that case, the script takes an execution path that overwrites the 
CLASSPATH variable instead of appending to it. 

Incidentally, this also removes the name of the jar submitted by the user, 
which was appended to CLASSPATH earlier in the script. 





[jira] [Created] (GIRAPH-162) BspCase.setup() should catch FileNotFoundException thrown from org.apache.hadoop.fs.FileSystem.listStatus()

2012-03-20 Thread Eugene Koontz (Created) (JIRA)
BspCase.setup() should catch FileNotFoundException thrown from 
org.apache.hadoop.fs.FileSystem.listStatus()
---

 Key: GIRAPH-162
 URL: https://issues.apache.org/jira/browse/GIRAPH-162
 Project: Giraph
  Issue Type: Bug
  Components: test
Reporter: Eugene Koontz


In Hadoop trunk, org.apache.hadoop.fs.FileSystem.listStatus() is declared to 
throw both FileNotFoundException and IOException. The former 
(FileNotFoundException) is currently not caught when BspCase.setup() looks for 
the GiraphJob.ZOOKEEPER_MANAGER_DIR_DEFAULT directory in order to delete it. 
listStatus() throws FileNotFoundException if this directory does not exist, 
which causes several tests to fail when using Hadoop trunk. This exception 
should be caught and ignored during setup(), since it's not an error for this 
directory not to exist.
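A sketch of the proposed handling. `DirLister` and `SetupHelper.listIfExists` are hypothetical stand-ins for the `listStatus()` call inside `BspCase.setup()`. Because `FileNotFoundException` is a subclass of `IOException`, it has to be caught specifically so that other I/O failures still propagate.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a listing call such as FileSystem.listStatus().
interface DirLister {
  List<String> list(String dir) throws IOException;
}

final class SetupHelper {
  private SetupHelper() { }

  /**
   * Lists the directory, treating a missing directory as empty.
   * Any other IOException still propagates to the caller.
   */
  static List<String> listIfExists(DirLister lister, String dir)
      throws IOException {
    try {
      return lister.list(dir);
    } catch (FileNotFoundException e) {
      // Not an error during setup(): there is simply nothing to delete.
      return Collections.emptyList();
    }
  }
}
```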





[jira] [Created] (GIRAPH-161) Handling null messages and edges when initializing IntIntNullIntVertex

2012-03-20 Thread Dionysios Logothetis (Created) (JIRA)
Handling null messages and edges when initializing IntIntNullIntVertex
--

 Key: GIRAPH-161
 URL: https://issues.apache.org/jira/browse/GIRAPH-161
 Project: Giraph
  Issue Type: Bug
  Components: graph
Affects Versions: 0.1.0
Reporter: Dionysios Logothetis
 Attachments: GIRAPH-161.patch

The initialize() method in org.apache.giraph.graph.IntIntNullIntVertex should 
handle null messages or null edges. Especially initializing with null messages 
is a common case.
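A minimal sketch of null-tolerant initialization, assuming a simplified stand-in: `IntIntNullIntVertexSketch` stores neighbor ids and messages as `int` arrays, and its `initialize` signature is hypothetical rather than the real one. Null edges or messages are treated as empty.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-in for IntIntNullIntVertex.
class IntIntNullIntVertexSketch {
  int id;
  int value;
  int[] neighbors = new int[0];
  int[] messages = new int[0];

  void initialize(int id, int value, Map<Integer, ?> edges, Iterable<Integer> msgs) {
    this.id = id;
    this.value = value;
    // A null edge map means "no edges"; edge values are ignored (null-typed).
    this.neighbors = edges == null
        ? new int[0]
        : edges.keySet().stream().mapToInt(Integer::intValue).toArray();
    // A null message iterable means "no messages", the common case.
    List<Integer> collected = new ArrayList<>();
    if (msgs != null) {
      for (Integer m : msgs) {
        collected.add(m);
      }
    }
    this.messages = collected.stream().mapToInt(Integer::intValue).toArray();
  }
}
```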





[jira] [Created] (GIRAPH-160) Vertex reader that reads adjacency lists with no vertex and edge values associated

2012-03-20 Thread Dionysios Logothetis (Created) (JIRA)
Vertex reader that reads adjacency lists with no vertex and edge values 
associated
--

 Key: GIRAPH-160
 URL: https://issues.apache.org/jira/browse/GIRAPH-160
 Project: Giraph
  Issue Type: New Feature
  Components: lib
Affects Versions: 0.1.0
Reporter: Dionysios Logothetis
Priority: Minor


A very common graph format is the adjacency list with no values associated 
with edges or vertices. For instance, a line in the input can be of the form:
1 2 3
which represents a vertex with id 1 that has edges to vertices 2 and 3 with no 
values associated.

I've created a vertex reader named AdjacencyListVertexReader which is 
essentially a copy of the AdjacencyListVertexReader modified to handle this 
format. It's an abstract class, and subclasses can override the 
defaultVertexValue() and defaultEdgeValue() methods to provide default values 
for vertices and edges respectively (otherwise values are initialized to 
null).

I've also created an example subclass.
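The parsing step for this format could be sketched as follows. `AdjacencyLineParser` and `UnitWeightParser` are hypothetical names, not the classes from the patch, and vertex ids are assumed to be `long` values; the default-value hooks mirror the `defaultVertexValue()` / `defaultEdgeValue()` design described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the line-parsing step only.
abstract class AdjacencyLineParser<V, E> {
  /** Default vertex value; null unless a subclass overrides it. */
  protected V defaultVertexValue() {
    return null;
  }

  /** Default edge value; null unless a subclass overrides it. */
  protected E defaultEdgeValue() {
    return null;
  }

  private static String[] tokens(String line) {
    return line.trim().split("\\s+");
  }

  /** The first token is the vertex id, e.g. 1 in "1 2 3". */
  long vertexId(String line) {
    return Long.parseLong(tokens(line)[0]);
  }

  /** Every remaining token is a target id; each edge gets the default value. */
  Map<Long, E> edges(String line) {
    String[] t = tokens(line);
    Map<Long, E> result = new LinkedHashMap<>();
    for (int i = 1; i < t.length; i++) {
      result.put(Long.parseLong(t[i]), defaultEdgeValue());
    }
    return result;
  }
}

// Example subclass: unit vertex values and unit edge weights.
class UnitWeightParser extends AdjacencyLineParser<Double, Double> {
  @Override
  protected Double defaultVertexValue() {
    return 1.0;
  }

  @Override
  protected Double defaultEdgeValue() {
    return 1.0;
  }
}
```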





[jira] [Created] (GIRAPH-159) Case insensitive file/directory name matching will produce errors on M/R jar unpack.

2012-03-19 Thread Brian Femiano (Created) (JIRA)
Case insensitive file/directory name matching will produce errors on M/R jar 
unpack. 
-

 Key: GIRAPH-159
 URL: https://issues.apache.org/jira/browse/GIRAPH-159
 Project: Giraph
  Issue Type: Bug
  Components: build
Affects Versions: 0.2.0
 Environment: OSX 10.6.8
Reporter: Brian Femiano


This only seems to affect platforms where case-insensitive matching can cause 
file/directory naming conflicts.

I was able to reproduce running the pseudo-distributed unit tests within OSX.

This has affected other projects: 
https://issues.apache.org/jira/browse/MAHOUT-780

I've been able to reproduce this on my local OSX install with the following 
error:
https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/a201218000e956d3/cc6eca3ef9f80ff8

Since LICENSE.txt contains the same content as the file LICENSE, I propose we 
exclude any LICENSE matches found in the unpacked dependency jars
when the maven assembly phase hits 'jar-with-dependencies'. 

I have a patch which moves the 'jar-with-dependencies' descriptor to an 
external compile.xml file which has the proper excludes. This might also
come in handy down the road should any additional tweaks be needed to the 
compile phase. 
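You can check which jar entries would clash on a case-insensitive file system. Fold every path to lower case, including each parent directory it implies, and look for distinct originals that collapse together. This is only a diagnostic sketch, not the proposed compile.xml change. The class name is hypothetical, and the entry names in the test are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class CaseCollisionCheck {
  // Returns groups of distinct paths (entries or implied parent directories)
  // that become identical when compared case-insensitively.
  static List<Set<String>> findCollisions(Collection<String> entryNames) {
    Map<String, Set<String>> byFolded = new TreeMap<>();
    for (String name : entryNames) {
      String trimmed = name.endsWith("/") ? name.substring(0, name.length() - 1) : name;
      // Register every parent directory implied by the entry...
      for (int i = trimmed.indexOf('/'); i >= 0; i = trimmed.indexOf('/', i + 1)) {
        add(byFolded, trimmed.substring(0, i));
      }
      // ...and the entry itself.
      add(byFolded, trimmed);
    }
    List<Set<String>> collisions = new ArrayList<>();
    for (Set<String> variants : byFolded.values()) {
      if (variants.size() > 1) {
        collisions.add(variants);
      }
    }
    return collisions;
  }

  private static void add(Map<String, Set<String>> byFolded, String path) {
    byFolded.computeIfAbsent(path.toLowerCase(Locale.ROOT), k -> new TreeSet<>()).add(path);
  }
}
```

For a real jar, the names could be collected from `java.util.jar.JarFile.entries()` via `JarEntry.getName()`.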








[jira] [Created] (GIRAPH-158) Support YARN (next generation MapReduce)

2012-03-17 Thread Eugene Koontz (Created) (JIRA)
Support YARN (next generation MapReduce)


 Key: GIRAPH-158
 URL: https://issues.apache.org/jira/browse/GIRAPH-158
 Project: Giraph
  Issue Type: New Feature
Reporter: Eugene Koontz


YARN is a re-architecture of the Hadoop MapReduce framework, described here:
http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/YARN.html

It would be good to offer support within Giraph for this framework. 





[jira] [Created] (GIRAPH-157) Vertex to perform graph coloring on simple, connected, undirected graphs and related test.

2012-03-17 Thread Eli Reisman (Created) (JIRA)
Vertex to perform graph coloring on simple, connected, undirected graphs and 
related test.
--

 Key: GIRAPH-157
 URL: https://issues.apache.org/jira/browse/GIRAPH-157
 Project: Giraph
  Issue Type: Test
  Components: examples, test
Affects Versions: 0.2.0
Reporter: Eli Reisman
Assignee: Eli Reisman
Priority: Trivial


Hi. I am attempting to learn the Hadoop and Giraph codebases and wanted to 
write a simple client application for Giraph to help me learn the ins and outs 
of it. This is a simple unit test and vertex modeled after the 
ConnectedComponentsVertex and related test. The vertex test runs whenever you 
run the "mvn test" or "mvn verify" suite of tests. When finished processing, 
each vertex will have an integer value that is its color.
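The vertex code is not included in this message. The sketch below is a plain-Java simulation of one way a superstep-based coloring can work. It does not use the Giraph API, and it is not necessarily the approach in the attached vertex. It follows a Jones-Plassmann-style greedy scheme: in each round, an uncolored vertex whose id is smaller than the ids of all its uncolored neighbors takes the smallest color that none of its neighbors uses. Such vertices are never adjacent, so they can choose in parallel. Like any greedy scheme, it always yields a valid coloring but does not guarantee a minimal one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SuperstepColoringSketch {
  // adj.get(v) lists v's neighbors; the graph is undirected, so every edge
  // appears in both lists. Returns color[v] >= 0 for every vertex.
  static int[] color(List<List<Integer>> adj) {
    int n = adj.size();
    int[] color = new int[n];
    Arrays.fill(color, -1);
    int uncolored = n;
    while (uncolored > 0) { // one iteration plays the role of one superstep
      // A vertex may choose once no uncolored neighbor has a smaller id.
      List<Integer> ready = new ArrayList<>();
      for (int v = 0; v < n; v++) {
        if (color[v] != -1) continue;
        boolean smallestUncolored = true;
        for (int u : adj.get(v)) {
          if (color[u] == -1 && u < v) {
            smallestUncolored = false;
            break;
          }
        }
        if (smallestUncolored) ready.add(v);
      }
      // Ready vertices are pairwise non-adjacent, so their choices cannot clash.
      for (int v : ready) {
        Set<Integer> taken = new HashSet<>();
        for (int u : adj.get(v)) {
          if (color[u] != -1) taken.add(color[u]);
        }
        int c = 0;
        while (taken.contains(c)) c++;
        color[v] = c;
      }
      uncolored -= ready.size();
    }
    return color;
  }
}
```

For the path 0-1-2, this produces colors `[0, 1, 0]` in three rounds. The lowest-id uncolored vertex is always ready, so every round makes progress.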

This is a pretty simple implementation. I have tested it on a number of small 
graphs of varied trickiness, and it seems to rapidly arrive at a minimal 
coloring, but it's hard (for me, at least) to predict which of the possible 
colorings it will arrive at. I also have no idea yet how it will do on really 
big graphs without finding some larger pre-colored test graphs to try it on. 
Ideas, anyone?

Anyway, it was fun to put this together, and I'd be happy to improve it or 
receive some help or advice to further the cause. Thanks again, I am hoping 
this will be the first of many (hopefully more useful) contributions!

Eli





[jira] [Created] (GIRAPH-156) Users should be able to set simple 'custom arguments' via org.apache.giraph.GiraphRunner

2012-03-15 Thread Sebastian Schelter (Created) (JIRA)
Users should be able to set simple 'custom arguments' via 
org.apache.giraph.GiraphRunner


 Key: GIRAPH-156
 URL: https://issues.apache.org/jira/browse/GIRAPH-156
 Project: Giraph
  Issue Type: Improvement
  Components: conf and scripts
Affects Versions: 0.1.0
Reporter: Sebastian Schelter
Assignee: Sebastian Schelter


Some vertices need custom arguments to run. The SimpleShortestPathsVertex, for 
example, needs to know the source vertex for the computation, which is saved in 
the job's Configuration as _SimpleShortestPathsVertex.sourceId_. Users should 
be able to apply such simple custom arguments via GiraphRunner. 

I propose to add a new option _--customArguments_ where users can supply 
arguments in the form _<key>=<value>,<key>=<value>_ for this.
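Here is a minimal sketch of how such an option value might be parsed, assuming comma-separated `key=value` pairs as proposed above. The class name is hypothetical. Keys containing `=` and values containing `,` are not supported. Each resulting pair could then be copied into the job's Hadoop `Configuration` with `Configuration.set(String, String)`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CustomArgumentsSketch {
  // Parses "k1=v1,k2=v2" into an insertion-ordered map; blank input yields an empty map.
  static Map<String, String> parse(String spec) {
    Map<String, String> result = new LinkedHashMap<>();
    if (spec == null || spec.trim().isEmpty()) {
      return result;
    }
    for (String pair : spec.split(",")) {
      int eq = pair.indexOf('=');
      if (eq <= 0) {
        throw new IllegalArgumentException("Expected key=value but got '" + pair + "'");
      }
      // Split on the first '=' only, so values may themselves contain '='.
      result.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
    }
    return result;
  }
}
```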





[jira] [Created] (GIRAPH-155) Allow creation of graph by adding edges that span multiple workers

2012-03-15 Thread Dionysios Logothetis (Created) (JIRA)
Allow creation of graph by adding edges that span multiple workers
--

 Key: GIRAPH-155
 URL: https://issues.apache.org/jira/browse/GIRAPH-155
 Project: Giraph
  Issue Type: New Feature
  Components: graph, lib
Affects Versions: 0.1.0
Reporter: Dionysios Logothetis


Currently a graph is created only by adding vertices. The typical way is to 
read input text files line by line, with each line describing a vertex (its 
value, its edges, etc.). The current API allows for the creation of a vertex only 
if all the information for the vertex is available in a single line.

However, it's common to have graphs described in the form of edges. Edges might 
span multiple lines in an input file or even span multiple workers. The current 
API doesn't allow this. In the input superstep, a vertex must be created by a 
single worker.

Instead, it should be possible for multiple workers to mutate the graph during 
the input superstep.

This has the following implications:
(i) Instead of just instantiating a vertex, a vertex reader should be able to 
do vertex addition and edge addition requests.
(ii) Multiple workers might try to create the same vertex. Any conflicts should 
be handled with a VertexResolver. So the resolver has to be instantiated before 
load time.
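To make implications (i) and (ii) concrete, here is a plain-Java simulation with hypothetical names and no Giraph types. Each worker emits `{source, target}` edge requests, and a resolver merges all requests for the same vertex id. The merge policy is an assumption: it takes the union of out-edges and creates a target vertex even if no worker read a line for it. A real VertexResolver could choose differently.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class EdgeInputSketch {
  // requestsPerWorker.get(w) holds worker w's {source, target} edge requests.
  // Requests touching the same vertex id are merged, standing in for a
  // resolver that would have to exist before loading starts.
  static Map<Long, Set<Long>> resolve(List<List<long[]>> requestsPerWorker) {
    Map<Long, Set<Long>> graph = new TreeMap<>();
    for (List<long[]> workerRequests : requestsPerWorker) {
      for (long[] edge : workerRequests) {
        // Vertex-addition request for the source plus an edge-addition request.
        graph.computeIfAbsent(edge[0], id -> new TreeSet<>()).add(edge[1]);
        // Assumed policy: the target exists even if nobody read its out-edges.
        graph.computeIfAbsent(edge[1], id -> new TreeSet<>());
      }
    }
    return graph;
  }
}
```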







[jira] [Created] (GIRAPH-154) Worker ports are not synched properly with its peers

2012-03-14 Thread Zhiwei Gu (Created) (JIRA)
Worker ports are not synched properly with its peers


 Key: GIRAPH-154
 URL: https://issues.apache.org/jira/browse/GIRAPH-154
 Project: Giraph
  Issue Type: Bug
  Components: bsp
Affects Versions: 0.2.0
Reporter: Zhiwei Gu
Assignee: Zhiwei Gu


When a worker tries multiple ports to set up the RPC server, the final port is 
not synced properly with its peer workers, so the peers send messages to the 
default port.
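In the log below, the RPC server comes up on port 36061 while the health node advertises port=35061, and peers keep retrying 35061. The general remedy is to retry candidate ports and return the socket that actually bound. The advertised port is then read from `getLocalPort()` instead of being recomputed from the base port. This sketch uses plain `java.net.ServerSocket` rather than Hadoop RPC. The step size, attempt count and loopback bind address are assumptions for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

final class PortBindingSketch {
  // Tries basePort, basePort + step, basePort + 2 * step, ... and returns the
  // socket that bound. Callers should advertise socket.getLocalPort() to peers.
  static ServerSocket bindWithRetry(int basePort, int step, int maxAttempts) throws IOException {
    IOException last = null;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      int port = basePort + attempt * step;
      if (port < 1 || port > 65535) {
        break; // ran out of valid port numbers
      }
      ServerSocket socket = new ServerSocket();
      try {
        // Loopback keeps the demo local; a real worker would bind its host address.
        socket.bind(new InetSocketAddress("127.0.0.1", port));
        return socket;
      } catch (IOException e) {
        socket.close();
        last = e;
      }
    }
    throw new IOException("Could not bind any port starting at " + basePort, last);
  }
}
```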

Here are some logs:


Base port: 34900



log for worker 161:

IPC Server handler 98 on 36061: starting
BasicRPCCommunications: Started RPC communication server: 
gsta32085.tan.ygrid.yahoo.com/10.216.148.47:36061 with 100 handlers and 199 
flush threads on bind attempt 1
IPC Server handler 99 on 36061: starting
setup: Registering health of this worker...
getJobState: Job state already exists 
(/_hadoopBsp/job_201203130609_14838/_masterJobState)
getApplicationAttempt: Node 
/_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir already exists!
getApplicationAttempt: Node 
/_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir already exists!
registerHealth: Created my health node for attempt=0, superstep=-1 with 
/_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/gsta32085.tan.ygrid.yahoo.com_161
 and workerInfo= Worker(hostname=gsta32085.tan.ygrid.yahoo.com, 
MRpartition=161, port=35061)
process: partitionAssignmentsReadyChanged (partitions are assigned)
startSuperstep: Ready for computation on superstep -1 since worker selection 
and vertex range assignments are done in 
/_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir/0/_superstepDir/-1/_partitionAssignments
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 0 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 1 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 2 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 3 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 4 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 5 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 6 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 7 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 8 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 9 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 10 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 11 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 12 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 13 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 14 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 15 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 16 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 17 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 18 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 19 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 20 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 21 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 22 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 23 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 24 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 25 time(s).
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. 
Already tried 26 time(s).
Retrying connect to s

[jira] [Created] (GIRAPH-153) HBase/Accumulo Input and Output formats

2012-03-07 Thread Brian Femiano (Created) (JIRA)
HBase/Accumulo Input and Output formats
---

 Key: GIRAPH-153
 URL: https://issues.apache.org/jira/browse/GIRAPH-153
 Project: Giraph
  Issue Type: New Feature
  Components: bsp
Affects Versions: 0.1.0
 Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB
Reporter: Brian Femiano
 Attachments: AccumuloRootMarker.java, 
AccumuloRootMarkerInputFormat.java, AccumuloRootMarkerOutputFormat.java, 
AccumuloVertexInputFormat.java, AccumuloVertexOutputFormat.java, 
ComputeIsRoot.java, DistributedCacheHelper.java, HBaseVertexInputFormat.java, 
HBaseVertexOutputFormat.java, IdentifyAndMarkRoots.java, SetLongWritable.java, 
SetTextWritable.java, TableRootMarker.java, TableRootMarkerInputFormat.java, 
TableRootMarkerOutputFormat.java

Four abstract classes that wrap their respective delegate input/output formats, 
providing easy hooks for vertex input format subclasses. I've included some 
sample programs that show two very simple graph algorithms. I have a graph 
generator that builds out a very simple directed structure, starting with a few 
'root' nodes.

Root nodes are defined as nodes that are not listed as a child anywhere in the 
graph. 

Algorithm 1) AccumuloRootMarker.java --> Accumulo as read/write source. Every 
vertex starts out assuming it's a root. At superstep 0, each vertex sends a 
non-root notification down to each of its children. After superstep 1, only 
root nodes will never have been messaged. 
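The two supersteps of Algorithm 1 can be simulated in a few lines of plain Java. The names are hypothetical, and no Accumulo or Giraph types are used. The sketch assumes every vertex appears as a key in the children map, with leaves mapped to an empty list.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class RootMarkerSketch {
  // children.get(v) lists v's children; returns the vertices never messaged.
  static Set<String> findRoots(Map<String, List<String>> children) {
    // Superstep 0: every vertex notifies each of its children "you are not a root".
    Set<String> messaged = new HashSet<>();
    for (List<String> kids : children.values()) {
      messaged.addAll(kids);
    }
    // Superstep 1: any vertex that received a notification drops its root flag.
    Set<String> roots = new TreeSet<>(children.keySet());
    roots.removeAll(messaged);
    return roots;
  }
}
```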

Algorithm 2) TableRootMarker --> HBase as read/write source. Expands on A1 by 
bundling the notification logic with a subsequent root-node propagation phase. 
Once we've marked the appropriate nodes as roots, tell every child which roots it 
can be traced back to via one or more spanning trees. This will take N + 2 
supersteps, where N is the maximum number of hops from any root to any leaf, plus 
2 supersteps for the initial root flagging. 
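Here is a similar plain-Java simulation of the propagation phase, under the same assumptions. Each loop iteration stands in for one superstep. A vertex forwards only the roots it learned in the previous round, so the loop ends once no new roots reach anyone. Treating each root as traceable to itself is a choice made for this sketch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class RootPropagationSketch {
  // Returns, for every vertex, the roots it can be traced back to.
  static Map<String, Set<String>> propagate(Map<String, List<String>> children, Set<String> roots) {
    Map<String, Set<String>> known = new TreeMap<>();
    for (String v : children.keySet()) {
      known.put(v, new TreeSet<>());
    }
    // Roots seed the first round with their own id.
    Map<String, Set<String>> frontier = new HashMap<>();
    for (String r : roots) {
      known.get(r).add(r);
      frontier.put(r, Set.of(r));
    }
    while (!frontier.isEmpty()) { // one iteration ~ one superstep
      Map<String, Set<String>> next = new HashMap<>();
      for (Map.Entry<String, Set<String>> e : frontier.entrySet()) {
        for (String child : children.get(e.getKey())) {
          for (String root : e.getValue()) {
            // Only roots the child has not seen before are forwarded next round.
            if (known.get(child).add(root)) {
              next.computeIfAbsent(child, k -> new HashSet<>()).add(root);
            }
          }
        }
      }
      frontier = next;
    }
    return known;
  }
}
```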

I've included all relevant code plus DistributedCacheHelper.java for recursive 
cache file and archive searches. It is more Hadoop-centric than Giraph-specific, 
but these jobs use it, so I figured why not commit it here. 

These have been tested through local JobRunner, pseudo-distributed on the 
aforementioned hardware, and full distributed on EC2. More details in the 
comments.







[jira] [Created] (GIRAPH-152) NPE at PageRankBenchmark

2012-02-20 Thread Yury Litvinov (Created) (JIRA)
NPE at PageRankBenchmark


 Key: GIRAPH-152
 URL: https://issues.apache.org/jira/browse/GIRAPH-152
 Project: Giraph
  Issue Type: Bug
  Components: examples
Affects Versions: 0.2.0
 Environment: Hadoop-0.20.205.0
Linux: Amazon EC2, standard one "Amazon Linux 32 bit"
Giraph: compiled from CL 1245205
Reporter: Yury Litvinov


1. I've copied hadoop-0.20.205.0 onto Amazon EC2 Linux.
2. Compiled the latest Giraph (giraph-0.2-SNAPSHOT-jar-with-dependencies.jar) from 
sources (CL 1245205) and copied it to Linux as well.
3. Ran this command as suggested in the docs 
(https://cwiki.apache.org/confluence/display/GIRAPH/Quick+Start+Guide):

> hadoop-0.20.205.0/bin/hadoop jar 
> giraph-0.2-SNAPSHOT-jar-with-dependencies.jar 
> org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 5 -w 1

OBSERVED:
{code} 
Exception in thread "main" java.lang.NullPointerException
at 
org.apache.giraph.benchmark.PageRankBenchmark.run(PageRankBenchmark.java:127)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at 
org.apache.giraph.benchmark.PageRankBenchmark.main(PageRankBenchmark.java:162)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
{code} 





[jira] [Created] (GIRAPH-151) IntIntNullIntVertex.initialize method should handle "null" edges argument

2012-02-19 Thread yavuz gokirmak (Created) (JIRA)
IntIntNullIntVertex.initialize method should handle "null" edges argument
-

 Key: GIRAPH-151
 URL: https://issues.apache.org/jira/browse/GIRAPH-151
 Project: Giraph
  Issue Type: Bug
  Components: graph
Affects Versions: 0.1.0
 Environment: Linux 2.6.18-028stab095.1-PAE
Reporter: yavuz gokirmak
Priority: Trivial
 Fix For: 0.1.0


IntIntNullIntVertex.initialize should handle a "null" edges argument, because 
VertexResolver.resolve (line 91) calls vertex.initialize with a null edges 
argument.
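Here is a minimal sketch of the requested guard on a hypothetical, simplified vertex. It uses plain `int` ids and a `Map` of edges instead of the Writable types the real class uses. A `null` edges argument is treated as "no edges" rather than dereferenced.

```java
import java.util.Map;

// Hypothetical, simplified stand-in for a primitive-backed vertex.
final class PrimitiveVertexSketch {
  private int id;
  private int value;
  private int[] targets = new int[0];

  // Edge values are ignored; the "Null" in IntIntNullIntVertex suggests edges carry none.
  void initialize(int vertexId, int vertexValue, Map<Integer, ?> edges) {
    id = vertexId;
    value = vertexValue;
    if (edges == null) {
      // A resolver may pass null: treat it as "no edges" instead of throwing an NPE.
      targets = new int[0];
      return;
    }
    targets = new int[edges.size()];
    int i = 0;
    for (Integer target : edges.keySet()) {
      targets[i++] = target;
    }
  }

  int getId() {
    return id;
  }

  int getValue() {
    return value;
  }

  int getNumOutEdges() {
    return targets.length;
  }
}
```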





[jira] [Created] (GIRAPH-150) PageRankBenchmark accesses wrong conf after GiraphJob is created

2012-02-16 Thread Avery Ching (Created) (JIRA)
PageRankBenchmark accesses wrong conf after GiraphJob is created


 Key: GIRAPH-150
 URL: https://issues.apache.org/jira/browse/GIRAPH-150
 Project: Giraph
  Issue Type: Bug
Reporter: Avery Ching
Assignee: Avery Ching








[jira] [Created] (GIRAPH-149) Clone Vertex on loading

2012-02-14 Thread Zechao Shang (Created) (JIRA)
Clone Vertex on loading
---

 Key: GIRAPH-149
 URL: https://issues.apache.org/jira/browse/GIRAPH-149
 Project: Giraph
  Issue Type: Bug
  Components: bsp
Affects Versions: 0.1.0, 0.2.0
Reporter: Zechao Shang
Priority: Minor


AFAIK, it is documented behavior that Hadoop I/O reuses instances when loading 
data. 
Check BspServiceWorker#readVerticesFromInputSplit: readerVertex may be reused by 
the RecordReader (at least our SequenceFileVertexReader does this), so it must 
be cloned somewhere.
In my opinion, our inherited RecordReaders should follow the behavior of 
Hadoop's RecordReader, and the vertex should be cloned in 
BspServiceWorker#readVerticesFromInputSplit. Just calling 
org.apache.hadoop.io.WritableUtils.clone will be fine.
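The hazard is easy to reproduce outside Hadoop. In this plain-Java sketch (all names hypothetical), a reader refills one mutable object per record, as Hadoop-style RecordReaders may. Storing the returned reference keeps only the last record, while copying each one preserves them all. In Giraph the copy step would be the suggested `WritableUtils.clone` call. A hand-written `copy()` stands in for it so the sketch runs without Hadoop.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical mutable record standing in for a Writable vertex.
final class MutableVertex {
  long id;

  MutableVertex(long id) {
    this.id = id;
  }

  MutableVertex copy() {
    return new MutableVertex(id);
  }
}

// Mimics a reader that refills the same instance for every record.
final class ReusingReader implements Iterator<MutableVertex> {
  private final long[] ids;
  private final MutableVertex reused = new MutableVertex(-1);
  private int pos = 0;

  ReusingReader(long... ids) {
    this.ids = ids;
  }

  @Override
  public boolean hasNext() {
    return pos < ids.length;
  }

  @Override
  public MutableVertex next() {
    if (!hasNext()) throw new NoSuchElementException();
    reused.id = ids[pos++]; // same object returned on every call
    return reused;
  }
}

final class CloneOnLoad {
  static List<MutableVertex> load(Iterator<MutableVertex> reader, boolean cloneEach) {
    List<MutableVertex> loaded = new ArrayList<>();
    while (reader.hasNext()) {
      MutableVertex v = reader.next();
      loaded.add(cloneEach ? v.copy() : v);
    }
    return loaded;
  }
}
```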





[jira] [Created] (GIRAPH-148) giraph-site.xml needs Apache head

2012-02-10 Thread Jakob Homan (Created) (JIRA)
giraph-site.xml needs Apache head
-

 Key: GIRAPH-148
 URL: https://issues.apache.org/jira/browse/GIRAPH-148
 Project: Giraph
  Issue Type: Bug
  Components: conf and scripts
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.1.0
 Attachments: GIRAPH-148.patch

I forgot to add the license to the conf file and now rat is failing...





[jira] [Created] (GIRAPH-147) Add Blueprints Tinkerpop support

2012-02-10 Thread Avery Ching (Created) (JIRA)
Add Blueprints Tinkerpop support


 Key: GIRAPH-147
 URL: https://issues.apache.org/jira/browse/GIRAPH-147
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Priority: Minor


Got this issue on the old Giraph GitHub (deprecated).  Moving it here.

jeffg2k opened this issue 2 hours ago
Hoping that Giraph might add TinkerPop Blueprint support. :)






[jira] [Created] (GIRAPH-146) Maven is running the tests twice during builds

2012-02-09 Thread Jakob Homan (Created) (JIRA)
Maven is running the tests twice during builds
--

 Key: GIRAPH-146
 URL: https://issues.apache.org/jira/browse/GIRAPH-146
 Project: Giraph
  Issue Type: Bug
  Components: build
Reporter: Jakob Homan


I had a feeling the build time had jumped significantly... 





[jira] [Created] (GIRAPH-145) Change partition request log level to debug rather than info

2012-02-09 Thread Jakob Homan (Created) (JIRA)
Change partition request log level to debug rather than info


 Key: GIRAPH-145
 URL: https://issues.apache.org/jira/browse/GIRAPH-145
 Project: Giraph
  Issue Type: Improvement
  Components: bsp
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.2.0


{code:title=BasicRPCCommunications.java|borderStyle=solid}
if (LOG.isInfoEnabled()) {
  LOG.info("sendPartitionReq: Sending to " + rpcProxy.getName() +
      " " + addr + " from " + workerInfo +
      ", with partition " + partition);
}{code}
is too chatty. We're seeing thousands and thousands of these lines for larger 
graphs. This should be at debug level...
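The change amounts to moving the guard and the call from info to debug level (log4j's `isDebugEnabled()` and `debug(...)`). So that the sketch runs without log4j, it uses `java.util.logging` instead, where `FINE` plays the role of debug. The class and method names are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class PartitionRequestLogging {
  private static final Logger LOG = Logger.getLogger("PartitionRequestLogging");

  // Per-request detail goes to FINE (JUL's closest analogue of log4j DEBUG);
  // the guard also skips building the message string when FINE is disabled.
  static void logSendPartitionReq(String proxyName, String addr, String workerInfo, int partition) {
    if (LOG.isLoggable(Level.FINE)) {
      LOG.fine("sendPartitionReq: Sending to " + proxyName + " " + addr
          + " from " + workerInfo + ", with partition " + partition);
    }
  }
}
```

At the default INFO level, nothing is emitted per partition request. Raising the logger to FINE brings the lines back when they are needed for debugging.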






[jira] [Created] (GIRAPH-144) GiraphJob should not extend Job (users should not be able to call Job methods like waitForCompletion or setMapper..etc)

2012-02-08 Thread Dave (Created) (JIRA)
GiraphJob should not extend Job  (users should not be able to call Job methods 
like waitForCompletion or setMapper..etc)


 Key: GIRAPH-144
 URL: https://issues.apache.org/jira/browse/GIRAPH-144
 Project: Giraph
  Issue Type: Bug
Reporter: Dave








[jira] [Created] (GIRAPH-143) Add support for giraph to have a conf file

2012-02-08 Thread Jakob Homan (Created) (JIRA)
Add support for giraph to have a conf file
--

 Key: GIRAPH-143
 URL: https://issues.apache.org/jira/browse/GIRAPH-143
 Project: Giraph
  Issue Type: New Feature
Affects Versions: 0.2.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.2.0


Currently one must provide all the Giraph-specific config values either via the 
command line or snuck into another project's conf file.  Any self-respecting 
Hadoop ecosystem project should have its own conf file.





[jira] [Created] (GIRAPH-142) _hadoopBsp should be prefixable via configuration

2012-02-07 Thread Jakob Homan (Created) (JIRA)
_hadoopBsp should be prefixable via configuration
-

 Key: GIRAPH-142
 URL: https://issues.apache.org/jira/browse/GIRAPH-142
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Assignee: Jakob Homan


In multi-tenant ZooKeeper clusters, it would be good to be able to specify the 
base directory that's created for the _hadoopBsp znodes. This would also fix 
the issue we have with creating that directory in the source root during tests.





[jira] [Created] (GIRAPH-141) multigraph support in giraph

2012-02-04 Thread Created
multigraph support in giraph


 Key: GIRAPH-141
 URL: https://issues.apache.org/jira/browse/GIRAPH-141
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Reporter: André Kelpe


The current vertex API only supports simple graphs, meaning that there can only 
ever be one edge between two vertices. Many graphs, such as road networks, are in 
fact multigraphs, where several edges can connect the same two vertices.

Support for this could be added by introducing an Iterator 
getEdgeValue() or a similar construct. Maybe introducing a slim object like a 
Connector between the edge and the vertex is also a good idea, so that you 
could do something like:

for (final Connector conn : getEdgeValues()) {
  final EdgeWritable edge = conn.getEdge();
  final VertexWritable otherVertex = conn.getOther();
  // do interesting stuff
}
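Here is a minimal plain-Java sketch of the Connector idea, with generic `V` and `E` standing in for the vertex and edge types. `Connector`, `getEdge()`, `getOther()` and `getEdgeValues()` follow the names in the proposal. `MultigraphVertex` is hypothetical. The connectors are kept in a list rather than a map keyed by target vertex, because a map would silently collapse parallel edges.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// One Connector per edge, so parallel edges to the same vertex stay distinct.
final class Connector<V, E> {
  private final E edge;
  private final V other;

  Connector(E edge, V other) {
    this.edge = edge;
    this.other = other;
  }

  E getEdge() {
    return edge;
  }

  V getOther() {
    return other;
  }
}

// Hypothetical vertex that stores its out-edges as connectors.
final class MultigraphVertex<V, E> {
  private final List<Connector<V, E>> connectors = new ArrayList<>();

  void addEdge(V target, E edgeValue) {
    connectors.add(new Connector<>(edgeValue, target));
  }

  Iterable<Connector<V, E>> getEdgeValues() {
    return Collections.unmodifiableList(connectors);
  }
}
```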






[jira] [Created] (GIRAPH-140) Enforce a maximum number of iterations

2012-01-31 Thread Jakob Homan (Created) (JIRA)
Enforce a maximum number of iterations
--

 Key: GIRAPH-140
 URL: https://issues.apache.org/jira/browse/GIRAPH-140
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan


While Giraph is still on MR and keeping track of its statistics via Hadoop's 
counters, there is the danger that a huge number of iterations will negatively 
impact the cluster's jobtracker by adding counter statistics for each one 
(basically, the flip side of GIRAPH-52).  We should have a configurable maximum 
number of iterations to prevent this.
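Here is a sketch of the check such a limit implies, with a hypothetical signature. The job stops either when all vertices have halted or when the configured cap is reached. Two parts are assumptions: a negative cap means "unlimited", and the cap counts computation supersteps numbered from 0.

```java
final class SuperstepLimit {
  // True when the job should stop after finishing `superstep` (numbered from 0).
  // A negative maxSupersteps disables the cap.
  static boolean shouldStop(long superstep, long maxSupersteps, boolean allVerticesHalted) {
    if (allVerticesHalted) {
      return true;
    }
    return maxSupersteps >= 0 && superstep + 1 >= maxSupersteps;
  }
}
```

With a cap of 3, supersteps 0, 1 and 2 run, and the job stops after superstep 2 even if vertices are still active.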





[jira] [Created] (GIRAPH-139) Change PageRankBenchmark to be accessible via bin/giraph

2012-01-31 Thread Jakob Homan (Created) (JIRA)
Change PageRankBenchmark to be accessible via bin/giraph


 Key: GIRAPH-139
 URL: https://issues.apache.org/jira/browse/GIRAPH-139
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan


Currently the PageRankBenchmark has its own main and Tool implementation and is 
difficult to access from the bin/giraph script. It would be better if 
everything were accessible via bin/giraph. The benchmark is particularly 
problematic because it uses inner classes for its two actual Vertex 
implementations, which have to be specified on the command line by their binary 
class name (i.e. org.apache.giraph.benchmark.PageRankBenchmark$PageRankHashMapVertex) 
rather than just with dots, as one would expect.
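`Class.forName` only accepts binary names, so `...PageRankBenchmark.PageRankHashMapVertex` fails while `...PageRankBenchmark$PageRankHashMapVertex` works. One way a launcher could accept the dotted form is to retry with trailing dots turned into `$`. The helper below is hypothetical and is shown on a JDK nested class so it runs without Giraph.

```java
final class ClassNameResolver {
  // Resolves "java.util.Map.Entry"-style names by retrying with the last
  // remaining dot replaced by '$' (the binary name of a nested class).
  static Class<?> resolve(String name) throws ClassNotFoundException {
    String candidate = name;
    while (true) {
      try {
        return Class.forName(candidate);
      } catch (ClassNotFoundException e) {
        int lastDot = candidate.lastIndexOf('.');
        if (lastDot < 0) {
          throw new ClassNotFoundException(name);
        }
        candidate = candidate.substring(0, lastDot) + '$' + candidate.substring(lastDot + 1);
      }
    }
  }
}
```

For example, `resolve("java.util.Map.Entry")` first fails on the dotted name, then succeeds with `java.util.Map$Entry`. A name that was already a binary name resolves on the first try.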





[jira] [Created] (GIRAPH-138) Don't throw stack trace for classes that aren't vertices

2012-01-31 Thread Jakob Homan (Created) (JIRA)
Don't throw stack trace for classes that aren't vertices


 Key: GIRAPH-138
 URL: https://issues.apache.org/jira/browse/GIRAPH-138
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan


Currently if one passes in a class that isn't a vertex, we throw up a complete 
stack trace:
{noformat}[tardis giraph-0.1]$ bin/giraph lib/giraph-0.1.jar 
org.apache.giraph.benchmark.PageRankBenchmark -w 10 -if 
org.apache.giraph.benchmark.PseudoRandomVertexInputFormat
Exception in thread "main" java.lang.RuntimeException: class 
org.apache.giraph.benchmark.PageRankBenchmark not 
org.apache.giraph.graph.BasicVertex
at org.apache.hadoop.conf.Configuration.setClass(Configuration.java:858)
at org.apache.giraph.graph.GiraphJob.setVertexClass(GiraphJob.java:395)
at org.apache.giraph.GiraphRunner.run(GiraphRunner.java:94)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.giraph.GiraphRunner.main(GiraphRunner.java:132)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156){noformat}
This type of user error is routine and should be caught and result in a more 
descriptive error message.
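One way to produce a friendlier message is to check the class with `Class.isAssignableFrom` before handing it to `Configuration.setClass`, and to report a one-line error instead of a stack trace. In this sketch, `BasicVertexStandIn` is a hypothetical stand-in for `org.apache.giraph.graph.BasicVertex`, and the message wording is illustrative.

```java
final class VertexClassCheck {
  // Hypothetical stand-in for org.apache.giraph.graph.BasicVertex.
  interface BasicVertexStandIn {}

  static final class GoodVertex implements BasicVertexStandIn {}

  // Returns a user-facing error message, or null if the class is acceptable.
  static String validate(String className, Class<?> required) {
    Class<?> cls;
    try {
      cls = Class.forName(className);
    } catch (ClassNotFoundException e) {
      return "Could not find class " + className + " on the classpath.";
    }
    if (!required.isAssignableFrom(cls)) {
      return className + " is not a vertex class: it must extend or implement "
          + required.getName() + ".";
    }
    return null;
  }
}
```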





[jira] [Created] (GIRAPH-137) De-duplicate pagerank implementation in PageRankBenchmark

2012-01-31 Thread Jakob Homan (Created) (JIRA)
De-duplicate pagerank implementation in PageRankBenchmark
-

 Key: GIRAPH-137
 URL: https://issues.apache.org/jira/browse/GIRAPH-137
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Priority: Minor


Currently in PageRankBenchmark we have the code for pagerank duplicated in each 
of the implementations of Vertex:
{noformat}public static class PageRankHashMapVertex extends HashMapVertex<
    LongWritable, DoubleWritable, DoubleWritable, DoubleWritable> {
  @Override
  public void compute(Iterator<DoubleWritable> msgIterator) {
    if (getSuperstep() >= 1) {
      double sum = 0;
      while (msgIterator.hasNext()) {
        sum += msgIterator.next().get();
      }
      DoubleWritable vertexValue =
          new DoubleWritable((0.15f / getNumVertices()) + 0.85f * sum);
      setVertexValue(vertexValue);
    }

    if (getSuperstep() < getConf().getInt(SUPERSTEP_COUNT, -1)) {
      long edges = getNumOutEdges();
      sendMsgToAllEdges(
          new DoubleWritable(getVertexValue().get() / edges));
    } else {
      voteToHalt();
    }
  }
}

public static class PageRankEdgeListVertex extends EdgeListVertex<
    LongWritable, DoubleWritable, DoubleWritable, DoubleWritable> {
  @Override
  public void compute(Iterator<DoubleWritable> msgIterator) {
    if (getSuperstep() >= 1) {
      double sum = 0;
      while (msgIterator.hasNext()) {
        sum += msgIterator.next().get();
      }
      DoubleWritable vertexValue =
          new DoubleWritable((0.15f / getNumVertices()) + 0.85f * sum);
      setVertexValue(vertexValue);
    }

    if (getSuperstep() < getConf().getInt(SUPERSTEP_COUNT, -1)) {
      long edges = getNumOutEdges();
      sendMsgToAllEdges(
          new DoubleWritable(getVertexValue().get() / edges));
    } else {
      voteToHalt();
    }
  }
}{noformat}
This code can be consolidated into a private class that the two implementations 
just extend.
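The two classes already extend different base classes (`HashMapVertex` and `EdgeListVertex`), and Java allows only single inheritance. An option that avoids this constraint is to move the shared arithmetic into a static helper that both `compute()` methods call. The class name below is hypothetical. The sketch uses `double` constants where the quoted code uses `0.15f`/`0.85f`, so results may differ in the last digits.

```java
final class PageRankMath {
  static final double DAMPING = 0.85;

  // New vertex value from the summed incoming messages (same formula as both compute() bodies).
  static double newValue(double messageSum, long numVertices) {
    return (1 - DAMPING) / numVertices + DAMPING * messageSum;
  }

  // Mirrors the SUPERSTEP_COUNT check: keep sending while below the configured count.
  static boolean keepSending(long superstep, int superstepCount) {
    return superstep < superstepCount;
  }
}
```

Each `compute()` would then reduce to summing its messages and calling `setVertexValue(new DoubleWritable(PageRankMath.newValue(sum, getNumVertices())))`, followed by the `keepSending` branch.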





[jira] [Created] (GIRAPH-136) Error message for bin/giraph could be improved

2012-01-31 Thread Jakob Homan (Created) (JIRA)
Error message for bin/giraph could be improved
--

 Key: GIRAPH-136
 URL: https://issues.apache.org/jira/browse/GIRAPH-136
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan


Currently when one just runs bin/giraph without the required jar, the message 
isn't very helpful:
{noformat}[tardis giraph-0.1]$ bin/giraph
Can't find user jar to execute.{noformat}
It would be better to have a more in-depth message explaining Giraph and what 
is expected.






[jira] [Created] (GIRAPH-135) Need DISCLAIMER for incubator

2012-01-31 Thread Jakob Homan (Created) (JIRA)
Need DISCLAIMER for incubator
-

 Key: GIRAPH-135
 URL: https://issues.apache.org/jira/browse/GIRAPH-135
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Assignee: Jakob Homan


Releases need to have a DISCLAIMER file.





[jira] [Created] (GIRAPH-134) Fix NOTICE and LICENSE files

2012-01-30 Thread Jakob Homan (Created) (JIRA)
Fix NOTICE and LICENSE files


 Key: GIRAPH-134
 URL: https://issues.apache.org/jira/browse/GIRAPH-134
 Project: Giraph
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.1.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.1.0


Currently both the LICENSE and NOTICE files are out of compliance for an 
Apache release.





[jira] [Created] (GIRAPH-133) Typo in JavaDoc in BspCase::remove

2012-01-30 Thread Jakob Homan (Created) (JIRA)
Typo in JavaDoc in BspCase::remove
--

 Key: GIRAPH-133
 URL: https://issues.apache.org/jira/browse/GIRAPH-133
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Priority: Trivial


Configuration is misspelled ("Configutation") in the javadoc:
{noformat}/**
 * Helper method to remove a path if it exists.
 *
 * @param conf Configutation
 * @param path Path to remove
 * @throws IOException
 */
public static void remove(Configuration conf, Path path)
    throws IOException {
    FileSystem hdfs = FileSystem.get(conf);
    hdfs.delete(path, true);
}{noformat}





[jira] [Created] (GIRAPH-132) Simplify boolean expression in GraphMapper::map()

2012-01-30 Thread Jakob Homan (Created) (JIRA)
Simplify boolean expression in GraphMapper::map()
-

 Key: GIRAPH-132
 URL: https://issues.apache.org/jira/browse/GIRAPH-132
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Priority: Trivial


The boolean expression in:
{noformat}@Override
public void map(Object key, Object value, Context context)
    throws IOException, InterruptedException {
    // map() only does computation
    // 1) Run checkpoint per frequency policy.
    // 2) For every vertex on this mapper, run the compute() function
    // 3) Wait until all messaging is done.
    // 4) Check if all vertices are done.  If not goto 2).
    // 5) Dump output.
    if (done == true) {
        return;
    }{noformat}
can be simplified to {{if (done)}}.
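A minimal illustration of the simplification: testing the boolean directly behaves exactly like comparing it with `true`. `MapperSketch` and its `computeCalls` counter are hypothetical stand-ins for GraphMapper's state, not Giraph code.

```java
/** Hypothetical stand-in for the early-exit check at the top of GraphMapper.map(). */
class MapperSketch {
    /** Set once the computation has finished; later map() calls are no-ops. */
    private boolean done = false;
    /** Counts how many times the (elided) compute phase actually ran. */
    int computeCalls = 0;

    void map() {
        // Same behavior as `if (done == true)`, without the redundant comparison.
        if (done) {
            return;
        }
        computeCalls++;  // stands in for steps 1) through 5)
        done = true;
    }
}
```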





[jira] [Created] (GIRAPH-131) enable creation of test-jars to simplify testing in downstream projects

2012-01-26 Thread Created
enable creation of test-jars to simplify testing in downstream projects
---

 Key: GIRAPH-131
 URL: https://issues.apache.org/jira/browse/GIRAPH-131
 Project: Giraph
  Issue Type: Improvement
Reporter: André Kelpe
Priority: Minor
 Attachments: GIRAPH-131.patch

Attached patch enables the creation of test-jars, which are the tests packaged 
in a separate jar file. This makes it possible to use the super-useful test 
infrastructure in MockUtils in downstream projects. If you add the patch, you 
will get a ${giraph.version}-tests.jar, which can be used for downstream 
testing like this:


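<dependency>
  <groupId>org.apache.giraph</groupId>
  <artifactId>giraph</artifactId>
  <version>${giraph.version}</version>
  <type>test-jar</type>
  <scope>test</scope>
</dependency>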


P.S.: The patch also resets the version to 0.1-SNAPSHOT as discussed in 
GIRAPH-129





[jira] [Created] (GIRAPH-130) Fix Javadoc warnings

2012-01-24 Thread Jakob Homan (Created) (JIRA)
Fix Javadoc warnings


 Key: GIRAPH-130
 URL: https://issues.apache.org/jira/browse/GIRAPH-130
 Project: Giraph
  Issue Type: Bug
Reporter: Jakob Homan
Priority: Minor


We've accumulated a fair number of javadoc warnings recently:
{noformat}[WARNING] Javadoc Warnings
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146:
 warning - Tag @link: reference not found: GraphPartitioner
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:129:
 warning - @param argument "superstep" is not a parameter name.
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146:
 warning - Tag @link: reference not found: GraphPartitioner
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/CommunicationsInterface.java:84:
 warning - @param argument "vertexIndex" is not a parameter name.
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/CommunicationsInterface.java:84:
 warning - @param argument "msgList" is not a parameter name.
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/VertexIdMessagesList.java:32:
 warning - Tag @link: reference not found: VertexIdMessage
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/VertexCombiner.java:46:
 warning - Tag @link: reference not found: messages
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/VertexCombiner.java:46:
 warning - Tag @link: reference not found: messages
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/AggregatorWriter.java:60:
 warning - @param argument "map" is not a parameter name.
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146:
 warning - Tag @link: reference not found: GraphPartitioner
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146:
 warning - Tag @link: reference not found: GraphPartitioner
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/GiraphJob.java:432:
 warning - @param argument "graphPartitionerClass" is not a parameter name.
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/VertexCombiner.java:46:
 warning - Tag @link: reference not found: messages
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/partition/MasterGraphPartitioner.java:62:
 warning - Tag @link: reference not found: GraphPartitioner
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/partition/MasterGraphPartitioner.java:62:
 warning - Tag @link: reference not found: GraphPartitioner
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/partition/MasterGraphPartitioner.java:62:
 warning - @param argument "availableWorkerInfos" is not a parameter name.
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/partition/PartitionBalancer.java:176:
 warning - @param argument "allPartitionStatsList" is not a parameter name.
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/VertexIdMessagesList.java:32:
 warning - Tag @link: reference not found: VertexIdMessage
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/VertexIdMessagesList.java:32:
 warning - Tag @link: reference not found: VertexIdMessage
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146:
 warning - Tag @link: reference not found: GraphPartitioner
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/VertexIdMessagesList.java:32:
 warning - Tag @link: reference not found: VertexIdMessage
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146:
 warning - Tag @link: reference not found: GraphPartitioner
[WARNING] 
/Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/VertexIdMessagesList.java:32:
 warning - Tag @link: reference not found: VertexIdMessage
{noformat}

It would be good to fix these.
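The warnings fall into two patterns: an `@param` name that does not match the method's actual parameter name, and a `{@link}` whose target javadoc cannot resolve (typically a type that is not visible from that file, or an ordinary word such as "messages" placed inside `{@link}`). A hypothetical method, not part of Giraph, documented so that neither warning fires:

```java
import java.util.List;

/** Hypothetical example class; it is not part of Giraph. */
class JavadocExample {
    /**
     * Sums message values. A plain word such as "messages" is written with
     * {@code messages} instead of a link, and links point at types javadoc
     * can resolve, for example {@link java.util.List}.
     *
     * @param messageValues values to add; this name matches the parameter below
     * @return the sum of {@code messageValues}
     */
    static long sum(List<Long> messageValues) {
        long total = 0;
        for (long v : messageValues) {
            total += v;
        }
        return total;
    }
}
```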





[jira] [Created] (GIRAPH-129) enable creation of javadoc and sources jars

2012-01-24 Thread Created
enable creation of javadoc and sources jars
---

 Key: GIRAPH-129
 URL: https://issues.apache.org/jira/browse/GIRAPH-129
 Project: Giraph
  Issue Type: Improvement
  Components: build
Affects Versions: 0.1.0
Reporter: André Kelpe
Priority: Minor
 Attachments: GIRAPH-129.patch

It is pretty useful to enable the creation of javadoc and sources jars during 
the build, so that people using IDEs like Eclipse can easily jump into the code.





[jira] [Created] (GIRAPH-128) RPC port from BasicRPCCommunications should be only a starting port, and retried

2012-01-22 Thread Avery Ching (Created) (JIRA)
RPC port from BasicRPCCommunications should be only a starting port, and retried


 Key: GIRAPH-128
 URL: https://issues.apache.org/jira/browse/GIRAPH-128
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.1.0
Reporter: Avery Ching
Assignee: Avery Ching


Currently Giraph uses a base port + the task partition to get the RPC port.  
This doesn't work well when there are multiple Giraph jobs running 
simultaneously in the same Hadoop cluster (port conflicts).  At the same time, 
it is nice to use this simple algorithm because it makes it very easy to debug 
problems (you can find the troublesome mapper from the RPC port number).  I 
will be proposing a simple scheme to retry with another port.  I will round the 
total number of mappers up to the nearest power of 10 (let's call that number 
Z).  Then I will increment the port number by Z, retrying up to 20 times.  If 
you have enough ports, this scheme would guarantee that up to 20 mappers / node 
would be supported.  It should be sufficient for most clusters.  At the same 
time, we still maintain the easy debugging method, since it's still easy to 
figure out the mapper partition from the port ((port - base port) % Z = map 
partition). 
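The proposed retry scheme could be sketched as below. `RpcPortChooser`, its method names and the `IntPredicate` that decides whether a port is free are hypothetical; a real implementation would try to bind the RPC server and retry on failure. The sketch rounds the mapper count up to a power of 10 (Z), adds Z on each retry (at most 20 tries), and recovers the partition as `(port - basePort) % Z`.

```java
import java.util.function.IntPredicate;

/** Hypothetical sketch of the retry scheme proposed in GIRAPH-128. */
final class RpcPortChooser {
    /** Maximum number of ports tried per task, as in the proposal. */
    static final int MAX_TRIES = 20;

    /** Rounds n (n >= 1) up to the nearest power of 10, e.g. 7 -> 10, 10 -> 10, 11 -> 100. */
    static int roundUpToPowerOf10(int n) {
        int z = 1;
        while (z < n) {
            z *= 10;
        }
        return z;
    }

    /**
     * Tries basePort + taskPartition first, then adds Z per retry. isFree decides
     * whether a candidate port is usable. Returns the port, or -1 if all tries fail.
     */
    static int choosePort(int basePort, int taskPartition, int numMappers,
                          IntPredicate isFree) {
        int z = roundUpToPowerOf10(numMappers);
        for (int attempt = 0; attempt < MAX_TRIES; attempt++) {
            int port = basePort + taskPartition + attempt * z;
            if (isFree.test(port)) {
                return port;
            }
        }
        return -1;
    }

    /** Recovers the task partition from a chosen port: (port - basePort) % Z. */
    static int partitionOf(int port, int basePort, int numMappers) {
        return (port - basePort) % roundUpToPowerOf10(numMappers);
    }
}
```

For example, with a base port of 30000, 7 mappers (Z = 10) and partition 3, the candidates are 30003, 30013, 30023, and so on. Rejecting candidates above 65535 is left out of the sketch.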





[jira] [Created] (GIRAPH-127) Extending the API with a master.compute() function.

2012-01-19 Thread Semih Salihoglu (Created) (JIRA)
Extending the API with a master.compute() function.
---

 Key: GIRAPH-127
 URL: https://issues.apache.org/jira/browse/GIRAPH-127
 Project: Giraph
  Issue Type: New Feature
  Components: bsp, examples, graph
Reporter: Semih Salihoglu


First of all, sorry for the long explanation of this feature.

I want to extend the API of Giraph with a new function called master.compute(), 
which would get called at the master before each superstep, and I will try to 
explain the purpose it would serve with an example. Let's say we want to 
implement the following simplified version of the k-means clustering algorithm. 
Pseudocode below:
Input: G(V, E), k, numEdgesThreshold, maxIterations
Algorithm:
  int numEdgesCrossingClusters = Integer.MAX_VALUE;
  int iterationNo = 0;
  while ((numEdgesCrossingClusters > numEdgesThreshold) && iterationNo < maxIterations) {
      iterationNo++;
      int[] clusterCenters = pickKClusterCenters(k, G);
      findClusterCenters(G, clusterCenters);
      numEdgesCrossingClusters = countNumEdgesCrossingClusters();
  }
The algorithm goes through the following steps in iterations:
1) Pick k random initial cluster centers.
2) Assign each vertex to the cluster center that it's closest to (in Giraph, 
this can be implemented with message passing, similar to how ShortestPaths is 
implemented).
3) Count the number of edges crossing clusters.
4) Go back to step 1 if there are a lot of edges crossing clusters and we 
haven't exceeded the maximum number of iterations yet.

In an algorithm like this, steps 2 and 3 are where most of the work happens, and 
both parts have very neat message-passing implementations. I'll try to give an 
overview without going into the details. Let's say we define a Vertex in Giraph 
to hold a custom Writable object that holds 2 integer values and sends messages 
with up to 2 integer values.
Step 2 is very similar to the ShortestPaths algorithm and has two stages: In the 
first stage, each vertex checks whether or not it's one of the cluster centers. 
If so, it assigns itself the value (id, 0); otherwise it assigns itself 
(Null, Null). In the second stage, the vertices assign themselves to the 
minimum-distance cluster center by looking at their neighbors' (cluster center, 
distance) values (received as 2-integer messages) and their current values, and 
changing their values if they find a lower-distance cluster center. This 
repeats over some number of supersteps until every vertex converges.
Step 3, counting the number of edges crossing clusters, is also very easy to 
implement in Giraph. Once each vertex has a cluster center, the number of edges 
crossing clusters can be counted by an aggregator, let's say called 
"num-edges-crossing". It would again have two stages: in the first stage, every 
vertex just sends its cluster id to all its neighbors. In the second stage, 
every vertex looks at its neighbors' cluster ids in the messages, and for each 
cluster id that is not equal to its own cluster id, it increments 
"num-edges-crossing" by 1.
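The per-vertex logic of steps 2 and 3 might look roughly like the sketch below, with Giraph's messaging replaced by plain method arguments so it runs on its own. `ClusterValue` and its methods are hypothetical; edges are assumed unweighted (each hop adds 1 to the distance), and the (Null, Null) value is modelled with a sentinel.

```java
import java.util.List;

/** Hypothetical (cluster center, distance) vertex value for steps 2 and 3. */
final class ClusterValue {
    static final long NO_CENTER = -1;
    static final int INFINITE = Integer.MAX_VALUE;
    /** Stands in for the (Null, Null) value of a vertex with no center yet. */
    static final ClusterValue UNASSIGNED = new ClusterValue(NO_CENTER, INFINITE);

    final long center;
    final int distance;

    ClusterValue(long center, int distance) {
        this.center = center;
        this.distance = distance;
    }

    /** Step 2, stage 1: a center starts at (id, 0); every other vertex is unassigned. */
    static ClusterValue initial(long vertexId, boolean isCenter) {
        return isCenter ? new ClusterValue(vertexId, 0) : UNASSIGNED;
    }

    /** Step 2, stage 2: adopt a neighbor's center if it is closer via that neighbor. */
    static ClusterValue relax(ClusterValue current, List<ClusterValue> neighborValues) {
        ClusterValue best = current;
        for (ClusterValue n : neighborValues) {
            // Skip unassigned neighbors (also avoids overflow on INFINITE + 1).
            if (n.distance != INFINITE && n.distance + 1 < best.distance) {
                best = new ClusterValue(n.center, n.distance + 1);
            }
        }
        return best;
    }

    /** Step 3, stage 2: this vertex's contribution to "num-edges-crossing". */
    static long crossingEdges(long ownCenter, List<Long> neighborCenters) {
        long count = 0;
        for (long c : neighborCenters) {
            if (c != ownCenter) {
                count++;
            }
        }
        return count;
    }
}
```

If each undirected edge is stored as two directed edges, every crossing edge is counted once from each endpoint, so the aggregated total would be twice the number of crossing edges.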

The other 2 steps, steps 1 and 4, are very simple sequential computations. Step 
1 just picks k random vertex ids and puts them into an aggregator. Step 4 just 
compares "num-edges-crossing" against a threshold and also checks whether or not 
the algorithm has exceeded maxIterations (not supersteps but iterations of going 
through steps 1-4). With the current API, it's not clear where to do these 
computations. There is a per-worker function, preSuperstep(), that can be 
implemented, but if we decide to pick a special worker, let's say worker 1, to 
pick the k vertices, then we'd waste an entire superstep where only worker 1 
would do work (picking k vertices in preSuperstep() and putting them into an 
aggregator), and all other workers would be idle. Trying to do this in worker 1 
in postSuperstep() would not work either, because worker 1 needs to know that 
all the vertices have converged to understand that it's time to pick k vertices 
or to do the check in step 4, and that would only be available to it at the 
beginning of the next superstep.

A master.compute() extension would run at the master before the superstep and 
would modify the aggregator holding the k vertices before the aggregators are 
broadcast to the workers. These are all very short sequential computations, so 
they would not waste resources the way a preSuperstep() or postSuperstep() 
approach would. It would also enable running new algorithms like k-means that 
are composed of vertex-centric computations glued together by small sequential 
ones. Basically, it would give Giraph sequential computation in a non-wasteful 
way.
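The master-side pieces (steps 1 and 4) are the kind of short sequential code a master.compute() hook could run between vertex-centric phases. A sketch with hypothetical names, assuming vertex ids are 0..numVertices-1 and that the aggregated "num-edges-crossing" value is handed in as a plain long:

```java
import java.util.LinkedHashSet;
import java.util.Random;
import java.util.Set;

/** Hypothetical master-side logic for steps 1 and 4 of the k-means example. */
final class KMeansMasterSketch {
    private final long numEdgesThreshold;
    private final int maxIterations;
    private int iterationNo = 0;

    KMeansMasterSketch(long numEdgesThreshold, int maxIterations) {
        this.numEdgesThreshold = numEdgesThreshold;
        this.maxIterations = maxIterations;
    }

    /** Step 4: start another iteration only if too many edges cross and iterations remain. */
    boolean beginIteration(long numEdgesCrossing) {
        if (numEdgesCrossing > numEdgesThreshold && iterationNo < maxIterations) {
            iterationNo++;
            return true;
        }
        return false;
    }

    /** Step 1: k distinct random vertex ids drawn from [0, numVertices). */
    static long[] pickKClusterCenters(int k, long numVertices, Random rnd) {
        if (k > numVertices) {
            throw new IllegalArgumentException("k exceeds the number of vertices");
        }
        Set<Long> chosen = new LinkedHashSet<>();
        while (chosen.size() < k) {
            chosen.add(Math.floorMod(rnd.nextLong(), numVertices));
        }
        long[] centers = new long[k];
        int i = 0;
        for (long id : chosen) {
            centers[i++] = id;
        }
        return centers;
    }
}
```

Passing `Long.MAX_VALUE` on the first call mirrors the maximum-value initialization in the pseudocode, so the first iteration always starts.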

I am a PhD student at Stanford and I have been working on my own BSP/Pregel 
implementation since last year. It's called GPS. I haven't distributed it, 
mainly because in September I learned about Giraph and I decided to sl

[jira] [Created] (GIRAPH-126) Use Collections.emptyList() in BasicRPCCommunications.java

2012-01-18 Thread Created
Use Collections.emptyList() in BasicRPCCommunications.java
--

 Key: GIRAPH-126
 URL: https://issues.apache.org/jira/browse/GIRAPH-126
 Project: Giraph
  Issue Type: Improvement
Reporter: André Kelpe
Priority: Minor


I am doing some tests with Giraph and I am having some memory problems. While I 
was browsing through the codebase, I saw that you are allocating a new ArrayList 
(which has an underlying array of 10 elements) for each vertex that has no 
messages to be delivered. That's a waste of memory and time. This patch 
replaces it with the EMPTY_LIST of the Collections utility class.
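A sketch of the change, assuming messages are looked up per vertex in a map (`MessageLookup` and the `inbox` map are hypothetical): `Collections.emptyList()` returns one shared immutable empty list instead of a fresh allocation per vertex, and unlike the raw `EMPTY_LIST` field it is generically typed.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical lookup of a vertex's incoming messages. */
final class MessageLookup {
    static <M> List<M> messagesFor(Map<Long, List<M>> inbox, long vertexId) {
        List<M> msgs = inbox.get(vertexId);
        // Shared immutable instance: no allocation for vertices without messages.
        return msgs != null ? msgs : Collections.<M>emptyList();
    }
}
```

One caveat of the swap: the shared list is immutable, so any code path that later adds to the returned list would throw `UnsupportedOperationException`.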





[jira] [Created] (GIRAPH-125) Bug in LongDoubleFloatDoubleVertex.sendMsgToAllEdges()

2012-01-17 Thread Yuanyuan Tian (Created) (JIRA)
Bug in LongDoubleFloatDoubleVertex.sendMsgToAllEdges()
--

 Key: GIRAPH-125
 URL: https://issues.apache.org/jira/browse/GIRAPH-125
 Project: Giraph
  Issue Type: Bug
  Components: graph
Affects Versions: 0.1.0
Reporter: Yuanyuan Tian


I just found a bug in the sendMsgToAllEdges() function of the 
LongDoubleFloatDoubleVertex class. The segment of the code that contains the 
bug is:

final LongWritable destVertex = new LongWritable();
final MutableVertex vertex = this;
verticesWithEdgeValues.forEachKey(new LongProcedure() {
    @Override
    public boolean apply(long destVertexId) {
        destVertex.set(destVertexId);
        vertex.sendMsg(destVertex, msg);
        return true;
    }
});

Here destVertex is a final object, but this single object is reused many times 
in the forEachKey function. Each time its value is changed, but the same object 
is put into the underlying message list (a hash map) through vertex.sendMsg. 
Because the single destVertex object has been put into the underlying hash map 
again and again, destVertex.set(destVertexId) changes the keys already stored 
in the hash map. Eventually, every key added to the hash map will have the same 
value as the last key. 

A simple fix is as follows:

final MutableVertex vertex = this;
verticesWithEdgeValues.forEachKey(new LongProcedure() {
    @Override
    public boolean apply(long destVertexId) {
        vertex.sendMsg(new LongWritable(destVertexId), msg);
        return true;
    }
});
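The hazard is not specific to the Mahout `forEachKey` call: any hash map keyed by a mutable object goes wrong when that object is changed after insertion. A self-contained reproduction, where `MutableLong` is a hypothetical stand-in for `LongWritable` and a plain `HashMap` stands in for the underlying message map:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for LongWritable: mutable, with value-based equals/hashCode. */
final class MutableLong {
    long value;

    MutableLong(long value) { this.value = value; }

    void set(long value) { this.value = value; }

    @Override public boolean equals(Object o) {
        return o instanceof MutableLong && ((MutableLong) o).value == value;
    }

    @Override public int hashCode() { return Long.hashCode(value); }
}

/** Stand-in for queuing one message per destination in a HashMap. */
final class KeyReuseDemo {
    /** Buggy pattern: one key object is mutated and reused for every destination. */
    static Map<MutableLong, List<String>> sendReusingKey(long[] dests, String msg) {
        Map<MutableLong, List<String>> out = new HashMap<>();
        MutableLong key = new MutableLong(0);
        for (long d : dests) {
            key.set(d);  // also rewrites every key already stored in the map
            out.computeIfAbsent(key, k -> new ArrayList<>()).add(msg);
        }
        return out;
    }

    /** Fixed pattern: a fresh key object per destination, as in the proposed fix. */
    static Map<MutableLong, List<String>> sendFreshKeys(long[] dests, String msg) {
        Map<MutableLong, List<String>> out = new HashMap<>();
        for (long d : dests) {
            out.computeIfAbsent(new MutableLong(d), k -> new ArrayList<>()).add(msg);
        }
        return out;
    }
}
```

With destinations {1, 2, 3}, every key in the buggy map ends up with value 3 and a lookup for 1 fails, while the fixed version holds keys 1, 2 and 3. The fix trades one small allocation per destination for correctness.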





[jira] [Created] (GIRAPH-124) Combiner should return Iterable instead of M or null.

2012-01-16 Thread Claudio Martella (Created) (JIRA)
Combiner should return Iterable instead of M or null.


 Key: GIRAPH-124
 URL: https://issues.apache.org/jira/browse/GIRAPH-124
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Affects Versions: 0.1.0
Reporter: Claudio Martella


Currently VertexCombiner is expected to return a single message combining the 
input messages, or null in case no message should be sent. The new interface 
should instead return an Iterable, possibly empty. The number of elements in the 
returned Iterable is supposed to be smaller than the number of input messages, 
by the original definition of a Combiner (a function that reduces I/O by 
combining multiple messages into one).
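A sketch of the proposed contract: the combiner takes an `Iterable` and returns an `Iterable` with no more elements than it received, empty when nothing should be sent (replacing the `null` return). `IterableCombiner` and `SumCombiner` are hypothetical names, not Giraph's actual interface.

```java
import java.util.Collections;
import java.util.Iterator;

/** Hypothetical form of the proposed combiner contract; not Giraph's actual interface. */
interface IterableCombiner<I, M> {
    /** Returns no more messages than it receives; an empty Iterable means "send nothing". */
    Iterable<M> combine(I vertexId, Iterable<M> messages);
}

/** Example: collapses any number of double messages into their sum. */
final class SumCombiner implements IterableCombiner<Long, Double> {
    @Override
    public Iterable<Double> combine(Long vertexId, Iterable<Double> messages) {
        Iterator<Double> it = messages.iterator();
        if (!it.hasNext()) {
            return Collections.emptyList();  // replaces the old "return null"
        }
        double sum = 0;
        while (it.hasNext()) {
            sum += it.next();
        }
        return Collections.singletonList(sum);
    }
}
```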





[jira] [Created] (GIRAPH-123) the wiki is not publicly accessible

2012-01-11 Thread Created
the wiki is not publicly accessible
---

 Key: GIRAPH-123
 URL: https://issues.apache.org/jira/browse/GIRAPH-123
 Project: Giraph
  Issue Type: Bug
  Components: documentation
Reporter: André Kelpe
Priority: Minor


When I try to read the documentation on the wiki, I end up on a login screen. 
Could you please make the wiki publicly accessible?





[jira] [Created] (GIRAPH-122) Roll version back to 0.1

2012-01-09 Thread Jakob Homan (Created) (JIRA)
Roll version back to 0.1


 Key: GIRAPH-122
 URL: https://issues.apache.org/jira/browse/GIRAPH-122
 Project: Giraph
  Issue Type: Improvement
Reporter: Jakob Homan
Assignee: Jakob Homan


Per the vote on the list, we're going to roll Giraph back to 0.1.





[jira] [Created] (GIRAPH-121) BasicVertexResolver should be the implementation and VertexResolver should be the interface

2012-01-08 Thread Claudio Martella (Created) (JIRA)
BasicVertexResolver should be the implementation and VertexResolver should be the interface


 Key: GIRAPH-121
 URL: https://issues.apache.org/jira/browse/GIRAPH-121
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Affects Versions: 0.70.0
Reporter: Claudio Martella
Assignee: Claudio Martella
Priority: Trivial


After the naming change in Vertex, the naming of VertexResolver and 
BasicVertexResolver should be synced.





[jira] [Created] (GIRAPH-120) Add Sebastian Schelter to site

2012-01-07 Thread Sebastian Schelter (Created) (JIRA)
Add Sebastian Schelter to site
--

 Key: GIRAPH-120
 URL: https://issues.apache.org/jira/browse/GIRAPH-120
 Project: Giraph
  Issue Type: Task
Reporter: Sebastian Schelter
Assignee: Sebastian Schelter








[jira] [Created] (GIRAPH-119) VertexCombiner should work on Iterable instead of List

2012-01-06 Thread Claudio Martella (Created) (JIRA)
VertexCombiner should work on Iterable instead of List


 Key: GIRAPH-119
 URL: https://issues.apache.org/jira/browse/GIRAPH-119
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Affects Versions: 0.70.0
Reporter: Claudio Martella
Assignee: Claudio Martella


Currently VertexCombiner expects a List. It should be refactored to take an 
Iterable, to stay in sync with the Iterable-based message logic in BasicVertex.





[jira] [Created] (GIRAPH-118) Clarify messages behavior in BasicVertex

2012-01-06 Thread Claudio Martella (Created) (JIRA)
Clarify messages behavior in BasicVertex


 Key: GIRAPH-118
 URL: https://issues.apache.org/jira/browse/GIRAPH-118
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Reporter: Claudio Martella
Priority: Minor








[jira] [Created] (GIRAPH-117) DefaultWorkerContext should preserve the method signatures of WorkerContext

2012-01-02 Thread Sebastian Schelter (Created) (JIRA)
DefaultWorkerContext should preserve the method signatures of WorkerContext
---

 Key: GIRAPH-117
 URL: https://issues.apache.org/jira/browse/GIRAPH-117
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.70.0
Reporter: Sebastian Schelter
Assignee: Sebastian Schelter
Priority: Trivial


DefaultWorkerContext.preApplication() swallows the InstantiationException and 
IllegalAccessException declared by WorkerContext.preApplication(). These should 
be preserved for applications that want to register an aggregator in this method.
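Why the signature matters: in Java an overriding method may not declare checked exceptions beyond those of the method it overrides, so if DefaultWorkerContext drops the throws clause, no application class extending it can throw these exceptions from preApplication(). A sketch with hypothetical, simplified stand-in classes (not the real Giraph ones):

```java
/** Hypothetical, simplified stand-in for WorkerContext. */
abstract class WorkerContextSketch {
    public abstract void preApplication()
        throws InstantiationException, IllegalAccessException;
}

/**
 * Default no-op context that keeps the parent's throws clause. If it dropped
 * the clause, no subclass of it could declare these checked exceptions.
 */
class DefaultWorkerContextSketch extends WorkerContextSketch {
    @Override
    public void preApplication()
        throws InstantiationException, IllegalAccessException {
        // nothing to set up by default
    }
}

/** User context whose setup fails; it compiles only because the parent declares the exceptions. */
class FailingSetupContext extends DefaultWorkerContextSketch {
    @Override
    public void preApplication()
        throws InstantiationException, IllegalAccessException {
        // Stand-in for a reflective aggregator registration that can fail.
        throw new InstantiationException("hypothetical aggregator could not be created");
    }
}
```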





[jira] [Created] (GIRAPH-116) Make EdgeListVertex the default vertex implementation

2011-12-31 Thread Avery Ching (Created) (JIRA)
Make EdgeListVertex the default vertex implementation
-

 Key: GIRAPH-116
 URL: https://issues.apache.org/jira/browse/GIRAPH-116
 Project: Giraph
  Issue Type: Improvement
Reporter: Avery Ching
Assignee: Avery Ching


I think this would be best for new users, as it is much more memory-efficient 
than Vertex with respect to edges (list vs. hash map).  We seem to be mostly 
iterating over the edges (as several others have pointed out in earlier JIRAs 
and emails), so this would give early users a more memory-efficient 
implementation without performance loss.  If anyone disagrees, please voice 
your opinion.
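To illustrate the trade-off described above, a minimal list-style edge store: iterating over all edges is a sequential scan over compact arrays, while lookup by target vertex becomes linear instead of a hash map's expected constant time. `EdgeListSketch` is hypothetical and uses parallel primitive arrays; it does not claim to mirror EdgeListVertex's actual internals.

```java
import java.util.Arrays;

/** Hypothetical list-style edge storage: parallel primitive arrays, no per-edge map entry. */
final class EdgeListSketch {
    private long[] targets = new long[4];
    private double[] values = new double[4];
    private int size = 0;

    void addEdge(long target, double value) {
        if (size == targets.length) {  // grow both arrays by doubling
            targets = Arrays.copyOf(targets, size * 2);
            values = Arrays.copyOf(values, size * 2);
        }
        targets[size] = target;
        values[size] = value;
        size++;
    }

    int numEdges() {
        return size;
    }

    /** Iterating over all edges, the common access pattern, is a sequential scan. */
    double sumEdgeValues() {
        double sum = 0;
        for (int i = 0; i < size; i++) {
            sum += values[i];
        }
        return sum;
    }

    /** Lookup by target is O(number of edges): the price paid versus a hash map. */
    double getEdgeValue(long target, double missing) {
        for (int i = 0; i < size; i++) {
            if (targets[i] == target) {
                return values[i];
            }
        }
        return missing;
    }
}
```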





[jira] [Created] (GIRAPH-115) Port of the HCC algorithm for identifying all connected components of a graph

2011-12-24 Thread Sebastian Schelter (Created) (JIRA)
Port of the HCC algorithm for identifying all connected components of a graph
-

 Key: GIRAPH-115
 URL: https://issues.apache.org/jira/browse/GIRAPH-115
 Project: Giraph
  Issue Type: New Feature
Affects Versions: 0.70.0
Reporter: Sebastian Schelter


Port of the HCC algorithm, which identifies connected components and assigns a 
component id (the smallest vertex id in the component) to each vertex.

The idea behind the algorithm is very simple: propagate the smallest vertex id 
along the edges to all vertices of a connected component until convergence. The 
number of supersteps necessary is equal to the maximum diameter of all 
components + 1.
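The propagation can be simulated on a single machine to check the idea: each loop pass plays the role of one superstep, in which every vertex takes the minimum of its own label and its neighbors' labels. `HccSketch` is a hypothetical name; the graph is assumed undirected (each edge listed from both endpoints) with vertex ids 0..n-1.

```java
import java.util.List;

/** Hypothetical single-machine simulation of HCC-style minimum-label propagation. */
final class HccSketch {
    /**
     * adjacency.get(v) lists the neighbors of vertex v (ids 0..n-1). Edges are
     * assumed to be listed from both endpoints, i.e. the graph is undirected.
     * Returns, for each vertex, the smallest vertex id in its component.
     */
    static int[] components(List<List<Integer>> adjacency) {
        int n = adjacency.size();
        int[] label = new int[n];
        for (int v = 0; v < n; v++) {
            label[v] = v;  // every vertex starts with its own id
        }
        boolean changed = true;
        while (changed) {  // one pass plays the role of one superstep
            changed = false;
            int[] next = label.clone();
            for (int v = 0; v < n; v++) {
                for (int u : adjacency.get(v)) {
                    if (label[u] < next[v]) {  // a smaller id arrived from a neighbor
                        next[v] = label[u];
                        changed = true;
                    }
                }
            }
            label = next;
        }
        return label;
    }
}
```

A Giraph version would have only vertices whose label changed send messages, with the rest voting to halt; the full rescan here just keeps the simulation short.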

The original Hadoop-based variant of this algorithm was proposed by Kang, 
Tsourakakis and Faloutsos in "PEGASUS: Mining Peta-Scale Graphs", 2010.

http://www.cs.cmu.edu/~ukang/papers/PegasusKAIS.pdf





[jira] [Created] (GIRAPH-114) Inconsistent message map handling in BasicRPCCommunications.LargeMessageFlushExecutor

2011-12-21 Thread Sebastian Schelter (Created) (JIRA)
Inconsistent message map handling in 
BasicRPCCommunications.LargeMessageFlushExecutor
-

 Key: GIRAPH-114
 URL: https://issues.apache.org/jira/browse/GIRAPH-114
 Project: Giraph
  Issue Type: Bug
Affects Versions: 0.70.0
Reporter: Sebastian Schelter
Priority: Critical
 Attachments: GIRAPH-114.patch

I'm currently implementing a simple algorithm to identify all the connected 
components of a graph. The algorithm ran well in local unit tests in the IDE on 
toy data and in a local single-node Hadoop instance using a graph of ~100k edges.

When I tested it on a real cluster with the Wikipedia pagelink graph (5.7M 
vertices, 130M edges), I ran into strange exceptions like this:

{noformat} 
2011-12-21 12:03:57,015 INFO org.apache.hadoop.mapred.TaskInProgress: Error 
from attempt_201112131541_0034_m_27_0: java.lang.IllegalStateException: 
run: Caught an unrecoverable exception flush: Got ExecutionException
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:641)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:253)
Caused by: java.lang.IllegalStateException: flush: Got ExecutionException
at 
org.apache.giraph.comm.BasicRPCCommunications.flush(BasicRPCCommunications.java:946)
at 
org.apache.giraph.graph.BspServiceWorker.finishSuperstep(BspServiceWorker.java:916)
at org.apache.giraph.graph.GraphMapper.map(GraphMapper.java:588)
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:632)
... 7 more
Caused by: java.util.concurrent.ExecutionException: 
java.lang.IllegalStateException: run: Impossible for no messages in 1603276
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at 
org.apache.giraph.comm.BasicRPCCommunications.flush(BasicRPCCommunications.java:941)
... 10 more
Caused by: java.lang.IllegalStateException: run: Impossible for no messages in 
1603276
at 
org.apache.giraph.comm.BasicRPCCommunications$PeerFlushExecutor.run(BasicRPCCommunications.java:245)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
{noformat} 

The exception is thrown because a vertex with no messages to send is found in 
the data structure holding the outgoing messages.

I tracked this behavior down:


In *BasicRPCCommunications:541-546* the map holding the outgoing messages for 
the vertices of a particular machine is created. It's stored in two places: 
_BasicRPCCommunications.outMessages_ and the member variable 
_outMessagesPerPeer_ of its _PeerConnection_:

{noformat} 
outMsgMap = new HashMap>();
outMessages.put(addrUnresolved, outMsgMap);

PeerConnection peerConnection = new PeerConnection(outMsgMap, peer, isProxy);
{noformat} 

When there are a lot of messages for a particular vertex, a large flush is 
triggered via _LargeMessageFlushExecutor_ (I guess this only happened in the 
Wikipedia test). During this flush, the list of messages for the vertex is sent 
out and replaced with an empty list in 
*BasicRPCCommunications:341*

{noformat}
outMessageList = peerConnection.outMessagesPerPeer.get(destVertex);
peerConnection.outMessagesPerPeer.put(destVertex, new MsgList());
{noformat}

Now, in the last flush, which is triggered at the end of the superstep, we 
encounter an empty message list for the vertex, and therefore the exception is 
thrown in *BasicRPCCommunications:228-247*

{noformat}
for (Entry<I, MsgList<M>> entry : peerConnection.outMessagesPerPeer.entrySet()) {
  ...
  if (entry.getValue().isEmpty()) {
    throw new IllegalStateException(...);
  }
}
{noformat}

Simply removing the list for the vertex when executing the large flush solved 
the issue (patch to come).
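A minimal sketch of the difference between replacing the entry and removing it, using plain java.util collections. The Long vertex ids, String messages, and the method names (largeFlushBuggy, largeFlushFixed, finalFlush) are hypothetical stand-ins, not Giraph's actual types or the forthcoming patch; finalFlush only reproduces the emptiness check quoted above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LargeFlushSketch {
  /** Hypothetical per-peer map: destination vertex id -> pending messages. */
  static Map<Long, List<String>> sample() {
    Map<Long, List<String>> out = new HashMap<>();
    out.put(1L, new ArrayList<>(Arrays.asList("a", "b", "c")));
    out.put(2L, new ArrayList<>(Arrays.asList("d")));
    return out;
  }

  /** Buggy large flush: sends the list but leaves an empty one behind. */
  static List<String> largeFlushBuggy(Map<Long, List<String>> out, long dest) {
    List<String> sent = out.get(dest);
    out.put(dest, new ArrayList<String>());
    return sent;
  }

  /** Fixed large flush: removes the entry, so no empty list survives. */
  static List<String> largeFlushFixed(Map<Long, List<String>> out, long dest) {
    return out.remove(dest);
  }

  /** End-of-superstep flush with the same sanity check that fired above. */
  static int finalFlush(Map<Long, List<String>> out) {
    int sent = 0;
    for (Map.Entry<Long, List<String>> entry : out.entrySet()) {
      if (entry.getValue().isEmpty()) {
        throw new IllegalStateException(
            "run: Impossible for no messages in " + entry.getKey());
      }
      sent += entry.getValue().size();
    }
    out.clear();
    return sent;
  }

  static boolean buggyPathThrows() {
    Map<Long, List<String>> out = sample();
    largeFlushBuggy(out, 1L);
    try {
      finalFlush(out);
      return false;
    } catch (IllegalStateException e) {
      return true;
    }
  }

  static int fixedPathFinalFlushCount() {
    Map<Long, List<String>> out = sample();
    largeFlushFixed(out, 1L);
    return finalFlush(out); // only vertex 2's single message is left
  }

  public static void main(String[] args) {
    System.out.println("buggy path throws: " + buggyPathThrows());
    System.out.println("fixed path, messages in final flush: " + fixedPathFinalFlushCount());
  }
}
```

One consequence of removing the entry: any later message to the same vertex in the same superstep has to recreate the list (for example with Map.computeIfAbsent) instead of assuming it already exists.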

I'd like to note that it is generally very dangerous to let different classes 
access a data structure directly; doing so produces subtle bugs like this one. 
It would be better to handle the data structure in a centralized way. 




[jira] [Created] (GIRAPH-113) Change cast to Vertex used in prepareSuperstep() to BasicVertex

2011-12-20 Thread Avery Ching (Created) (JIRA)
Change cast to Vertex used in prepareSuperstep() to BasicVertex
---

 Key: GIRAPH-113
 URL: https://issues.apache.org/jira/browse/GIRAPH-113
 Project: Giraph
  Issue Type: Bug
Reporter: Yuanyuan Tian
Priority: Minor


Hi,

I decided to use LongDoubleFloatDoubleVertex in a graph algorithm because it 
uses the more compact and efficient Mahout collections. However, I ran into an error 
when running the algorithm:

java.lang.ClassCastException: 
org.apache.giraph.graph.LongDoubleFloatDoubleVertex cannot be cast to 
org.apache.giraph.graph.Vertex
at 
org.apache.giraph.comm.BasicRPCCommunications.prepareSuperstep(BasicRPCCommunications.java:1016)
at 
org.apache.giraph.graph.BspServiceWorker.startSuperstep(BspServiceWorker.java:843)
at org.apache.giraph.graph.GraphMapper.map(GraphMapper.java:569)
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:728)
... 7 more

Basically, the problem is that in BasicRPCCommunications.prepareSuperstep(), 
the LongDoubleFloatDoubleVertex instances are cast to Vertex in the following code 
fragment, but LongDoubleFloatDoubleVertex inherits from BasicVertex, not from 
Vertex.

if (vertex != null) {
  ((MutableVertex) vertex).setVertexId(vertexIndex);
  partition.putVertex((Vertex) vertex);
} else if (originalVertex != null) {
  partition.removeVertex(originalVertex.getVertexId());
}

I made a simple change: cast the vertex to BasicVertex instead. The 
problem went away, and the algorithm finished without any error. But I am not 
sure whether this change has implications for other parts of the code, so I hope to 
get some comments from the Giraph developers.

if (vertex != null) {
  ((MutableVertex) vertex).setVertexId(vertexIndex);
  partition.putVertex((BasicVertex) vertex);
} else if (originalVertex != null) {
  partition.removeVertex(originalVertex.getVertexId());
}
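The failure can be reproduced outside Giraph with a deliberately simplified stand-in hierarchy. The classes below are hypothetical and omit the generic type parameters and any intermediate types of the real BasicVertex/Vertex classes; the point is only that a cast to a sibling class fails at run time, while a cast to the common base class succeeds. The sketch assumes, as the reporter's working change suggests, that putVertex() accepts a BasicVertex.

```java
import java.util.HashMap;
import java.util.Map;

public class VertexCastSketch {
  /** Stand-in for BasicVertex, the common base class. */
  abstract static class BasicVertex {
    private final long id;
    BasicVertex(long id) { this.id = id; }
    long getVertexId() { return id; }
  }

  /** Stand-in for Vertex. */
  static class Vertex extends BasicVertex {
    Vertex(long id) { super(id); }
  }

  /** Stand-in for LongDoubleFloatDoubleVertex: a sibling of Vertex, not a subclass. */
  static class LongDoubleFloatDoubleVertex extends BasicVertex {
    LongDoubleFloatDoubleVertex(long id) { super(id); }
  }

  /** Stand-in for a partition whose putVertex() accepts any BasicVertex. */
  static class Partition {
    private final Map<Long, BasicVertex> vertices = new HashMap<>();
    void putVertex(BasicVertex vertex) { vertices.put(vertex.getVertexId(), vertex); }
    int size() { return vertices.size(); }
  }

  /** Mirrors the original code path: the cast to Vertex fails at run time. */
  static boolean castToVertexFails(long id) {
    Object vertex = new LongDoubleFloatDoubleVertex(id);
    Partition partition = new Partition();
    try {
      partition.putVertex((Vertex) vertex);
      return false;
    } catch (ClassCastException e) {
      return true;
    }
  }

  /** Mirrors the proposed change: casting to the base class succeeds. */
  static int castToBasicVertexSize(long id) {
    Object vertex = new LongDoubleFloatDoubleVertex(id);
    Partition partition = new Partition();
    partition.putVertex((BasicVertex) vertex);
    return partition.size();
  }

  public static void main(String[] args) {
    System.out.println("cast to Vertex fails: " + castToVertexFails(7L));
    System.out.println("partition size after BasicVertex cast: " + castToBasicVertexSize(7L));
  }
}
```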





[jira] [Created] (GIRAPH-112) A bug in LongDoubleFloatDoubleVertex.write(DataOutput out)

2011-12-20 Thread Yuanyuan Tian (Created) (JIRA)
A bug in LongDoubleFloatDoubleVertex.write(DataOutput out)
--

 Key: GIRAPH-112
 URL: https://issues.apache.org/jira/browse/GIRAPH-112
 Project: Giraph
  Issue Type: Bug
  Components: graph
Affects Versions: 0.70.0
 Environment: Any
Reporter: Yuanyuan Tian
 Fix For: 0.70.0


I found a bug in LongDoubleFloatDoubleVertex.write(DataOutput out) when running 
a small graph algorithm. The symptom is that a vertex read from a different 
worker becomes junk after the RPC communication. The source of the problem 
is how the messages are written in LongDoubleFloatDoubleVertex.write(DataOutput 
out):

for(double msg : messageList.elements()) {
   out.writeDouble(msg);
}

Here messageList.elements() returns all the elements currently stored in 
the Mahout DoubleArrayList, including the invalid elements between size and 
capacity. Therefore, write() writes a number of invalid 
messages, which cause errors when they are read back in readFields().

The following is a simple solution:

double[] elements = messageList.elements();
for (int i = 0; i < messageList.size(); i++) {
   out.writeDouble(elements[i]);
}
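The effect of the bug on the reader side can be sketched with a hypothetical GrowableDoubleList standing in for the Mahout DoubleArrayList; it only mimics the behaviour described above, where elements() returns the whole backing array. For illustration it also assumes that write() prefixes the messages with their count and that another field follows them in the stream. With the buggy loop, the reader consumes the expected number of doubles and then misreads the leftover capacity slot as the next field.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class MessageWriteSketch {
  /** Hypothetical growable list whose elements() exposes the backing array. */
  static final class GrowableDoubleList {
    private double[] elements = new double[4];
    private int size = 0;

    void add(double value) {
      if (size == elements.length) {
        elements = Arrays.copyOf(elements, size * 2);
      }
      elements[size++] = value;
    }

    int size() { return size; }

    /** Returns the backing array, which may be longer than size(). */
    double[] elements() { return elements; }
  }

  /** Buggy: writes every slot of the backing array, including unused capacity. */
  static void writeBuggy(GrowableDoubleList list, DataOutput out) throws IOException {
    out.writeInt(list.size());
    for (double msg : list.elements()) {
      out.writeDouble(msg);
    }
  }

  /** Fixed: writes only the first size() slots. */
  static void writeFixed(GrowableDoubleList list, DataOutput out) throws IOException {
    out.writeInt(list.size());
    double[] elements = list.elements();
    for (int i = 0; i < list.size(); i++) {
      out.writeDouble(elements[i]);
    }
  }

  /** Writes three messages plus a trailing field, reads them back, returns that field. */
  static long trailingFieldAfterRoundTrip(boolean fixed) {
    try {
      GrowableDoubleList list = new GrowableDoubleList();
      list.add(1.0);
      list.add(2.0);
      list.add(3.0); // size 3, backing array length 4
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      if (fixed) {
        writeFixed(list, out);
      } else {
        writeBuggy(list, out);
      }
      out.writeLong(42L); // the next field in the stream
      out.flush();

      // Reader side, in the style of readFields(): a count, then that many doubles.
      DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
      int count = in.readInt();
      for (int i = 0; i < count; i++) {
        in.readDouble();
      }
      return in.readLong();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("fixed: " + trailingFieldAfterRoundTrip(true));
    System.out.println("buggy: " + trailingFieldAfterRoundTrip(false));
  }
}
```

In the buggy case the reader picks up the unused 0.0 slot where it expects the trailing field, which is the kind of "junk" described in the report.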




[jira] [Created] (GIRAPH-111) Refactor I/O to be independent of Map/Reduce

2011-12-20 Thread Ed Kohlwey (Created) (JIRA)
Refactor I/O to be independent of Map/Reduce


 Key: GIRAPH-111
 URL: https://issues.apache.org/jira/browse/GIRAPH-111
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Reporter: Ed Kohlwey


The I/O mechanisms should probably be abstracted entirely from Map/Reduce in 
order to support making Giraph an independent framework.





[jira] [Created] (GIRAPH-110) Add guide to set up the environment for running the unit tests in a pseudo-distributed hadoop instance

2011-12-20 Thread Sebastian Schelter (Created) (JIRA)
Add guide to set up the environment for running the unit tests in a 
pseudo-distributed hadoop instance


 Key: GIRAPH-110
 URL: https://issues.apache.org/jira/browse/GIRAPH-110
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.70.0
Reporter: Sebastian Schelter
Priority: Minor


Giraph should provide a small guide for setting up the local environment to run 
the unit tests in a pseudo-distributed Hadoop instance, as there are some 
non-obvious hurdles to overcome.





[jira] [Created] (GIRAPH-109) GiraphRunner should provide support for combiners

2011-12-20 Thread Sebastian Schelter (Created) (JIRA)
GiraphRunner should provide support for combiners
-

 Key: GIRAPH-109
 URL: https://issues.apache.org/jira/browse/GIRAPH-109
 Project: Giraph
  Issue Type: Improvement
Affects Versions: 0.70.0
Reporter: Sebastian Schelter


Currently there's no way to tell GiraphRunner that you want to use a Combiner. 
A simple option should be added, similar to the way input and output formats are 
specified.





[jira] [Created] (GIRAPH-108) Refactor code to run independently of Map/Reduce

2011-12-19 Thread Ed Kohlwey (Created) (JIRA)
Refactor code to run independently of Map/Reduce


 Key: GIRAPH-108
 URL: https://issues.apache.org/jira/browse/GIRAPH-108
 Project: Giraph
  Issue Type: Improvement
  Components: graph
Reporter: Ed Kohlwey


It would be nice for Giraph to be refactored such that the code could 
eventually be run outside of map/reduce. This will allow people to write 
drivers that can run in the cool new resource manager frameworks like Mesos and 
YARN, and eventually let the application's code base evolve to be independent 
of map/reduce.





[jira] [Created] (GIRAPH-106) Refactor prepareSuperstep() to make setMessages(Iterable messages) package-private

2011-12-17 Thread Avery Ching (Created) (JIRA)
Refactor prepareSuperstep() to make setMessages(Iterable messages) 
package-private
-

 Key: GIRAPH-106
 URL: https://issues.apache.org/jira/browse/GIRAPH-106
 Project: Giraph
  Issue Type: Improvement
Reporter: Avery Ching
Assignee: Avery Ching


GIRAPH-80 revealed that some refactoring is needed to make setMessages() 
package-private (to prevent users from messing around with internals).





[jira] [Created] (GIRAPH-107) Refactor prepareSuperstep() to make setMessages(Iterable messages) package-private

2011-12-17 Thread Avery Ching (Created) (JIRA)
Refactor prepareSuperstep() to make setMessages(Iterable messages) 
package-private
-

 Key: GIRAPH-107
 URL: https://issues.apache.org/jira/browse/GIRAPH-107
 Project: Giraph
  Issue Type: Improvement
Reporter: Avery Ching
Assignee: Avery Ching


GIRAPH-80 revealed that some refactoring is needed to make setMessages() 
package-private (to prevent users from messing around with internals).





[jira] [Created] (GIRAPH-105) BspServiceMaster.checkWorkers() should return empty lists instead of null

2011-12-17 Thread Sebastian Schelter (Created) (JIRA)
BspServiceMaster.checkWorkers() should return empty lists instead of null
-

 Key: GIRAPH-105
 URL: https://issues.apache.org/jira/browse/GIRAPH-105
 Project: Giraph
  Issue Type: Bug
Affects Versions: 0.70.0
Reporter: Sebastian Schelter
Priority: Minor


BspServiceMaster.checkWorkers() is invoked in 
BspServiceMaster.coordinateSuperstep() and in 
BspServiceMaster.createInputSplits(). Both check for an empty list to fail the 
job in case something has gone wrong. However, checkWorkers() returns null in 
case of problems, causing an NPE in the calling code.
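A minimal sketch of the suggested contract, assuming a hypothetical failure condition (fewer healthy workers than required). checkWorkers() and coordinate() below are stand-ins, not the BspServiceMaster code: the callee returns Collections.emptyList() instead of null, so the callers' existing emptiness check is enough to fail the job cleanly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class CheckWorkersSketch {
  /** Hypothetical checkWorkers(): returns an empty list on failure, never null. */
  static List<String> checkWorkers(List<String> healthyWorkers, int minWorkers) {
    if (healthyWorkers.size() < minWorkers) {
      return Collections.emptyList();
    }
    return new ArrayList<>(healthyWorkers);
  }

  /** Hypothetical caller in the style of coordinateSuperstep()/createInputSplits(). */
  static String coordinate(List<String> healthyWorkers, int minWorkers) {
    List<String> workers = checkWorkers(healthyWorkers, minWorkers);
    if (workers.isEmpty()) {
      return "FAILED"; // fail the job cleanly; no NullPointerException possible
    }
    return "OK:" + workers.size();
  }

  public static void main(String[] args) {
    System.out.println(coordinate(Arrays.asList("w1"), 2));
    System.out.println(coordinate(Arrays.asList("w1", "w2"), 2));
  }
}
```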





[jira] [Created] (GIRAPH-104) Save half of maximum memory used from messaging

2011-12-13 Thread Avery Ching (Created) (JIRA)
Save half of maximum memory used from messaging
---

 Key: GIRAPH-104
 URL: https://issues.apache.org/jira/browse/GIRAPH-104
 Project: Giraph
  Issue Type: Improvement
Reporter: Avery Ching
Priority: Critical


Currently, the amount of memory that Giraph uses for messaging is huge.  This 
JIRA will reduce the messaging memory by half and provide periodic updates of 
memory for debugging.  Details are below:

Refactored RandomMessageBenchmark to an internal vertex class.  Added 
aggregators to RandomMessagesBenchmark to track bytes, messages, and time for 
the messaging.  Adjusted postSuperstep() to be called after flush() for 
more accurate timings.

Added per-minute progress updates for message flushing (which can take a while, 
especially in the memory benchmark).  These show how progress is going 
and give an ETA.

Memory optimizations include:
- Clear the message list after computation 
- Free vertex messages on the source as the flush is going on 
- TreeMap -> HashMap for VertexMutations
- Sizing the ArrayList properly in transientInMessages
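The second optimization (freeing vertex messages on the source while the flush is going on) can be sketched with an explicit Iterator, assuming a simple map from destination vertex id to pending messages. The map layout and the wire list, which stands in for the RPC send, are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class FreeWhileFlushingSketch {
  /** Hypothetical outgoing-message map: destination vertex id -> pending messages. */
  static Map<Long, List<String>> sample() {
    Map<Long, List<String>> out = new HashMap<>();
    out.put(1L, new ArrayList<>(Arrays.asList("a", "b")));
    out.put(2L, new ArrayList<>(Arrays.asList("c")));
    return out;
  }

  /**
   * Sends each destination's messages and removes the entry from the source
   * map right away, so lists that were already sent can be garbage-collected
   * while the rest of the flush is still in progress.
   */
  static int flushAndFree(Map<Long, List<String>> outMessages, List<String> wire) {
    int sent = 0;
    Iterator<Map.Entry<Long, List<String>>> it = outMessages.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<Long, List<String>> entry = it.next();
      wire.addAll(entry.getValue()); // stand-in for the network send
      sent += entry.getValue().size();
      it.remove(); // drop the source copy as soon as it has been sent
    }
    return sent;
  }

  public static void main(String[] args) {
    Map<Long, List<String>> out = sample();
    List<String> wire = new ArrayList<>();
    int sent = flushAndFree(out, wire);
    System.out.println("sent " + sent + ", still buffered: " + out.size());
  }
}
```

Iterator.remove() is what makes removal during iteration safe here; calling outMessages.remove() from inside a for-each loop over the same HashMap would typically fail with a ConcurrentModificationException.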






[jira] [Created] (GIRAPH-103) Added properties for commonly used package version to pom.xml

2011-12-09 Thread Avery Ching (Created) (JIRA)
Added properties for commonly used package version to pom.xml
-

 Key: GIRAPH-103
 URL: https://issues.apache.org/jira/browse/GIRAPH-103
 Project: Giraph
  Issue Type: Improvement
  Components: build
Reporter: Avery Ching
Priority: Trivial
 Attachments: GIRAPH-103.diff







[jira] [Created] (GIRAPH-102) Create an exception class for Giraph

2011-12-01 Thread Avery Ching (Created) (JIRA)
Create an exception class for Giraph


 Key: GIRAPH-102
 URL: https://issues.apache.org/jira/browse/GIRAPH-102
 Project: Giraph
  Issue Type: Improvement
Reporter: Avery Ching


Many of the exceptions are IllegalStateExceptions but would be better as a 
GiraphException and reasonable subclasses.

This would

1) allow us to differentiate exceptions specific to Giraph
2) allow us to add useful information (e.g. superstep, attempt)
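One possible shape for such a class, purely as a sketch: GiraphException and GiraphMessagingException are hypothetical names here. Extending RuntimeException is an assumption, chosen so the new type could replace IllegalStateException without changing any method signatures.

```java
public class GiraphExceptionSketch {
  /** Hypothetical base class for Giraph-specific failures, carrying job context. */
  static class GiraphException extends RuntimeException {
    private static final long serialVersionUID = 1L;
    private final long superstep;
    private final int attempt;

    GiraphException(String message, long superstep, int attempt) {
      super(message + " (superstep=" + superstep + ", attempt=" + attempt + ")");
      this.superstep = superstep;
      this.attempt = attempt;
    }

    long getSuperstep() { return superstep; }

    int getAttempt() { return attempt; }
  }

  /** Hypothetical derivative for messaging problems. */
  static class GiraphMessagingException extends GiraphException {
    private static final long serialVersionUID = 1L;

    GiraphMessagingException(String message, long superstep, int attempt) {
      super(message, superstep, attempt);
    }
  }

  static RuntimeException messagingFailure(long superstep, int attempt) {
    return new GiraphMessagingException("flush failed", superstep, attempt);
  }

  /** Callers can tell Giraph failures apart from generic ones and read the context. */
  static String classify(RuntimeException e) {
    if (e instanceof GiraphException) {
      GiraphException ge = (GiraphException) e;
      return "giraph superstep=" + ge.getSuperstep() + " attempt=" + ge.getAttempt();
    }
    return "other";
  }

  public static void main(String[] args) {
    System.out.println(classify(messagingFailure(5L, 1)));
    System.out.println(classify(new IllegalStateException("generic")));
  }
}
```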





[jira] [Created] (GIRAPH-101) Replace munge with shim layer similar to Pig and Hive

2011-11-28 Thread Avery Ching (Created) (JIRA)
Replace munge with shim layer similar to Pig and Hive
-

 Key: GIRAPH-101
 URL: https://issues.apache.org/jira/browse/GIRAPH-101
 Project: Giraph
  Issue Type: Improvement
Reporter: Avery Ching
Priority: Minor


Munge is a hacky way of supporting multiple versions of Hadoop.  The shim layers 
in Pig and Hive are a cleaner way to do this, I think.  That being said, since 
it does work now, it's not a huge priority, I guess.





[jira] [Created] (GIRAPH-100) Data input sampling and testing improvements

2011-11-28 Thread Avery Ching (Created) (JIRA)
Data input sampling and testing improvements


 Key: GIRAPH-100
 URL: https://issues.apache.org/jira/browse/GIRAPH-100
 Project: Giraph
  Issue Type: New Feature
  Components: graph
Reporter: Avery Ching


It would be really nice to be able to limit the input data when debugging an application 
(% of input splits, max vertices per input split).  Also, it would be nice for 
the workers to provide a little more debugging info on how far along they are 
in processing the input data.
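A sketch of the two limits mentioned above, under stated assumptions: sampleSplits() keeps the first ceil(n * percent / 100) splits, and readCapped() stops after maxVertices records, with a negative value meaning no limit. Both helpers are hypothetical and not an existing Giraph option.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class InputSamplingSketch {
  /** Keeps the first ceil(n * percent / 100) splits; percent must be in [1, 100]. */
  static <T> List<T> sampleSplits(List<T> splits, int percent) {
    if (percent < 1 || percent > 100) {
      throw new IllegalArgumentException("percent must be in [1, 100]: " + percent);
    }
    int keep = (splits.size() * percent + 99) / 100; // integer ceiling
    return new ArrayList<>(splits.subList(0, keep));
  }

  /** Reads at most maxVertices records from one split; a negative value means no limit. */
  static <T> List<T> readCapped(Iterator<T> records, long maxVertices) {
    List<T> out = new ArrayList<>();
    while (records.hasNext() && (maxVertices < 0 || out.size() < maxVertices)) {
      out.add(records.next());
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> splits =
        Arrays.asList("s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8", "s9");
    System.out.println(sampleSplits(splits, 25));
    System.out.println(readCapped(Arrays.asList("v1", "v2", "v3").iterator(), 2));
  }
}
```

Taking a prefix of the splits keeps runs reproducible; if the splits are ordered in a way that correlates with the data, a seeded random choice might give a more representative sample.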

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (GIRAPH-99) Make AdjacencyListVertexReader and its constructor public

2011-11-22 Thread Kohei Ozaki (Created) (JIRA)
Make AdjacencyListVertexReader and its constructor public
-

 Key: GIRAPH-99
 URL: https://issues.apache.org/jira/browse/GIRAPH-99
 Project: Giraph
  Issue Type: Wish
  Components: lib
Reporter: Kohei Ozaki
Priority: Minor


Hi,

I'd like to write a class that inherits from AdjacencyListVertexReader
to make a library using Giraph (like git.io/ALVR),
but AdjacencyListVertexReader is a private class and its constructor is 
private.
I think making it public would be useful for handling more complex input formats
dictated by the data structures of algorithms.






[jira] [Created] (GIRAPH-98) Add Claudio Martella to site

2011-11-18 Thread Claudio Martella (Created) (JIRA)
Add Claudio Martella to site


 Key: GIRAPH-98
 URL: https://issues.apache.org/jira/browse/GIRAPH-98
 Project: Giraph
  Issue Type: Task
Reporter: Claudio Martella
 Attachments: GIRAPH-98.diff







[jira] [Created] (GIRAPH-97) TestIdWithValueTextOutputFormat.java and IdWithValueTextOutputFormat.java missing license header

2011-11-18 Thread Claudio Martella (Created) (JIRA)
TestIdWithValueTextOutputFormat.java and IdWithValueTextOutputFormat.java 
missing license header


 Key: GIRAPH-97
 URL: https://issues.apache.org/jira/browse/GIRAPH-97
 Project: Giraph
  Issue Type: Bug
Affects Versions: 0.70.0
Reporter: Claudio Martella
Assignee: Claudio Martella
Priority: Trivial
 Fix For: 0.70.0
 Attachments: GIRAPH-97.diff

As reported by Yingyi Bu on the user mailing list.





[jira] [Created] (GIRAPH-96) Support for Graphs with Huge adjacency lists

2011-11-17 Thread Arun Suresh (Created) (JIRA)
Support for Graphs with Huge adjacency lists


 Key: GIRAPH-96
 URL: https://issues.apache.org/jira/browse/GIRAPH-96
 Project: Giraph
  Issue Type: Improvement
Reporter: Arun Suresh


Currently the vertex initialize() method is passed the complete adjacency list 
as a HashMap. All the current concrete implementations of Vertex iterate over 
the adjacency list and create new data structures within the Vertex instance 
to hold/manipulate the adjacency list. This will cease to be feasible once the 
adjacency list becomes really huge.

I propose storing the adjacency list and all vertex information (and perhaps incoming 
messages) in a distributed data store such as HBase. The adjacency list can 
be lazily loaded via HBase Scans. I was thinking of an HBase schema where the 
row key is a concatenation of VertexID+OutboundVertexId, with a single column 
containing the edge.
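A sketch of the proposed row key, assuming non-negative long vertex ids encoded as 8-byte big-endian values; the id type and encoding are assumptions, not part of the proposal. Because HBase sorts row keys as unsigned bytes, all edges of one vertex form a contiguous range, so a Scan from the vertex's 8-byte prefix (inclusive) to the prefix of vertexId + 1 (exclusive) would return its adjacency list in target-id order. The helpers below only build and compare byte arrays; they do not call the HBase client API.

```java
import java.nio.ByteBuffer;

public class EdgeRowKeySketch {
  /** Row key = 8-byte big-endian source id followed by 8-byte big-endian target id. */
  static byte[] edgeRowKey(long vertexId, long targetVertexId) {
    return ByteBuffer.allocate(16).putLong(vertexId).putLong(targetVertexId).array();
  }

  /** Inclusive scan start for all edges of vertexId: its 8-byte prefix. */
  static byte[] scanStart(long vertexId) {
    return ByteBuffer.allocate(8).putLong(vertexId).array();
  }

  /** Exclusive scan stop: the prefix of the next vertex id. */
  static byte[] scanStop(long vertexId) {
    return ByteBuffer.allocate(8).putLong(vertexId + 1).array();
  }

  /** Recovers the target id from a row key. */
  static long targetOf(byte[] rowKey) {
    return ByteBuffer.wrap(rowKey, 8, 8).getLong();
  }

  /** Unsigned lexicographic byte comparison (the order HBase keeps row keys in). */
  static int compareUnsigned(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int x = a[i] & 0xff;
      int y = b[i] & 0xff;
      if (x != y) {
        return x - y;
      }
    }
    return a.length - b.length;
  }

  public static void main(String[] args) {
    byte[] key = edgeRowKey(7L, 42L);
    System.out.println("target = " + targetOf(key));
    System.out.println("in range for vertex 7: "
        + (compareUnsigned(scanStart(7L), key) <= 0 && compareUnsigned(key, scanStop(7L)) < 0));
  }
}
```

Negative ids would sort after positive ones under this encoding, and vertexId + 1 overflows for Long.MAX_VALUE, so a real schema would need to handle both cases.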





[jira] [Created] (GIRAPH-95) vertex resolution expects MutableVertex instead of BasicVertex

2011-11-17 Thread Claudio Martella (Created) (JIRA)
vertex resolution expects MutableVertex instead of BasicVertex
--

 Key: GIRAPH-95
 URL: https://issues.apache.org/jira/browse/GIRAPH-95
 Project: Giraph
  Issue Type: Bug
  Components: graph
Reporter: Claudio Martella


At the beginning of the superstep, when a message is sent to a non-existent 
vertex, that vertex is created. The new vertex's id is set through 
setVertexId(), which belongs to MutableVertex. It should use initialize() instead.

See BspRPCCommunication:948 (on my local trunk)





[jira] [Created] (GIRAPH-94) Loading vertex ranges from HBase

2011-11-17 Thread Claudio Martella (Created) (JIRA)
Loading vertex ranges from HBase


 Key: GIRAPH-94
 URL: https://issues.apache.org/jira/browse/GIRAPH-94
 Project: Giraph
  Issue Type: New Feature
Reporter: Claudio Martella
Assignee: Claudio Martella


Loading vertices from an HTable would be an option.

A possible schema for storing the graph would be Hexastore 
(http://www.vldb.org/pvldb/1/1453965.pdf). 
Also, since vertices to which messages are sent get created on the fly (if they 
don't exist already), we could potentially have an HBaseVertex that fetches the 
adjacency list + vertex value from HBase. That would be a kind of lazy-loading 
approach, if you can define the initial split as an HBase query.






[jira] [Created] (GIRAPH-93) Hive input / output format

2011-11-16 Thread Avery Ching (Created) (JIRA)
Hive input / output format
--

 Key: GIRAPH-93
 URL: https://issues.apache.org/jira/browse/GIRAPH-93
 Project: Giraph
  Issue Type: New Feature
Reporter: Avery Ching
Assignee: Avery Ching


It would be great to be able to load/store data from/to Hive tables.  





[jira] [Created] (GIRAPH-92) Need outputformat for just vertex ID and value

2011-11-16 Thread Jakob Homan (Created) (JIRA)
Need outputformat for just vertex ID and value
--

 Key: GIRAPH-92
 URL: https://issues.apache.org/jira/browse/GIRAPH-92
 Project: Giraph
  Issue Type: New Feature
  Components: lib
Affects Versions: 0.70.0
Reporter: Jakob Homan
Assignee: Jakob Homan
 Fix For: 0.70.0
 Attachments: GIRAPH-92.patch

We should have a text output format that writes just the vertex id and value, 
without its edges:
{noformat}index.html 0.9423{noformat}
This would be particularly helpful for further processing by, for instance, Pig.
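A minimal sketch of the line format, assuming a tab between the id and the value; tab is also the default field delimiter of Pig's PigStorage, so such output could be loaded there without extra options. formatLine() is a hypothetical helper, not the API of the proposed output format.

```java
public class IdWithValueLineSketch {
  /** Assumed default separator between vertex id and vertex value. */
  static final String DEFAULT_DELIMITER = "\t";

  /** Formats one output line: id, delimiter, value. Edges are deliberately not written. */
  static String formatLine(Object vertexId, Object vertexValue, String delimiter) {
    return String.valueOf(vertexId) + delimiter + String.valueOf(vertexValue);
  }

  static String formatLine(Object vertexId, Object vertexValue) {
    return formatLine(vertexId, vertexValue, DEFAULT_DELIMITER);
  }

  public static void main(String[] args) {
    System.out.println(formatLine("index.html", 0.9423));
  }
}
```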





[jira] [Created] (GIRAPH-91) Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)

2011-11-16 Thread Avery Ching (Created) (JIRA)
Large-memory improvements (Memory reduced vertex implementation, fast failure, 
added settings) 
---

 Key: GIRAPH-91
 URL: https://issues.apache.org/jira/browse/GIRAPH-91
 Project: Giraph
  Issue Type: Improvement
Reporter: Avery Ching


The current vertex implementation uses a HashMap for storing the edges, which is 
quite memory-heavy for large graphs.  The default settings in Giraph also need to be 
improved for large graphs and heaps larger than 20G.
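One common way to cut per-edge overhead is to store the edges in two parallel primitive arrays sorted by target id and look them up with binary search, instead of boxing each target and value into HashMap entries. The class below is a hypothetical sketch of that idea for long targets and float edge values; it is not necessarily what this issue implements.

```java
import java.util.Arrays;

public class CompactEdgesSketch {
  // Parallel arrays, kept sorted by target id: targets[i] has edge value values[i].
  private long[] targets = new long[0];
  private float[] values = new float[0];

  /** Adds an edge, or overwrites the value if the target is already present. */
  public void addEdge(long target, float value) {
    int pos = Arrays.binarySearch(targets, target);
    if (pos >= 0) {
      values[pos] = value;
      return;
    }
    int insertAt = -pos - 1; // binarySearch encodes the insertion point as -(point) - 1
    long[] newTargets = new long[targets.length + 1];
    float[] newValues = new float[values.length + 1];
    System.arraycopy(targets, 0, newTargets, 0, insertAt);
    System.arraycopy(values, 0, newValues, 0, insertAt);
    newTargets[insertAt] = target;
    newValues[insertAt] = value;
    System.arraycopy(targets, insertAt, newTargets, insertAt + 1, targets.length - insertAt);
    System.arraycopy(values, insertAt, newValues, insertAt + 1, values.length - insertAt);
    targets = newTargets;
    values = newValues;
  }

  /** Returns the edge value for target, or defaultValue if there is no such edge. */
  public float getEdgeValue(long target, float defaultValue) {
    int pos = Arrays.binarySearch(targets, target);
    return pos >= 0 ? values[pos] : defaultValue;
  }

  public int numEdges() {
    return targets.length;
  }

  public static void main(String[] args) {
    CompactEdgesSketch edges = new CompactEdgesSketch();
    edges.addEdge(30L, 3.0f);
    edges.addEdge(10L, 1.0f);
    edges.addEdge(20L, 2.0f);
    System.out.println(edges.numEdges() + " edges, value to 20 = " + edges.getEdgeValue(20L, -1f));
  }
}
```

Each insertion copies both arrays, so building a vertex edge by edge is quadratic; a loader that already has all of a vertex's edges could sort them once and fill the arrays in a single pass.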





[jira] [Created] (GIRAPH-90) LongDoubleFloatDoubleVertex possibly has a broken iterator() implementation

2011-11-16 Thread Claudio Martella (Created) (JIRA)
LongDoubleFloatDoubleVertex possibly has a broken iterator() implementation
--

 Key: GIRAPH-90
 URL: https://issues.apache.org/jira/browse/GIRAPH-90
 Project: Giraph
  Issue Type: Bug
  Components: graph
Affects Versions: 0.70.0
Reporter: Claudio Martella
Assignee: Claudio Martella
 Fix For: 0.70.0


The iterator() implementation returns a LongWritable that is cached in a final 
variable and updated with set() on each call to next(). This could be misleading, as 
the user might build a list from the iterator's data. Something similar is 
happening in getMsgList() as well.

Is this really what we want?
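The pitfall can be reproduced with a hypothetical MutableLong standing in for LongWritable: an iterator that returns one cached instance and updates it with set() on every next() makes every stored reference alias the same object, so a list built from it ends up holding only the last value. Copying the value out (or cloning the object) before storing it avoids this.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class ReusedIteratorSketch {
  /** Minimal mutable holder standing in for a Writable such as LongWritable. */
  static final class MutableLong {
    private long value;
    void set(long v) { value = v; }
    long get() { return value; }
  }

  /** Iterator that returns one cached instance, reset by set() on each next(). */
  static Iterator<MutableLong> reusingIterator(final long[] ids) {
    final MutableLong cached = new MutableLong();
    return new Iterator<MutableLong>() {
      private int i = 0;

      @Override
      public boolean hasNext() { return i < ids.length; }

      @Override
      public MutableLong next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        cached.set(ids[i++]);
        return cached;
      }
    };
  }

  /** Storing the returned objects directly: every element aliases the cached instance. */
  static List<Long> collectNaively(long[] ids) {
    List<MutableLong> kept = new ArrayList<>();
    Iterator<MutableLong> it = reusingIterator(ids);
    while (it.hasNext()) {
      kept.add(it.next());
    }
    List<Long> result = new ArrayList<>();
    for (MutableLong m : kept) {
      result.add(m.get());
    }
    return result;
  }

  /** Copying each value out before storing it gives the expected result. */
  static List<Long> collectWithCopy(long[] ids) {
    List<Long> result = new ArrayList<>();
    Iterator<MutableLong> it = reusingIterator(ids);
    while (it.hasNext()) {
      result.add(it.next().get());
    }
    return result;
  }

  public static void main(String[] args) {
    long[] ids = {1L, 2L, 3L};
    System.out.println("naive: " + collectNaively(ids));
    System.out.println("copied: " + collectWithCopy(ids));
  }
}
```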





[jira] [Created] (GIRAPH-89) Remove debugging system.out from LongDoubleFloatDoubleVertex

2011-11-15 Thread Jakob Homan (Created) (JIRA)
Remove debugging system.out from LongDoubleFloatDoubleVertex


 Key: GIRAPH-89
 URL: https://issues.apache.org/jira/browse/GIRAPH-89
 Project: Giraph
  Issue Type: Bug
Reporter: Jakob Homan


Line 137: {{System.out.println("in getNumVertices!");}}
looks like a debugging line and should be removed.





[jira] [Created] (GIRAPH-88) Message count not updated properly after GIRAPH-11

2011-11-15 Thread Avery Ching (Created) (JIRA)
Message count not updated properly after GIRAPH-11
--

 Key: GIRAPH-88
 URL: https://issues.apache.org/jira/browse/GIRAPH-88
 Project: Giraph
  Issue Type: Bug
Reporter: Avery Ching


Email from s...@apache.org

Hi,

I updated to the latest trunk (after the GIRAPH-11 commit) and wanted to
continue working on GIRAPH-51, where I use a small toy graph to test
SimpleShortestPathVertex.

Unfortunately, my code no longer worked, and I think I tracked it down
to the fact that vertices that voted to halt are no longer reactivated when
new messages arrive.

In SimpleShortestPathVertex every vertex always votes to halt and only
gets reactivated when a shorter path to it has been found. However, my
test run always finished after superstep 0.

I don't know much about Giraph's internals yet, but my guess is that
the number of sent messages is no longer tracked correctly. Therefore
Giraph ends the algorithm (as all vertices voted to halt) although
there should still be messages in the pipeline.

I think I tracked it down to this behavior:

GraphMapper declares a variable workerSentMessages = 0 and never
increases it. This variable is given to
BspServiceWorker.finishSuperstep() which writes it to zookeeper and uses
it to compute the GlobalStats afterwards, which are used to decide
whether a new superstep has to be scheduled. As it has never been
increased, the algorithm will always stop when all vertices voted to halt.

It would be great if someone could confirm or disprove this speculation and
help me continue work on GIRAPH-51.
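The halting rule described above can be sketched as follows. shouldHalt() and totalSent() are hypothetical stand-ins for the master-side use of the aggregated GlobalStats, not Giraph's code. If every worker reports 0 sent messages, a superstep in which all vertices halted but messages are still in flight looks identical to a finished computation, and the job stops, as observed after superstep 0.

```java
public class TerminationCheckSketch {
  /** Sums the per-worker sent-message counts, as the master would when building global stats. */
  static long totalSent(long[] perWorkerSentMessages) {
    long total = 0;
    for (long sent : perWorkerSentMessages) {
      total += sent;
    }
    return total;
  }

  /**
   * Pregel-style rule: stop only when every vertex has voted to halt
   * AND no messages were sent in the superstep that just finished.
   */
  static boolean shouldHalt(long finishedVertices, long totalVertices, long sentMessages) {
    return finishedVertices == totalVertices && sentMessages == 0;
  }

  public static void main(String[] args) {
    // All 4 vertices halted, but worker 0 sent 3 messages: the job must continue.
    System.out.println(shouldHalt(4, 4, totalSent(new long[] {3, 0})));
    // Same situation with the counter never incremented: the job halts too early.
    System.out.println(shouldHalt(4, 4, totalSent(new long[] {0, 0})));
  }
}
```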





