[jira] [Created] (GIRAPH-187) SequenceFileVertexInputFormat has WritableComparable<I> as a bounded type for I
SequenceFileVertexInputFormat has WritableComparable<I> as a bounded type for I

Key: GIRAPH-187
URL: https://issues.apache.org/jira/browse/GIRAPH-187
Project: Giraph
Issue Type: Bug
Components: lib
Affects Versions: 0.2.0
Reporter: Jan van der Lugt
Priority: Minor

This is the first JIRA I have ever filed, so please let me know if I'm not doing this right.

Basically, SequenceFileVertexInputFormat has WritableComparable<I> as a bounded type for I, while the Hadoop serializable data types implement the raw WritableComparable. Because of this, I suspect, TextVertexInputFormat only has the raw WritableComparable as a bounded type for I and has a SuppressWarnings("rawtypes") annotation. I think SequenceFileVertexInputFormat should follow the same style; otherwise it's not possible to use, for example, IntWritable as the vertex id type in a SequenceFileVertexInputFormat.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (GIRAPH-186) Improve concurrency of putVertexList
Improve concurrency of putVertexList

Key: GIRAPH-186
URL: https://issues.apache.org/jira/browse/GIRAPH-186
Project: Giraph
Issue Type: Improvement
Components: graph
Affects Versions: 0.2.0
Reporter: Bo Wang
Fix For: 0.2.0

It's pretty similar to GIRAPH-185. The whole inPartitionVertexMap is locked when there is a call to it. We should allow multiple calls adding different partitions to the same worker at the same time.
[jira] [Created] (GIRAPH-185) Improve concurrency of putMsg / putMsgList
Improve concurrency of putMsg / putMsgList

Key: GIRAPH-185
URL: https://issues.apache.org/jira/browse/GIRAPH-185
Project: Giraph
Issue Type: Improvement
Components: graph
Affects Versions: 0.2.0
Reporter: Bo Wang
Fix For: 0.2.0

Currently in putMsg / putMsgList, a synchronized block is used to protect the whole transientInMessages map when adding a new message. This lock blocks other concurrent calls to putMsg / putMsgList and increases the response time. We should use fine-grained locks to allow high concurrency in message communication.
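One way to picture the fine-grained locking proposed in GIRAPH-185 is to lock per destination vertex instead of locking the whole map. The following is a minimal sketch under stated assumptions, not the actual Giraph patch: the class name `FineGrainedMessageStore` and the method `messageCount` are hypothetical, only the names `putMsg`, `putMsgList` and `transientInMessages` come from the issue, and it assumes Java 8 or later for `computeIfAbsent` and lambdas.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical stand-in for a worker's incoming-message store. */
final class FineGrainedMessageStore<I, M> {
  // The concurrent map handles lookups and inserts without one global lock.
  private final ConcurrentMap<I, List<M>> transientInMessages =
      new ConcurrentHashMap<>();

  /** Adds one message; only the target vertex's list is locked. */
  void putMsg(I vertexId, M msg) {
    List<M> msgs =
        transientInMessages.computeIfAbsent(vertexId, k -> new ArrayList<>());
    synchronized (msgs) {
      msgs.add(msg);
    }
  }

  /** Adds a batch of messages under the same per-vertex lock. */
  void putMsgList(I vertexId, List<M> batch) {
    List<M> msgs =
        transientInMessages.computeIfAbsent(vertexId, k -> new ArrayList<>());
    synchronized (msgs) {
      msgs.addAll(batch);
    }
  }

  /** Hypothetical helper: number of messages queued for a vertex. */
  int messageCount(I vertexId) {
    List<M> msgs = transientInMessages.get(vertexId);
    if (msgs == null) {
      return 0;
    }
    synchronized (msgs) {
      return msgs.size();
    }
  }
}
```

With this layout, two callers delivering to different vertices never contend on the same monitor; callers delivering to the same vertex still serialize, which keeps each list consistent.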
[jira] [Created] (GIRAPH-184) Upgrade to junit4
Upgrade to junit4

Key: GIRAPH-184
URL: https://issues.apache.org/jira/browse/GIRAPH-184
Project: Giraph
Issue Type: Bug
Reporter: Devaraj K

Presently Giraph uses JUnit 3.8.1. We can upgrade to JUnit 4.
[jira] [Created] (GIRAPH-183) Add Claudio's FOSDEM presentation (slides and video) to the site
Add Claudio's FOSDEM presentation (slides and video) to the site

Key: GIRAPH-183
URL: https://issues.apache.org/jira/browse/GIRAPH-183
Project: Giraph
Issue Type: Improvement
Components: site
Reporter: Claudio Martella
Assignee: Claudio Martella
Priority: Trivial

Presentation: http://prezi.com/9ake_klzwrga/apache-giraph-distributed-graph-processing-in-the-cloud/
Video: http://www.youtube.com/watch?v=3ZrqPEIPRe4, http://www.youtube.com/watch?v=BmRaejKGeDM
[jira] [Created] (GIRAPH-182) Provide SequenceFileVertexOutputFormat as an available OutputFormat
Provide SequenceFileVertexOutputFormat as an available OutputFormat

Key: GIRAPH-182
URL: https://issues.apache.org/jira/browse/GIRAPH-182
Project: Giraph
Issue Type: New Feature
Components: lib
Reporter: Pradeep Gollakota
Priority: Minor

SequenceFiles are heavily used in Hadoop. We should provide SequenceFileVertexOutputFormat. Since SequenceFileVertexInputFormat is already provided, it makes sense to also provide a mirroring OutputFormat.
[jira] [Created] (GIRAPH-181) Add Hadoop 1.0 profile to pom.xml
Add Hadoop 1.0 profile to pom.xml

Key: GIRAPH-181
URL: https://issues.apache.org/jira/browse/GIRAPH-181
Project: Giraph
Issue Type: Improvement
Components: build
Affects Versions: 0.2.0
Reporter: Eugene Koontz
Assignee: Eugene Koontz
Fix For: 0.2.0

Hadoop 1.0.x is now considered the "current stable version" of Hadoop, according to http://hadoop.apache.org/common/releases.html#Download . This JIRA is to add support within Giraph's maven profile for the 1.0.x Hadoop release.
[jira] [Created] (GIRAPH-180) Publish SNAPSHOTs and released artifacts in the Maven repository
Publish SNAPSHOTs and released artifacts in the Maven repository

Key: GIRAPH-180
URL: https://issues.apache.org/jira/browse/GIRAPH-180
Project: Giraph
Issue Type: Improvement
Components: build
Affects Versions: 0.1.0
Reporter: Paolo Castagna
Priority: Minor

Currently Giraph uses Maven to drive its build. However, neither Maven artifacts nor SNAPSHOTs are published in the Apache Maven repository or Maven Central. It would be useful to have Apache Giraph artifacts and SNAPSHOTs published so that people can use Giraph without compiling it themselves.
[jira] [Created] (GIRAPH-179) BspServiceMaster's PathFilter can be simplified
BspServiceMaster's PathFilter can be simplified

Key: GIRAPH-179
URL: https://issues.apache.org/jira/browse/GIRAPH-179
Project: Giraph
Issue Type: Improvement
Reporter: Jakob Homan
Priority: Trivial

{code}
/**
 * Only get the finalized checkpoint files
 */
public static class FinalizedCheckpointPathFilter implements PathFilter {
  @Override
  public boolean accept(Path path) {
    if (path.getName().endsWith(
        BspService.CHECKPOINT_FINALIZED_POSTFIX)) {
      return true;
    }
    return false;
  }
}
{code}
We can simplify this, eliminating the if statement and just returning the result of {{endsWith()}}.
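The simplification suggested in GIRAPH-179 could look roughly like the sketch below. To stay self-contained it works on a plain file name instead of Hadoop's `Path`, and both the class name `FinalizedCheckpointNameFilter` and the value given to `CHECKPOINT_FINALIZED_POSTFIX` are assumptions made for illustration only.

```java
/** Hypothetical, Hadoop-free stand-in for FinalizedCheckpointPathFilter. */
final class FinalizedCheckpointNameFilter {
  // Assumed value for illustration; the real constant lives in BspService.
  static final String CHECKPOINT_FINALIZED_POSTFIX = ".finalized";

  /** Returns the boolean expression directly instead of branching on it. */
  static boolean accept(String fileName) {
    return fileName.endsWith(CHECKPOINT_FINALIZED_POSTFIX);
  }
}
```

The behavior is unchanged: `if (cond) return true; return false;` and `return cond;` are equivalent for any boolean expression.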
[jira] [Created] (GIRAPH-178) TestPredicate lock has lots of boolean expressions to be simplified
TestPredicate lock has lots of boolean expressions to be simplified

Key: GIRAPH-178
URL: https://issues.apache.org/jira/browse/GIRAPH-178
Project: Giraph
Issue Type: Improvement
Reporter: Jakob Homan
Priority: Trivial

TestPredicateLock.java has several instances of
{code}assertTrue(gotPredicate == false);{code}
(or {{== true}}) that can be simplified to more idiomatic Java.
[jira] [Created] (GIRAPH-177) SimplePageRankVertex has two redundant casts
SimplePageRankVertex has two redundant casts

Key: GIRAPH-177
URL: https://issues.apache.org/jira/browse/GIRAPH-177
Project: Giraph
Issue Type: Improvement
Reporter: Jakob Homan
Priority: Trivial

{code}
DoubleWritable maxPagerank =
    (DoubleWritable) maxAggreg.getAggregatedValue();
LOG.info("aggregatedMaxPageRank=" + maxPagerank.get());
DoubleWritable minPagerank =
    (DoubleWritable) minAggreg.getAggregatedValue();
LOG.info("aggregatedMinPageRank=" + minPagerank.get());
{code}
Both MinAggregator and MaxAggregator are already parameterized on DoubleWritable, so it's not necessary to cast their functions' results.
[jira] [Created] (GIRAPH-176) BasicRPCCommunications has unnecessary cast of Vertex
BasicRPCCommunications has unnecessary cast of Vertex

Key: GIRAPH-176
URL: https://issues.apache.org/jira/browse/GIRAPH-176
Project: Giraph
Issue Type: Improvement
Reporter: Jakob Homan
Priority: Minor

BasicRPCCommunications.java, 1224:
{code}
BasicVertex vertex =
    vertexResolver.resolve(vertexIndex, originalVertex,
        vertexMutations, messages);
{code}
and then a few lines later at 1248:
{code}partition.putVertex((BasicVertex) vertex);{code}
vertex gets cast to its own type. This cast can be removed.
[jira] [Created] (GIRAPH-175) Replace manual array copy to utility method call
Replace manual array copy to utility method call

Key: GIRAPH-175
URL: https://issues.apache.org/jira/browse/GIRAPH-175
Project: Giraph
Issue Type: Improvement
Reporter: Jakob Homan
Priority: Trivial

{code}
String[] zkJavaOptsArray = zkJavaOptsString.split(" ");
if (zkJavaOptsArray != null) {
  for (String javaOpt : zkJavaOptsArray) {
    commandList.add(javaOpt);
  }
}
{code}
Rather than doing the loop ourselves, Collections.addAll would be simpler (and faster, though that doesn't matter with such a small array). Still cleaner, though.
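A minimal sketch of the `Collections.addAll` replacement suggested in GIRAPH-175 follows. The class name `ZkJavaOptsExample`, the method `buildCommand` and the leading `"java"` element are hypothetical scaffolding; only `zkJavaOptsString` and `commandList` mirror names from the issue. Since `String.split` never returns null, the null check from the original snippet is dropped here as well.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper showing the loop-free copy. */
final class ZkJavaOptsExample {
  static List<String> buildCommand(String zkJavaOptsString) {
    List<String> commandList = new ArrayList<>();
    commandList.add("java");
    // Appends every array element to the list in one call.
    Collections.addAll(commandList, zkJavaOptsString.split(" "));
    return commandList;
  }
}
```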
[jira] [Created] (GIRAPH-174) ConnectedComponentsVertex for loops can be replaced with for-each loops
ConnectedComponentsVertex for loops can be replaced with for-each loops

Key: GIRAPH-174
URL: https://issues.apache.org/jira/browse/GIRAPH-174
Project: Giraph
Issue Type: Improvement
Reporter: Jakob Homan
Priority: Trivial

{code}
// First superstep is special, because we can simply look at the neighbors
if (getSuperstep() == 0) {
  for (Iterator<IntWritable> edges = iterator(); edges.hasNext();) {
    int neighbor = edges.next().get();
    if (neighbor < currentComponent) {
      currentComponent = neighbor;
    }
  }
  // Only need to send value if it is not the own id
  if (currentComponent != getVertexValue().get()) {
    setVertexValue(new IntWritable(currentComponent));
    for (Iterator<IntWritable> edges = iterator(); edges.hasNext();) {
      int neighbor = edges.next().get();
      if (neighbor > currentComponent) {
        sendMsg(new IntWritable(neighbor), getVertexValue());
      }
    }
  }
{code}
Both of the for loops in this chunk from ConnectedComponentsVertex can be replaced with for(IntWritable i : iterator()) loops to be more idiomatic.
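As a rough illustration of the for-each rewrite proposed in GIRAPH-174: Java's enhanced for statement needs an `Iterable` (or an array) rather than an `Iterator`, so the sketch below loops over the object itself, which is assumed to implement `Iterable`. The class `NeighborScan` and its method are hypothetical and use plain `Integer` ids in place of `IntWritable` to stay self-contained.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical vertex stand-in that is iterable over its neighbor ids. */
final class NeighborScan implements Iterable<Integer> {
  private final List<Integer> neighbors;

  NeighborScan(Integer... ids) {
    this.neighbors = Arrays.asList(ids);
  }

  @Override
  public Iterator<Integer> iterator() {
    return neighbors.iterator();
  }

  /** Returns the smallest id among this vertex's own id and its neighbors. */
  int smallestComponent(int ownId) {
    int currentComponent = ownId;
    // for-each over the Iterable replaces the explicit Iterator loop.
    for (int neighbor : this) {
      if (neighbor < currentComponent) {
        currentComponent = neighbor;
      }
    }
    return currentComponent;
  }
}
```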
[jira] [Created] (GIRAPH-172) Javadoc for BasicVertex:compute link to compute is broken
Javadoc for BasicVertex:compute link to compute is broken

Key: GIRAPH-172
URL: https://issues.apache.org/jira/browse/GIRAPH-172
Project: Giraph
Issue Type: Bug
Reporter: Jakob Homan
Priority: Trivial

In BasicVertex the JavaDoc link to #compute can't be resolved:
{code}
/**
 * Release unnecessary resources (will be called after vertex returns from
 * {@link #compute()})
 */
abstract void releaseResources();
{code}
[jira] [Created] (GIRAPH-173) BspCase:getNumWorkers javadoc refers to non-existent parameter
BspCase:getNumWorkers javadoc refers to non-existent parameter

Key: GIRAPH-173
URL: https://issues.apache.org/jira/browse/GIRAPH-173
Project: Giraph
Issue Type: Bug
Reporter: Jakob Homan
Priority: Trivial

{code}
/**
 * Get the number of workers used in the BSP application
 *
 * @param numProcs number of processes to use
 */
public int getNumWorkers() {
  return numWorkers;
}
{code}
numProcs is a lie...
[jira] [Created] (GIRAPH-171) total time in MasterThread.run() is calculated incorrectly
total time in MasterThread.run() is calculated incorrectly

Key: GIRAPH-171
URL: https://issues.apache.org/jira/browse/GIRAPH-171
Project: Giraph
Issue Type: Bug
Reporter: Eugene Koontz
Assignee: Eugene Koontz
Attachments: GIRAPH-171.patch

While running PageRankBenchmark, I was seeing in the output:

{{graph.MasterThread(172): total: Took 1.3336739262910001E9 seconds.}}

This was because currently, in {{MasterThread.run()}}, we have:
{code}
LOG.info("total: Took " +
    ((System.currentTimeMillis() / 1000.0d) - setupSecs) +
    " seconds.");
{code}
but it should be:
{code}
LOG.info("total: Took " +
    ((System.currentTimeMillis() - startMillis) / 1000.0d) +
    " seconds.");
{code}
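The arithmetic behind GIRAPH-171 can be checked in isolation. In the sketch below, the class and method names are hypothetical, and treating `setupSecs` as a short duration in seconds is an assumption based on its name. Under that assumption the original expression subtracts a small duration from an epoch timestamp in seconds, which is why the log showed roughly 1.33E9; the corrected expression subtracts two timestamps in milliseconds first and converts afterwards.

```java
/** Hypothetical helper contrasting the two expressions from GIRAPH-171. */
final class ElapsedTimeExample {
  /** Original form: epoch seconds minus a duration, so the result stays huge. */
  static double buggyTotalSeconds(long nowMillis, double setupSecs) {
    return (nowMillis / 1000.0d) - setupSecs;
  }

  /** Corrected form: difference of two timestamps, then millis to seconds. */
  static double totalSeconds(long nowMillis, long startMillis) {
    return (nowMillis - startMillis) / 1000.0d;
  }
}
```

In real code the two timestamps would come from `System.currentTimeMillis()` taken at the start and at the end of the run.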
[jira] [Created] (GIRAPH-170) Workflow for loading RDF graph data into Giraph
Workflow for loading RDF graph data into Giraph

Key: GIRAPH-170
URL: https://issues.apache.org/jira/browse/GIRAPH-170
Project: Giraph
Issue Type: New Feature
Reporter: Dan Brickley
Priority: Minor

W3C RDF provides a family of Web standards for exchanging graph-based data. RDF uses sets of simple binary relationships, labeling nodes and links with Web identifiers (URIs). Many public datasets are available as RDF, including the "Linked Data" cloud (see http://richard.cyganiak.de/2007/10/lod/ ). Many such datasets are listed at http://thedatahub.org/

RDF has several standard exchange syntaxes. The oldest is RDF/XML. A simple line-oriented format is N-Triples. A format aligned with RDF's SPARQL query language is Turtle. Apache Jena and Any23 provide software to handle all these:
http://incubator.apache.org/jena/
http://incubator.apache.org/any23/

This JIRA leaves open the strategy for loading RDF data into Giraph. There are various possibilities, including exploitation of intermediate Hadoop-friendly stores, or pre-processing with e.g. Pig-based tools into a more Giraph-friendly form, or writing custom loaders. Even a HOWTO document or implementor notes here would be an advance on the current state of the art. The BluePrints Graph API (Gremlin etc.) has also been aligned with various RDF datasources.

Related topics: multigraphs. https://issues.apache.org/jira/browse/GIRAPH-141 touches on the issue (since we can't currently easily represent fully general RDF graphs, because two nodes might be connected by more than one typed edge). Even without multigraphs it ought to be possible to bring RDF-sourced data into Giraph, e.g. perhaps some app is only interested in, say, the Movies + People subset of a big RDF collection.

From Avery in email: "a helper VertexInputFormat (and maybe VertexOutputFormat) would certainly [despite GIRAPH-141] still help"
[jira] [Created] (GIRAPH-169) How to close all child when a job finished?
How to close all child when a job finished?

Key: GIRAPH-169
URL: https://issues.apache.org/jira/browse/GIRAPH-169
Project: Giraph
Issue Type: Improvement
Components: mapreduce
Affects Versions: 0.2.0
Environment: sles 11 x64, jdk 1.6, hadoop 0.20.205.0, 1 Master and 8 slaves
Reporter: Jianfeng Qian
Priority: Minor

I ran PageRank on Hadoop 0.20.205.0. When the job finished, the child processes on the slaves didn't quit immediately; sometimes they never quit and I had to kill them.
[jira] [Created] (GIRAPH-168) Simplify munge directive usage with new munge flag HADOOP_SECURE rather than HADOOP_FACEBOOK and HADOOP_NON_SECURE
Simplify munge directive usage with new munge flag HADOOP_SECURE rather than HADOOP_FACEBOOK and HADOOP_NON_SECURE

Key: GIRAPH-168
URL: https://issues.apache.org/jira/browse/GIRAPH-168
Project: Giraph
Issue Type: Improvement
Reporter: Eugene Koontz

This JIRA relates to the mail thread here: http://mail-archives.apache.org/mod_mbox/incubator-giraph-dev/201203.mbox/browser

Currently we check for the munge flags HADOOP and HADOOP_FACEBOOK and HADOOP_NON_SECURE when using munge in a few places. Hopefully we can eliminate usage of munge in the future, but until then, we can mitigate the complexity by consolidating the number of flags checked. This JIRA proposes a single flag, HADOOP_SECURE, to handle the same conditional compilation requirements. It also makes it easier to add more maven profiles so that we can easily increase our hadoop version coverage.

This patch modifies the existing hadoop_facebook profile to use the new HADOOP_SECURE munge flag, rather than HADOOP_FACEBOOK. It also adds a new hadoop maven profile, hadoop_trunk, which also sets HADOOP_SECURE. Finally, it adds a default profile, hadoop_0.20.203. This is needed so that we can specify its dependencies separately from hadoop_trunk, because the hadoop dependencies have changed between trunk and 0.205.0 - the former requires hadoop-common, hadoop-mapreduce-client-core, and hadoop-mapreduce-client-common, whereas the latter requires hadoop-core.

With this patch, the following passes:
{code}
mvn clean verify && mvn -Phadoop_trunk clean verify && mvn -Phadoop_0.20.203 clean verify
{code}

Current problems:
* I left in place the usage of HADOOP_NON_SECURE, but note that the profile that uses this is hadoop_non_secure, which fails to compile on trunk: https://issues.apache.org/jira/browse/GIRAPH-167 .
* I couldn't get -Phadoop_facebook to work; does this work outside of Facebook?
[jira] [Created] (GIRAPH-167) mvn -Phadoop_non_secure clean verify fails
mvn -Phadoop_non_secure clean verify fails

Key: GIRAPH-167
URL: https://issues.apache.org/jira/browse/GIRAPH-167
Project: Giraph
Issue Type: Bug
Reporter: Eugene Koontz

The {{hadoop_non_secure}} profile, which uses hadoop 0.20.2, is failing to compile:
{code}
[ERROR] COMPILATION ERROR :
[INFO] -
[ERROR] /Users/ekoontz/giraph/target/munged/main/org/apache/giraph/graph/partition/RangePartitionOwner.java:[26,27] package org.apache.hadoop.io does not exist
[ERROR] /Users/ekoontz/giraph/target/munged/main/org/apache/giraph/graph/partition/BasicPartitionOwner.java:[26,29] package org.apache.hadoop.conf does not exist
[ERROR] /Users/ekoontz/giraph/target/munged/main/org/apache/giraph/graph/partition/BasicPartitionOwner.java:[27,29] package org.apache.hadoop.conf does not exist
[ERROR] /Users/ekoontz/giraph/target/munged/main/org/apache/giraph/graph/partition/PartitionOwner.java:[22,27] package org.apache.hadoop.io does not exist
[ERROR] /Users/ekoontz/giraph/target/munged/main/org/apache/giraph/graph/partition/PartitionOwner.java:[27,40] cannot find symbol
symbol: class Writable
{code}
(more error messages follow)
[jira] [Created] (GIRAPH-166) add '*.patch' to list of files that Apache Rat ignores
add '*.patch' to list of files that Apache Rat ignores

Key: GIRAPH-166
URL: https://issues.apache.org/jira/browse/GIRAPH-166
Project: Giraph
Issue Type: Improvement
Reporter: Eugene Koontz
Priority: Trivial
Attachments: GIRAPH-166.patch

Apache Rat will complain about "too many files without licenses" if it finds any *.patch files in your working directory. Rat should ignore these since they are temp files that aren't included in the distribution.
[jira] [Created] (GIRAPH-165) checkstyle error: 'conf'hides a field' on line 154 of GraphRunner
checkstyle error: 'conf'hides a field' on line 154 of GraphRunner

Key: GIRAPH-165
URL: https://issues.apache.org/jira/browse/GIRAPH-165
Project: Giraph
Issue Type: Bug
Reporter: Eugene Koontz
Priority: Minor

full checkstyle error is
{code}
{code}
[jira] [Created] (GIRAPH-164) fix 5 "Line is longer than 80 characters" style errors in GiraphRunner
fix 5 "Line is longer than 80 characters" style errors in GiraphRunner

Key: GIRAPH-164
URL: https://issues.apache.org/jira/browse/GIRAPH-164
Project: Giraph
Issue Type: Bug
Reporter: Eugene Koontz
Priority: Trivial

{code}
{code}
[jira] [Created] (GIRAPH-163) bin/giraph script overwrites CLASSPATH if "dev environment" detected (this also removes USER_JAR from CLASSPATH)
bin/giraph script overwrites CLASSPATH if "dev environment" detected (this also removes USER_JAR from CLASSPATH)

Key: GIRAPH-163
URL: https://issues.apache.org/jira/browse/GIRAPH-163
Project: Giraph
Issue Type: Improvement
Components: conf and scripts
Affects Versions: 0.1.0, 0.2.0
Environment: current trunk of giraph, after running "mvn compile" (as advised in the quick start guide). Also Hadoop 1.0.1 was used.
Reporter: Benjamin Heitmann

If no ./lib dir is present, then the bin/giraph script assumes it is running in a "dev environment". This chooses an execution path through the bin/giraph script, which overwrites the CLASSPATH variable instead of appending to it. Incidentally, this also removes the name of the jar submitted by the user, which got appended to CLASSPATH earlier in the script.
[jira] [Created] (GIRAPH-162) BspCase.setup() should catch FileNotFoundException thrown from org.apache.hadoop.fs.FileSystem.listStatus()
BspCase.setup() should catch FileNotFoundException thrown from org.apache.hadoop.fs.FileSystem.listStatus()

Key: GIRAPH-162
URL: https://issues.apache.org/jira/browse/GIRAPH-162
Project: Giraph
Issue Type: Bug
Components: test
Reporter: Eugene Koontz

In hadoop trunk, org.apache.hadoop.fs.FileSystem.listStatus() is declared to throw both FileNotFoundException and IOException. The former (FileNotFoundException) is currently not caught when BspCase.setup() looks for the GiraphJob.ZOOKEEPER_MANAGER_DIR_DEFAULT directory in order to delete it. The listStatus() call throws FileNotFoundException if this directory does not exist, which causes several tests to fail when using Hadoop trunk. This exception should be caught and ignored during setup(), since it's not an error for this directory not to exist.
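The handling requested in GIRAPH-162 might be sketched as follows. To avoid depending on Hadoop, the listing call is modeled by a hypothetical `DirLister` interface; `SetupCleanup` and `listOrEmpty` are likewise illustrative names, not part of Giraph or Hadoop. The point is only that `FileNotFoundException` is caught separately and treated as "nothing to delete", while any other `IOException` still propagates.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

/** Hypothetical sketch of tolerating a missing directory during setup. */
final class SetupCleanup {
  /** Stand-in for a file system listing call that may throw. */
  interface DirLister {
    String[] list(String dir) throws IOException;
  }

  /** Returns the directory entries, or an empty array if the dir is absent. */
  static String[] listOrEmpty(DirLister fs, String dir) throws IOException {
    try {
      return fs.list(dir);
    } catch (FileNotFoundException e) {
      // A missing directory is not an error here: there is nothing to clean.
      return new String[0];
    }
  }
}
```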
[jira] [Created] (GIRAPH-161) Handling null messages and edges when initializing IntIntNullIntVertex
Handling null messages and edges when initializing IntIntNullIntVertex

Key: GIRAPH-161
URL: https://issues.apache.org/jira/browse/GIRAPH-161
Project: Giraph
Issue Type: Bug
Components: graph
Affects Versions: 0.1.0
Reporter: Dionysios Logothetis
Attachments: GIRAPH-161.patch

The initialize() method in org.apache.giraph.graph.IntIntNullIntVertex should handle null messages or null edges. Initializing with null messages is an especially common case.
[jira] [Created] (GIRAPH-160) Vertex reader that reads adjacency lists with no vertex and edge values associated
Vertex reader that reads adjacency lists with no vertex and edge values associated

Key: GIRAPH-160
URL: https://issues.apache.org/jira/browse/GIRAPH-160
Project: Giraph
Issue Type: New Feature
Components: lib
Affects Versions: 0.1.0
Reporter: Dionysios Logothetis
Priority: Minor

A very common format for graphs is adjacency lists with no values associated with edges or vertices. For instance, a line in the input can be of the form:

1 2 3

which represents a vertex with id 1 that has edges to vertices 2 and 3 with no values associated.

I've created a vertex reader named AdjacencyListVertexReader which is essentially a copy of the AdjacencyListVertexReader modified to handle this format. It's an abstract class, and subclasses can override the defaultVertexValue() and defaultEdgeValue() methods to provide default values for vertices and edges respectively (otherwise values are initialized to null). I've also created an example subclass.
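To make the input format from GIRAPH-160 concrete, here is a small, self-contained parser for one such line. It is a sketch under stated assumptions, not the reader attached to the issue: the class `AdjacencyLine`, its fields and the choice of `long` ids and whitespace as the separator are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical value-free adjacency record: a vertex id and its targets. */
final class AdjacencyLine {
  final long vertexId;
  final List<Long> targets;

  private AdjacencyLine(long vertexId, List<Long> targets) {
    this.vertexId = vertexId;
    this.targets = targets;
  }

  /** Parses "id target target ..." separated by whitespace. */
  static AdjacencyLine parse(String line) {
    String[] tokens = line.trim().split("\\s+");
    long id = Long.parseLong(tokens[0]);
    List<Long> targets = new ArrayList<>();
    // Every token after the first is the id of a neighboring vertex.
    for (int i = 1; i < tokens.length; i++) {
      targets.add(Long.parseLong(tokens[i]));
    }
    return new AdjacencyLine(id, targets);
  }
}
```

A reader built on this idea would then attach whatever default vertex and edge values its subclass supplies, since the line itself carries none.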
[jira] [Created] (GIRAPH-159) Case insensitive file/directory name matching will produce errors on M/R jar unpack.
Case insensitive file/directory name matching will produce errors on M/R jar unpack.

Key: GIRAPH-159
URL: https://issues.apache.org/jira/browse/GIRAPH-159
Project: Giraph
Issue Type: Bug
Components: build
Affects Versions: 0.2.0
Environment: OSX 10.6.8
Reporter: Brian Femiano

This only seems to affect platforms where file/directory naming conflicts can arise from case-insensitive matches. I was able to reproduce it by running the pseudo-distributed unit tests within OSX.

This has affected other projects: https://issues.apache.org/jira/browse/MAHOUT-780

I've been able to reproduce this on my local OSX install with the following error: https://groups.google.com/a/cloudera.org/group/cdh-user/browse_thread/thread/a201218000e956d3/cc6eca3ef9f80ff8

Since LICENSE.txt contains the same content as the file LICENSE, I propose we exclude any LICENSE matches found in the unpacked dependency jars when the maven assembly phase hits 'jar-with-dependencies'. I have a patch which moves the 'jar-with-dependencies' descriptor to an external compile.xml file which has the proper excludes. This might also come in handy down the road should any additional tweaks be needed to the compile phase.
[jira] [Created] (GIRAPH-158) Support YARN (next generation MapReduce)
Support YARN (next generation MapReduce)

Key: GIRAPH-158
URL: https://issues.apache.org/jira/browse/GIRAPH-158
Project: Giraph
Issue Type: New Feature
Reporter: Eugene Koontz

YARN is a re-architecting of the Hadoop MapReduce framework, described here: http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/YARN.html

It would be good to offer support within Giraph for this framework.
[jira] [Created] (GIRAPH-157) Vertex to perform graph coloring on simple, connected, undirected graphs and related test.
Vertex to perform graph coloring on simple, connected, undirected graphs and related test.

Key: GIRAPH-157
URL: https://issues.apache.org/jira/browse/GIRAPH-157
Project: Giraph
Issue Type: Test
Components: examples, test
Affects Versions: 0.2.0
Reporter: Eli Reisman
Assignee: Eli Reisman
Priority: Trivial

Hi. I am attempting to learn the Hadoop and Giraph codebases and wanted to write a simple client application for Giraph to help me learn the ins and outs of it. This is a simple unit test and vertex modeled after the ConnectedComponentsVertex and related test. The vertex test runs whenever you run the "mvn test" or "mvn verify" suite of tests. When finished processing, each vertex will have an integer value that is its color.

This is a pretty simple implementation, and although I have tested it on a number of small graphs of varied trickiness and it seems to rapidly arrive at a minimal coloring, it's hard (for me at least) to guess which possible coloring it will arrive at, and I have no idea how it will do on really big graphs yet without finding some more pre-colored larger test graphs to try it on. Ideas anyone?

Anyway, it was fun to put this together, and I'd be happy to improve it or receive some help or advice to further the cause. Thanks again, I am hoping this will be the first of many (hopefully more useful) contributions!

Eli
[jira] [Created] (GIRAPH-156) Users should be able to set simple 'custom arguments' via org.apache.giraph.GiraphRunner
Users should be able to set simple 'custom arguments' via org.apache.giraph.GiraphRunner Key: GIRAPH-156 URL: https://issues.apache.org/jira/browse/GIRAPH-156 Project: Giraph Issue Type: Improvement Components: conf and scripts Affects Versions: 0.1.0 Reporter: Sebastian Schelter Assignee: Sebastian Schelter Some vertices need custom arguments to run. The SimpleShortestPathsVertex for example needs to know the source vertex for the computation which is saved in the job's Configuration as _SimpleShortestPathsVertex.sourceId_. Users should be able to apply such simple custom arguments via GiraphRunner. I propose to add a new option _--customArguments_ where users can supply arguments in the form _=,=_ for this.
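To make the GIRAPH-156 proposal concrete, here is a minimal sketch of how such an option value could be parsed. It assumes the garbled form above is a comma-separated list of name=value pairs; the class CustomArguments and its parse method are hypothetical names for illustration and are not part of Giraph.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Giraph. */
final class CustomArguments {
    private CustomArguments() { }

    /**
     * Parses a string such as "k1=v1,k2=v2" into an ordered map.
     * The name=value,name=value shape is an assumption about the proposal.
     */
    static Map<String, String> parse(String spec) {
        Map<String, String> result = new LinkedHashMap<>();
        if (spec == null || spec.trim().isEmpty()) {
            return result;
        }
        for (String pair : spec.split(",")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                // No '=' at all, or an empty name before it.
                throw new IllegalArgumentException(
                    "Expected name=value but got '" + pair + "'");
            }
            result.put(pair.substring(0, eq).trim(),
                       pair.substring(eq + 1).trim());
        }
        return result;
    }
}
```

Each parsed entry could then be copied into the job's Configuration with conf.set(name, value), so that a vertex can read it back under the same key.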
[jira] [Created] (GIRAPH-155) Allow creation of graph by adding edges that span multiple workers
Allow creation of graph by adding edges that span multiple workers Key: GIRAPH-155 URL: https://issues.apache.org/jira/browse/GIRAPH-155 Project: Giraph Issue Type: New Feature Components: graph, lib Affects Versions: 0.1.0 Reporter: Dionysios Logothetis Currently a graph is created only by adding vertices. The typical way is to read input text files line by line, with each line describing a vertex (its value, its edges, etc.). The current API allows for the creation of a vertex only if all the information for the vertex is available in a single line. However, it's common to have graphs described in the form of edges. Edges might span multiple lines in an input file or even span multiple workers. The current API doesn't allow this. In the input superstep, a vertex must be created by a single worker. Instead, it should be possible for multiple workers to mutate the graph during the input superstep. This has the following implications: (i) Instead of just instantiating a vertex, a vertex reader should be able to issue vertex addition and edge addition requests. (ii) Multiple workers might try to create the same vertex. Any conflicts should be handled with a VertexResolver, so the resolver has to be instantiated before load time.
[jira] [Created] (GIRAPH-154) Worker ports are not synched properly with its peers
Worker ports are not synched properly with its peers Key: GIRAPH-154 URL: https://issues.apache.org/jira/browse/GIRAPH-154 Project: Giraph Issue Type: Bug Components: bsp Affects Versions: 0.2.0 Reporter: Zhiwei Gu Assignee: Zhiwei Gu When a worker tries multiple ports to set up the RPC server, the final port is not synced with its peer workers properly, which results in peer workers sending messages to the default port. Here are some logs: Base port: 34900 log for worker 161: IPC Server handler 98 on 36061: starting BasicRPCCommunications: Started RPC communication server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:36061 with 100 handlers and 199 flush threads on bind attempt 1 IPC Server handler 99 on 36061: starting setup: Registering health of this worker... getJobState: Job state already exists (/_hadoopBsp/job_201203130609_14838/_masterJobState) getApplicationAttempt: Node /_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir already exists! getApplicationAttempt: Node /_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir already exists! registerHealth: Created my health node for attempt=0, superstep=-1 with /_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/gsta32085.tan.ygrid.yahoo.com_161 and workerInfo= Worker(hostname=gsta32085.tan.ygrid.yahoo.com, MRpartition=161, port=35061) process: partitionAssignmentsReadyChanged (partitions are assigned) startSuperstep: Ready for computation on superstep -1 since worker selection and vertex range assignments are done in /_hadoopBsp/job_201203130609_14838/_applicationAttemptsDir/0/_superstepDir/-1/_partitionAssignments Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 0 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 1 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 2 time(s). 
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 3 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 4 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 5 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 6 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 7 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 8 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 9 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 10 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 11 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 12 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 13 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 14 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 15 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 16 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 17 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 18 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 19 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 20 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 21 time(s). 
Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 22 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 23 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 24 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 25 time(s). Retrying connect to server: gsta32085.tan.ygrid.yahoo.com/10.216.148.47:35061. Already tried 26 time(s). Retrying connect to s
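The GIRAPH-154 logs above show the server started on port 36061 while the health node advertises port=35061. The following is a simulation-only sketch of that mismatch. The port formula (base port plus task id, moved up by a fixed increment on each failed bind) is an assumption inferred from the numbers in the log, and PortBinding is a hypothetical class, not Giraph's code.

```java
import java.util.Set;

/** Hypothetical sketch; the port formula is inferred from the log above. */
final class PortBinding {
    private PortBinding() { }

    /** Port a peer would guess if it only knew the base port and task id. */
    static int defaultPort(int basePort, int taskId) {
        return basePort + taskId;
    }

    /**
     * Simulates binding with retries: tries the default port first, then
     * moves up by portIncrement after every failed attempt. Returns the port
     * that was actually bound, which is the value to advertise to peers.
     */
    static int bindWithRetry(int basePort, int taskId, int portIncrement,
                             int maxAttempts, Set<Integer> portsInUse) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            int candidate =
                defaultPort(basePort, taskId) + attempt * portIncrement;
            if (!portsInUse.contains(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException(
            "No free port after " + maxAttempts + " attempts");
    }
}
```

With base port 34900, task id 161, and 35061 already taken, the sketch binds 36061. Registering the return value of bindWithRetry, rather than defaultPort, is what would keep peers from retrying against 35061.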
[jira] [Created] (GIRAPH-153) HBase/Accumulo Input and Output formats
HBase/Accumulo Input and Output formats Key: GIRAPH-153 URL: https://issues.apache.org/jira/browse/GIRAPH-153 Project: Giraph Issue Type: New Feature Components: bsp Affects Versions: 0.1.0 Environment: Single host OSX 10.6.8 2.2Ghz Intel i7, 8GB Reporter: Brian Femiano Attachments: AccumuloRootMarker.java, AccumuloRootMarkerInputFormat.java, AccumuloRootMarkerOutputFormat.java, AccumuloVertexInputFormat.java, AccumuloVertexOutputFormat.java, ComputeIsRoot.java, DistributedCacheHelper.java, HBaseVertexInputFormat.java, HBaseVertexOutputFormat.java, IdentifyAndMarkRoots.java, SetLongWritable.java, SetTextWritable.java, TableRootMarker.java, TableRootMarkerInputFormat.java, TableRootMarkerOutputFormat.java Four abstract classes that wrap their respective delegate input/output formats for easy hooks into vertex input format subclasses. I've included some sample programs that show two very simple graph algorithms. I have a graph generator that builds out a very simple direct structure, starting with a few 'root' nodes. Root nodes are defined as nodes that are not listed as a child anywhere in the graph. Algorithm 1) AccumuloRootMarker.java --> Accumulo as read/write source. Every vertex starts out thinking it's a root. At superstep 0, each vertex sends a message down to each child as a non-root notification. After superstep 1, only root nodes will have never been messaged. Algorithm 2) TableRootMarker --> HBase as read/write source. Expands on A1 by bundling the notification logic with root node propagation. Once we've marked the appropriate nodes as roots, tell every child which roots it can be traced back to via one or more spanning trees. This will take N + 2 supersteps, where N is the maximum number of hops from any root to any leaf, plus 2 supersteps for the initial root flagging. I've included all relevant code plus DistributedCacheHelper.java for recursive cache file and archive searches. It is more Hadoop-centric than Giraph-specific, but these jobs use it, so I figured why not commit it here. These have been tested through the local JobRunner, pseudo-distributed on the aforementioned hardware, and fully distributed on EC2. More details in the comments.
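Algorithm 1 from GIRAPH-153 can be sketched outside Giraph as a plain-Java simulation. This is not the attached AccumuloRootMarker code; RootMarkerSketch and findRoots are hypothetical names, and the two supersteps are collapsed into two loops over an in-memory adjacency list.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Plain-Java simulation of the root-marking idea; not the attached code. */
final class RootMarkerSketch {
    private RootMarkerSketch() { }

    /**
     * @param children adjacency list: vertex id to the ids of its children
     * @return ids of vertices that never received a non-root notification
     */
    static Set<String> findRoots(Map<String, List<String>> children) {
        // Before superstep 0: every vertex assumes it is a root. Children
        // without an adjacency entry of their own are vertices too.
        Set<String> roots = new TreeSet<>(children.keySet());
        for (List<String> kids : children.values()) {
            roots.addAll(kids);
        }
        // Superstep 0: every vertex notifies each of its children.
        // Superstep 1: every notified vertex clears its root flag.
        for (List<String> kids : children.values()) {
            roots.removeAll(kids);
        }
        return roots;
    }
}
```

For the edges a->b, a->c, d->c, and b->e, only a and d are never messaged, so they are the roots.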
[jira] [Created] (GIRAPH-152) NPE at PageRankBenchmark
NPE at PageRankBenchmark Key: GIRAPH-152 URL: https://issues.apache.org/jira/browse/GIRAPH-152 Project: Giraph Issue Type: Bug Components: examples Affects Versions: 0.2.0 Environment: Hadoop-0.20.205.0 Linux: Amazon EC2, standard one "Amazon Linux 32 bit" Giraph: compiled from CL 1245205 Reporter: Yury Litvinov
1. I've copied hadoop-0.20.205.0 onto an Amazon EC2 Linux instance.
2. Compiled the latest Giraph (giraph-0.2-SNAPSHOT-jar-with-dependencies.jar) from sources (CL 1245205) and copied it to the Linux instance as well.
3. Ran this command as suggested in the docs (https://cwiki.apache.org/confluence/display/GIRAPH/Quick+Start+Guide):
> hadoop-0.20.205.0/bin/hadoop jar giraph-0.2-SNAPSHOT-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 5 -w 1
OBSERVED:
{code}
Exception in thread "main" java.lang.NullPointerException
    at org.apache.giraph.benchmark.PageRankBenchmark.run(PageRankBenchmark.java:127)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.giraph.benchmark.PageRankBenchmark.main(PageRankBenchmark.java:162)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
{code}
[jira] [Created] (GIRAPH-151) IntIntNullIntVertex.initialize method should handle "null" edges argument
IntIntNullIntVertex.initialize method should handle "null" edges argument Key: GIRAPH-151 URL: https://issues.apache.org/jira/browse/GIRAPH-151 Project: Giraph Issue Type: Bug Components: graph Affects Versions: 0.1.0 Environment: Linux 2.6.18-028stab095.1-PAE Reporter: yavuz gokirmak Priority: Trivial Fix For: 0.1.0 IntIntNullIntVertex.initialize method should handle "null" edges argument because in VertexResolver.resolve method (line:91) vertex.initialize is called with edge argument as null
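A minimal sketch of the null handling GIRAPH-151 asks for, using a hypothetical stand-in class with plain int ids. The real IntIntNullIntVertex signature is not shown in the report, so the parameter list here is an assumption made for illustration.

```java
import java.util.Map;

/** Hypothetical stand-in for a vertex with int ids and no edge values. */
final class NullSafeVertexSketch {
    private int id;
    private int value;
    private int[] neighbors = new int[0];

    /** Treats a null edge map as "no edges" instead of throwing an NPE. */
    void initialize(int vertexId, int vertexValue, Map<Integer, ?> edges) {
        this.id = vertexId;
        this.value = vertexValue;
        if (edges == null) {
            // A resolver may create a vertex before any edge is known.
            this.neighbors = new int[0];
            return;
        }
        this.neighbors = new int[edges.size()];
        int i = 0;
        for (Integer target : edges.keySet()) {
            this.neighbors[i++] = target;
        }
    }

    int getId() { return id; }

    int getValue() { return value; }

    int getNumOutEdges() { return neighbors.length; }
}
```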
[jira] [Created] (GIRAPH-150) PageRankBenchmark accesses wrong conf after GiraphJob is created
PageRankBenchmark accesses wrong conf after GiraphJob is created Key: GIRAPH-150 URL: https://issues.apache.org/jira/browse/GIRAPH-150 Project: Giraph Issue Type: Bug Reporter: Avery Ching Assignee: Avery Ching
[jira] [Created] (GIRAPH-149) Clone Vertex on loading
Clone Vertex on loading Key: GIRAPH-149 URL: https://issues.apache.org/jira/browse/GIRAPH-149 Project: Giraph Issue Type: Bug Components: bsp Affects Versions: 0.1.0, 0.2.0 Reporter: Zechao Shang Priority: Minor AFAIK, it is documented behavior that Hadoop IO reuses instances when loading data. See BspServiceWorker#readVerticesFromInputSplit: readerVertex may be reused by the RecordReader (at least our SequenceFileVertexReader does this), so it must be cloned somewhere. In my opinion, our inherited RecordReaders should follow the behavior of Hadoop's RecordReader, and the vertex should be cloned in BspServiceWorker#readVerticesFromInputSplit. Just calling org.apache.hadoop.io.WritableUtils.clone will be fine.
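The reuse problem described in GIRAPH-149 can be reproduced without Hadoop. The sketch below uses hypothetical stand-ins (MutableVertex, ReusingReader) for the vertex and the record reader; the copy() call plays the role that WritableUtils.clone would play in the real loading code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical demonstration of instance reuse; not Giraph or Hadoop code. */
final class ReuseSketch {
    private ReuseSketch() { }

    /** Mutable record standing in for a vertex. */
    static final class MutableVertex {
        int id;

        MutableVertex copy() {
            MutableVertex v = new MutableVertex();
            v.id = id;
            return v;
        }
    }

    /** Reader that hands back the same instance for every record. */
    static final class ReusingReader {
        private final int[] ids;
        private int next;
        private final MutableVertex current = new MutableVertex();

        ReusingReader(int... ids) { this.ids = ids; }

        boolean nextVertex() {
            if (next >= ids.length) {
                return false;
            }
            current.id = ids[next++]; // overwrites the shared instance
            return true;
        }

        MutableVertex getCurrentVertex() { return current; }
    }

    /** Loads all vertices and returns the ids seen after loading finishes. */
    static List<Integer> load(ReusingReader reader, boolean cloneOnLoad) {
        List<MutableVertex> loaded = new ArrayList<>();
        while (reader.nextVertex()) {
            MutableVertex v = reader.getCurrentVertex();
            loaded.add(cloneOnLoad ? v.copy() : v);
        }
        List<Integer> ids = new ArrayList<>();
        for (MutableVertex v : loaded) {
            ids.add(v.id);
        }
        return ids;
    }
}
```

Without cloning, every stored reference points at the same object, so loading ids 1, 2, 3 leaves three copies of the last record; with cloning the three ids survive.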
[jira] [Created] (GIRAPH-148) giraph-site.xml needs Apache head
giraph-site.xml needs Apache head Key: GIRAPH-148 URL: https://issues.apache.org/jira/browse/GIRAPH-148 Project: Giraph Issue Type: Bug Components: conf and scripts Affects Versions: 0.2.0 Reporter: Jakob Homan Assignee: Jakob Homan Fix For: 0.1.0 Attachments: GIRAPH-148.patch I forgot to add the license to the conf file and now rat is failing...
[jira] [Created] (GIRAPH-147) Add Blueprints Tinkerpop support
Add Blueprints Tinkerpop support Key: GIRAPH-147 URL: https://issues.apache.org/jira/browse/GIRAPH-147 Project: Giraph Issue Type: New Feature Reporter: Avery Ching Priority: Minor Got this issue on the old Giraph GitHub (deprecated). Moving it here. jeffg2k opened this issue 2 hours ago Hoping that Giraph might add TinkerPop Blueprint support. :)
[jira] [Created] (GIRAPH-146) Maven is running the tests twice during builds
Maven is running the tests twice during builds Key: GIRAPH-146 URL: https://issues.apache.org/jira/browse/GIRAPH-146 Project: Giraph Issue Type: Bug Components: build Reporter: Jakob Homan I had a feeling the build time had jumped significantly...
[jira] [Created] (GIRAPH-145) Change partition request log level to debug rather than info
Change partition request log level to debug rather than info Key: GIRAPH-145 URL: https://issues.apache.org/jira/browse/GIRAPH-145 Project: Giraph Issue Type: Improvement Components: bsp Affects Versions: 0.2.0 Reporter: Jakob Homan Assignee: Jakob Homan Fix For: 0.2.0
{code:title=BasicRPCCommunications.java|borderStyle=solid}
if (LOG.isInfoEnabled()) {
  LOG.info("sendPartitionReq: Sending to " + rpcProxy.getName() +
      " " + addr + " from " + workerInfo +
      ", with partition " + partition);
}
{code}
is too chatty. We're seeing thousands and thousands of these lines for larger graphs. This should be at debug level...
[jira] [Created] (GIRAPH-144) GiraphJob should not extend Job (users should not be able to call Job methods like waitForCompletion or setMapper..etc)
GiraphJob should not extend Job (users should not be able to call Job methods like waitForCompletion or setMapper..etc) Key: GIRAPH-144 URL: https://issues.apache.org/jira/browse/GIRAPH-144 Project: Giraph Issue Type: Bug Reporter: Dave
[jira] [Created] (GIRAPH-143) Add support for giraph to have a conf file
Add support for giraph to have a conf file Key: GIRAPH-143 URL: https://issues.apache.org/jira/browse/GIRAPH-143 Project: Giraph Issue Type: New Feature Affects Versions: 0.2.0 Reporter: Jakob Homan Assignee: Jakob Homan Fix For: 0.2.0 Currently one must provide all the Giraph-specific config values either via the command line or snuck into another project's conf file. Any self-respecting Hadoop ecosystem project should have its own conf file.
[jira] [Created] (GIRAPH-142) _hadoopBsp should be prefixable via configuration
_hadoopBsp should be prefixable via configuration Key: GIRAPH-142 URL: https://issues.apache.org/jira/browse/GIRAPH-142 Project: Giraph Issue Type: Improvement Reporter: Jakob Homan Assignee: Jakob Homan In multitenant ZooKeeper clusters, it would be good to be able to specify the base directory that's created for the _hadoopBsp znodes. This would also fix the issue we have with creating that directory in the source root during tests.
[jira] [Created] (GIRAPH-141) mulitgraph support in giraph
mulitgraph support in giraph Key: GIRAPH-141 URL: https://issues.apache.org/jira/browse/GIRAPH-141 Project: Giraph Issue Type: Improvement Components: graph Reporter: André Kelpe The current vertex API only supports simple graphs, meaning that there can only ever be one edge between two vertices. Many graphs, like the road network, are in fact multigraphs, where many edges can connect two vertices at the same time. Support for this could be added by introducing an Iterator getEdgeValue() or a similar construct. Maybe introducing a slim object like a Connector between the edge and the vertex is also a good idea, so that you could do something like:
    for (final Connector conn : getEdgeValues()) {
        final EdgeWritable edge = conn.getEdge();
        final VertexWritable otherVertex = conn.getOther();
        // do interesting stuff
    }
[jira] [Created] (GIRAPH-140) Enforce a maximum number of iterations
Enforce a maximum number of iterations Key: GIRAPH-140 URL: https://issues.apache.org/jira/browse/GIRAPH-140 Project: Giraph Issue Type: Improvement Reporter: Jakob Homan While Giraph is still on MR and keeping track of its statistics via Hadoop's counters, there is the danger that a huge number of iterations will negatively impact the cluster's jobtracker by adding counter statistics for each one (basically, the flip side of GIRAPH-52). We should have a configurable maximum number of iterations to prevent this.
[jira] [Created] (GIRAPH-139) Change PageRankBenchmark to use be accessible via bin/giraph
Change PageRankBenchmark to use be accessible via bin/giraph Key: GIRAPH-139 URL: https://issues.apache.org/jira/browse/GIRAPH-139 Project: Giraph Issue Type: Improvement Reporter: Jakob Homan Currently the PageRankBenchmark has its own main and Tool implementation and is difficult to access from the bin/giraph script. It would be better if everything were accessible via bin/giraph. The benchmark is particularly problematic because it uses inner classes for its two actual Vertex implementations, which have to be specified on the command line by their .class name (i.e. org.apache.giraph.benchmark.PageRankBenchmark$PageRankHashMapVertex) rather than just with dots, as one would expect.
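The naming issue in GIRAPH-139 is standard JVM behavior for nested classes and can be seen with a JDK class alone. The sketch below uses java.util.Map.Entry as a stand-in for the benchmark's nested vertex classes; NestedClassNames is a hypothetical helper name.

```java
/** Shows why nested classes need '$' when loaded by name. */
final class NestedClassNames {
    private NestedClassNames() { }

    /** Returns true if Class.forName can resolve the given name. */
    static boolean resolves(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Binary name, the form Class.forName expects.
        System.out.println(java.util.Map.Entry.class.getName());
        // Canonical name, the dotted form a user would naturally type.
        System.out.println(java.util.Map.Entry.class.getCanonicalName());
        System.out.println(resolves("java.util.Map$Entry"));
        System.out.println(resolves("java.util.Map.Entry"));
    }
}
```

Only the '$' form resolves, which is why the nested vertex classes cannot be named with dots on the command line unless they are moved to top-level classes or the runner translates the name.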
[jira] [Created] (GIRAPH-138) Don't throw stack trace for classes that aren't vertices
Don't throw stack trace for classes that aren't vertices Key: GIRAPH-138 URL: https://issues.apache.org/jira/browse/GIRAPH-138 Project: Giraph Issue Type: Improvement Reporter: Jakob Homan Currently if one passes in a class that isn't a vertex, we throw up a complete stack trace:
{noformat}
[tardis giraph-0.1]$ bin/giraph lib/giraph-0.1.jar org.apache.giraph.benchmark.PageRankBenchmark -w 10 -if org.apache.giraph.benchmark.PseudoRandomVertexInputFormat
Exception in thread "main" java.lang.RuntimeException: class org.apache.giraph.benchmark.PageRankBenchmark not org.apache.giraph.graph.BasicVertex
    at org.apache.hadoop.conf.Configuration.setClass(Configuration.java:858)
    at org.apache.giraph.graph.GiraphJob.setVertexClass(GiraphJob.java:395)
    at org.apache.giraph.GiraphRunner.run(GiraphRunner.java:94)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.giraph.GiraphRunner.main(GiraphRunner.java:132)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
{noformat}
This type of user error is routine and should be caught and result in a more descriptive error message.
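One way to turn the GIRAPH-138 failure into a readable message is to check the class up front instead of letting Configuration.setClass throw. The sketch below is an assumption about how such a check could look; VertexClassCheck and describeProblem are hypothetical names, and the required base class is passed in so the example runs without Giraph on the classpath.

```java
/** Hypothetical pre-flight check; not part of GiraphRunner. */
final class VertexClassCheck {
    private VertexClassCheck() { }

    /**
     * Returns null when className names a subclass of requiredBase,
     * otherwise a one-line message suitable for printing to the user.
     */
    static String describeProblem(String className, Class<?> requiredBase) {
        Class<?> candidate;
        try {
            candidate = Class.forName(className);
        } catch (ClassNotFoundException e) {
            return "Class " + className + " was not found on the classpath.";
        }
        if (!requiredBase.isAssignableFrom(candidate)) {
            return className + " is not a subclass of "
                + requiredBase.getName()
                + "; please pass a vertex class as the first argument.";
        }
        return null;
    }
}
```

A runner could print the returned message and exit with a non-zero status instead of surfacing the RuntimeException shown above.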
[jira] [Created] (GIRAPH-137) De-duplicate pagerank implementation in PageRankBenchmark
De-duplicate pagerank implementation in PageRankBenchmark Key: GIRAPH-137 URL: https://issues.apache.org/jira/browse/GIRAPH-137 Project: Giraph Issue Type: Improvement Reporter: Jakob Homan Priority: Minor Currently in PageRankBenchmark we have the code for pagerank duplicated in each of the implementations of Vertex:
{noformat}
public static class PageRankHashMapVertex extends HashMapVertex<
    LongWritable, DoubleWritable, DoubleWritable, DoubleWritable> {
  @Override
  public void compute(Iterator<DoubleWritable> msgIterator) {
    if (getSuperstep() >= 1) {
      double sum = 0;
      while (msgIterator.hasNext()) {
        sum += msgIterator.next().get();
      }
      DoubleWritable vertexValue =
          new DoubleWritable((0.15f / getNumVertices()) + 0.85f * sum);
      setVertexValue(vertexValue);
    }
    if (getSuperstep() < getConf().getInt(SUPERSTEP_COUNT, -1)) {
      long edges = getNumOutEdges();
      sendMsgToAllEdges(
          new DoubleWritable(getVertexValue().get() / edges));
    } else {
      voteToHalt();
    }
  }
}

public static class PageRankEdgeListVertex extends EdgeListVertex<
    LongWritable, DoubleWritable, DoubleWritable, DoubleWritable> {
  @Override
  public void compute(Iterator<DoubleWritable> msgIterator) {
    if (getSuperstep() >= 1) {
      double sum = 0;
      while (msgIterator.hasNext()) {
        sum += msgIterator.next().get();
      }
      DoubleWritable vertexValue =
          new DoubleWritable((0.15f / getNumVertices()) + 0.85f * sum);
      setVertexValue(vertexValue);
    }
    if (getSuperstep() < getConf().getInt(SUPERSTEP_COUNT, -1)) {
      long edges = getNumOutEdges();
      sendMsgToAllEdges(
          new DoubleWritable(getVertexValue().get() / edges));
    } else {
      voteToHalt();
    }
  }
}
{noformat}
This code can be consolidated into a private class, with the two implementations just extending that.
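Because the two vertices in GIRAPH-137 extend different base classes (HashMapVertex and EdgeListVertex), a single shared superclass may not fit directly; one alternative is a static helper that both compute() methods call. The sketch below shows only the arithmetic, using plain Double values instead of DoubleWritable so it runs without Hadoop. PageRankMath and its methods are hypothetical names, and this is not necessarily the approach the patch took.

```java
import java.util.Iterator;

/** Hypothetical shared helper holding the duplicated PageRank arithmetic. */
final class PageRankMath {
    private PageRankMath() { }

    /** New vertex value: 0.15 / numVertices + 0.85 * sum of messages. */
    static double nextRank(Iterator<Double> messages, long numVertices) {
        double sum = 0;
        while (messages.hasNext()) {
            sum += messages.next();
        }
        // Same constants and float literals as the duplicated code.
        return (0.15f / numVertices) + 0.85f * sum;
    }

    /** Value sent along each out-edge: the rank split evenly. */
    static double outgoingShare(double rank, long numOutEdges) {
        return rank / numOutEdges;
    }
}
```

Each compute() would then shrink to unwrapping the messages, calling nextRank, and sending outgoingShare to all edges, leaving the superstep check as the only remaining shared control flow.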
[jira] [Created] (GIRAPH-136) Erorr message for bin/giraph could be improved
Erorr message for bin/giraph could be improved Key: GIRAPH-136 URL: https://issues.apache.org/jira/browse/GIRAPH-136 Project: Giraph Issue Type: Improvement Reporter: Jakob Homan Currently when one just runs bin/giraph without the required jar, the message isn't very helpful: {noformat}[tardis giraph-0.1]$ bin/giraph Can't find user jar to execute.{noformat} It would be better to have a more in-depth message explaining Giraph and what is expected.
[jira] [Created] (GIRAPH-135) Need DISCLAIMER for incubator
Need DISCLAIMER for incubator Key: GIRAPH-135 URL: https://issues.apache.org/jira/browse/GIRAPH-135 Project: Giraph Issue Type: Improvement Reporter: Jakob Homan Assignee: Jakob Homan Releases need to have a DISCLAIMER file.
[jira] [Created] (GIRAPH-134) Fix NOTICE and LICENSE files
Fix NOTICE and LICENSE files Key: GIRAPH-134 URL: https://issues.apache.org/jira/browse/GIRAPH-134 Project: Giraph Issue Type: Improvement Components: documentation Affects Versions: 0.1.0 Reporter: Jakob Homan Assignee: Jakob Homan Fix For: 0.1.0 Currently both the LICENSE and NOTICE files are out of compliance for an Apache release.
[jira] [Created] (GIRAPH-133) Type in JavaDoc in BspCase::remove
Type in JavaDoc in BspCase::remove Key: GIRAPH-133 URL: https://issues.apache.org/jira/browse/GIRAPH-133 Project: Giraph Issue Type: Improvement Reporter: Jakob Homan Priority: Trivial Configuration is spelled wrong in the javadoc:
{noformat}
/**
 * Helper method to remove a path if it exists.
 *
 * @param conf Configutation
 * @param path Path to remove
 * @throws IOException
 */
public static void remove(Configuration conf, Path path)
    throws IOException {
  FileSystem hdfs = FileSystem.get(conf);
  hdfs.delete(path, true);
}
{noformat}
[jira] [Created] (GIRAPH-132) Simplify boolean expression in GraphMapper::map()
Simplify boolean expression in GraphMapper::map() Key: GIRAPH-132 URL: https://issues.apache.org/jira/browse/GIRAPH-132 Project: Giraph Issue Type: Improvement Reporter: Jakob Homan Priority: Trivial The boolean expression in:
{noformat}
@Override
public void map(Object key, Object value, Context context)
    throws IOException, InterruptedException {
  // map() only does computation
  // 1) Run checkpoint per frequency policy.
  // 2) For every vertex on this mapper, run the compute() function
  // 3) Wait until all messaging is done.
  // 4) Check if all vertices are done. If not goto 2).
  // 5) Dump output.
  if (done == true) {
    return;
  }
{noformat}
can be simplified.
[jira] [Created] (GIRAPH-131) enable creation of test-jars to simplify testing in downstream projects
enable creation of test-jars to simplify testing in downstream projects Key: GIRAPH-131 URL: https://issues.apache.org/jira/browse/GIRAPH-131 Project: Giraph Issue Type: Improvement Reporter: André Kelpe Priority: Minor Attachments: GIRAPH-131.patch The attached patch enables the creation of test-jars, which are the tests packaged in a separate jar file. This makes it possible to use the super-useful test infrastructure in MockUtils in downstream projects. If you apply the patch, you will get a ${giraph.version}-tests.jar, which can be used for downstream testing like this:
    <dependency>
      <groupId>org.apache.giraph</groupId>
      <artifactId>giraph</artifactId>
      <version>${giraph.version}</version>
      <type>test-jar</type>
      <scope>test</scope>
    </dependency>
P.S.: The patch also resets the version to 0.1-SNAPSHOT as discussed in GIRAPH-129
[jira] [Created] (GIRAPH-130) Fix Javadoc warnings
Fix Javadoc warnings Key: GIRAPH-130 URL: https://issues.apache.org/jira/browse/GIRAPH-130 Project: Giraph Issue Type: Bug Reporter: Jakob Homan Priority: Minor We've accumulated a fair number of javadoc warnings recently: {noformat}[WARNING] Javadoc Warnings [WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146: warning - Tag @link: reference not found: GraphPartitioner [WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:129: warning - @param argument "superstep" is not a parameter name. [WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146: warning - Tag @link: reference not found: GraphPartitioner [WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/CommunicationsInterface.java:84: warning - @param argument "vertexIndex" is not a parameter name. [WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/CommunicationsInterface.java:84: warning - @param argument "msgList" is not a parameter name. [WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/VertexIdMessagesList.java:32: warning - Tag @link: reference not found: VertexIdMessage [WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/VertexCombiner.java:46: warning - Tag @link: reference not found: messages [WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/VertexCombiner.java:46: warning - Tag @link: reference not found: messages [WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/AggregatorWriter.java:60: warning - @param argument "map" is not a parameter name. 
[WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146: warning - Tag @link: reference not found: GraphPartitioner [WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146: warning - Tag @link: reference not found: GraphPartitioner [WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/GiraphJob.java:432: warning - @param argument "graphPartitionerClass" is not a parameter name. [WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/VertexCombiner.java:46: warning - Tag @link: reference not found: messages [WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/partition/MasterGraphPartitioner.java:62: warning - Tag @link: reference not found: GraphPartitioner [WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/partition/MasterGraphPartitioner.java:62: warning - Tag @link: reference not found: GraphPartitioner [WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/partition/MasterGraphPartitioner.java:62: warning - @param argument "availableWorkerInfos" is not a parameter name. [WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/graph/partition/PartitionBalancer.java:176: warning - @param argument "allPartitionStatsList" is not a parameter name. 
[WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/VertexIdMessagesList.java:32: warning - Tag @link: reference not found: VertexIdMessage [WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/VertexIdMessagesList.java:32: warning - Tag @link: reference not found: VertexIdMessage [WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146: warning - Tag @link: reference not found: GraphPartitioner [WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/VertexIdMessagesList.java:32: warning - Tag @link: reference not found: VertexIdMessage [WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java:146: warning - Tag @link: reference not found: GraphPartitioner [WARNING] /Users/jhoman/repos/giraph/src/main/java/org/apache/giraph/comm/VertexIdMessagesList.java:32: warning - Tag @link: reference not found: VertexIdMessage {noformat} It would be good to fix these.
[jira] [Created] (GIRAPH-129) enable creation of javadoc and sources jars
enable creation of javadoc and sources jars --- Key: GIRAPH-129 URL: https://issues.apache.org/jira/browse/GIRAPH-129 Project: Giraph Issue Type: Improvement Components: build Affects Versions: 0.1.0 Reporter: André Kelpe Priority: Minor Attachments: GIRAPH-129.patch It is pretty useful to enable the creation of javadoc and sources jars during the build, so that people using IDEs like Eclipse can easily jump into the code.
[jira] [Created] (GIRAPH-128) RPC port from BasicRPCCommunications should be only a starting port, and retried
RPC port from BasicRPCCommunications should be only a starting port, and retried Key: GIRAPH-128 URL: https://issues.apache.org/jira/browse/GIRAPH-128 Project: Giraph Issue Type: Improvement Affects Versions: 0.1.0 Reporter: Avery Ching Assignee: Avery Ching Currently Giraph uses a basic port + the task partition to get the RPC port. This doesn't work well when there are multiple Giraph jobs running simultaneously in the same Hadoop cluster (port conflict). At the same time, it is nice to use this simple algorithm because it makes it very easy to debug problems (you can find the troublesome mapper from the RPC port). I will be proposing a simple scheme to retry with another port. I will round the total number of mappers up to the nearest power of 10 (let's call that number Z). Then I will increment the port number by Z, retrying up to 20 times. If you have enough ports, this scheme would guarantee that up to 20 mappers / node would be supported. It should be sufficient for most clusters. At the same time, we still maintain the easy debugging method, since it's still easy to figure out the mapper partition from the port (port % Z = map partition).
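The retry scheme described above can be sketched as follows. This is only an illustration under stated assumptions, not the attached or committed implementation: the class name `RpcPortPicker`, its method names and the use of a plain `ServerSocket` are all hypothetical, and the `port % Z = map partition` property only holds if the base port is itself a multiple of Z.

```java
import java.io.IOException;
import java.net.ServerSocket;

/** Hypothetical helper illustrating the proposed port retry scheme. */
final class RpcPortPicker {
    /** Number of attempts named in the proposal. */
    static final int MAX_TRIES = 20;

    /** Smallest power of 10 that is >= numMappers (the "Z" of the proposal). */
    static int roundUpToPowerOfTen(int numMappers) {
        if (numMappers <= 0 || numMappers > 1_000_000_000) {
            throw new IllegalArgumentException("unsupported mapper count: " + numMappers);
        }
        int z = 1;
        while (z < numMappers) {
            z *= 10;
        }
        return z;
    }

    /** Port tried on the given attempt (0-based): base + partition + attempt * Z. */
    static int candidatePort(int basePort, int taskPartition, int z, int attempt) {
        return basePort + taskPartition + attempt * z;
    }

    /** Tries up to MAX_TRIES ports, stepping by Z, and returns the first socket that binds. */
    static ServerSocket bind(int basePort, int taskPartition, int numMappers) throws IOException {
        int z = roundUpToPowerOfTen(numMappers);
        IOException last = null;
        for (int attempt = 0; attempt < MAX_TRIES; attempt++) {
            int port = candidatePort(basePort, taskPartition, z, attempt);
            try {
                return new ServerSocket(port);
            } catch (IOException e) {
                last = e; // port in use (or otherwise unusable); try the next one
            }
        }
        throw new IOException("no free port after " + MAX_TRIES + " tries", last);
    }
}
```

With a base port of 30000 and 27 mappers, Z is 100, so mapper 27 would try 30027, 30127, 30227 and so on, and the partition can still be read off the last two digits.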
[jira] [Created] (GIRAPH-127) Extending the API with a master.compute() function.
Extending the API with a master.compute() function. --- Key: GIRAPH-127 URL: https://issues.apache.org/jira/browse/GIRAPH-127 Project: Giraph Issue Type: New Feature Components: bsp, examples, graph Reporter: Semih Salihoglu First of all, sorry for the long explanation of this feature. I want to expand the API of Giraph with a new function called master.compute() that would get called at the master before each superstep. I will try to explain the purpose it would serve with an example. Let's say we want to implement the following simplified version of the k-means clustering algorithm. Pseudocode below: * Input G(V, E), k, numEdgesThreshold, maxIterations * Algorithm: * int numEdgesCrossingClusters = Integer.MAX_VALUE; * int iterationNo = 0; * while ((numEdgesCrossingClusters > numEdgesThreshold) && iterationNo < maxIterations) { * iterationNo++; * int[] clusterCenters = pickKClusterCenters(k, G); * findClusterCenters(G, clusterCenters); * numEdgesCrossingClusters = countNumEdgesCrossingClusters(); * } The algorithm goes through the following steps in iterations: 1) Pick k random initial cluster centers. 2) Assign each vertex to the cluster center that it's closest to (in Giraph, this can be implemented with message passing, similar to how ShortestPaths is implemented). 3) Count the number of edges crossing clusters. 4) Go back to step 1 if there are a lot of edges crossing clusters and we haven't exceeded the maximum number of iterations yet. In an algorithm like this, steps 2 and 3 are where most of the work happens, and both have very neat message-passing implementations. I'll try to give an overview without going into the details. Let's say we define a Vertex in Giraph to hold a custom Writable object that holds 2 integer values and sends a message with up to 2 integer values. Step 2 is very similar to the ShortestPaths algorithm and has two stages: In the first stage, each vertex checks to see whether or not it's one of the cluster centers. 
If so, it assigns itself the value (id, 0), otherwise it assigns itself (Null, Null). In the 2nd stage, the vertices assign themselves to the minimum-distance cluster center by looking at their neighbors' (cluster center, distance) values (received as 2-integer messages) and their current values, and changing their values if they find a lower-distance cluster center. This happens over x supersteps until every vertex converges. Step 3, counting the number of edges crossing clusters, is also very easy to implement in Giraph. Once each vertex has a cluster center, the number of edges crossing clusters can be counted by an aggregator, let's say called "num-edges-crossing". It would again have two stages: In the first stage, every vertex just sends its cluster id to all its neighbors. In the second stage, every vertex looks at its neighbors' cluster ids in the messages, and for each cluster id that is not equal to its own cluster id, it increments "num-edges-crossing" by 1. The other 2 steps, steps 1 and 4, are very simple sequential computations. Step 1 just picks k random vertex ids and puts them into an aggregator. Step 4 just compares "num-edges-crossing" against a threshold and also checks whether or not the algorithm has exceeded maxIterations (not supersteps but iterations of going through Steps 1-4). With the current API, it's not clear where to do these computations. There is a per-worker function preSuperstep() that can be implemented, but if we decide to pick a special worker, let's say worker 1, to pick the k vertices, then we'd waste an entire superstep where only worker 1 would do work (by picking k vertices in preSuperstep() and putting them into an aggregator), and all other workers would be idle. 
Trying to do this in worker 1 in postSuperstep() would not work either, because worker 1 needs to know that all the vertices have converged to understand that it's time to pick k vertices or time to do the check in step 4, and that information would only be available to it at the beginning of the next superstep. A master.compute() extension would run at the master before the superstep and would modify the aggregator that keeps the k vertices before the aggregators are broadcast to the workers. These are all very short sequential computations, so they would not waste resources the way a preSuperstep() or postSuperstep() approach would. It would also enable running new algorithms like k-means that are composed of very vertex-centric computations glued together by small sequential ones. It would basically boost Giraph with sequential computation in a non-wasteful way. I am a PhD student at Stanford and I have been working on my own BSP/Pregel implementation since last year. It's called GPS. I haven't distributed it, mainly because in September I learned about Giraph and I decided to sl
[jira] [Created] (GIRAPH-126) Use Collections.emptyList() in BasicRPCCommunications.java
Use Collections.emptyList() in BasicRPCCommunications.java -- Key: GIRAPH-126 URL: https://issues.apache.org/jira/browse/GIRAPH-126 Project: Giraph Issue Type: Improvement Reporter: André Kelpe Priority: Minor I am doing some tests with Giraph and I am having some memory problems. While I was browsing through the codebase I saw that you are allocating a new ArrayList (which has an underlying array of 10 elements) for each Vertex that has no messages to be delivered. That's a waste of memory and time. This patch replaces it with the EMPTY_LIST of the Collections utility class.
[jira] [Created] (GIRAPH-125) Bug in LongDoubleFloatDoubleVertex.sendMsgToAllEdges()
Bug in LongDoubleFloatDoubleVertex.sendMsgToAllEdges() -- Key: GIRAPH-125 URL: https://issues.apache.org/jira/browse/GIRAPH-125 Project: Giraph Issue Type: Bug Components: graph Affects Versions: 0.1.0 Reporter: Yuanyuan Tian I just found a bug in the sendMsgToAllEdges() function of the LongDoubleFloatDoubleVertex class. The segment of the code that contains the bug is: final LongWritable destVertex = new LongWritable(); final MutableVertex vertex = this; verticesWithEdgeValues.forEachKey(new LongProcedure() { @Override public boolean apply(long destVertexId) { destVertex.set(destVertexId); vertex.sendMsg(destVertex, msg); return true; } }); Here destVertex is a final object, but this single object is reused in the forEachKey function many times. Each time its value is changed, but the same object is put into the underlying message list (a hashmap) through vertex.sendMsg. Because the single destVertex object has been put into the underlying hashmap again and again, destVertex.set(destVertexId) changes the existing keys in the hashmap. Eventually, every key added to the hash map will have the same value as the last key. A simple fix is as follows: final MutableVertex vertex = this; verticesWithEdgeValues.forEachKey(new LongProcedure() { @Override public boolean apply(long destVertexId) { vertex.sendMsg(new LongWritable(destVertexId), msg); return true; } });
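The effect of reusing one mutable key object can be reproduced outside Giraph. The sketch below is a minimal illustration, assuming a plain `java.util.HashMap` as the message map and a hypothetical `MutableLong` class standing in for Hadoop's `LongWritable`; neither is the actual Giraph data structure.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ReusedKeyDemo {
    /** Hypothetical stand-in for LongWritable: a mutable boxed long with value-based equality. */
    static final class MutableLong {
        private long value;
        MutableLong() { }
        MutableLong(long value) { this.value = value; }
        void set(long value) { this.value = value; }
        long get() { return value; }
        @Override public boolean equals(Object o) {
            return o instanceof MutableLong && ((MutableLong) o).value == value;
        }
        @Override public int hashCode() { return Long.hashCode(value); }
    }

    /** Buggy pattern: one key object is mutated and inserted repeatedly. */
    static List<Long> keysWithSharedObject(long[] destIds) {
        Map<MutableLong, String> out = new HashMap<>();
        MutableLong dest = new MutableLong();
        for (long id : destIds) {
            dest.set(id);          // also changes every key already stored in the map
            out.put(dest, "msg");
        }
        return sortedKeys(out);
    }

    /** Fixed pattern: a fresh key object per destination. */
    static List<Long> keysWithFreshObjects(long[] destIds) {
        Map<MutableLong, String> out = new HashMap<>();
        for (long id : destIds) {
            out.put(new MutableLong(id), "msg");
        }
        return sortedKeys(out);
    }

    private static List<Long> sortedKeys(Map<MutableLong, String> map) {
        List<Long> keys = new ArrayList<>();
        for (MutableLong k : map.keySet()) {
            keys.add(k.get());
        }
        Collections.sort(keys);
        return keys;
    }
}
```

For destinations 1, 2 and 3, the shared-object variant ends up with three entries whose keys all read 3, while the fresh-object variant keeps 1, 2 and 3.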
[jira] [Created] (GIRAPH-124) Combiner should return Iterable instead of M or null.
Combiner should return Iterable instead of M or null. Key: GIRAPH-124 URL: https://issues.apache.org/jira/browse/GIRAPH-124 Project: Giraph Issue Type: Improvement Components: graph Affects Versions: 0.1.0 Reporter: Claudio Martella Currently VertexCombiner is expected to return a single message combining the input messages, or null in case no message should be sent. The new expected interface should return an Iterable, possibly empty. The number of elements in the returned Iterable is supposed to be smaller than the number of input messages, by the initial definition of a Combiner (defined as a function to reduce I/O by combining multiple messages into 1).
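One possible shape for such a combiner is sketched below. The interface `IterableCombiner` and the class `MinDoubleCombiner` are hypothetical names made up for this illustration; they are not the signature of Giraph's VertexCombiner, which takes further arguments.

```java
import java.util.Collections;

/** Hypothetical interface for illustration only; not Giraph's actual VertexCombiner. */
interface IterableCombiner<M> {
    /** Returns the combined messages, possibly none, instead of a single message or null. */
    Iterable<M> combine(Iterable<M> messages);
}

/** Keeps only the smallest message, as a shortest-paths style computation might. */
final class MinDoubleCombiner implements IterableCombiner<Double> {
    @Override
    public Iterable<Double> combine(Iterable<Double> messages) {
        Double min = null;
        for (Double m : messages) {
            if (min == null || m < min) {
                min = m;
            }
        }
        // An empty Iterable replaces the old "return null" convention.
        return min == null
            ? Collections.<Double>emptyList()
            : Collections.singletonList(min);
    }
}
```

Callers can then iterate over the result unconditionally, without a null check.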
[jira] [Created] (GIRAPH-123) the wiki is not publicly accessible
the wiki is not publicly accessible --- Key: GIRAPH-123 URL: https://issues.apache.org/jira/browse/GIRAPH-123 Project: Giraph Issue Type: Bug Components: documentation Reporter: André Kelpe Priority: Minor When I try to read the documentation on the wiki I end up on a login screen. Can you please make the wiki open to the public?
[jira] [Created] (GIRAPH-122) Roll version back to 0.1
Roll version back to 0.1 Key: GIRAPH-122 URL: https://issues.apache.org/jira/browse/GIRAPH-122 Project: Giraph Issue Type: Improvement Reporter: Jakob Homan Assignee: Jakob Homan Per the vote on the list, we're going to roll Giraph back to 0.1.
[jira] [Created] (GIRAPH-121) BasicVertexResolver should be implementation and VertexResolver should be interface
BasicVertexResolver should be implementation and VertexResolver should be interface Key: GIRAPH-121 URL: https://issues.apache.org/jira/browse/GIRAPH-121 Project: Giraph Issue Type: Improvement Components: graph Affects Versions: 0.70.0 Reporter: Claudio Martella Assignee: Claudio Martella Priority: Trivial After the change of naming in Vertex, the naming of VertexResolver and BasicVertexResolver should be synced.
[jira] [Created] (GIRAPH-120) Add Sebastian Schelter to site
Add Sebastian Schelter to site -- Key: GIRAPH-120 URL: https://issues.apache.org/jira/browse/GIRAPH-120 Project: Giraph Issue Type: Task Reporter: Sebastian Schelter Assignee: Sebastian Schelter
[jira] [Created] (GIRAPH-119) VertexCombiner should work on Iterable instead of List
VertexCombiner should work on Iterable instead of List Key: GIRAPH-119 URL: https://issues.apache.org/jira/browse/GIRAPH-119 Project: Giraph Issue Type: Improvement Components: graph Affects Versions: 0.70.0 Reporter: Claudio Martella Assignee: Claudio Martella Currently VertexCombiner expects a List. It should be refactored to take an Iterable, in sync with the Iterable-based message logic of BasicVertex.
[jira] [Created] (GIRAPH-118) Clarify messages behavior in BasicVertex
Clarify messages behavior in BasicVertex Key: GIRAPH-118 URL: https://issues.apache.org/jira/browse/GIRAPH-118 Project: Giraph Issue Type: Improvement Components: graph Reporter: Claudio Martella Priority: Minor
[jira] [Created] (GIRAPH-117) DefaultWorkerContext should preserve the method signatures of WorkerContext
DefaultWorkerContext should preserve the method signatures of WorkerContext --- Key: GIRAPH-117 URL: https://issues.apache.org/jira/browse/GIRAPH-117 Project: Giraph Issue Type: Improvement Affects Versions: 0.70.0 Reporter: Sebastian Schelter Assignee: Sebastian Schelter Priority: Trivial DefaultWorkerContext.preApplication() swallows the InstantiationException and IllegalAccessException of WorkerContext.preApplication(). These should be preserved for applications that want to register an aggregator in this method.
[jira] [Created] (GIRAPH-116) Make EdgeListVertex the default vertex implementation
Make EdgeListVertex the default vertex implementation - Key: GIRAPH-116 URL: https://issues.apache.org/jira/browse/GIRAPH-116 Project: Giraph Issue Type: Improvement Reporter: Avery Ching Assignee: Avery Ching I think this would be best for new users, as it is much more memory efficient than Vertex with respect to edges (list vs hash map). We seem to be mostly iterating over the edges (as several others had pointed out in earlier JIRAs and emails), so this would provide early users with a more memory-efficient implementation without performance loss. If anyone disagrees, please voice your opinions.
[jira] [Created] (GIRAPH-115) Port of the HCC algorithm for identifying all connected components of a graph
Port of the HCC algorithm for identifying all connected components of a graph - Key: GIRAPH-115 URL: https://issues.apache.org/jira/browse/GIRAPH-115 Project: Giraph Issue Type: New Feature Affects Versions: 0.70.0 Reporter: Sebastian Schelter Port of the HCC algorithm that identifies connected components and assigns a component id (the smallest vertex id in the component) to each vertex. The idea behind the algorithm is very simple: propagate the smallest vertex id along the edges to all vertices of a connected component until convergence. The number of supersteps necessary is equal to the maximum diameter of all components + 1. The original Hadoop-based variant of this algorithm was proposed by Kang, Tsourakakis and Faloutsos in "PEGASUS: Mining Peta-Scale Graphs", 2010 http://www.cs.cmu.edu/~ukang/papers/PegasusKAIS.pdf
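The propagation idea can be sketched as a small sequential simulation. This is not the Giraph vertex implementation from the patch: the class `MinLabelPropagation` is a hypothetical name, vertices are assumed to be numbered 0..n-1, and the adjacency lists are assumed to contain every undirected edge in both directions. Each pass of the outer loop plays the role of one superstep.

```java
import java.util.Arrays;

/** Sequential simulation of smallest-id propagation; hypothetical helper, not Giraph code. */
final class MinLabelPropagation {
    /**
     * adjacency[v] lists the neighbours of vertex v. Returns, for every vertex,
     * the smallest vertex id in its connected component.
     */
    static int[] componentIds(int[][] adjacency) {
        int n = adjacency.length;
        int[] label = new int[n];
        for (int v = 0; v < n; v++) {
            label[v] = v; // every vertex starts with its own id
        }
        boolean changed = true;
        while (changed) { // one iteration corresponds to one superstep
            changed = false;
            // Read the labels of the previous round, write into a copy (synchronous update).
            int[] next = Arrays.copyOf(label, n);
            for (int v = 0; v < n; v++) {
                for (int u : adjacency[v]) {
                    if (label[u] < next[v]) {
                        next[v] = label[u]; // adopt the smaller id "received" from u
                        changed = true;
                    }
                }
            }
            label = next;
        }
        return label;
    }
}
```

The loop stops after the first round in which no label changes, which is where the extra superstep in the count above comes from.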
[jira] [Created] (GIRAPH-114) Inconsistent message map handling in BasicRPCCommunications.LargeMessageFlushExecutor
Inconsistent message map handling in BasicRPCCommunications.LargeMessageFlushExecutor - Key: GIRAPH-114 URL: https://issues.apache.org/jira/browse/GIRAPH-114 Project: Giraph Issue Type: Bug Affects Versions: 0.70.0 Reporter: Sebastian Schelter Priority: Critical Attachments: GIRAPH-114.patch I'm currently implementing a simple algorithm to identify all the connected components of a graph. The algorithm ran well in a local IDE unit tests on toy data and in a local single node hadoop instance using a graph of ~100k edges. When I tested it on a real cluster with the wikipedia pagelink graph (5.7M vertices, 130M edges), I ran into strange exceptions like this: {noformat} 2011-12-21 12:03:57,015 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201112131541_0034_m_27_0: java.lang.IllegalStateException: run: Caught an unrecoverable exception flush: Got ExecutionException at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:641) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369) at org.apache.hadoop.mapred.Child$4.run(Child.java:259) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059) at org.apache.hadoop.mapred.Child.main(Child.java:253) Caused by: java.lang.IllegalStateException: flush: Got ExecutionException at org.apache.giraph.comm.BasicRPCCommunications.flush(BasicRPCCommunications.java:946) at org.apache.giraph.graph.BspServiceWorker.finishSuperstep(BspServiceWorker.java:916) at org.apache.giraph.graph.GraphMapper.map(GraphMapper.java:588) at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:632) ... 
7 more Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalStateException: run: Impossible for no messages in 1603276 at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222) at java.util.concurrent.FutureTask.get(FutureTask.java:83) at org.apache.giraph.comm.BasicRPCCommunications.flush(BasicRPCCommunications.java:941) ... 10 more Caused by: java.lang.IllegalStateException: run: Impossible for no messages in 1603276 at org.apache.giraph.comm.BasicRPCCommunications$PeerFlushExecutor.run(BasicRPCCommunications.java:245) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) {noformat} The exception is thrown because a vertex with no message to send to is found in the datastructure holding the outgoing messages. I tracked this behavior down: In *BasicRPCCommunications:541-546* the map holding the outgoing messages for vertices of a particular machine is created. It's stored in two places _BasicRPCCommunications.outMessages_ and as member variable _outMessagesPerPeer_ of its _PeerConnection_ : {noformat} outMsgMap = new HashMap>(); outMessages.put(addrUnresolved, outMsgMap); PeerConnection peerConnection = new PeerConnection(outMsgMap, peer, isProxy); {noformat} In case that there are a lot of messages available for a particular vertex, a large flush is trigged via _LargeMessageFlushExecutor_ (I guess this only happened in the wikipedia test). 
During this flush the list of messages for the vertex is sent out and replaced with an empty list in *BasicRPCCommunications:341* {noformat} outMessageList = peerConnection.outMessagesPerPeer.get(destVertex); peerConnection.outMessagesPerPeer.put(destVertex, new MsgList()); {noformat} Now, in the last flush that is triggered at the end of the superstep, we encounter an empty message list for the vertex, and therefore the exception is thrown in *BasicRPCCommunications:228-247* {noformat} for (Entry> entry : peerConnection.outMessagesPerPeer.entrySet()) { ... if (entry.getValue().isEmpty()) { throw new IllegalStateException(...); } {noformat} Simply removing the list for the vertex when executing the large flush solved the issue (patch to come). I'd like to note that it is generally very dangerous to let different classes have direct access to a data structure, as it produces subtle bugs like this. It would be better to think of a centralized way of handling the data structure.
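The interaction between the two flushes can be modelled in a few lines. The sketch below is an assumption-laden illustration, not the BasicRPCCommunications code: `OutgoingMessages` and its method names are hypothetical, messages are plain strings, and a single `HashMap` stands in for the shared outgoing-message map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the outgoing-message map shared by both flush paths. */
final class OutgoingMessages {
    private final Map<Long, List<String>> outMessagesPerPeer = new HashMap<>();

    void add(long destVertex, String msg) {
        outMessagesPerPeer.computeIfAbsent(destVertex, k -> new ArrayList<>()).add(msg);
    }

    /** Behaviour described in the report: the sent list is replaced by an empty one. */
    List<String> largeFlushLeavingEmptyList(long destVertex) {
        return outMessagesPerPeer.put(destVertex, new ArrayList<>());
    }

    /** Behaviour of the proposed fix: the entry is removed entirely. */
    List<String> largeFlushRemovingEntry(long destVertex) {
        return outMessagesPerPeer.remove(Long.valueOf(destVertex));
    }

    /** End-of-superstep flush with the same sanity check as in the report. */
    int finalFlush() {
        int sent = 0;
        for (Map.Entry<Long, List<String>> entry : outMessagesPerPeer.entrySet()) {
            if (entry.getValue().isEmpty()) {
                throw new IllegalStateException("Impossible for no messages in " + entry.getKey());
            }
            sent += entry.getValue().size();
        }
        outMessagesPerPeer.clear();
        return sent;
    }
}
```

Keeping all mutations behind one class like this is also one way to read the closing suggestion about centralizing access to the data structure.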
[jira] [Created] (GIRAPH-113) Change cast to Vertex used in prepareSuperstep() to BasicVertex
Change cast to Vertex used in prepareSuperstep() to BasicVertex --- Key: GIRAPH-113 URL: https://issues.apache.org/jira/browse/GIRAPH-113 Project: Giraph Issue Type: Bug Reporter: Yuanyuan Tian Priority: Minor Hi, I decided to use LongDoubleFloatDoubleVertex in a graph algorithm because it uses the more compact and efficient Mahout collections. However, I ran into an error when running the algorithm: java.lang.ClassCastException: org.apache.giraph.graph.LongDoubleFloatDoubleVertex cannot be cast to org.apache.giraph.graph.Vertex at org.apache.giraph.comm.BasicRPCCommunications.prepareSuperstep(BasicRPCCommunications.java:1016) at org.apache.giraph.graph.BspServiceWorker.startSuperstep(BspServiceWorker.java:843) at org.apache.giraph.graph.GraphMapper.map(GraphMapper.java:569) at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:728) ... 7 more Basically, the problem is that in BasicRPCCommunications.prepareSuperstep(), the LongDoubleFloatDoubleVertex is cast to Vertex in the following code fragment, but LongDoubleFloatDoubleVertex inherits from BasicVertex instead of Vertex. if (vertex != null) { ((MutableVertex) vertex).setVertexId(vertexIndex); partition.putVertex((Vertex) vertex); } else if (originalVertex != null) { partition.removeVertex(originalVertex.getVertexId()); } I made a simple change: cast LongDoubleFloatDoubleVertex to BasicVertex. The problem went away, and the algorithm finished without any error. But I am not sure whether this change has any implications for other parts of the code, so I hope to get some comments from the Giraph developers. if (vertex != null) { ((MutableVertex) vertex).setVertexId(vertexIndex); partition.putVertex((BasicVertex) vertex); } else if (originalVertex != null) { partition.removeVertex(originalVertex.getVertexId()); }
[jira] [Created] (GIRAPH-112) A bug in LongDoubleFloatDoubleVertex.write(DataOutput out)
A bug in LongDoubleFloatDoubleVertex.write(DataOutput out) -- Key: GIRAPH-112 URL: https://issues.apache.org/jira/browse/GIRAPH-112 Project: Giraph Issue Type: Bug Components: graph Affects Versions: 0.70.0 Environment: Any Reporter: Yuanyuan Tian Fix For: 0.70.0 I found a bug in LongDoubleFloatDoubleVertex.write(DataOutput out) when running a small graph algorithm. The symptom is that a vertex read from a different worker becomes junk after the RPC communication. The source of the problem is the writing of the messages in LongDoubleFloatDoubleVertex.write(DataOutput out): for(double msg : messageList.elements()) { out.writeDouble(msg); } Here messageList.elements() returns all the elements currently stored in the backing array of the Mahout DoubleArrayList, including invalid elements between size and capacity. Therefore, the write() function writes a bunch of invalid messages, which causes errors when reading them back in readFields(). The following is a simple solution: double[] elements = messageList.elements(); for (int i = 0; i < messageList.size(); i++) { out.writeDouble(elements[i]); }
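The size-versus-capacity problem is easy to reproduce with any growable primitive list. The sketch below assumes a hypothetical `DoubleList` class as a stand-in for Mahout's DoubleArrayList (only the idea that `elements()` exposes the backing array is taken from the report) and compares the number of bytes written by the two loops.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class BackingArrayWrite {
    /** Hypothetical stand-in for a growable primitive list such as DoubleArrayList. */
    static final class DoubleList {
        private double[] data = new double[10]; // capacity 10, initially unused
        private int size;

        void add(double v) {
            if (size == data.length) {
                data = Arrays.copyOf(data, size * 2);
            }
            data[size++] = v;
        }

        int size() { return size; }

        /** Exposes the backing array, including the unused slots past size. */
        double[] elements() { return data; }
    }

    /** Buggy pattern: writes every slot of the backing array. */
    static byte[] writeAllSlots(DoubleList list) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (double msg : list.elements()) {
                out.writeDouble(msg);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Fixed pattern: writes only the first size() elements. */
    static byte[] writeValidOnly(DoubleList list) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            double[] elements = list.elements();
            for (int i = 0; i < list.size(); i++) {
                out.writeDouble(elements[i]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

A reader that expects exactly size() doubles would treat the surplus bytes of the first variant as the start of the next field, which matches the "junk vertex" symptom described above.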
[jira] [Created] (GIRAPH-111) Refactor I/O to be independent of Map/Reduce
Refactor I/O to be independent of Map/Reduce Key: GIRAPH-111 URL: https://issues.apache.org/jira/browse/GIRAPH-111 Project: Giraph Issue Type: Improvement Components: graph Reporter: Ed Kohlwey The I/O mechanisms should probably be abstracted entirely from Map/Reduce in order to support making Giraph an independent framework.
[jira] [Created] (GIRAPH-110) Add guide to setup the environment for running the unit tests in a pseudo-distributed hadoop instance
Add guide to setup the environment for running the unit tests in a pseudo-distributed hadoop instance Key: GIRAPH-110 URL: https://issues.apache.org/jira/browse/GIRAPH-110 Project: Giraph Issue Type: Improvement Affects Versions: 0.70.0 Reporter: Sebastian Schelter Priority: Minor Giraph should provide a small guide for setting up the local environment to run the unit tests in a pseudo-distributed Hadoop instance, as there are some non-obvious hurdles to clear.
[jira] [Created] (GIRAPH-109) GiraphRunner should provide support for combiners
GiraphRunner should provide support for combiners - Key: GIRAPH-109 URL: https://issues.apache.org/jira/browse/GIRAPH-109 Project: Giraph Issue Type: Improvement Affects Versions: 0.70.0 Reporter: Sebastian Schelter Currently there's no way to tell GiraphRunner that you want to use a Combiner. A simple option should be added, similar to the way input and output formats are specified.
[jira] [Created] (GIRAPH-108) Refactor code to run independently of Map/Reduce
Refactor code to run independently of Map/Reduce Key: GIRAPH-108 URL: https://issues.apache.org/jira/browse/GIRAPH-108 Project: Giraph Issue Type: Improvement Components: graph Reporter: Ed Kohlwey It would be nice for Giraph to be refactored such that the code could eventually be run outside of map/reduce. This will allow people to write drivers that can run in the cool new resource manager frameworks like Mesos and YARN, and eventually let the application's code base evolve to be independent of map/reduce.
[jira] [Created] (GIRAPH-106) Refactor prepareSuperstep() to make setMessages(Iterable messages) package-private
Refactor prepareSuperstep() to make setMessages(Iterable messages) package-private Key: GIRAPH-106 URL: https://issues.apache.org/jira/browse/GIRAPH-106 Project: Giraph Issue Type: Improvement Reporter: Avery Ching Assignee: Avery Ching GIRAPH-80 revealed that some refactoring is needed to make setMessages() package-private (to prevent users from messing around with internals).
[jira] [Created] (GIRAPH-107) Refactor prepareSuperstep() to make setMessages(Iterable messages) package-private
Refactor prepareSuperstep() to make setMessages(Iterable messages) package-private Key: GIRAPH-107 URL: https://issues.apache.org/jira/browse/GIRAPH-107 Project: Giraph Issue Type: Improvement Reporter: Avery Ching Assignee: Avery Ching GIRAPH-80 revealed that some refactoring is needed to make setMessages() package-private (to prevent users from messing around with internals).
[jira] [Created] (GIRAPH-105) BspServiceMaster.checkWorkers() should return empty lists instead of null
BspServiceMaster.checkWorkers() should return empty lists instead of null Key: GIRAPH-105 URL: https://issues.apache.org/jira/browse/GIRAPH-105 Project: Giraph Issue Type: Bug Affects Versions: 0.70.0 Reporter: Sebastian Schelter Priority: Minor BspServiceMaster.checkWorkers() is invoked in BspServiceMaster.coordinateSuperstep() and in BspServiceMaster.createInputSplits(). Both check for an empty list to fail the job in case something has gone wrong. However, checkWorkers() returns null in case of problems, causing an NPE in the calling code.
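The fix GIRAPH-105 asks for can be illustrated with a minimal sketch. This is not the BspServiceMaster code: the class name, the String worker type, and the minWorkers parameter are assumptions made only to show the "empty list instead of null" contract.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the worker check; names and types are assumptions,
// not Giraph's actual API.
final class WorkerCheckSketch {
    /** Returns the healthy workers, or an empty list (never null) when too few responded. */
    static List<String> checkWorkers(List<String> responded, int minWorkers) {
        if (responded == null || responded.size() < minWorkers) {
            return Collections.emptyList();
        }
        return new ArrayList<>(responded);
    }

    /** Callers can test isEmpty() directly, with no null guard and no NPE. */
    static boolean shouldFailJob(List<String> workers) {
        return workers.isEmpty();
    }
}
```

With this contract, the existing empty-list checks in the callers would work as written.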
[jira] [Created] (GIRAPH-104) Save half of maximum memory used from messaging
Save half of maximum memory used from messaging Key: GIRAPH-104 URL: https://issues.apache.org/jira/browse/GIRAPH-104 Project: Giraph Issue Type: Improvement Reporter: Avery Ching Priority: Critical Currently, the amount of memory that Giraph uses for messaging is huge. This JIRA will reduce the messaging memory by half and provide periodic updates of memory for debugging. Details are below: Refactored RandomMessageBenchmark to an internal vertex class. Added aggregators to RandomMessageBenchmark to track bytes, messages, and time for the messaging. Adjusted the postSuperstep() to be called after the flush() for more accurate timings. Added periodic per-minute updates for message flushing (which can take a while, especially on the memory benchmark). This helps to see how progress is going and gives an ETA. Memory optimizations include: - Clear the message list after computation - Free vertex messages on the source as the flush is going on - TreeMap -> HashMap for VertexMutations - Sizing the ArrayList properly in transientInMessages
[jira] [Created] (GIRAPH-103) Added properties for commonly used package version to pom.xml
Added properties for commonly used package version to pom.xml Key: GIRAPH-103 URL: https://issues.apache.org/jira/browse/GIRAPH-103 Project: Giraph Issue Type: Improvement Components: build Reporter: Avery Ching Priority: Trivial Attachments: GIRAPH-103.diff
[jira] [Created] (GIRAPH-102) Create an exception class for Giraph
Create an exception class for Giraph Key: GIRAPH-102 URL: https://issues.apache.org/jira/browse/GIRAPH-102 Project: Giraph Issue Type: Improvement Reporter: Avery Ching Many of the Exceptions are IllegalStateException but could be better as GiraphException and reasonable derivatives. This would 1) Allow us to differentiate exceptions specific to Giraph 2) Allow us to add useful information (i.e. superstep, attempt)
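One possible shape for such an exception, as a rough sketch only. The issue names GiraphException and the superstep/attempt context; the "Sketch" class name, the constructor signature, the choice of an unchecked exception, and the message format are all assumptions.

```java
// Hypothetical sketch of a Giraph-specific exception carrying job context.
// Extending RuntimeException mirrors the unchecked IllegalStateException it
// would replace; that choice is an assumption, not a decision from the issue.
class GiraphExceptionSketch extends RuntimeException {
    private static final long serialVersionUID = 1L;

    private final long superstep;
    private final int attempt;

    GiraphExceptionSketch(String message, long superstep, int attempt, Throwable cause) {
        // Append the context so it shows up in logs and stack traces.
        super(message + " (superstep=" + superstep + ", attempt=" + attempt + ")", cause);
        this.superstep = superstep;
        this.attempt = attempt;
    }

    long getSuperstep() {
        return superstep;
    }

    int getAttempt() {
        return attempt;
    }
}
```

Catching this type (or a subclass) would separate Giraph failures from unrelated IllegalStateExceptions thrown by other libraries.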
[jira] [Created] (GIRAPH-101) Replace munge with shim layer similar to Pig and Hive
Replace munge with shim layer similar to Pig and Hive Key: GIRAPH-101 URL: https://issues.apache.org/jira/browse/GIRAPH-101 Project: Giraph Issue Type: Improvement Reporter: Avery Ching Priority: Minor Munge is a hacky way of supporting multiple versions of Hadoop. The shim layers in Pig and Hive are a cleaner way to do this, I think. That being said, since it does work now, it's not a huge priority.
[jira] [Created] (GIRAPH-100) Data input sampling and testing improvements
Data input sampling and testing improvements Key: GIRAPH-100 URL: https://issues.apache.org/jira/browse/GIRAPH-100 Project: Giraph Issue Type: New Feature Components: graph Reporter: Avery Ching It would be really nice to help debug an application by limiting the input data (% of input splits, max vertices per input split). Also, it would be nice for the workers to provide a little more debugging info on how far along they are with processing the input data.
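The two limits mentioned in GIRAPH-100 could look roughly like the sketch below. Everything here is hypothetical: the class and method names, the rule of keeping the first N splits, rounding up, and treating a non-positive vertex limit as "no limit" are assumptions, not Giraph behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for sampling input while debugging; not Giraph's API.
final class InputSamplingSketch {
    /** Keeps the first `percent` percent of the splits, rounding up. */
    static <T> List<T> limitSplits(List<T> splits, float percent) {
        int keep = (int) Math.ceil(splits.size() * (percent / 100.0));
        keep = Math.max(0, Math.min(splits.size(), keep));
        return new ArrayList<>(splits.subList(0, keep));
    }

    /** True once a reader should stop; a limit of zero or less means "unlimited". */
    static boolean reachedVertexLimit(long verticesRead, long maxVerticesPerSplit) {
        return maxVerticesPerSplit > 0 && verticesRead >= maxVerticesPerSplit;
    }
}
```

A vertex reader would call reachedVertexLimit() after each vertex and stop reading the split early when it returns true.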
[jira] [Created] (GIRAPH-99) Make AdjacencyListVertexReader and its constructor public
Make AdjacencyListVertexReader and its constructor public Key: GIRAPH-99 URL: https://issues.apache.org/jira/browse/GIRAPH-99 Project: Giraph Issue Type: Wish Components: lib Reporter: Kohei Ozaki Priority: Minor Hi, I'd like to write a class inherited from AdjacencyListVertexReader to make a library using Giraph (like git.io/ALVR), but AdjacencyListVertexReader is a private class and its constructor is private. I guess making it public would be useful for handling more complex input formats dictated by the data structures of algorithms.
[jira] [Created] (GIRAPH-98) Add Claudio Martella to site
Add Claudio Martella to site Key: GIRAPH-98 URL: https://issues.apache.org/jira/browse/GIRAPH-98 Project: Giraph Issue Type: Task Reporter: Claudio Martella Attachments: GIRAPH-98.diff
[jira] [Created] (GIRAPH-97) TestIdWithValueTextOutputFormat.java and IdWithValueTextOutputFormat.java missing license header
TestIdWithValueTextOutputFormat.java and IdWithValueTextOutputFormat.java missing license header Key: GIRAPH-97 URL: https://issues.apache.org/jira/browse/GIRAPH-97 Project: Giraph Issue Type: Bug Affects Versions: 0.70.0 Reporter: Claudio Martella Assignee: Claudio Martella Priority: Trivial Fix For: 0.70.0 Attachments: GIRAPH-97.diff As reported by Yingyi Bu on the user mailing list.
[jira] [Created] (GIRAPH-96) Support for Graphs with Huge adjacency lists
Support for Graphs with Huge adjacency lists Key: GIRAPH-96 URL: https://issues.apache.org/jira/browse/GIRAPH-96 Project: Giraph Issue Type: Improvement Reporter: Arun Suresh Currently the vertex initialize() method is passed the complete adjacency list as a HashMap. All the current concrete implementations of Vertex iterate over the adjacency list and recreate new data structures within the Vertex instance to hold/manipulate the adjacency list. This would cease to be feasible once the size of the adjacency list becomes really huge. I propose storing the adjacency list and all vertex information (and incoming messages?) in a distributed data store such as HBase. The adjacency list can be lazily loaded via HBase Scans. I was thinking of an HBase schema where the row ID is a concatenation of VertexID+OutboundVertexId with a single column containing the edge.
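The proposed row ID (VertexID+OutboundVertexId) could be built as in the sketch below. It uses only java.nio and no HBase classes; the assumption of long vertex IDs, the fixed-width big-endian encoding, and all names are illustrative choices, not part of the proposal.

```java
import java.nio.ByteBuffer;

// Hypothetical row-key layout for one edge per row: 8 bytes of source vertex ID
// followed by 8 bytes of target vertex ID. Long IDs are an assumption.
final class EdgeRowKeySketch {
    /** Concatenates the two IDs; ByteBuffer writes longs big-endian by default. */
    static byte[] rowKey(long vertexId, long outboundVertexId) {
        return ByteBuffer.allocate(2 * Long.BYTES)
                .putLong(vertexId)
                .putLong(outboundVertexId)
                .array();
    }

    /** Key prefix shared by all edges of one vertex, usable as a scan prefix. */
    static byte[] scanPrefix(long vertexId) {
        return ByteBuffer.allocate(Long.BYTES).putLong(vertexId).array();
    }

    static long sourceOf(byte[] rowKey) {
        return ByteBuffer.wrap(rowKey).getLong(0);
    }

    static long targetOf(byte[] rowKey) {
        return ByteBuffer.wrap(rowKey).getLong(Long.BYTES);
    }
}
```

Because all rows for one vertex share the same 8-byte prefix, a prefix scan would return that vertex's adjacency list incrementally rather than as one in-memory map. Note that byte-wise ordering of these keys matches numeric ordering only for non-negative IDs.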
[jira] [Created] (GIRAPH-95) vertex resolution expects MutableVertex instead of BasicVertex
vertex resolution expects MutableVertex instead of BasicVertex Key: GIRAPH-95 URL: https://issues.apache.org/jira/browse/GIRAPH-95 Project: Giraph Issue Type: Bug Components: graph Reporter: Claudio Martella At the beginning of the superstep, when a message is sent to a non-existent vertex, the new vertex is created. This new vertex's ID is set through setVertexId(), which belongs to MutableVertex. It should use initialize() instead. See BspRPCCommunication:948 (on my local trunk)
[jira] [Created] (GIRAPH-94) Loading vertex ranges from HBase
Loading vertex ranges from HBase Key: GIRAPH-94 URL: https://issues.apache.org/jira/browse/GIRAPH-94 Project: Giraph Issue Type: New Feature Reporter: Claudio Martella Assignee: Claudio Martella Loading vertices from an HTable would be an option. A possible schema for storing the graph would be Hexastore (http://www.vldb.org/pvldb/1/1453965.pdf). Also, as vertices to which messages are sent get created on the fly (if they don't exist already), we could potentially have an HBaseVertex that fetches the adjacency list + vertex value from HBase. That would be a kind of lazy-load approach, if you can define the initial split as an HBase query.
[jira] [Created] (GIRAPH-93) Hive input / output format
Hive input / output format Key: GIRAPH-93 URL: https://issues.apache.org/jira/browse/GIRAPH-93 Project: Giraph Issue Type: New Feature Reporter: Avery Ching Assignee: Avery Ching It would be great to be able to load/store data from/to Hive tables.
[jira] [Created] (GIRAPH-92) Need outputformat for just vertex ID and value
Need outputformat for just vertex ID and value Key: GIRAPH-92 URL: https://issues.apache.org/jira/browse/GIRAPH-92 Project: Giraph Issue Type: New Feature Components: lib Affects Versions: 0.70.0 Reporter: Jakob Homan Assignee: Jakob Homan Fix For: 0.70.0 Attachments: GIRAPH-92.patch We should have a text output format that just spits out the vertex ID and value without its edges: {noformat}index.html 0.9423{noformat} This would be particularly helpful for further processing by, for instance, Pig.
[jira] [Created] (GIRAPH-91) Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings)
Large-memory improvements (Memory reduced vertex implementation, fast failure, added settings) Key: GIRAPH-91 URL: https://issues.apache.org/jira/browse/GIRAPH-91 Project: Giraph Issue Type: Improvement Reporter: Avery Ching Current vertex implementation uses a HashMap for storing the edges, which is quite memory heavy for large graphs. The default settings in Giraph need to be improved for large graphs and heaps of >20G.
[jira] [Created] (GIRAPH-90) LongDoubleFloatDoubleVertex possibly has a broken iterator() implementation
LongDoubleFloatDoubleVertex possibly has a broken iterator() implementation Key: GIRAPH-90 URL: https://issues.apache.org/jira/browse/GIRAPH-90 Project: Giraph Issue Type: Bug Components: graph Affects Versions: 0.70.0 Reporter: Claudio Martella Assignee: Claudio Martella Fix For: 0.70.0 The iterator() implementation returns a LongWritable which is cached in a final variable and updated via set() with the new value at each next(). This could be misleading, as the user might create a list from the iterator's data and end up with every element referring to the same object. Something similar is happening in getMsgList() as well. Is this really what we want?
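The pitfall described in GIRAPH-90 can be reproduced without Hadoop or Giraph. In this sketch, MutableLong is a hypothetical stand-in for LongWritable, and the iterator is a simplified illustration rather than the LongDoubleFloatDoubleVertex code.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

final class ReusedObjectIteratorSketch {
    /** Minimal mutable holder standing in for a Writable such as LongWritable. */
    static final class MutableLong {
        private long value;
        void set(long v) { value = v; }
        long get() { return value; }
    }

    /** Iterator that overwrites and returns the same cached instance on every next(). */
    static Iterator<MutableLong> reusingIterator(final long[] ids) {
        return new Iterator<MutableLong>() {
            private final MutableLong cached = new MutableLong();
            private int index = 0;

            @Override public boolean hasNext() { return index < ids.length; }

            @Override public MutableLong next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                cached.set(ids[index++]);
                return cached;
            }

            @Override public void remove() { throw new UnsupportedOperationException(); }
        };
    }

    /** Stores the returned references; every entry ends up showing the last value. */
    static List<Long> collectReferences(long[] ids) {
        List<MutableLong> refs = new ArrayList<>();
        Iterator<MutableLong> it = reusingIterator(ids);
        while (it.hasNext()) {
            refs.add(it.next());
        }
        List<Long> seen = new ArrayList<>();
        for (MutableLong m : refs) {
            seen.add(m.get());
        }
        return seen;
    }

    /** Copies the value out before advancing; this preserves each element. */
    static List<Long> collectCopies(long[] ids) {
        List<Long> seen = new ArrayList<>();
        Iterator<MutableLong> it = reusingIterator(ids);
        while (it.hasNext()) {
            seen.add(it.next().get());
        }
        return seen;
    }
}
```

For the input {1, 2, 3}, collectReferences() yields [3, 3, 3] while collectCopies() yields [1, 2, 3]. Object reuse saves allocations, but callers must copy anything they want to keep.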
[jira] [Created] (GIRAPH-89) Remove debugging system.out from LongDoubleFloatDoubleVertex
Remove debugging system.out from LongDoubleFloatDoubleVertex Key: GIRAPH-89 URL: https://issues.apache.org/jira/browse/GIRAPH-89 Project: Giraph Issue Type: Bug Reporter: Jakob Homan Line 137: {{System.out.println("in getNumVertices!");}} looks like a debugging line and should be removed.
[jira] [Created] (GIRAPH-88) Message count not updated properly after GIRAPH-11
Message count not updated properly after GIRAPH-11 Key: GIRAPH-88 URL: https://issues.apache.org/jira/browse/GIRAPH-88 Project: Giraph Issue Type: Bug Reporter: Avery Ching Email from s...@apache.org Hi, I updated to the latest trunk (after the GIRAPH-11 commit) and wanted to continue to work on GIRAPH-51, where I use a small toy graph to test SimpleShortestPathVertex. Unfortunately my code did not work anymore, and I think I tracked it down to the fact that vertices that voted to halt are not reactivated anymore when new messages arrive. In SimpleShortestPathVertex every vertex always votes to halt and only gets reactivated when a shorter path to it has been found. However, my test run always finished after superstep 0. I don't know too much about Giraph's internals yet, but my guess is that the number of sent messages is not tracked correctly anymore. Therefore Giraph finishes the algorithm (as all vertices voted to halt) although there should still be messages in the pipeline. I think I tracked it down to this behavior: GraphMapper declares a variable workerSentMessages = 0 and never increases it. This variable is given to BspServiceWorker.finishSuperstep(), which writes it to ZooKeeper and uses it to compute the GlobalStats afterwards, which are used to decide whether a new superstep has to be scheduled. As it has never been increased, the algorithm will always stop when all vertices have voted to halt. It would be great if someone could confirm/disprove this speculation and help me to continue work on GIRAPH-51
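The termination rule this report relies on can be written down as a small sketch. It follows the general Pregel-style rule (stop only when every vertex has voted to halt and no messages were sent); the class, method, and parameter names are hypothetical and do not mirror GlobalStats or BspServiceWorker.

```java
// Hypothetical illustration of the halting decision; not Giraph's actual code.
final class HaltCheckSketch {
    /** Sums the per-worker sent-message counts reported for one superstep. */
    static long totalSent(long[] perWorkerSentMessages) {
        long total = 0;
        for (long sent : perWorkerSentMessages) {
            total += sent;
        }
        return total;
    }

    /** Stop only if all vertices halted AND nothing is in flight to wake them up. */
    static boolean shouldStop(long totalVertices, long finishedVertices, long sentMessages) {
        return finishedVertices == totalVertices && sentMessages == 0;
    }
}
```

If a worker always reports 0 sent messages, as suspected above, shouldStop() becomes true as soon as every vertex votes to halt. That matches the observed stop after superstep 0 in SimpleShortestPathVertex.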