Thank you both for your responses. Steve, I faced the same problem when I created the Long input format files. I tried running the code linked by Young above, using the *SimplePageRankInputFormat.java* as well as the *SimplePageRankVertex.java* in the repo.
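For reference, the adjacency-list workaround that Steve describes in the quoted message below (an explicit line for every vertex, including those with only incoming edges) can be sketched as a small preprocessing step. This is a minimal in-memory illustration in plain Java, not Giraph code: the class and method names are hypothetical, the whitespace-separated output is an assumption about what the chosen adjacency-list input format expects, and a dataset the size of the Twitter graph would need a distributed or external-sort version of the same idea.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Giraph.
class EdgeListToAdjacency {

    /**
     * Converts "src dst" edge lines into "vertex neighbor1 neighbor2 ..." lines.
     * A vertex that only ever appears as a destination still gets its own line,
     * containing just its id and no neighbors.
     */
    static List<String> toAdjacencyLines(List<String> edgeLines) {
        // TreeMap keeps the output sorted by vertex id, so it is deterministic.
        Map<Long, List<Long>> adjacency = new TreeMap<>();
        for (String line : edgeLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = trimmed.split("\\s+");
            long src = Long.parseLong(parts[0]);
            long dst = Long.parseLong(parts[1]);
            adjacency.computeIfAbsent(src, k -> new ArrayList<>()).add(dst);
            // Register the destination too, so vertices with only incoming edges exist.
            adjacency.computeIfAbsent(dst, k -> new ArrayList<>());
        }

        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, List<Long>> entry : adjacency.entrySet()) {
            StringBuilder sb = new StringBuilder(Long.toString(entry.getKey()));
            for (long neighbor : entry.getValue()) {
                sb.append(' ').append(neighbor);
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // The three sample edges from the original question.
        List<String> edges = Arrays.asList(
            "61578010 61147436",
            "61578037 61147436",
            "61578040 61147436");
        toAdjacencyLines(edges).forEach(System.out::println);
    }
}
```

With the sample edges from the original question, vertex 61147436 has only incoming edges, so it is emitted as a line containing nothing but its own id.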
For the Twitter dataset, I added some *MasterCompute* code to log the number of vertices that existed at each superstep. The results, however, look pretty similar to the previous iteration:

Current step is 1 - 40103281 existed in the previous superstep 0
Current step is 2 - 40103281 existed in the previous superstep 1
Current step is 3 - 40383589 existed in the previous superstep 2
Current step is 31 - 40383589 existed in the previous superstep 30

It seems that a subset of vertices still only becomes active after the first superstep, despite all vertices being initialized in superstep 0. I can't think of a reason why - thoughts?

Thanks,
Kenrick

On Wed, Apr 29, 2015 at 2:33 PM, Young Han <[email protected]> wrote:

> For the initialization issue, you can define a (nested) class that extends
> DefaultVertexValueFactory (from org.apache.giraph.factories) and add
> "-Dgiraph.vertexValueFactoryClass=org.apache.giraph.examples.AlgClass\$AlgVertexValueFactory"
> after "org.apache.giraph.GiraphRunner" in your hadoop jar command.
>
> Also, the reason those input formats don't work is that PageRank is
> using LongWritable for vertex id and DoubleWritable for vertex value. As
> Roman pointed out, you have to have an input class that matches it (even if
> the input dataset has no "double" vertex values). An example (for Giraph
> 1.0.0) can be found here:
> https://github.com/xvz/graph-processing/blob/master/giraph-1.0.0/giraph-examples/src/main/java/org/apache/giraph/examples/SimplePageRankInputFormat.java
> and an example command that uses it here:
> https://github.com/xvz/graph-processing/blob/master/benchmark/giraph/pagerank.sh#L50
>
> Young
>
> On Wed, Apr 29, 2015 at 11:24 AM, Steven Harenberg <[email protected]>
> wrote:
>
>> Hey Kenrick,
>>
>> First, your commands above are wrong since you are specifying adjacency
>> list format with the -vif argument and since I believe
>> *LongLongNullTextInputFormat* refers to adjacency list format.
>> However, even with the right commands there will be issues and more
>> things you need to do.
>>
>> I did get the edgelist input format to work by creating a
>> LongNullTextEdgeInputFormat.java file just like the
>> giraph-core/src/main/java/org/apache/giraph/io/formats/IntNullTextEdgeInputFormat.java
>> file, but with longs instead of ints (this also required creating a
>> LongPair class).
>>
>> However, I would advise against using an edgelist input format in Giraph
>> as there are major underlying issues that I never figured out how to
>> resolve. Namely, for an edgelist format, Giraph only considers a vertex
>> active in the first superstep if it has an outgoing edge. This means that
>> vertices with only incoming edges won't be initialized with correct values
>> during things like PageRank, SSSP, or WCC and hence will output incorrect
>> results. (You can see my previous thread here:
>> http://mail-archives.apache.org/mod_mbox/giraph-user/201502.mbox/%3CCAHv2Baw7zFJ-s7dtNMv5dkNxz_zE436krE%2B6G4r3tp-HVgjW2g%40mail.gmail.com%3E
>> )
>>
>> The above issue can be avoided with adjacency list format by specifying
>> the vertex with no neighbors. For example, if vertex v has only incoming
>> edges, then you make sure there is a line with just v and no neighbors
>> listed (
>> http://mail-archives.apache.org/mod_mbox/giraph-user/201408.mbox/%[email protected]%3E
>> ).
>>
>> If you figure out how to resolve the edgelist input issue please let me
>> know.
>>
>> Regards,
>> Steve
>>
>>
>> On Sat, Apr 25, 2015 at 9:54 PM, Kenrick Fernandes <[email protected]>
>> wrote:
>>
>>> Hi Roman,
>>>
>>> Thanks for the quick response. There is no vertex data in this
>>> dataset though, and the vertex IDs posted above would fit in a
>>> Long. Would you advise changing the PageRankComputation
>>> formats, or working on a new input format?
>>>
>>> Thanks,
>>> Kenrick
>>>
>>> On Sat, Apr 25, 2015 at 7:40 PM, Roman Shaposhnik <[email protected]>
>>> wrote:
>>>
>>>> One of the slightly annoying things in Giraph is that you have
>>>> to manually match your input format to your computation. In
>>>> your case, PageRankComputation requires LongWritable for
>>>> vertex ID and DoubleWritable for vertex Data. You may need
>>>> to hack one of the existing formats slightly.
>>>>
>>>>
>>>> Thanks,
>>>> Roman.
>>>>
>>>> On Sat, Apr 25, 2015 at 2:58 PM, Kenrick Fernandes
>>>> <[email protected]> wrote:
>>>> > Hello,
>>>> >
>>>> > I'm trying to get Giraph to read the Twitter dataset as input for the
>>>> > SimplePageRankComputation program. The dataset format looks like this:
>>>> > 61578010 61147436
>>>> > 61578037 61147436
>>>> > 61578040 61147436
>>>> > (vertex IDs, with pairs representing edges)
>>>> >
>>>> > When I run the command with
>>>> > -vif org.apache.giraph.io.formats.IntIntNullTextInputFormat, I get this
>>>> > error:
>>>> > java.lang.IllegalArgumentException: checkClassTypes: vertex index types not
>>>> > assignable, computation - class org.apache.hadoop.io.LongWritable,
>>>> > VertexInputFormat - class org.apache.hadoop.io.IntWritable
>>>> >
>>>> > So I tried running the command with
>>>> > -vif org.apache.giraph.io.formats.LongLongNullTextInputFormat and I get a
>>>> > different one:
>>>> > java.lang.IllegalArgumentException: checkClassTypes: vertex value types not
>>>> > assignable, computation - class org.apache.hadoop.io.DoubleWritable,
>>>> > VertexInputFormat - class org.apache.hadoop.io.LongWritable
>>>> >
>>>> > I don't understand why the types in the input show up as different formats in
>>>> > each error. Also, as far as I could tell, there is no input format for
>>>> > DoubleDouble. Is there a different way to get the graph into Giraph without
>>>> > having to write custom input code? Thoughts would be much appreciated.
>>>> >
>>>> > -----
>>>> > Reference Command:
>>>> > hadoop jar giraph-examples-1.1.0-for-hadoop-1.1.2-jar-with-dependencies.jar
>>>> > org.apache.giraph.GiraphRunner
>>>> > org.apache.giraph.examples.PageRankComputation -vif
>>>> > org.apache.giraph.io.formats.LongLongNullTextInputFormat -vip
>>>> > /user/kenrick/twitter/input -op /user/kenrick/twitter/output -w 30
>>>> > -----
>>>> >
>>>> > Thanks,
>>>> > Kenrick
