Hmm... it looks like a failure during graph loading. Did you forget a .txt in the input path?
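[Editor's note: the thread below turns on whether each line of the input file parses into a vertex. A plain-Java sketch of the line parsing that a tab-separated adjacency-list input format performs (hypothetical class name `AdjacencyLineParser`; this is not the Giraph API or the pastebin code referenced later):]

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch: parses one "src<TAB>dst1<TAB>dst2..." line
// the way a tab-separated adjacency-list vertex input format would.
public class AdjacencyLineParser {
  // Returns {id, neighbour1, neighbour2, ...}. A NumberFormatException here
  // (e.g. from an unexpected vertex-value column) is the kind of mismatch
  // that leaves the input superstep with "0 vertices loaded".
  public static int[] parse(String line) {
    String[] tokens = line.trim().split("[\\t ]+");
    int[] ids = new int[tokens.length];
    for (int i = 0; i < tokens.length; i++) {
      ids[i] = Integer.parseInt(tokens[i]);
    }
    return ids;
  }

  public static void main(String[] args) {
    // Vertex 2 with neighbours 1, 3 and 4 -- note: no vertex value column.
    System.out.println(Arrays.toString(parse("2\t1\t3\t4"))); // prints [2, 1, 3, 4]
  }
}
```

[If the graph file still carries a vertex-value column, `parse("2\t1\t1\t3\t4")` would silently treat the value as a neighbour id, which is why the thread below strips the values out.]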
Young


On Mon, Mar 31, 2014 at 1:17 PM, ghufran malik <[email protected]> wrote:

> Hi,
>
> Thanks for the speedy response!
>
> It didn't work for me :(.
>
> I updated the ConnectedComponentsVertex class with yours and added in the
> new ConnectedComponentsInputFormat class. They are both in the
> giraph-examples/src/main/java/org/apache/giraph/examples package.
> To compile the example package I cd'd to
> ~/Downloads/giraph-folder/giraph-1.0.0/giraph-examples and then typed
> "mvn compile", which resulted in BUILD SUCCESS. As a sanity check I
> checked the jar to make sure it had the ConnectedComponentsInputFormat
> class in it, and it did.
>
> I then updated my graph by taking out the vertex values, so in the end I had:
>
> 1 2
> 2 1 3 4
> 3 2
> 4 2
>
> where the numbers are separated out by tab space ([\t]).
>
> The command I ran was:
>
> hadoop jar
> /home/ghufran/Downloads/giraph-folder/giraph-1.0.0/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-0.20.203.0-jar-with-dependencies.jar
> org.apache.giraph.GiraphRunner
> org.apache.giraph.examples.ConnectedComponentsVertex -vif
> org.apache.giraph.examples.ConnectedComponentsInputFormat -vip
> /user/ghufran/input/my_graph -of
> org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
> /user/ghufran/giraph-output -w 1
>
> but I ended up with the output:
>
> 14/03/31 17:43:49 INFO utils.ConfigurationUtils: No edge input format
> specified. Ensure your InputFormat does not require one.
> 14/03/31 17:43:49 WARN job.GiraphConfigurationValidator: Output format
> vertex index type is not known
> 14/03/31 17:43:49 WARN job.GiraphConfigurationValidator: Output format
> vertex value type is not known
> 14/03/31 17:43:49 WARN job.GiraphConfigurationValidator: Output format
> edge value type is not known
> 14/03/31 17:43:49 INFO job.GiraphJob: run: Since checkpointing is disabled
> (default), do not allow any task retries (setting mapred.map.max.attempts =
> 0, old value = 4)
> 14/03/31 17:43:50 INFO mapred.JobClient: Running job: job_201403311622_0002
> 14/03/31 17:43:51 INFO mapred.JobClient: map 0% reduce 0%
> 14/03/31 17:44:08 INFO mapred.JobClient: map 50% reduce 0%
> 14/03/31 17:54:54 INFO mapred.JobClient: map 0% reduce 0%
> 14/03/31 17:54:59 INFO mapred.JobClient: Job complete: job_201403311622_0002
> 14/03/31 17:54:59 INFO mapred.JobClient: Counters: 6
> 14/03/31 17:54:59 INFO mapred.JobClient: Job Counters
> 14/03/31 17:54:59 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=656429
> 14/03/31 17:54:59 INFO mapred.JobClient: Total time spent by all
> reduces waiting after reserving slots (ms)=0
> 14/03/31 17:54:59 INFO mapred.JobClient: Total time spent by all maps
> waiting after reserving slots (ms)=0
> 14/03/31 17:54:59 INFO mapred.JobClient: Launched map tasks=2
> 14/03/31 17:54:59 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
> 14/03/31 17:54:59 INFO mapred.JobClient: Failed map tasks=1
>
> Any ideas as to why this happened? Do you think I need to update the
> hadoop I am using?
>
> Kind regards,
>
> Ghufran
>
>
> On Mon, Mar 31, 2014 at 5:11 PM, Young Han <[email protected]> wrote:
>
>> Hey,
>>
>> Sure, I've uploaded the 1.0.0 classes I'm using:
>> http://pastebin.com/0cTdWrR4
>> http://pastebin.com/jWgVAzH6
>>
>> They both go into giraph-examples/src/main/java/org/apache/giraph/examples
>>
>> Note that the input format it accepts is of the form "src dst1 dst2 dst3
>> ..."---there is no vertex value.
>> So your test graph would be:
>>
>> 1 2
>> 2 1 3 4
>> 3 2
>> 4 2
>>
>> The command I'm using is:
>>
>> hadoop jar
>> "$GIRAPH_DIR"/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-1.0.2-jar-with-dependencies.jar
>> org.apache.giraph.GiraphRunner \
>> org.apache.giraph.examples.ConnectedComponentsVertex \
>> -vif org.apache.giraph.examples.ConnectedComponentsInputFormat \
>> -vip /user/${USER}/input/${inputgraph} \
>> -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
>> -op /user/${USER}/giraph-output/ \
>> -w 1
>>
>> You'll want to change $GIRAPH_DIR, ${inputgraph}, and also the JAR file
>> name, since you're using Hadoop 0.20.203.
>>
>> Young
>>
>>
>> On Mon, Mar 31, 2014 at 12:00 PM, ghufran malik <[email protected]> wrote:
>>
>>> Hi Young,
>>>
>>> I'd just like to say first, thank you for your help, it's much appreciated!
>>>
>>> I did the sanity check and everything seems fine; I see the correct
>>> results.
>>>
>>> Yes, I hadn't noticed that before; that is strange. I don't know how that
>>> happened, as the quick start guide
>>> (https://giraph.apache.org/quick_start.html#qs_section_2) says hadoop
>>> 0.20.203 was the assumed default. I have both Giraph 1.1.0 and Giraph
>>> 1.0.0, and my Giraph 1.0.0 is compiled for 0.20.203.
>>>
>>> I edited the code as you said for Giraph 1.1.0 but still received the
>>> same error as before, so I thought it may be due to the hadoop version it
>>> was compiled for. So I decided to try modifying the code in Giraph 1.0.0
>>> instead. However, since I do not have the correct input format class and
>>> the vertex object is not instantiated in the ConnectedComponents class of
>>> Giraph 1.0.0, I was wondering if you could send me the full classes for
>>> both the ConnectedComponents class and the InputFormat, so that I know
>>> that code-wise everything should be correct.
>>>
>>> I will be trying to implement the InputFormat class and
>>> ConnectedComponents in the meantime, and if I get it working before you
>>> respond I'll update this post.
>>>
>>> Thanks,
>>>
>>> Ghufran
>>>
>>>
>>> On Sun, Mar 30, 2014 at 5:41 PM, Young Han <[email protected]> wrote:
>>>
>>>> Hey,
>>>>
>>>> As a sanity check, is the graph really loaded on HDFS? Do you see the
>>>> correct results if you do "hadoop dfs -cat /user/ghufran/in/my_graph.txt"?
>>>> (Where hadoop is your hadoop binary.)
>>>>
>>>> Also, I noticed that your Giraph has been compiled for Hadoop 1.x,
>>>> while the logs show Hadoop 0.20.203.0. Maybe that could be the cause too?
>>>>
>>>> Finally, this may be completely irrelevant, but I had issues running
>>>> connected components on Giraph 1.0.0 and I fixed them by changing the
>>>> algorithm and the input format. The input format you're using on 1.1.0
>>>> looks correct to me. The algorithm change I made was to the first "if"
>>>> block in ConnectedComponentsComputation:
>>>>
>>>> if (getSuperstep() == 0) {
>>>>   currentComponent = vertex.getId().get();
>>>>   vertex.setValue(new IntWritable(currentComponent));
>>>>   sendMessageToAllEdges(vertex, vertex.getValue());
>>>>   vertex.voteToHalt();
>>>>   return;
>>>> }
>>>>
>>>> I forget what error this change solved, so it may not help in your case.
>>>>
>>>> Young
>>>>
>>>>
>>>> On Sun, Mar 30, 2014 at 6:13 AM, ghufran malik <[email protected]> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I am a final-year BSc Computer Science student who is using Apache
>>>>> Giraph for my final-year project and dissertation, and I would very much
>>>>> appreciate it if someone could help me with the following issue.
>>>>>
>>>>> I am using Apache Giraph 1.1.0 Snapshot with Hadoop 0.20.203.0 and am
>>>>> having trouble running the ConnectedComponents example.
>>>>> I use the following command:
>>>>>
>>>>> hadoop jar
>>>>> /home/ghufran/Downloads/Giraph2/giraph/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar
>>>>> org.apache.giraph.GiraphRunner
>>>>> org.apache.giraph.examples.ConnectedComponentsComputation -vif
>>>>> org.apache.giraph.io.formats.IntIntNullTextVertexInputFormat -vip
>>>>> /user/ghufran/in/my_graph.txt -vof
>>>>> org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
>>>>> /user/ghufran/outCC -w 1
>>>>>
>>>>> I believe it gets stuck in the InputSuperstep, as the following is
>>>>> displayed in the terminal while the command is running:
>>>>>
>>>>> 14/03/30 10:48:46 INFO mapred.JobClient: map 100% reduce 0%
>>>>> 14/03/30 10:48:50 INFO job.JobProgressTracker: Data from 1 workers -
>>>>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>>>>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
>>>>> average 109.01MB
>>>>> 14/03/30 10:48:55 INFO job.JobProgressTracker: Data from 1 workers -
>>>>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>>>>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
>>>>> average 109.01MB
>>>>> 14/03/30 10:49:00 INFO job.JobProgressTracker: Data from 1 workers -
>>>>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>>>>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 108.78MB,
>>>>> average 108.78MB
>>>>> ....
>>>>>
>>>>> which I traced back to the following "if" statement in the toString()
>>>>> method of org.apache.giraph.job.CombinedWorkerProgress (in giraph-core):
>>>>>
>>>>> if (isInputSuperstep()) {
>>>>>   sb.append("Loading data: ");
>>>>>   sb.append(verticesLoaded).append(" vertices loaded, ");
>>>>>   sb.append(vertexInputSplitsLoaded).append(
>>>>>       " vertex input splits loaded; ");
>>>>>   sb.append(edgesLoaded).append(" edges loaded, ");
>>>>>   sb.append(edgeInputSplitsLoaded).append(" edge input splits loaded");
>>>>>   sb.append("; min free memory on worker ").append(
>>>>>       workerWithMinFreeMemory).append(" - ").append(
>>>>>       DECIMAL_FORMAT.format(minFreeMemoryMB)).append("MB, average ").append(
>>>>>       DECIMAL_FORMAT.format(freeMemoryMB)).append("MB");
>>>>>
>>>>> So it seems to me that it's not loading in the InputFormat correctly,
>>>>> and I am assuming there's something wrong with my input format class or,
>>>>> probably more likely, something wrong with the graph I passed in?
>>>>>
>>>>> I pass in a small graph in the format "vertex id, vertex value,
>>>>> neighbours", separated by tabs. My graph is shown below:
>>>>>
>>>>> 1 0 2
>>>>> 2 1 1 3 4
>>>>> 3 2 2
>>>>> 4 3 2
>>>>>
>>>>> The full output after I ran my command is shown below. If anyone could
>>>>> explain to me why I am not getting the expected output I would greatly
>>>>> appreciate it.
>>>>>
>>>>> Many thanks,
>>>>>
>>>>> Ghufran
>>>>>
>>>>>
>>>>> FULL OUTPUT:
>>>>>
>>>>> 14/03/30 10:48:06 INFO utils.ConfigurationUtils: No edge input format
>>>>> specified. Ensure your InputFormat does not require one.
>>>>> 14/03/30 10:48:06 INFO utils.ConfigurationUtils: No edge output format
>>>>> specified. Ensure your OutputFormat does not require one.
>>>>> 14/03/30 10:48:06 INFO job.GiraphJob: run: Since checkpointing is
>>>>> disabled (default), do not allow any task retries (setting
>>>>> mapred.map.max.attempts = 0, old value = 4)
>>>>> 14/03/30 10:48:07 INFO job.GiraphJob: run: Tracking URL:
>>>>> http://ghufran:50030/jobdetails.jsp?jobid=job_201403301044_0001
>>>>> 14/03/30 10:48:45 INFO
>>>>> job.HaltApplicationUtils$DefaultHaltInstructionsWriter:
>>>>> writeHaltInstructions: To halt after next superstep execute:
>>>>> 'bin/halt-application --zkServer ghufran:22181 --zkNode
>>>>> /_hadoopBsp/job_201403301044_0001/_haltComputation'
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:host.name=ghufran
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:java.version=1.7.0_51
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:java.vendor=Oracle Corporation
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:java.home=/usr/lib/jvm/java-7-oracle/jre
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:java.class.path=/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../conf:/usr/lib/jvm/java-7-oracle/lib/tools.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/..:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../hadoop-core-0.20.203.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/aspectjrt-1.6.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/aspectjtools-1.6.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-beanutils-1.7.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-beanutils-core-1.8.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-cli-1.2.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-codec-1.4.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-collections-3.2.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-configuration-1.6.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-daemon-1.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-digester-1.8.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-el-1.0.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-httpclient-3.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-lang-2.4.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-logging-1.1.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-logging-api-1.0.4.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-math-2.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/commons-net-1.4.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/core-3.1.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/hsqldb-1.8.0.10.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jackson-core-asl-1.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jackson-mapper-asl-1.0.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jasper-compiler-5.5.12.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jasper-runtime-5.5.12.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jets3t-0.6.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jetty-6.1.26.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jetty-util-6.1.26.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jsch-0.1.42.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/junit-4.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/kfs-0.2.2.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/log4j-1.2.15.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/mockito-all-1.8.5.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/oro-2.0.8.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/servlet-api-2.5-20081211.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/slf4j-api-1.4.3.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/slf4j-log4j12-1.4.3.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/xmlenc-0.52.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jsp-2.1/jsp-2.1.jar:/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/jsp-2.1/jsp-api-2.1.jar
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:java.library.path=/home/ghufran/Downloads/hadoop-0.20.203.0/bin/../lib/native/Linux-amd64-64
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:java.io.tmpdir=/tmp
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:java.compiler=<NA>
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:os.name=Linux
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:os.arch=amd64
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:os.version=3.8.0-35-generic
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:user.name=ghufran
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Client
>>>>> environment:user.home=/home/ghufran
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper:
Client
>>>>> environment:user.dir=/home/ghufran/Downloads/hadoop-0.20.203.0/bin
>>>>> 14/03/30 10:48:45 INFO zookeeper.ZooKeeper: Initiating client
>>>>> connection, connectString=ghufran:22181 sessionTimeout=60000
>>>>> watcher=org.apache.giraph.job.JobProgressTracker@209fa588
>>>>> 14/03/30 10:48:45 INFO mapred.JobClient: Running job: job_201403301044_0001
>>>>> 14/03/30 10:48:45 INFO zookeeper.ClientCnxn: Opening socket connection
>>>>> to server ghufran/127.0.1.1:22181. Will not attempt to authenticate
>>>>> using SASL (unknown error)
>>>>> 14/03/30 10:48:45 INFO zookeeper.ClientCnxn: Socket connection
>>>>> established to ghufran/127.0.1.1:22181, initiating session
>>>>> 14/03/30 10:48:45 INFO zookeeper.ClientCnxn: Session establishment
>>>>> complete on server ghufran/127.0.1.1:22181, sessionid =
>>>>> 0x1451263c44c0002, negotiated timeout = 600000
>>>>> 14/03/30 10:48:45 INFO job.JobProgressTracker: Data from 1 workers -
>>>>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>>>>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
>>>>> average 109.01MB
>>>>> 14/03/30 10:48:46 INFO mapred.JobClient: map 100% reduce 0%
>>>>> 14/03/30 10:48:50 INFO job.JobProgressTracker: Data from 1 workers -
>>>>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>>>>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
>>>>> average 109.01MB
>>>>> 14/03/30 10:48:55 INFO job.JobProgressTracker: Data from 1 workers -
>>>>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>>>>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 109.01MB,
>>>>> average 109.01MB
>>>>> 14/03/30 10:49:00 INFO job.JobProgressTracker: Data from 1 workers -
>>>>> Loading data: 0 vertices loaded, 0 vertex input splits loaded; 0 edges
>>>>> loaded, 0 edge input splits loaded; min free memory on worker 1 - 108.78MB,
>>>>> average 108.78MB
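[Editor's note: for anyone skimming the thread, the ConnectedComponents example computes min-label propagation: each vertex starts with its own id and repeatedly adopts the smallest label among itself and its neighbours. A self-contained plain-Java sketch (hypothetical class `MinLabelPropagation`, not the Giraph implementation; each pass of the while-loop plays the role of one superstep):]

```java
import java.util.*;

// Hypothetical sketch of the min-label propagation behind the
// ConnectedComponents example; NOT the Giraph API.
public class MinLabelPropagation {
  // adjacency: vertex id -> neighbour ids (undirected graph).
  // Returns vertex id -> component label (the smallest id in its component).
  public static Map<Integer, Integer> components(Map<Integer, List<Integer>> adj) {
    Map<Integer, Integer> label = new HashMap<>();
    for (int v : adj.keySet()) {
      label.put(v, v); // "superstep 0": every vertex starts as its own component
    }
    boolean changed = true;
    while (changed) { // one pass ~ one superstep
      changed = false;
      for (Map.Entry<Integer, List<Integer>> e : adj.entrySet()) {
        int min = label.get(e.getKey());
        for (int n : e.getValue()) {
          min = Math.min(min, label.get(n)); // "message" from neighbour n
        }
        if (min < label.get(e.getKey())) {
          label.put(e.getKey(), min); // adopt the smaller label, stay "awake"
          changed = true;
        }
      }
    }
    return label;
  }

  public static void main(String[] args) {
    // The test graph from the thread: 1-2, 2-{1,3,4}, 3-2, 4-2.
    Map<Integer, List<Integer>> adj = new HashMap<>();
    adj.put(1, Arrays.asList(2));
    adj.put(2, Arrays.asList(1, 3, 4));
    adj.put(3, Arrays.asList(2));
    adj.put(4, Arrays.asList(2));
    System.out.println(components(adj)); // every vertex ends up with label 1
  }
}
```

[On the thread's four-vertex graph this converges in two passes, with every vertex labelled 1, which is the output IdWithValueTextOutputFormat should emit once loading succeeds.]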
