Hi Claudio,
The patch worked !! :-)
Just to be clear, I am running the Giraph 1.0.0 release (not a git clone)
with Hadoop 2.0.0-cdh4.1.1.
I applied your patch and rebuilt the giraph source code with this command,
mvn -Phadoop_2.0.0 clean compile package test install verify
This built correctly, with no exceptions and no tests failed.
I then ran the giraph example, which ran successfully with this command
[root@localhost giraph]# hadoop jar /usr/local/giraph/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar \
    org.apache.giraph.GiraphRunner \
    org.apache.giraph.examples.SimpleShortestPathsVertex \
    -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
    -vip /user/root/input/tiny_graph.txt \
    -of org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
    -op /user/root/output/shortestpaths \
    -w 1
I then deleted the output:
hadoop fs -rm -R /user/root/output/shortestpaths
I then restarted my HBase daemons and ran the Giraph example again. It
worked again: no errors, no exceptions, no failed tasks, and the output was
produced correctly.
Using 'netstat -an | grep 22181' I can see that ZooKeeper is listening on port
22181.
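The netstat check above can be scripted. The following is a minimal sketch, assuming the Linux `netstat -an` column layout (local address in the fourth column, state in the sixth); the function name `port_listening` is hypothetical and not part of Giraph or ZooKeeper.

```shell
#!/bin/sh
# port_listening PORT  (hypothetical helper)
# Reads Linux `netstat -an`-style lines on stdin and exits 0 if a TCP socket
# is in the LISTEN state on PORT, 1 otherwise.
port_listening() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            n = split($4, parts, ":")   # local address looks like HOST:PORT
            if (parts[n] == port) found = 1
        }
        END { exit !found }
    '
}
```

It could be used as `netstat -an | port_listening 22181 && echo "listening"`, and likewise with 2181 to confirm that the HBase-managed ZooKeeper is up on its own port.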
Thank you very much for your help :-)
Ken
From: [email protected]
Date: Wed, 4 Sep 2013 19:21:37 +0200
Subject: Re: FileNotFoundException: File
_bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
To: [email protected]
Giraph ships with ZooKeeper 3.3.3. Unless you point it at an existing
ZooKeeper through the giraph.zkServerList parameter, it starts its own
instance, with its own configuration, listening on port 22181.
On Wed, Sep 4, 2013 at 7:11 PM, Ken Williams <[email protected]> wrote:
Hmmmmmmmm. Interesting.
Is Giraph (1.0.0) supposed to come with its own version of ZooKeeper ?
The only version of ZooKeeper I have installed is the one that came with HBase,
and the config file it uses, /etc/zookeeper/conf/zoo.cfg, specifies
clientPort=2181.
This is the only zoo.cfg file on my machine.
[root@localhost]# cat /etc/zookeeper/conf/zoo.cfg
....
maxClientCnxns=50
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/var/lib/zookeeper
# the port at which the clients will connect
clientPort=2181
server.1=localhost:2888:3888
[root@localhost Downloads]#
From: [email protected]
Date: Wed, 4 Sep 2013 12:13:50 +0200
Subject: Re: FileNotFoundException: File
_bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
To: [email protected]
That should in principle not be the case, as the zookeeper started by Giraph
listens on a different port than the default. See parameter
giraph.zkServerPort, which defaults to 22181.
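To check this on a given machine, one could compare the clientPort of the installed zoo.cfg with the port Giraph uses. The following is a minimal sketch; the helper names `zk_client_port` and `ports_conflict` are hypothetical, and it assumes a plain key=value zoo.cfg.

```shell
#!/bin/sh
# zk_client_port FILE  (hypothetical helper)
# Prints the value of the first clientPort= entry in a zoo.cfg-style file.
zk_client_port() {
    awk -F= '/^[ \t]*clientPort[ \t]*=/ {
        gsub(/[ \t\r]/, "", $2)   # strip stray whitespace around the value
        print $2
        exit
    }' "$1"
}

# ports_conflict FILE GIRAPH_PORT  (hypothetical helper)
# Exits 0 if the clientPort in FILE equals GIRAPH_PORT, 1 otherwise.
ports_conflict() {
    [ "$(zk_client_port "$1")" = "$2" ]
}
```

For example, `ports_conflict /etc/zookeeper/conf/zoo.cfg 22181` would exit non-zero for a zoo.cfg with clientPort=2181, meaning the two ZooKeeper instances are not competing for the same port.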
On Wed, Sep 4, 2013 at 11:40 AM, Ken Williams <[email protected]> wrote:
Hi Claudio,
I think I have fixed the problem.
HBase runs with its own copy of ZooKeeper which listens on port 2181. So,
when I tried to start ZooKeeper for Giraph it also tried to listen on port 2181
and found it was already in use, and then it terminated - which is why
Giraph failed. If I stop the HBase daemons (including its copy of ZooKeeper)
then Giraph runs fine.
Essentially, there is a conflict when ZooKeeper is started for Giraph while
another ZooKeeper is already running for HBase.
I will try the patch and get back to you.
Thanks for all your help,
Ken
From: [email protected]
Date: Tue, 3 Sep 2013 17:01:01 +0200
Subject: Re: FileNotFoundException: File
_bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
To: [email protected]
Try with the attached patch applied to trunk, without the previously
mentioned -D giraph.zkManagerDirectory option.
On Tue, Sep 3, 2013 at 3:25 PM, Ken Williams <[email protected]> wrote:
Hi Claudio,
I tried this but it made no difference. The map tasks still fail, there is
still no output, and there is still an exception in the log files:
FileNotFoundException: File /tmp/giraph/_zkServer does not exist.
[root@localhost giraph]# hadoop jar
/usr/local/giraph/giraph-examples/target/giraph-examples-1.0.0-for-hadoop-2.0.0-alpha-jar-with-dependencies.jar
org.apache.giraph.GiraphRunner -Dgiraph.zkManagerDirectory='/tmp/giraph/'
org.apache.giraph.examples.SimpleShortestPathsVertex -vif
org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat -vip
/user/root/input/tiny_graph.txt -of
org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op
/user/root/output/shortestpaths -w 1
13/09/03 14:19:58 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format vertex index type is not known
13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format vertex value type is not known
13/09/03 14:19:58 WARN job.GiraphConfigurationValidator: Output format edge value type is not known
13/09/03 14:19:58 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 0, old value = 4)
13/09/03 14:19:58 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/09/03 14:20:01 INFO mapred.JobClient: Running job: job_201308291126_0039
13/09/03 14:20:02 INFO mapred.JobClient: map 0% reduce 0%
13/09/03 14:20:12 INFO mapred.JobClient: Job complete: job_201308291126_0039
13/09/03 14:20:12 INFO mapred.JobClient: Counters: 6
13/09/03 14:20:12 INFO mapred.JobClient: Job Counters
13/09/03 14:20:12 INFO mapred.JobClient: Failed map tasks=1
13/09/03 14:20:12 INFO mapred.JobClient: Launched map tasks=2
13/09/03 14:20:12 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=16327
13/09/03 14:20:12 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=0
13/09/03 14:20:12 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/09/03 14:20:12 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
[root@localhost giraph]#
When I try to run Zookeeper it still gives me an 'Address already in use'
exception.
[root@localhost giraph]# /usr/lib/zookeeper/bin/zkServer.sh start-foreground
JMX enabled by default
Using config: /usr/lib/zookeeper/bin/../conf/zoo.cfg
2013-09-03 14:23:37,882 [myid:] - INFO [main:QuorumPeerConfig@101] - Reading configuration from: /usr/lib/zookeeper/bin/../conf/zoo.cfg
2013-09-03 14:23:37,888 [myid:] - ERROR [main:QuorumPeerConfig@283] - Invalid configuration, only one server specified (ignoring)
2013-09-03 14:23:37,889 [myid:] - INFO [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
2013-09-03 14:23:37,889 [myid:] - INFO [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
2013-09-03 14:23:37,890 [myid:] - INFO [main:DatadirCleanupManager@101] - Purge task is not scheduled.
2013-09-03 14:23:37,890 [myid:] - WARN [main:QuorumPeerMain@118] - Either no config or no quorum defined in config, running in standalone mode
2013-09-03 14:23:37,904 [myid:] - INFO [main:QuorumPeerConfig@101] - Reading configuration from: /usr/lib/zookeeper/bin/../conf/zoo.cfg
2013-09-03 14:23:37,905 [myid:] - ERROR [main:QuorumPeerConfig@283] - Invalid configuration, only one server specified (ignoring)
2013-09-03 14:23:37,905 [myid:] - INFO [main:ZooKeeperServerMain@100] - Starting server
2013-09-03 14:23:37,920 [myid:] - INFO [main:Environment@100] - Server environment:zookeeper.version=3.4.3-cdh4.1.1--1, built on 10/16/2012 17:34 GMT
2013-09-03 14:23:37,921 [myid:] - INFO [main:Environment@100] - Server environment:host.name=localhost.localdomain
2013-09-03 14:23:37,921 [myid:] - INFO [main:Environment@100] - Server environment:java.version=1.6.0_31
2013-09-03 14:23:37,921 [myid:] - INFO [main:Environment@100] - Server environment:java.vendor=Sun Microsystems Inc.
2013-09-03 14:23:37,921 [myid:] - INFO [main:Environment@100] - Server environment:java.home=/usr/java/jdk1.6.0_31/jre
2013-09-03 14:23:37,921 [myid:] - INFO [main:Environment@100] - Server environment:java.class.path=/usr/lib/zookeeper/bin/../build/classes:/usr/lib/zookeeper/bin/../build/lib/*.jar:/usr/lib/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/usr/lib/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/usr/lib/zookeeper/bin/../lib/netty-3.2.2.Final.jar:/usr/lib/zookeeper/bin/../lib/log4j-1.2.15.jar:/usr/lib/zookeeper/bin/../lib/jline-0.9.94.jar:/usr/lib/zookeeper/bin/../zookeeper-3.4.3-cdh4.1.1.jar:/usr/lib/zookeeper/bin/../src/java/lib/*.jar:/usr/lib/zookeeper/bin/../conf:
2013-09-03 14:23:37,922 [myid:] - INFO [main:Environment@100] - Server environment:java.library.path=/usr/java/jdk1.6.0_31/jre/lib/i386/client:/usr/java/jdk1.6.0_31/jre/lib/i386:/usr/java/jdk1.6.0_31/jre/../lib/i386:/usr/java/packages/lib/i386:/lib:/usr/lib
2013-09-03 14:23:37,922 [myid:] - INFO [main:Environment@100] - Server environment:java.io.tmpdir=/tmp
2013-09-03 14:23:37,922 [myid:] - INFO [main:Environment@100] - Server environment:java.compiler=<NA>
2013-09-03 14:23:37,922 [myid:] - INFO [main:Environment@100] - Server environment:os.name=Linux
2013-09-03 14:23:37,922 [myid:] - INFO [main:Environment@100] - Server environment:os.arch=i386
2013-09-03 14:23:37,923 [myid:] - INFO [main:Environment@100] - Server environment:os.version=2.6.32-279.14.1.el6.i686
2013-09-03 14:23:37,923 [myid:] - INFO [main:Environment@100] - Server environment:user.name=root
2013-09-03 14:23:37,923 [myid:] - INFO [main:Environment@100] - Server environment:user.home=/root
2013-09-03 14:23:37,923 [myid:] - INFO [main:Environment@100] - Server environment:user.dir=/usr/local/giraph-1.0.0
2013-09-03 14:23:37,934 [myid:] - INFO [main:ZooKeeperServer@726] - tickTime set to 2000
2013-09-03 14:23:37,934 [myid:] - INFO [main:ZooKeeperServer@735] - minSessionTimeout set to -1
2013-09-03 14:23:37,935 [myid:] - INFO [main:ZooKeeperServer@744] - maxSessionTimeout set to -1
2013-09-03 14:23:37,970 [myid:] - INFO [main:NIOServerCnxnFactory@99] - binding to port 0.0.0.0/0.0.0.0:2181
2013-09-03 14:23:37,972 [myid:] - ERROR [main:ZooKeeperServerMain@68] - Unexpected exception, exiting abnormally
java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:100)
        at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:115)
        at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:91)
        at org.apache.zookeeper.server.ZooKeeperServerMain.main(ZooKeeperServerMain.java:53)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:121)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:79)
[root@localhost giraph]#
Thank you for any help,
Ken
From: [email protected]
Date: Tue, 3 Sep 2013 12:43:59 +0200
Subject: Re: FileNotFoundException: File
_bsp/_defaultZkManagerDir/job_201308291126_0029/_zkServer does not exist.
To: [email protected]
Can you try defining the ZooKeeper manager directory from the command line,
like this:
-D giraph.zkManagerDirectory=/path/in/hdfs/foobar
You'll have to delete this directory by hand before each job. This is just to
see whether it solves the problem; then I would know how to fix it.
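As a diagnostic aid, the delete-then-run sequence described above could be wrapped in a small script. This is only a sketch under stated assumptions: the helper names `run` and `giraph_with_zk_dir` and the `DRY_RUN` variable are hypothetical, and the option is passed right after org.apache.giraph.GiraphRunner, as in the command quoted elsewhere in this thread. By default it only prints the commands.

```shell
#!/bin/sh
# Hypothetical wrapper: clear the ZooKeeper manager directory, then launch the job.
# With DRY_RUN=1 (the default in this sketch) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# giraph_with_zk_dir ZK_DIR JAR [GiraphRunner arguments...]
giraph_with_zk_dir() {
    zk_dir="$1"
    jar="$2"
    shift 2
    # Remove whatever the previous job left behind; a failure here (for
    # example, the directory does not exist yet) does not stop the launch.
    run hadoop fs -rm -R "$zk_dir"
    run hadoop jar "$jar" org.apache.giraph.GiraphRunner \
        "-Dgiraph.zkManagerDirectory=$zk_dir" "$@"
}
```

Calling `giraph_with_zk_dir /path/in/hdfs/foobar giraph-examples.jar ...` with DRY_RUN left at 1 shows the two commands that would run; setting DRY_RUN=0 executes them.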
On Tue, Sep 3, 2013 at 12:32 PM, Ken Williams <[email protected]> wrote:
Hi Pradeep,
Yes, the zookeeper server is definitely running, I can connect to it with the
command-line client:
[root@localhost giraph]# zkCli.sh -server 127.0.0.1:2181
Connecting to 127.0.0.1:2181
2013-09-03 11:15:45,987 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.3-cdh4.1.1--1, built on 10/16/2012 17:34 GMT
2013-09-03 11:15:45,990 [myid:] - INFO [main:Environment@100] - Client environment:host.name=localhost.localdomain
2013-09-03 11:15:45,990 [myid:] - INFO [main:Environment@100] - Client environment:java.version=1.6.0_31
......
WatchedEvent state:SyncConnected type:None path:null
[zk: 127.0.0.1:2181(CONNECTED) 0] ls /
[hbase, zookeeper]
[zk: 127.0.0.1:2181(CONNECTED) 1]
However, I am a bit confused. If I look in the zookeeper log-file I see this
port 2181 'Address already in use' error,
2013-09-03 10:52:24,412 [myid:] - INFO [main:ZooKeeperServer@735] - minSessionTimeout set to -1
2013-09-03 10:52:24,413 [myid:] - INFO [main:ZooKeeperServer@744] - maxSessionTimeout set to -1
2013-09-03 10:52:24,436 [myid:] - INFO [main:NIOServerCnxnFactory@99] - binding to port 0.0.0.0/0.0.0.0:2181
2013-09-03 10:52:24,447 [myid:] - ERROR [main:ZooKeeperServerMain@68] - Unexpected exception, exiting abnormally
java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:52)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:100)
        at org.apache.zookeeper.server.ZooKeeperServerMain.runFromConfig(ZooKeeperServerMain.java:115)
        at org.apache.zookeeper.server.ZooKeeperServerMain.initializeAndRun(ZooKeeperServerMain.java:91)
The process listening on port 2181 is 2892, which turns out to be HBase.
[root@localhost giraph]# fuser 2181/tcp
2181/tcp: 2892
[root@localhost giraph]# ps aux | grep 2892
hbase 2892 0.1 3.2 719592 119624 ? Sl Aug29 7:35 /usr/java/jdk1.6.0_31/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx500m -XX:+UseConcMarkSweepGC -Dhbase.log.dir=/var/log/hbase -Dhbase.log.file=hbase-hbase-master-localhost.localdomain.log -Dhbase.home.dir=/usr/lib/hbase/bin/..
......
So I am not sure what my zookeeper client is connecting to. It seems to be
connecting to a zookeeper server but when I do 'ps' I cannot see a zookeeper
server running.
Here is my zoo.cfg file,
maxClientCnxns=50
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/var/lib/zookeeper
# the port at which the clients will connect
clientPort=2181
server.1=localhost:2888:3888
Thanks for any help,
Ken
--
Claudio Martella
[email protected]