Hello,

I use HBase (0.90.4) as my storage for pages crawled by Nutch 2.2.1.
Everything worked fine, but today I saw some weird things and an exception
when I tried to inject URLs into the HBase table (webpage_webpage). When I
start HBase, there are no ERRORs or exceptions in the log file.

The problem occurs when I try to run Nutch's crawl script, which includes
injecting URLs into HBase. Then I see this general exception:

InjectorJob: java.lang.RuntimeException: job failed: name=[webpage]inject
/opt/ir/nutch/urls, jobid=job_local1968557823_0001
at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:233)
at org.apache.nutch.crawl.InjectorJob.inject(InjectorJob.java:251)
at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:273)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.InjectorJob.main(InjectorJob.java:282)

and when I look into the log file, the last few lines contain these weird
things that I don't understand: timeouts and session IDs.

2013-09-21 17:40:02,644 DEBUG
org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region
webpage_webpage,,1379778002576.65b68e6e75c02138edb6370309096186. on
localhost,59401,1379777960672
2013-09-21 17:40:03,993 INFO org.apache.zookeeper.server.NIOServerCnxn:
Accepted socket connection from /127.0.0.1:42450
2013-09-21 17:40:03,993 INFO org.apache.zookeeper.server.NIOServerCnxn:
Client attempting to establish new session at /127.0.0.1:42450
2013-09-21 17:40:03,996 INFO org.apache.zookeeper.server.NIOServerCnxn:
Established session 0x141412cd9bc0005 with negotiated timeout 40000 for
client /127.0.0.1:42450
2013-09-21 17:40:04,136 INFO org.apache.zookeeper.server.NIOServerCnxn:
Accepted socket connection from /127.0.0.1:42451
2013-09-21 17:40:04,136 INFO org.apache.zookeeper.server.NIOServerCnxn:
Client attempting to establish new session at /127.0.0.1:42451
2013-09-21 17:40:04,138 INFO org.apache.zookeeper.server.NIOServerCnxn:
Established session 0x141412cd9bc0006 with negotiated timeout 40000 for
client /127.0.0.1:42451
2013-09-21 17:40:05,229 WARN org.apache.zookeeper.server.NIOServerCnxn:
EndOfStreamException: Unable to read additional data from client sessionid
0x141412cd9bc0004, likely client has closed socket
2013-09-21 17:40:05,230 INFO org.apache.zookeeper.server.NIOServerCnxn:
Closed socket connection for client /127.0.0.1:42446 which had sessionid
0x141412cd9bc0004
2013-09-21 17:40:05,231 WARN org.apache.zookeeper.server.NIOServerCnxn:
EndOfStreamException: Unable to read additional data from client sessionid
0x141412cd9bc0005, likely client has closed socket
2013-09-21 17:40:05,232 INFO org.apache.zookeeper.server.NIOServerCnxn:
Closed socket connection for client /127.0.0.1:42450 which had sessionid
0x141412cd9bc0005
2013-09-21 17:40:05,232 WARN org.apache.zookeeper.server.NIOServerCnxn:
EndOfStreamException: Unable to read additional data from client sessionid
0x141412cd9bc0006, likely client has closed socket
2013-09-21 17:40:05,232 INFO org.apache.zookeeper.server.NIOServerCnxn:
Closed socket connection for client /127.0.0.1:42451 which had sessionid
0x141412cd9bc0006
2013-09-21 17:40:44,000 INFO org.apache.zookeeper.server.ZooKeeperServer:
Expiring session 0x141412cd9bc0004, timeout of 40000ms exceeded
2013-09-21 17:40:44,001 INFO
org.apache.zookeeper.server.PrepRequestProcessor: Processed session
termination for sessionid: 0x141412cd9bc0004
2013-09-21 17:40:46,001 INFO org.apache.zookeeper.server.ZooKeeperServer:
Expiring session 0x141412cd9bc0005, timeout of 40000ms exceeded
2013-09-21 17:40:46,001 INFO org.apache.zookeeper.server.ZooKeeperServer:
Expiring session 0x141412cd9bc0006, timeout of 40000ms exceeded
2013-09-21 17:40:46,001 INFO
org.apache.zookeeper.server.PrepRequestProcessor: Processed session
termination for sessionid: 0x141412cd9bc0005
2013-09-21 17:40:46,002 INFO
org.apache.zookeeper.server.PrepRequestProcessor: Processed session
termination for sessionid: 0x141412cd9bc0006
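
The connect/close messages above are normal ZooKeeper chatter; the lines
that matter are the "Expiring session ... timeout exceeded" ones. To pull
just those events out of a long log, a minimal sketch in Python (the inline
sample reuses two lines from the excerpt above; in practice you would read
the real HBase log file instead):

```python
import re

# Sample lines in the same format as the HBase/ZooKeeper log excerpt above.
log = """\
2013-09-21 17:40:44,000 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x141412cd9bc0004, timeout of 40000ms exceeded
2013-09-21 17:40:46,001 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x141412cd9bc0005, timeout of 40000ms exceeded
"""

# Extract (timestamp, session id, timeout in ms) for every expiry event.
pattern = re.compile(
    r"^(\S+ \S+) .*Expiring session (0x[0-9a-f]+), timeout of (\d+)ms",
    re.MULTILINE,
)
expired = pattern.findall(log)
for ts, session, timeout in expired:
    print(ts, session, timeout)
```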

I really don't know where the problem is, because everything worked fine
for the last few days...

I have these properties in my hbase-site.xml file; maybe they are
helpful:

<property>
     <name>hbase.rootdir</name>
     <value>file:///data/hbase</value>
</property>
<property>
     <name>hbase.zookeeper.property.dataDir</name>
     <value>/data/hbase</value>
</property>
<property>
     <name>hbase.zookeeper.property.maxClientCnxns</name>
     <value>1500</value>
</property>
<property>
     <name>hbase.zookeeper.quorum</name>
     <value>localhost</value>
</property>
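
For reference, the values above can be dumped programmatically; a minimal
sketch in Python using only the standard library (the inline XML mirrors
the snippet above wrapped in the usual <configuration> root element; the
real file would be conf/hbase-site.xml):

```python
import xml.etree.ElementTree as ET

# Inline copy of two of the hbase-site.xml properties shown above;
# in practice you would parse conf/hbase-site.xml instead.
HBASE_SITE = """
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///data/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>localhost</value>
  </property>
</configuration>
"""

# Build a {name: value} map of the configured properties.
root = ET.fromstring(HBASE_SITE)
props = {
    p.findtext("name"): p.findtext("value")
    for p in root.findall("property")
}
print(props["hbase.zookeeper.quorum"])
```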



--
View this message in context: 
http://lucene.472066.n3.nabble.com/hBase-Nutch-timeout-or-session-expiration-while-injecting-tp4091375.html
Sent from the Nutch - User mailing list archive at Nabble.com.