Dear all,
I downloaded Nutch 2.1 and tried to get it started, but no luck so far. I am
running it in local mode with HBase 0.90.6.
The project builds successfully and I have set up all configs as per the Nutch
wiki, but I am getting the following exceptions:
12/10/23 11:27:54 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/Users/mouradk/work/Apps/apache-nutch-2.1/runtime/local/lib/native/Mac_OS_X-x86_64-64
12/10/23 11:27:54 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/var/folders/99/6nldm8mn2h7d7gwx2jl8v7gh0000gn/T/
12/10/23 11:27:54 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
12/10/23 11:27:54 INFO zookeeper.ZooKeeper: Client environment:os.name=Mac OS X
12/10/23 11:27:54 INFO zookeeper.ZooKeeper: Client environment:os.arch=x86_64
12/10/23 11:27:54 INFO zookeeper.ZooKeeper: Client environment:os.version=10.8.2
12/10/23 11:27:54 INFO zookeeper.ZooKeeper: Client environment:user.name=mouradk
12/10/23 11:27:54 INFO zookeeper.ZooKeeper: Client environment:user.home=/Users/mouradk
12/10/23 11:27:54 INFO zookeeper.ZooKeeper: Client environment:user.dir=/Users/mouradk/work/Apps/apache-nutch-2.1
12/10/23 11:27:54 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=hconnection
12/10/23 11:27:54 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181
12/10/23 11:27:54 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
12/10/23 11:27:54 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x13a8d1f88a10006, negotiated timeout = 40000
Exception in thread "main" org.apache.gora.util.GoraException: java.lang.RuntimeException: java.net.MalformedURLException
    at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:167)
    at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:135)
    at org.apache.nutch.storage.StorageUtils.createWebStore(StorageUtils.java:75)
    at org.apache.nutch.crawl.InjectorJob.run(InjectorJob.java:214)
    at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
    at org.apache.nutch.crawl.Crawler.run(Crawler.java:136)
    at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.crawl.Crawler.main(Crawler.java:257)
Caused by: java.lang.RuntimeException: java.net.MalformedURLException
    at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:125)
    at org.apache.gora.store.DataStoreFactory.initializeDataStore(DataStoreFactory.java:102)
    at org.apache.gora.store.DataStoreFactory.createDataStore(DataStoreFactory.java:161)
    ... 8 more
Caused by: java.net.MalformedURLException
    at java.net.URL.<init>(URL.java:601)
    at java.net.URL.<init>(URL.java:464)
    at java.net.URL.<init>(URL.java:413)
    at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
    at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
    at org.jdom.input.SAXBuilder.build(SAXBuilder.java:453)
    at org.jdom.input.SAXBuilder.build(SAXBuilder.java:770)
    at org.apache.gora.hbase.store.HBaseStore.readMapping(HBaseStore.java:524)
    at org.apache.gora.hbase.store.HBaseStore.initialize(HBaseStore.java:111)
    ... 10 more
I googled around but cannot find an answer. Is there something wrong with my
conf, or with the urls? I get the same error whether I try to inject or crawl
with the following commands:
./runtime/local/bin/nutch crawl urls -dir crawl -depth 3 -topN 5
./runtime/local/bin/nutch inject ./urls/
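In case it helps narrow things down: the innermost frames of the trace are in
HBaseStore.readMapping, so my guess is that the failure happens while the Gora
HBase mapping file is being read, before any of my urls are touched. Below is a
minimal sketch of a check for whether the config files are visible on the
classpath at all. The file names (gora-hbase-mapping.xml, gora.properties,
nutch-site.xml) are my assumption of what should be in runtime/local/conf, and
MappingCheck is a hypothetical helper of my own, not part of Nutch or Gora.

```java
import java.net.URL;

public class MappingCheck {

    // Returns the classpath location of the named resource, or null if it is absent.
    static URL locate(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(resourceName);
    }

    // Builds a one-line report for the named resource.
    static String describe(String resourceName) {
        URL url = locate(resourceName);
        return url == null
                ? resourceName + ": NOT FOUND on classpath"
                : resourceName + ": " + url;
    }

    public static void main(String[] args) {
        // Assumed file names; pass others as arguments to check them instead.
        String[] names = args.length > 0
                ? args
                : new String[] {"gora-hbase-mapping.xml", "gora.properties", "nutch-site.xml"};
        for (String name : names) {
            System.out.println(describe(name));
        }
    }
}
```

Run with runtime/local/conf on the classpath, it should print where each file
is picked up from, or report it as not found.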
Help much appreciated,
Mourad