Update: the Hadoop and HBase jar versions were not right. After updating the jars in the 'lib/' directory and rebuilding, it now throws:
org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family mtdt: does not exist in region crawl,,1264048608430 in table {NAME => 'crawl', FAMILIES => [{NAME => 'bas', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'cnt', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'cnttyp', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'fchi', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'fcht', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'hdrs', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'ilnk', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'modt', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'mtdt', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'olnk', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'prsstt', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'prtstt', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'prvfch', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'prvsig', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'repr', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'rtrs', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'scr', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'sig', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'stt', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'ttl', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'txt', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
        at org.apache.hadoop.hbase.regionserver.HRegion.checkFamily(HRegion.java:2381)
        at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1241)
        at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1208)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1834)
        at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:995)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$2.doCall(HConnectionManager.java:1193)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1115)
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1201)
        at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:605)
        at org.apache.hadoop.hbase.client.HTable.put(HTable.java:470)
        at org.apache.nutch.crawl.Injector$UrlMapper.map(Injector.java:92)
        at org.apache.nutch.crawl.Injector$UrlMapper.map(Injector.java:62)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

It's weird, because the error message itself shows that the column family 'mtdt' does exist. A quick way to dump what the server actually reports is sketched after the quoted thread below.

On Tue, Jan 12, 2010 at 3:43 PM, xiao yang <yangxiao9...@gmail.com> wrote:
> Hi, Doğacan
>
> I have checked out Nutchbase from
> http://svn.apache.org/repos/asf/lucene/nutch/branches/nutchbase/
> My Hbase version is 0.20.2.
>
> createtable succeeded, but inject doesn't work.
>
> $bin/nutch createtable *crawl*
>
> Here is the status of Hbase:
> hbase(main):014:0> list
> 10/01/12 15:37:43 DEBUG client.HConnectionManager$TableServers: Cache hit for row <> in tableName .META.: location server 10.214.10.146:34592, location region name .META.,,1
> *crawl*
>
> 1 row(s) in 0.0110 seconds
>
> $bin/nutch inject crawl urls
> Injector: starting
> Injector: urlDir: urls
> Injecting new users failed!
>
> Here is the log:
>
> 2010-01-12 15:38:57,515 WARN mapred.LocalJobRunner - job_local_0001
> java.lang.reflect.UndeclaredThrowableException
>         at $Proxy0.getRegionInfo(Unknown Source)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:874)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:515)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:491)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:565)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:524)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:491)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:565)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:528)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:491)
>         at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:123)
>         at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:101)
>         at org.apache.nutch.crawl.Injector$UrlMapper.setup(Injector.java:102)
>         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:518)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:303)
>         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:176)
> Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.NullPointerException
>         at java.lang.Class.searchMethods(Class.java:2646)
>         at java.lang.Class.getMethod0(Class.java:2670)
>         at java.lang.Class.getMethod(Class.java:1603)
>         at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:643)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
>
>         at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:720)
>         at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:329)
>         ... 17 more
> 2010-01-12 15:38:57,806 WARN crawl.Injector - Injecting new users failed!
>
> What's the problem?
> Thanks!
> Xiao
>
> 2009/8/17 Doğacan Güney (JIRA) <j...@apache.org>:
> >
> > [ https://issues.apache.org/jira/browse/NUTCH-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12743919#action_12743919]
> >
> > Doğacan Güney commented on NUTCH-650:
> > -------------------------------------
> >
> > I just committed code to branch nutchbase. The scoring API did not turn out as clean as I expected but I decided to put in what I have. Also, I made some changes so that web UI also works.
> >
> > I am leaving this issue open because I will add documentation tomorrow. Meanwhile,
> >
> > To download:
> >
> > svn co http://svn.apache.org/repos/asf/lucene/nutch/branches/nutchbase
> >
> > Usage:
> >
> > After starting hbase 0.20 (checkout rev. 804408 from hbase branch 0.20), create a webtable with
> >
> > bin/nutch createtable webtable
> >
> > After that, usage is similar.
> >
> > bin/nutch inject webtable url_dir # inject urls
> >
> > for as many cycles as you want;
> > bin/nutch generate webtable # -topN N works
> > bin/nutch fetch webtable # -threads N works
> > bin/nutch parse webtable
> > bin/nutch updatetable webtable
> >
> > bin/nutch index <index> webtable
> > or
> > bin/nutch solrindex <solr url> webtable
> >
> > To use solr, use this schema file
> > http://www.ceng.metu.edu.tr/~e1345172/schema.xml
> >
> >
> > Again, a note of warning: This is extremely new code. I hope people will test and use it but there is no guarantee that it will work :)
> >
> >
> >> Hbase Integration
> >> -----------------
> >>
> >> Key: NUTCH-650
> >> URL: https://issues.apache.org/jira/browse/NUTCH-650
> >> Project: Nutch
> >> Issue Type: New Feature
> >> Affects Versions: 1.0.0
> >> Reporter: Doğacan Güney
> >> Assignee: Doğacan Güney
> >> Fix For: 1.1
> >>
> >> Attachments: hbase-integration_v1.patch, hbase_v2.patch, malformedurl.patch, meta.patch, meta2.patch, nofollow-hbase.patch, nutch-habase.patch, searching.diff, slash.patch
> >>
> >>
> >> This issue will track nutch/hbase integration
> >
> > --
> > This message is automatically generated by JIRA.
> > -
> > You can reply to this email to add a comment to the issue online.
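
For what it's worth, the exception complains about 'mtdt:' with a trailing colon, while the table was created with plain 'mtdt'; older (pre-0.20) HBase APIs addressed columns as 'family:qualifier', so this may just be another symptom of mismatched client jars. Here is a rough, untested sketch of how I would double-check what the server actually reports, assuming the HBase 0.20 client API and the 'crawl' table name; the class name CheckCrawlSchema is only for illustration:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.VersionInfo;

public class CheckCrawlSchema {
  public static void main(String[] args) throws Exception {
    // Version of the HBase client jar on the classpath; compare with what the master/region server logs.
    System.out.println("HBase client version: " + VersionInfo.getVersion());

    HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());

    // Print the family names exactly as the server knows them (expected 'mtdt', not 'mtdt:').
    HTableDescriptor desc = admin.getTableDescriptor(Bytes.toBytes("crawl"));
    for (HColumnDescriptor family : desc.getFamilies()) {
      System.out.println("family: '" + family.getNameAsString() + "'");
    }
  }
}

Compiling this against the same jars in Nutch's lib/ directory and running it with the same hbase-site.xml on the classpath should show exactly what the failing inject job sees.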