[ https://issues.apache.org/jira/browse/NUTCH-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803186#action_12803186 ]
Xiao Yang commented on NUTCH-650: --------------------------------- Exception: org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: org.apache.hadoop.hbase.regionserver.NoSuchColumnFamilyException: Column family mtdt: does not exist in region crawl,,1264048608430 in table {NAME => 'crawl', FAMILIES => [{NAME => 'bas', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'cnt', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'cnttyp', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'fchi', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'fcht', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'hdrs', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'ilnk', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'modt', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'mtdt', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'olnk', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'prsstt', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'prtstt', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'prvfch', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'prvsig', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'repr', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'rtrs', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'scr', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'sig', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'stt', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'ttl', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'txt', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]} at org.apache.hadoop.hbase.regionserver.HRegion.checkFamily(HRegion.java:2381) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1241) at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1208) at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1834) at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648) at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94) at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:995) at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$2.doCall(HConnectionManager.java:1193) at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1115) at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1201) at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:605) at org.apache.hadoop.hbase.client.HTable.put(HTable.java:470) at org.apache.nutch.crawl.Injector$UrlMapper.map(Injector.java:92) at org.apache.nutch.crawl.Injector$UrlMapper.map(Injector.java:62) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) at org.apache.hadoop.mapred.Child.main(Child.java:170) This may be caused by invalid column family names: I found some names end with colon while some doesn't in package org.apache.nutch.util.hbase.WebTableColumns Is this a bug? public interface WebTableColumns { public static final String BASE_URL_STR = "bas"; public static final String STATUS_STR = "stt"; public static final String FETCH_TIME_STR = "fcht"; public static final String RETRIES_STR = "rtrs"; public static final String FETCH_INTERVAL_STR = "fchi"; public static final String SCORE_STR = "scr"; public static final String MODIFIED_TIME_STR = "modt"; public static final String SIGNATURE_STR = "sig"; public static final String CONTENT_STR = "cnt"; public static final String CONTENT_TYPE_STR = "cnttyp:"; public static final String TITLE_STR = "ttl:"; public static final String OUTLINKS_STR = "olnk:"; public static final String INLINKS_STR = "ilnk:"; public static final String PARSE_STATUS_STR = "prsstt:"; public static final String PROTOCOL_STATUS_STR = "prtstt:"; public static final String TEXT_STR = "txt:"; public static final String REPR_URL_STR = "repr:"; public static final String HEADERS_STR = "hdrs:"; public static final String METADATA_STR = "mtdt:"; > Hbase Integration > ----------------- > > Key: NUTCH-650 > URL: https://issues.apache.org/jira/browse/NUTCH-650 > Project: Nutch > Issue Type: New Feature > Affects Versions: 1.0.0 > Reporter: Doğacan Güney > Assignee: Doğacan Güney > Fix For: 1.1 > > Attachments: hbase-integration_v1.patch, hbase_v2.patch, > malformedurl.patch, meta.patch, meta2.patch, nofollow-hbase.patch, > nutch-habase.patch, searching.diff, slash.patch > > > This issue will track nutch/hbase integration -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.