Sounds like quite a puzzle. You mentioned that column filters work on data written through manual Puts from the shell, but not on data that came in through the Import - so there must be something different about the data itself once it's in the table. Can you compare a row that was imported to a row that was manually written, and show us both?
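For instance, something along these lines - 'testrow1' and its value are just placeholders, and the long row key is the one from your sample output:

# fetch one imported row to see its columns as stored
hbase(main):001:0> get 'content', 'A:9223370612274019807:twtr:56935907581904486'

# write a comparable cell by hand
hbase(main):002:0> put 'content', 'testrow1', 'x:twitter:username', 'test value'

# a raw scan prints the stored KeyValues without any filtering, so you can
# compare the family/qualifier/timestamp bytes of the two rows side by side
hbase(main):003:0> scan 'content', {STARTROW => 'testrow1', LIMIT => 1, RAW => true, VERSIONS => 10}

If the family or qualifier bytes differ between the imported row and the hand-written one, that would explain why COLUMNS=>'x:twitter:username' matches one but not the other.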
On Wed, May 27, 2015 at 7:09 AM, <[email protected]> wrote:
> So more experimentation over the long weekend on this.
>
> If I load sample data into the new cluster table manually through the
> shell, column filters work as expected.
>
> Obviously not a solution to the problem. Anyone have any ideas or things
> I should be looking at? The regionserver logs show nothing unusual.
>
> Is there another export/import chain I could try?
>
> Thanks,
> Zack
>
>
> On Sun, May 24, 2015, at 11:43 AM, [email protected] wrote:
>> Hello all-
>>
>> I'm hoping someone can point me in the right direction, as I've exhausted
>> all my knowledge and abilities on the topic...
>>
>> I've inherited an old, poorly configured, and brittle CDH4 cluster
>> running HBase 0.92. I'm attempting to migrate the data to a new Ambari
>> cluster running HBase 0.98, and to do it without changing anything on
>> the old cluster, as I have a hard enough time keeping it running as is.
>> Also, due to configuration issues with the old cluster (on AWS), a
>> direct HBase-to-HBase table copy, or even an HDFS-to-HDFS copy, is out
>> of the question at the moment.
>>
>> I was able to use the export task on the old cluster to dump the HBase
>> tables to HDFS, which I then distcp'd (via s3n) up to S3, then back down
>> to the new cluster, and then ran the HBase importer. This appears to
>> work fine...
>>
>> ... except that on the new cluster, table scans with column filters do
>> not work.
>>
>> A sample row looks something like this:
>> A:9223370612274019807:twtr:56935907581904486 column=x:twitter:username,
>> timestamp=1424592575087, value=Bilo Selhi
>>
>> Unfortunately, even though I can see the column is properly defined, I
>> cannot filter on it:
>>
>> hbase(main):015:0> scan 'content', {LIMIT=>10,
>> COLUMNS=>'x:twitter:username'}
>> ROW COLUMN+CELL
>> 0 row(s) in 352.7990 seconds
>>
>> Any ideas what the heck is going on here?
>>
>> Here's the rough process I used for the export/import:
>>
>> Old cluster:
>> $ hbase org.apache.hadoop.hbase.mapreduce.Driver export content \
>>     hdfs:///hbase_content
>> $ hadoop distcp -Dfs.s3n.awsAccessKeyId='xxxx' \
>>     -Dfs.s3n.awsSecretAccessKey='xxxx' -i hdfs:///hbase_content \
>>     s3n://hbase_content
>>
>> New cluster:
>> $ hadoop distcp -Dfs.s3n.awsAccessKeyId='xxxx' \
>>     -Dfs.s3n.awsSecretAccessKey='xxxx' -i s3n://hbase_content \
>>     hdfs:///hbase_content
>> $ hbase -Dhbase.import.version=0.94 \
>>     org.apache.hadoop.hbase.mapreduce.Driver import content \
>>     hdfs:///hbase_content
>>
>> Thanks!
>> Z
