We find that the metadata of offline regions is included in the snapshot.

When we query the table directly, offline regions are not considered.
When we query a snapshot of this table, offline regions are included.
These offline regions refer to the same data in HDFS as the live regions, 
which is why duplicate records are returned from the snapshot.
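
To see this directly, we can scan the snapshot without MapReduce using 
TableSnapshotScanner and count how often each row key comes back.  A minimal 
sketch (the snapshot name and restore directory below are placeholders):

    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.{Scan, TableSnapshotScanner}
    import org.apache.hadoop.hbase.util.Bytes
    import scala.collection.JavaConverters._

    val conf = HBaseConfiguration.create()
    val scanner = new TableSnapshotScanner(
      conf,
      new Path("/tmp/snapshot-restore"),  // placeholder scratch dir
      "my_snapshot",                      // placeholder snapshot name
      new Scan().setMaxVersions(1))
    try {
      // Count how many times the snapshot scan returns each row key;
      // a count > 1 means the row is reachable through more than one region.
      val counts = scanner.iterator().asScala
        .map(r => Bytes.toStringBinary(r.getRow))
        .foldLeft(Map.empty[String, Int]) { (m, k) =>
          m.updated(k, m.getOrElse(k, 0) + 1)
        }
      counts.filter(_._2 > 1).foreach { case (k, n) =>
        println(s"row $k returned $n times")
      }
    } finally {
      scanner.close()
    }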


Any suggestions on how to handle this gracefully?
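
As a stopgap we could deduplicate on the Spark side, since the extra rows are 
copies of the same data.  A minimal sketch, assuming the hbase-rdd read yields 
a pair RDD keyed by row key:

    import scala.reflect.ClassTag
    import org.apache.spark.rdd.RDD

    // Keep one record per row key; safe here because the duplicates are
    // identical copies served through offline-region references.
    def dedupeByRowKey[V: ClassTag](rows: RDD[(String, V)]): RDD[(String, V)] =
      rows.reduceByKey((first, _) => first)

But this shuffles the whole dataset, so a fix at the snapshot level would be 
preferable.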



At 2018-05-17 19:04:17, "shanghaihyj" <shanghai...@163.com> wrote:
>We are loading data from the HBase table or its snapshot with hbase-rdd 
>(https://github.com/unicredit/hbase-rdd), which uses TableInputFormat / 
>TableSnapshotInputFormat as the underlying input format.
>The scanner has max versions set to 1.
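>
>For reference, the read is roughly equivalent to the standard HBase MR setup 
>below (a sketch only; the snapshot name and restore directory are 
>placeholders, and sc is the SparkContext):
>
>    import org.apache.hadoop.fs.Path
>    import org.apache.hadoop.hbase.HBaseConfiguration
>    import org.apache.hadoop.hbase.client.{Result, Scan}
>    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
>    import org.apache.hadoop.hbase.mapreduce.{IdentityTableMapper,
>      TableMapReduceUtil, TableSnapshotInputFormat}
>    import org.apache.hadoop.mapreduce.Job
>
>    val job = Job.getInstance(HBaseConfiguration.create())
>    // Registers the snapshot scan (max versions = 1) on the job config.
>    TableMapReduceUtil.initTableSnapshotMapperJob(
>      "my_snapshot", new Scan().setMaxVersions(1),
>      classOf[IdentityTableMapper], classOf[ImmutableBytesWritable],
>      classOf[Result], job, false, new Path("/tmp/snapshot-restore"))
>    val rdd = sc.newAPIHadoopRDD(job.getConfiguration,
>      classOf[TableSnapshotInputFormat],
>      classOf[ImmutableBytesWritable], classOf[Result])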
>
>
>
>At 2018-05-17 15:35:08, "shanghaihyj" <shanghai...@163.com> wrote:
>
>When we query the table by a particular row key, HBase returns exactly one 
>row, as expected.
>However, when we query a snapshot of that same table by the same row key, 
>five duplicate rows are returned.  Why?
>
>
>
>
>In the log of the master server, we see a snapshot-related error:
>===================== ERROR START =====================
>ERROR [master:sh-bs-3-b8-namenode-17-208:60000.archivedHFileCleaner] 
>snapshot.SnapshotHFileCleaner: Exception while checking if files were valid, 
>keeping them just in case.
>(from ./hbase-root-master-sh-bs-3-b8-namenode-17-208.log.7)
>org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read 
>snapshot info from: hdfs://master1.hh:8020/hbase/.hbase-snapshot/.tmp/hb_anchor_original_total_7days_stat_1526423587063/.snapshotinfo
>    at org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils.readSnapshotInfo(SnapshotDescriptionUtils.java:325)
>    at org.apache.hadoop.hbase.snapshot.SnapshotReferenceUtil.getHFileNames(SnapshotReferenceUtil.java:328)
>    at org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner$1.filesUnderSnapshot(SnapshotHFileCleaner.java:85)
>    at org.apache.hadoop.hbase.master.snapshot.SnapshotFileCache.getSnapshotsInProgress(SnapshotFileCache.java:303)
>    at org.apache.hadoop.hbase.master.snapshot.SnapshotFileCache.getUnreferencedFiles(SnapshotFileCache.java:194)
>    at org.apache.hadoop.hbase.master.snapshot.SnapshotHFileCleaner.getDeletableFiles(SnapshotHFileCleaner.java:62)
>    at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteFiles(CleanerChore.java:233)
>    at org.apache.hadoop.hbase.master.cleaner.CleanerChore.checkAndDeleteEntries(CleanerChore.java:157)
>...
>===================== ERROR END =====================
>We also found an issue that looks related to this error: 
>https://issues.apache.org/jira/browse/HBASE-16464?attachmentSortBy=fileName
>
>
>However, there is no proof that this error in the log is related to our 
>problem of duplicate records being returned from the snapshot.
>Our HBase version is 0.98.18-hadoop2.
>
>
>Could you give us a hint as to why we are getting duplicate records from 
>the snapshot?
