[jira] [Updated] (HBASE-15557) Add guidance on HashTable/SyncTable to the RefGuide

2020-04-07 Thread Nick Dimiduk (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nick Dimiduk updated HBASE-15557:
-
Issue Type: Task  (was: Bug)

> Add guidance on HashTable/SyncTable to the RefGuide
> ---
>
> Key: HBASE-15557
> URL: https://issues.apache.org/jira/browse/HBASE-15557
> Project: HBase
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 1.2.0
>Reporter: Sean Busbey
>Assignee: Wellington Chevreuil
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HBASE-15557.master.001.patch, 
> HBASE-15557.master.002.patch
>
>
> The docs for SyncTable are insufficient. Brief description from [~davelatham] 
> HBASE-13639 comment:
> {quote}
> Sorry for the lack of better documentation, Abhishek Soni. Thanks for 
> bringing it up. I'll try to provide a better explanation. You may have 
> already seen it, but if not, the design doc linked in the description above 
> may also give you some better clues as to how it should be used.
> Briefly, the feature is intended to start with a pair of tables in remote 
> clusters that are already substantially similar and make them identical by 
> comparing hashes of the data and copying only the diffs instead of having to 
> copy the entire table. So it is targeted at a very specific use case (with 
> some work it could generalize to cover things like CopyTable and 
> VerifyRepliaction but it's not there yet). To use it, you choose one table to 
> be the "source", and the other table is the "target". After the process is 
> complete the target table should end up being identical to the source table.
> In the source table's cluster, run 
> org.apache.hadoop.hbase.mapreduce.HashTable and pass it the name of the 
> source table and an output directory in HDFS. HashTable will scan the source 
> table, break the data up into row key ranges (default of 8kB per range) and 
> produce a hash of the data for each range.
> Make the hashes available to the target cluster - I'd recommend using DistCp 
> to copy it across.
> In the target table's cluster, run 
> org.apache.hadoop.hbase.mapreduce.SyncTable and pass it the directory where 
> you put the hashes, and the names of the source and destination tables. You 
> will likely also need to specify the source table's ZK quorum via the 
> --sourcezkcluster option. SyncTable will then read the hash information, and 
> compute the hashes of the same row ranges for the target table. For any row 
> range where the hash fails to match, it will open a remote scanner to the 
> source table, read the data for that range, and do Puts and Deletes to the 
> target table to update it to match the source.
> I hope that clarifies it a bit. Let me know if you need a hand. If anyone 
> wants to work on getting some documentation into the book, I can try to write 
> some more but would love a hand on turning it into an actual book patch.
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HBASE-15557) Add guidance on HashTable/SyncTable to the RefGuide

2019-03-04 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-15557:
---
Fix Version/s: (was: 2.2.0)

> Add guidance on HashTable/SyncTable to the RefGuide
> ---
>
> Key: HBASE-15557
> URL: https://issues.apache.org/jira/browse/HBASE-15557
> Project: HBase
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 1.2.0
>Reporter: Sean Busbey
>Assignee: Wellington Chevreuil
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HBASE-15557.master.001.patch, 
> HBASE-15557.master.002.patch
>
>
> The docs for SyncTable are insufficient. Brief description from [~davelatham] 
> HBASE-13639 comment:
> {quote}
> Sorry for the lack of better documentation, Abhishek Soni. Thanks for 
> bringing it up. I'll try to provide a better explanation. You may have 
> already seen it, but if not, the design doc linked in the description above 
> may also give you some better clues as to how it should be used.
> Briefly, the feature is intended to start with a pair of tables in remote 
> clusters that are already substantially similar and make them identical by 
> comparing hashes of the data and copying only the diffs instead of having to 
> copy the entire table. So it is targeted at a very specific use case (with 
> some work it could generalize to cover things like CopyTable and 
> VerifyRepliaction but it's not there yet). To use it, you choose one table to 
> be the "source", and the other table is the "target". After the process is 
> complete the target table should end up being identical to the source table.
> In the source table's cluster, run 
> org.apache.hadoop.hbase.mapreduce.HashTable and pass it the name of the 
> source table and an output directory in HDFS. HashTable will scan the source 
> table, break the data up into row key ranges (default of 8kB per range) and 
> produce a hash of the data for each range.
> Make the hashes available to the target cluster - I'd recommend using DistCp 
> to copy it across.
> In the target table's cluster, run 
> org.apache.hadoop.hbase.mapreduce.SyncTable and pass it the directory where 
> you put the hashes, and the names of the source and destination tables. You 
> will likely also need to specify the source table's ZK quorum via the 
> --sourcezkcluster option. SyncTable will then read the hash information, and 
> compute the hashes of the same row ranges for the target table. For any row 
> range where the hash fails to match, it will open a remote scanner to the 
> source table, read the data for that range, and do Puts and Deletes to the 
> target table to update it to match the source.
> I hope that clarifies it a bit. Let me know if you need a hand. If anyone 
> wants to work on getting some documentation into the book, I can try to write 
> some more but would love a hand on turning it into an actual book patch.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-15557) Add guidance on HashTable/SyncTable to the RefGuide

2019-03-03 Thread Guanghao Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guanghao Zhang updated HBASE-15557:
---
Fix Version/s: 2.2.0

> Add guidance on HashTable/SyncTable to the RefGuide
> ---
>
> Key: HBASE-15557
> URL: https://issues.apache.org/jira/browse/HBASE-15557
> Project: HBase
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 1.2.0
>Reporter: Sean Busbey
>Assignee: Wellington Chevreuil
>Priority: Critical
> Fix For: 3.0.0, 2.2.0
>
> Attachments: HBASE-15557.master.001.patch, 
> HBASE-15557.master.002.patch
>
>
> The docs for SyncTable are insufficient. Brief description from [~davelatham] 
> HBASE-13639 comment:
> {quote}
> Sorry for the lack of better documentation, Abhishek Soni. Thanks for 
> bringing it up. I'll try to provide a better explanation. You may have 
> already seen it, but if not, the design doc linked in the description above 
> may also give you some better clues as to how it should be used.
> Briefly, the feature is intended to start with a pair of tables in remote 
> clusters that are already substantially similar and make them identical by 
> comparing hashes of the data and copying only the diffs instead of having to 
> copy the entire table. So it is targeted at a very specific use case (with 
> some work it could generalize to cover things like CopyTable and 
> VerifyRepliaction but it's not there yet). To use it, you choose one table to 
> be the "source", and the other table is the "target". After the process is 
> complete the target table should end up being identical to the source table.
> In the source table's cluster, run 
> org.apache.hadoop.hbase.mapreduce.HashTable and pass it the name of the 
> source table and an output directory in HDFS. HashTable will scan the source 
> table, break the data up into row key ranges (default of 8kB per range) and 
> produce a hash of the data for each range.
> Make the hashes available to the target cluster - I'd recommend using DistCp 
> to copy it across.
> In the target table's cluster, run 
> org.apache.hadoop.hbase.mapreduce.SyncTable and pass it the directory where 
> you put the hashes, and the names of the source and destination tables. You 
> will likely also need to specify the source table's ZK quorum via the 
> --sourcezkcluster option. SyncTable will then read the hash information, and 
> compute the hashes of the same row ranges for the target table. For any row 
> range where the hash fails to match, it will open a remote scanner to the 
> source table, read the data for that range, and do Puts and Deletes to the 
> target table to update it to match the source.
> I hope that clarifies it a bit. Let me know if you need a hand. If anyone 
> wants to work on getting some documentation into the book, I can try to write 
> some more but would love a hand on turning it into an actual book patch.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-15557) Add guidance on HashTable/SyncTable to the RefGuide

2018-11-07 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-15557:

Summary: Add guidance on HashTable/SyncTable to the RefGuide  (was: 
document SyncTable in ref guide)

> Add guidance on HashTable/SyncTable to the RefGuide
> ---
>
> Key: HBASE-15557
> URL: https://issues.apache.org/jira/browse/HBASE-15557
> Project: HBase
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 1.2.0
>Reporter: Sean Busbey
>Assignee: Wellington Chevreuil
>Priority: Critical
> Attachments: HBASE-15557.master.001.patch, 
> HBASE-15557.master.002.patch
>
>
> The docs for SyncTable are insufficient. Brief description from [~davelatham] 
> HBASE-13639 comment:
> {quote}
> Sorry for the lack of better documentation, Abhishek Soni. Thanks for 
> bringing it up. I'll try to provide a better explanation. You may have 
> already seen it, but if not, the design doc linked in the description above 
> may also give you some better clues as to how it should be used.
> Briefly, the feature is intended to start with a pair of tables in remote 
> clusters that are already substantially similar and make them identical by 
> comparing hashes of the data and copying only the diffs instead of having to 
> copy the entire table. So it is targeted at a very specific use case (with 
> some work it could generalize to cover things like CopyTable and 
> VerifyRepliaction but it's not there yet). To use it, you choose one table to 
> be the "source", and the other table is the "target". After the process is 
> complete the target table should end up being identical to the source table.
> In the source table's cluster, run 
> org.apache.hadoop.hbase.mapreduce.HashTable and pass it the name of the 
> source table and an output directory in HDFS. HashTable will scan the source 
> table, break the data up into row key ranges (default of 8kB per range) and 
> produce a hash of the data for each range.
> Make the hashes available to the target cluster - I'd recommend using DistCp 
> to copy it across.
> In the target table's cluster, run 
> org.apache.hadoop.hbase.mapreduce.SyncTable and pass it the directory where 
> you put the hashes, and the names of the source and destination tables. You 
> will likely also need to specify the source table's ZK quorum via the 
> --sourcezkcluster option. SyncTable will then read the hash information, and 
> compute the hashes of the same row ranges for the target table. For any row 
> range where the hash fails to match, it will open a remote scanner to the 
> source table, read the data for that range, and do Puts and Deletes to the 
> target table to update it to match the source.
> I hope that clarifies it a bit. Let me know if you need a hand. If anyone 
> wants to work on getting some documentation into the book, I can try to write 
> some more but would love a hand on turning it into an actual book patch.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HBASE-15557) Add guidance on HashTable/SyncTable to the RefGuide

2018-11-07 Thread Sean Busbey (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Busbey updated HBASE-15557:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

merged. Thanks again [~wchevreuil] this is a great doc addition!

maybe for follow-on, this bit sounds like an error condition we should detect?

{code}
+.Set sourcezkcluster to the actual source cluster ZK quorum
+[NOTE]
+
+Although not required, if sourcezkcluster is not set, SyncTable will connect 
to local HBase cluster for both source and target,
+which does not give any meaningful result.
{code}


> Add guidance on HashTable/SyncTable to the RefGuide
> ---
>
> Key: HBASE-15557
> URL: https://issues.apache.org/jira/browse/HBASE-15557
> Project: HBase
>  Issue Type: Bug
>  Components: documentation
>Affects Versions: 1.2.0
>Reporter: Sean Busbey
>Assignee: Wellington Chevreuil
>Priority: Critical
> Fix For: 3.0.0
>
> Attachments: HBASE-15557.master.001.patch, 
> HBASE-15557.master.002.patch
>
>
> The docs for SyncTable are insufficient. Brief description from [~davelatham] 
> HBASE-13639 comment:
> {quote}
> Sorry for the lack of better documentation, Abhishek Soni. Thanks for 
> bringing it up. I'll try to provide a better explanation. You may have 
> already seen it, but if not, the design doc linked in the description above 
> may also give you some better clues as to how it should be used.
> Briefly, the feature is intended to start with a pair of tables in remote 
> clusters that are already substantially similar and make them identical by 
> comparing hashes of the data and copying only the diffs instead of having to 
> copy the entire table. So it is targeted at a very specific use case (with 
> some work it could generalize to cover things like CopyTable and 
> VerifyRepliaction but it's not there yet). To use it, you choose one table to 
> be the "source", and the other table is the "target". After the process is 
> complete the target table should end up being identical to the source table.
> In the source table's cluster, run 
> org.apache.hadoop.hbase.mapreduce.HashTable and pass it the name of the 
> source table and an output directory in HDFS. HashTable will scan the source 
> table, break the data up into row key ranges (default of 8kB per range) and 
> produce a hash of the data for each range.
> Make the hashes available to the target cluster - I'd recommend using DistCp 
> to copy it across.
> In the target table's cluster, run 
> org.apache.hadoop.hbase.mapreduce.SyncTable and pass it the directory where 
> you put the hashes, and the names of the source and destination tables. You 
> will likely also need to specify the source table's ZK quorum via the 
> --sourcezkcluster option. SyncTable will then read the hash information, and 
> compute the hashes of the same row ranges for the target table. For any row 
> range where the hash fails to match, it will open a remote scanner to the 
> source table, read the data for that range, and do Puts and Deletes to the 
> target table to update it to match the source.
> I hope that clarifies it a bit. Let me know if you need a hand. If anyone 
> wants to work on getting some documentation into the book, I can try to write 
> some more but would love a hand on turning it into an actual book patch.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)