[GitHub] [hbase] Apache9 commented on a change in pull request #3942: HBASE-26265 Update ref guide to mention the new store file tracker im…

2021-12-16 Thread GitBox


Apache9 commented on a change in pull request #3942:
URL: https://github.com/apache/hbase/pull/3942#discussion_r770644789



##
File path: src/main/asciidoc/_chapters/store_file_tracking.adoc
##
@@ -0,0 +1,175 @@
+
+/**
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+
+[[storefiletracking]]
+= Store File Tracking
+:doctype: book
+:numbered:
+:toc: left
+:icons: font
+:experimental:
+
+== Overview
+
+This feature introduces an abstraction layer to track store files still used/needed by store
+engines, allowing for plugging different approaches of identifying store
+files required by the given store.
+
+Historically, HBase internals have relied on creating hfiles in temporary directories first, renaming
+those files to the actual store directory at operation commit time. That's a simple and convenient
+way to separate transient from already finalised files that are ready to serve client reads with data.
+This approach works well with strongly consistent file systems, but with the popularity of less consistent
+file systems, mainly Object Store file systems, dependency on rename operations starts to introduce
+performance penalties. Amazon S3 Object Store, in particular, has been the most affected deployment,
+due to its lack of atomic renames, requiring an additional locking layer implemented by HBOSS,
+to guarantee consistency and integrity of operations.
+
+With *Store File Tracking*, the decision on where to originally create new hfiles and how to proceed upon
+commit is delegated to the specific Store File Tracking implementation.
+It can be set in individual Table or Column Family configurations, as well as in the processes'
+*hbase-site.xml* configuration file.
+
+NOTE: When specified in *hbase-site.xml*, this configuration is also saved into the table's configuration
+at table creation time. This is to avoid dangerous configuration mismatches between processes, which
+could potentially lead to data loss.
+
+== Available Implementations
+
+The initial version of Store File Tracking provides three built-in implementations:
+
+* DEFAULT
+* FILE
+* MIGRATION
+
+### DEFAULT
+
+As per the name, this is the Store File Tracking implementation used by default when no explicit
+configuration has been defined. The DEFAULT tracker implements the standard approach using temporary
+directories and renames.
+
+### FILE
+
+A file tracker implementation that creates new files straight in the store directory, avoiding the
+need for rename operations. It keeps a list of committed hfiles in memory, backed by meta files, in
+each store directory. Whenever a new hfile is committed, the list of _tracked files_ in the given
+store is updated and a new meta file is written with this list's contents, discarding the previous
+meta file, which now contains an outdated list.
+
+### MIGRATION
+
+A special implementation to be used when swapping between Store File Tracking implementations on
+pre-existing tables that already contain data, and therefore, files being tracked under a specific
+logic.
+
+== Usage
+
+For fresh deployments that don't yet contain any user data, the *FILE* implementation can simply be set as
+the value of the *hbase.store.file-tracker.impl* property in the global *hbase-site.xml* configuration, prior
+to the first HBase start. Omitting this property sets the *DEFAULT* implementation.
+
+### Switching implementations globally

Review comment:
   Filed HBASE-26586, HBASE-26587 and HBASE-26588 for these things. Will 
work on them soon once the feature branch is merged :)
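
As a rough illustration of the FILE tracker behaviour described in the quoted chapter (committed hfiles listed in memory, backed by a meta file that is rewritten on each commit), here is a minimal Python sketch. It is a toy model, not HBase code: the class name, the meta file naming scheme and the JSON format are all assumptions made for illustration.

```python
import json
import os
import tempfile


class FileBasedTrackerSketch:
    """Toy model of a FILE-style tracker (hypothetical, not the HBase class)."""

    def __init__(self, store_dir):
        self.store_dir = store_dir
        self.tracked = []      # in-memory list of committed hfile names
        self.seq = 0           # hypothetical counter used to name meta files
        self.meta_path = None  # path of the current meta file, if any

    def commit(self, hfile_name):
        # The hfile is assumed to already sit in the store directory,
        # so committing it needs no rename, only a tracker update.
        self.tracked.append(hfile_name)
        self.seq += 1
        new_meta = os.path.join(self.store_dir, f".tracker-meta-{self.seq}")
        with open(new_meta, "w") as f:
            json.dump(self.tracked, f)
        # Discard the previous meta file, whose list is now outdated.
        if self.meta_path is not None:
            os.remove(self.meta_path)
        self.meta_path = new_meta

    def load(self):
        # Read the tracked list back from the current meta file.
        with open(self.meta_path) as f:
            return json.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as store_dir:
        tracker = FileBasedTrackerSketch(store_dir)
        tracker.commit("hfile-a")
        tracker.commit("hfile-b")
        print(tracker.load())
        print(os.listdir(store_dir))
```

After two commits only the newest meta file remains in the store directory, and it lists both hfiles.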




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hbase] Apache9 commented on a change in pull request #3942: HBASE-26265 Update ref guide to mention the new store file tracker im…

2021-12-15 Thread GitBox


Apache9 commented on a change in pull request #3942:
URL: https://github.com/apache/hbase/pull/3942#discussion_r770275306



##
File path: src/main/asciidoc/_chapters/store_file_tracking.adoc
##
@@ -0,0 +1,175 @@
+### Switching implementations globally

Review comment:
   Oh, looking at the code, MigrateStoreFileTrackerProcedure will not always
set the SFT implementation to DEFAULT: if we set a MIGRATION store file tracker
globally, RollingUpgradeChore will set the SFT implementation to MIGRATION.
   
   But this does not actually work. The catch is that we will not actually
reopen all the regions in MigrateStoreFileTrackerProcedure, because we think
that we do not change anything. So this should be a bug; we should not set the
SFT implementation to anything other than DEFAULT...
   
   So for me, first, we could implement a special admin API to change the SFT 
implementation, to hide the intermediate MIGRATION state. This can be done with 
a special procedure which schedules two ModifyTableProcedure instances as
sub-procedures. And we could also implement 

[GitHub] [hbase] Apache9 commented on a change in pull request #3942: HBASE-26265 Update ref guide to mention the new store file tracker im…

2021-12-14 Thread GitBox


Apache9 commented on a change in pull request #3942:
URL: https://github.com/apache/hbase/pull/3942#discussion_r768775708



##
File path: src/main/asciidoc/_chapters/store_file_tracking.adoc
##
@@ -0,0 +1,175 @@
+### Switching implementations globally

Review comment:
   Have you tried this operation? I do not think it works this way...
   
   The global config will only affect new tables. So I think here you need to 
alter the tables one by one...
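
The point about the global setting can be illustrated with a toy Python model. It assumes, as the NOTE in the quoted chapter says, that the value from hbase-site.xml is copied into the table's configuration at table creation time. The class and method names are hypothetical and only mimic the behaviour under discussion; this is not the HBase API.

```python
TRACKER_KEY = "hbase.store.file-tracker.impl"


class ClusterSketch:
    """Toy model: each table keeps its own copy of the tracker setting."""

    def __init__(self, site_config=None):
        self.site_config = dict(site_config or {})  # stands in for hbase-site.xml
        self.tables = {}                            # table name -> table configuration

    def create_table(self, name):
        # The effective global value is saved into the table configuration
        # at creation time; omitting the property means DEFAULT.
        impl = self.site_config.get(TRACKER_KEY, "DEFAULT")
        self.tables[name] = {TRACKER_KEY: impl}

    def alter_table(self, name, impl):
        # Existing tables only change when they are altered explicitly.
        self.tables[name][TRACKER_KEY] = impl

    def tracker_for(self, name):
        return self.tables[name][TRACKER_KEY]


if __name__ == "__main__":
    cluster = ClusterSketch()
    cluster.create_table("t1")                 # created before the global change
    cluster.site_config[TRACKER_KEY] = "FILE"  # global change
    cluster.create_table("t2")                 # created after the global change
    print(cluster.tracker_for("t1"), cluster.tracker_for("t2"))
    cluster.alter_table("t1", "MIGRATION")     # per-table alter is still required
    print(cluster.tracker_for("t1"))
```

In this model the table created before the global change keeps its saved value until it is altered, which is why existing tables have to be handled one by one.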



