[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r358057459 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/TimelineLayout.java ## @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.common.table; + +import org.apache.hudi.common.model.TimelineLayoutVersion; +import org.apache.hudi.common.table.timeline.HoodieInstant; +import org.apache.hudi.common.util.collection.Pair; + +import java.io.Serializable; +import java.util.HashMap; +import java.util.Map; +import java.util.stream.Collectors; +import java.util.stream.Stream; + + +/** + * Timeline Layout responsible for applying specific filters when generating timeline instants. + */ +public abstract class TimelineLayout implements Serializable { + + private static final Map LAYOUT_MAP = new HashMap<>(); + + static { +LAYOUT_MAP.put(new TimelineLayoutVersion(TimelineLayoutVersion.VERSION_0), new TimelineLayoutV0()); +LAYOUT_MAP.put(new TimelineLayoutVersion(TimelineLayoutVersion.VERSION_1), new TimelineLayoutV1()); + } + + public static TimelineLayout getLayout(TimelineLayoutVersion version) { +return LAYOUT_MAP.get(version); + } + + public abstract Stream filterHoodieInstants(Stream instantStream); + + /** + * Table Layout where state transitions are managed by renaming files. + */ + private static class TimelineLayoutV0 extends TimelineLayout { Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r358011854 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java ## @@ -414,23 +436,41 @@ public String getCommitActionType() { /** * Helper method to scan all hoodie-instant metafiles and construct HoodieInstant objects. * - * @param fs FileSystem - * @param metaPath Meta Path where hoodie instants are present * @param includedExtensions Included hoodie extensions + * @param applyLayoutVersionFilters Depending on Timeline layout version, if there are multiple states for the same + * action instant, only include the highest state * @return List of Hoodie Instants generated * @throws IOException in case of failure */ - public static List scanHoodieInstantsFromFileSystem(FileSystem fs, Path metaPath, - Set includedExtensions) throws IOException { -return Arrays.stream(HoodieTableMetaClient.scanFiles(fs, metaPath, path -> { - // Include only the meta files with extensions that needs to be included - String extension = FSUtils.getFileExtension(path.getName()); - return includedExtensions.contains(extension); -})).sorted(Comparator.comparing( -// Sort the meta-data by the instant time (first part of the file name) -fileStatus -> FSUtils.getInstantTime(fileStatus.getPath().getName( -// create HoodieInstantMarkers from FileStatus, which extracts properties -.map(HoodieInstant::new).collect(Collectors.toList()); + public List scanHoodieInstantsFromFileSystem(Set includedExtensions, + boolean applyLayoutVersionFilters) throws IOException { +return scanHoodieInstantsFromFileSystem(new Path(metaPath), includedExtensions, applyLayoutVersionFilters); + } + + /** + * Helper method to scan all hoodie-instant metafiles and construct HoodieInstant objects. + * + * @param timelinePath MetaPath where instant files are stored + * @param includedExtensions Included hoodie extensions + * @param applyLayoutVersionFilters Depending on Timeline layout version, if there are multiple states for the same + * action instant, only include the highest state + * @return List of Hoodie Instants generated + * @throws IOException in case of failure + */ + public List scanHoodieInstantsFromFileSystem(Path timelinePath, Set includedExtensions, + boolean applyLayoutVersionFilters) throws IOException { +Stream instantStream = Arrays.stream( +HoodieTableMetaClient +.scanFiles(getFs(), timelinePath, path -> { + // Include only the meta files with extensions that needs to be included + String extension = FSUtils.getFileExtension(path.getName()); + return includedExtensions.contains(extension); +})).map(HoodieInstant::new); + +if (applyLayoutVersionFilters) { + instantStream = TimelineLayout.getLayout(getTimelineLayoutVersion()).filterHoodieInstants(instantStream); Review comment: The key case here is for archival where we need all instants without filtering. May be introduce couple of factory methods which instantiate HoodieActiveTimeline w/o filtering ? HUDI-414 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r358009213 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java ## @@ -112,6 +116,10 @@ public static void createHoodieProperties(FileSystem fs, Path metadataFolder, Pr if (!properties.containsKey(HOODIE_ARCHIVELOG_FOLDER_PROP_NAME)) { properties.setProperty(HOODIE_ARCHIVELOG_FOLDER_PROP_NAME, DEFAULT_ARCHIVELOG_FOLDER); } + if (!properties.containsKey(HOODIE_TIMELINE_LAYOUT_VERSION)) { Review comment: The version in hoodie.properties is treated as default configuration. MetaClient Writers override this config with version-1 (unless they opt out). MetaClient readers will always use latest version (version - 1) which is backwards compatible. Keeping a version in hoodie.properties would also hopefully be useful in future migrations This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r358007869 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java ## @@ -274,25 +286,35 @@ public synchronized HoodieArchivedTimeline getArchivedTimeline() { */ public static HoodieTableMetaClient initTableType(Configuration hadoopConf, String basePath, String tableType, String tableName, String archiveLogFolder) throws IOException { -HoodieTableType type = HoodieTableType.valueOf(tableType); -Properties properties = new Properties(); -properties.put(HoodieTableConfig.HOODIE_TABLE_NAME_PROP_NAME, tableName); -properties.put(HoodieTableConfig.HOODIE_TABLE_TYPE_PROP_NAME, type.name()); -properties.put(HoodieTableConfig.HOODIE_ARCHIVELOG_FOLDER_PROP_NAME, archiveLogFolder); -return HoodieTableMetaClient.initDatasetAndGetMetaClient(hadoopConf, basePath, properties); +return initTableType(hadoopConf, basePath, HoodieTableType.valueOf(tableType), tableName, +archiveLogFolder, null, null); } /** * Helper method to initialize a given path, as a given storage type and table name. */ public static HoodieTableMetaClient initTableType(Configuration hadoopConf, String basePath, HoodieTableType tableType, String tableName, String payloadClassName) throws IOException { +return initTableType(hadoopConf, basePath, tableType, tableName, null, payloadClassName, null); + } + + public static HoodieTableMetaClient initTableType(Configuration hadoopConf, String basePath, + HoodieTableType tableType, String tableName, String archiveLogFolder, String payloadClassName, + Integer timelineLayoutVersion) throws IOException { Properties properties = new Properties(); properties.setProperty(HoodieTableConfig.HOODIE_TABLE_NAME_PROP_NAME, tableName); properties.setProperty(HoodieTableConfig.HOODIE_TABLE_TYPE_PROP_NAME, tableType.name()); -if (tableType == HoodieTableType.MERGE_ON_READ) { +if ((tableType == HoodieTableType.MERGE_ON_READ) && (payloadClassName != null)) { Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r358007840 ## File path: hudi-common/src/main/avro/HoodieArchivedMetaEntry.avsc ## @@ -37,6 +37,7 @@ "default": null }, { + /** DEPRECATED **/ Review comment: Yeah, This is was intended to be equivalent of HoodieCommitMetadata but since we are using HoodieCommitMetadata for storing results of compaction, this is never used. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r358007852 ## File path: hudi-common/src/main/java/org/apache/hudi/common/model/TimelineLayoutVersion.java ## @@ -0,0 +1,79 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.common.model; + +import com.google.common.base.Preconditions; + +import java.io.Serializable; +import java.util.Objects; + +/** + * Metadata Layout Version. Add new version when timeline format changes + */ +public class TimelineLayoutVersion implements Serializable, Comparable { + + public static final Integer VERSION_0 = 0; // pre 0.5.1 version format + public static final Integer VERSION_1 = 1; // current version with no renames + + public static final Integer CURR_VERSION = VERSION_1; + public static final TimelineLayoutVersion LATEST_TIMELINE_LAYOUT_VERSION = new TimelineLayoutVersion(CURR_VERSION); Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r358007673 ## File path: hudi-common/src/main/avro/HoodieArchivedMetaEntry.avsc ## @@ -74,6 +75,27 @@ "name":"version", "type":["int", "null"], "default": 1 + }, + { + "name":"hoodieCompactionPlan", + "type":[ +"null", +"HoodieCompactionPlan" + ], + "default": null + }, + { + "name":"hoodieCleanerPlan", + "type":[ +"null", +"HoodieCleanerPlan" + ], + "default": null + }, + { + "name":"actionState", Review comment: Yes, both actionType and instantTime are already available. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r358007647 ## File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java ## @@ -733,6 +746,11 @@ public HoodieWriteConfig build() { setDefaultOnCondition(props, !isConsistencyGuardSet, ConsistencyGuardConfig.newBuilder().fromProperties(props).build()); + String layoutVersion = props.getProperty(OVERRIDDEN_TIMELINE_LAYOUT_VERSION); Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r358007621 ## File path: hudi-client/src/main/java/org/apache/hudi/config/HoodieWriteConfig.java ## @@ -48,6 +49,7 @@ public class HoodieWriteConfig extends DefaultHoodieConfig { public static final String TABLE_NAME = "hoodie.table.name"; + private static final String OVERRIDDEN_TIMELINE_LAYOUT_VERSION = "hoodie.timeline.layout.version"; Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r358007569 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieInstant.java ## @@ -32,7 +35,24 @@ * * @see HoodieTimeline */ -public class HoodieInstant implements Serializable { +public class HoodieInstant implements Serializable, Comparable { + + /** + * A COMPACTION action eventually becomes COMMIT when completed. So, when grouping instants + * for state transitions, this needs to be taken into account + */ + private static final Map COMPARABLE_ACTIONS = new ImmutableMap.Builder() + .put(HoodieTimeline.COMPACTION_ACTION, HoodieTimeline.COMMIT_ACTION).build(); + + public static final Comparator ACTION_COMPARATOR = + Comparator.comparing(instant -> getCompatibleAction(instant.getAction())); + + public static final Comparator COMPARATOR = Comparator.comparing(HoodieInstant::getTimestamp) + .thenComparing(ACTION_COMPARATOR).thenComparing(HoodieInstant::getState); + + public static final String getCompatibleAction(String action) { Review comment: Picked Comparable. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r358007553 ## File path: hudi-client/src/main/java/org/apache/hudi/HoodieWriteClient.java ## @@ -792,23 +798,25 @@ public void restoreToInstant(final String instantTime) throws HoodieRollbackExce .filter(instant -> HoodieActiveTimeline.GREATER.test(instant.getTimestamp(), instantTime)) .collect(Collectors.toList()); // Start a rollback instant for all commits to be rolled back -String startRollbackInstant = startInstant(); +String startRollbackInstant = HoodieActiveTimeline.createNewCommitTime(); Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r358007581 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieActiveTimeline.java ## @@ -61,11 +62,11 @@ public static final SimpleDateFormat COMMIT_FORMATTER = new SimpleDateFormat("MMddHHmmss"); - public static final Set VALID_EXTENSIONS_IN_ACTIVE_TIMELINE = - new HashSet<>(Arrays.asList(new String[] {COMMIT_EXTENSION, INFLIGHT_COMMIT_EXTENSION, DELTA_COMMIT_EXTENSION, - INFLIGHT_DELTA_COMMIT_EXTENSION, SAVEPOINT_EXTENSION, INFLIGHT_SAVEPOINT_EXTENSION, CLEAN_EXTENSION, - INFLIGHT_CLEAN_EXTENSION, REQUESTED_CLEAN_EXTENSION, INFLIGHT_COMPACTION_EXTENSION, - REQUESTED_COMPACTION_EXTENSION, INFLIGHT_RESTORE_EXTENSION, RESTORE_EXTENSION})); + public static final Set VALID_EXTENSIONS_IN_ACTIVE_TIMELINE = new HashSet<>(Arrays.asList( + new String[]{COMMIT_EXTENSION, INFLIGHT_COMMIT_EXTENSION, REQUESTED_COMMIT_EXTENTSION, DELTA_COMMIT_EXTENSION, + INFLIGHT_DELTA_COMMIT_EXTENSION, REQUESTED_DELTA_COMMIT_EXTENTSION, SAVEPOINT_EXTENSION, Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r358007546 ## File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/DatasetsCommand.java ## @@ -85,6 +86,8 @@ public String createTable( @CliOption(key = {"tableName"}, mandatory = true, help = "Hoodie Table Name") final String name, @CliOption(key = {"tableType"}, unspecifiedDefaultValue = "COPY_ON_WRITE", help = "Hoodie Table Type. Must be one of : COPY_ON_WRITE or MERGE_ON_READ") final String tableTypeStr, + @CliOption(key = {"archiveLogFolder"}, help = "Folder Name for storing archived timeline") String archiveFolder, + @CliOption(key = {"layoutVersion"}, help = "Specific Layouyt Version to use") Integer layoutVersion, Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r358007597 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieActiveTimeline.java ## @@ -61,11 +62,11 @@ public static final SimpleDateFormat COMMIT_FORMATTER = new SimpleDateFormat("MMddHHmmss"); - public static final Set VALID_EXTENSIONS_IN_ACTIVE_TIMELINE = - new HashSet<>(Arrays.asList(new String[] {COMMIT_EXTENSION, INFLIGHT_COMMIT_EXTENSION, DELTA_COMMIT_EXTENSION, - INFLIGHT_DELTA_COMMIT_EXTENSION, SAVEPOINT_EXTENSION, INFLIGHT_SAVEPOINT_EXTENSION, CLEAN_EXTENSION, - INFLIGHT_CLEAN_EXTENSION, REQUESTED_CLEAN_EXTENSION, INFLIGHT_COMPACTION_EXTENSION, - REQUESTED_COMPACTION_EXTENSION, INFLIGHT_RESTORE_EXTENSION, RESTORE_EXTENSION})); + public static final Set VALID_EXTENSIONS_IN_ACTIVE_TIMELINE = new HashSet<>(Arrays.asList( + new String[]{COMMIT_EXTENSION, INFLIGHT_COMMIT_EXTENSION, REQUESTED_COMMIT_EXTENTSION, DELTA_COMMIT_EXTENSION, Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r356340283 ## File path: hudi-client/src/main/java/org/apache/hudi/HoodieWriteClient.java ## @@ -824,14 +843,14 @@ private String startInstant() { "Found commits after time :" + lastCommit + ", please rollback greater commits first"); } -List inflights = - inflightCommitTimeline.getInstants().map(HoodieInstant::getTimestamp).collect(Collectors.toList()); +List inflights = inflightAndRequestedCommitTimeline.getInstants().map(HoodieInstant::getTimestamp) Review comment: Yes, it should work fine. For restoring (reverting completed) compactions, we would still delete log files as part of first rolling-back the inflight state of the compaction before we try to rollback requested state. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r354245375 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieInstant.java ## @@ -30,7 +34,23 @@ * * @see HoodieTimeline */ -public class HoodieInstant implements Serializable { +public class HoodieInstant implements Serializable, Comparable { + + /** + * A COMPACTION action eventually becomes COMMIT when completed. So, when grouping instants + * for state transitions, this needs to be taken into account + */ + private static final Set comparableActions = Arrays.stream(new String[] { HoodieTimeline.COMMIT_ACTION, Review comment: Added a jira to track this : https://jira.apache.org/jira/browse/HUDI-384 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r354243394 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieActiveTimeline.java ## @@ -312,8 +332,11 @@ public HoodieInstant transitionCompactionInflightToComplete(HoodieInstant inflig } private void createFileInAuxiliaryFolder(HoodieInstant instant, Option data) { -Path fullPath = new Path(metaClient.getMetaAuxiliaryPath(), instant.getFileName()); -createFileInPath(fullPath, data); +if (metaClient.getMetadataVersion().isNullVersion()) { + // No need for auxiliary folder for newer versions Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r354241852 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieActiveTimeline.java ## @@ -212,25 +224,24 @@ public void saveAsComplete(HoodieInstant instant, Option data) { log.info("Completed " + instant); } - public void revertToInflight(HoodieInstant instant) { -log.info("Reverting " + instant + " to inflight "); -revertStateTransition(instant, HoodieTimeline.getInflightInstant(instant)); -log.info("Reverted " + instant + " to inflight"); - } - - public HoodieInstant revertToRequested(HoodieInstant instant) { -log.warn("Reverting " + instant + " to requested "); -HoodieInstant requestedInstant = HoodieTimeline.getRequestedInstant(instant); -revertStateTransition(instant, HoodieTimeline.getRequestedInstant(instant)); -log.warn("Reverted " + instant + " to requested"); -return requestedInstant; + public HoodieInstant revertToInflight(HoodieInstant instant) { +log.info("Reverting instant to inflight " + instant); +HoodieInstant inflight = HoodieTimeline.getInflightInstant(instant, metaClient.getTableType()); +revertCompleteToInflight(instant, inflight); Review comment: Deletes are atomic in GCS but not so in S3. Thought about this. It shouldn't bring in new failure scenarios here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r354242118 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieActiveTimeline.java ## @@ -260,8 +271,13 @@ private void deleteInstantFile(HoodieInstant instant) { /** BEGIN - COMPACTION RELATED META-DATA MANAGEMENT **/ - public Option getInstantAuxiliaryDetails(HoodieInstant instant) { -Path detailPath = new Path(metaClient.getMetaAuxiliaryPath(), instant.getFileName()); + public Option getPlanDetailsInBytes(HoodieInstant instant) { Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r354240466 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTimeline.java ## @@ -250,7 +253,10 @@ static HoodieInstant getCompactionInflightInstant(final String timestamp) { return new HoodieInstant(State.INFLIGHT, COMPACTION_ACTION, timestamp); } - static HoodieInstant getInflightInstant(final HoodieInstant instant) { + static HoodieInstant getInflightInstant(final HoodieInstant instant, final HoodieTableType tableType) { +if ((tableType == HoodieTableType.MERGE_ON_READ) && instant.getAction().equals(COMMIT_ACTION)) { Review comment: Added This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r354240802 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieActiveTimeline.java ## @@ -151,17 +174,6 @@ public HoodieTimeline getDeltaCommitTimeline() { (Function> & Serializable) this::getInstantDetails); } - /** Review comment: Reverted. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r354236756 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java ## @@ -420,17 +427,50 @@ public String getCommitActionType() { * @return List of Hoodie Instants generated * @throws IOException in case of failure */ - public static List scanHoodieInstantsFromFileSystem(FileSystem fs, Path metaPath, - Set includedExtensions) throws IOException { -return Arrays.stream(HoodieTableMetaClient.scanFiles(fs, metaPath, path -> { - // Include only the meta files with extensions that needs to be included - String extension = FSUtils.getFileExtension(path.getName()); - return includedExtensions.contains(extension); -})).sorted(Comparator.comparing( -// Sort the meta-data by the instant time (first part of the file name) -fileStatus -> FSUtils.getInstantTime(fileStatus.getPath().getName( -// create HoodieInstantMarkers from FileStatus, which extracts properties -.map(HoodieInstant::new).collect(Collectors.toList()); + public static List scanHoodieInstantsFromFileSystem( + FileSystem fs, Path metaPath, Set includedExtensions) throws IOException { +return scanHoodieInstantsFromFileSystem(fs, metaPath, includedExtensions, true); + } + + /** + * Helper method to scan all hoodie-instant metafiles and construct HoodieInstant objects + * + * @param fs FileSystem + * @param metaPath Meta Path where hoodie instants are present + * @param includedExtensions Included hoodie extensions + * @param excludeIntermediateStates If there are multiple states for the same action instant, + * only include the highest state + * @return List of Hoodie Instants generated + * @throws IOException in case of failure + */ + public static List scanHoodieInstantsFromFileSystem( + FileSystem fs, Path metaPath, Set includedExtensions, boolean excludeIntermediateStates) + throws IOException { +Stream instantStream = Arrays.stream( +HoodieTableMetaClient +.scanFiles(fs, metaPath, path -> { + // Include only the meta files with extensions that needs to be included + String extension = FSUtils.getFileExtension(path.getName()); + return includedExtensions.contains(extension); +})).map(HoodieInstant::new); + +if (excludeIntermediateStates) { + // Remove intermediate states for each (ts,action) pair + instantStream = dedupeInstants(instantStream); +} +return instantStream.sorted().collect(Collectors.toList()); + } + + public static Stream dedupeInstants(Stream instantStream) { +return instantStream.collect(Collectors.groupingBy(x -> Pair.of(x.getTimestamp(), +x.getAction().equals(HoodieTimeline.COMPACTION_ACTION) ? HoodieTimeline.COMMIT_ACTION : x.getAction( +.entrySet().stream().map(e -> e.getValue().stream().reduce((x, y) -> { Review comment: Fixed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r354235768 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java ## @@ -420,17 +427,50 @@ public String getCommitActionType() { * @return List of Hoodie Instants generated * @throws IOException in case of failure */ - public static List scanHoodieInstantsFromFileSystem(FileSystem fs, Path metaPath, - Set includedExtensions) throws IOException { -return Arrays.stream(HoodieTableMetaClient.scanFiles(fs, metaPath, path -> { - // Include only the meta files with extensions that needs to be included - String extension = FSUtils.getFileExtension(path.getName()); - return includedExtensions.contains(extension); -})).sorted(Comparator.comparing( -// Sort the meta-data by the instant time (first part of the file name) -fileStatus -> FSUtils.getInstantTime(fileStatus.getPath().getName( -// create HoodieInstantMarkers from FileStatus, which extracts properties -.map(HoodieInstant::new).collect(Collectors.toList()); + public static List scanHoodieInstantsFromFileSystem( + FileSystem fs, Path metaPath, Set includedExtensions) throws IOException { +return scanHoodieInstantsFromFileSystem(fs, metaPath, includedExtensions, true); + } + + /** + * Helper method to scan all hoodie-instant metafiles and construct HoodieInstant objects + * + * @param fs FileSystem + * @param metaPath Meta Path where hoodie instants are present + * @param includedExtensions Included hoodie extensions + * @param excludeIntermediateStates If there are multiple states for the same action instant, + * only include the highest state + * @return List of Hoodie Instants generated + * @throws IOException in case of failure + */ + public static List scanHoodieInstantsFromFileSystem( + FileSystem fs, Path metaPath, Set includedExtensions, boolean excludeIntermediateStates) + throws IOException { +Stream instantStream = Arrays.stream( +HoodieTableMetaClient +.scanFiles(fs, metaPath, path -> { + // Include only the meta files with extensions that needs to be included + String extension = FSUtils.getFileExtension(path.getName()); + return includedExtensions.contains(extension); +})).map(HoodieInstant::new); + +if (excludeIntermediateStates) { + // Remove intermediate states for each (ts,action) pair + instantStream = dedupeInstants(instantStream); +} +return instantStream.sorted().collect(Collectors.toList()); + } + + public static Stream dedupeInstants(Stream instantStream) { Review comment: I moved this logic to TableLayout. Let me know if interface needs more changes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r354232466 ## File path: hudi-common/src/main/java/org/apache/hudi/common/model/HoodieMetadataVersion.java ## @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.common.model; + +import com.google.common.base.Preconditions; +import java.io.Serializable; +import java.util.Objects; + +/** + * Metadata Layout Version. Add new version when timeline format changes Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r354232343 ## File path: hudi-client/src/main/java/org/apache/hudi/io/HoodieCommitArchiveLog.java ## @@ -108,16 +110,25 @@ private void close() { */ public boolean archiveIfRequired(final JavaSparkContext jsc) throws IOException { try { - List instantsToArchive = getInstantsToArchive(jsc).collect(Collectors.toList()); + List instantsToDelete = getInstantsToArchive(jsc).collect(Collectors.toList()); + List instantsToArchive = instantsToDelete.stream() + .filter(HoodieInstant::isCompleted).collect(Collectors.toList()); + + Preconditions.checkArgument(instantsToArchive.isEmpty() == instantsToDelete.isEmpty(), Review comment: Made code changes to allow pending instants also be archived. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r354231831 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieActiveTimeline.java ## @@ -342,60 +365,94 @@ public HoodieInstant transitionCleanInflightToComplete(HoodieInstant inflightIns * Transition Clean State from requested to inflight * * @param requestedInstant requested instant + * @param data Optional data to be stored * @return commit instant */ - public HoodieInstant transitionCleanRequestedToInflight(HoodieInstant requestedInstant) { + public HoodieInstant transitionCleanRequestedToInflight(HoodieInstant requestedInstant, Option data) { Preconditions.checkArgument(requestedInstant.getAction().equals(HoodieTimeline.CLEAN_ACTION)); Preconditions.checkArgument(requestedInstant.isRequested()); HoodieInstant inflight = new HoodieInstant(State.INFLIGHT, CLEAN_ACTION, requestedInstant.getTimestamp()); -transitionState(requestedInstant, inflight, Option.empty()); +transitionState(requestedInstant, inflight, data); return inflight; } private void transitionState(HoodieInstant fromInstant, HoodieInstant toInstant, Option data) { Preconditions.checkArgument(fromInstant.getTimestamp().equals(toInstant.getTimestamp())); -Path commitFilePath = new Path(metaClient.getMetaPath(), toInstant.getFileName()); try { - // Re-create the .inflight file by opening a new file and write the commit metadata in - Path inflightCommitFile = new Path(metaClient.getMetaPath(), fromInstant.getFileName()); - createFileInMetaPath(fromInstant.getFileName(), data); - boolean success = metaClient.getFs().rename(inflightCommitFile, commitFilePath); - if (!success) { -throw new HoodieIOException("Could not rename " + inflightCommitFile + " to " + commitFilePath); + if (metaClient.getMetadataVersion().isNullVersion()) { +// Re-create the .inflight file by opening a new file and write the commit metadata in +createFileInMetaPath(fromInstant.getFileName(), data, false); +Path fromInstantPath = new Path(metaClient.getMetaPath(), fromInstant.getFileName()); +Path toInstantPath = new Path(metaClient.getMetaPath(), toInstant.getFileName()); +boolean success = metaClient.getFs().rename(fromInstantPath, toInstantPath); +if (!success) { + throw new HoodieIOException("Could not rename " + fromInstantPath + " to " + toInstantPath); +} + } else { +// Ensures old state exists in timeline +System.out.println("Checking for file exists ?" + new Path(metaClient.getMetaPath(), +fromInstant.getFileName())); +Preconditions.checkArgument(metaClient.getFs().exists(new Path(metaClient.getMetaPath(), +fromInstant.getFileName(; +// Use Write Once to create Target File +writeFileOnceInPath(new Path(metaClient.getMetaPath(), toInstant.getFileName()), data); +System.out.println("Create new file for toInstant ?" + new Path(metaClient.getMetaPath(), toInstant.getFileName())); } } catch (IOException e) { throw new HoodieIOException("Could not complete " + fromInstant, e); } } - private void revertStateTransition(HoodieInstant curr, HoodieInstant revert) { - Preconditions.checkArgument(curr.getTimestamp().equals(revert.getTimestamp())); -Path revertFilePath = new Path(metaClient.getMetaPath(), revert.getFileName()); + private void revertCompleteToInflight(HoodieInstant completed, HoodieInstant inflight) { + Preconditions.checkArgument(completed.getTimestamp().equals(inflight.getTimestamp())); +Path inFlightCommitFilePath = new Path(metaClient.getMetaPath(), inflight.getFileName()); +Path commitFilePath = new Path(metaClient.getMetaPath(), completed.getFileName()); try { - if (!metaClient.getFs().exists(revertFilePath)) { -Path currFilePath = new Path(metaClient.getMetaPath(), curr.getFileName()); -boolean success = metaClient.getFs().rename(currFilePath, revertFilePath); -if (!success) { - throw new HoodieIOException("Could not rename " + currFilePath + " to " + revertFilePath); + if (metaClient.getMetadataVersion().isNullVersion()) { +if (!metaClient.getFs().exists(inFlightCommitFilePath)) { + boolean success = metaClient.getFs().rename(commitFilePath, inFlightCommitFilePath); + if (!success) { +throw new HoodieIOException( +"Could not rename " + commitFilePath + " to " + inFlightCommitFilePath); + } } -log.info("Renamed " + currFilePath + " to " + revertFilePath); + } else { +Path requestedInstantFilePath = new
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r354230170 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieActiveTimeline.java ## @@ -433,6 +494,32 @@ private void createFileInPath(Path fullPath, Option content) { } } + /** + * Creates a new file in timeline with overwrite set to false. This ensures + * files are created only once and never rewritten + * @param fullPath File Path + * @param content Content to be stored + */ + private void writeFileOnceInPath(Path fullPath, Option content) { Review comment: @vinothchandar : Currently, we employ overwrite semantics for inflight commit/delta-commit files. We first create an empty inflight file and then overwrite it later with workload profile data. I can try cleaning that up in hoodie write client. Let me know your thoughts ? @n3nash : Renamed the method. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r354216985 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieActiveTimeline.java ## @@ -342,60 +365,94 @@ public HoodieInstant transitionCleanInflightToComplete(HoodieInstant inflightIns * Transition Clean State from requested to inflight * * @param requestedInstant requested instant + * @param data Optional data to be stored * @return commit instant */ - public HoodieInstant transitionCleanRequestedToInflight(HoodieInstant requestedInstant) { + public HoodieInstant transitionCleanRequestedToInflight(HoodieInstant requestedInstant, Option data) { Preconditions.checkArgument(requestedInstant.getAction().equals(HoodieTimeline.CLEAN_ACTION)); Preconditions.checkArgument(requestedInstant.isRequested()); HoodieInstant inflight = new HoodieInstant(State.INFLIGHT, CLEAN_ACTION, requestedInstant.getTimestamp()); -transitionState(requestedInstant, inflight, Option.empty()); +transitionState(requestedInstant, inflight, data); return inflight; } private void transitionState(HoodieInstant fromInstant, HoodieInstant toInstant, Option data) { Preconditions.checkArgument(fromInstant.getTimestamp().equals(toInstant.getTimestamp())); -Path commitFilePath = new Path(metaClient.getMetaPath(), toInstant.getFileName()); try { - // Re-create the .inflight file by opening a new file and write the commit metadata in - Path inflightCommitFile = new Path(metaClient.getMetaPath(), fromInstant.getFileName()); - createFileInMetaPath(fromInstant.getFileName(), data); - boolean success = metaClient.getFs().rename(inflightCommitFile, commitFilePath); - if (!success) { -throw new HoodieIOException("Could not rename " + inflightCommitFile + " to " + commitFilePath); + if (metaClient.getMetadataVersion().isNullVersion()) { +// Re-create the .inflight file by opening a new file and write the commit metadata in +createFileInMetaPath(fromInstant.getFileName(), data, false); +Path fromInstantPath = new Path(metaClient.getMetaPath(), fromInstant.getFileName()); +Path toInstantPath = new Path(metaClient.getMetaPath(), toInstant.getFileName()); +boolean success = metaClient.getFs().rename(fromInstantPath, toInstantPath); +if (!success) { + throw new HoodieIOException("Could not rename " + fromInstantPath + " to " + toInstantPath); +} + } else { +// Ensures old state exists in timeline +System.out.println("Checking for file exists ?" + new Path(metaClient.getMetaPath(), Review comment: Thanks. Fixed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r354216781 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableMetaClient.java ## @@ -420,17 +427,50 @@ public String getCommitActionType() { * @return List of Hoodie Instants generated * @throws IOException in case of failure */ - public static List scanHoodieInstantsFromFileSystem(FileSystem fs, Path metaPath, - Set includedExtensions) throws IOException { -return Arrays.stream(HoodieTableMetaClient.scanFiles(fs, metaPath, path -> { - // Include only the meta files with extensions that needs to be included - String extension = FSUtils.getFileExtension(path.getName()); - return includedExtensions.contains(extension); -})).sorted(Comparator.comparing( -// Sort the meta-data by the instant time (first part of the file name) -fileStatus -> FSUtils.getInstantTime(fileStatus.getPath().getName( -// create HoodieInstantMarkers from FileStatus, which extracts properties -.map(HoodieInstant::new).collect(Collectors.toList()); + public static List scanHoodieInstantsFromFileSystem( + FileSystem fs, Path metaPath, Set includedExtensions) throws IOException { +return scanHoodieInstantsFromFileSystem(fs, metaPath, includedExtensions, true); + } + + /** + * Helper method to scan all hoodie-instant metafiles and construct HoodieInstant objects + * + * @param fs FileSystem + * @param metaPath Meta Path where hoodie instants are present + * @param includedExtensions Included hoodie extensions + * @param excludeIntermediateStates If there are multiple states for the same action instant, + * only include the highest state + * @return List of Hoodie Instants generated + * @throws IOException in case of failure + */ + public static List scanHoodieInstantsFromFileSystem( + FileSystem fs, Path metaPath, Set includedExtensions, boolean excludeIntermediateStates) + throws IOException { +Stream instantStream = Arrays.stream( +HoodieTableMetaClient +.scanFiles(fs, metaPath, path -> { + // Include only the meta files with extensions that needs to be included + String extension = FSUtils.getFileExtension(path.getName()); + return includedExtensions.contains(extension); +})).map(HoodieInstant::new); + +if (excludeIntermediateStates) { + // Remove intermediate states for each (ts,action) pair + instantStream = dedupeInstants(instantStream); +} +return instantStream.sorted().collect(Collectors.toList()); + } + + public static Stream dedupeInstants(Stream instantStream) { +return instantStream.collect(Collectors.groupingBy(x -> Pair.of(x.getTimestamp(), +x.getAction().equals(HoodieTimeline.COMPACTION_ACTION) ? HoodieTimeline.COMMIT_ACTION : x.getAction( Review comment: Refactored to reuse this logic. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r350891309 ## File path: hudi-client/src/main/java/org/apache/hudi/HoodieWriteClient.java ## @@ -731,6 +739,12 @@ public boolean rollbackToSavepoint(String savepointTime) { * file, */ public boolean rollback(final String commitTime) throws HoodieRollbackException { +try { Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r351017008 ## File path: hudi-common/src/main/java/org/apache/hudi/common/model/HoodieMetadataVersion.java ## @@ -0,0 +1,77 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hudi.common.model; + +import com.google.common.base.Preconditions; +import java.io.Serializable; +import java.util.Objects; + +/** + * Metadata Layout Version. Add new version when timeline format changes + */ +public class HoodieMetadataVersion implements Serializable, Comparable { + + public static Integer VERSION_0 = 0; // pre 0.5.1 version format + public static Integer VERSION_1 = 1; // current version with no renames + + public static Integer CURR_VERSION = VERSION_1; + + private Integer version; + + public HoodieMetadataVersion(Integer version) { +Preconditions.checkArgument(version <= CURR_VERSION); +Preconditions.checkArgument(version >= VERSION_0); +this.version = version; + } + + /** + * For Pre 0.5.1 release, there was no metadata version. This method is used to detect + * this case. + * @return + */ + public boolean isNullVersion() { +return Objects.equals(version, VERSION_0); + } + + public Integer getVersion() { +return version; + } + + @Override + public boolean equals(Object o) { +if (this == o) { Review comment: If all the callers do use Objects.equals(), then we should be ok. But, then there are still equals calls in Collections (e:g HashMap) where they use obj1.equals(obj2). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r354204782 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTimeline.java ## @@ -250,7 +253,10 @@ static HoodieInstant getCompactionInflightInstant(final String timestamp) { return new HoodieInstant(State.INFLIGHT, COMPACTION_ACTION, timestamp); } - static HoodieInstant getInflightInstant(final HoodieInstant instant) { + static HoodieInstant getInflightInstant(final HoodieInstant instant, final HoodieTableType tableType) { Review comment: Thought about keeping in metaclient but the current structure is such that all file-name generation depending on state is present in HoodieTimeline. Metaclient APIs are currently one level higher dealing with timelines, table and others. So, leaving it this way. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r350897448 ## File path: hudi-client/src/main/java/org/apache/hudi/io/HoodieCommitArchiveLog.java ## @@ -192,6 +210,7 @@ private boolean deleteArchivedInstants(List archivedInstants) thr return i.isCompleted() && (i.getAction().equals(HoodieTimeline.COMMIT_ACTION) || (i.getAction().equals(HoodieTimeline.DELTA_COMMIT_ACTION))); }).max(Comparator.comparing(HoodieInstant::getTimestamp))); +log.info("Last Committed Instant =" + latestCommitted); Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r351014321 ## File path: hudi-client/src/test/java/org/apache/hudi/TestClientRollback.java ## @@ -217,10 +217,12 @@ public void testRollbackCommit() throws Exception { // simulate partial failure, where .inflight was not deleted, but data files were. HoodieTestUtils.createInflightCommitFiles(basePath, commitTime3); + Thread.sleep(1000); Review comment: Modified rollback() code to use the non-conflicting creatNewCommitTime API and also removed these lines This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r354196674 ## File path: hudi-client/src/main/java/org/apache/hudi/HoodieWriteClient.java ## @@ -345,6 +346,9 @@ public static SparkConf registerClasses(SparkConf conf) { final List fileIDPrefixes = IntStream.range(0, parallelism).mapToObj(i -> FSUtils.createNewFileIdPfx()).collect(Collectors.toList()); +table.getActiveTimeline().transitionRequestedToInflight(new HoodieInstant(State.REQUESTED, Review comment: Agree. This would make state transitions more manageable. Filed a jira to track this as a follow-up : https://jira.apache.org/jira/browse/HUDI-383 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r351022082 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieInstant.java ## @@ -30,7 +34,23 @@ * * @see HoodieTimeline */ -public class HoodieInstant implements Serializable { +public class HoodieInstant implements Serializable, Comparable { + + /** + * A COMPACTION action eventually becomes COMMIT when completed. So, when grouping instants + * for state transitions, this needs to be taken into account + */ + private static final Set comparableActions = Arrays.stream(new String[] { HoodieTimeline.COMMIT_ACTION, + HoodieTimeline.COMPACTION_ACTION}).collect(Collectors.toSet()); + + public static final Comparator comparator = Comparator.comparing(HoodieInstant::getTimestamp) Review comment: Enums have a natural ordering defined by their ordinal. In our case, State is an enum and Requested < Inflight < Completed This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r354200362 ## File path: hudi-client/src/main/java/org/apache/hudi/HoodieWriteClient.java ## @@ -896,17 +919,22 @@ private void finishRestore(final Timer.Context context, Map stats = doRollbackAndGetStats(commitToRollback); - Map> statToCommit = new HashMap<>(); - finishRollback(context, stats, Arrays.asList(commitToRollback), startRollbackTime); + // Create a Hoodie table which encapsulated the commits and files visible + HoodieTable table = HoodieTable.getHoodieTable( + createMetaClient(true), config, jsc); + List rollbackInstants = table.getActiveTimeline().getCommitsTimeline().getInstants() + .filter(instant -> HoodieActiveTimeline.EQUAL.test(instant.getTimestamp(), commitToRollback)) + .collect(Collectors.toList()); + if (!rollbackInstants.isEmpty()) { +Preconditions.checkArgument(rollbackInstants.size() == 1); +List stats = doRollbackAndGetStats(rollbackInstants.get(0)); Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r350889000 ## File path: hudi-client/src/main/java/org/apache/hudi/HoodieCleanClient.java ## @@ -92,7 +92,7 @@ protected HoodieCleanMetadata clean(String startCleanTime) throws HoodieIOExcept if ((cleanerPlan.getFilesToBeDeletedPerPartition() != null) && !cleanerPlan.getFilesToBeDeletedPerPartition().isEmpty()) { final HoodieTable hoodieTable = HoodieTable.getHoodieTable(createMetaClient(true), config, jsc); -return runClean(hoodieTable, startCleanTime); +return runClean(hoodieTable, HoodieTimeline.getCleanRequestedInstant(startCleanTime), cleanerPlan); Review comment: No, it only creates a Requested instant. The loop earlier in the same function will take care of inflight clean instants This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r351019450 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java ## @@ -125,6 +133,13 @@ public HoodieTableType getTableType() { return DEFAULT_TABLE_TYPE; } + public HoodieMetadataVersion getTableVersion() { +if (props.containsKey(HOODIE_TABLE_VERSION_PROP_NAME)) { Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r351017850 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java ## @@ -52,12 +53,15 @@ public static final String HOODIE_TABLE_TYPE_PROP_NAME = "hoodie.table.type"; public static final String HOODIE_RO_FILE_FORMAT_PROP_NAME = "hoodie.table.ro.file.format"; public static final String HOODIE_RT_FILE_FORMAT_PROP_NAME = "hoodie.table.rt.file.format"; + public static final String HOODIE_TABLE_VERSION_PROP_NAME = "hoodie.table.version"; public static final String HOODIE_PAYLOAD_CLASS_PROP_NAME = "hoodie.compaction.payload.class"; public static final String HOODIE_ARCHIVELOG_FOLDER_PROP_NAME = "hoodie.archivelog.folder"; public static final HoodieTableType DEFAULT_TABLE_TYPE = HoodieTableType.COPY_ON_WRITE; public static final HoodieFileFormat DEFAULT_RO_FILE_FORMAT = HoodieFileFormat.PARQUET; public static final HoodieFileFormat DEFAULT_RT_FILE_FORMAT = HoodieFileFormat.HOODIE_LOG; + public static final Integer DEFAULT_TABLE_VERSION = HoodieMetadataVersion.VERSION_0; Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r351017736 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/HoodieTableConfig.java ## @@ -52,12 +53,15 @@ public static final String HOODIE_TABLE_TYPE_PROP_NAME = "hoodie.table.type"; public static final String HOODIE_RO_FILE_FORMAT_PROP_NAME = "hoodie.table.ro.file.format"; public static final String HOODIE_RT_FILE_FORMAT_PROP_NAME = "hoodie.table.rt.file.format"; + public static final String HOODIE_TABLE_VERSION_PROP_NAME = "hoodie.table.version"; Review comment: Done. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r354211539 ## File path: hudi-common/src/main/java/org/apache/hudi/common/table/timeline/HoodieDefaultTimeline.java ## @@ -92,15 +91,15 @@ public HoodieTimeline filterInflightsAndRequested() { } @Override - public HoodieTimeline filterInflightsExcludingCompaction() { + public HoodieTimeline filterPendingExcludingCompaction() { return new HoodieDefaultTimeline(instants.stream().filter(instant -> { - return instant.isInflight() && (!instant.getAction().equals(HoodieTimeline.COMPACTION_ACTION)); + return (!instant.isCompleted()) && (!instant.getAction().equals(HoodieTimeline.COMPACTION_ACTION)); Review comment: Yes, that was the intention. One of the places where we are using is rolling back pending commits This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r351012476 ## File path: hudi-client/src/main/java/org/apache/hudi/io/HoodieCommitArchiveLog.java ## @@ -169,7 +180,14 @@ public boolean archiveIfRequired(final JavaSparkContext jsc) throws IOException }).limit(commitTimeline.countInstants() - minCommitsToKeep)); } -return instants; +// For archiving and cleaning instants, we need to include intermediate state files if they exist +HoodieActiveTimeline rawActiveTimeline = new HoodieActiveTimeline(metaClient, false); +Map, List> groupByTsAction = rawActiveTimeline.getInstants() +.collect(Collectors.groupingBy(x -> Pair.of(x.getTimestamp(), +x.getAction().equals(HoodieTimeline.COMPACTION_ACTION) ? HoodieTimeline.COMMIT_ACTION : x.getAction(; + +return instants.flatMap(hoodieInstant -> Review comment: we would want to archive all types of actions eventually. So, keeping it this way. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r351012174 ## File path: hudi-client/src/main/java/org/apache/hudi/io/HoodieCommitArchiveLog.java ## @@ -169,7 +180,14 @@ public boolean archiveIfRequired(final JavaSparkContext jsc) throws IOException }).limit(commitTimeline.countInstants() - minCommitsToKeep)); } -return instants; +// For archiving and cleaning instants, we need to include intermediate state files if they exist +HoodieActiveTimeline rawActiveTimeline = new HoodieActiveTimeline(metaClient, false); +Map, List> groupByTsAction = rawActiveTimeline.getInstants() +.collect(Collectors.groupingBy(x -> Pair.of(x.getTimestamp(), +x.getAction().equals(HoodieTimeline.COMPACTION_ACTION) ? HoodieTimeline.COMMIT_ACTION : x.getAction(; Review comment: refactored a bit to be used here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r350895583 ## File path: hudi-client/src/main/java/org/apache/hudi/io/HoodieCommitArchiveLog.java ## @@ -169,7 +180,14 @@ public boolean archiveIfRequired(final JavaSparkContext jsc) throws IOException }).limit(commitTimeline.countInstants() - minCommitsToKeep)); } -return instants; +// For archiving and cleaning instants, we need to include intermediate state files if they exist +HoodieActiveTimeline rawActiveTimeline = new HoodieActiveTimeline(metaClient, false); +Map, List> groupByTsAction = rawActiveTimeline.getInstants() +.collect(Collectors.groupingBy(x -> Pair.of(x.getTimestamp(), Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r351015157 ## File path: hudi-client/src/test/java/org/apache/hudi/index/TestHbaseIndex.java ## @@ -168,13 +169,19 @@ public void testTagLocationAndDuplicateUpdate() throws Exception { HoodieWriteConfig config = getConfig(); HBaseIndex index = new HBaseIndex(config); HoodieWriteClient writeClient = new HoodieWriteClient(jsc, config); -writeClient.startCommit(); +writeClient.startCommitWithTime(newCommitTime); metaClient = HoodieTableMetaClient.reload(metaClient); HoodieTable hoodieTable = HoodieTable.getHoodieTable(metaClient, config, jsc); JavaRDD writeStatues = writeClient.upsert(writeRecords, newCommitTime); JavaRDD javaRDD1 = index.tagLocation(writeRecords, jsc, hoodieTable); + // Duplicate upsert and ensure correctness is maintained +// We are trying to approximately imitate the case when the RRD is recomputed. For RRD creating, driver code is not Review comment: Thanks @vinothchandar Linkedin days :) @n3nash : Sure, We can discuss this f2f This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r350888183 ## File path: hudi-cli/src/main/java/org/apache/hudi/cli/commands/CompactionCommand.java ## @@ -96,14 +96,15 @@ public String compactionsAll( if (!instant.getAction().equals(HoodieTimeline.COMPACTION_ACTION)) { try { // This could be a completed compaction. Assume a compaction request file is present but skip if fails - workload = AvroUtils.deserializeCompactionPlan(activeTimeline - .getInstantAuxiliaryDetails(HoodieTimeline.getCompactionRequestedInstant(instant.getTimestamp())).get()); + workload = AvroUtils.deserializeCompactionPlan( Review comment: Done This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [incubator-hudi] bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset
bvaradar commented on a change in pull request #1009: [HUDI-308] Avoid Renames for tracking state transitions of all actions on dataset URL: https://github.com/apache/incubator-hudi/pull/1009#discussion_r350892589 ## File path: hudi-client/src/main/java/org/apache/hudi/HoodieWriteClient.java ## @@ -824,14 +843,14 @@ private String startInstant() { "Found commits after time :" + lastCommit + ", please rollback greater commits first"); } -List inflights = - inflightCommitTimeline.getInstants().map(HoodieInstant::getTimestamp).collect(Collectors.toList()); +List inflights = inflightAndRequestedCommitTimeline.getInstants().map(HoodieInstant::getTimestamp) Review comment: Yes, For requested instant, the rollback is essentially essentially removing the instant file as no side-effect has happened This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services