[GitHub] incubator-carbondata pull request #703: [CARBONDATA-780] Alter table support...
Github user asfgit closed the pull request at: https://github.com/apache/incubator-carbondata/pull/703 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #703: [CARBONDATA-780] Alter table support...
Github user manishgupta88 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/703#discussion_r110108215

--- Diff: integration/spark-common/src/main/java/org/apache/carbondata/spark/merger/CarbonCompactionUtil.java ---
@@ -351,4 +351,33 @@ private static int getDimensionDefaultCardinality(CarbonDimension dimension) {
     }
     return cardinality;
   }
+
+  /**
+   * This method will check for any restructured block in the blocks selected for compaction
+   *
+   * @param segmentMapping
+   * @param dataFileMetadataSegMapping
+   * @param tableLastUpdatedTime
+   * @return
+   */
+  public static boolean checkIfAnyRestructuredBlockExists(Map segmentMapping,
+      Map<String, List<DataFileFooter>> dataFileMetadataSegMapping, long tableLastUpdatedTime) {
+    boolean restructuredBlockExists = false;
+    for (Map.Entry taskMap : segmentMapping.entrySet()) {
+      String segmentId = taskMap.getKey();
+      List<DataFileFooter> listMetadata = dataFileMetadataSegMapping.get(segmentId);
+      for (DataFileFooter dataFileFooter : listMetadata) {
+        // if schema modified timestamp is greater than footer stored schema timestamp,
--- End diff --

Yes, because an entry will be added to the schema evolution entries, and in case of any failure we need to revert the schema.
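According to the reply, a schema change (including a table rename) adds an entry to the schema evolution history and updates the schema timestamp, and the schema must be reverted if the operation fails. The sketch below illustrates that append-then-revert pattern under stated assumptions: `TableSchema`, `EvolutionEntry` and `rename` are hypothetical types and methods invented for illustration, not CarbonData's schema classes, and the real code may restore the schema differently (for example from a persisted copy).

```java
import java.util.ArrayList;
import java.util.List;

public class SchemaEvolutionSketch {

  /** Hypothetical record of one schema change (e.g. a rename or a dropped column). */
  static final class EvolutionEntry {
    final String description;
    final long timeStamp;

    EvolutionEntry(String description, long timeStamp) {
      this.description = description;
      this.timeStamp = timeStamp;
    }
  }

  /** Hypothetical, heavily simplified table schema. */
  static final class TableSchema {
    String tableName;
    long lastUpdatedTime;
    final List<EvolutionEntry> history = new ArrayList<>();

    TableSchema(String tableName, long lastUpdatedTime) {
      this.tableName = tableName;
      this.lastUpdatedTime = lastUpdatedTime;
    }
  }

  /**
   * Hypothetical rename: records an evolution entry, bumps the schema timestamp, then runs the
   * remaining work. If that work throws, the schema is restored to its previous state.
   */
  static void rename(TableSchema schema, String newName, long now, Runnable remainingWork) {
    String oldName = schema.tableName;
    long oldTime = schema.lastUpdatedTime;
    schema.history.add(new EvolutionEntry("rename " + oldName + " to " + newName, now));
    schema.tableName = newName;
    schema.lastUpdatedTime = now;
    try {
      remainingWork.run();
    } catch (RuntimeException e) {
      // Revert: drop the entry just added and restore the old name and timestamp.
      schema.history.remove(schema.history.size() - 1);
      schema.tableName = oldName;
      schema.lastUpdatedTime = oldTime;
      throw e;
    }
  }

  public static void main(String[] args) {
    TableSchema schema = new TableSchema("t1", 100L);
    rename(schema, "t2", 200L, () -> { });
    System.out.println(schema.tableName + " " + schema.lastUpdatedTime + " " + schema.history.size());
    try {
      rename(schema, "t3", 300L, () -> { throw new IllegalStateException("simulated failure"); });
    } catch (IllegalStateException expected) {
      // the schema is back to its state after the first rename
    }
    System.out.println(schema.tableName + " " + schema.lastUpdatedTime + " " + schema.history.size());
  }
}
```

Both print statements output `t2 200 1`: the failed rename leaves no trace in the name, the timestamp or the history.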
[GitHub] incubator-carbondata pull request #703: [CARBONDATA-780] Alter table support...
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/703#discussion_r110098777

--- Diff: integration/spark-common/src/main/java/org/apache/carbondata/spark/merger/RowResultMerger.java ---
@@ -57,15 +42,9 @@
 /**
  * This is the Merger class responsible for the merging of the segments.
  */
-public class RowResultMerger {
+public class RowResultMerger extends AbstractResultProcessor {
--- End diff --

Maybe you can rename the class to `RowResultMergerProcessor`.
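The rename fits because, after this change, the merger is one of two `AbstractResultProcessor` implementations. The sketch below illustrates that split under simplifying assumptions: rows are plain `Integer`s, inputs are `java.util.Iterator`s rather than `RawResultIterator`, `execute` returns the output rows instead of a `boolean`, and the heap-based k-way merge in `RowResultMergerProcessor` is an illustrative technique, not a claim about the real merger's internals. The hypothetical `choose` method reflects the PR description: re-sort everything only when a restructured block is selected for compaction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class ResultProcessorSketch {

  /** Shared contract: turn the per-block result iterators into one ordered output. */
  abstract static class AbstractResultProcessor {
    abstract List<Integer> execute(List<Iterator<Integer>> resultIterators);
  }

  /** Normal compaction: every input is already sorted, so a k-way merge is enough. */
  static final class RowResultMergerProcessor extends AbstractResultProcessor {
    @Override
    List<Integer> execute(List<Iterator<Integer>> resultIterators) {
      // Heap entries are {current value, index of the iterator it came from}.
      PriorityQueue<int[]> heap = new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[0]));
      for (int i = 0; i < resultIterators.size(); i++) {
        if (resultIterators.get(i).hasNext()) {
          heap.add(new int[] {resultIterators.get(i).next(), i});
        }
      }
      List<Integer> out = new ArrayList<>();
      while (!heap.isEmpty()) {
        int[] smallest = heap.poll();
        out.add(smallest[0]);
        Iterator<Integer> source = resultIterators.get(smallest[1]);
        if (source.hasNext()) {
          heap.add(new int[] {source.next(), smallest[1]});
        }
      }
      return out;
    }
  }

  /** Restructured blocks: input order cannot be trusted, so all rows are sorted again. */
  static final class CompactionResultSortProcessor extends AbstractResultProcessor {
    @Override
    List<Integer> execute(List<Iterator<Integer>> resultIterators) {
      List<Integer> all = new ArrayList<>();
      for (Iterator<Integer> it : resultIterators) {
        it.forEachRemaining(all::add);
      }
      Collections.sort(all);
      return all;
    }
  }

  /** Hypothetical selector: full re-sort only if a restructured block was selected. */
  static AbstractResultProcessor choose(boolean restructuredBlockExists) {
    return restructuredBlockExists
        ? new CompactionResultSortProcessor()
        : new RowResultMergerProcessor();
  }

  public static void main(String[] args) {
    List<Iterator<Integer>> inputs = Arrays.asList(
        Arrays.asList(1, 4, 7).iterator(), Arrays.asList(2, 5, 8).iterator());
    System.out.println(choose(false).execute(inputs)); // [1, 2, 4, 5, 7, 8]
  }
}
```

Keeping both strategies behind one abstract type lets the caller decide once, after the restructure check, without branching inside the merge logic.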
[GitHub] incubator-carbondata pull request #703: [CARBONDATA-780] Alter table support...
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/703#discussion_r110098333

--- Diff: integration/spark-common/src/main/java/org/apache/carbondata/spark/merger/CompactionResultSortProcessor.java ---
@@ -0,0 +1,407 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.spark.merger;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.metadata.encoder.Encoding;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.metadata.schema.table.column.CarbonDimension;
+import org.apache.carbondata.core.metadata.schema.table.column.CarbonMeasure;
+import org.apache.carbondata.core.scan.result.iterator.RawResultIterator;
+import org.apache.carbondata.core.scan.wrappers.ByteArrayWrapper;
+import org.apache.carbondata.core.util.CarbonUtil;
+import org.apache.carbondata.core.util.DataTypeUtil;
+import org.apache.carbondata.processing.model.CarbonLoadModel;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+import org.apache.carbondata.processing.sortandgroupby.exception.CarbonSortKeyAndGroupByException;
+import org.apache.carbondata.processing.sortandgroupby.sortdata.SortDataRows;
+import org.apache.carbondata.processing.sortandgroupby.sortdata.SortIntermediateFileMerger;
+import org.apache.carbondata.processing.sortandgroupby.sortdata.SortParameters;
+import org.apache.carbondata.processing.store.CarbonFactDataHandlerModel;
+import org.apache.carbondata.processing.store.CarbonFactHandler;
+import org.apache.carbondata.processing.store.CarbonFactHandlerFactory;
+import org.apache.carbondata.processing.store.SingleThreadFinalSortFilesMerger;
+import org.apache.carbondata.processing.store.writer.exception.CarbonDataWriterException;
+import org.apache.carbondata.processing.util.CarbonDataProcessorUtil;
+
+/**
+ * This class will process the query result and convert the data
+ * into a format compatible for data load
+ */
+public class CompactionResultSortProcessor extends AbstractResultProcessor {
+
+  /**
+   * LOGGER
+   */
+  private static final LogService LOGGER =
+      LogServiceFactory.getLogService(CompactionResultSortProcessor.class.getName());
+  /**
+   * carbon load model that contains all the required information for load
+   */
+  private CarbonLoadModel carbonLoadModel;
+  /**
+   * carbon table
+   */
+  private CarbonTable carbonTable;
+  /**
+   * sortDataRows instance for sorting each row read ad writing to sort temp file
+   */
+  private SortDataRows sortDataRows;
+  /**
+   * final merger for merge sort
+   */
+  private SingleThreadFinalSortFilesMerger finalMerger;
+  /**
+   * data handler VO object
+   */
+  private CarbonFactHandler dataHandler;
+  /**
+   * segment properties for getting dimension cardinality and other required information of a block
+   */
+  private SegmentProperties segmentProperties;
+  /**
+   * compaction type to decide whether taskID need to be extracted from carbondata files
+   */
+  private CompactionType compactionType;
+  /**
+   * boolean mapping for no dictionary columns in schema
+   */
+  private boolean[] noDictionaryColMapping;
+  /**
+   * agg type defined for measures
+   */
+  private char[] aggType;
+  /**
+   * segment id
+   */
+  private String segmentId;
+  /**
+   * temp store location to be sued during data load
+   */
+  private String tempStoreLocation;
+  /**
[GitHub] incubator-carbondata pull request #703: [CARBONDATA-780] Alter table support...
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/703#discussion_r110097631

--- Diff: integration/spark-common/src/main/java/org/apache/carbondata/spark/merger/CompactionResultSortProcessor.java ---
[GitHub] incubator-carbondata pull request #703: [CARBONDATA-780] Alter table support...
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/703#discussion_r110096123

--- Diff: integration/spark-common/src/main/java/org/apache/carbondata/spark/merger/CarbonCompactionUtil.java ---
@@ -351,4 +351,33 @@ private static int getDimensionDefaultCardinality(CarbonDimension dimension) {
     }
     return cardinality;
   }
+
+  /**
+   * This method will check for any restructured block in the blocks selected for compaction
+   *
+   * @param segmentMapping
+   * @param dataFileMetadataSegMapping
+   * @param tableLastUpdatedTime
+   * @return
+   */
+  public static boolean checkIfAnyRestructuredBlockExists(Map segmentMapping,
+      Map<String, List<DataFileFooter>> dataFileMetadataSegMapping, long tableLastUpdatedTime) {
+    boolean restructuredBlockExists = false;
+    for (Map.Entry taskMap : segmentMapping.entrySet()) {
+      String segmentId = taskMap.getKey();
+      List<DataFileFooter> listMetadata = dataFileMetadataSegMapping.get(segmentId);
+      for (DataFileFooter dataFileFooter : listMetadata) {
+        // if schema modified timestamp is greater than footer stored schema timestamp,
--- End diff --

Are we updating the schema timestamp even for a table rename?
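The hunk compares the table's last schema-update time with the schema timestamp stored in each block's footer; judging by the method name, a block whose footer timestamp is older than the latest schema change is treated as restructured. A minimal sketch of that check follows. `BlockFooter` and `schemaUpdatedTimeStamp` are hypothetical stand-ins for `DataFileFooter`, and the two maps taken by the real method are collapsed into a single `Map<String, List<BlockFooter>>` keyed by segment id.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RestructureCheckSketch {

  /** Hypothetical stand-in for a block footer: only the schema timestamp it was written with. */
  static final class BlockFooter {
    final long schemaUpdatedTimeStamp;

    BlockFooter(long schemaUpdatedTimeStamp) {
      this.schemaUpdatedTimeStamp = schemaUpdatedTimeStamp;
    }
  }

  /**
   * Returns true if any block was written before the table's last schema change,
   * i.e. the table schema timestamp is greater than the timestamp stored in the footer.
   */
  static boolean anyRestructuredBlock(Map<String, List<BlockFooter>> footersBySegment,
      long tableLastUpdatedTime) {
    for (List<BlockFooter> footers : footersBySegment.values()) {
      for (BlockFooter footer : footers) {
        if (tableLastUpdatedTime > footer.schemaUpdatedTimeStamp) {
          return true; // a single restructured block is enough to require a full re-sort
        }
      }
    }
    return false;
  }

  public static void main(String[] args) {
    Map<String, List<BlockFooter>> segments = new HashMap<>();
    segments.put("0", Arrays.asList(new BlockFooter(100L), new BlockFooter(100L)));
    segments.put("1", Collections.singletonList(new BlockFooter(200L)));
    // Schema last changed at 200: segment 0 was written with the older schema.
    System.out.println(anyRestructuredBlock(segments, 200L)); // true
    // Only segment 1 selected: it already carries the latest schema timestamp.
    System.out.println(anyRestructuredBlock(Collections.singletonMap("1", segments.get("1")), 200L)); // false
  }
}
```

This is also why the question matters: if a rename bumps the schema timestamp, every block written before the rename would be reported as restructured by this comparison.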
[GitHub] incubator-carbondata pull request #703: [CARBONDATA-780] Alter table support...
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/703#discussion_r110093866

--- Diff: integration/spark-common/src/main/java/org/apache/carbondata/spark/merger/AbstractResultProcessor.java ---
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.spark.merger;
+
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.metadata.CarbonMetadata;
+import org.apache.carbondata.core.metadata.CarbonTableIdentifier;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.metadata.schema.table.column.CarbonDimension;
+import org.apache.carbondata.core.metadata.schema.table.column.CarbonMeasure;
+import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema;
+import org.apache.carbondata.core.mutate.CarbonUpdateUtil;
+import org.apache.carbondata.core.scan.result.iterator.RawResultIterator;
+import org.apache.carbondata.core.util.CarbonUtil;
+import org.apache.carbondata.core.util.DataTypeUtil;
+import org.apache.carbondata.core.util.path.CarbonStorePath;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+import org.apache.carbondata.processing.datatypes.GenericDataType;
+import org.apache.carbondata.processing.model.CarbonLoadModel;
+import org.apache.carbondata.processing.store.CarbonDataFileAttributes;
+import org.apache.carbondata.processing.store.CarbonFactDataHandlerModel;
+
+/**
+ * This class contains the common methods required for result processing during compaction based on
+ * restructure and mormal scenarios
+ */
+public abstract class AbstractResultProcessor {
+
+  /**
+   * This method will perform the desired tasks of merging the selected slices
+   *
+   * @param resultIteratorList
+   * @return
+   */
+  public abstract boolean execute(List<RawResultIterator> resultIteratorList);
+
+  /**
+   * This method will create a model object for carbon fact data handler
+   *
+   * @param loadModel
+   * @return
+   */
--- End diff --

Move this method to the `CarbonFactDataHandlerModel` class.
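The suggestion amounts to a static-factory refactoring: the model class owns the logic for building itself from a load model, instead of that logic living in the processor base class. The sketch below shows the shape of that change with hypothetical, heavily reduced types; `LoadModel`, `FactDataHandlerModel`, `fromLoadModel` and their fields are illustrative only and are not CarbonData's real classes or signatures.

```java
import java.util.Objects;

public class FactoryMethodSketch {

  /** Hypothetical, heavily reduced load model. */
  static final class LoadModel {
    final String databaseName;
    final String tableName;
    final String segmentId;

    LoadModel(String databaseName, String tableName, String segmentId) {
      this.databaseName = databaseName;
      this.tableName = tableName;
      this.segmentId = segmentId;
    }
  }

  /** Hypothetical handler model that owns the logic for building itself. */
  static final class FactDataHandlerModel {
    private final String databaseName;
    private final String tableName;
    private final String segmentId;
    private final String storeLocation;

    private FactDataHandlerModel(String databaseName, String tableName, String segmentId,
        String storeLocation) {
      this.databaseName = databaseName;
      this.tableName = tableName;
      this.segmentId = segmentId;
      this.storeLocation = storeLocation;
    }

    /** Static factory replacing a helper that previously lived in the processor base class. */
    static FactDataHandlerModel fromLoadModel(LoadModel loadModel, String storeLocation) {
      Objects.requireNonNull(loadModel, "loadModel");
      Objects.requireNonNull(storeLocation, "storeLocation");
      return new FactDataHandlerModel(loadModel.databaseName, loadModel.tableName,
          loadModel.segmentId, storeLocation);
    }

    String getDatabaseName() {
      return databaseName;
    }

    String getTableName() {
      return tableName;
    }

    String getSegmentId() {
      return segmentId;
    }

    String getStoreLocation() {
      return storeLocation;
    }
  }

  public static void main(String[] args) {
    LoadModel loadModel = new LoadModel("default", "sales", "0");
    FactDataHandlerModel model = FactDataHandlerModel.fromLoadModel(loadModel, "/tmp/carbon");
    System.out.println(model.getTableName() + " " + model.getSegmentId()); // sales 0
  }
}
```

With the factory on the model, both processor subclasses (and any other caller) can build the model without depending on `AbstractResultProcessor`.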
[GitHub] incubator-carbondata pull request #703: [CARBONDATA-780] Alter table support...
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/703#discussion_r110093494

--- Diff: integration/spark-common/src/main/java/org/apache/carbondata/spark/merger/AbstractResultProcessor.java ---
+/**
+ * This class contains the common methods required for result processing during compaction based on
+ * restructure and mormal scenarios
--- End diff --

typo `mormal`
[GitHub] incubator-carbondata pull request #703: [CARBONDATA-780] Alter table support...
GitHub user manishgupta88 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/703

[CARBONDATA-780] Alter table support for compaction through sort step

Alter table needs to support a compaction process in which the complete data is sorted again and then written to file. Currently, during compaction the data is passed directly to the writer step, where it is split into columns and written. However, because the data is sorted on its columns from left to right, dropping a column leaves the data unordered again, since the dropped column's data is not considered during compaction. In such scenarios the complete data needs to be sorted again before it is submitted to the writer step.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/manishgupta88/incubator-carbondata compaction_restructure_support

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/703.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #703

commit b108c22024f6381385f0c394ea6ebe515a2e96b4
Author: ravikiran
Date: 2017-03-15T15:07:26Z

    Added class to handle sorting of data for compaction scenario

commit 11f80e3f22f68332ced85ae8da3a122d0a52447e
Author: manishgupta88
Date: 2017-03-15T13:54:05Z

    Handling for compaction for restructure case. Handled to completely sort the data again if any restructured block is selected for compaction
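A small worked example of the problem the description raises: rows sorted on their columns from left to right stay sorted when the last column is dropped, but not necessarily when a column in the middle is dropped, because ties on the leading column were ordered by the dropped column rather than by the remaining one. The sketch models rows as `int[]` with a lexicographic comparator; the class and method names are hypothetical and unrelated to CarbonData's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class DropColumnSortSketch {

  /** Lexicographic comparison of rows, column 0 first (columns sorted from left to right). */
  static final Comparator<int[]> LEFT_TO_RIGHT = (x, y) -> {
    for (int i = 0; i < Math.min(x.length, y.length); i++) {
      int c = Integer.compare(x[i], y[i]);
      if (c != 0) {
        return c;
      }
    }
    return Integer.compare(x.length, y.length);
  };

  /** Returns a copy of every row with the given column removed. */
  static List<int[]> dropColumn(List<int[]> rows, int column) {
    List<int[]> out = new ArrayList<>();
    for (int[] row : rows) {
      int[] copy = new int[row.length - 1];
      for (int i = 0, j = 0; i < row.length; i++) {
        if (i != column) {
          copy[j++] = row[i];
        }
      }
      out.add(copy);
    }
    return out;
  }

  /** True if every adjacent pair of rows is in left-to-right order. */
  static boolean isSorted(List<int[]> rows) {
    for (int i = 1; i < rows.size(); i++) {
      if (LEFT_TO_RIGHT.compare(rows.get(i - 1), rows.get(i)) > 0) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Columns (a, b, c), already sorted left to right.
    List<int[]> rows = Arrays.asList(new int[] {1, 1, 9}, new int[] {1, 2, 3}, new int[] {2, 1, 5});
    List<int[]> afterDrop = dropColumn(rows, 1); // drop b: (1, 9), (1, 3), (2, 5)
    System.out.println(isSorted(rows) + " " + isSorted(afterDrop)); // true false
    afterDrop.sort(LEFT_TO_RIGHT); // compaction has to sort again
    System.out.println(isSorted(afterDrop)); // true
  }
}
```

Merging already-sorted blocks is therefore not enough once a column has been dropped; the full re-sort described above restores the left-to-right order.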