[GitHub] incubator-carbondata pull request #703: [CARBONDATA-780] Alter table support...

2017-04-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/703


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #703: [CARBONDATA-780] Alter table support...

2017-04-06 Thread manishgupta88
Github user manishgupta88 commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/703#discussion_r110108215
  
--- Diff: integration/spark-common/src/main/java/org/apache/carbondata/spark/merger/CarbonCompactionUtil.java ---
@@ -351,4 +351,33 @@ private static int getDimensionDefaultCardinality(CarbonDimension dimension) {
     }
     return cardinality;
   }
+
+  /**
+   * This method will check for any restructured block in the blocks selected for compaction
+   *
+   * @param segmentMapping
+   * @param dataFileMetadataSegMapping
+   * @param tableLastUpdatedTime
+   * @return
+   */
+  public static boolean checkIfAnyRestructuredBlockExists(Map segmentMapping,
+      Map> dataFileMetadataSegMapping, long tableLastUpdatedTime) {
+    boolean restructuredBlockExists = false;
+    for (Map.Entry taskMap : segmentMapping.entrySet()) {
+      String segmentId = taskMap.getKey();
+      List listMetadata = dataFileMetadataSegMapping.get(segmentId);
+      for (DataFileFooter dataFileFooter : listMetadata) {
+        // if schema modified timestamp is greater than footer stored schema timestamp,
--- End diff --

Yes, because the entry will be added to the schema evolution entries, and in case of any failure we need to revert the schema.
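For context, the check in this diff compares the table's last schema-update time against the schema timestamp recorded in each block's footer; since ALTER operations presumably add a schema evolution entry that bumps the table timestamp, blocks written before that entry look older than the table. A minimal sketch of that comparison, assuming the truncated comment continues roughly as "...then the block is restructured". `FooterInfo`, `schemaUpdatedTimeStamp`, and `anyRestructuredBlock` are hypothetical stand-ins for `DataFileFooter` and the method under review, and a plain list of segment ids replaces the `segmentMapping` map.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a data file footer: only the schema timestamp matters here.
class FooterInfo {
  final long schemaUpdatedTimeStamp;

  FooterInfo(long schemaUpdatedTimeStamp) {
    this.schemaUpdatedTimeStamp = schemaUpdatedTimeStamp;
  }
}

class RestructureCheck {
  /**
   * Returns true if any footer in the selected segments was written with a schema
   * older than the table's last schema update, i.e. the block predates a restructure.
   */
  static boolean anyRestructuredBlock(List<String> selectedSegments,
      Map<String, List<FooterInfo>> footersBySegment, long tableLastUpdatedTime) {
    for (String segmentId : selectedSegments) {
      List<FooterInfo> footers = footersBySegment.get(segmentId);
      if (footers == null) {
        continue; // no metadata recorded for this segment in the sketch
      }
      for (FooterInfo footer : footers) {
        if (tableLastUpdatedTime > footer.schemaUpdatedTimeStamp) {
          return true; // one restructured block is enough to force a full re-sort
        }
      }
    }
    return false;
  }

  public static void main(String[] args) {
    Map<String, List<FooterInfo>> footers = new HashMap<>();
    footers.put("0", Arrays.asList(new FooterInfo(100L), new FooterInfo(100L)));
    footers.put("1", Arrays.asList(new FooterInfo(200L)));
    List<String> segments = Arrays.asList("0", "1");
    System.out.println(anyRestructuredBlock(segments, footers, 200L)); // true: segment 0 predates the update
    System.out.println(anyRestructuredBlock(segments, footers, 100L)); // false
  }
}
```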




[GitHub] incubator-carbondata pull request #703: [CARBONDATA-780] Alter table support...

2017-04-06 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/703#discussion_r110098777
  
--- Diff: integration/spark-common/src/main/java/org/apache/carbondata/spark/merger/RowResultMerger.java ---
@@ -57,15 +42,9 @@
 /**
  * This is the Merger class responsible for the merging of the segments.
  */
-public class RowResultMerger {
+public class RowResultMerger extends AbstractResultProcessor {
--- End diff --

Maybe you can rename the class to `RowResultMergerProcessor`.




[GitHub] incubator-carbondata pull request #703: [CARBONDATA-780] Alter table support...

2017-04-06 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/703#discussion_r110098333
  
--- Diff: integration/spark-common/src/main/java/org/apache/carbondata/spark/merger/CompactionResultSortProcessor.java ---
@@ -0,0 +1,407 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.spark.merger;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.metadata.encoder.Encoding;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.metadata.schema.table.column.CarbonDimension;
+import org.apache.carbondata.core.metadata.schema.table.column.CarbonMeasure;
+import org.apache.carbondata.core.scan.result.iterator.RawResultIterator;
+import org.apache.carbondata.core.scan.wrappers.ByteArrayWrapper;
+import org.apache.carbondata.core.util.CarbonUtil;
+import org.apache.carbondata.core.util.DataTypeUtil;
+import org.apache.carbondata.processing.model.CarbonLoadModel;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+import org.apache.carbondata.processing.sortandgroupby.exception.CarbonSortKeyAndGroupByException;
+import org.apache.carbondata.processing.sortandgroupby.sortdata.SortDataRows;
+import org.apache.carbondata.processing.sortandgroupby.sortdata.SortIntermediateFileMerger;
+import org.apache.carbondata.processing.sortandgroupby.sortdata.SortParameters;
+import org.apache.carbondata.processing.store.CarbonFactDataHandlerModel;
+import org.apache.carbondata.processing.store.CarbonFactHandler;
+import org.apache.carbondata.processing.store.CarbonFactHandlerFactory;
+import org.apache.carbondata.processing.store.SingleThreadFinalSortFilesMerger;
+import org.apache.carbondata.processing.store.writer.exception.CarbonDataWriterException;
+import org.apache.carbondata.processing.util.CarbonDataProcessorUtil;
+
+/**
+ * This class will process the query result and convert the data
+ * into a format compatible for data load
+ */
+public class CompactionResultSortProcessor extends AbstractResultProcessor {
+
+  /**
+   * LOGGER
+   */
+  private static final LogService LOGGER =
+      LogServiceFactory.getLogService(CompactionResultSortProcessor.class.getName());
+  /**
+   * carbon load model that contains all the required information for load
+   */
+  private CarbonLoadModel carbonLoadModel;
+  /**
+   * carbon table
+   */
+  private CarbonTable carbonTable;
+  /**
+   * sortDataRows instance for sorting each row read ad writing to sort temp file
+   */
+  private SortDataRows sortDataRows;
+  /**
+   * final merger for merge sort
+   */
+  private SingleThreadFinalSortFilesMerger finalMerger;
+  /**
+   * data handler VO object
+   */
+  private CarbonFactHandler dataHandler;
+  /**
+   * segment properties for getting dimension cardinality and other required information of a block
+   */
+  private SegmentProperties segmentProperties;
+  /**
+   * compaction type to decide whether taskID need to be extracted from carbondata files
+   */
+  private CompactionType compactionType;
+  /**
+   * boolean mapping for no dictionary columns in schema
+   */
+  private boolean[] noDictionaryColMapping;
+  /**
+   * agg type defined for measures
+   */
+  private char[] aggType;
+  /**
+   * segment id
+   */
+  private String segmentId;
+  /**
+   * temp store location to be sued during data load
+   */
+  private String tempStoreLocation;
+  /**
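Judging by its fields (`SortDataRows` writing sort temp files, `SortIntermediateFileMerger`, and `SingleThreadFinalSortFilesMerger` for the final merge), this processor appears to follow a classic external merge sort: sort bounded chunks of rows, spill each sorted chunk, then k-way merge the chunks. A minimal in-memory sketch of that technique, where the "temp files" are plain lists and rows are `int[]` keys compared column by column from left to right; `ExternalSortSketch`, `sortChunks`, and `mergeRuns` are hypothetical names, not CarbonData APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class ExternalSortSketch {
  // Lexicographic comparison of rows, column 0 first (left to right).
  static final Comparator<int[]> ROW_ORDER = (a, b) -> {
    for (int i = 0; i < Math.min(a.length, b.length); i++) {
      int c = Integer.compare(a[i], b[i]);
      if (c != 0) {
        return c;
      }
    }
    return Integer.compare(a.length, b.length);
  };

  /** Phase 1: cut the input into chunks of at most chunkSize rows and sort each chunk. */
  static List<List<int[]>> sortChunks(List<int[]> rows, int chunkSize) {
    List<List<int[]>> runs = new ArrayList<>();
    for (int start = 0; start < rows.size(); start += chunkSize) {
      int end = Math.min(start + chunkSize, rows.size());
      List<int[]> run = new ArrayList<>(rows.subList(start, end));
      run.sort(ROW_ORDER); // stands in for writing one sorted temp file
      runs.add(run);
    }
    return runs;
  }

  /** Phase 2: k-way merge of the sorted runs using a min-heap of {run, position} cursors. */
  static List<int[]> mergeRuns(List<List<int[]>> runs) {
    PriorityQueue<int[]> heap = new PriorityQueue<int[]>(
        (x, y) -> ROW_ORDER.compare(runs.get(x[0]).get(x[1]), runs.get(y[0]).get(y[1])));
    for (int r = 0; r < runs.size(); r++) {
      if (!runs.get(r).isEmpty()) {
        heap.add(new int[] {r, 0});
      }
    }
    List<int[]> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      int[] cursor = heap.poll();
      List<int[]> run = runs.get(cursor[0]);
      out.add(run.get(cursor[1]));
      if (cursor[1] + 1 < run.size()) {
        heap.add(new int[] {cursor[0], cursor[1] + 1}); // advance this run's cursor
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<int[]> rows = Arrays.asList(
        new int[] {3, 1}, new int[] {1, 2}, new int[] {2, 9}, new int[] {1, 1}, new int[] {0, 5});
    for (int[] row : mergeRuns(sortChunks(rows, 2))) {
      System.out.println(Arrays.toString(row));
    }
  }
}
```

The heap holds one cursor per run, so memory during the merge grows with the number of runs rather than the number of rows, which is the usual reason for this two-phase design.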
  

[GitHub] incubator-carbondata pull request #703: [CARBONDATA-780] Alter table support...

2017-04-06 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/703#discussion_r110097631
  
--- Diff: integration/spark-common/src/main/java/org/apache/carbondata/spark/merger/CompactionResultSortProcessor.java ---
  

[GitHub] incubator-carbondata pull request #703: [CARBONDATA-780] Alter table support...

2017-04-06 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/703#discussion_r110096123
  
--- Diff: integration/spark-common/src/main/java/org/apache/carbondata/spark/merger/CarbonCompactionUtil.java ---
@@ -351,4 +351,33 @@ private static int getDimensionDefaultCardinality(CarbonDimension dimension) {
     }
     return cardinality;
   }
+
+  /**
+   * This method will check for any restructured block in the blocks selected for compaction
+   *
+   * @param segmentMapping
+   * @param dataFileMetadataSegMapping
+   * @param tableLastUpdatedTime
+   * @return
+   */
+  public static boolean checkIfAnyRestructuredBlockExists(Map segmentMapping,
+      Map> dataFileMetadataSegMapping, long tableLastUpdatedTime) {
+    boolean restructuredBlockExists = false;
+    for (Map.Entry taskMap : segmentMapping.entrySet()) {
+      String segmentId = taskMap.getKey();
+      List listMetadata = dataFileMetadataSegMapping.get(segmentId);
+      for (DataFileFooter dataFileFooter : listMetadata) {
+        // if schema modified timestamp is greater than footer stored schema timestamp,
--- End diff --

Are we updating the schema timestamp even for a table rename?




[GitHub] incubator-carbondata pull request #703: [CARBONDATA-780] Alter table support...

2017-04-06 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/703#discussion_r110093866
  
--- Diff: integration/spark-common/src/main/java/org/apache/carbondata/spark/merger/AbstractResultProcessor.java ---
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.spark.merger;
+
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.metadata.CarbonMetadata;
+import org.apache.carbondata.core.metadata.CarbonTableIdentifier;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.metadata.schema.table.column.CarbonDimension;
+import org.apache.carbondata.core.metadata.schema.table.column.CarbonMeasure;
+import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema;
+import org.apache.carbondata.core.mutate.CarbonUpdateUtil;
+import org.apache.carbondata.core.scan.result.iterator.RawResultIterator;
+import org.apache.carbondata.core.util.CarbonUtil;
+import org.apache.carbondata.core.util.DataTypeUtil;
+import org.apache.carbondata.core.util.path.CarbonStorePath;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+import org.apache.carbondata.processing.datatypes.GenericDataType;
+import org.apache.carbondata.processing.model.CarbonLoadModel;
+import org.apache.carbondata.processing.store.CarbonDataFileAttributes;
+import org.apache.carbondata.processing.store.CarbonFactDataHandlerModel;
+
+/**
+ * This class contains the common methods required for result processing during compaction based on
+ * restructure and mormal scenarios
+ */
+public abstract class AbstractResultProcessor {
+
+  /**
+   * This method will perform the desired tasks of merging the selected slices
+   *
+   * @param resultIteratorList
+   * @return
+   */
+  public abstract boolean execute(List resultIteratorList);
+
+  /**
+   * This method will create a model object for carbon fact data handler
+   *
+   * @param loadModel
+   * @return
+   */
--- End diff --

Move this method to the `CarbonFactDataHandlerModel` class.
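The suggestion amounts to turning the helper into a static factory on the model class it builds, so that code outside the compaction processors can reuse it. A minimal sketch of that refactoring, using hypothetical `LoadModel` and `HandlerModel` classes and a hypothetical `createFromLoadModel` method; the real fields of `CarbonFactDataHandlerModel` and the final method name are not shown in this thread.

```java
// Hypothetical input model: a trimmed-down stand-in for a load model.
class LoadModel {
  final String databaseName;
  final String tableName;
  final String segmentId;

  LoadModel(String databaseName, String tableName, String segmentId) {
    this.databaseName = databaseName;
    this.tableName = tableName;
    this.segmentId = segmentId;
  }
}

// Hypothetical handler model: the factory lives on the class it constructs.
class HandlerModel {
  private final String tableUniqueName;
  private final String segmentId;
  private final String storeLocation;

  private HandlerModel(String tableUniqueName, String segmentId, String storeLocation) {
    this.tableUniqueName = tableUniqueName;
    this.segmentId = segmentId;
    this.storeLocation = storeLocation;
  }

  /** Static factory replacing a helper that previously lived in the abstract processor. */
  static HandlerModel createFromLoadModel(LoadModel loadModel, String tempStoreLocation) {
    String uniqueName = loadModel.databaseName + "_" + loadModel.tableName;
    return new HandlerModel(uniqueName, loadModel.segmentId, tempStoreLocation);
  }

  String getTableUniqueName() { return tableUniqueName; }

  String getSegmentId() { return segmentId; }

  String getStoreLocation() { return storeLocation; }

  public static void main(String[] args) {
    HandlerModel model =
        HandlerModel.createFromLoadModel(new LoadModel("default", "t1", "0"), "/tmp/carbon");
    System.out.println(model.getTableUniqueName() + " " + model.getSegmentId());
  }
}
```

Keeping the constructor private forces every caller through the factory, so both the sort-based and the merge-based processors would build the model the same way.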




[GitHub] incubator-carbondata pull request #703: [CARBONDATA-780] Alter table support...

2017-04-06 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/703#discussion_r110093494
  
--- Diff: integration/spark-common/src/main/java/org/apache/carbondata/spark/merger/AbstractResultProcessor.java ---
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.spark.merger;
+
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.metadata.CarbonMetadata;
+import org.apache.carbondata.core.metadata.CarbonTableIdentifier;
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
+import org.apache.carbondata.core.metadata.schema.table.column.CarbonDimension;
+import org.apache.carbondata.core.metadata.schema.table.column.CarbonMeasure;
+import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema;
+import org.apache.carbondata.core.mutate.CarbonUpdateUtil;
+import org.apache.carbondata.core.scan.result.iterator.RawResultIterator;
+import org.apache.carbondata.core.util.CarbonUtil;
+import org.apache.carbondata.core.util.DataTypeUtil;
+import org.apache.carbondata.core.util.path.CarbonStorePath;
+import org.apache.carbondata.core.util.path.CarbonTablePath;
+import org.apache.carbondata.processing.datatypes.GenericDataType;
+import org.apache.carbondata.processing.model.CarbonLoadModel;
+import org.apache.carbondata.processing.store.CarbonDataFileAttributes;
+import org.apache.carbondata.processing.store.CarbonFactDataHandlerModel;
+
+/**
+ * This class contains the common methods required for result processing during compaction based on
+ * restructure and mormal scenarios
--- End diff --

typo `mormal`




[GitHub] incubator-carbondata pull request #703: [CARBONDATA-780] Alter table support...

2017-03-27 Thread manishgupta88
GitHub user manishgupta88 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/703

[CARBONDATA-780] Alter table support for compaction through sort step

Alter table needs to support a compaction process in which the complete data is sorted again and then written to file.
Currently, during compaction, data is passed directly to the writer step, where it is split into columns and written. But because the data is sorted on its columns from left to right, dropping a column leaves the data unordered again, since the dropped column's data is not considered during compaction. In these scenarios, the complete data needs to be sorted again and then submitted to the writer step.
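A small illustration of the problem, assuming rows are ordered lexicographically on their sort columns from left to right: once the leading column is dropped, the remaining columns are no longer in order, so passing the blocks straight to the writer would produce unsorted output. The `int[]` row model and the names `DropColumnSortDemo`, `isSorted`, and `dropColumn` are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class DropColumnSortDemo {
  // Lexicographic, left-to-right comparison of equal-length rows.
  static final Comparator<int[]> LEFT_TO_RIGHT = (a, b) -> {
    for (int i = 0; i < a.length; i++) {
      int c = Integer.compare(a[i], b[i]);
      if (c != 0) {
        return c;
      }
    }
    return 0;
  };

  static boolean isSorted(List<int[]> rows) {
    for (int i = 1; i < rows.size(); i++) {
      if (LEFT_TO_RIGHT.compare(rows.get(i - 1), rows.get(i)) > 0) {
        return false;
      }
    }
    return true;
  }

  /** Drops the column at index col from every row, keeping the row order unchanged. */
  static List<int[]> dropColumn(List<int[]> rows, int col) {
    List<int[]> out = new ArrayList<>();
    for (int[] row : rows) {
      int[] projected = new int[row.length - 1];
      for (int i = 0, j = 0; i < row.length; i++) {
        if (i != col) {
          projected[j++] = row[i];
        }
      }
      out.add(projected);
    }
    return out;
  }

  public static void main(String[] args) {
    // Sorted on (c0, c1): the c0 values drive the order.
    List<int[]> rows = Arrays.asList(new int[] {1, 9}, new int[] {2, 3}, new int[] {3, 5});
    List<int[]> afterDrop = dropColumn(rows, 0); // remaining rows: [9], [3], [5]
    System.out.println(isSorted(rows));      // true
    System.out.println(isSorted(afterDrop)); // false: the data must be re-sorted
    List<int[]> resorted = new ArrayList<>(afterDrop);
    resorted.sort(LEFT_TO_RIGHT);
    System.out.println(isSorted(resorted));  // true
  }
}
```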

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishgupta88/incubator-carbondata compaction_restructure_support

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/703.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #703


commit b108c22024f6381385f0c394ea6ebe515a2e96b4
Author: ravikiran 
Date:   2017-03-15T15:07:26Z

Added class to handle sorting of data for compaction scenario

commit 11f80e3f22f68332ced85ae8da3a122d0a52447e
Author: manishgupta88 
Date:   2017-03-15T13:54:05Z

Handling for compaction for restructure case. Handled to completely sort 
the data again if any restructured block is selected for compaction
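The commits describe two paths: the existing `RowResultMerger` (now extending `AbstractResultProcessor`) and the new `CompactionResultSortProcessor`, which fully re-sorts the data when any restructured block is selected. A minimal sketch of that selection, assuming a single `execute` entry point as in the diff; `ResultProcessor`, `MergeProcessor`, `SortProcessor`, and `ProcessorChooser` are hypothetical simplifications, and `execute` returns the rows instead of a boolean status so the sketch can be checked.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Simplified base class: the diff's version takes a list of result iterators.
abstract class ResultProcessor {
  abstract List<Integer> execute(List<List<Integer>> inputs);
}

// Normal path: each input is assumed to be sorted already, so merging is enough.
class MergeProcessor extends ResultProcessor {
  @Override
  List<Integer> execute(List<List<Integer>> inputs) {
    List<Integer> merged = new ArrayList<>();
    for (List<Integer> input : inputs) {
      merged = mergeTwo(merged, input); // relies on each input already being sorted
    }
    return merged;
  }

  private static List<Integer> mergeTwo(List<Integer> a, List<Integer> b) {
    List<Integer> out = new ArrayList<>(a.size() + b.size());
    int i = 0;
    int j = 0;
    while (i < a.size() && j < b.size()) {
      out.add(a.get(i) <= b.get(j) ? a.get(i++) : b.get(j++));
    }
    out.addAll(a.subList(i, a.size()));
    out.addAll(b.subList(j, b.size()));
    return out;
  }
}

// Restructure path: input order is not trusted, so everything is sorted again.
class SortProcessor extends ResultProcessor {
  @Override
  List<Integer> execute(List<List<Integer>> inputs) {
    List<Integer> all = new ArrayList<>();
    for (List<Integer> input : inputs) {
      all.addAll(input);
    }
    Collections.sort(all);
    return all;
  }
}

class ProcessorChooser {
  static ResultProcessor choose(boolean anyRestructuredBlock) {
    return anyRestructuredBlock ? new SortProcessor() : new MergeProcessor();
  }

  public static void main(String[] args) {
    // The first input is out of order, as it could be after a column is dropped.
    List<List<Integer>> inputs = Arrays.asList(Arrays.asList(5, 1), Arrays.asList(2, 4));
    System.out.println(choose(false).execute(inputs)); // [2, 4, 5, 1]: merge trusts input order
    System.out.println(choose(true).execute(inputs));  // [1, 2, 4, 5]
  }
}
```

The merge path is cheaper because it never re-sorts, which is why a full sort would only be worth paying for when a restructured block is actually present.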



