[carbondata] branch master updated: [CARBONDATA-3830] Support Array and Struct of all primitive types reading from Presto
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new ed7e049 [CARBONDATA-3830] Support Array and Struct of all primitive types reading from Presto ed7e049 is described below commit ed7e04961c9e4cf038276b154feb9a2f3a105457 Author: ajantha-bhat AuthorDate: Thu Aug 13 22:05:15 2020 +0530 [CARBONDATA-3830] Support Array and Struct of all primitive types reading from Presto Why is this PR needed? Currently, Presto cannot read complex data type stores; sometimes it gives empty results and sometimes an exception. What changes were proposed in this PR? Supported all 13 complex primitive types (including binary; refer to the test case added) with non-nested array and struct data types. Supported complex types in the direct vector filling flow: currently the Spark integration of CarbonData uses row-level filling for complex types instead of vector filling, but Presto supports only vector reading, so complex types must be supported in vector filling. Supported complex primitive vector handling in the DIRECT_COMPRESS and ADAPTIVE_CODEC flows: the encoding of every complex primitive type is either DIRECT_COMPRESS or ADAPTIVE_CODEC, never a legacy encoding, so string, varchar (with/without local dictionary), binary, and date vector filling need to be handled in DIRECT_COMPRESS. The parent column also comes as DIRECT_COMPRESS; data from the parent column page is extracted there. Supported a vector stack in the complex column's vectorInfo to store all the children vectors, keeping a list of children vectors inside CarbonColumnVectorImpl.java (a minimal sketch follows this email). Supported ComplexTypeStreamReader to fill the Presto ROW (struct) block and ARRAY block. Handled null value filling by wrapping children vectors with ColumnarVectorWrapperDirect. Limitations / next work: some pending TODOs are: local dictionary handling is needed for string/varchar columns, as the DIRECT_COMPRESS flow does not have it; map of all primitive types can be supported; multilevel nested arrays and structs can be supported. Does this PR introduce any user interface change? No Is any new test case added?
Yes [Added test case for all 13 primitive type with array and struct, null values and more than one page data] This closes #3887 Co-authored-by: akkio-97 --- .../dimension/v3/DimensionChunkReaderV3.java | 5 + .../impl/LocalDictDimensionDataChunkStore.java | 8 +- .../SafeFixedLengthDimensionDataChunkStore.java| 3 +- .../SafeVariableLengthDimensionDataChunkStore.java | 4 +- .../adaptive/AdaptiveDeltaFloatingCodec.java | 45 ++- .../adaptive/AdaptiveDeltaIntegralCodec.java | 70 ++-- .../encoding/adaptive/AdaptiveFloatingCodec.java | 47 ++- .../encoding/adaptive/AdaptiveIntegralCodec.java | 56 ++-- .../encoding/compress/DirectCompressCodec.java | 227 + .../metadata/datatype/DecimalConverterFactory.java | 109 +-- .../impl/DictionaryBasedVectorResultCollector.java | 18 +- .../scan/executor/impl/AbstractQueryExecutor.java | 22 +- .../core/scan/result/BlockletScannedResult.java| 41 ++- .../scan/result/vector/CarbonColumnVector.java | 18 ++ .../core/scan/result/vector/ColumnVectorInfo.java | 25 ++ .../result/vector/impl/CarbonColumnVectorImpl.java | 54 .../ColumnarVectorWrapperDirectFactory.java| 12 +- ...ColumnarVectorWrapperDirectWithDeleteDelta.java | 6 + .../presto/CarbonColumnVectorWrapper.java | 4 + .../carbondata/presto/CarbonVectorBatch.java | 19 +- .../presto/ColumnarVectorWrapperDirect.java| 20 +- .../presto/PrestoCarbonVectorizedRecordReader.java | 24 +- .../presto/readers/ComplexTypeStreamReader.java| 196 +++ .../presto/readers/SliceStreamReader.java | 2 +- .../PrestoTestNonTransactionalTableFiles.scala | 358 - .../processing/datatypes/PrimitiveDataType.java| 12 +- 26 files changed, 1168 insertions(+), 237 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/DimensionChunkReaderV3.java b/core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/DimensionChunkReaderV3.java index d53c9d3..2538687 100644 --- a/core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/DimensionChunkReaderV3.java +++ b/core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/DimensionChunkReaderV3.java @@ -257,6 +257,11 @@ public class DimensionChunkReaderV3 extends AbstractDimensionChunkReader { .decodeAndFillVector(pageData.array(), offset, pageMetadata.data_page_length, vectorInfo, nullBitSet
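The "list of children vectors" idea above can be pictured with a minimal, self-contained Java sketch. ColumnVector below is a simplified stand-in for CarbonData's CarbonColumnVector interface, and the method names are illustrative assumptions rather than the project's actual API.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

interface ColumnVector {
  void putObject(int rowId, Object value);
}

// Trivial in-memory child vector, just for the demo.
class SimpleVector implements ColumnVector {
  final Map<Integer, Object> data = new HashMap<>();
  public void putObject(int rowId, Object value) { data.put(rowId, value); }
}

public class ComplexVectorSketch implements ColumnVector {
  // One child vector per nested field, mirroring the children list kept in
  // CarbonColumnVectorImpl so direct vector filling can route each child's page.
  private final List<ColumnVector> children = new ArrayList<>();

  void addChildVector(ColumnVector child) { children.add(child); }

  public void putObject(int rowId, Object value) {
    Object[] fields = (Object[]) value;            // one entry per struct field
    for (int i = 0; i < fields.length; i++) {
      children.get(i).putObject(rowId, fields[i]);
    }
  }

  public static void main(String[] args) {
    ComplexVectorSketch struct = new ComplexVectorSketch();
    struct.addChildVector(new SimpleVector());     // e.g. an int field
    struct.addChildVector(new SimpleVector());     // e.g. a binary field
    struct.putObject(0, new Object[] {42, new byte[] {1, 2}});
  }
}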
[carbondata] branch master updated: [CARBONDATA-3555] Move filter related methods under DataMapFilter
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new 488a547 [CARBONDATA-3555] Move filter related methods under DataMapFilter 488a547 is described below commit 488a5470d2a172019f8c608bb0fab0b78ed14bdc Author: kunal642 AuthorDate: Wed Oct 23 16:14:38 2019 +0530 [CARBONDATA-3555] Move filter related methods under DataMapFilter 1. This PR makes DataMapFilter a holder for the filter expression and the FilterResolver objects, so that all the major APIs can accept a DataMapFilter as an argument. 2. Moved all the filter resolving methods inside DataMapFilter for ease of use (a minimal sketch follows this email). 3. Fixed a Datasource issue where invalid datamaps were getting pruned. This closes #3419 --- .../carbondata/core/datamap/DataMapFilter.java | 110 + .../core/datamap/DataMapStoreManager.java | 15 ++- .../apache/carbondata/core/datamap/Segment.java| 5 + .../carbondata/core/datamap/TableDataMap.java | 2 +- .../indexstore/blockletindex/BlockDataMap.java | 20 +--- .../core/metadata/schema/table/CarbonTable.java| 23 - .../core/metadata/schema/table/TableInfo.java | 4 + .../scan/executor/impl/AbstractQueryExecutor.java | 38 --- .../carbondata/core/scan/model/QueryModel.java | 29 ++ .../core/scan/model/QueryModelBuilder.java | 30 +++--- dev/findbugs-exclude.xml | 8 ++ .../hadoop/api/CarbonFileInputFormat.java | 12 ++- .../carbondata/hadoop/api/CarbonInputFormat.java | 36 --- .../hadoop/api/CarbonTableInputFormat.java | 33 --- .../hadoop/stream/StreamRecordReader.java | 4 +- .../carbondata/hadoop/testutil/StoreCreator.java | 3 +- .../hadoop/util/CarbonInputFormatUtil.java | 7 +- .../hadoop/ft/CarbonTableInputFormatTest.java | 12 ++- .../carbondata/presto/CarbondataPageSource.java| 11 ++- .../carbondata/presto/impl/CarbonTableReader.java | 9 +- ...ryWithColumnMetCacheAndCacheLevelProperty.scala | 7 +- .../filterexpr/FilterProcessorTestCase.scala | 7 ++ .../filterexpr/TestImplicitFilterExpression.scala | 5 +- .../carbondata/spark/rdd/CarbonScanRDD.scala | 38 --- .../command/carbonTableSchemaCommon.scala | 5 +- .../vectorreader/VectorizedCarbonRecordReader.java | 4 +- .../execution/datasources/CarbonFileIndex.scala| 6 +- .../datasources/SparkCarbonFileFormat.scala| 12 ++- .../apache/carbondata/store/SparkCarbonStore.scala | 4 +- .../spark/sql/CarbonDatasourceHadoopRelation.scala | 7 +- .../command/management/CarbonAddLoadCommand.scala | 1 + .../strategy/CarbonLateDecodeStrategy.scala| 2 + .../merger/CarbonCompactionExecutor.java | 4 +- .../carbondata/sdk/file/CarbonReaderBuilder.java | 4 +- .../carbondata/sdk/file/CarbonSchemaReader.java| 52 -- .../apache/carbondata/store/LocalCarbonStore.java | 4 +- 36 files changed, 322 insertions(+), 251 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapFilter.java b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapFilter.java index 46f37db..23805e2 100644 --- a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapFilter.java +++ b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapFilter.java @@ -19,6 +19,7 @@ package org.apache.carbondata.core.datamap; import java.io.IOException; import java.io.Serializable; +import java.util.ArrayList; import java.util.HashSet; import java.util.Set; @@ -29,7 +30,11 @@ import org.apache.carbondata.core.metadata.schema.table.column.CarbonMeasure; import
org.apache.carbondata.core.scan.executor.util.RestructureUtil; import org.apache.carbondata.core.scan.expression.ColumnExpression; import org.apache.carbondata.core.scan.expression.Expression; +import org.apache.carbondata.core.scan.filter.FilterExpressionProcessor; +import org.apache.carbondata.core.scan.filter.intf.FilterOptimizer; +import org.apache.carbondata.core.scan.filter.optimizer.RangeFilterOptmizer; import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf; +import org.apache.carbondata.core.scan.model.QueryModel; import org.apache.carbondata.core.util.ObjectSerializationUtil; /** @@ -37,7 +42,9 @@ import org.apache.carbondata.core.util.ObjectSerializationUtil; */ public class DataMapFilter implements Serializable { - private CarbonTable table; + private static final long serialVersionUID = 6276855832288220240L; + + private transient CarbonTable table; private Expression expression; @@ -45,9 +52,16 @@ public class DataMapFilter implements Serializable { private String serializedExpression; + private
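A minimal sketch of the "filter holder" idea from this commit: one serializable object carries the filter expression and hands out the resolved form on demand, so APIs take a single argument. The types are simplified stand-ins for CarbonData's Expression and FilterResolverIntf, and the resolve step is a placeholder, not the real resolver.

import java.io.Serializable;

interface Expression extends Serializable {}

interface FilterResolver {}

public class FilterHolderSketch implements Serializable {
  private final Expression expression;
  private transient FilterResolver resolver;   // rebuilt lazily after deserialization

  FilterHolderSketch(Expression expression) {
    this.expression = expression;
  }

  // Resolve once, on demand, so callers pass this single holder around instead of
  // carrying an Expression and a FilterResolverIntf separately.
  synchronized FilterResolver getResolver() {
    if (resolver == null) {
      resolver = resolve(expression);
    }
    return resolver;
  }

  private FilterResolver resolve(Expression expr) {
    // Placeholder: the real DataMapFilter runs the filter expression processor and
    // the range-filter optimizer here.
    return new FilterResolver() {};
  }

  public static void main(String[] args) {
    FilterHolderSketch holder = new FilterHolderSketch(new Expression() {});
    System.out.println(holder.getResolver() != null);   // true
  }
}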
[carbondata] branch master updated: [CARBONDATA-3584] Fix Select Query failure for Boolean dictionary column when Codegen is disabled
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new 10149eb [CARBONDATA-3584] Fix Select Query failure for Boolean dictionary column when Codegen is disabled 10149eb is described below commit 10149eb9f58dce578702cc7e7266671201198412 Author: Indhumathi27 AuthorDate: Fri Nov 15 16:51:28 2019 +0530 [CARBONDATA-3584] Fix Select Query failure for Boolean dictionary column when Codegen is disabled Problem: Select query fails for a boolean dictionary column with a CastException when codegen is disabled. Solution: Added a Boolean case in getDataBasedOnDataType and decode Boolean in CodegenContext (a standalone sketch of the decode follows this email). This closes #3463 --- .../org/apache/carbondata/core/util/DataTypeUtil.java | 6 ++ .../org/apache/spark/sql/CarbonDictionaryDecoder.scala | 16 .../booleantype/BooleanDataTypesBaseTest.scala | 17 + 3 files changed, 39 insertions(+) diff --git a/core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java b/core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java index 660c705..f138323 100644 --- a/core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java +++ b/core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java @@ -715,6 +715,12 @@ public final class DataTypeUtil { javaDecVal = javaDecVal.setScale(dimension.getColumnSchema().getScale()); } return getDataTypeConverter().convertFromBigDecimalToDecimal(javaDecVal); + } else if (dataType == DataTypes.BOOLEAN) { +String data8 = new String(dataInBytes, CarbonCommonConstants.DEFAULT_CHARSET_CLASS); +if (data8.isEmpty()) { + return null; +} +return BooleanConvert.parseBoolean(data8); } else { return getDataTypeConverter().convertFromByteToUTF8String(dataInBytes); } diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonDictionaryDecoder.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonDictionaryDecoder.scala index 3b20c2f..9b9d7a6 100644 --- a/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonDictionaryDecoder.scala +++ b/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonDictionaryDecoder.scala @@ -231,6 +231,17 @@ case class CarbonDictionaryDecoder( | tuple.setValue(UTF8String.fromBytes((byte[])tuple.getValue())); | return tuple; |}""".stripMargin) + val decodeBool = ctx.freshName("deDictBool") + ctx.addNewFunction(decodeStr, +s""" + |private org.apache.spark.sql.DictTuple $decodeBool( + | org.apache.spark.sql.ForwardDictionaryWrapper dict, int surg) + | throws java.io.IOException { + | org.apache.spark.sql.DictTuple tuple = $decodeDictionary(dict, surg); + | tuple.setValue(Boolean.parseBoolean(new String((byte[])tuple.getValue(), + | org.apache.carbondata.core.constants.CarbonCommonConstants.DEFAULT_CHARSET_CLASS))); + | return tuple; + |}""".stripMargin) val resultVars = exprs.zipWithIndex.map { case (expr, index) => @@ -271,6 +282,11 @@ case class CarbonDictionaryDecoder( |org.apache.spark.sql.DictTuple $value = $decodeLong($dictRef, ${ ev.value }); """.stripMargin ExprCode(code, s"$value.getIsNull()", s"((Long)$value.getValue())") + case CarbonDataTypes.BOOLEAN => code += +s""" + |org.apache.spark.sql.DictTuple $value = $decodeBool($dictRef, ${ ev.value }); + """.stripMargin +ExprCode(code, s"$value.getIsNull()", s"((Boolean)$value.getValue())") case _ => code += s""" |org.apache.spark.sql.DictTuple $value = $decodeStr($dictRef, ${ev.value}); diff --git
a/integration/spark2/src/test/scala/org/apache/carbondata/spark/testsuite/booleantype/BooleanDataTypesBaseTest.scala b/integration/spark2/src/test/scala/org/apache/carbondata/spark/testsuite/booleantype/BooleanDataTypesBaseTest.scala index c0087a8..82894d4 100644 --- a/integration/spark2/src/test/scala/org/apache/carbondata/spark/testsuite/booleantype/BooleanDataTypesBaseTest.scala +++ b/integration/spark2/src/test/scala/org/apache/carbondata/spark/testsuite/booleantype/BooleanDataTypesBaseTest.scala @@ -154,4 +154,21 @@ class BooleanDataTypesBaseTest extends QueryTest with BeforeAndAfterEach with Be sql("delete from carbon_table where cc=true") checkAnswer(sql("select COUNT(
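The getDataBasedOnDataType change above, reduced to a standalone sketch: dictionary bytes for a BOOLEAN column arrive as a UTF-8 string, empty bytes mean null, and anything else is parsed as a boolean. The real code uses CarbonData's BooleanConvert.parseBoolean and the DEFAULT_CHARSET_CLASS constant; plain JDK equivalents are substituted here.

import java.nio.charset.StandardCharsets;

public class BooleanDecodeSketch {

  // Mirrors the new BOOLEAN branch: empty bytes decode to null, anything else is
  // parsed from its string form.
  static Boolean decodeBoolean(byte[] dataInBytes) {
    String data = new String(dataInBytes, StandardCharsets.UTF_8);
    if (data.isEmpty()) {
      return null;
    }
    return Boolean.parseBoolean(data);   // real code: BooleanConvert.parseBoolean(data)
  }

  public static void main(String[] args) {
    System.out.println(decodeBoolean("true".getBytes(StandardCharsets.UTF_8)));   // true
    System.out.println(decodeBoolean(new byte[0]));                               // null
  }
}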
svn commit: r36459 - /release/carbondata/1.6.1/
Author: kumarvishal09 Date: Thu Oct 24 09:47:09 2019 New Revision: 36459 Log: Upload 1.6.1 release Added: release/carbondata/1.6.1/ release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.1.0-hadoop2.7.2.jar (with props) release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.1.0-hadoop2.7.2.jar.asc release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.1.0-hadoop2.7.2.jar.sha512 release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.2.1-hadoop2.7.2.jar (with props) release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.2.1-hadoop2.7.2.jar.asc release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.2.1-hadoop2.7.2.jar.sha512 release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.3.2-hadoop2.7.2.jar (with props) release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.3.2-hadoop2.7.2.jar.asc release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.3.2-hadoop2.7.2.jar.sha512 release/carbondata/1.6.1/apache-carbondata-1.6.1-source-release.zip (with props) release/carbondata/1.6.1/apache-carbondata-1.6.1-source-release.zip.asc release/carbondata/1.6.1/apache-carbondata-1.6.1-source-release.zip.sha512 Added: release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.1.0-hadoop2.7.2.jar == Binary file - no diff available. Propchange: release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.1.0-hadoop2.7.2.jar -- svn:mime-type = application/octet-stream Added: release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.1.0-hadoop2.7.2.jar.asc == --- release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.1.0-hadoop2.7.2.jar.asc (added) +++ release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.1.0-hadoop2.7.2.jar.asc Thu Oct 24 09:47:09 2019 @@ -0,0 +1,11 @@ +-BEGIN PGP SIGNATURE- + +iQEzBAEBCAAdFiEEsZE8naWI0MngB++fuw0pZv1r+vAFAl2vKLEACgkQuw0pZv1r ++vA60ggAlOIODNcQfQK2fIP3BjQx7PBr1xljZ9D2W9zgY1O75r4N1mQ6shjimC4S +xLo+MVNOK2eT69lhNALo5a7ZgXjNS8oLNce2lvS7gaast7XT6SwwpGBK45mPQo32 +nuoB8C6MXTejSOliut948WTLNrF4WJ6VRCXunDmwHVkGKjb3qife1uRQhNiBd9yI +OqmdfgyPbRy0r9PVNGj5VJ5iEZT+QYzNs85MGgKeQ+dTlnhouCYs3NbEasybmOlX +hR4QB3cregt3rINV2hW5T2bszYe2Td79XVY57UkLs2X1/kCrnkxYhC4zUb3aJA93 +5LF/FBL4GaDNVFXamHXPRGHmEsSO0w== +=VYhJ +-END PGP SIGNATURE- Added: release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.1.0-hadoop2.7.2.jar.sha512 == --- release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.1.0-hadoop2.7.2.jar.sha512 (added) +++ release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.1.0-hadoop2.7.2.jar.sha512 Thu Oct 24 09:47:09 2019 @@ -0,0 +1 @@ +d5a30bf1ff13e8f4381fb78df7f1b3a6b660643f4922cf3318c445462c0ba3a48db0d7000acf0a4aa5129ac5c686299213182f6a51dc56a16b0585c8823e23aa apache-carbondata-1.6.1-bin-spark2.1.0-hadoop2.7.2.jar Added: release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.2.1-hadoop2.7.2.jar == Binary file - no diff available. 
Propchange: release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.2.1-hadoop2.7.2.jar -- svn:mime-type = application/octet-stream Added: release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.2.1-hadoop2.7.2.jar.asc == --- release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.2.1-hadoop2.7.2.jar.asc (added) +++ release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.2.1-hadoop2.7.2.jar.asc Thu Oct 24 09:47:09 2019 @@ -0,0 +1,11 @@ +-BEGIN PGP SIGNATURE- + +iQEzBAEBCAAdFiEEsZE8naWI0MngB++fuw0pZv1r+vAFAl2vKMQACgkQuw0pZv1r ++vBPJwgAi8K3iVqMayaVabHrMDqnbARLEv6sEmXs20YvNJDGwUIXAyCD8KfClvwq +1v9tBezdjLy7jtgdTXyW6wvs8aCKnXunh/xmyJ8fESgIzbDfTaX2va6NW2latLTP +SgrYcf1GDc2+/hv9Po5x0+yZNWhzDZniuSiQSMGIUWXNRUzxPk2scx7Ak/M+JUDv +MqUfFVVe2Ec9X7HCeBQO25Ar6DX7d2vWcTBps2GvVMmNA273ZLMTIrvgUtmXn8Bi +O3nTt6z9K6NcZZOpl8u6RBcWXLzazb15LQ7IER8g/UtOi2+hxwjli8pek8aHoqmd ++rv8aIKK0yWJ0deRWJ65T8waMIECuA== +=q/zn +-END PGP SIGNATURE- Added: release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.2.1-hadoop2.7.2.jar.sha512 == --- release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.2.1-hadoop2.7.2.jar.sha512 (added) +++ release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.2.1-hadoop2.7.2.jar.sha512 Thu Oct 24 09:47:09 2019 @@ -0,0 +1
[carbondata] branch master updated: [CARBONDATA-3454] optimized index server output for count(*)
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new 41ac71a [CARBONDATA-3454] optimized index server output for count(*) 41ac71a is described below commit 41ac71a7ef96a6725ee9b6a8f26bf4836bd535f9 Author: kunal642 AuthorDate: Thu Jun 27 14:32:11 2019 +0530 [CARBONDATA-3454] optimized index server output for count(*) Optimised the output for count(*) queries so that only a long is sent back to the driver, to reduce the network transfer cost for the index server (a toy illustration follows this email). This closes #3308 --- .../apache/carbondata/core/datamap/DataMapJob.java | 2 + .../carbondata/core/datamap/DataMapUtil.java | 13 ++- .../core/datamap/DistributableDataMapFormat.java | 34 +-- .../core/indexstore/ExtendedBlocklet.java | 68 - .../core/indexstore/ExtendedBlockletWrapper.java | 27 +++-- .../ExtendedBlockletWrapperContainer.java | 19 ++-- .../carbondata/hadoop/api/CarbonInputFormat.java | 52 -- .../hadoop/api/CarbonTableInputFormat.java | 22 ++-- .../carbondata/indexserver/DataMapJobs.scala | 15 ++- .../indexserver/DistributedCountRDD.scala | 111 + .../indexserver/DistributedPruneRDD.scala | 29 ++ .../indexserver/DistributedRDDUtils.scala | 13 +++ .../carbondata/indexserver/IndexServer.scala | 19 13 files changed, 319 insertions(+), 105 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapJob.java b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapJob.java index 9eafe7c..326282d 100644 --- a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapJob.java +++ b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapJob.java @@ -35,4 +35,6 @@ public interface DataMapJob extends Serializable { List execute(DistributableDataMapFormat dataMapFormat); + Long executeCountJob(DistributableDataMapFormat dataMapFormat); + } diff --git a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapUtil.java b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapUtil.java index dd9debc..bca7409 100644 --- a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapUtil.java +++ b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapUtil.java @@ -230,7 +230,7 @@ public class DataMapUtil { List validSegments, List invalidSegments, DataMapLevel level, List segmentsToBeRefreshed) throws IOException { return executeDataMapJob(carbonTable, resolver, dataMapJob, partitionsToPrune, validSegments, -invalidSegments, level, false, segmentsToBeRefreshed); +invalidSegments, level, false, segmentsToBeRefreshed, false); } /** @@ -241,7 +241,8 @@ public class DataMapUtil { public static List executeDataMapJob(CarbonTable carbonTable, FilterResolverIntf resolver, DataMapJob dataMapJob, List partitionsToPrune, List validSegments, List invalidSegments, DataMapLevel level, - Boolean isFallbackJob, List segmentsToBeRefreshed) throws IOException { + Boolean isFallbackJob, List segmentsToBeRefreshed, boolean isCountJob) + throws IOException { List invalidSegmentNo = new ArrayList<>(); for (Segment segment : invalidSegments) { invalidSegmentNo.add(segment.getSegmentNo()); @@ -250,9 +251,11 @@ public class DataMapUtil { DistributableDataMapFormat dataMapFormat = new DistributableDataMapFormat(carbonTable, resolver, validSegments, invalidSegmentNo, partitionsToPrune, false, level, isFallbackJob); -List prunedBlocklets = dataMapJob.execute(dataMapFormat); -// Apply expression on the blocklets.
-return prunedBlocklets; +if (isCountJob) { + dataMapFormat.setCountStarJob(); + dataMapFormat.setIsWriteToFile(false); +} +return dataMapJob.execute(dataMapFormat); } public static SegmentStatusManager.ValidAndInvalidSegmentsInfo getValidAndInvalidSegments( diff --git a/core/src/main/java/org/apache/carbondata/core/datamap/DistributableDataMapFormat.java b/core/src/main/java/org/apache/carbondata/core/datamap/DistributableDataMapFormat.java index 8426fcb..b430c5d 100644 --- a/core/src/main/java/org/apache/carbondata/core/datamap/DistributableDataMapFormat.java +++ b/core/src/main/java/org/apache/carbondata/core/datamap/DistributableDataMapFormat.java @@ -28,7 +28,6 @@ import java.util.UUID; import org.apache.carbondata.common.logging.LogServiceFactory; import org.apache.carbondata.core.constants.CarbonCommonConstants; -import org.apache.carbondata.core.datamap.dev.DataMap; import org.apache.carbondata.core.datamap.dev.expr.DataMapDistributableWrapper; import org.apache.carbondata.core.datastore.impl.FileFactory;
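A toy illustration of the saving described above: rather than serializing every pruned blocklet back to the driver only to count it, each partition returns a single partial count and the driver sums longs. The names are invented for the sketch; the real flow goes through DataMapJob.executeCountJob and DistributedCountRDD.

import java.util.Arrays;
import java.util.List;

public class CountJobSketch {

  // Stand-in for one partition's work on an executor: prune and return only a count.
  static long countForPartition(List<String> prunedBlocklets) {
    return prunedBlocklets.size();
  }

  public static void main(String[] args) {
    List<List<String>> partitions = Arrays.asList(
        Arrays.asList("blocklet-0", "blocklet-1"),
        Arrays.asList("blocklet-2"));
    // Only one long per partition crosses the network, not the blocklet metadata.
    long total = partitions.stream().mapToLong(CountJobSketch::countForPartition).sum();
    System.out.println(total);   // 3
  }
}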
[carbondata] branch master updated: [CARBONDATA-3515] Limit local dictionary size to 16MB and allow configuration.
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new da525ec [CARBONDATA-3515] Limit local dictionary size to 16MB and allow configuration. da525ec is described below commit da525ece20f6606f8b2113ca32b7acb82f0698fd Author: ajantha-bhat AuthorDate: Tue Sep 10 10:48:26 2019 +0530 [CARBONDATA-3515] Limit local dictionary size to 16MB and allow configuration. Problem: currently the local dictionary max size is 2 GB; because of this, for varchar or long string columns the local dictionary can grow to 2 GB. As the local dictionary is stored in the blocklet, the blocklet size can exceed 2 GB even though the configured maximum blocklet size is 64 MB, and in some places integer overflow happens during casting. Solution: limit the local dictionary size to 16 MB and allow configuration; the default size is 4 MB (a small sketch of the threshold arithmetic follows this email). This closes #3380 --- .../core/constants/CarbonCommonConstants.java | 11 ++ .../dictionaryholder/MapBasedDictionaryStore.java | 16 ++-- .../carbondata/core/util/CarbonProperties.java | 43 ++ docs/configuration-parameters.md | 1 + 4 files changed, 68 insertions(+), 3 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java index 67fa13f..ac77582 100644 --- a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java +++ b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java @@ -1209,6 +1209,17 @@ public final class CarbonCommonConstants { public static final String CARBON_ENABLE_RANGE_COMPACTION_DEFAULT = "true"; + @CarbonProperty + /** + * size based threshold for local dictionary in mb.
+ */ + public static final String CARBON_LOCAL_DICTIONARY_SIZE_THRESHOLD_IN_MB = + "carbon.local.dictionary.size.threshold.inmb"; + + public static final int CARBON_LOCAL_DICTIONARY_SIZE_THRESHOLD_IN_MB_DEFAULT = 4; + + public static final int CARBON_LOCAL_DICTIONARY_SIZE_THRESHOLD_IN_MB_MAX = 16; + // // Query parameter start here // diff --git a/core/src/main/java/org/apache/carbondata/core/localdictionary/dictionaryholder/MapBasedDictionaryStore.java b/core/src/main/java/org/apache/carbondata/core/localdictionary/dictionaryholder/MapBasedDictionaryStore.java index 7b8617a..0a50451 100644 --- a/core/src/main/java/org/apache/carbondata/core/localdictionary/dictionaryholder/MapBasedDictionaryStore.java +++ b/core/src/main/java/org/apache/carbondata/core/localdictionary/dictionaryholder/MapBasedDictionaryStore.java @@ -20,7 +20,9 @@ import java.util.Map; import java.util.concurrent.ConcurrentHashMap; import org.apache.carbondata.core.cache.dictionary.DictionaryByteArrayWrapper; +import org.apache.carbondata.core.constants.CarbonCommonConstants; import org.apache.carbondata.core.localdictionary.exception.DictionaryThresholdReachedException; +import org.apache.carbondata.core.util.CarbonProperties; /** * Map based dictionary holder class, it will use map to hold @@ -51,6 +53,11 @@ public class MapBasedDictionaryStore implements DictionaryStore { private int dictionaryThreshold; /** + * dictionary threshold size in bytes + */ + private long dictionarySizeThresholdInBytes; + + /** * for checking threshold is reached or not */ private boolean isThresholdReached; @@ -62,6 +69,8 @@ public class MapBasedDictionaryStore implements DictionaryStore { public MapBasedDictionaryStore(int dictionaryThreshold) { this.dictionaryThreshold = dictionaryThreshold; +this.dictionarySizeThresholdInBytes = Integer.parseInt(CarbonProperties.getInstance() + .getProperty(CarbonCommonConstants.CARBON_LOCAL_DICTIONARY_SIZE_THRESHOLD_IN_MB)) << 20; this.dictionary = new ConcurrentHashMap<>(); this.referenceDictionaryArray = new DictionaryByteArrayWrapper[dictionaryThreshold]; } @@ -93,7 +102,7 @@ public class MapBasedDictionaryStore implements DictionaryStore { value = ++lastAssignValue; currentSize += data.length; // if new value is greater than threshold - if (value > dictionaryThreshold || currentSize >= Integer.MAX_VALUE) { + if (value > dictionaryThreshold || currentSize > dictionarySizeThresholdInBytes) { // set the threshold boolean to true isThresholdReached = true; // throw exception @@ -111,9 +120,10 @@ public class MapBasedDictionaryStore imp
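A small sketch of how the new threshold is consumed, based on the diff above: the configured megabyte value is converted to bytes with a left shift, and the store falls back once the accumulated dictionary size crosses it. The property key and the 4 MB default come from the commit; reading it via CarbonProperties.addProperty is shown in a comment as the assumed configuration route.

public class DictionaryThresholdSketch {

  public static void main(String[] args) {
    // Assumed configuration route, per the property added above:
    // CarbonProperties.getInstance()
    //     .addProperty("carbon.local.dictionary.size.threshold.inmb", "8");
    int thresholdInMb = 8;                                // between default 4 and max 16
    long thresholdInBytes = (long) thresholdInMb << 20;   // MB -> bytes, as in the diff
    long currentSize = 9L << 20;                          // dictionary grew to 9 MB
    boolean thresholdReached = currentSize > thresholdInBytes;
    System.out.println(thresholdReached);                 // true: stop using local dictionary
  }
}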
[carbondata] branch master updated: [CARBONDATA-3506] Fix alter table failures on partition table with hive.metastore.disallow.incompatible.col.type.changes as true
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new 26f2c77 [CARBONDATA-3506] Fix alter table failures on partition table with hive.metastore.disallow.incompatible.col.type.changes as true 26f2c77 is described below commit 26f2c778e5b8c10b2249862877250afdd0062a41 Author: akashrn5 AuthorDate: Wed Aug 28 12:05:13 2019 +0530 [CARBONDATA-3506] Fix alter table failures on partition table with hive.metastore.disallow.incompatible.col.type.changes as true Problem: On Spark 2.2 and above, when we call alterExternalCatalogForTableWithUpdatedSchema to update the new schema to the external catalog for an add column operation, Spark gets the catalog table and itself appends the partition columns (if the table is a partition table) to the new data schema sent by Carbon, so there will be duplicate partition columns and validation fails in Hive. When the table has only two columns and one of them is a partition column, dropping the non-partition column is invalid, because allowing it would leave a table whose columns are all partition columns; so with the above property set to true, drop column will fail to update the Hive metastore. On Spark 2.2 and above, if a datatype change is done on a partition column with the above property set to true, it also fails, as we are not sending the partition column for the schema alter in Hive. Solution: when sending the new schema to Spark to update in the catalog, do not send the partition columns on Spark 2.2 and above, as Spark takes care of adding partition columns to the new schema sent by us (see the sketch after this email). In the drop scenario, do not allow dropping a column if, after dropping it, the table would have only partition columns left. Block datatype changes on partition columns on Spark 2.2 and above.
This closes #3367 --- .../StandardPartitionTableQueryTestCase.scala | 29 + .../schema/CarbonAlterTableAddColumnCommand.scala | 20 +--- ...nAlterTableColRenameDataTypeChangeCommand.scala | 36 +++--- .../schema/CarbonAlterTableDropColumnCommand.scala | 35 + 4 files changed, 99 insertions(+), 21 deletions(-) diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableQueryTestCase.scala b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableQueryTestCase.scala index c19c0b9..fb4b511 100644 --- a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableQueryTestCase.scala +++ b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableQueryTestCase.scala @@ -21,8 +21,10 @@ import org.apache.spark.sql.execution.strategy.CarbonDataSourceScan import org.apache.spark.sql.test.Spark2TestQueryExecutor import org.apache.spark.sql.test.util.QueryTest import org.apache.spark.sql.{DataFrame, Row} +import org.apache.spark.util.SparkUtil import org.scalatest.BeforeAndAfterAll +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException import org.apache.carbondata.core.constants.CarbonCommonConstants import org.apache.carbondata.core.datastore.impl.FileFactory import org.apache.carbondata.core.util.CarbonProperties @@ -439,18 +441,32 @@ test("Creation of partition table should fail if the colname in table schema and test("validate data in partition table after dropping and adding a column") { sql("drop table if exists par") -sql("create table par(name string) partitioned by (age double) stored by " + +sql("create table par(name string, add string) partitioned by (age double) stored by " + "'carbondata' TBLPROPERTIES('cache_level'='blocklet')") -sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" + -s"('header'='false')") +sql("insert into par select 'joey','NY',32 union all select 'chandler','NY',32") sql("alter table par drop columns(name)") sql("alter table par add columns(name string)") -sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" + -s"('header'='false')") -checkAnswer(sql("select name from par"), Seq(Row("a"),Row("b"), Row(null), Row(null))) +sql("insert into par select 'joey','NY',32 union all select 'joey','NY',32") +checkAnswer(sql("select name from par"), Seq(Row("
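The first part of the fix, reduced to its essence: filter partition columns out of the schema before handing it to the catalog, since Spark 2.2+ re-appends them itself. Columns are plain strings here; the real code works on the table schema inside the alter-table commands listed above.

import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class PartitionSchemaSketch {

  public static void main(String[] args) {
    List<String> newSchema = Arrays.asList("name", "add", "age");
    Set<String> partitionColumns = new HashSet<>(Arrays.asList("age"));
    // Send only non-partition columns to the catalog; Spark 2.2+ appends "age" itself,
    // so sending it too would cause the duplicate-column validation failure in Hive.
    List<String> schemaForCatalog = newSchema.stream()
        .filter(column -> !partitionColumns.contains(column))
        .collect(Collectors.toList());
    System.out.println(schemaForCatalog);   // [name, add]
  }
}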
[carbondata] branch master updated: [CARBONDATA-3505] Drop database cascade fix
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new f3685a5 [CARBONDATA-3505] Drop database cascade fix f3685a5 is described below commit f3685a53ec70a0987f022bc1f479658810cf3755 Author: kunal642 AuthorDate: Tue Aug 27 14:49:58 2019 +0530 [CARBONDATA-3505] Drop database cascade fix Problem: When 2 databases are created at the same location and one of them is dropped, the folder is also deleted from the backend. If we then try to drop the 2nd database, it would try to look up the other table, but the schema file would not exist in the backend and the drop will fail. Solution: Add a check to call CarbonDropDatabaseCommand only if the database location exists in the backend (a JDK-level sketch of the guard follows this email). This closes #3365 --- .../main/scala/org/apache/spark/sql/CarbonEnv.scala | 19 ++- .../command/cache/CarbonShowCacheCommand.scala| 4 ++-- .../spark/sql/execution/strategy/DDLStrategy.scala| 4 +++- .../apache/spark/sql/hive/CarbonFileMetastore.scala | 4 ++-- 4 files changed, 25 insertions(+), 6 deletions(-) diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala index 1cbd156..f2a52d2 100644 --- a/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala +++ b/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala @@ -20,7 +20,7 @@ package org.apache.spark.sql import java.util.concurrent.ConcurrentHashMap import org.apache.spark.sql.catalyst.TableIdentifier -import org.apache.spark.sql.catalyst.analysis.NoSuchTableException +import org.apache.spark.sql.catalyst.analysis.{NoSuchDatabaseException, NoSuchTableException} import org.apache.spark.sql.catalyst.catalog.SessionCatalog import org.apache.spark.sql.events.{MergeBloomIndexEventListener, MergeIndexEventListener} import org.apache.spark.sql.execution.command.cache._ @@ -267,6 +267,23 @@ object CarbonEnv { } /** + * Returns true when the database folder exists in the file system. False in all other scenarios.
+ */ + def databaseLocationExists(dbName: String, + sparkSession: SparkSession, ifExists: Boolean): Boolean = { +try { + FileFactory.getCarbonFile(getDatabaseLocation(dbName, sparkSession)).exists() +} catch { + case e: NoSuchDatabaseException => +if (ifExists) { + false +} else { + throw e +} +} + } + + /** * The method returns the database location * if carbon.storeLocation does point to spark.sql.warehouse.dir then returns * the database locationUri as database location else follows the old behaviour diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/cache/CarbonShowCacheCommand.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/cache/CarbonShowCacheCommand.scala index 45e811a..4b7f680 100644 --- a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/cache/CarbonShowCacheCommand.scala +++ b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/cache/CarbonShowCacheCommand.scala @@ -443,9 +443,9 @@ case class CarbonShowCacheCommand(tableIdentifier: Option[TableIdentifier], case (_, _, sum, provider) => provider.toLowerCase match { case `bloomFilterIdentifier` => -allIndexSize += sum - case _ => allDatamapSize += sum + case _ => +allIndexSize += sum } } (allIndexSize, allDatamapSize) diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/DDLStrategy.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/DDLStrategy.scala index 4791687..3ef8cfa 100644 --- a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/DDLStrategy.scala +++ b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/DDLStrategy.scala @@ -37,6 +37,7 @@ import org.apache.spark.util.{CarbonReflectionUtils, DataMapUtil, FileUtils, Spa import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.datastore.impl.FileFactory import org.apache.carbondata.core.metadata.schema.table.CarbonTable import org.apache.carbondata.core.util.{CarbonProperties, DataTypeUtil, ThreadLocalSessionInfo} import org.apache.carbondata.spark.util.Util @@ -115,7 +116,8 @@ class DDLStrategy(sparkSession: SparkSession) extends SparkStrategy { .setConfigurationToCurrentThread(sparkSession.sessionState.newHadoopConf()) FileUtils.createDatabaseD
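The guard described above, reduced to JDK file APIs (the real code goes through CarbonData's FileFactory and the new CarbonEnv.databaseLocationExists): only issue the drop cleanup when the database folder still exists. The path is illustrative.

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class DropDatabaseGuardSketch {

  public static void main(String[] args) {
    Path dbLocation = Paths.get("/tmp/warehouse/db1.db");   // illustrative path
    if (Files.exists(dbLocation)) {
      // Only now run the CarbonDropDatabaseCommand equivalent; a second database
      // sharing the already-deleted location skips the cleanup instead of failing.
      System.out.println("dropping " + dbLocation);
    }
  }
}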
[carbondata] branch feature/DistributedIndexServer deleted (was 7f05e69)
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a change to branch feature/DistributedIndexServer in repository https://gitbox.apache.org/repos/asf/carbondata.git. was 7f05e69 [HOTFIX]fixed loading issue for legacy store The revisions that were on this branch are still contained in other references; therefore, this change does not discard any commits from the repository.
[carbondata] branch branch-1.6.0 deleted (was d7d70a8)
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a change to branch branch-1.6.0 in repository https://gitbox.apache.org/repos/asf/carbondata.git. was d7d70a8 [HOTFIX] Removed the hive-exec and commons dependency from hive module The revisions that were on this branch are still contained in other references; therefore, this change does not discard any commits from the repository.
[carbondata] 03/03: [HOTFIX] Removed the hive-exec and commons dependency from hive module
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a commit to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git commit 80438f75379cd3754cb31a42a372aeb36e4d61e7 Author: ravipesala AuthorDate: Fri Aug 2 11:15:05 2019 +0530 [HOTFIX] Removed the hive-exec and commons dependency from hive module Removed the hive-exec and commons dependency from the hive module, as Spark has its own hive-exec. Because of the external hive-exec dependency, some tests are failing. This closes #3347 --- integration/spark-common/pom.xml | 10 ++ 1 file changed, 10 insertions(+) diff --git a/integration/spark-common/pom.xml b/integration/spark-common/pom.xml index df683e0..a12992d 100644 --- a/integration/spark-common/pom.xml +++ b/integration/spark-common/pom.xml @@ -39,6 +39,16 @@
       <groupId>org.apache.carbondata</groupId>
       <artifactId>carbondata-hive</artifactId>
       <version>${project.version}</version>
+      <exclusions>
+        <exclusion>
+          <groupId>org.apache.commons</groupId>
+          <artifactId>*</artifactId>
+        </exclusion>
+        <exclusion>
+          <groupId>org.apache.hive</groupId>
+          <artifactId>hive-exec</artifactId>
+        </exclusion>
+      </exclusions>
     </dependency>
     <dependency>
       <groupId>org.apache.carbondata</groupId>
[carbondata] 01/03: [CARBONDATA-3478] Fix ArrayIndexOutOfBound Exception on compaction after alter operation
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a commit to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git commit 2ebc0413ee03645659e49b8c4d41969ee444b9aa Author: Indhumathi27 AuthorDate: Fri Jul 26 16:51:32 2019 +0530 [CARBONDATA-3478] Fix ArrayIndexOutOfBound Exception on compaction after alter operation Problem: In case of alter add, drop, or rename operations, restructuredBlockExists will be true. Currently, to get the RawResultIterator for a block, we check whether the block has ColumnDrift or not by comparing SegmentProperties and the columndrift columns. SegmentProperties is formed based on restructuredBlockExists: if restructuredBlockExists is true, we take the current column schema to form SegmentProperties; else, we use the datafilefooter column schema. In the example given in CARBONDATA-3478, for both blocks we use the current column schema to form SegmentProperties, as restructuredBlockExists will be true. Hence, while iterating block 1, it throws an ArrayIndexOutOfBound exception, as it uses RawResultIterator instead of ColumnDriftRawResultIterator. Solution: Use the schema from the datafilefooter of each block to check whether it has columndrift or not (a simplified sketch of this per-block check follows this email). This closes #3337 --- .../AlterTableColumnRenameTestCase.scala | 54 ++ .../merger/CarbonCompactionExecutor.java | 9 +++- 2 files changed, 61 insertions(+), 2 deletions(-) diff --git a/integration/spark2/src/test/scala/org/apache/spark/carbondata/restructure/vectorreader/AlterTableColumnRenameTestCase.scala b/integration/spark2/src/test/scala/org/apache/spark/carbondata/restructure/vectorreader/AlterTableColumnRenameTestCase.scala index d927724..dd1fa0f 100644 --- a/integration/spark2/src/test/scala/org/apache/spark/carbondata/restructure/vectorreader/AlterTableColumnRenameTestCase.scala +++ b/integration/spark2/src/test/scala/org/apache/spark/carbondata/restructure/vectorreader/AlterTableColumnRenameTestCase.scala @@ -320,12 +320,66 @@ class AlterTableColumnRenameTestCase extends Spark2QueryTest with BeforeAndAfter } } + test("test compaction after table rename and alter set tblproerties") { +sql("DROP TABLE IF EXISTS test_rename") +sql("DROP TABLE IF EXISTS test_rename_compact") +sql( + "CREATE TABLE test_rename (empno int, empname String, designation String, doj Timestamp, " + + "workgroupcategory int, workgroupcategoryname String, deptno int, deptname String, " + + "projectcode int, projectjoindate Timestamp, projectenddate Timestamp,attendance int," + + "utilization int,salary int) STORED BY 'org.apache.carbondata.format'") +sql( + s"""LOAD DATA LOCAL INPATH '$resourcesPath/data.csv' INTO TABLE test_rename OPTIONS + |('DELIMITER'= ',', 'QUOTECHAR'= '\"')""".stripMargin) +sql("alter table test_rename rename to test_rename_compact") +sql("alter table test_rename_compact set tblproperties('sort_columns'='deptno,projectcode', 'sort_scope'='local_sort')") +sql( + s"""LOAD DATA LOCAL INPATH '$resourcesPath/data.csv' INTO TABLE test_rename_compact OPTIONS + |('DELIMITER'= ',', 'QUOTECHAR'= '\"')""".stripMargin) +val res1 = sql("select * from test_rename_compact") +sql("alter table test_rename_compact compact 'major'") +val res2 = sql("select * from test_rename_compact") +assert(res1.collectAsList().containsAll(res2.collectAsList())) +checkExistence(sql("show segments for table test_rename_compact"), true, "Compacted") +sql("DROP TABLE IF EXISTS test_rename") +sql("DROP TABLE IF EXISTS test_rename_compact") + } + + test("test compaction after alter set tblproerties-
add and drop") { +sql("DROP TABLE IF EXISTS test_alter") +sql( + "CREATE TABLE test_alter (empno int, empname String, designation String, doj Timestamp, " + + "workgroupcategory int, workgroupcategoryname String, deptno int, deptname String, " + + "projectcode int, projectjoindate Timestamp, projectenddate Timestamp,attendance int," + + "utilization int,salary int) STORED BY 'org.apache.carbondata.format'") +sql( + s"""LOAD DATA LOCAL INPATH '$resourcesPath/data.csv' INTO TABLE test_alter OPTIONS + |('DELIMITER'= ',', 'QUOTECHAR'= '\"')""".stripMargin) +sql("alter table test_alter set tblproperties('sort_columns'='deptno,projectcode', 'sort_scope'='local_sort')") +sql("alter table test_alter drop columns(deptno)") +sql( + s"""LOAD DAT
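The per-block decision the fix above introduces, as a simplified sketch: consult each block's own footer schema, not the current restructured table schema, when checking for column drift. The real code compares SegmentProperties built from the DataFileFooter inside CarbonCompactionExecutor; plain string lists stand in for schemas here.

import java.util.Arrays;
import java.util.List;

public class ColumnDriftSketch {

  // Decide per block, from that block's own footer schema, whether a drift-aware
  // iterator is needed -- not from the current (restructured) table schema.
  static boolean needsColumnDriftIterator(List<String> footerColumns,
      List<String> driftColumns) {
    return footerColumns.stream().anyMatch(driftColumns::contains);
  }

  public static void main(String[] args) {
    List<String> driftColumns = Arrays.asList("deptno", "projectcode");
    // A block written before the alter does not contain the drifted columns...
    System.out.println(needsColumnDriftIterator(
        Arrays.asList("empno", "empname"), driftColumns));   // false
    // ...while a block written afterwards does.
    System.out.println(needsColumnDriftIterator(
        Arrays.asList("empno", "deptno"), driftColumns));    // true
  }
}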
[carbondata] branch branch-1.6 updated (917e041 -> 80438f7)
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a change to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git. from 917e041 [HOTFIX] CLI test case failed during release because of space differences new 2ebc041 [CARBONDATA-3478]Fix ArrayIndexOutOfBound Exception on compaction after alter operation new 575b711 [CARBONDATA-3481] Multi-thread pruning fails when datamaps count is just near numOfThreadsForPruning new 80438f7 [HOTFIX] Removed the hive-exec and commons dependency from hive module The 3 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: .../carbondata/core/datamap/TableDataMap.java | 12 +++-- integration/spark-common/pom.xml | 10 .../AlterTableColumnRenameTestCase.scala | 54 ++ .../merger/CarbonCompactionExecutor.java | 9 +++- 4 files changed, 80 insertions(+), 5 deletions(-)
[carbondata] 02/03: [CARBONDATA-3481] Multi-thread pruning fails when datamaps count is just near numOfThreadsForPruning
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a commit to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git commit 575b7116e5cc0a7c25e17794a462a6ecdf4afb24 Author: ajantha-bhat AuthorDate: Thu Jul 25 18:50:19 2019 +0530 [CARBONDATA-3481] Multi-thread pruning fails when datamaps count is just near numOfThreadsForPruning Cause: When the datamaps count is just near numOfThreadsForPruning, as the code is checking '>=', the last thread may not get any datamaps to prune. Hence an array-out-of-index exception is thrown in this scenario. There are no issues with higher numbers of datamaps. Solution: In this scenario, launch threads based on the distribution value, not on the hardcoded value (a standalone sketch follows this email). This closes #3336 --- .../org/apache/carbondata/core/datamap/TableDataMap.java | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java b/core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java index 33fc3b1..ecdd586 100644 --- a/core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java +++ b/core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java @@ -207,9 +207,6 @@ public final class TableDataMap extends OperationEventListener { */ int numOfThreadsForPruning = CarbonProperties.getNumOfThreadsForPruning(); -LOG.info( -"Number of threads selected for multi-thread block pruning is " + numOfThreadsForPruning -+ ". total files: " + totalFiles + ". total segments: " + segments.size()); int filesPerEachThread = totalFiles / numOfThreadsForPruning; int prev; int filesCount = 0; @@ -254,6 +251,15 @@ public final class TableDataMap extends OperationEventListener { // this should not happen throw new RuntimeException(" not all the files processed "); } +if (datamapListForEachThread.size() < numOfThreadsForPruning) { + // If the total datamaps fitted in lesser number of threads than numOfThreadsForPruning. + // Launch only that many threads where datamaps are fitted while grouping. + LOG.info("Datamaps is distributed in " + datamapListForEachThread.size() + " threads"); + numOfThreadsForPruning = datamapListForEachThread.size(); +} +LOG.info( +"Number of threads selected for multi-thread block pruning is " + numOfThreadsForPruning ++ ". total files: " + totalFiles + ". total segments: " + segments.size()); List> results = new ArrayList<>(numOfThreadsForPruning); final Map> prunedBlockletMap = new ConcurrentHashMap<>(segments.size());
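A standalone sketch of the clamp added above: after grouping datamaps into per-thread buckets, launch only as many threads as buckets that were actually filled, so no thread indexes past the end of the list. The grouping below is a deliberately naive stand-in for the real file-count-based distribution.

import java.util.ArrayList;
import java.util.List;

public class PruneThreadsSketch {

  public static void main(String[] args) {
    int numOfThreadsForPruning = 4;
    int totalDatamaps = 5;   // "just near" the thread count

    // Naive grouping stand-in: pack datamaps into buckets; with counts near the
    // thread count, the grouping can produce fewer buckets than threads.
    List<List<Integer>> datamapListForEachThread = new ArrayList<>();
    int perBucket = totalDatamaps / numOfThreadsForPruning + 1;
    for (int i = 0; i < totalDatamaps; i += perBucket) {
      List<Integer> bucket = new ArrayList<>();
      for (int j = i; j < Math.min(i + perBucket, totalDatamaps); j++) {
        bucket.add(j);
      }
      datamapListForEachThread.add(bucket);
    }

    // The fix: launch only as many threads as buckets that actually got datamaps,
    // so the last thread never reads past the end of the list.
    if (datamapListForEachThread.size() < numOfThreadsForPruning) {
      numOfThreadsForPruning = datamapListForEachThread.size();
    }
    System.out.println("threads to launch: " + numOfThreadsForPruning);   // 3
  }
}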
[carbondata] branch branch-1.6.0 created (now d7d70a8)
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a change to branch branch-1.6.0 in repository https://gitbox.apache.org/repos/asf/carbondata.git. at d7d70a8 [HOTFIX] Removed the hive-exec and commons dependency from hive module No new revisions were added by this update.
[carbondata] branch master updated: [HOTFIX] Removed the hive-exec and commons dependency from hive module
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new d7d70a8 [HOTFIX] Removed the hive-exec and commons dependency from hive module d7d70a8 is described below commit d7d70a83d68ac1578f611e9a6a8b3af1c426d5d7 Author: ravipesala AuthorDate: Fri Aug 2 11:15:05 2019 +0530 [HOTFIX] Removed the hive-exec and commons dependency from hive module Removed the hive-exec and commons dependency from the hive module, as Spark has its own hive-exec. Because of the external hive-exec dependency, some tests are failing. This closes #3347 --- integration/spark-common/pom.xml | 10 ++ 1 file changed, 10 insertions(+) diff --git a/integration/spark-common/pom.xml b/integration/spark-common/pom.xml index df683e0..a12992d 100644 --- a/integration/spark-common/pom.xml +++ b/integration/spark-common/pom.xml @@ -39,6 +39,16 @@
       <groupId>org.apache.carbondata</groupId>
       <artifactId>carbondata-hive</artifactId>
       <version>${project.version}</version>
+      <exclusions>
+        <exclusion>
+          <groupId>org.apache.commons</groupId>
+          <artifactId>*</artifactId>
+        </exclusion>
+        <exclusion>
+          <groupId>org.apache.hive</groupId>
+          <artifactId>hive-exec</artifactId>
+        </exclusion>
+      </exclusions>
     </dependency>
     <dependency>
       <groupId>org.apache.carbondata</groupId>
[carbondata] branch master updated: [CARBONDATA-3449] Synchronize the initialization of listeners in case of concurrent scenarios
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new b98e183 [CARBONDATA-3449] Synchronize the initialization of listeners in case of concurrent scenarios b98e183 is described below commit b98e183f1546f577880c414b1c1649264ff2fd7d Author: manishnalla1994 AuthorDate: Sat Jun 22 11:27:41 2019 +0530 [CARBONDATA-3449] Synchronize the initialization of listeners in case of concurrent scenarios Problem: Initialization of listeners in concurrent scenarios is not synchronized. Solution: Changed the function to a val, so that the synchronization is handled by Scala and the initialization occurs only once (a Java illustration of the once-only guarantee follows this email). This closes #3304 --- .../main/java/org/apache/carbondata/events/OperationListenerBus.java | 2 +- .../org/apache/spark/sql/hive/CarbonInMemorySessionState.scala | 2 +- .../org/apache/spark/sql/hive/CarbonSessionState.scala | 2 +- .../spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala | 5 +++-- .../main/spark2.1/org/apache/spark/sql/hive/CarbonSessionState.scala | 2 +- 5 files changed, 7 insertions(+), 6 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/events/OperationListenerBus.java b/core/src/main/java/org/apache/carbondata/events/OperationListenerBus.java index 5f9a05c..3f652e2 100644 --- a/core/src/main/java/org/apache/carbondata/events/OperationListenerBus.java +++ b/core/src/main/java/org/apache/carbondata/events/OperationListenerBus.java @@ -53,7 +53,7 @@ public class OperationListenerBus { * @param eventClass * @param operationEventListener */ - public OperationListenerBus addListener(Class eventClass, + public synchronized OperationListenerBus addListener(Class eventClass, OperationEventListener operationEventListener) { String eventType = eventClass.getName(); diff --git a/integration/spark2/src/main/commonTo2.2And2.3/org/apache/spark/sql/hive/CarbonInMemorySessionState.scala b/integration/spark2/src/main/commonTo2.2And2.3/org/apache/spark/sql/hive/CarbonInMemorySessionState.scala index e286fba..5dfb16d 100644 --- a/integration/spark2/src/main/commonTo2.2And2.3/org/apache/spark/sql/hive/CarbonInMemorySessionState.scala +++ b/integration/spark2/src/main/commonTo2.2And2.3/org/apache/spark/sql/hive/CarbonInMemorySessionState.scala @@ -146,7 +146,7 @@ class InMemorySessionCatalog( } // Initialize all listeners to the Operation bus. - CarbonEnv.initListeners() + CarbonEnv.init def getThriftTableInfo(tablePath: String): TableInfo = { val tableMetadataFile = CarbonTablePath.getSchemaFilePath(tablePath) diff --git a/integration/spark2/src/main/commonTo2.2And2.3/org/apache/spark/sql/hive/CarbonSessionState.scala b/integration/spark2/src/main/commonTo2.2And2.3/org/apache/spark/sql/hive/CarbonSessionState.scala index 08cf3cc..f991a78 100644 --- a/integration/spark2/src/main/commonTo2.2And2.3/org/apache/spark/sql/hive/CarbonSessionState.scala +++ b/integration/spark2/src/main/commonTo2.2And2.3/org/apache/spark/sql/hive/CarbonSessionState.scala @@ -83,7 +83,7 @@ class CarbonHiveSessionCatalog( } // Initialize all listeners to the Operation bus.
- CarbonEnv.initListeners() + CarbonEnv.init override def lookupRelation(name: TableIdentifier): LogicalPlan = { val rtnRelation = super.lookupRelation(name) diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala index 094d298..e7a6d65 100644 --- a/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala +++ b/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala @@ -149,9 +149,10 @@ object CarbonEnv { * Method * 1. To initialize Listeners to their respective events in the OperationListenerBus * 2. To register common listeners - * + * 3. Only initialize once for all the listeners in case of concurrent scenarios we have given + * val, as val initializes once */ - def init(sparkSession: SparkSession): Unit = { + val init = { initListeners } diff --git a/integration/spark2/src/main/spark2.1/org/apache/spark/sql/hive/CarbonSessionState.scala b/integration/spark2/src/main/spark2.1/org/apache/spark/sql/hive/CarbonSessionState.scala index 5caa4dd..26f778e 100644 --- a/integration/spark2/src/main/spark2.1/org/apache/spark/sql/hive/CarbonSessionState.scala +++ b/integration/spark2/src/main/spark2.1/org/apache/spark/sql/hive/CarbonSessionState.scala @@ -108,7 +108,7 @@ class CarbonHiveSessionCatalog( } // Initialize all listeners to the Operation bus. - CarbonEnv.init(sparkSession) + CarbonEnv.init /** * This method will invalidate carbonrelation from cache if carbon table is updated in
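Why turning def init(...) into a val helps: a val inside a Scala object is evaluated during the object's one-time initialization, and the JVM guarantees class initialization runs exactly once even under concurrent first access. The Java sketch below demonstrates that guarantee; it is an illustration of the mechanism, not CarbonData code.

public class ListenerInitSketch {

  static class ListenerBootstrap {
    // A static initializer runs exactly once per class, and the JVM serializes
    // concurrent first accesses -- the same guarantee a Scala object's val gets.
    static {
      System.out.println("registering listeners (runs once)");
    }
    static void touch() {
      // Calling any static member forces class initialization on first use.
    }
  }

  public static void main(String[] args) throws InterruptedException {
    Runnable r = ListenerBootstrap::touch;
    Thread t1 = new Thread(r);
    Thread t2 = new Thread(r);
    t1.start();
    t2.start();
    t1.join();
    t2.join();   // "registering listeners" is printed exactly once
  }
}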
[carbondata] branch master updated: [CARBONDATA-3448] Fix wrong results in preaggregate query with spark adaptive execution
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new 9b2ef53 [CARBONDATA-3448] Fix wrong results in preaggregate query with spark adaptive execution 9b2ef53 is described below commit 9b2ef53aef9a823f8007b7d3b042f634e7d874ca Author: ajantha-bhat AuthorDate: Fri Jun 21 10:35:06 2019 +0530 [CARBONDATA-3448] Fix wrong results in preaggregate query with spark adaptive execution Problem: Wrong results in a preaggregate query with Spark adaptive execution: Spark2TestQueryExecutor.conf.set(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key, "true") Cause: For preaggregate, segment info is set into a threadLocal. When adaptive execution is enabled, Spark calls getInternalPartition in another thread, where the updated segment conf is not set; hence it does not use the updated segments. Solution: CarbonScanRDD already has the sessionInfo; use it instead of taking the session info from the current thread (a standalone illustration of the pitfall follows this email). This closes #3303 --- .../preaggregate/TestPreAggregateLoad.scala| 29 ++ .../carbondata/spark/rdd/CarbonScanRDD.scala | 16 +--- 2 files changed, 41 insertions(+), 4 deletions(-) diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/preaggregate/TestPreAggregateLoad.scala b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/preaggregate/TestPreAggregateLoad.scala index 7ba8300..75d71ec 100644 --- a/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/preaggregate/TestPreAggregateLoad.scala +++ b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/preaggregate/TestPreAggregateLoad.scala @@ -18,6 +18,8 @@ package org.apache.carbondata.integration.spark.testsuite.preaggregate import org.apache.spark.sql.Row +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.test.Spark2TestQueryExecutor import org.apache.spark.util.SparkUtil4Test import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach} @@ -298,6 +300,33 @@ class TestPreAggregateLoad extends SparkQueryTest with BeforeAndAfterAll with Be checkAnswer(sql("select * from maintable_preagg_sum"), Row(1, 52, "xyz")) } + test("test pregarregate with spark adaptive execution ") { +if (Spark2TestQueryExecutor.spark.version.startsWith("2.3")) { + // enable adaptive execution + Spark2TestQueryExecutor.conf.set(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key, "true") +} +sql("DROP TABLE IF EXISTS maintable") +sql( + """ +| CREATE TABLE maintable(id int, name string, city string, age int) +| STORED BY 'org.apache.carbondata.format' + """.stripMargin) +sql( + s"""create datamap preagg_sum on table maintable using 'preaggregate' as select id, sum(age) from maintable group by id,name""" +.stripMargin) +sql(s"insert into maintable values(1, 'xyz', 'bengaluru', 20)") +sql(s"insert into maintable values(1, 'xyz', 'bengaluru', 30)") + +checkAnswer(sql("select id, sum(age) from maintable group by id, name"), Row(1, 50)) +sql("drop datamap preagg_sum on table maintable") +sql("drop table maintable") +if (Spark2TestQueryExecutor.spark.version.startsWith("2.3")) { + // disable adaptive execution + Spark2TestQueryExecutor.conf.set(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key, "false") +} + } + + test("check load and select for avg double datatype") { sql("drop table if exists maintbl ") sql("create table
maintbl(year int,month int,name string,salary double) stored by 'carbondata' tblproperties('sort_scope'='Global_sort','table_blocksize'='23','sort_columns'='month,year,name')") diff --git a/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala b/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala index b62a7e2..f90d279 100644 --- a/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala +++ b/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala @@ -654,7 +654,6 @@ class CarbonScanRDD[T: ClassTag]( CarbonInputFormat.setColumnProjection(conf, columnProjection) CarbonInputFormatUtil.setDataMapJobIfConfigured(conf) // when validate segments is disabled in thread local update it to CarbonTableInputFormat -val carbonSessionInfo = ThreadLocalSessionInfo.getCarbo
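A minimal sketch of the thread-local pitfall this commit fixes, in plain Java with hypothetical names (not the committed code): configuration stored in a ThreadLocal is invisible to any worker thread that adaptive execution may use, so the value must be captured in a field when the task is built on the driver thread.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SessionInfo {
  final Map<String, String> conf = new ConcurrentHashMap<>();
}

class ScanTask {
  private static final ThreadLocal<SessionInfo> THREAD_INFO = new ThreadLocal<>();
  // captured once at construction time on the driver, like CarbonScanRDD's sessionInfo field
  private final SessionInfo captured;

  ScanTask(SessionInfo current) {
    this.captured = current;
  }

  String validSegments() {
    // buggy variant: THREAD_INFO.get() returns null when invoked from a different thread
    SessionInfo fromThread = THREAD_INFO.get();
    // fixed variant: the captured copy is valid on any thread
    return captured.conf.get("carbon.input.segments");
  }
}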
[carbondata] branch master updated: [CARBONDATA-3427] Beautify DAG by showing less text
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new ce2dde8 [CARBONDATA-3427] Beautify DAG by showing less text ce2dde8 is described below commit ce2dde84a09fb640058ad74b5257550fd370bb3a Author: manhua AuthorDate: Wed Jun 12 09:47:17 2019 +0800 [CARBONDATA-3427] Beautify DAG by showing less text beautify DAG by showing less text This closes #3278 --- .../scala/org/apache/spark/sql/CarbonDatasourceHadoopRelation.scala| 3 +-- .../apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala | 2 ++ 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonDatasourceHadoopRelation.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonDatasourceHadoopRelation.scala index 09763fd..cfb6e6e 100644 --- a/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonDatasourceHadoopRelation.scala +++ b/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonDatasourceHadoopRelation.scala @@ -195,8 +195,7 @@ case class CarbonDatasourceHadoopRelation( override def unhandledFilters(filters: Array[Filter]): Array[Filter] = new Array[Filter](0) override def toString: String = { -"CarbonDatasourceHadoopRelation [ " + "Database name :" + identifier.getDatabaseName + -", " + "Table name :" + identifier.getTableName + ", Schema :" + tableSchema + " ]" +"CarbonDatasourceHadoopRelation" } override def sizeInBytes: Long = carbonRelation.sizeInBytes diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala index 0f706af..5d238de 100644 --- a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala +++ b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala @@ -55,6 +55,7 @@ import org.apache.carbondata.spark.rdd.CarbonScanRDD */ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy { val PUSHED_FILTERS = "PushedFilters" + val READ_SCHEMA = "ReadSchema" /* Spark 2.3.1 plan there can be case of multiple projections like below @@ -274,6 +275,7 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy { if (pushedFilters.nonEmpty) { pairs += (PUSHED_FILTERS -> pushedFilters.mkString("[", ", ", "]")) } + pairs += (READ_SCHEMA -> projectSet.++(filterSet).toSeq.toStructType.catalogString) pairs.toMap }
[carbondata] branch master updated: [CARBONDATA-3444]Fix MV query failure when projection has cast expression with alias
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new 0d32c6b [CARBONDATA-3444]Fix MV query failure when projection has cast expression with alias 0d32c6b is described below commit 0d32c6b8303f15e433fa3494e106f7ec6fa03b33 Author: akashrn5 AuthorDate: Wed Jun 19 13:38:20 2019 +0530 [CARBONDATA-3444]Fix MV query failure when projection has cast expression with alias Problem: MV datamap creation fails when the projection column is a cast expression with multiple arithmetic functions on one of the main table columns with an alias; it throws a "field does not exist" error. Also, when the create datamap DDL has the DM provider name in capital letters, the query was not hitting the MV table. Solution: When building fieldRelationMap, handling of the above case was missed; a case was added to handle this scenario. When loading the datamapCatalogs, convert the provider name to lower case. This closes #3298 --- .../carbondata/core/datamap/DataMapStoreManager.java| 4 ++-- .../scala/org/apache/carbondata/mv/datamap/MVUtil.scala | 17 + .../apache/carbondata/mv/rewrite/MVCreateTestCase.scala | 16 .../carbondata/mv/plans/modular/ModularRelation.scala | 1 + 4 files changed, 36 insertions(+), 2 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java index 729c419..a6a2031 100644 --- a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java +++ b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java @@ -240,7 +240,7 @@ public final class DataMapStoreManager { if (dataMapCatalog == null) { dataMapCatalog = dataMapProvider.createDataMapCatalog(); if (dataMapCatalog != null) { -dataMapCatalogs.put(name, dataMapCatalog); +dataMapCatalogs.put(name.toLowerCase(), dataMapCatalog); dataMapCatalog.registerSchema(dataMapSchema); } } else { @@ -291,7 +291,7 @@ public final class DataMapStoreManager { if (null == dataMapCatalog) { throw new RuntimeException("Internal Error."); } -dataMapCatalogs.put(schema.getProviderName(), dataMapCatalog); +dataMapCatalogs.put(schema.getProviderName().toLowerCase(), dataMapCatalog); } try { dataMapCatalog.registerSchema(schema); diff --git a/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVUtil.scala b/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVUtil.scala index 048e22d..4e633a6 100644 --- a/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVUtil.scala +++ b/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVUtil.scala @@ -113,6 +113,23 @@ class MVUtil { } case a@Alias(_, name) => checkIfComplexDataTypeExists(a) +val arrayBuffer: ArrayBuffer[ColumnTableRelation] = new ArrayBuffer[ColumnTableRelation]() +a.collect { + case attr: AttributeReference => +val carbonTable = getCarbonTable(logicalRelation, attr) +if (null != carbonTable) { + val relation = getColumnRelation(attr.name, + carbonTable.getAbsoluteTableIdentifier.getCarbonTableIdentifier.getTableId, + carbonTable.getAbsoluteTableIdentifier.getCarbonTableIdentifier.getTableName, + carbonTable.getAbsoluteTableIdentifier.getCarbonTableIdentifier.getDatabaseName, +carbonTable) + if (null != relation) { +arrayBuffer += relation + } +} +} +fieldToDataMapFieldMap += +getFieldToDataMapFields(a.name, a.dataType, None, "arithmetic", 
arrayBuffer, "") } fieldToDataMapFieldMap } diff --git a/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCreateTestCase.scala b/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCreateTestCase.scala index 535ddef..1d259c8 100644 --- a/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCreateTestCase.scala +++ b/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCreateTestCase.scala @@ -1153,6 +1153,22 @@ class MVCreateTestCase extends QueryTest with BeforeAndAfterAll { sql("drop table IF EXISTS maintable") } + test("test cast expression with mv") { +sql("drop table IF EXISTS maintable") +sql("create table maintable (m_month bigint, c_code string, " + +"c_country smallint, d_dollar_value double, q_quantity double, u_unit smallint,
[carbondata] branch master updated: [CARBONDATA-3444]Fix MV query failure when column name and table name is same in case of join scenario
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new 2b0e79c [CARBONDATA-3444]Fix MV query failure when column name and table name is same in case of join scenario 2b0e79c is described below commit 2b0e79c66357fce671c5f421fd5c400e28f69fde Author: akashrn5 AuthorDate: Wed Jun 19 12:52:30 2019 +0530 [CARBONDATA-3444]Fix MV query failure when column name and table name is same in case of join scenario Problem: when there are columns with the same name in different tables, after SQL generation the projected column will be like gen_subsumer_0.product, and it fails during logical plan generation from the rewritten query, as the column names are ambiguous. Solution: update the output list when there are duplicate columns present in the query. Here we can form the qualified name for the Attribute reference: when a qualifier is defined for the column, the qualified name will be like qualifier_columnName; if no qualifier is defined, it will be exprId_columnName. Update all the nodes, like groupby and select nodes, so that any ambiguity in columns is handled. This closes #3297 --- .../apache/carbondata/mv/datamap/MVHelper.scala| 4 +- .../org/apache/carbondata/mv/datamap/MVUtil.scala | 41 .../carbondata/mv/rewrite/DefaultMatchMaker.scala | 20 +++--- .../apache/carbondata/mv/rewrite/Navigator.scala | 16 +--- .../carbondata/mv/rewrite/MVCreateTestCase.scala | 45 ++ 5 files changed, 105 insertions(+), 21 deletions(-) diff --git a/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala b/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala index 4d43088..c0831ae 100644 --- a/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala +++ b/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala @@ -583,7 +583,9 @@ object MVHelper { val relation = s.dataMapTableRelation.get.asInstanceOf[MVPlanWrapper].plan.asInstanceOf[Select] val outputList = getUpdatedOutputList(relation.outputList, s.dataMapTableRelation) -val mappings = s.outputList zip outputList +// when the output list contains multiple projection of same column, but relation +// contains distinct columns, mapping may go wrong with columns, so select distinct +val mappings = s.outputList.distinct zip outputList val oList = for ((o1, o2) <- mappings) yield { if (o1.name != o2.name) Alias(o2, o1.name)(exprId = o1.exprId) else o2 } diff --git a/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVUtil.scala b/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVUtil.scala index 8cb2f1f..4dff5b8 100644 --- a/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVUtil.scala +++ b/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVUtil.scala @@ -310,4 +310,45 @@ object MVUtil { " are not allowed for this datamap") } } + + def updateDuplicateColumns(outputList: Seq[NamedExpression]): Seq[NamedExpression] = { +val duplicateNameCols = outputList.groupBy(_.name).filter(_._2.length > 1).flatMap(_._2) + .toList +val updatedOutList = outputList.map { col => + val duplicateColumn = duplicateNameCols +.find(a => a.semanticEquals(col)) + val qualifiedName = col.qualifier.getOrElse(s"${ col.exprId.id }") + "_" + col.name + if (duplicateColumn.isDefined) { +val attributesOfDuplicateCol = duplicateColumn.get.collect { + case a: AttributeReference => a +} +val 
attributeOfCol = col.collect { case a: AttributeReference => a } +// here need to check the whether the duplicate columns is of same tables, +// since query with duplicate columns is valid, we need to make sure, not to change their +// names with above defined qualifier name, for example in case of some expression like +// cast((FLOOR((cast(col_name) as double))).., upper layer even exprid will be same, +// we need to find the attribute ref(col_name) at lower level and check where expid is same +// or of same tables, so doin the semantic equals +val isStrictDuplicate = attributesOfDuplicateCol.forall(expr => + attributeOfCol.exists(a => a.semanticEquals(expr))) +if (!isStrictDuplicate) { + Alias(col, qualifiedName)(exprId = col.exprId) +} else if (col.qualifier.isDefined) { + Alias(col, qualifiedName)(exprId = col.exprId) + // this check is added in scenario where the column is direct Attribute reference and + // since d
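The renaming rule in isolation, as a sketch with hypothetical types: only columns whose name occurs more than once in the output list are renamed, using the qualifier when one exists and the expression id otherwise, which mirrors the qualifier_columnName / exprId_columnName rule described above.

import java.util.*;
import java.util.stream.Collectors;

class Col {
  final String name; final String qualifier; final long exprId;
  Col(String name, String qualifier, long exprId) {
    this.name = name; this.qualifier = qualifier; this.exprId = exprId;
  }
}

class Disambiguator {
  static List<String> outputNames(List<Col> output) {
    // count occurrences of each column name in the projection
    Map<String, Long> counts = output.stream()
        .collect(Collectors.groupingBy(c -> c.name, Collectors.counting()));
    List<String> names = new ArrayList<>();
    for (Col c : output) {
      if (counts.get(c.name) > 1) {
        // duplicates get a disambiguating prefix, e.g. t1_product vs t2_product
        String prefix = c.qualifier != null ? c.qualifier : Long.toString(c.exprId);
        names.add(prefix + "_" + c.name);
      } else {
        names.add(c.name); // unique names stay untouched
      }
    }
    return names;
  }
}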
[carbondata] branch master updated: [CARBONDATA-3410] Add UDF, Hex/Base64 SQL functions for binary
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new c497142 [CARBONDATA-3410] Add UDF, Hex/Base64 SQL functions for binary c497142 is described below commit c4971422f283288491cf6e8eea65b35d3a6af091 Author: xubo245 AuthorDate: Fri May 31 20:33:25 2019 +0800 [CARBONDATA-3410] Add UDF, Hex/Base64 SQL functions for binary Add UDF, Hex/Base64 SQL functions for binary This closes # 3253 --- .../testsuite/binary/TestBinaryDataType.scala | 32 + .../SparkCarbonDataSourceBinaryTest.scala | 140 + 2 files changed, 117 insertions(+), 55 deletions(-) diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/binary/TestBinaryDataType.scala b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/binary/TestBinaryDataType.scala index 15e3ee9..1b73aba 100644 --- a/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/binary/TestBinaryDataType.scala +++ b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/binary/TestBinaryDataType.scala @@ -65,6 +65,17 @@ class TestBinaryDataType extends QueryTest with BeforeAndAfterAll { } assert(flag) +sqlContext.udf.register("decodeHex", (str: String) => Hex.decodeHex(str.toCharArray)) +sqlContext.udf.register("decodeBase64", (str: String) => Base64.decodeBase64(str.getBytes())) + +val udfHexResult = sql("SELECT decodeHex(binaryField) FROM binaryTable") +val unhexResult = sql("SELECT unhex(binaryField) FROM binaryTable") +checkAnswer(udfHexResult, unhexResult) + +val udfBase64Result = sql("SELECT decodeBase64(binaryField) FROM binaryTable") +val unbase64Result = sql("SELECT unbase64(binaryField) FROM binaryTable") +checkAnswer(udfBase64Result, unbase64Result) + checkAnswer(sql("SELECT COUNT(*) FROM binaryTable"), Seq(Row(3))) try { val df = sql("SELECT * FROM binaryTable").collect() @@ -614,6 +625,27 @@ class TestBinaryDataType extends QueryTest with BeforeAndAfterAll { | OPTIONS('header'='false','DELIMITER'='|','bad_records_action'='fail') """.stripMargin) +val hexHiveResult = sql("SELECT hex(binaryField) FROM hivetable") +val hexCarbonResult = sql("SELECT hex(binaryField) FROM carbontable") +checkAnswer(hexHiveResult, hexCarbonResult) +hexCarbonResult.collect().foreach { each => +val result = new String(Hex.decodeHex((each.getAs[Array[Char]](0)).toString.toCharArray)) +assert("\u0001history\u0002".equals(result) +|| "\u0001biology\u0002".equals(result) +|| "\u0001education\u0002".equals(result)) +} + +val base64HiveResult = sql("SELECT base64(binaryField) FROM hivetable") +val base64CarbonResult = sql("SELECT base64(binaryField) FROM carbontable") +checkAnswer(base64HiveResult, base64CarbonResult) +base64CarbonResult.collect().foreach { each => +val result = new String(Base64.decodeBase64((each.getAs[Array[Char]](0)).toString)) +assert("\u0001history\u0002".equals(result) +|| "\u0001biology\u0002".equals(result) +|| "\u0001education\u0002".equals(result)) +} + + val hiveResult = sql("SELECT * FROM hivetable") val carbonResult = sql("SELECT * FROM carbontable") checkAnswer(hiveResult, carbonResult) diff --git a/integration/spark-datasource/src/test/scala/org/apache/spark/sql/carbondata/datasource/SparkCarbonDataSourceBinaryTest.scala 
b/integration/spark-datasource/src/test/scala/org/apache/spark/sql/carbondata/datasource/SparkCarbonDataSourceBinaryTest.scala index bdfc9dd..d234576 100644 --- a/integration/spark-datasource/src/test/scala/org/apache/spark/sql/carbondata/datasource/SparkCarbonDataSourceBinaryTest.scala +++ b/integration/spark-datasource/src/test/scala/org/apache/spark/sql/carbondata/datasource/SparkCarbonDataSourceBinaryTest.scala @@ -17,16 +17,14 @@ package org.apache.spark.sql.carbondata.datasource import java.io.File - import org.apache.carbondata.core.constants.CarbonCommonConstants import org.apache.carbondata.core.util.CarbonProperties import org.apache.carbondata.sdk.util.BinaryUtil +import org.apache.commons.codec.binary.{Base64, Hex} import org.apache.commons.io.FileUtils - import or
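The round trip those tests verify, shown as standalone commons-codec usage; Hex.decodeHex and Base64.decodeBase64 are the same library calls the tests register as UDFs, and the sample value mirrors the test data, but the class itself is illustrative.

import java.nio.charset.StandardCharsets;
import org.apache.commons.codec.DecoderException;
import org.apache.commons.codec.binary.Base64;
import org.apache.commons.codec.binary.Hex;

public class BinaryCodecRoundTrip {
  public static void main(String[] args) throws DecoderException {
    byte[] original = "\u0001history\u0002".getBytes(StandardCharsets.UTF_8);

    String hex = Hex.encodeHexString(original);           // what SQL hex() returns
    byte[] fromHex = Hex.decodeHex(hex.toCharArray());    // what unhex()/decodeHex undoes

    String b64 = Base64.encodeBase64String(original);     // what SQL base64() returns
    byte[] fromB64 = Base64.decodeBase64(b64.getBytes()); // what unbase64() undoes

    // both decodes must reproduce the original binary value
    assert new String(fromHex, StandardCharsets.UTF_8).equals("\u0001history\u0002");
    assert new String(fromB64, StandardCharsets.UTF_8).equals("\u0001history\u0002");
  }
}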
[carbondata] branch master updated: [CARBONDATA-3421] Fix create table without column with properties failed, but throw incorrect exception
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new d2bc0a9 [CARBONDATA-3421] Fix create table without column with properties failed, but throw incorrect exception d2bc0a9 is described below commit d2bc0a9cf78b770f6507981e833799dcbfbb51d7 Author: jack86596 AuthorDate: Mon Jun 10 09:53:49 2019 +0800 [CARBONDATA-3421] Fix create table without column with properties failed, but throw incorrect exception Problem: Creating a table without columns but with properties failed, but threw an incorrect exception: "Invalid table properties". The exception should say "create table without column". Solution: In CarbonSparkSqlParserUtil.createCarbonTable, we already do some validations, such as checking tblproperties and whether columns are provided for an external table. We can add one more validation here to check whether columns are provided for a normal table; if not, throw MalformedCarbonCommandException. This closes #3268 --- .../cluster/sdv/generated/SDKwriterTestCase.scala | 2 +- .../testsuite/createTable/TestCreateTableIfNotExists.scala | 6 ++ .../org/apache/carbondata/spark/util/CommonUtil.scala | 14 -- .../apache/spark/sql/parser/CarbonSparkSqlParserUtil.scala | 6 -- 4 files changed, 19 insertions(+), 9 deletions(-) diff --git a/integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/SDKwriterTestCase.scala b/integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/SDKwriterTestCase.scala index 619bfb3..499c478 100644 --- a/integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/SDKwriterTestCase.scala +++ b/integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/SDKwriterTestCase.scala @@ -333,7 +333,7 @@ class SDKwriterTestCase extends QueryTest with BeforeAndAfterEach { |'carbondata' LOCATION |'$writerPath' TBLPROPERTIES('sort_scope'='batch_sort') """.stripMargin) } -assert(ex.message.contains("table properties are not supported for external table")) +assert(ex.message.contains("Table properties are not supported for external table")) } test("Read sdk writer output file and test without carbondata and carbonindex files should fail") diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateTableIfNotExists.scala b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateTableIfNotExists.scala index b3fa0eb..35238dc 100644 --- a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateTableIfNotExists.scala +++ b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateTableIfNotExists.scala @@ -86,6 +86,12 @@ class TestCreateTableIfNotExists extends QueryTest with BeforeAndAfterAll { } } + test("test create table without column specified") { +val exception = intercept[MalformedCarbonCommandException] { + sql("create table TableWithoutColumn stored by 'carbondata' tblproperties('sort_columns'='')") +} +assert(exception.getMessage.contains("Creating table without column(s) is not supported")) + } override def afterAll { sql("use default") diff --git a/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala 
b/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala index da42363..1c89a0c 100644 --- a/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala +++ b/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala @@ -95,12 +95,14 @@ object CommonUtil { def validateTblProperties(tableProperties: Map[String, String], fields: Seq[Field]): Boolean = { var isValid: Boolean = true -tableProperties.foreach { - case (key, value) => -if (!validateFields(key, fields)) { - isValid = false - throw new MalformedCarbonCommandException(s"Invalid table properties ${ key }") -} +if (fields.nonEmpty) { + tableProperties.foreach { +case (key, value) => + if (!validateFields(key, fields)) { +isValid = false +throw new MalformedCarbonCommandException(s"Invalid table properties $key") + } + } } isValid } diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/parser/CarbonSparkSqlParserUtil.scala b/i
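A minimal sketch of the added validation, with hypothetical exception and field types: the empty column list is rejected up front, before any property validation can produce the misleading "Invalid table properties" error.

import java.util.List;
import java.util.Map;

class MalformedCommandException extends RuntimeException {
  MalformedCommandException(String msg) { super(msg); }
}

class CreateTableValidator {
  static void validate(List<String> fields, Map<String, String> tblProperties) {
    // reject the real problem first: no columns at all
    if (fields.isEmpty()) {
      throw new MalformedCommandException("Creating table without column(s) is not supported");
    }
    // only when columns exist does per-property validation make sense
    for (String key : tblProperties.keySet()) {
      // ... validate each property against the declared fields,
      // as CommonUtil.validateTblProperties does
    }
  }
}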
[carbondata] branch master updated: [CARBONDATA-3336] Support configurable decode for loading binary data, support base64 and Hex decode.
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new 3dda02d [CARBONDATA-3336] Support configurable decode for loading binary data, support base64 and Hex decode. 3dda02d is described below commit 3dda02d44c4dca12c99e16df4f29dd3e8f2e6dc1 Author: xubo245 AuthorDate: Tue Apr 23 15:45:25 2019 +0800 [CARBONDATA-3336] Support configurable decode for loading binary data, support base64 and Hex decode. Support a configurable decoder for loading binary data, with base64 and hex decoding: 1. support a configurable decoder for loading 2. test datamaps: mv, preaggregate, timeseries, bloomfilter, lucene 3. test datamaps together with the configurable decoder By default no decoder is applied when loading binary data; this PR adds base64 and hex decoders. This closes #3188 --- .../core/constants/CarbonLoadOptionConstants.java | 13 ++ .../carbondata/mv/rewrite/MVCreateTestCase.scala | 59 + .../src/test/resources/binaryDataBase64.csv| 3 + .../{binarydata.csv => binaryDataHex.csv} | 0 .../testsuite/binary/TestBinaryDataType.scala | 247 ++-- .../preaggregate/TestPreAggStreaming.scala | 11 + .../testsuite/dataload/TestLoadDataFrame.scala | 42 .../testsuite/datamap/TestDataMapCommand.scala | 257 +++-- .../spark/sql/catalyst/CarbonDDLSqlParser.scala| 1 + .../datasources/CarbonSparkDataSourceUtil.scala| 4 + .../SparkCarbonDataSourceBinaryTest.scala | 37 ++- .../datasource/SparkCarbonDataSourceTest.scala | 69 +- .../apache/spark/sql/CarbonDataFrameWriter.scala | 1 + .../processing/loading/DataLoadProcessBuilder.java | 2 + .../converter/impl/BinaryFieldConverterImpl.java | 26 +-- .../converter/impl/FieldEncoderFactory.java| 54 - .../loading/converter/impl/RowConverterImpl.java | 9 +- .../converter/impl/binary/Base64BinaryDecoder.java | 42 .../converter/impl/binary/BinaryDecoder.java | 29 +++ .../impl/binary/DefaultBinaryDecoder.java | 32 +++ .../converter/impl/binary/HexBinaryDecoder.java| 34 +++ .../processing/loading/model/CarbonLoadModel.java | 15 ++ .../loading/model/CarbonLoadModelBuilder.java | 17 ++ .../processing/util/CarbonLoaderUtil.java | 9 + .../carbondata/sdk/file/CarbonWriterBuilder.java | 10 +- .../org/apache/carbondata/sdk/file/ImageTest.java | 108 - 26 files changed, 1068 insertions(+), 63 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/constants/CarbonLoadOptionConstants.java b/core/src/main/java/org/apache/carbondata/core/constants/CarbonLoadOptionConstants.java index 225a8aa..3bcb06f 100644 --- a/core/src/main/java/org/apache/carbondata/core/constants/CarbonLoadOptionConstants.java +++ b/core/src/main/java/org/apache/carbondata/core/constants/CarbonLoadOptionConstants.java @@ -172,4 +172,17 @@ public final class CarbonLoadOptionConstants { public static final String CARBON_LOAD_SORT_MEMORY_SPILL_PERCENTAGE_DEFAULT = "0"; + + /** + * carbon binary decoder when writing string data to binary, like decode base64, Hex + */ + @CarbonProperty + public static final String CARBON_OPTIONS_BINARY_DECODER = "carbon.binary.decoder"; + + public static final String CARBON_OPTIONS_BINARY_DECODER_DEFAULT = ""; + + public static final String CARBON_OPTIONS_BINARY_DECODER_BASE64 = "base64"; + + public static final String CARBON_OPTIONS_BINARY_DECODER_HEX = "hex"; + } diff --git a/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCreateTestCase.scala 
b/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCreateTestCase.scala index 62e320e..5e12ad3 100644 --- a/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCreateTestCase.scala +++ b/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCreateTestCase.scala @@ -970,6 +970,65 @@ class MVCreateTestCase extends QueryTest with BeforeAndAfterAll { } } + test("test binary on mv") { +val querySQL = "select x19,x20,sum(x18) from all_table group by x19, x20" +val querySQL2 = "select x19,x20,sum(x18) from all_table where x20=cast('binary2' as binary ) group by x19, x20" + +sql("drop datamap if exists all_table_mv") +sql("drop table if exists all_table") + +sql( + """ +| create table all_table(x1 bigint,x2 bigint, +| x3 string,x4 bigint,x5 bigint,x6 int,x7 string,x8 int, x9 int,x10 bigint, +| x11 bigint, x12 bigint,x13 bigint,x14 bigint,x15 bigint,x16 bigint, +| x17 bigint,x18 bigint,x19 bigint,x20 binary) stored by 'carbondata'""
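A simplified sketch of the decoder shape this commit introduces (the real interfaces live under processing/loading/converter/impl/binary; the names and signatures here are illustrative): the loader resolves the carbon.binary.decoder option to a decoder and applies it to each incoming string value before writing the binary column.

import org.apache.commons.codec.DecoderException;
import org.apache.commons.codec.binary.Base64;
import org.apache.commons.codec.binary.Hex;

interface BinaryFieldDecoder {
  byte[] decode(String value) throws DecoderException;
}

class BinaryDecoders {
  static BinaryFieldDecoder forOption(String option) {
    switch (option == null ? "" : option.toLowerCase()) {
      case "base64": return v -> Base64.decodeBase64(v.getBytes());
      case "hex":    return v -> Hex.decodeHex(v.toCharArray());
      // default: store the raw bytes unchanged, matching the empty-string default option
      default:       return v -> v.getBytes();
    }
  }
}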
[carbondata] branch master updated: [CARBONDATA-3394]Clean files command optimization
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new 6817e77 [CARBONDATA-3394]Clean files command optimization 6817e77 is described below commit 6817e77ad667dbb483c76812f0478044fc444c49 Author: akashrn5 AuthorDate: Mon May 27 12:24:33 2019 +0530 [CARBONDATA-3394]Clean files command optimization Problem: Clean files takes a lot of time to finish, even though there are no segments to delete. Tested with 5000 segments, clean files took 15 minutes to finish. Root cause and solution: A lot of table status read operations and listing operations were happening during clean files, even though they are not required. Read and list operations are reduced to cut the overall time for clean files. After the changes, for the same store, it takes 35 seconds on the same 3-node cluster. This closes #3227 --- .../carbondata/core/mutate/CarbonUpdateUtil.java | 160 +++-- .../core/statusmanager/SegmentStatusManager.java | 37 +++-- .../statusmanager/SegmentUpdateStatusManager.java | 21 +-- .../org/apache/carbondata/api/CarbonStore.scala| 3 +- 4 files changed, 105 insertions(+), 116 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java b/core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java index beaf1a0..736def6 100644 --- a/core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java +++ b/core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java @@ -466,94 +466,96 @@ public class CarbonUpdateUtil { if (segment.getSegmentStatus() == SegmentStatus.SUCCESS || segment.getSegmentStatus() == SegmentStatus.LOAD_PARTIAL_SUCCESS) { -// take the list of files from this segment. -String segmentPath = CarbonTablePath.getSegmentPath( -table.getAbsoluteTableIdentifier().getTablePath(), segment.getLoadName()); -CarbonFile segDir = -FileFactory.getCarbonFile(segmentPath, FileFactory.getFileType(segmentPath)); -CarbonFile[] allSegmentFiles = segDir.listFiles(); - -// scan through the segment and find the carbondatafiles and index files. -SegmentUpdateStatusManager updateStatusManager = new SegmentUpdateStatusManager(table); - -boolean updateSegmentFile = false; -// deleting of the aborted file scenario. -if (deleteStaleCarbonDataFiles(segment, allSegmentFiles, updateStatusManager)) { - updateSegmentFile = true; -} - -// get Invalid update delta files. -CarbonFile[] invalidUpdateDeltaFiles = updateStatusManager -.getUpdateDeltaFilesList(segment.getLoadName(), false, -CarbonCommonConstants.UPDATE_DELTA_FILE_EXT, true, allSegmentFiles, -isInvalidFile); - -// now for each invalid delta file need to check the query execution time out -// and then delete. -for (CarbonFile invalidFile : invalidUpdateDeltaFiles) { - compareTimestampsAndDelete(invalidFile, forceDelete, false); -} -// do the same for the index files. -CarbonFile[] invalidIndexFiles = updateStatusManager -.getUpdateDeltaFilesList(segment.getLoadName(), false, -CarbonCommonConstants.UPDATE_INDEX_FILE_EXT, true, allSegmentFiles, -isInvalidFile); - -// now for each invalid index file need to check the query execution time out -// and then delete. - -for (CarbonFile invalidFile : invalidIndexFiles) { - if (compareTimestampsAndDelete(invalidFile, forceDelete, false)) { +// when there is no update operations done on table, then no need to go ahead. 
So +// just check the update delta start timestamp and proceed if not empty +if (!segment.getUpdateDeltaStartTimestamp().isEmpty()) { + // take the list of files from this segment. + String segmentPath = CarbonTablePath.getSegmentPath( + table.getAbsoluteTableIdentifier().getTablePath(), segment.getLoadName()); + CarbonFile segDir = + FileFactory.getCarbonFile(segmentPath, FileFactory.getFileType(segmentPath)); + CarbonFile[] allSegmentFiles = segDir.listFiles(); + + // scan through the segment and find the carbondatafiles and index files. + SegmentUpdateStatusManager updateStatusManager = new SegmentUpdateStatusManager(table); + + boolean updateSegmentFile = false; + // deleting of the aborted file scenario. + if (deleteStaleCarbonDataFiles(segment, allSegmentFiles, updateStatusManager)) { updateSegmentFile = true
[carbondata] branch master updated: [CARBONDATA-3343] Compaction for Range Sort
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new affb40f [CARBONDATA-3343] Compaction for Range Sort affb40f is described below commit affb40f277f28ba362690f5d196b72392b267b3b Author: manishnalla1994 AuthorDate: Mon Apr 22 18:52:45 2019 +0530 [CARBONDATA-3343] Compaction for Range Sort Problem: Compaction for range sort needs to be supported correctly; earlier it grouped the ranges/partitions based on taskId, which was not correct. Solution: Combine all the data, create new ranges using Spark's RangePartitioner, give each range to one task, and apply a filter query per range to get the compacted segment. This closes #3182 --- .../core/constants/CarbonCommonConstants.java | 1 + .../core/metadata/schema/table/CarbonTable.java| 24 +- .../core/scan/expression/Expression.java | 13 + .../scan/filter/FilterExpressionProcessor.java | 5 +- .../carbondata/core/scan/filter/FilterUtil.java| 52 +- .../resolver/ConditionalFilterResolverImpl.java| 2 +- .../resolver/RowLevelRangeFilterResolverImpl.java | 40 +- .../core/scan/model/QueryModelBuilder.java | 18 +- .../core/scan/result/BlockletScannedResult.java| 62 +- .../scan/result/impl/FilterQueryScannedResult.java | 20 +- .../result/impl/NonFilterQueryScannedResult.java | 59 +- .../dataload/TestRangeColumnDataLoad.scala | 669 - .../spark/load/DataLoadProcessBuilderOnSpark.scala | 43 +- .../carbondata/spark/rdd/CarbonMergerRDD.scala | 202 ++- .../carbondata/spark/rdd/CarbonScanRDD.scala | 7 +- .../org/apache/spark/CarbonInputMetrics.scala | 0 .../apache/spark/DataSkewRangePartitioner.scala| 26 +- .../spark/sql/catalyst/CarbonDDLSqlParser.scala| 12 +- .../spark/sql/CarbonDatasourceHadoopRelation.scala | 1 - .../merger/CarbonCompactionExecutor.java | 20 +- .../processing/merger/CarbonCompactionUtil.java| 140 + .../merger/RowResultMergerProcessor.java | 6 +- 22 files changed, 1274 insertions(+), 148 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java index 608b5fb..ba8e20a 100644 --- a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java +++ b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java @@ -1759,6 +1759,7 @@ public final class CarbonCommonConstants { public static final String ARRAY = "array"; public static final String STRUCT = "struct"; public static final String MAP = "map"; + public static final String DECIMAL = "decimal"; public static final String FROM = "from"; /** diff --git a/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java b/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java index 54ea772..c66d1fc 100644 --- a/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java +++ b/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java @@ -1081,22 +1081,26 @@ public class CarbonTable implements Serializable { return dataSize + indexSize; } - public void processFilterExpression(Expression filterExpression, - boolean[] isFilterDimensions, boolean[] isFilterMeasures) { -QueryModel.FilterProcessVO processVO = -new QueryModel.FilterProcessVO(getDimensionByTableName(getTableName()), -getMeasureByTableName(getTableName()), 
getImplicitDimensionByTableName(getTableName())); -QueryModel.processFilterExpression(processVO, filterExpression, isFilterDimensions, -isFilterMeasures, this); - + public void processFilterExpression(Expression filterExpression, boolean[] isFilterDimensions, + boolean[] isFilterMeasures) { +processFilterExpressionWithoutRange(filterExpression, isFilterDimensions, isFilterMeasures); if (null != filterExpression) { // Optimize Filter Expression and fit RANGE filters is conditions apply. - FilterOptimizer rangeFilterOptimizer = - new RangeFilterOptmizer(filterExpression); + FilterOptimizer rangeFilterOptimizer = new RangeFilterOptmizer(filterExpression); rangeFilterOptimizer.optimizeFilter(); } } + public void processFilterExpressionWithoutRange(Expression filterExpression, + boolean[] isFilterDimensions, boolean[] isFilterMeasures) { +QueryModel.FilterProcessVO processVO = +new QueryModel.FilterProcessVO(getDimensionByTableName(getTableName()), +getMeasureByTableName(getTableName
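The range idea reduced to a sketch, independent of the Spark APIs involved: derive ordered boundaries from the data, then route each row (or, for compaction, each task's between-filter) by binary search over the boundaries. The types and numbers here are illustrative; Spark's DataSkewRangePartitioner additionally samples the data and handles skew.

import java.util.Arrays;

class RangeRouter {
  private final long[] boundaries; // sorted upper bounds, one per range except the last

  RangeRouter(long[] boundaries) { this.boundaries = boundaries; }

  int rangeFor(long key) {
    int idx = Arrays.binarySearch(boundaries, key);
    // for a missing key, binarySearch returns -(insertionPoint) - 1,
    // and the insertion point is exactly the range index
    return idx >= 0 ? idx : -idx - 1;
  }
}

For example, new RangeRouter(new long[]{10, 20}).rangeFor(15) yields 1, the middle range, which one compaction task would then read back with a filter on the range column.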
[carbondata] branch master updated: [CARBONDATA-3345] A growing streaming ROW_V1 carbondata file would have some InputSplits ignored
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new 0ab2412 [CARBONDATA-3345] A growing streaming ROW_V1 carbondata file would have some InputSplits ignored 0ab2412 is described below commit 0ab2412b2f403392a7a17ce20a2327a35f4b8dd0 Author: junyan-zg <275620...@qq.com> AuthorDate: Wed Apr 24 22:46:51 2019 +0800 [CARBONDATA-3345] A growing streaming ROW_V1 carbondata file would have some InputSplits ignored Looking at carbondata segments: when a file grows to more than 150 M (possibly 128 M), Presto initiates a query by splitting it into several small InputSplits, including for files in ROW_V1 format. This bug caused some of those ROW_V1 splits to be ignored, resulting in inaccurate query results. So for the carbondata ROW_V1 InputSplits map key (Java), carbonInput.getStart() is concatenated into the key to keep every required InputSplit. This closes #3186 --- .../org/apache/carbondata/presto/impl/CarbonTableReader.java | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java b/integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java index 57d8d5e..7ffe053 100755 --- a/integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java +++ b/integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java @@ -46,6 +46,7 @@ import org.apache.carbondata.core.metadata.schema.table.CarbonTable; import org.apache.carbondata.core.metadata.schema.table.TableInfo; import org.apache.carbondata.core.reader.ThriftReader; import org.apache.carbondata.core.scan.expression.Expression; +import org.apache.carbondata.core.statusmanager.FileFormat; import org.apache.carbondata.core.statusmanager.LoadMetadataDetails; import org.apache.carbondata.core.statusmanager.SegmentStatusManager; import org.apache.carbondata.core.util.CarbonProperties; @@ -291,7 +292,13 @@ public class CarbonTableReader { // Use block distribution List> inputSplits = new ArrayList( result.stream().map(x -> (CarbonLocalInputSplit) x).collect(Collectors.groupingBy( -carbonInput -> carbonInput.getSegmentId().concat(carbonInput.getPath(.values()); +carbonInput -> { + if (FileFormat.ROW_V1.equals(carbonInput.getFileFormat())) { +return carbonInput.getSegmentId().concat(carbonInput.getPath()) + .concat(carbonInput.getStart() + ""); + } + return carbonInput.getSegmentId().concat(carbonInput.getPath()); +})).values()); if (inputSplits != null) { for (int j = 0; j < inputSplits.size(); j++) { multiBlockSplitList.add(new CarbonLocalMultiBlockSplit(inputSplits.get(j),
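The grouping-key fix in isolation, with hypothetical types: two splits of a growing streaming file share segmentId and path but cover different byte ranges, so the start offset must be part of the key or groupingBy collapses them into one.

import java.util.*;
import java.util.stream.Collectors;

class Split {
  String segmentId; String path; long start; boolean rowV1;
  Split(String s, String p, long st, boolean r) {
    segmentId = s; path = p; start = st; rowV1 = r;
  }

  String groupKey() {
    return rowV1
        ? segmentId + path + start // keep every byte range of a ROW_V1 file
        : segmentId + path;        // other formats: one group per file is enough
  }
}

class Grouping {
  static Collection<List<Split>> group(List<Split> splits) {
    return splits.stream().collect(Collectors.groupingBy(Split::groupKey)).values();
  }
}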
[carbondata] branch master updated: [CARBONDATA-3359]Fix data mismatch issue for decimal column after delete operation
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new eb7a833 [CARBONDATA-3359]Fix data mismatch issue for decimal column after delete operation eb7a833 is described below commit eb7a8335013957c9a615a48c7304b7968a2f7e24 Author: akashrn5 AuthorDate: Thu Apr 25 15:16:35 2019 +0530 [CARBONDATA-3359]Fix data mismatch issue for decimal column after delete operation Problem: after a delete operation is performed, the decimal column data is wrong. This is because, while filling the vector for a decimal column, we were not considering the deleted rows, if any are present; we were filling all the row data for decimal. Solution: in case of decimal, get the vector from ColumnarVectorWrapperDirectFactory and then put the data, which takes care of the deleted rows. This closes #3189 --- .../metadata/datatype/DecimalConverterFactory.java | 55 +- .../src/test/resources/decimalData.csv | 4 ++ .../testsuite/iud/DeleteCarbonTableTestCase.scala | 17 +++ 3 files changed, 54 insertions(+), 22 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/metadata/datatype/DecimalConverterFactory.java b/core/src/main/java/org/apache/carbondata/core/metadata/datatype/DecimalConverterFactory.java index 9793c38..2e155f4 100644 --- a/core/src/main/java/org/apache/carbondata/core/metadata/datatype/DecimalConverterFactory.java +++ b/core/src/main/java/org/apache/carbondata/core/metadata/datatype/DecimalConverterFactory.java @@ -23,6 +23,7 @@ import java.util.BitSet; import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector; import org.apache.carbondata.core.scan.result.vector.ColumnVectorInfo; +import org.apache.carbondata.core.scan.result.vector.impl.directread.ColumnarVectorWrapperDirectFactory; import org.apache.carbondata.core.util.ByteUtil; import org.apache.carbondata.core.util.DataTypeUtil; @@ -102,13 +103,13 @@ public final class DecimalConverterFactory { return BigDecimal.valueOf((Long) valueToBeConverted, scale); } -@Override public void fillVector(Object valuesToBeConverted, int size, ColumnVectorInfo info, -BitSet nullBitset, DataType pageType) { +@Override public void fillVector(Object valuesToBeConverted, int size, +ColumnVectorInfo vectorInfo, BitSet nullBitSet, DataType pageType) { // TODO we need to find way to directly set to vector with out conversion. This way is very // inefficient. 
- CarbonColumnVector vector = info.vector; - int precision = info.measure.getMeasure().getPrecision(); - int newMeasureScale = info.measure.getMeasure().getScale(); + CarbonColumnVector vector = getCarbonColumnVector(vectorInfo, nullBitSet); + int precision = vectorInfo.measure.getMeasure().getPrecision(); + int newMeasureScale = vectorInfo.measure.getMeasure().getScale(); if (!(valuesToBeConverted instanceof byte[])) { throw new UnsupportedOperationException("This object type " + valuesToBeConverted.getClass() + " is not supported in this method"); @@ -116,7 +117,7 @@ public final class DecimalConverterFactory { byte[] data = (byte[]) valuesToBeConverted; if (pageType == DataTypes.BYTE) { for (int i = 0; i < size; i++) { - if (nullBitset.get(i)) { + if (nullBitSet.get(i)) { vector.putNull(i); } else { BigDecimal value = BigDecimal.valueOf(data[i], scale); @@ -128,7 +129,7 @@ public final class DecimalConverterFactory { } } else if (pageType == DataTypes.SHORT) { for (int i = 0; i < size; i++) { - if (nullBitset.get(i)) { + if (nullBitSet.get(i)) { vector.putNull(i); } else { BigDecimal value = BigDecimal @@ -142,7 +143,7 @@ public final class DecimalConverterFactory { } } else if (pageType == DataTypes.SHORT_INT) { for (int i = 0; i < size; i++) { - if (nullBitset.get(i)) { + if (nullBitSet.get(i)) { vector.putNull(i); } else { BigDecimal value = BigDecimal @@ -156,7 +157,7 @@ public final class DecimalConverterFactory { } } else if (pageType == DataTypes.INT) { for (int i = 0; i < size; i++) { - if (nullBitset.get(i)) { + if (nullBitSet.get(i)) { vector.putNull(i); } else { BigDecimal value = BigDecimal @@ -170,7 +171,7 @@ public final class DecimalConverterFactory { } } else if (pageType == DataTypes.LONG) { for (int i = 0; i < size; i++) { - if (nullBitset.get(i)) { + if (nullBitSet.get(i)) { v
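Why the wrapper matters, as a sketch with hypothetical vector types: a plain vector fills one slot per stored row, while a delete-delta-aware wrapper skips the deleted rows so surviving decimal values land at the right output positions.

import java.util.BitSet;

interface Vector {
  void put(int row, Object value);
}

class DeleteAwareVector implements Vector {
  private final Vector delegate;
  private final BitSet deleted; // rows removed by the delete delta
  private int outRow = 0;

  DeleteAwareVector(Vector delegate, BitSet deleted) {
    this.delegate = delegate;
    this.deleted = deleted;
  }

  @Override public void put(int storedRow, Object value) {
    if (!deleted.get(storedRow)) {
      // only surviving rows advance the output cursor, keeping values aligned
      delegate.put(outRow++, value);
    }
  }
}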
[carbondata] branch master updated: [CARBONDATA-3341] fixed invalid NULL result in filter query
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new a6ab97c [CARBONDATA-3341] fixed invalid NULL result in filter query a6ab97c is described below commit a6ab97ca40427af5225f12a063a0e44221a503e1 Author: kunal642 AuthorDate: Thu Apr 4 11:53:05 2019 +0530 [CARBONDATA-3341] fixed invalid NULL result in filter query Problem: When vector filter push down is true and the table contains a null value, the getNullBitSet method gives an empty byte[] to represent null, but there is no check for the value of the bitset. Solution: If the null bit set length is 0, set it as the chunkData. This closes #3172 --- .../core/datastore/chunk/store/ColumnPageWrapper.java | 7 ++- .../spark/testsuite/sortcolumns/TestSortColumns.scala | 14 ++ 2 files changed, 20 insertions(+), 1 deletion(-) diff --git a/core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java b/core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java index a1c4aec..f4d3fe4 100644 --- a/core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java +++ b/core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java @@ -261,7 +261,12 @@ public class ColumnPageWrapper implements DimensionColumnPage { // if the compare value is null and the data is also null we can directly return 0 return 0; } else { - byte[] chunkData = this.getChunkDataInBytes(rowId); + byte[] chunkData; + if (nullBitSet != null && nullBitSet.length == 0) { +chunkData = nullBitSet; + } else { +chunkData = this.getChunkDataInBytes(rowId); + } return ByteUtil.UnsafeComparer.INSTANCE.compareTo(chunkData, compareValue); } } diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/sortcolumns/TestSortColumns.scala b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/sortcolumns/TestSortColumns.scala index df97d0f..bbd58c0 100644 --- a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/sortcolumns/TestSortColumns.scala +++ b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/sortcolumns/TestSortColumns.scala @@ -385,6 +385,17 @@ class TestSortColumns extends QueryTest with BeforeAndAfterAll { "sort_columns is unsupported for double datatype column: empno")) } + test("test if equal to 0 filter on sort column gives correct result") { + CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_PUSH_ROW_FILTERS_FOR_VECTOR, + "true") +sql("create table test1(a bigint) stored by 'carbondata' TBLPROPERTIES('sort_columns'='a')") +sql("insert into test1 select 'k'") +sql("insert into test1 select '1'") +assert(sql("select * from test1 where a = 1 or a = 0").count() == 1) + CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_PUSH_ROW_FILTERS_FOR_VECTOR, + CarbonCommonConstants.CARBON_PUSH_ROW_FILTERS_FOR_VECTOR_DEFAULT) + } + override def afterAll = { dropTestTables CarbonProperties.getInstance().addProperty( @@ -392,9 +403,12 @@ class TestSortColumns extends QueryTest with BeforeAndAfterAll { CarbonProperties.getInstance() .addProperty(CarbonCommonConstants.LOAD_SORT_SCOPE, CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT) + 
CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_PUSH_ROW_FILTERS_FOR_VECTOR, + CarbonCommonConstants.CARBON_PUSH_ROW_FILTERS_FOR_VECTOR_DEFAULT) } def dropTestTables = { +sql("drop table if exists test1") sql("drop table if exists sortint") sql("drop table if exists sortint1") sql("drop table if exists sortlong")
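The guard in isolation, with hypothetical surrounding types: an empty byte array is the page's marker for a null row here, so the comparator must use it directly instead of decoding chunk bytes for that row.

class NullAwareCompare {
  static int compare(byte[] nullMarker, byte[] chunkBytes, byte[] compareValue) {
    byte[] data = (nullMarker != null && nullMarker.length == 0)
        ? nullMarker   // null row: compare the empty marker, not stale chunk bytes
        : chunkBytes;  // normal row: compare the decoded chunk bytes
    return unsignedCompare(data, compareValue);
  }

  // byte-wise unsigned comparison, shorter array wins ties
  static int unsignedCompare(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int cmp = (a[i] & 0xff) - (b[i] & 0xff);
      if (cmp != 0) return cmp;
    }
    return a.length - b.length;
  }
}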
[carbondata] branch master updated: [CARBONDATA-3302] [Spark-Integration] code cleaning related to CarbonCreateTable command
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new 66982f3 [CARBONDATA-3302] [Spark-Integration] code cleaning related to CarbonCreateTable command 66982f3 is described below commit 66982f342865e7bd8c256630cf4b6d38ec62890a Author: s71955 AuthorDate: Sun Feb 24 21:45:16 2019 +0530 [CARBONDATA-3302] [Spark-Integration] code cleaning related to CarbonCreateTable command What changes were proposed in this pull request? Removed the extra (duplicated) check that validates whether the streaming property is not null. Moreover, the condition can be optimized further: currently it first validates whether the path is part of the s3 file system and only then checks whether the streaming property is not null. The null check can be moved to the front, since the overall condition has to be evaluated for a stream table only when streaming is not null. This closes #3134 --- .../spark/sql/execution/command/table/CarbonCreateTableCommand.scala | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonCreateTableCommand.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonCreateTableCommand.scala index 12eb420..1e17ffe 100644 --- a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonCreateTableCommand.scala +++ b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonCreateTableCommand.scala @@ -78,8 +78,7 @@ case class CarbonCreateTableCommand( path } val streaming = tableInfo.getFactTable.getTableProperties.get("streaming") - if (path.startsWith("s3") && streaming != null && streaming != null && - streaming.equalsIgnoreCase("true")) { + if (streaming != null && streaming.equalsIgnoreCase("true") && path.startsWith("s3")) { throw new UnsupportedOperationException("streaming is not supported with s3 store") } tableInfo.setTablePath(tablePath)
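The cleanup in miniature, in plain Java: order the conjunction so the cheap null guard runs first, letting the string comparisons be skipped whenever the streaming property is absent (and dropping the duplicated null test).

class StreamingPathCheck {
  static void validate(String streaming, String path) {
    // null guard first: short-circuit evaluation skips the rest for non-stream tables
    if (streaming != null && streaming.equalsIgnoreCase("true") && path.startsWith("s3")) {
      throw new UnsupportedOperationException("streaming is not supported with s3 store");
    }
  }
}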
[carbondata] branch master updated: [CARBONDATA-3297] Fix the IndexOutOfBoundsException when creating table and dropping table at the same time
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new 6840a18 [CARBONDATA-3297] Fix the IndexOutOfBoundsException when creating table and dropping table at the same time 6840a18 is described below commit 6840a183689ad6acd86e1850dedff8665bf126ae Author: qiuchenjian <807169...@qq.com> AuthorDate: Wed Feb 20 17:16:34 2019 +0800 [CARBONDATA-3297] Fix the IndexOutOfBoundsException when creating table and dropping table at the same time [Problem] An IndexOutOfBoundsException is thrown when a table is being created and a table is being dropped at the same time. [Solution] The type of carbonTables in MetaData.class is ArrayBuffer, and ArrayBuffer is not thread-safe, so this exception is thrown when tables are created and dropped concurrently. Use a read-write lock to guarantee thread safety. This closes #3130 --- .../spark/sql/hive/CarbonFileMetastore.scala | 37 +++--- 1 file changed, 33 insertions(+), 4 deletions(-) diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala index c1be154..ea3bba8 100644 --- a/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala +++ b/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala @@ -19,6 +19,7 @@ package org.apache.spark.sql.hive import java.io.IOException import java.net.URI +import java.util.concurrent.locks.{Lock, ReentrantReadWriteLock} import scala.collection.mutable.ArrayBuffer @@ -43,7 +44,8 @@ import org.apache.carbondata.core.fileoperations.FileWriteOperation import org.apache.carbondata.core.metadata.{AbsoluteTableIdentifier, CarbonMetadata, CarbonTableIdentifier} import org.apache.carbondata.core.metadata.converter.ThriftWrapperSchemaConverterImpl import org.apache.carbondata.core.metadata.schema -import org.apache.carbondata.core.metadata.schema.{table, SchemaReader} +import org.apache.carbondata.core.metadata.schema.SchemaReader +import org.apache.carbondata.core.metadata.schema.table import org.apache.carbondata.core.metadata.schema.table.CarbonTable import org.apache.carbondata.core.util.{CarbonProperties, CarbonUtil} import org.apache.carbondata.core.util.path.CarbonTablePath @@ -53,9 +55,16 @@ import org.apache.carbondata.format.{SchemaEvolutionEntry, TableInfo} import org.apache.carbondata.spark.util.CarbonSparkUtil case class MetaData(var carbonTables: ArrayBuffer[CarbonTable]) { + // use to lock the carbonTables + val lock : ReentrantReadWriteLock = new ReentrantReadWriteLock + val readLock: Lock = lock.readLock() + val writeLock: Lock = lock.writeLock() + // clear the metadata def clear(): Unit = { +writeLock.lock() carbonTables.clear() +writeLock.unlock() } } @@ -192,9 +201,12 @@ class CarbonFileMetastore extends CarbonMetaStore { * @return */ def getTableFromMetadataCache(database: String, tableName: String): Option[CarbonTable] = { -metadata.carbonTables +metadata.readLock.lock() +val ret = metadata.carbonTables .find(table => table.getDatabaseName.equalsIgnoreCase(database) && table.getTableName.equalsIgnoreCase(tableName)) +metadata.readLock.unlock() +ret } def tableExists( @@ -270,11 +282,14 @@ class CarbonFileMetastore extends CarbonMetaStore { } } + wrapperTableInfo.map { tableInfo => 
CarbonMetadata.getInstance().removeTable(tableUniqueName) CarbonMetadata.getInstance().loadTableMetadata(tableInfo) val carbonTable = CarbonMetadata.getInstance().getCarbonTable(tableUniqueName) + metadata.writeLock.lock() metadata.carbonTables += carbonTable + metadata.writeLock.unlock() carbonTable } } @@ -413,8 +428,11 @@ class CarbonFileMetastore extends CarbonMetaStore { CarbonMetadata.getInstance.removeTable(tableInfo.getTableUniqueName) removeTableFromMetadata(identifier.getDatabaseName, identifier.getTableName) CarbonMetadata.getInstance().loadTableMetadata(tableInfo) +metadata.writeLock.lock() metadata.carbonTables += CarbonMetadata.getInstance().getCarbonTable(identifier.getTableUniqueName) +metadata.writeLock.unlock() +metadata.carbonTables } /** @@ -427,7 +445,9 @@ class CarbonFileMetastore extends CarbonMetaStore { val carbonTableToBeRemoved: Option[CarbonTable] = getTableFromMetadataCache(dbName, tableName) carbonTableToBeRemoved match { case Some(carbonTable) => +metadata.writeLock.lock() metadata.carbonTables -= carbonTable +metadata.writeLock.un
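The same locking discipline in plain Java with a hypothetical cached-table list; one refinement worth noting over the committed code is unlocking in a finally block, so an exception inside the critical section cannot leave the lock held.

import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class TableCache {
  private final List<String> tables = new ArrayList<>();
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  Optional<String> find(String name) {
    lock.readLock().lock(); // many readers may hold this concurrently
    try {
      return tables.stream().filter(t -> t.equalsIgnoreCase(name)).findFirst();
    } finally {
      lock.readLock().unlock();
    }
  }

  void add(String name) {
    lock.writeLock().lock(); // exclusive: blocks readers and other writers
    try {
      tables.add(name);
    } finally {
      lock.writeLock().unlock();
    }
  }
}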
[carbondata] branch master updated: [CARBONDATA-3307] Fix Performance Issue in No Sort
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new f5e4793 [CARBONDATA-3307] Fix Performance Issue in No Sort f5e4793 is described below commit f5e4793bda2324f8417afc4fc7aaeb09acdea2a0 Author: shivamasn AuthorDate: Wed Mar 6 19:03:01 2019 +0530 [CARBONDATA-3307] Fix Performance Issue in No Sort When a table is created without sort_columns and data is loaded into it, more carbondata files are generated than expected: the number of carbondata files depends on the number of threads launched, because each thread initialises its own writer and writes its own data. Now we pass the same writer instance to all the threads, so all the threads write the data to the same file. This closes #3140 --- .../CarbonRowDataWriterProcessorStepImpl.java | 61 ++ 1 file changed, 29 insertions(+), 32 deletions(-) diff --git a/processing/src/main/java/org/apache/carbondata/processing/loading/steps/CarbonRowDataWriterProcessorStepImpl.java b/processing/src/main/java/org/apache/carbondata/processing/loading/steps/CarbonRowDataWriterProcessorStepImpl.java index f976abe..184248c 100644 --- a/processing/src/main/java/org/apache/carbondata/processing/loading/steps/CarbonRowDataWriterProcessorStepImpl.java +++ b/processing/src/main/java/org/apache/carbondata/processing/loading/steps/CarbonRowDataWriterProcessorStepImpl.java @@ -18,9 +18,7 @@ package org.apache.carbondata.processing.loading.steps; import java.io.IOException; import java.util.Iterator; -import java.util.List; import java.util.Map; -import java.util.concurrent.CopyOnWriteArrayList; import java.util.concurrent.ExecutorService; import java.util.concurrent.Executors; import java.util.concurrent.Future; @@ -83,16 +81,17 @@ public class CarbonRowDataWriterProcessorStepImpl extends AbstractDataLoadProces private Map localDictionaryGeneratorMap; - private List carbonFactHandlers; + private CarbonFactHandler dataHandler; private ExecutorService executorService = null; + private static final Object lock = new Object(); + public CarbonRowDataWriterProcessorStepImpl(CarbonDataLoadConfiguration configuration, AbstractDataLoadProcessorStep child) { super(configuration, child); this.localDictionaryGeneratorMap = CarbonUtil.getLocalDictionaryModel(configuration.getTableSpec().getCarbonTable()); -this.carbonFactHandlers = new CopyOnWriteArrayList<>(); } @Override public void initialize() throws IOException { @@ -129,20 +128,31 @@ public class CarbonRowDataWriterProcessorStepImpl extends AbstractDataLoadProces .recordDictionaryValue2MdkAdd2FileTime(CarbonTablePath.DEPRECATED_PARTITION_ID, System.currentTimeMillis()); + //Creating a Instance of CarbonFacthandler that will be passed to all the threads + String[] storeLocation = getStoreLocation(); + DataMapWriterListener listener = getDataMapWriterListener(0); + CarbonFactDataHandlerModel model = CarbonFactDataHandlerModel + .createCarbonFactDataHandlerModel(configuration, storeLocation, 0, 0, listener); + model.setColumnLocalDictGenMap(localDictionaryGeneratorMap); + dataHandler = CarbonFactHandlerFactory.createCarbonFactHandler(model); + dataHandler.initialise(); + if (iterators.length == 1) { -doExecute(iterators[0], 0); +doExecute(iterators[0], 0, dataHandler); } else { executorService = Executors.newFixedThreadPool(iterators.length, new CarbonThreadFactory("NoSortDataWriterPool:" + 
configuration.getTableIdentifier() .getCarbonTableIdentifier().getTableName())); Future[] futures = new Future[iterators.length]; for (int i = 0; i < iterators.length; i++) { - futures[i] = executorService.submit(new DataWriterRunnable(iterators[i], i)); + futures[i] = executorService.submit(new DataWriterRunnable(iterators[i], i, dataHandler)); } for (Future future : futures) { future.get(); } } + finish(dataHandler, 0); + dataHandler = null; } catch (CarbonDataWriterException e) { LOGGER.error("Failed for table: " + tableName + " in DataWriterProcessorStepImpl", e); throw new CarbonDataLoadingException( @@ -157,31 +167,15 @@ public class CarbonRowDataWriterProcessorStepImpl extends AbstractDataLoadProces return null; } - private void doExecute(Iterator iterator, int iteratorIndex) throws IOException { -String[] storeLocation = getStoreLocation(); -DataMapWriterListener listener = getDataMapWriterListener(0); -CarbonFactDataHandlerModel model = CarbonFactDataHandle
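As a rough illustration of the shared-writer pattern this fix adopts, here is a minimal, self-contained Java sketch: one writer instance is created up front, handed to every loader thread, writes are serialized on the writer, and the writer is finished exactly once at the end. RowWriter and writeRow are hypothetical stand-ins, not the CarbonFactHandler API.

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class SharedWriterDemo {
      // Hypothetical writer: stands in for the single CarbonFactHandler instance.
      static class RowWriter {
        void writeRow(Object[] row) { /* append the row to the single output file */ }
        void finish() { /* flush and close the file */ }
      }

      public static void main(String[] args) throws Exception {
        List<Iterator<Object[]>> iterators = new ArrayList<>(); // one per input partition
        RowWriter writer = new RowWriter(); // created once, passed to every thread
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, iterators.size()));
        List<Future<?>> futures = new ArrayList<>();
        for (Iterator<Object[]> it : iterators) {
          futures.add(pool.submit(() -> {
            while (it.hasNext()) {
              Object[] row = it.next();
              synchronized (writer) { // serialize writes so all rows land in one file
                writer.writeRow(row);
              }
            }
          }));
        }
        for (Future<?> f : futures) f.get(); // propagate any worker failure
        writer.finish(); // single finish, like finish(dataHandler, 0) in the patch
        pool.shutdown();
      }
    }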
[carbondata] branch master updated: [CARBONDATA-3280] Fix the issue of SDK assert can't work
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new ba139b6 [CARBONDATA-3280] Fix the issue of SDK assert can't work ba139b6 is described below commit ba139b642d266e0767fedd4fb53c16d198b26d35 Author: xubo245 AuthorDate: Tue Jan 29 11:36:48 2019 +0800 [CARBONDATA-3280] Fix the issue of SDK assert can't work After PR-3097 was merged, the batch rule changed, but the tests no longer worked, such as: org.apache.carbondata.sdk.file.CarbonReaderTest#testReadNextBatchRow org.apache.carbondata.sdk.file.CarbonReaderTest#testReadNextBatchRowWithVectorReader So this PR fixes the test errors and adds some asserts. This closes #3112 --- .../carbondata/core/util/CarbonProperties.java | 2 +- .../carbondata/sdk/file/CarbonReaderTest.java | 90 +++--- 2 files changed, 63 insertions(+), 29 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java b/core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java index 49388b7..b337e40 100644 --- a/core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java +++ b/core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java @@ -1572,7 +1572,7 @@ public final class CarbonProperties { try { batchSize = Integer.parseInt(batchSizeString); if (batchSize < DETAIL_QUERY_BATCH_SIZE_MIN || batchSize > DETAIL_QUERY_BATCH_SIZE_MAX) { - LOGGER.info("Invalid carbon.detail.batch.size.Using default value " + LOGGER.warn("Invalid carbon.detail.batch.size.Using default value " + DETAIL_QUERY_BATCH_SIZE_DEFAULT); carbonProperties.setProperty(DETAIL_QUERY_BATCH_SIZE, Integer.toString(DETAIL_QUERY_BATCH_SIZE_DEFAULT)); diff --git a/store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java b/store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java index 28944da..871d51b 100644 --- a/store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java +++ b/store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java @@ -104,7 +104,7 @@ public class CarbonReaderTest extends TestCase { FileUtils.deleteDirectory(new File(path)); } - @Test public void testReadWithZeroBatchSize() throws IOException, InterruptedException { + @Test public void testReadWithZeroBatchSize() throws Exception { String path = "./testWriteFiles"; FileUtils.deleteDirectory(new File(path)); DataMapStoreManager.getInstance().clearDataMaps(AbsoluteTableIdentifier.from(path)); @@ -127,6 +127,30 @@ public class CarbonReaderTest extends TestCase { FileUtils.deleteDirectory(new File(path)); } + + @Test + public void testReadBatchWithZeroBatchSize() throws Exception { +String path = "./testWriteFiles"; +FileUtils.deleteDirectory(new File(path)); + DataMapStoreManager.getInstance().clearDataMaps(AbsoluteTableIdentifier.from(path)); +Field[] fields = new Field[2]; +fields[0] = new Field("name", DataTypes.STRING); +fields[1] = new Field("age", DataTypes.INT); + +TestUtil.writeFilesAndVerify(10, new Schema(fields), path); +CarbonReader reader; +reader = CarbonReader.builder(path).withRowRecordReader().withBatch(0).build(); + +int i = 0; +while (reader.hasNext()) { + Object[] row = reader.readNextBatchRow(); + Assert.assertEquals(row.length, 10); + i++; +} +Assert.assertEquals(i, 1); +FileUtils.deleteDirectory(new File(path)); + } + @Test public void testReadWithFilterOfNonTransactionalSimple
throws IOException, InterruptedException { String path = "./testWriteFiles"; @@ -532,6 +556,7 @@ public class CarbonReaderTest extends TestCase { .withCsvInput(schema).writtenBy("CarbonReaderTest").build(); } catch (InvalidLoadOptionException e) { e.printStackTrace(); + Assert.fail(e.getMessage()); } carbonWriter.write(new String[] { "MNO", "100" }); carbonWriter.close(); @@ -546,22 +571,25 @@ public class CarbonReaderTest extends TestCase { .withCsvInput(schema1).writtenBy("CarbonReaderTest").build(); } catch (InvalidLoadOptionException e) { e.printStackTrace(); + Assert.fail(e.getMessage()); } carbonWriter1.write(new String[] { "PQR", "200" }); carbonWriter1.close(); try { - CarbonReader reader = - CarbonReader.builder(path1, "_temp"). - projection(new String[] { "c1", "c3" }) - .build(); -} catch (Exception e){ - System.out.println("Success"
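For reference, the batch-reading path these tests exercise can be sketched with the SDK calls visible in the diff above (builder, withRowRecordReader, withBatch, readNextBatchRow). This is a usage sketch, not a test; it assumes a carbondata store already exists at the given path.

    import java.util.Objects;

    import org.apache.carbondata.sdk.file.CarbonReader;

    public class BatchReadDemo {
      public static void main(String[] args) throws Exception {
        // withBatch(0) falls back to the default detail-query batch size,
        // which is what the fixed testReadBatchWithZeroBatchSize asserts.
        CarbonReader reader = CarbonReader.builder("./testWriteFiles")
            .withRowRecordReader()
            .withBatch(0)
            .build();
        int batches = 0;
        while (reader.hasNext()) {
          // One call returns a whole batch; each element is one row.
          Object[] batch = reader.readNextBatchRow();
          batches++;
          System.out.println("batch " + batches + " has "
              + Objects.requireNonNull(batch).length + " rows");
        }
        reader.close();
      }
    }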
[carbondata] branch master updated: [HOTFIX] Upgraded jars to work S3 with presto
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new 3f63f91 [HOTFIX] Upgraded jars to work S3 with presto 3f63f91 is described below commit 3f63f91915d5da9d94a5c912b5415f230be64c07 Author: ravipesala AuthorDate: Sun Jan 27 15:12:29 2019 +0530 [HOTFIX] Upgraded jars to work S3 with presto There is a duplicate aws-java-sdk jar, and low-version jars prevent presto from connecting to S3. This PR upgrades those jars and updates the doc. This closes #3110 --- .../statusmanager/SegmentUpdateStatusManager.java | 3 ++- docs/presto-guide.md | 18 --- integration/presto/pom.xml | 27 -- 3 files changed, 16 insertions(+), 32 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java b/core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java index c5f5f74..a02e903 100644 --- a/core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java +++ b/core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java @@ -52,6 +52,7 @@ import org.apache.carbondata.core.util.CarbonUtil; import org.apache.carbondata.core.util.path.CarbonTablePath; import com.google.gson.Gson; +import org.apache.commons.lang3.StringUtils; import org.apache.log4j.Logger; /** @@ -655,7 +656,7 @@ public class SegmentUpdateStatusManager { // get the updated status file identifier from the table status. String tableUpdateStatusIdentifier = getUpdatedStatusIdentifier(); -if (null == tableUpdateStatusIdentifier) { +if (StringUtils.isEmpty(tableUpdateStatusIdentifier)) { return new SegmentUpdateDetails[0]; } diff --git a/docs/presto-guide.md b/docs/presto-guide.md index 054f29f..7389bc6 100644 --- a/docs/presto-guide.md +++ b/docs/presto-guide.md @@ -254,23 +254,15 @@ Now you can use the Presto CLI on the coordinator to query data sources in the c ``` Required properties -fs.s3a.access.key={value} -fs.s3a.secret.key={value} +hive.s3.aws-access-key={value} +hive.s3.aws-secret-key={value} Optional properties -fs.s3a.endpoint={value} +hive.s3.endpoint={value} ``` - * In case you want to query carbonstore on s3 using S3 api put following additional properties inside $PRESTO_HOME$/etc/catalog/carbondata.properties -``` - fs.s3.awsAccessKeyId={value} - fs.s3.awsSecretAccessKey={value} -``` - * In case You want to query carbonstore on s3 using S3N api put following additional properties inside $PRESTO_HOME$/etc/catalog/carbondata.properties -``` -fs.s3n.awsAccessKeyId={value} -fs.s3n.awsSecretAccessKey={value} - ``` + + Please refer https://prestodb.io/docs/current/connector/hive.html for more details on S3 integration.
### Generate CarbonData file diff --git a/integration/presto/pom.xml b/integration/presto/pom.xml index d69515d..8a9c06d 100644 --- a/integration/presto/pom.xml +++ b/integration/presto/pom.xml @@ -32,6 +32,7 @@ 0.210 +4.4.9 ${basedir}/../../dev true @@ -376,7 +377,7 @@ com.facebook.presto.hadoop hadoop-apache2 - 2.7.3-1 + 2.7.4-3 org.antlr @@ -522,23 +523,8 @@ jackson-core - com.fasterxml.jackson.core - jackson-annotations - - - com.fasterxml.jackson.core - jackson-databind - - - - - com.amazonaws - aws-java-sdk - 1.7.4 - - - com.fasterxml.jackson.core - jackson-core + com.amazonaws + aws-java-sdk com.fasterxml.jackson.core @@ -560,6 +546,11 @@ httpcore ${httpcore.version} + + org.apache.httpcomponents + httpclient + 4.5.5 +
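The SegmentUpdateStatusManager change above widens a null check to StringUtils.isEmpty; a two-line Java demo of why, using the same commons-lang3 call:

    import org.apache.commons.lang3.StringUtils;

    public class EmptyCheckDemo {
      public static void main(String[] args) {
        String identifier = "";          // an empty update-status identifier
        System.out.println(identifier == null);              // false: a null check alone misses this
        System.out.println(StringUtils.isEmpty(identifier)); // true: covers both null and ""
      }
    }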
[carbondata] branch master updated: [CARBONDATA-3235] Fixed Alter Table Rename
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new 75d9eae [CARBONDATA-3235] Fixed Alter Table Rename 75d9eae is described below commit 75d9eae88dd9d2fba9814889d37a24f0b7cd9405 Author: namanrastogi AuthorDate: Wed Jan 23 17:57:35 2019 +0530 [CARBONDATA-3235] Fixed Alter Table Rename Fixed negative scenario: Alter Table Rename Table Fail Problem: When the table rename succeeded in hive but failed in the carbon data store, it would throw an exception but would not go back and undo the rename in hive. Solution: A flag keeps track of whether the hive rename has already executed; if the code breaks after the hive rename is done, go back and undo the hive rename. This closes #3098 --- .../schema/CarbonAlterTableRenameCommand.scala | 34 +++--- 1 file changed, 17 insertions(+), 17 deletions(-) diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableRenameCommand.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableRenameCommand.scala index 01698c9..33f3cd9 100644 --- a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableRenameCommand.scala +++ b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableRenameCommand.scala @@ -43,10 +43,12 @@ private[sql] case class CarbonAlterTableRenameCommand( override def processMetadata(sparkSession: SparkSession): Seq[Nothing] = { val LOGGER = LogServiceFactory.getLogService(this.getClass.getCanonicalName) -val oldTableIdentifier = alterTableRenameModel.oldTableIdentifier -val newTableIdentifier = alterTableRenameModel.newTableIdentifier -val oldDatabaseName = oldTableIdentifier.database +val oldTableName = alterTableRenameModel.oldTableIdentifier.table.toLowerCase +val newTableName = alterTableRenameModel.newTableIdentifier.table.toLowerCase +val oldDatabaseName = alterTableRenameModel.oldTableIdentifier.database .getOrElse(sparkSession.catalog.currentDatabase) +val oldTableIdentifier = TableIdentifier(oldTableName, Some(oldDatabaseName)) +val newTableIdentifier = TableIdentifier(newTableName, Some(oldDatabaseName)) setAuditTable(oldDatabaseName, oldTableIdentifier.table) setAuditInfo(Map("newName" -> alterTableRenameModel.newTableIdentifier.table)) val newDatabaseName = newTableIdentifier.database @@ -59,8 +61,6 @@ private[sql] case class CarbonAlterTableRenameCommand( throw new MalformedCarbonCommandException(s"Table with name $newTableIdentifier " + s"already exists") } -val oldTableName = oldTableIdentifier.table.toLowerCase -val newTableName = newTableIdentifier.table.toLowerCase LOGGER.info(s"Rename table request has been received for $oldDatabaseName.$oldTableName") val metastore = CarbonEnv.getInstance(sparkSession).carbonMetaStore val relation: CarbonRelation = @@ -108,8 +108,8 @@ private[sql] case class CarbonAlterTableRenameCommand( dataMapSchemaList.addAll(indexSchemas) } // invalid data map for the old table, see CARBON-1690 - val oldTableIdentifier = carbonTable.getAbsoluteTableIdentifier - DataMapStoreManager.getInstance().clearDataMaps(oldTableIdentifier) + val oldAbsoluteTableIdentifier = carbonTable.getAbsoluteTableIdentifier + DataMapStoreManager.getInstance().clearDataMaps(oldAbsoluteTableIdentifier) // get the latest carbon table and check for column existence val
operationContext = new OperationContext // TODO: Pass new Table Path in pre-event. @@ -125,7 +125,7 @@ private[sql] case class CarbonAlterTableRenameCommand( schemaEvolutionEntry.setTableName(newTableName) timeStamp = System.currentTimeMillis() schemaEvolutionEntry.setTime_stamp(timeStamp) - val newTableIdentifier = new CarbonTableIdentifier(oldDatabaseName, + val newCarbonTableIdentifier = new CarbonTableIdentifier(oldDatabaseName, newTableName, carbonTable.getCarbonTableIdentifier.getTableId) val oldIdentifier = TableIdentifier(oldTableName, Some(oldDatabaseName)) val newIdentifier = TableIdentifier(newTableName, Some(oldDatabaseName)) @@ -133,17 +133,17 @@ private[sql] case class CarbonAlterTableRenameCommand( var partitions: Seq[CatalogTablePartition] = Seq.empty if (carbonTable.isHivePartitionTable) { partitions = - sparkSession.sessionState.catalog.listPartitions(oldIdentifier) + sparkSession.sessionState.catalog.listPartitions(oldTableIdentifier) } - sparkSession.cata
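The control flow of this fix is the classic compensating-action pattern: remember whether the external (hive) rename committed, and undo it if a later step fails. A hypothetical Java sketch of just that pattern; the Catalog interface and method names below are illustrative stand-ins, not the spark/hive API.

    public class RenameWithRollbackDemo {
      interface Catalog { void rename(String from, String to); }

      // If the carbon-side update fails after the hive rename succeeded,
      // undo the hive rename before rethrowing.
      static void renameTable(Catalog hive, String oldName, String newName) {
        boolean hiveRenameSuccess = false;
        try {
          hive.rename(oldName, newName);
          hiveRenameSuccess = true;
          updateCarbonSchema(oldName, newName); // may throw after hive already renamed
        } catch (RuntimeException e) {
          if (hiveRenameSuccess) {
            hive.rename(newName, oldName); // compensating action
          }
          throw e;
        }
      }

      static void updateCarbonSchema(String oldName, String newName) {
        // stand-in for the carbon metastore update that can fail mid-way
      }
    }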
[carbondata] branch master updated: [CARBONDATA-3264] Added SORT_SCOPE in ALTER TABLE SET
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new 8e39ee1 [CARBONDATA-3264] Added SORT_SCOPE in ALTER TABLE SET 8e39ee1 is described below commit 8e39ee113236b7c48b8a0a46777cafc771701d9f Author: namanrastogi AuthorDate: Tue Jan 22 11:42:40 2019 +0530 [CARBONDATA-3264] Added SORT_SCOPE in ALTER TABLE SET Added SORT_SCOPE in ALTER TABLE SET Command. This command changes the SORT_SCOPE of table after table has been created. Usage: ALTER TABLE SET TBLPROPERTIES('sort_scope'='no_sort') Restrictions: Cannot change SORT_SCOPE from NO_SORT to anything else when SORT_COLUMNS is empty. This closes #3094 --- docs/ddl-of-carbondata.md | 58 +++--- .../org/apache/spark/util/AlterTableUtil.scala | 33 +-- .../restructure/AlterTableValidationTestCase.scala | 69 ++ 3 files changed, 134 insertions(+), 26 deletions(-) diff --git a/docs/ddl-of-carbondata.md b/docs/ddl-of-carbondata.md index 4f9e47b..0d0e5bd 100644 --- a/docs/ddl-of-carbondata.md +++ b/docs/ddl-of-carbondata.md @@ -51,7 +51,7 @@ CarbonData DDL statements are documented here,which includes: * [RENAME COLUMN](#change-column-nametype) * [CHANGE COLUMN NAME/TYPE](#change-column-nametype) * [MERGE INDEXES](#merge-index) -* [SET/UNSET Local Dictionary Properties](#set-and-unset-for-local-dictionary-properties) +* [SET/UNSET](#set-and-unset) * [DROP TABLE](#drop-table) * [REFRESH TABLE](#refresh-table) * [COMMENTS](#table-and-column-comment) @@ -634,7 +634,7 @@ CarbonData DDL statements are documented here,which includes: The following section introduce the commands to modify the physical or logical state of the existing table(s). - - # RENAME TABLE + - RENAME TABLE This command is used to rename the existing table. ``` @@ -648,7 +648,7 @@ CarbonData DDL statements are documented here,which includes: ALTER TABLE test_db.carbon RENAME TO test_db.carbonTable ``` - - # ADD COLUMNS + - ADD COLUMNS This command is used to add a new column to the existing table. ``` @@ -676,7 +676,7 @@ Users can specify which columns to include and exclude for local dictionary gene ALTER TABLE carbon ADD COLUMNS (a1 STRING, b1 STRING) TBLPROPERTIES('LOCAL_DICTIONARY_INCLUDE'='a1','LOCAL_DICTIONARY_EXCLUDE'='b1') ``` - - # DROP COLUMNS + - DROP COLUMNS This command is used to delete the existing column(s) in a table. @@ -696,7 +696,7 @@ Users can specify which columns to include and exclude for local dictionary gene **NOTE:** Drop Complex child column is not supported. - - # CHANGE COLUMN NAME/TYPE + - CHANGE COLUMN NAME/TYPE This command is used to change column name and the data type from INT to BIGINT or decimal precision from lower to higher. Change of decimal data type from lower precision to higher precision will only be supported for cases where there is no data loss. @@ -729,7 +729,8 @@ Users can specify which columns to include and exclude for local dictionary gene ``` **NOTE:** Once the column is renamed, user has to take care about replacing the fileheader with the new name or changing the column header in csv file. -- # MERGE INDEX + + - MERGE INDEX This command is used to merge all the CarbonData index files (.carbonindex) inside a segment to a single CarbonData index merge file (.carbonindexmerge). This enhances the first query performance. 
@@ -747,23 +748,36 @@ Users can specify which columns to include and exclude for local dictionary gene * Merge index is not supported on streaming table. -- # SET and UNSET for Local Dictionary Properties - - When set command is used, all the newly set properties will override the corresponding old properties if exists. - - Example to SET Local Dictionary Properties: - ``` - ALTER TABLE tablename SET TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE'='false','LOCAL_DICTIONARY_THRESHOLD'='1000','LOCAL_DICTIONARY_INCLUDE'='column1','LOCAL_DICTIONARY_EXCLUDE'='column2') - ``` - When Local Dictionary properties are unset, corresponding default values will be used for these properties. + - SET and UNSET - Example to UNSET Local Dictionary Properties: - ``` - ALTER TABLE tablename UNSET TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE','LOCAL_DICTIONARY_THRESHOLD','LOCAL_DICTIONARY_INCLUDE','LOCAL_DICTIONARY_EXCLUDE') - ``` - - **NOTE:** For old tables, by default, local dictionary is disabled. If user wants local dictionary for these tables, user can enable/disable local dictionary for new data at their discretion
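The restriction stated in this commit (SORT_SCOPE cannot move away from NO_SORT while SORT_COLUMNS is empty) boils down to a small validation; a hypothetical Java sketch, not the AlterTableUtil code itself:

    public class SortScopeValidationDemo {
      // Changing SORT_SCOPE to anything other than NO_SORT requires
      // a non-empty SORT_COLUMNS, per the restriction described above.
      static void validateSetSortScope(String newSortScope, String sortColumns) {
        boolean noSortColumns = sortColumns == null || sortColumns.trim().isEmpty();
        if (noSortColumns && !"NO_SORT".equalsIgnoreCase(newSortScope)) {
          throw new IllegalArgumentException(
              "Cannot set SORT_SCOPE to " + newSortScope + " when SORT_COLUMNS is empty");
        }
      }

      public static void main(String[] args) {
        validateSetSortScope("NO_SORT", "");        // allowed
        validateSetSortScope("LOCAL_SORT", "col1"); // allowed
        try {
          validateSetSortScope("LOCAL_SORT", "");   // rejected
        } catch (IllegalArgumentException e) {
          System.out.println(e.getMessage());
        }
      }
    }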
[carbondata] branch master updated: [CARBONDATA-3257] Fix for NO_SORT load and describe formatted being in NO_SORT flow even with Sort Columns given
This is an automated email from the ASF dual-hosted git repository. kumarvishal09 pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new 7916aa6 [CARBONDATA-3257] Fix for NO_SORT load and describe formatted being in NO_SORT flow even with Sort Columns given 7916aa6 is described below commit 7916aa67f9cdbc171300f45137aed6e38e76d749 Author: manishnalla1994 AuthorDate: Mon Jan 21 17:23:37 2019 +0530 [CARBONDATA-3257] Fix for NO_SORT load and describe formatted being in NO_SORT flow even with Sort Columns given Problem: Data Load is in No sort flow when version is upgraded even if sort columns are given. Also describe formatted displays wrong sort scope after refresh. Solution: Added a condition to check for the presence of Sort Columns. This closes #3083 --- .../core/constants/CarbonCommonConstants.java | 1 + .../sdv/generated/SetParameterTestCase.scala | 8 +++--- .../command/carbonTableSchemaCommon.scala | 12 - .../command/management/CarbonLoadDataCommand.scala | 31 +++--- .../table/CarbonDescribeFormattedCommand.scala | 18 ++--- 5 files changed, 42 insertions(+), 28 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java index b7d9761..86bf5f1 100644 --- a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java +++ b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java @@ -426,6 +426,7 @@ public final class CarbonCommonConstants { */ public static final String DICTIONARY_PATH = "dictionary_path"; public static final String SORT_COLUMNS = "sort_columns"; + public static final String SORT_SCOPE = "sort_scope"; public static final String RANGE_COLUMN = "range_column"; public static final String PARTITION_TYPE = "partition_type"; public static final String NUM_PARTITIONS = "num_partitions"; diff --git a/integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/SetParameterTestCase.scala b/integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/SetParameterTestCase.scala index 8c336d8..54d9e3f 100644 --- a/integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/SetParameterTestCase.scala +++ b/integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/SetParameterTestCase.scala @@ -209,11 +209,11 @@ class SetParameterTestCase extends QueryTest with BeforeAndAfterAll { sql("SET carbon.options.sort.scope=local_sort") sql( "create table carbon_table(empno int, empname String, designation String, doj Timestamp," + - "workgroupcategory int) STORED BY 'org.apache.carbondata.format'") -checkExistence(sql("DESC FORMATTED carbon_table"), true, "LOCAL_SORT") -val sortscope=sql("DESC FORMATTED carbon_table").collect().filter(_.getString(1).trim.equals("LOCAL_SORT")) + "workgroupcategory int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES('SORT_COLUMNS'='empno,empname')") +checkExistence(sql("DESC FORMATTED carbon_table"), true, "local_sort") +val sortscope=sql("DESC FORMATTED carbon_table").collect().filter(_.getString(1).trim.equals("local_sort")) assertResult(1)(sortscope.length) -assertResult("LOCAL_SORT")(sortscope(0).getString(1).trim) +assertResult("local_sort")(sortscope(0).getString(1).trim) } test("TC_011-test SET property to 
Enable Unsafe Sort") { diff --git a/integration/spark-common/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchemaCommon.scala b/integration/spark-common/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchemaCommon.scala index 2ce9d89..b6b4e8d 100644 --- a/integration/spark-common/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchemaCommon.scala +++ b/integration/spark-common/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchemaCommon.scala @@ -854,18 +854,6 @@ class TableNewProcessor(cm: TableModel) { tableSchema.getTableId, cm.databaseNameOp.getOrElse("default")) tablePropertiesMap.put("bad_record_path", badRecordsPath) -if (tablePropertiesMap.get("sort_columns") != null) { - val sortCol = tablePropertiesMap.get("sort_columns") - if ((!sortCol.trim.isEmpty) && tablePropertiesMap.get("sort_scope") == null) { -// If
carbondata git commit: [CARBONDATA-3237] Fix presto carbon issues in dictionary include scenario
Repository: carbondata Updated Branches: refs/heads/master 1b45c41fe -> 8e6def9fa [CARBONDATA-3237] Fix presto carbon issues in dictionary include scenario problem1: Decimal column with dictionary include cannot be read in presto cause: int is typecasted to decimal for dictionary columns in decimal stream reader. solution: keep original data type as well as new data type for decimal stream reader. problem2: Optimize presto query time for dictionary include string column currently, for each query, presto carbon creates dictionary block for string columns. cause: This happens for each query and if cardinality is more , it takes more time to build. solution: dictionary block is not required. we can lookup using normal dictionary lookup. This closes #3055 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/8e6def9f Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/8e6def9f Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/8e6def9f Branch: refs/heads/master Commit: 8e6def9facc6c51de58ee655961ac4710c252bc0 Parents: 1b45c41 Author: ajantha-bhat Authored: Mon Jan 7 14:50:11 2019 +0530 Committer: kumarvishal09 Committed: Wed Jan 9 18:21:10 2019 +0530 -- .../carbondata/presto/CarbonVectorBatch.java| 12 ++--- .../readers/DecimalSliceStreamReader.java | 9 ++-- .../presto/readers/SliceStreamReader.java | 53 .../CarbonDictionaryDecodeReadSupport.scala | 22 +--- .../presto/util/CarbonDataStoreCreator.scala| 1 + 5 files changed, 47 insertions(+), 50 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/8e6def9f/integration/presto/src/main/java/org/apache/carbondata/presto/CarbonVectorBatch.java -- diff --git a/integration/presto/src/main/java/org/apache/carbondata/presto/CarbonVectorBatch.java b/integration/presto/src/main/java/org/apache/carbondata/presto/CarbonVectorBatch.java index fb8300a..140e46b 100644 --- a/integration/presto/src/main/java/org/apache/carbondata/presto/CarbonVectorBatch.java +++ b/integration/presto/src/main/java/org/apache/carbondata/presto/CarbonVectorBatch.java @@ -37,8 +37,6 @@ import org.apache.carbondata.presto.readers.ShortStreamReader; import org.apache.carbondata.presto.readers.SliceStreamReader; import org.apache.carbondata.presto.readers.TimestampStreamReader; -import com.facebook.presto.spi.block.Block; - public class CarbonVectorBatch { private static final int DEFAULT_BATCH_SIZE = 4 * 1024; @@ -63,8 +61,7 @@ public class CarbonVectorBatch { DataType[] dataTypes = readSupport.getDataTypes(); for (int i = 0; i < schema.length; ++i) { - columns[i] = createDirectStreamReader(maxRows, dataTypes[i], schema[i], dictionaries[i], - readSupport.getDictionaryBlock(i)); + columns[i] = createDirectStreamReader(maxRows, dataTypes[i], schema[i], dictionaries[i]); } } @@ -79,7 +76,7 @@ public class CarbonVectorBatch { } private CarbonColumnVectorImpl createDirectStreamReader(int batchSize, DataType dataType, - StructField field, Dictionary dictionary, Block dictionaryBlock) { + StructField field, Dictionary dictionary) { if (dataType == DataTypes.BOOLEAN) { return new BooleanStreamReader(batchSize, field.getDataType(), dictionary); } else if (dataType == DataTypes.SHORT) { @@ -93,9 +90,10 @@ public class CarbonVectorBatch { } else if (dataType == DataTypes.DOUBLE) { return new DoubleStreamReader(batchSize, field.getDataType(), dictionary); } else if (dataType == DataTypes.STRING) { - return new SliceStreamReader(batchSize, field.getDataType(), dictionaryBlock); + return 
new SliceStreamReader(batchSize, field.getDataType(), dictionary); } else if (DataTypes.isDecimal(dataType)) { - return new DecimalSliceStreamReader(batchSize, (DecimalType) field.getDataType(), dictionary); + return new DecimalSliceStreamReader(batchSize, field.getDataType(), (DecimalType) dataType, + dictionary); } else { return new ObjectStreamReader(batchSize, field.getDataType()); } http://git-wip-us.apache.org/repos/asf/carbondata/blob/8e6def9f/integration/presto/src/main/java/org/apache/carbondata/presto/readers/DecimalSliceStreamReader.java -- diff --git a/integration/presto/src/main/java/org/apache/carbondata/presto/readers/DecimalSliceStreamReader.java b/integration/presto/src/main/java/org/apache/carbondata/presto/readers/DecimalSliceStreamReader.java index 2976ca7..ddc855a 100644 --- a/integration/presto/src/main/java/org/apache/carbondata/presto/readers/DecimalSliceStreamReader.java
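The second part of this fix replaces a per-query prebuilt dictionary Block with plain per-surrogate lookups. A minimal Java sketch of the idea, with the CarbonData Dictionary simplified to a String array for illustration:

    public class DictionaryLookupDemo {
      public static void main(String[] args) {
        // Simplified stand-in for a decoded dictionary: surrogate key -> value.
        String[] dictionary = {null, "apple", "banana", "cherry"}; // key 0 unused
        int[] surrogates = {1, 3, 2, 1}; // column page content for one batch

        // Instead of materializing every dictionary entry into a block up front
        // (costly when cardinality is high), look each value up on demand.
        for (int key : surrogates) {
          System.out.println(dictionary[key]);
        }
      }
    }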
carbondata git commit: [CARBONDATA-3200] No-Sort compaction
Repository: carbondata Updated Branches: refs/heads/master 3a5572ee4 -> 1b45c41fe [CARBONDATA-3200] No-Sort compaction When data is loaded with SORT_SCOPE as NO_SORT and then compacted, the data still remains unsorted. This does not affect queries much, but the major purpose of compaction is to better pack the data and improve query performance. So the expected behaviour of compaction now is to sort the data, so that query performance becomes better after compaction. The columns to sort upon are provided by SORT_COLUMNS. The new compaction works as follows: 1. Do sorting on unsorted & restructured data and store it in temporary files. 2. Pick a row from those temporary files, and from the already sorted carbondata files, according to a comparator on sort_columns. 3. Write the data to a new segment (similar to the old compaction flow). 4. Repeat steps 2 & 3 until no more rows are left. This closes #3029 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/1b45c41f Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/1b45c41f Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/1b45c41f Branch: refs/heads/master Commit: 1b45c41fe294a7a33ef748d13747c29cd3142670 Parents: 3a5572e Author: namanrastogi Authored: Wed Jan 2 16:26:09 2019 +0530 Committer: kumarvishal09 Committed: Wed Jan 9 18:05:35 2019 +0530 -- .../core/datastore/block/TableBlockInfo.java| 11 + .../blockletindex/BlockletDataMap.java | 1 + .../core/metadata/blocklet/BlockletInfo.java| 19 ++ .../core/metadata/blocklet/DataFileFooter.java | 13 + .../executor/impl/AbstractQueryExecutor.java| 3 + .../util/AbstractDataFileFooterConverter.java | 7 + .../core/util/DataFileFooterConverterV3.java| 5 + format/src/main/thrift/carbondata_index.thrift | 1 + .../compaction/TestHybridCompaction.scala | 262 +++ .../carbondata/spark/rdd/CarbonMergerRDD.scala | 45 ++-- .../carbondata/spark/rdd/StreamHandoffRDD.scala | 4 +- .../merger/AbstractResultProcessor.java | 6 +- .../merger/CarbonCompactionExecutor.java| 35 ++- .../processing/merger/CarbonCompactionUtil.java | 88 +-- .../merger/CompactionResultSortProcessor.java | 23 +- .../merger/RowResultMergerProcessor.java| 11 +- .../sortdata/InMemorySortTempChunkHolder.java | 147 +++ .../SingleThreadFinalSortFilesMerger.java | 52 ++-- .../sort/sortdata/SortTempFileChunkHolder.java | 18 +- .../store/CarbonFactDataHandlerModel.java | 1 + .../store/writer/AbstractFactDataWriter.java| 3 + .../writer/v3/CarbonFactDataWriterImplV3.java | 5 +- 22 files changed, 682 insertions(+), 78 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/1b45c41f/core/src/main/java/org/apache/carbondata/core/datastore/block/TableBlockInfo.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/datastore/block/TableBlockInfo.java b/core/src/main/java/org/apache/carbondata/core/datastore/block/TableBlockInfo.java index c38124d..8ef2198 100644 --- a/core/src/main/java/org/apache/carbondata/core/datastore/block/TableBlockInfo.java +++ b/core/src/main/java/org/apache/carbondata/core/datastore/block/TableBlockInfo.java @@ -29,6 +29,7 @@ import org.apache.carbondata.core.datamap.Segment; import org.apache.carbondata.core.datastore.impl.FileFactory; import org.apache.carbondata.core.indexstore.BlockletDetailInfo; import org.apache.carbondata.core.metadata.ColumnarFormatVersion; +import org.apache.carbondata.core.metadata.blocklet.DataFileFooter; import org.apache.carbondata.core.util.ByteUtil; import
org.apache.carbondata.core.util.path.CarbonTablePath; import org.apache.carbondata.core.util.path.CarbonTablePath.DataFileUtil; @@ -101,6 +102,8 @@ public class TableBlockInfo implements Distributable, Serializable { private String dataMapWriterPath; + private transient DataFileFooter dataFileFooter; + /** * comparator to sort by block size in descending order. * Since each line is not exactly the same, the size of a InputSplit may differs, @@ -462,6 +465,14 @@ public class TableBlockInfo implements Distributable, Serializable { this.dataMapWriterPath = dataMapWriterPath; } + public DataFileFooter getDataFileFooter() { +return dataFileFooter; + } + + public void setDataFileFooter(DataFileFooter dataFileFooter) { +this.dataFileFooter = dataFileFooter; + } + @Override public String toString() { final StringBuilder sb = new StringBuilder("TableBlockInfo{"); http://git-wip-us.apache.org/repos/asf/carbondata/blob/1b45c41f/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex
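Steps 2-4 of the compaction above are a k-way merge over already-sorted sources. A self-contained Java sketch of that merge, with in-memory lists standing in for sorted carbondata files and sorted temp files, and integer keys standing in for the sort_columns comparator:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Iterator;
    import java.util.List;
    import java.util.PriorityQueue;

    public class KWayMergeDemo {
      public static void main(String[] args) {
        // Each iterator stands in for one sorted source (a sorted carbondata
        // file, or a sorted temp file produced from unsorted/restructured data).
        List<Iterator<Integer>> sources = Arrays.asList(
            Arrays.asList(1, 4, 9).iterator(),
            Arrays.asList(2, 3, 10).iterator(),
            Arrays.asList(5, 6, 7).iterator());

        // Heap orders the current head row of every source by the sort key,
        // mirroring the comparator on sort_columns.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int i = 0; i < sources.size(); i++) {
          Iterator<Integer> it = sources.get(i);
          if (it.hasNext()) heap.add(new int[] {it.next(), i});
        }

        List<Integer> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
          int[] head = heap.poll();      // pick the smallest current row
          merged.add(head[0]);           // "write" it to the new segment
          Iterator<Integer> it = sources.get(head[1]);
          if (it.hasNext()) heap.add(new int[] {it.next(), head[1]}); // refill from same source
        }
        System.out.println(merged); // globally sorted output
      }
    }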
carbondata git commit: [CARBONDATA-3236] Fix for JVM Crash for insert into new table from old table
Repository: carbondata Updated Branches: refs/heads/master dd2fff269 -> 3a5572ee4 [CARBONDATA-3236] Fix for JVM Crash for insert into new table from old table Problem: Insert into a new table from an old table fails with a JVM crash for the file format (Using carbondata). This happened because both the query and the load flow were assigned the same taskId, and once the query finished it freed the unsafe memory while the insert was still in progress. Solution: As the file format flow is a direct flow and uses on-heap (safe) memory, there is no need to free the unsafe memory in the query. This closes #3056 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/3a5572ee Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/3a5572ee Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/3a5572ee Branch: refs/heads/master Commit: 3a5572ee4d0b472e0a37aebf7c6d38e779c8eacb Parents: dd2fff2 Author: manishnalla1994 Authored: Tue Jan 8 16:12:55 2019 +0530 Committer: kumarvishal09 Committed: Wed Jan 9 17:08:51 2019 +0530 -- .../execution/datasources/SparkCarbonFileFormat.scala | 13 +++-- .../tasklisteners/CarbonTaskCompletionListener.scala | 2 +- 2 files changed, 4 insertions(+), 11 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/3a5572ee/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/SparkCarbonFileFormat.scala -- diff --git a/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/SparkCarbonFileFormat.scala b/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/SparkCarbonFileFormat.scala index 8cb2ca4..f725de3 100644 --- a/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/SparkCarbonFileFormat.scala +++ b/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/SparkCarbonFileFormat.scala @@ -410,15 +410,8 @@ class SparkCarbonFileFormat extends FileFormat val model = format.createQueryModel(split, hadoopAttemptContext) model.setConverter(new SparkDataTypeConverterImpl) model.setPreFetchData(false) -var isAdded = false -Option(TaskContext.get()).foreach { context => - val onCompleteCallbacksField = context.getClass.getDeclaredField("onCompleteCallbacks") - onCompleteCallbacksField.setAccessible(true) - val listeners = onCompleteCallbacksField.get(context) -.asInstanceOf[ArrayBuffer[TaskCompletionListener]] - isAdded = listeners.exists(p => p.isInstanceOf[CarbonLoadTaskCompletionListener]) - model.setFreeUnsafeMemory(!isAdded) -} +// As file format uses on heap, no need to free unsafe memory +model.setFreeUnsafeMemory(false) val carbonReader = if (readVector) { model.setDirectVectorFill(true) val vectorizedReader = new VectorizedCarbonRecordReader(model, @@ -439,7 +432,7 @@ class SparkCarbonFileFormat extends FileFormat Option(TaskContext.get()).foreach{context => context.addTaskCompletionListener( CarbonQueryTaskCompletionListenerImpl( -iter.asInstanceOf[RecordReaderIterator[InternalRow]], !isAdded)) +iter.asInstanceOf[RecordReaderIterator[InternalRow]])) } if (carbonReader.isInstanceOf[VectorizedCarbonRecordReader] && readVector) { http://git-wip-us.apache.org/repos/asf/carbondata/blob/3a5572ee/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/tasklisteners/CarbonTaskCompletionListener.scala -- diff --git
a/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/tasklisteners/CarbonTaskCompletionListener.scala b/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/tasklisteners/CarbonTaskCompletionListener.scala index eb3e42a..5547228 100644 --- a/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/tasklisteners/CarbonTaskCompletionListener.scala +++ b/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/tasklisteners/CarbonTaskCompletionListener.scala @@ -40,7 +40,7 @@ trait CarbonQueryTaskCompletionListener extends TaskCompletionListener trait CarbonLoadTaskCompletionListener extends TaskCompletionListener case class CarbonQueryTaskCompletionListenerImpl(iter: RecordReaderIterator[InternalRow], -
carbondata git commit: [CARBONDATA-3235] Fix Rename-Fail & Datamap-creation-Fail
Repository: carbondata Updated Branches: refs/heads/master 3a41ee5df -> dd2fff269 [CARBONDATA-3235] Fix Rename-Fail & Datamap-creation-Fail 1. Alter Table Rename Table Fail Problem: When the table rename succeeded in hive but failed in the carbon data store, it would throw an exception but would not go back and undo the rename in hive. Solution: A flag keeps track of whether the hive rename has already executed; if the code breaks after the hive rename is done, go back and undo the hive rename. 2. Create-Preaggregate-Datamap Fail Problem: When the (preaggregate) datamap schema is written but the table update fails, CarbonDropDataMapCommand.processMetadata() calls dropDataMapFromSystemFolder(), which is supposed to delete the folder on disk but doesn't, as the datamap is not yet updated in the table, and it throws NoSuchDataMapException. Solution: Call CarbonDropTableCommand.run() instead of CarbonDropTableCommand.processMetadata(), as CarbonDropTableCommand.processData() deletes the actual folders from disk. This closes #2996 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/dd2fff26 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/dd2fff26 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/dd2fff26 Branch: refs/heads/master Commit: dd2fff269a6b416cbe0af8bd1a9e7108a02fd600 Parents: 3a41ee5 Author: namanrastogi Authored: Thu Dec 13 16:09:58 2018 +0530 Committer: kumarvishal09 Committed: Wed Jan 9 14:16:20 2019 +0530 -- .../command/datamap/CarbonDropDataMapCommand.scala| 2 +- .../command/schema/CarbonAlterTableRenameCommand.scala| 10 +- 2 files changed, 10 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/dd2fff26/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonDropDataMapCommand.scala -- diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonDropDataMapCommand.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonDropDataMapCommand.scala index 54096ca..0bafe04 100644 --- a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonDropDataMapCommand.scala +++ b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonDropDataMapCommand.scala @@ -103,7 +103,7 @@ case class CarbonDropDataMapCommand( Some(childCarbonTable.get.getDatabaseName), childCarbonTable.get.getTableName, dropChildTable = true) - commandToRun.processMetadata(sparkSession) + commandToRun.run(sparkSession) } dropDataMapFromSystemFolder(sparkSession) return Seq.empty http://git-wip-us.apache.org/repos/asf/carbondata/blob/dd2fff26/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableRenameCommand.scala -- diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableRenameCommand.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableRenameCommand.scala index dbf665a..01698c9 100644 --- a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableRenameCommand.scala +++ b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableRenameCommand.scala @@ -87,6 +87,7 @@ private[sql] case class CarbonAlterTableRenameCommand( var timeStamp = 0L var carbonTable: CarbonTable = null +var hiveRenameSuccess = false // lock file path to release
locks after operation var carbonTableLockFilePath: String = null try { @@ -139,6 +140,7 @@ private[sql] case class CarbonAlterTableRenameCommand( oldIdentifier, newIdentifier, oldTableIdentifier.getTablePath) + hiveRenameSuccess = true metastore.updateTableSchemaForAlter( newTableIdentifier, @@ -165,6 +167,12 @@ private[sql] case class CarbonAlterTableRenameCommand( case e: ConcurrentOperationException => throw e case e: Exception => +if (hiveRenameSuccess) { + sparkSession.sessionState.catalog.asInstanceOf[CarbonSessionCatalog].alterTableRename( +newTableIdentifier, +oldTableIdentifier, +carbonTable.getAbsoluteTableIdentifier.getTableName) +} if (carbonTable != null) { AlterTableUtil.revertRenameTableChanges( newTableName, @@ -173,7 +181,7 @@ private[sql] case class CarbonAlterTa
carbondata git commit: [CARBONDATA-3201] Added load level SORT_SCOPE Added SORT_SCOPE in Load Options & in SET Command
Repository: carbondata Updated Branches: refs/heads/master 4e27b86df -> 77d2b4e8d [CARBONDATA-3201] Added load level SORT_SCOPE Added SORT_SCOPE in Load Options & in SET Command 1. Added load level SORT_SCOPE 2. Added Sort_Scope for PreAgg 3. Added sort_scope msg for LoadDataCommand 4. Added property CARBON.TABLE.LOAD.SORT.SCOPE.. to set table level sort_scope property 5. Removed test case veryfying LOAD_OPTIONS with SORT_SCOPE Load level SORT_SCOPE LOAD DATA INPATH 'path/to/data.csv' INTO TABLE my_table OPTIONS ( 'sort_scope'='no_sort' ) Priority of SORT_SCOPE Load Level (if provided) Table level (if provided) Default This closes #3014 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/77d2b4e8 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/77d2b4e8 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/77d2b4e8 Branch: refs/heads/master Commit: 77d2b4e8d132f768b83438845f6fb9660a74fe1f Parents: 4e27b86 Author: namanrastogi Authored: Fri Dec 21 13:03:30 2018 +0530 Committer: kumarvishal09 Committed: Wed Jan 9 14:06:17 2019 +0530 -- .../constants/CarbonLoadOptionConstants.java| 6 .../carbondata/core/util/SessionParams.java | 8 - .../TestCreateTableWithSortScope.scala | 19 --- .../streaming/StreamSinkFactory.scala | 2 +- .../spark/sql/catalyst/CarbonDDLSqlParser.scala | 3 +- .../CarbonAlterTableCompactionCommand.scala | 4 +-- .../management/CarbonLoadDataCommand.scala | 35 +--- .../preaaggregate/PreAggregateListeners.scala | 7 ++-- .../preaaggregate/PreAggregateTableHelper.scala | 3 +- .../preaaggregate/PreAggregateUtil.scala| 2 ++ .../execution/command/CarbonHiveCommands.scala | 19 --- .../commands/SetCommandTestCase.scala | 28 .../processing/loading/events/LoadEvents.java | 13 +++- 13 files changed, 111 insertions(+), 38 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/77d2b4e8/core/src/main/java/org/apache/carbondata/core/constants/CarbonLoadOptionConstants.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/constants/CarbonLoadOptionConstants.java b/core/src/main/java/org/apache/carbondata/core/constants/CarbonLoadOptionConstants.java index 5cf6163..eef2bef 100644 --- a/core/src/main/java/org/apache/carbondata/core/constants/CarbonLoadOptionConstants.java +++ b/core/src/main/java/org/apache/carbondata/core/constants/CarbonLoadOptionConstants.java @@ -81,6 +81,12 @@ public final class CarbonLoadOptionConstants { "carbon.options.sort.scope"; /** + * option to specify table level sort_scope + */ + @CarbonProperty(dynamicConfigurable = true) + public static final String CARBON_TABLE_LOAD_SORT_SCOPE = "carbon.table.load.sort.scope."; + + /** * option to specify the batch sort size inmb */ @CarbonProperty(dynamicConfigurable = true) http://git-wip-us.apache.org/repos/asf/carbondata/blob/77d2b4e8/core/src/main/java/org/apache/carbondata/core/util/SessionParams.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/util/SessionParams.java b/core/src/main/java/org/apache/carbondata/core/util/SessionParams.java index f49747f..d9aa214 100644 --- a/core/src/main/java/org/apache/carbondata/core/util/SessionParams.java +++ b/core/src/main/java/org/apache/carbondata/core/util/SessionParams.java @@ -161,7 +161,7 @@ public class SessionParams implements Serializable, Cloneable { isValid = CarbonUtil.isValidSortOption(value); if (!isValid) { throw new InvalidConfigurationException("The sort scope " + key - + " can have only either BATCH_SORT or LOCAL_SORT or 
NO_SORT."); + + " can have only either NO_SORT, BATCH_SORT, LOCAL_SORT or GLOBAL_SORT."); } break; case CARBON_OPTIONS_BATCH_SORT_SIZE_INMB: @@ -229,6 +229,12 @@ public class SessionParams implements Serializable, Cloneable { if (!isValid) { throw new InvalidConfigurationException("Invalid value " + value + " for key " + key); } +} else if (key.startsWith(CarbonLoadOptionConstants.CARBON_TABLE_LOAD_SORT_SCOPE)) { + isValid = CarbonUtil.isValidSortOption(value); + if (!isValid) { +throw new InvalidConfigurationException("The sort scope " + key ++ " can have only either NO_SORT, BATCH_SORT, LOCAL_SORT or GLOBAL_SORT."); + } } else { throw new InvalidConfiguratio
carbondata git commit: [CARBONDATA-3189] Fix PreAggregate Datamap Issue
Repository: carbondata Updated Branches: refs/heads/master 72da33495 -> aad9aabf9 [CARBONDATA-3189] Fix PreAggregate Datamap Issue Problem - Load and Select queries were failing on a table with a preaggregate datamap. Cause - Previously, if query on datamap was not enabled in the thread params, there was no check afterwards. Solution - First check whether the thread param for Direct Query On Datamap is enabled; if not enabled, check the session params and then the global property. This closes #3010 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/aad9aabf Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/aad9aabf Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/aad9aabf Branch: refs/heads/master Commit: aad9aabf960dce5227ef8e59a56c25c0972d221c Parents: 72da334 Author: Shubh18s Authored: Thu Dec 20 16:47:32 2018 +0530 Committer: kumarvishal09 Committed: Mon Jan 7 14:13:37 2019 +0530 -- .../core/constants/CarbonCommonConstants.java | 6 --- docs/configuration-parameters.md| 3 +- .../preaggregate/TestPreAggCreateCommand.scala | 42 .../apache/spark/sql/test/util/QueryTest.scala | 2 +- .../preaaggregate/PreAggregateUtil.scala| 8 ++-- .../sql/optimizer/CarbonLateDecodeRule.scala| 23 ++- 6 files changed, 21 insertions(+), 63 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/aad9aabf/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java index 8d0a4d9..c1ef940 100644 --- a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java +++ b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java @@ -1450,12 +1450,6 @@ public final class CarbonCommonConstants { public static final String SUPPORT_DIRECT_QUERY_ON_DATAMAP_DEFAULTVALUE = "false"; @CarbonProperty - public static final String VALIDATE_DIRECT_QUERY_ON_DATAMAP = - "carbon.query.validate.direct.query.on.datamap"; - - public static final String VALIDATE_DIRECT_QUERY_ON_DATAMAP_DEFAULTVALUE = "true"; - - @CarbonProperty public static final String CARBON_SHOW_DATAMAPS = "carbon.query.show.datamaps"; public static final String CARBON_SHOW_DATAMAPS_DEFAULT = "true"; http://git-wip-us.apache.org/repos/asf/carbondata/blob/aad9aabf/docs/configuration-parameters.md -- diff --git a/docs/configuration-parameters.md b/docs/configuration-parameters.md index db21c6a..105b768 100644 --- a/docs/configuration-parameters.md +++ b/docs/configuration-parameters.md @@ -135,7 +135,6 @@ This section provides the details of all the configurations required for the Car | carbon.custom.block.distribution | false | CarbonData has its own scheduling algorithm to suggest to Spark on how many tasks needs to be launched and how much work each task need to do in a Spark cluster for any query on CarbonData. When this configuration is true, CarbonData would distribute the available blocks to be scanned among the available number of cores. For Example:If there are 10 blocks to be scanned and only 3 tasks can be run(only 3 executor cores available in the cluster), CarbonData would combine blocks as 4,3,3 and give it to 3 tasks to run. **NOTE:** When this configuration is false, as per the ***carbon.task.distribution*** configuration, each block/blocklet would be given to each task. 
| | enable.query.statistics | false | CarbonData has extensive logging which would be useful for debugging issues related to performance or hard to locate issues. This configuration when made ***true*** would log additional query statistics information to more accurately locate the issues being debugged.**NOTE:** Enabling this would log more debug information to log files, there by increasing the log files size significantly in short span of time. It is advised to configure the log files size, retention of log files parameters in log4j properties appropriately. Also extensive logging is an increased IO operation and hence over all query performance might get reduced. Therefore it is recommended to enable this configuration only for the duration of debugging. | | enable.unsafe.in.query.processing | false | CarbonData supports unsafe operations of Java to avoid GC overhead for certain operations. This configuration enables to use unsafe functions in CarbonData while scanning the data during query. | -|
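The thread-then-session-then-global check added by this fix is a layered property lookup; a hypothetical Java sketch (the property key below is illustrative, not quoted from the source):

    import java.util.HashMap;
    import java.util.Map;

    public class LayeredPropertyLookupDemo {
      // Lookup order for "support direct query on datamap":
      // thread-local params first, then session params, then the global default.
      static String getProperty(String key, Map<String, String> threadParams,
          Map<String, String> sessionParams, String globalDefault) {
        String value = threadParams.get(key);
        if (value == null) value = sessionParams.get(key);
        return value != null ? value : globalDefault;
      }

      public static void main(String[] args) {
        String key = "support.direct.query.on.datamap"; // illustrative key
        Map<String, String> thread = new HashMap<>();
        Map<String, String> session = new HashMap<>();
        session.put(key, "true");
        // Not set on the thread, so the session value is used before the global default.
        System.out.println(getProperty(key, thread, session, "false")); // true
      }
    }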
carbondata git commit: [CARBONDATA-3217] Optimize implicit filter expression performance by removing extra serialization
Repository: carbondata Updated Branches: refs/heads/master 9fa045d40 -> bc1e94472 [CARBONDATA-3217] Optimize implicit filter expression performance by removing extra serialization Fixed a performance issue for the implicit filter column: 1. Removed serialization of all the implicit filter values in each task; instead, serialized only the values for the blocks going to a particular task. 2. Removed the double deserialization of implicit filter values in the executor for each task; one time is sufficient. This closes #3039 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/bc1e9447 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/bc1e9447 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/bc1e9447 Branch: refs/heads/master Commit: bc1e94472d845cca548c59c5198ffdcd5c78b571 Parents: 9fa045d Author: manishgupta88 Authored: Thu Dec 27 15:18:07 2018 +0530 Committer: kumarvishal09 Committed: Fri Jan 4 16:37:54 2019 +0530 -- .../indexstore/blockletindex/BlockDataMap.java | 3 +- .../conditional/ImplicitExpression.java | 109 + .../core/scan/filter/ColumnFilterInfo.java | 43 ++- .../carbondata/core/scan/filter/FilterUtil.java | 73 ++-- .../ImplicitIncludeFilterExecutorImpl.java | 23 +++- .../core/scan/filter/intf/ExpressionType.java | 3 +- .../visitor/ImplicitColumnVisitor.java | 24 ++-- .../carbondata/hadoop/CarbonInputSplit.java | 28 + .../hadoop/api/CarbonInputFormat.java | 43 ++- .../TestImplicitFilterExpression.scala | 117 +++ .../carbondata/spark/rdd/CarbonScanRDD.scala| 31 - .../spark/sql/optimizer/CarbonFilters.scala | 15 ++- 12 files changed, 443 insertions(+), 69 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/bc1e9447/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockDataMap.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockDataMap.java b/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockDataMap.java index 6b04cf7..e29dfef 100644 --- a/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockDataMap.java +++ b/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockDataMap.java @@ -32,6 +32,7 @@ import org.apache.carbondata.core.datamap.dev.cgdatamap.CoarseGrainDataMap; import org.apache.carbondata.core.datastore.block.SegmentProperties; import org.apache.carbondata.core.datastore.block.SegmentPropertiesAndSchemaHolder; import org.apache.carbondata.core.datastore.block.TableBlockInfo; +import org.apache.carbondata.core.datastore.impl.FileFactory; import org.apache.carbondata.core.indexstore.AbstractMemoryDMStore; import org.apache.carbondata.core.indexstore.BlockMetaInfo; import org.apache.carbondata.core.indexstore.Blocklet; @@ -485,7 +486,7 @@ public class BlockDataMap extends CoarseGrainDataMap String fileName = filePath + CarbonCommonConstants.FILE_SEPARATOR + new String( dataMapRow.getByteArray(FILE_PATH_INDEX), CarbonCommonConstants.DEFAULT_CHARSET_CLASS) + CarbonTablePath.getCarbonDataExtension(); -return fileName; +return FileFactory.getUpdatedFilePath(fileName); } private void addTaskSummaryRowToUnsafeMemoryStore(CarbonRowSchema[] taskSummarySchema, http://git-wip-us.apache.org/repos/asf/carbondata/blob/bc1e9447/core/src/main/java/org/apache/carbondata/core/scan/expression/conditional/ImplicitExpression.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/scan/expression/conditional/ImplicitExpression.java
b/core/src/main/java/org/apache/carbondata/core/scan/expression/conditional/ImplicitExpression.java new file mode 100644 index 000..eab564e --- /dev/null +++ b/core/src/main/java/org/apache/carbondata/core/scan/expression/conditional/ImplicitExpression.java @@ -0,0 +1,109 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, ei
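The first optimization above, keeping only the implicit-filter entries for the blocks assigned to a task, can be sketched as a simple map pruning step. A hypothetical Java illustration; the map shape (block file to blocklet ids) and file names are illustrative, not the CarbonData structures:

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    public class ImplicitFilterPruningDemo {
      public static void main(String[] args) {
        // Full implicit filter: block file -> blocklet ids to scan.
        Map<String, Set<Integer>> allValues = new HashMap<>();
        allValues.put("part-0-0.carbondata", new HashSet<>(Arrays.asList(0, 1)));
        allValues.put("part-0-1.carbondata", new HashSet<>(Arrays.asList(2)));
        allValues.put("part-0-2.carbondata", new HashSet<>(Arrays.asList(0)));

        // Blocks assigned to one particular task/split.
        List<String> taskBlocks = Arrays.asList("part-0-1.carbondata");

        // Serialize only the entries this task actually needs,
        // instead of shipping the whole map to every task.
        Map<String, Set<Integer>> forTask = new HashMap<>();
        for (String block : taskBlocks) {
          Set<Integer> blocklets = allValues.get(block);
          if (blocklets != null) forTask.put(block, blocklets);
        }
        System.out.println(forTask); // {part-0-1.carbondata=[2]}
      }
    }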
carbondata git commit: [CARBONDATA-3212] Fixed NegativeArraySizeException while querying in specific scenario
Repository: carbondata Updated Branches: refs/heads/master f8697b106 -> deb08c329 [CARBONDATA-3212] Fixed NegativeArraySizeException while querying in specific scenario Problem: In Local Dictionary, the page size was not getting updated for complex child columns. So during fallback, a new page was being created with fewer records, giving a NegativeArraySizeException while querying data. Solution: Updated the page size in Local Dictionary. This closes #3031 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/deb08c32 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/deb08c32 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/deb08c32 Branch: refs/heads/master Commit: deb08c329287dc7bcdf96af6a6611f7c4b0fc83a Parents: f8697b1 Author: shivamasn Authored: Wed Jan 2 16:19:22 2019 +0530 Committer: kumarvishal09 Committed: Thu Jan 3 11:06:59 2019 +0530 -- .../carbondata/core/datastore/page/LocalDictColumnPage.java | 3 +++ 1 file changed, 3 insertions(+) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/deb08c32/core/src/main/java/org/apache/carbondata/core/datastore/page/LocalDictColumnPage.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/datastore/page/LocalDictColumnPage.java b/core/src/main/java/org/apache/carbondata/core/datastore/page/LocalDictColumnPage.java index 5cf2130..0e34d72 100644 --- a/core/src/main/java/org/apache/carbondata/core/datastore/page/LocalDictColumnPage.java +++ b/core/src/main/java/org/apache/carbondata/core/datastore/page/LocalDictColumnPage.java @@ -140,6 +140,9 @@ public class LocalDictColumnPage extends ColumnPage { } else { actualDataColumnPage.putBytes(rowId, bytes); } +if (pageSize <= rowId) { + pageSize = rowId + 1; +} } @Override public void disableLocalDictEncoding() {
carbondata git commit: [CARBONDATA-3218] Fix schema refresh and wrong query result issues in presto.
Repository: carbondata Updated Branches: refs/heads/master 7477527e9 -> f8697b106 [CARBONDATA-3218] Fix schema refresh and wrong query result issues in presto. Problem: A schema that is updated in spark is not reflected in presto, which results in wrong query results in presto. Solution: Update the schema in presto whenever the schema is changed in spark. Also override the putNulls method in all presto readers to work for null data scenarios. This closes #3041 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/f8697b10 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/f8697b10 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/f8697b10 Branch: refs/heads/master Commit: f8697b1065cd76e3b96be571fd78761a44a58e7e Parents: 7477527 Author: ravipesala Authored: Mon Dec 31 17:20:24 2018 +0530 Committer: kumarvishal09 Committed: Wed Jan 2 18:59:05 2019 +0530 -- .../presto/CarbondataPageSourceProvider.java| 7 +- .../presto/CarbondataSplitManager.java | 65 +- .../presto/impl/CarbonTableCacheModel.java | 29 - .../presto/impl/CarbonTableReader.java | 119 --- .../presto/readers/BooleanStreamReader.java | 6 + .../readers/DecimalSliceStreamReader.java | 12 ++ .../presto/readers/DoubleStreamReader.java | 12 ++ .../presto/readers/IntegerStreamReader.java | 12 ++ .../presto/readers/LongStreamReader.java| 12 ++ .../presto/readers/ObjectStreamReader.java | 6 + .../presto/readers/ShortStreamReader.java | 12 ++ .../presto/readers/SliceStreamReader.java | 24 +++- .../presto/readers/TimestampStreamReader.java | 12 ++ 13 files changed, 215 insertions(+), 113 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/f8697b10/integration/presto/src/main/java/org/apache/carbondata/presto/CarbondataPageSourceProvider.java -- diff --git a/integration/presto/src/main/java/org/apache/carbondata/presto/CarbondataPageSourceProvider.java b/integration/presto/src/main/java/org/apache/carbondata/presto/CarbondataPageSourceProvider.java index d7b7266..c81e0c3 100644 --- a/integration/presto/src/main/java/org/apache/carbondata/presto/CarbondataPageSourceProvider.java +++ b/integration/presto/src/main/java/org/apache/carbondata/presto/CarbondataPageSourceProvider.java @@ -230,10 +230,11 @@ public class CarbondataPageSourceProvider extends HivePageSourceProvider { .getCarbonCache(new SchemaTableName(carbonSplit.getDatabase(), carbonSplit.getTable()), carbonSplit.getSchema().getProperty("tablePath"), configuration); checkNotNull(tableCacheModel, "tableCacheModel should not be null"); -checkNotNull(tableCacheModel.carbonTable, "tableCacheModel.carbonTable should not be null"); -checkNotNull(tableCacheModel.carbonTable.getTableInfo(), +checkNotNull(tableCacheModel.getCarbonTable(), +"tableCacheModel.carbonTable should not be null"); +checkNotNull(tableCacheModel.getCarbonTable().getTableInfo(), "tableCacheModel.carbonTable.tableInfo should not be null"); -return tableCacheModel.carbonTable; +return tableCacheModel.getCarbonTable(); } } http://git-wip-us.apache.org/repos/asf/carbondata/blob/f8697b10/integration/presto/src/main/java/org/apache/carbondata/presto/CarbondataSplitManager.java -- diff --git a/integration/presto/src/main/java/org/apache/carbondata/presto/CarbondataSplitManager.java b/integration/presto/src/main/java/org/apache/carbondata/presto/CarbondataSplitManager.java index ded00fc..6efef93 100755 --- a/integration/presto/src/main/java/org/apache/carbondata/presto/CarbondataSplitManager.java +++ 
b/integration/presto/src/main/java/org/apache/carbondata/presto/CarbondataSplitManager.java @@ -119,45 +119,40 @@ public class CarbondataSplitManager extends HiveSplitManager { configuration = carbonTableReader.updateS3Properties(configuration); CarbonTableCacheModel cache = carbonTableReader.getCarbonCache(schemaTableName, location, configuration); -if (null != cache) { - Expression filters = PrestoFilterUtil.parseFilterExpression(predicate); - try { - -List splits = -carbonTableReader.getInputSplits2(cache, filters, predicate, configuration); - -ImmutableList.Builder cSplits = ImmutableList.builder(); -long index = 0; -for (CarbonLocalMultiBlockSplit split : splits) { - index++; - Properties properties = new Properties(); - for (Map.Entry entry : table.getSt
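The putNulls override added to each of the readers above follows one pattern: append a run of nulls to the presto block being built so that pages containing null data keep the right position count. A minimal sketch with a hypothetical reader class (only BlockBuilder.appendNull() is real presto-spi API):

```
import com.facebook.presto.spi.block.BlockBuilder;

public class ExampleStreamReader {
  private BlockBuilder builder; // assumed to be created per page by the reader framework

  // rows are appended in order here, so rowId is implicit in the builder position
  public void putNulls(int rowId, int count) {
    for (int i = 0; i < count; i++) {
      builder.appendNull();
    }
  }
}
```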
carbondata git commit: [CARBONDATA-3195] Added validation for Inverted Index columns and added a test case in case of varchar
Repository: carbondata Updated Branches: refs/heads/master d85d54324 -> f5c1b7bbd

[CARBONDATA-3195] Added validation for Inverted Index columns and added a test case in case of varchar

This PR adds a validation for the inverted index: when inverted index columns are not present in the sort columns, an exception should be thrown (see the sketch after this commit's diff). Also added a test case for when varchar columns are passed as inverted index.

This closes #3020

Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/f5c1b7bb
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/f5c1b7bb
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/f5c1b7bb
Branch: refs/heads/master
Commit: f5c1b7bbd2485e1186e3a7c718d3f539599905a5
Parents: d85d543
Author: shardul-cr7
Authored: Mon Dec 24 12:51:16 2018 +0530
Committer: kumarvishal09
Committed: Fri Dec 28 17:01:56 2018 +0530
--
docs/ddl-of-carbondata.md | 4 +++-
.../dataload/TestNoInvertedIndexLoadAndQuery.scala | 8
.../longstring/VarcharDataTypesBasicTestCase.scala | 13 +
.../apache/spark/sql/catalyst/CarbonDDLSqlParser.scala | 10 ++
4 files changed, 30 insertions(+), 5 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/carbondata/blob/f5c1b7bb/docs/ddl-of-carbondata.md
--
diff --git a/docs/ddl-of-carbondata.md b/docs/ddl-of-carbondata.md index 3d3db1e..d1a4794 100644 --- a/docs/ddl-of-carbondata.md +++ b/docs/ddl-of-carbondata.md @@ -126,9 +126,11 @@ CarbonData DDL statements are documented here,which includes: By default inverted index is disabled as store size will be reduced, it can be enabled by using a table property. It might help to improve compression ratio and query speed, especially for low cardinality columns which are in reward position. Suggested use cases : For high cardinality columns, you can disable the inverted index for improving the data loading performance. + + **NOTE**: Columns specified in INVERTED_INDEX should also be present in SORT_COLUMNS.
``` - TBLPROPERTIES ('NO_INVERTED_INDEX'='column1', 'INVERTED_INDEX'='column2, column3') + TBLPROPERTIES ('SORT_COLUMNS'='column2,column3','NO_INVERTED_INDEX'='column1', 'INVERTED_INDEX'='column2, column3') ``` - # Sort Columns Configuration http://git-wip-us.apache.org/repos/asf/carbondata/blob/f5c1b7bb/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestNoInvertedIndexLoadAndQuery.scala -- diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestNoInvertedIndexLoadAndQuery.scala b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestNoInvertedIndexLoadAndQuery.scala index 13f8adb..f483827 100644 --- a/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestNoInvertedIndexLoadAndQuery.scala +++ b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestNoInvertedIndexLoadAndQuery.scala @@ -305,7 +305,7 @@ class TestNoInvertedIndexLoadAndQuery extends QueryTest with BeforeAndAfterAll { CREATE TABLE IF NOT EXISTS index1 (id Int, name String, city String) STORED BY 'org.apache.carbondata.format' - TBLPROPERTIES('DICTIONARY_INCLUDE'='id','INVERTED_INDEX'='city,name') + TBLPROPERTIES('DICTIONARY_INCLUDE'='id','INVERTED_INDEX'='city,name', 'SORT_COLUMNS'='city,name') """) sql( s""" @@ -333,14 +333,14 @@ class TestNoInvertedIndexLoadAndQuery extends QueryTest with BeforeAndAfterAll { CREATE TABLE IF NOT EXISTS index1 (id Int, name String, city String) STORED BY 'org.apache.carbondata.format' - TBLPROPERTIES('INVERTED_INDEX'='city,name,id') + TBLPROPERTIES('INVERTED_INDEX'='city,name,id','SORT_COLUMNS'='city,name,id') """) val carbonTable = CarbonMetadata.getInstance().getCarbonTable("default", "index1") assert(carbonTable.getColumnByName("index1", "city").getColumnSchema.getEncodingList .contains(Encoding.INVERTED_INDEX)) assert(carbonTable.getColumnByName("index1", "name").getColumnSchema.getEncodingList .contains(Encoding.INVERTED_INDEX)) -assert(!carbonTable.getColumnByName("index1", "id").getColumnSchema.getEncodingList +assert(carbonTable.get
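A minimal, runnable sketch of the validation this PR adds in CarbonDDLSqlParser, assuming the table properties have already been parsed into simple string lists (the real parser works on its own property structures):

```
import java.util.Arrays;
import java.util.List;

public class InvertedIndexValidation {
  // every INVERTED_INDEX column must also appear in SORT_COLUMNS
  static void validate(List<String> invertedIndexCols, List<String> sortCols) {
    for (String col : invertedIndexCols) {
      if (!sortCols.contains(col)) {
        throw new IllegalArgumentException("INVERTED_INDEX column: " + col
            + " should be present in SORT_COLUMNS");
      }
    }
  }

  public static void main(String[] args) {
    validate(Arrays.asList("column2", "column3"), Arrays.asList("column2", "column3")); // passes
    validate(Arrays.asList("column1"), Arrays.asList("column2", "column3"));            // throws
  }
}
```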
carbondata git commit: [CARBONDATA-3192] Fix for compaction compatibility issue
Repository: carbondata Updated Branches: refs/heads/master 10bc5c2ec -> f4c1c672b

[CARBONDATA-3192] Fix for compaction compatibility issue

Problem: A table created, loaded and altered (column added) in version 1.5.1, then refreshed, altered (the added column dropped), loaded and compacted with varchar columns in the new version, gives an error.

Solution: Corrected the varchar dimension index calculation by accounting for the columns which have been deleted (invisible columns), hence giving the correct ordinals after deletion (a small worked example follows this commit's diff).

This closes #3016

Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/f4c1c672
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/f4c1c672
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/f4c1c672
Branch: refs/heads/master
Commit: f4c1c672be19201c2c98fe84f6143f1323a60bbf
Parents: 10bc5c2
Author: manishnalla1994
Authored: Fri Dec 21 19:11:46 2018 +0530
Committer: kumarvishal09
Committed: Mon Dec 24 13:29:28 2018 +0530
--
.../processing/store/CarbonFactDataHandlerModel.java| 9 +++--
1 file changed, 7 insertions(+), 2 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/carbondata/blob/f4c1c672/processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java
--
diff --git a/processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java b/processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java index e759c02..c60da45 100644 --- a/processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java +++ b/processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java @@ -314,18 +314,23 @@ public class CarbonFactDataHandlerModel { // for dynamic page size in write step if varchar columns exist List varcharDimIdxInNoDict = new ArrayList<>(); -List allDimensions = carbonTable.getDimensions(); +List allDimensions = carbonTable.getAllDimensions(); int dictDimCount = allDimensions.size() - segmentProperties.getNumberOfNoDictionaryDimension() - segmentProperties.getComplexDimensions().size(); CarbonColumn[] noDicAndComplexColumns = new CarbonColumn[segmentProperties.getNumberOfNoDictionaryDimension() + segmentProperties .getComplexDimensions().size()]; int noDicAndComp = 0; +int invisibleCount = 0; for (CarbonDimension dim : allDimensions) { + if (dim.isInvisible()) { +invisibleCount++; +continue; + } if (!dim.isComplex() && !dim.hasEncoding(Encoding.DICTIONARY) && dim.getDataType() == DataTypes.VARCHAR) { // ordinal is set in CarbonTable.fillDimensionsAndMeasuresForTables() -varcharDimIdxInNoDict.add(dim.getOrdinal() - dictDimCount); +varcharDimIdxInNoDict.add(dim.getOrdinal() - dictDimCount - invisibleCount); } if (!dim.hasEncoding(Encoding.DICTIONARY)) { noDicAndComplexColumns[noDicAndComp++] =
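A standalone sketch of the corrected arithmetic, with an assumed dimension layout: ordinals are assigned over all dimensions, including dropped (invisible) ones, so the index of a varchar column among the no-dictionary columns must subtract both the dictionary-dimension count and the invisible columns seen so far:

```
public class VarcharOrdinalFix {
  public static void main(String[] args) {
    // assumed layout in ordinal order: 0 = dict dim, 1 = varchar, 2 = dropped varchar, 3 = varchar
    boolean[] invisible = {false, false, true, false};
    boolean[] isVarchar = {false, true, true, true};
    int dictDimCount = 1;
    int invisibleCount = 0;
    for (int ordinal = 0; ordinal < invisible.length; ordinal++) {
      if (invisible[ordinal]) {
        invisibleCount++; // the fix: count dropped columns instead of letting them shift indexes
        continue;
      }
      if (isVarchar[ordinal]) {
        System.out.println("ordinal " + ordinal + " -> no-dict index "
            + (ordinal - dictDimCount - invisibleCount));
      }
    }
    // prints: ordinal 1 -> no-dict index 0, then ordinal 3 -> no-dict index 1
  }
}
```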
carbondata git commit: [CARBONDATA-3186] Avoid creating empty carbondata file when all the records are bad records with action redirect.
Repository: carbondata Updated Branches: refs/heads/master bd752e9d5 -> 10bc5c2ec

[CARBONDATA-3186] Avoid creating empty carbondata file when all the records are bad records with action redirect.

Problem: In the no_sort flow, the writer is opened as there is no blocking sort step. So, when all the records go out as bad records with the redirect action in the converter step, the writer closes an empty .carbondata file. When this empty carbondata file is queried, we get multiple issues including NPE.

Solution: When the file size is 0 bytes, do the following (see the sketch after this commit's diff):
a) If one data and one index file: delete the carbondata file and avoid index file creation.
b) If multiple data files and one index file (with a few data files full of bad records): delete the empty carbondata files and remove them from blockIndexInfoList, so the index file will not have the info of the empty carbon files.
c) In case direct write to store path is enabled: delete the data file from there as well and avoid writing the index file with that carbondata info.

[HOTFIX] Presto NPE when a non-transactional table is cached for s3a/HDFS. Cause: for a non-transactional table, the schema file must not be read. Solution: use the inferred schema instead of checking the schema file.

This closes #3003

Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/10bc5c2e
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/10bc5c2e
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/10bc5c2e
Branch: refs/heads/master
Commit: 10bc5c2ec69711c12bc379e9f0997d3363543364
Parents: bd752e9
Author: ajantha-bhat
Authored: Wed Dec 19 18:27:53 2018 +0530
Committer: kumarvishal09
Committed: Mon Dec 24 13:22:41 2018 +0530
--
.../presto/impl/CarbonTableReader.java | 4 +-
.../TestNonTransactionalCarbonTable.scala | 29 -
.../store/writer/AbstractFactDataWriter.java| 45
3 files changed, 65 insertions(+), 13 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/carbondata/blob/10bc5c2e/integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java
--
diff --git a/integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java b/integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java index 9677839..363f3f5 100755 --- a/integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java +++ b/integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java @@ -288,8 +288,8 @@ public class CarbonTableReader { } if (isKeyExists) { CarbonTableCacheModel carbonTableCacheModel = carbonCache.get().get(schemaTableName); - if (carbonTableCacheModel != null - && carbonTableCacheModel.carbonTable.getTableInfo() != null) { + if (carbonTableCacheModel != null && carbonTableCacheModel.carbonTable.getTableInfo() != null + && carbonTableCacheModel.carbonTable.isTransactionalTable()) { Long latestTime = FileFactory.getCarbonFile(CarbonTablePath .getSchemaFilePath(carbonCache.get().get(schemaTableName).carbonTable.getTablePath())) .getLastModifiedTime();
--
http://git-wip-us.apache.org/repos/asf/carbondata/blob/10bc5c2e/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala
--
diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala
index a166789..1c211e3 100644 --- a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala +++ b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala @@ -34,7 +34,7 @@ import org.apache.avro.generic.{GenericDatumReader, GenericDatumWriter, GenericR import org.apache.avro.io.{DecoderFactory, Encoder} import org.apache.commons.io.FileUtils import org.apache.spark.sql.test.util.QueryTest -import org.apache.spark.sql.{CarbonEnv, Row} +import org.apache.spark.sql.{AnalysisException, CarbonEnv, Row} import org.junit.Assert import org.scalatest.BeforeAndAfterAll @@ -119,6 +119,13 @@ class TestNonTransactionalCarbonTable extends QueryTest with BeforeAndAfterAll { buildTestData(rows, options, List("name")) } + def buildTestDataWithOptionsAndEmptySortColumn(rows: Int, + op
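A minimal sketch, with hypothetical parallel lists, of the cleanup rule that cases (a) and (b) introduce in AbstractFactDataWriter: a finished 0-byte .carbondata file is deleted and its entry dropped from the block index list, so the index file never references an empty file:

```
import java.io.File;
import java.io.IOException;
import java.util.List;

public class EmptyFileCleanup {
  // dataFiles and blockIndexInfoList are assumed to be parallel lists
  static void removeEmptyDataFiles(List<File> dataFiles, List<String> blockIndexInfoList)
      throws IOException {
    for (int i = dataFiles.size() - 1; i >= 0; i--) {
      File f = dataFiles.get(i);
      if (f.length() == 0) {
        if (!f.delete()) {                 // delete the empty .carbondata file
          throw new IOException("could not delete empty file " + f);
        }
        dataFiles.remove(i);
        blockIndexInfoList.remove(i);      // the index file must not reference it
      }
    }
    // if no entries remain, the caller should skip writing the index file entirely (case a)
  }
}
```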
carbondata git commit: [CARBONDATA-3179] Map Data Load Failure and Struct Projection Pushdown Issue
Repository: carbondata Updated Branches: refs/heads/master 34923db0e -> 96b2ea364

[CARBONDATA-3179] Map Data Load Failure and Struct Projection Pushdown Issue

Problem 1: Data load fails for an insert-into-select from the same table when the table contains a Map datatype. Solution: Map type was not handled for this scenario; it is handled now.

Problem 2: Projection pushdown is not supported for a table containing a struct of map. Solution: Pass only the parent column for projection pushdown if the table contains MapType (see the sketch after this commit's diff).

This closes #2993

Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/96b2ea36
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/96b2ea36
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/96b2ea36
Branch: refs/heads/master
Commit: 96b2ea3646a2a768133880bb2e4c1318d366b482
Parents: 34923db
Author: manishnalla1994
Authored: Fri Dec 14 17:20:15 2018 +0530
Committer: kumarvishal09
Committed: Thu Dec 20 22:33:58 2018 +0530
--
.../TestCreateDDLForComplexMapType.scala| 71 +++-
.../spark/rdd/CarbonGlobalDictionaryRDD.scala | 6 +-
.../spark/rdd/NewCarbonDataLoadRDD.scala| 12 ++--
.../carbondata/spark/util/CarbonScalaUtil.scala | 25 ---
.../spark/rdd/CarbonDataRDDFactory.scala| 5 +-
.../sql/CarbonDatasourceHadoopRelation.scala| 37 ++
.../streaming/parser/FieldConverter.scala | 44 ++--
.../streaming/parser/RowStreamParserImp.scala | 16 +++--
8 files changed, 150 insertions(+), 66 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/carbondata/blob/96b2ea36/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateDDLForComplexMapType.scala
--
diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateDDLForComplexMapType.scala b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateDDLForComplexMapType.scala index 09f23e5..9006b61 100644 --- a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateDDLForComplexMapType.scala +++ b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateDDLForComplexMapType.scala @@ -27,7 +27,6 @@ import org.apache.spark.sql.test.util.QueryTest import org.scalatest.BeforeAndAfterAll import org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk - import scala.collection.JavaConversions._ class TestCreateDDLForComplexMapType extends QueryTest with BeforeAndAfterAll { @@ -471,4 +470,74 @@ class TestCreateDDLForComplexMapType extends QueryTest with BeforeAndAfterAll { "sort_columns is unsupported for map datatype column: mapfield")) } + test("Data Load Fail Issue") { +sql("DROP TABLE IF EXISTS carbon") +sql( + s""" + | CREATE TABLE carbon( + | mapField map + | ) + | STORED BY 'carbondata' + | """ +.stripMargin) +sql( + s""" + | LOAD DATA LOCAL INPATH '$path' + | INTO TABLE carbon OPTIONS( + | 'header' = 'false') + """.stripMargin) +sql("INSERT INTO carbon SELECT * FROM carbon") +checkAnswer(sql("select * from carbon"), Seq( + Row(Map(1 -> "Nalla", 2 -> "Singh", 4 -> "Kumar")), + Row(Map(1 -> "Nalla", 2 -> "Singh", 4 -> "Kumar")), + Row(Map(10 -> "Nallaa", 20 -> "Sissngh", 100 -> "Gusspta", 40 -> "Kumar")), + Row(Map(10 -> "Nallaa", 20 -> "Sissngh", 100 -> "Gusspta", 40 -> "Kumar")) + )) + } + + test("Struct inside map") { +sql("DROP TABLE IF EXISTS carbon") +sql( + s""" + | CREATE TABLE carbon( + | mapField map> + | ) + |
STORED BY 'carbondata' + | """ +.stripMargin) +sql("INSERT INTO carbon values('1\002man\003nan\0012\002kands\003dsnknd')") +sql("INSERT INTO carbon SELECT * FROM carbon") +checkAnswer(sql("SELECT * FROM carbon limit 1"), + Seq(Row(Map(1 -> Row("man", "nan"), (2 -> Row("kands", "dsnknd")) + } + + test("Struct inside map pushdown") { +sql("DROP TABLE IF EXISTS carbon") +sql( + s""" +
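A minimal sketch of the rule in Problem 2, assuming projections arrive as dotted column paths (the real change lives in CarbonDatasourceHadoopRelation's projection handling): when the table contains a map, only the parent column name is pushed down and children are resolved at a higher layer.

```
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class ParentOnlyProjection {
  // "structField.mapField" -> "structField": only the parent goes into the pushdown list
  static Set<String> toParentColumns(Iterable<String> projections) {
    Set<String> parents = new LinkedHashSet<>();
    for (String p : projections) {
      int dot = p.indexOf('.');
      parents.add(dot < 0 ? p : p.substring(0, dot));
    }
    return parents;
  }

  public static void main(String[] args) {
    System.out.println(toParentColumns(Arrays.asList("structField.mapField", "id")));
    // prints [structField, id]
  }
}
```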
carbondata git commit: [CARBONDATA-3187] Supported Global Dictionary For Map
Repository: carbondata Updated Branches: refs/heads/master 96ce00758 -> 5f0549a81

[CARBONDATA-3187] Supported Global Dictionary For Map

Problem: The global dictionary was not working for the Map datatype and was giving null values.

Solution: Added the case for the global dictionary to be created when the datatype is a complex Map.

This closes #3006

Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/5f0549a8
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/5f0549a8
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/5f0549a8
Branch: refs/heads/master
Commit: 5f0549a81e2f232e927ed824db4a6791a633c95f
Parents: 96ce007
Author: manishnalla1994
Authored: Thu Dec 20 11:23:46 2018 +0530
Committer: kumarvishal09
Committed: Thu Dec 20 16:53:08 2018 +0530
--
.../createTable/TestCreateDDLForComplexMapType.scala | 10 +-
.../carbondata/spark/util/GlobalDictionaryUtil.scala | 2 +-
2 files changed, 6 insertions(+), 6 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/carbondata/blob/5f0549a8/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateDDLForComplexMapType.scala
--
diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateDDLForComplexMapType.scala b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateDDLForComplexMapType.scala index b8f7549..09f23e5 100644 --- a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateDDLForComplexMapType.scala +++ b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateDDLForComplexMapType.scala @@ -226,22 +226,22 @@ class TestCreateDDLForComplexMapType extends QueryTest with BeforeAndAfterAll { Row(Map(1 -> "Nalla", 2 -> "", 3 -> "Gupta", 4 -> "Kumar" } - // Support this for Map type + // Global Dictionary for Map type test("Test Load data in map with dictionary include") { sql("DROP TABLE IF EXISTS carbon") sql( s""" | CREATE TABLE carbon( - | mapField map + | mapField map | ) | STORED BY 'carbondata' | TBLPROPERTIES('DICTIONARY_INCLUDE'='mapField') | """ .stripMargin) -sql("insert into carbon values('1\002Nalla\0012\002Singh\0013\002Gupta')") +sql("insert into carbon values('vi\002Nalla\001sh\002Singh\001al\002Gupta')") sql("select * from carbon").show(false) -//checkAnswer(sql("select * from carbon"), Seq( -//Row(Map(1 -> "Nalla", 2 -> "Singh", 3 -> "Gupta", 4 -> "Kumar" +checkAnswer(sql("select * from carbon"), Seq( + Row(Map("vi" -> "Nalla", "sh" -> "Singh", "al" -> "Gupta" } test("Test Load data in map with partition columns") {
--
http://git-wip-us.apache.org/repos/asf/carbondata/blob/5f0549a8/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala
--
diff --git a/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala b/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala index 704382f..922eadb 100644 --- a/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala +++ b/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala @@ -182,7 +182,7 @@ object GlobalDictionaryUtil { case None => None case Some(dim) => -if (DataTypes.isArrayType(dim.getDataType)) { +if
(DataTypes.isArrayType(dim.getDataType) || DataTypes.isMapType(dim.getDataType)) { val arrDim = ArrayParser(dim, format) generateParserForChildrenDimension(dim, format, mapColumnValuesWithId, arrDim) Some(arrDim)
carbondata git commit: [CARBONDATA-3005] Support Gzip as column compressor
Repository: carbondata Updated Branches: refs/heads/master c7d2acb89 -> fd0885b03

[CARBONDATA-3005] Support Gzip as column compressor

This PR adds a new compressor "Gzip" and enhances the compressing capabilities offered by CarbonData. Users can now use gzip as the compressor for loading data. Gzip can be set at the system properties level or for a particular table (see the sketch after this commit's diff).

This closes #2847

Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/fd0885b0
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/fd0885b0
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/fd0885b0
Branch: refs/heads/master
Commit: fd0885b03c5e24c7f78851a9fdc80a0cea0e5980
Parents: c7d2acb
Author: shardul-cr7
Authored: Tue Oct 23 17:27:47 2018 +0530
Committer: kumarvishal09
Committed: Tue Dec 11 14:55:41 2018 +0530
--
.../compression/AbstractCompressor.java | 1 +
.../compression/CompressorFactory.java | 3 +-
.../datastore/compression/GzipCompressor.java | 134 +++
.../datastore/compression/ZstdCompressor.java | 5 -
.../dataload/TestLoadDataWithCompression.scala | 94 ++---
.../TestLoadWithSortTempCompressed.scala| 20 +++
6 files changed, 236 insertions(+), 21 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/carbondata/blob/fd0885b0/core/src/main/java/org/apache/carbondata/core/datastore/compression/AbstractCompressor.java
--
diff --git a/core/src/main/java/org/apache/carbondata/core/datastore/compression/AbstractCompressor.java b/core/src/main/java/org/apache/carbondata/core/datastore/compression/AbstractCompressor.java index 0724bdc..c554dc6 100644 --- a/core/src/main/java/org/apache/carbondata/core/datastore/compression/AbstractCompressor.java +++ b/core/src/main/java/org/apache/carbondata/core/datastore/compression/AbstractCompressor.java @@ -123,4 +123,5 @@ public abstract class AbstractCompressor implements Compressor { return false; } + @Override public boolean supportUnsafe() { return false; } }
--
http://git-wip-us.apache.org/repos/asf/carbondata/blob/fd0885b0/core/src/main/java/org/apache/carbondata/core/datastore/compression/CompressorFactory.java
--
diff --git a/core/src/main/java/org/apache/carbondata/core/datastore/compression/CompressorFactory.java b/core/src/main/java/org/apache/carbondata/core/datastore/compression/CompressorFactory.java index f7d4e06..b7779ba 100644 --- a/core/src/main/java/org/apache/carbondata/core/datastore/compression/CompressorFactory.java +++ b/core/src/main/java/org/apache/carbondata/core/datastore/compression/CompressorFactory.java @@ -36,7 +36,8 @@ public class CompressorFactory { public enum NativeSupportedCompressor { SNAPPY("snappy", SnappyCompressor.class), -ZSTD("zstd", ZstdCompressor.class); +ZSTD("zstd", ZstdCompressor.class), +GZIP("gzip", GzipCompressor.class); private String name; private Class compressorClass;
--
http://git-wip-us.apache.org/repos/asf/carbondata/blob/fd0885b0/core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java
--
diff --git a/core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java b/core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java new file mode 100644 index 000..b386913 --- /dev/null +++ b/core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java @@ -0,0 +1,134 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements.
See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.datastore.compression; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; + +import org.ap
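The new GzipCompressor plugs gzip into carbon's Compressor interface (the full file is truncated above). A minimal, self-contained sketch of the byte-array round trip such a codec has to provide, using only the JDK's java.util.zip (the actual class may use a different gzip stream implementation):

```
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipSketch {
  static byte[] compress(byte[] input) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPOutputStream gzip = new GZIPOutputStream(bos)) {
      gzip.write(input);                        // stream the whole page through gzip
    }
    return bos.toByteArray();
  }

  static byte[] uncompress(byte[] input) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(input))) {
      byte[] buffer = new byte[4096];
      int len;
      while ((len = gzip.read(buffer)) > 0) {
        bos.write(buffer, 0, len);
      }
    }
    return bos.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] data = "carbondata column page".getBytes("UTF-8");
    System.out.println(new String(uncompress(compress(data)), "UTF-8"));
  }
}
```

Per the commit message the compressor is then chosen at the system properties level or per table; in current carbon documentation the selecting property is carbon.column.compressor.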
carbondata git commit: [CARBONDATA-3145] Avoid duplicate decoding for complex column pages while querying
Repository: carbondata Updated Branches: refs/heads/master 4c9f08217 -> 0c94559e2

[CARBONDATA-3145] Avoid duplicate decoding for complex column pages while querying

Problem: The column page is decoded again for every row read from a complex primitive column.

Solution: Decode a page once and then reuse the decoded page (see the sketch after this commit's diff).

This closes #2975

Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/0c94559e
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/0c94559e
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/0c94559e
Branch: refs/heads/master
Commit: 0c94559e2feaf3d5a001665c3da2bfc3bf941043
Parents: 4c9f082
Author: dhatchayani
Authored: Wed Dec 5 12:40:56 2018 +0530
Committer: kumarvishal09
Committed: Mon Dec 10 19:31:12 2018 +0530
--
.../core/scan/complextypes/ArrayQueryType.java | 11 ++--
.../scan/complextypes/ComplexQueryType.java | 14 +++-
.../scan/complextypes/PrimitiveQueryType.java | 11 ++--
.../core/scan/complextypes/StructQueryType.java | 14 ++--
.../core/scan/filter/GenericQueryType.java | 4 +-
.../executer/RowLevelFilterExecuterImpl.java| 7 +-
.../core/scan/result/BlockletScannedResult.java | 68 +---
7 files changed, 86 insertions(+), 43 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/carbondata/blob/0c94559e/core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java
--
diff --git a/core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java b/core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java index a5f4234..8538edb 100644 --- a/core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java +++ b/core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java @@ -22,6 +22,7 @@ import java.io.IOException; import java.nio.ByteBuffer; import java.util.Map; +import org.apache.carbondata.core.datastore.chunk.DimensionColumnPage; import org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk; import org.apache.carbondata.core.metadata.schema.table.column.CarbonDimension; import org.apache.carbondata.core.scan.filter.GenericQueryType; @@ -62,17 +63,17 @@ public class ArrayQueryType extends ComplexQueryType implements GenericQueryType } public void parseBlocksAndReturnComplexColumnByteArray(DimensionRawColumnChunk[] rawColumnChunks, - int rowNumber, int pageNumber, DataOutputStream dataOutputStream) throws IOException { -byte[] input = copyBlockDataChunk(rawColumnChunks, rowNumber, pageNumber); + DimensionColumnPage[][] dimensionColumnPages, int rowNumber, int pageNumber, + DataOutputStream dataOutputStream) throws IOException { +byte[] input = copyBlockDataChunk(rawColumnChunks, dimensionColumnPages, rowNumber, pageNumber); ByteBuffer byteArray = ByteBuffer.wrap(input); int dataLength = byteArray.getInt(); dataOutputStream.writeInt(dataLength); if (dataLength > 0) { int dataOffset = byteArray.getInt(); for (int i = 0; i < dataLength; i++) { -children -.parseBlocksAndReturnComplexColumnByteArray(rawColumnChunks, dataOffset++, pageNumber, -dataOutputStream); +children.parseBlocksAndReturnComplexColumnByteArray(rawColumnChunks, dimensionColumnPages, +dataOffset++, pageNumber, dataOutputStream); } } }
--
http://git-wip-us.apache.org/repos/asf/carbondata/blob/0c94559e/core/src/main/java/org/apache/carbondata/core/scan/complextypes/ComplexQueryType.java
--
diff --git a/core/src/main/java/org/apache/carbondata/core/scan/complextypes/ComplexQueryType.java
b/core/src/main/java/org/apache/carbondata/core/scan/complextypes/ComplexQueryType.java index 98f0715..704af89 100644 --- a/core/src/main/java/org/apache/carbondata/core/scan/complextypes/ComplexQueryType.java +++ b/core/src/main/java/org/apache/carbondata/core/scan/complextypes/ComplexQueryType.java @@ -19,6 +19,7 @@ package org.apache.carbondata.core.scan.complextypes; import java.io.IOException; +import org.apache.carbondata.core.datastore.chunk.DimensionColumnPage; import org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk; import org.apache.carbondata.core.scan.processor.RawBlockletColumnChunks; @@ -40,9 +41,10 @@ public class ComplexQueryType { * This method is also used by child. */ protected byte[] copyBlockDataChunk(DimensionRawColumnChunk[] rawColumnChunks, - int rowNumber, int pageNumber) { + DimensionColumnPage[][] dimensionColumnPages, int rowNumber, int pageNumber) { by
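A minimal sketch of the decode-once pattern this commit applies, with hypothetical interfaces standing in for DimensionRawColumnChunk and DimensionColumnPage: each page is decoded lazily on first access and the decoded page is reused for every subsequent row.

```
public class DecodeOncePageCache {
  interface RawChunk { DecodedPage decode(int pageNumber); }
  interface DecodedPage { byte[] getRow(int rowId); }

  private final RawChunk[] rawChunks;
  private final DecodedPage[][] decodedPages; // [chunkIndex][pageNumber]

  DecodeOncePageCache(RawChunk[] rawChunks, int pagesPerChunk) {
    this.rawChunks = rawChunks;
    this.decodedPages = new DecodedPage[rawChunks.length][pagesPerChunk];
  }

  byte[] readRow(int chunkIndex, int pageNumber, int rowId) {
    if (decodedPages[chunkIndex][pageNumber] == null) {
      // decode the page only on first access; later rows reuse the decoded page
      decodedPages[chunkIndex][pageNumber] = rawChunks[chunkIndex].decode(pageNumber);
    }
    return decodedPages[chunkIndex][pageNumber].getRow(rowId);
  }
}
```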
carbondata git commit: [CARBONDATA-3143] Fixed local dictionary in presto
Repository: carbondata Updated Branches: refs/heads/master d9f1a8115 -> 4c9f08217 [CARBONDATA-3143] Fixed local dictionary in presto Problem: Currently, local dictionary columns are not working for presto as it is not handled in the integration layer. Solution: Add local dictionary support to presto integration layer. This closes #2972 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/4c9f0821 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/4c9f0821 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/4c9f0821 Branch: refs/heads/master Commit: 4c9f08217c7b9fa7ad33e148dbf33280e0f2b33f Parents: d9f1a81 Author: ravipesala Authored: Mon Dec 3 18:27:33 2018 +0530 Committer: kumarvishal09 Committed: Mon Dec 10 19:18:32 2018 +0530 -- .../presto/CarbonColumnVectorWrapper.java | 2 +- .../presto/readers/SliceStreamReader.java | 35 +++ .../PrestoAllDataTypeLocalDictTest.scala| 291 +++ .../integrationtest/PrestoAllDataTypeTest.scala | 2 +- .../carbondata/presto/server/PrestoServer.scala | 4 +- .../presto/util/CarbonDataStoreCreator.scala| 18 +- 6 files changed, 342 insertions(+), 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/4c9f0821/integration/presto/src/main/java/org/apache/carbondata/presto/CarbonColumnVectorWrapper.java -- diff --git a/integration/presto/src/main/java/org/apache/carbondata/presto/CarbonColumnVectorWrapper.java b/integration/presto/src/main/java/org/apache/carbondata/presto/CarbonColumnVectorWrapper.java index a80751f..f001488 100644 --- a/integration/presto/src/main/java/org/apache/carbondata/presto/CarbonColumnVectorWrapper.java +++ b/integration/presto/src/main/java/org/apache/carbondata/presto/CarbonColumnVectorWrapper.java @@ -244,7 +244,7 @@ public class CarbonColumnVectorWrapper implements CarbonColumnVector { } @Override public CarbonColumnVector getDictionaryVector() { -return this.columnVector; +return this.columnVector.getDictionaryVector(); } @Override public void putFloats(int rowId, int count, float[] src, int srcIndex) { http://git-wip-us.apache.org/repos/asf/carbondata/blob/4c9f0821/integration/presto/src/main/java/org/apache/carbondata/presto/readers/SliceStreamReader.java -- diff --git a/integration/presto/src/main/java/org/apache/carbondata/presto/readers/SliceStreamReader.java b/integration/presto/src/main/java/org/apache/carbondata/presto/readers/SliceStreamReader.java index ab270fc..04e5bb3 100644 --- a/integration/presto/src/main/java/org/apache/carbondata/presto/readers/SliceStreamReader.java +++ b/integration/presto/src/main/java/org/apache/carbondata/presto/readers/SliceStreamReader.java @@ -17,14 +17,19 @@ package org.apache.carbondata.presto.readers; +import java.util.Optional; + import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.scan.result.vector.CarbonDictionary; import org.apache.carbondata.core.scan.result.vector.impl.CarbonColumnVectorImpl; import com.facebook.presto.spi.block.Block; import com.facebook.presto.spi.block.BlockBuilder; import com.facebook.presto.spi.block.DictionaryBlock; +import com.facebook.presto.spi.block.VariableWidthBlock; import com.facebook.presto.spi.type.Type; import com.facebook.presto.spi.type.VarcharType; +import io.airlift.slice.Slices; import static io.airlift.slice.Slices.wrappedBuffer; @@ -63,6 +68,36 @@ public class SliceStreamReader extends CarbonColumnVectorImpl implements PrestoV } } + @Override public void setDictionary(CarbonDictionary 
dictionary) { +super.setDictionary(dictionary); +if (dictionary == null) { + dictionaryBlock = null; + return; +} +boolean[] nulls = new boolean[dictionary.getDictionarySize()]; +nulls[0] = true; +nulls[1] = true; +int[] dictOffsets = new int[dictionary.getDictionarySize() + 1]; +int size = 0; +for (int i = 0; i < dictionary.getDictionarySize(); i++) { + if (dictionary.getDictionaryValue(i) != null) { +dictOffsets[i] = size; +size += dictionary.getDictionaryValue(i).length; + } +} +byte[] singleArrayDictValues = new byte[size]; +for (int i = 0; i < dictionary.getDictionarySize(); i++) { + if (dictionary.getDictionaryValue(i) != null) { +System.arraycopy(dictionary.getDictionaryValue(i), 0, singleArrayDictValues, dictOffsets[i], +dictionary.getDictionaryValue(i).length); + } +} +dictOffsets[dictOffsets.length - 1] = size; +dictionaryBlock = new VariableWidthBlock(dictionary.getDicti
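The truncated setDictionary above packs every distinct local-dictionary value into one byte array plus an offsets table, and rows then reference values by integer id. A conceptual, self-contained sketch with its own types (not presto's exact VariableWidthBlock/DictionaryBlock constructors) of why that layout helps: each distinct string is materialized once, however many rows point at it.

```
public class LocalDictSketch {
  static final class DictBlock {
    final byte[] values;   // every distinct value, concatenated once
    final int[] offsets;   // value i occupies values[offsets[i] .. offsets[i+1])

    DictBlock(byte[][] distinct) {
      offsets = new int[distinct.length + 1];
      for (int i = 0; i < distinct.length; i++) {
        offsets[i + 1] = offsets[i] + distinct[i].length;
      }
      values = new byte[offsets[distinct.length]];
      for (int i = 0; i < distinct.length; i++) {
        System.arraycopy(distinct[i], 0, values, offsets[i], distinct[i].length);
      }
    }

    String value(int id) {
      return new String(values, offsets[id], offsets[id + 1] - offsets[id]);
    }
  }

  public static void main(String[] args) {
    DictBlock dict = new DictBlock(new byte[][] {"asia".getBytes(), "europe".getBytes()});
    for (int id : new int[] {0, 1, 1, 0}) {   // rows store only ids into the dictionary
      System.out.println(dict.value(id));
    }
  }
}
```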
carbondata git commit: [CARBONDATA-3138] Fix random count mismatch with multi-thread block pruning
Repository: carbondata Updated Branches: refs/heads/master 1bbae2657 -> 0bcd8677a

[CARBONDATA-3138] Fix random count mismatch with multi-thread block pruning

Problem: Random count mismatch in queries in the multi-thread block-pruning scenario.

Cause: The existing prune method was not meant for multi-threading, as synchronization was missing. Only in the implicit filter scenario, while preparing the block ID list, was synchronization missing; hence pruning was giving wrong results.

Solution: Synchronize the implicit filter preparation, as prune is now called from multiple threads (see the sketch after this commit's diff).

This closes #2962

Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/0bcd8677
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/0bcd8677
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/0bcd8677
Branch: refs/heads/master
Commit: 0bcd8677a88eab90942ebadf57a31fac1de7f75a
Parents: 1bbae26
Author: ajantha-bhat
Authored: Wed Nov 28 19:18:16 2018 +0530
Committer: kumarvishal09
Committed: Thu Nov 29 17:51:12 2018 +0530
--
.../carbondata/core/datamap/TableDataMap.java| 13 +++--
.../core/scan/filter/ColumnFilterInfo.java | 19 +--
2 files changed, 24 insertions(+), 8 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/carbondata/blob/0bcd8677/core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java
--
diff --git a/core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java b/core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java index e1b2c13..06d2cab 100644 --- a/core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java +++ b/core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java @@ -145,6 +145,7 @@ public final class TableDataMap extends OperationEventListener { // for filter queries int totalFiles = 0; int datamapsCount = 0; +int filesCountPerDatamap; boolean isBlockDataMapType = true; for (Segment segment : segments) { for (DataMap dataMap : dataMaps.get(segment)) { @@ -152,7 +153,9 @@ isBlockDataMapType = false; break; } -totalFiles += ((BlockDataMap) dataMap).getTotalBlocks(); +filesCountPerDatamap = ((BlockDataMap) dataMap).getTotalBlocks(); +// old legacy store can give 0, so consider one datamap as 1 record. +totalFiles += (filesCountPerDatamap == 0) ? 1 : filesCountPerDatamap; datamapsCount++; } if (!isBlockDataMapType) { @@ -206,10 +209,14 @@ List blocklets, final Map> dataMaps, int totalFiles) { int numOfThreadsForPruning = getNumOfThreadsForPruning(); +LOG.info( +"Number of threads selected for multi-thread block pruning is " + numOfThreadsForPruning ++ ". total files: " + totalFiles + ". total segments: " + segments.size()); int filesPerEachThread = totalFiles / numOfThreadsForPruning; int prev; int filesCount = 0; int processedFileCount = 0; +int filesCountPerDatamap; List> segmentList = new ArrayList<>(numOfThreadsForPruning); List segmentDataMapGroupList = new ArrayList<>(); for (Segment segment : segments) { prev = 0; for (int i = 0; i < eachSegmentDataMapList.size(); i++) { DataMap dataMap = eachSegmentDataMapList.get(i); -filesCount += ((BlockDataMap) dataMap).getTotalBlocks(); +filesCountPerDatamap = ((BlockDataMap) dataMap).getTotalBlocks(); +// old legacy store can give 0, so consider one datamap as 1 record.
+filesCount += (filesCountPerDatamap == 0) ? 1 : filesCountPerDatamap; if (filesCount >= filesPerEachThread) { if (segmentList.size() != numOfThreadsForPruning - 1) { // not the last segmentList http://git-wip-us.apache.org/repos/asf/carbondata/blob/0bcd8677/core/src/main/java/org/apache/carbondata/core/scan/filter/ColumnFilterInfo.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/scan/filter/ColumnFilterInfo.java b/core/src/main/java/org/apache/carbondata/core/scan/filter/ColumnFilterInfo.java index 75ec35e..8677a2d 100644 --- a/core/src/main/java/org/apache/carbondata/core/scan/filter/ColumnFilterInfo.java +++ b/core/src/main/java/org/apache/carbondata/core/scan/filter/ColumnFilterInfo.java @@ -107,19 +107,26 @@ public class
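On the ColumnFilterInfo side (truncated above), the lazily built implicit-filter structure must now be published safely across pruning threads. A minimal sketch of that pattern with hypothetical field and method names, using double-checked locking on a volatile field:

```
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class ImplicitFilterHolder {
  private volatile Map<String, Set<Integer>> blockToRowIds;

  Map<String, Set<Integer>> getOrBuild(Set<String> implicitValues) {
    if (blockToRowIds == null) {
      synchronized (this) {
        if (blockToRowIds == null) {   // double-checked locking: build and publish exactly once
          blockToRowIds = build(implicitValues);
        }
      }
    }
    return blockToRowIds;
  }

  private Map<String, Set<Integer>> build(Set<String> implicitValues) {
    Map<String, Set<Integer>> map = new HashMap<>();
    // parse the implicit filter values into per-block row id sets (population omitted)
    return map;
  }
}
```

Without the synchronization, two pruning threads could each observe a null map and race to build it, which is how the random count mismatch appeared.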
carbondata git commit: [DOCUMENT] Added filter push handling parameter in documents.
Repository: carbondata Updated Branches: refs/heads/master eeeaf50f1 -> c5bfe4acf [DOCUMENT] Added filter push handling parameter in documents. Added filter push handling parameter in documents This closes #2957 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/c5bfe4ac Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/c5bfe4ac Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/c5bfe4ac Branch: refs/heads/master Commit: c5bfe4acfffe33679a95d22f67f0859da583adb1 Parents: eeeaf50 Author: ravipesala Authored: Tue Nov 27 15:16:57 2018 +0530 Committer: kumarvishal09 Committed: Wed Nov 28 15:51:13 2018 +0530 -- docs/configuration-parameters.md | 1 + 1 file changed, 1 insertion(+) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/c5bfe4ac/docs/configuration-parameters.md -- diff --git a/docs/configuration-parameters.md b/docs/configuration-parameters.md index c82d5d7..a41a3d5 100644 --- a/docs/configuration-parameters.md +++ b/docs/configuration-parameters.md @@ -138,6 +138,7 @@ This section provides the details of all the configurations required for the Car | carbon.query.validate.direct.query.on.datamap | true | CarbonData supports creating pre-aggregate table datamaps as an independent tables. For some debugging purposes, it might be required to directly query from such datamap tables. This configuration allows to query on such datamaps. | | carbon.max.driver.threads.for.block.pruning | 4 | Number of threads used for driver pruning when the carbon files are more than 100k Maximum memory. This configuration can used to set number of threads between 1 to 4. | | carbon.heap.memory.pooling.threshold.bytes | 1048576 | CarbonData supports unsafe operations of Java to avoid GC overhead for certain operations. Using unsafe, memory can be allocated on Java Heap or off heap. This configuration controls the allocation mechanism on Java HEAP. If the heap memory allocations of the given size is greater or equal than this value,it should go through the pooling mechanism. But if set this size to -1, it should not go through the pooling mechanism. Default value is 1048576(1MB, the same as Spark). Value to be specified in bytes. | +| carbon.push.rowfilters.for.vector | false | When enabled complete row filters will be handled by carbon in case of vector. If it is disabled then only page level pruning will be done by carbon and row level filtering will be done by spark for vector. And also there are scan optimizations in carbon to avoid multiple data copies when this parameter is set to false. There is no change in flow for non-vector based queries. | ## Data Mutation Configuration | Parameter | Default Value | Description |
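A minimal sketch of enabling the newly documented parameter programmatically; CarbonProperties.getInstance().addProperty is the same call used elsewhere in this archive's test code for carbon properties, and the property can equally be set in carbon.properties:

```
import org.apache.carbondata.core.util.CarbonProperties;

public class TogglePushRowFilters {
  public static void main(String[] args) {
    // let carbon handle complete row filters in the vector flow (default is false)
    CarbonProperties.getInstance()
        .addProperty("carbon.push.rowfilters.for.vector", "true");
  }
}
```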
carbondata git commit: [CARBONDATA-2896] Added TestCases for Adaptive encoding
Repository: carbondata Updated Branches: refs/heads/master 50ecb83a2 -> 0b83a8183 [CARBONDATA-2896] Added TestCases for Adaptive encoding Test cases added for Adaptive encoding for primitive types. This closes #2849 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/0b83a818 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/0b83a818 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/0b83a818 Branch: refs/heads/master Commit: 0b83a8183f70973960ab7ea25b68f27fb3e7247f Parents: 50ecb83 Author: dhatchayani Authored: Mon Oct 22 12:28:13 2018 +0530 Committer: kumarvishal09 Committed: Thu Nov 22 16:35:31 2018 +0530 -- .../test/resources/dataWithNegativeValues.csv | 7 + .../TestAdaptiveEncodingForPrimitiveTypes.scala | 430 +++ 2 files changed, 437 insertions(+) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/0b83a818/integration/spark-common-test/src/test/resources/dataWithNegativeValues.csv -- diff --git a/integration/spark-common-test/src/test/resources/dataWithNegativeValues.csv b/integration/spark-common-test/src/test/resources/dataWithNegativeValues.csv new file mode 100644 index 000..9e369ca --- /dev/null +++ b/integration/spark-common-test/src/test/resources/dataWithNegativeValues.csv @@ -0,0 +1,7 @@ +-3,aaa,-300 +0,ddd,0 +-2,bbb,-200 +7,ggg,700 +1,eee,100 +-1,ccc,-100 +null,null,null \ No newline at end of file http://git-wip-us.apache.org/repos/asf/carbondata/blob/0b83a818/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/primitiveTypes/TestAdaptiveEncodingForPrimitiveTypes.scala -- diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/primitiveTypes/TestAdaptiveEncodingForPrimitiveTypes.scala b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/primitiveTypes/TestAdaptiveEncodingForPrimitiveTypes.scala new file mode 100644 index 000..944de37 --- /dev/null +++ b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/primitiveTypes/TestAdaptiveEncodingForPrimitiveTypes.scala @@ -0,0 +1,430 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.integration.spark.testsuite.primitiveTypes + +import java.io.File + +import org.apache.spark.sql.Row +import org.apache.spark.sql.test.util.QueryTest +import org.scalatest.BeforeAndAfterAll + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.util.CarbonProperties + +class TestAdaptiveEncodingForPrimitiveTypes extends QueryTest with BeforeAndAfterAll { + + val rootPath = new File(this.getClass.getResource("/").getPath + + "../../../..").getCanonicalPath + + private val vectorReader = CarbonProperties.getInstance() +.getProperty(CarbonCommonConstants.ENABLE_VECTOR_READER) + + private val unsafeColumnPage = CarbonProperties.getInstance() +.getProperty(CarbonCommonConstants.ENABLE_UNSAFE_COLUMN_PAGE) + + private val unsafeQueryExecution = CarbonProperties.getInstance() +.getProperty(CarbonCommonConstants.ENABLE_UNSAFE_IN_QUERY_EXECUTION) + + private val unsafeSort = CarbonProperties.getInstance() +.getProperty(CarbonCommonConstants.ENABLE_UNSAFE_SORT) + + private val compactionThreshold = CarbonProperties.getInstance() +.getProperty(CarbonCommonConstants.COMPACTION_SEGMENT_LEVEL_THRESHOLD) + + CarbonProperties.getInstance() +.addProperty(CarbonCommonConstants.COMPACTION_SEGMENT_LEVEL_THRESHOLD, "2,2") + + override def beforeAll: Unit = { +dropTables +sql( + "CREATE TABLE uniqdata_Compare (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB " + + "ti
carbondata git commit: [CARBONDATA-3114] Remove Null Values for a Dictionary_Include Timestamp column for Range Filters
Repository: carbondata Updated Branches: refs/heads/master 697eee3de -> 50ecb83a2 [CARBONDATA-3114]Remove Null Values for a Dictionary_Include Timestamp column for Range Filters Problem: Null Values are not removed in case of RangeFilters, if column is a dictionary and no_inverted_index timestamp column. Solution: Remove NULL values in case of RangeFilters for such dictionary and no_inverted_index timestamp column. This closes #2937 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/50ecb83a Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/50ecb83a Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/50ecb83a Branch: refs/heads/master Commit: 50ecb83a264ab6512ebade0580e3288295452966 Parents: 697eee3 Author: Indhumathi27 Authored: Wed Nov 21 15:21:49 2018 +0530 Committer: kumarvishal09 Committed: Thu Nov 22 16:32:15 2018 +0530 -- .../carbondata/core/scan/filter/FilterUtil.java | 23 +++ .../executer/RangeValueFilterExecuterImpl.java | 21 ++ .../RowLevelRangeGrtThanFiterExecuterImpl.java | 8 +- ...elRangeGrtrThanEquaToFilterExecuterImpl.java | 8 +- ...velRangeLessThanEqualFilterExecuterImpl.java | 20 - ...RowLevelRangeLessThanFilterExecuterImpl.java | 20 - .../src/test/resources/data_timestamp.csv | 10 +++ ...estampDataTypeDirectDictionaryTestCase.scala | 30 8 files changed, 89 insertions(+), 51 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/50ecb83a/core/src/main/java/org/apache/carbondata/core/scan/filter/FilterUtil.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/scan/filter/FilterUtil.java b/core/src/main/java/org/apache/carbondata/core/scan/filter/FilterUtil.java index 06672f5..286f68f 100644 --- a/core/src/main/java/org/apache/carbondata/core/scan/filter/FilterUtil.java +++ b/core/src/main/java/org/apache/carbondata/core/scan/filter/FilterUtil.java @@ -52,6 +52,8 @@ import org.apache.carbondata.core.datastore.chunk.DimensionColumnPage; import org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk; import org.apache.carbondata.core.keygenerator.KeyGenException; import org.apache.carbondata.core.keygenerator.KeyGenerator; +import org.apache.carbondata.core.keygenerator.directdictionary.DirectDictionaryGenerator; +import org.apache.carbondata.core.keygenerator.directdictionary.DirectDictionaryKeyGeneratorFactory; import org.apache.carbondata.core.keygenerator.factory.KeyGeneratorFactory; import org.apache.carbondata.core.keygenerator.mdkey.MultiDimKeyVarLengthGenerator; import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier; @@ -2247,4 +2249,25 @@ public final class FilterUtil { } } + /** + * This method is used to get default null values for a direct dictionary column + * @param currentBlockDimension + * @param segmentProperties + * @return + */ + public static byte[] getDefaultNullValue(CarbonDimension currentBlockDimension, + SegmentProperties segmentProperties) { +byte[] defaultValue = null; +DirectDictionaryGenerator directDictionaryGenerator = DirectDictionaryKeyGeneratorFactory +.getDirectDictionaryGenerator(currentBlockDimension.getDataType()); +int key = directDictionaryGenerator.generateDirectSurrogateKey(null); +if (currentBlockDimension.isSortColumn()) { + defaultValue = FilterUtil + .getMaskKey(key, currentBlockDimension, segmentProperties.getSortColumnsGenerator()); +} else { + defaultValue = ByteUtil.toXorBytes(key); +} +return defaultValue; + } + } 
http://git-wip-us.apache.org/repos/asf/carbondata/blob/50ecb83a/core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RangeValueFilterExecuterImpl.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RangeValueFilterExecuterImpl.java b/core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RangeValueFilterExecuterImpl.java index e84e82d..bcae001 100644 --- a/core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RangeValueFilterExecuterImpl.java +++ b/core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RangeValueFilterExecuterImpl.java @@ -24,8 +24,6 @@ import org.apache.carbondata.core.constants.CarbonCommonConstants; import org.apache.carbondata.core.datastore.block.SegmentProperties; import org.apache.carbondata.core.datastore.chunk.DimensionColumnPage; import org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk; -imp
carbondata git commit: [CARBONDATA-3115] Fix CodeGen error in preaggregate table and codegen display issue in oldstores
Repository: carbondata Updated Branches: refs/heads/master 0fa0a96c4 -> 697eee3de [CARBONDATA-3115] Fix CodeGen error in preaggregate table and codegen display issue in oldstores Problem: 1. While querying a preaggregate table, codegen error is displayed. 2. In old stores, code is getting displayed while executing queries. This closes #2939 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/697eee3d Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/697eee3d Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/697eee3d Branch: refs/heads/master Commit: 697eee3de7eb1147fd75452d10acfe087a0566ba Parents: 0fa0a96 Author: Indhumathi27 Authored: Wed Nov 21 17:23:25 2018 +0530 Committer: kumarvishal09 Committed: Thu Nov 22 15:31:00 2018 +0530 -- .../preaggregate/TestPreAggCreateCommand.scala | 23 .../spark/sql/CarbonDictionaryDecoder.scala | 12 +- 2 files changed, 29 insertions(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/697eee3d/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/preaggregate/TestPreAggCreateCommand.scala -- diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/preaggregate/TestPreAggCreateCommand.scala b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/preaggregate/TestPreAggCreateCommand.scala index 9fbdff7..7851bd1 100644 --- a/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/preaggregate/TestPreAggCreateCommand.scala +++ b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/preaggregate/TestPreAggCreateCommand.scala @@ -438,6 +438,29 @@ class TestPreAggCreateCommand extends QueryTest with BeforeAndAfterAll { } } + test("test codegen issue with preaggregate") { +sql("DROP TABLE IF EXISTS PreAggMain") +sql("CREATE TABLE PreAggMain (id Int, date date, country string, phonetype string, " + +"serialname String,salary int ) STORED BY 'org.apache.carbondata.format' " + +"tblproperties('dictionary_include'='country')") +sql("create datamap PreAggSum on table PreAggMain using 'preaggregate' as " + +"select country,sum(salary) as sum from PreAggMain group by country") +sql("create datamap PreAggAvg on table PreAggMain using 'preaggregate' as " + +"select country,avg(salary) as avg from PreAggMain group by country") +sql("create datamap PreAggCount on table PreAggMain using 'preaggregate' as " + +"select country,count(salary) as count from PreAggMain group by country") +sql("create datamap PreAggMin on table PreAggMain using 'preaggregate' as " + +"select country,min(salary) as min from PreAggMain group by country") +sql("create datamap PreAggMax on table PreAggMain using 'preaggregate' as " + +"select country,max(salary) as max from PreAggMain group by country") +sql(s"LOAD DATA INPATH '$integrationPath/spark-common-test/src/test/resources/source.csv' " + +s"into table PreAggMain") +checkExistence(sql("select t1.country,sum(id) from PreAggMain t1 join (select " + + "country as newcountry,sum(salary) as sum from PreAggMain group by country)" + + "t2 on t1.country=t2.newcountry group by country"), true, "france") +sql("DROP TABLE IF EXISTS PreAggMain") + } + // TODO: Need to Fix ignore("test creation of multiple preaggregate of same name concurrently") { sql("DROP TABLE IF EXISTS tbl_concurr") 
http://git-wip-us.apache.org/repos/asf/carbondata/blob/697eee3d/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonDictionaryDecoder.scala -- diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonDictionaryDecoder.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonDictionaryDecoder.scala index 95ab29d..3b20c2f 100644 --- a/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonDictionaryDecoder.scala +++ b/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonDictionaryDecoder.scala @@ -248,34 +248,34 @@ case class CarbonDictionaryDecoder( |org.apache.spark.sql.DictTuple $value = $decodeDecimal($dictRef, ${ev.value});
carbondata git commit: [CARBONDATA-3096] Wrong records size on the input metrics
Repository: carbondata Updated Branches: refs/heads/master 2f69e4fb7 -> b8d602598

[CARBONDATA-3096] Wrong records size on the input metrics

The scanned record result size is taken from the default batch size; it should be taken from the records actually scanned.

This closes #2927

Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/b8d60259
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/b8d60259
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/b8d60259
Branch: refs/heads/master
Commit: b8d6025982cf27a172674de19db69b60f1448958
Parents: 2f69e4f
Author: dhatchayani
Authored: Tue Nov 13 18:28:48 2018 +0530
Committer: kumarvishal09
Committed: Wed Nov 21 19:45:21 2018 +0530
--
.../spark/vectorreader/VectorizedCarbonRecordReader.java | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/carbondata/blob/b8d60259/integration/spark-datasource/src/main/scala/org/apache/carbondata/spark/vectorreader/VectorizedCarbonRecordReader.java
--
diff --git a/integration/spark-datasource/src/main/scala/org/apache/carbondata/spark/vectorreader/VectorizedCarbonRecordReader.java b/integration/spark-datasource/src/main/scala/org/apache/carbondata/spark/vectorreader/VectorizedCarbonRecordReader.java index 1f28b8c..c9a4ba4 100644 --- a/integration/spark-datasource/src/main/scala/org/apache/carbondata/spark/vectorreader/VectorizedCarbonRecordReader.java +++ b/integration/spark-datasource/src/main/scala/org/apache/carbondata/spark/vectorreader/VectorizedCarbonRecordReader.java @@ -163,8 +163,8 @@ public class VectorizedCarbonRecordReader extends AbstractRecordReader { @Override public void close() throws IOException { -logStatistics(rowCount, queryModel.getStatisticsRecorder()); if (vectorProxy != null) { + logStatistics(rowCount, queryModel.getStatisticsRecorder()); vectorProxy.close(); vectorProxy = null; } @@ -200,7 +200,7 @@ public class VectorizedCarbonRecordReader extends AbstractRecordReader { @Override public Object getCurrentValue() throws IOException, InterruptedException { if (returnColumnarBatch) { - int value = vectorProxy.numRows(); + int value = carbonColumnarBatch.getActualSize(); rowCount += value; if (inputMetricsStats != null) { inputMetricsStats.incrementRecordRead((long) value);
carbondata git commit: [CARBONDATA-3070] Fix partition load issue when custom location is added.
Repository: carbondata Updated Branches: refs/heads/master 74a2ddee9 -> d62277696 [CARBONDATA-3070] Fix partition load issue when custom location is added. Problem: Loading files in the carbon file format fails when a custom partition location is added. Reason: Carbon uses its own file name for each carbondata file rather than the file name proposed by Spark, and it also needs to create an extra index file. For a custom partition location, Spark keeps track of the file names it proposes and moves those files; carbon, however, creates and maintains differently named files, which leads to a FileNotFoundException. Solution: Use a custom commit protocol to manage the commit and the folder location for the custom partition location. This closes #2873 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/d6227769 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/d6227769 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/d6227769 Branch: refs/heads/master Commit: d62277696cd19257a50cc956e3e7ff8fad5e651f Parents: 74a2dde Author: ravipesala Authored: Mon Oct 29 13:15:00 2018 +0530 Committer: kumarvishal09 Committed: Fri Nov 2 18:29:46 2018 +0530 -- .../datasources/SparkCarbonFileFormat.scala | 87 +++- .../org/apache/spark/sql/CarbonVectorProxy.java | 3 + .../datasource/SparkCarbonDataSourceTest.scala | 34 3 files changed, 120 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/d6227769/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/SparkCarbonFileFormat.scala -- diff --git a/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/SparkCarbonFileFormat.scala b/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/SparkCarbonFileFormat.scala index cd2035c..8c2f200 100644 --- a/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/SparkCarbonFileFormat.scala +++ b/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/SparkCarbonFileFormat.scala @@ -17,6 +17,8 @@ package org.apache.spark.sql.carbondata.execution.datasources +import java.net.URI + import scala.collection.JavaConverters._ import scala.collection.mutable.ArrayBuffer @@ -27,6 +29,7 @@ import org.apache.hadoop.mapreduce._ import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl import org.apache.spark.TaskContext import org.apache.spark.internal.Logging +import org.apache.spark.internal.io.FileCommitProtocol import org.apache.spark.memory.MemoryMode import org.apache.spark.sql._ import org.apache.spark.sql.carbondata.execution.datasources.readsupport.SparkUnsafeRowReadSuport @@ -112,6 +115,13 @@ class SparkCarbonFileFormat extends FileFormat } /** + * Add our own protocol to control the commit. + */ + SparkSession.getActiveSession.get.sessionState.conf.setConfString( +"spark.sql.sources.commitProtocolClass", + "org.apache.spark.sql.carbondata.execution.datasources.CarbonSQLHadoopMapReduceCommitProtocol") + + /** * Prepares a write job and returns an [[OutputWriterFactory]]. Client side job preparation is * done here.
*/ @@ -125,6 +135,7 @@ class SparkCarbonFileFormat extends FileFormat val model = CarbonSparkDataSourceUtil.prepareLoadModel(options, dataSchema) model.setLoadWithoutConverterStep(true) CarbonTableOutputFormat.setLoadModel(conf, model) +conf.set(CarbonSQLHadoopMapReduceCommitProtocol.COMMIT_PROTOCOL, "true") new OutputWriterFactory { override def newInstance( @@ -310,7 +321,6 @@ class SparkCarbonFileFormat extends FileFormat vectorizedReader.toBoolean && schema.forall(_.dataType.isInstanceOf[AtomicType]) } - /** * Returns whether this format support returning columnar batch or not. */ @@ -369,7 +379,7 @@ class SparkCarbonFileFormat extends FileFormat if (file.filePath.endsWith(CarbonTablePath.CARBON_DATA_EXT)) { val split = new CarbonInputSplit("null", - new Path(file.filePath), + new Path(new URI(file.filePath)), file.start, file.length, file.locations, @@ -380,10 +390,12 @@ class SparkCarbonFileFormat extends FileFormat split.setDetailInfo(info) info.setBlockSize(file.length) // Read the footer offset and set. -val reader = FileFactory.getFileHolder(FileFactory.getFileType(file.filePath), +val reader = FileFactory.getFileHolder(FileFactory.getFileType(split.getPath.toString), bro
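As an illustrative sketch only (assumed setup, not part of the patch): the same commit-protocol wiring can also be expressed through Spark's public configuration API, using the protocol class name registered in the diff above; everything else here is hypothetical.

import org.apache.spark.sql.SparkSession;

public final class CommitProtocolSetup {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .master("local[1]").appName("commit-protocol-sketch").getOrCreate();
    // Route file commits through carbon's protocol so carbon-named data and
    // index files are tracked correctly for custom partition locations.
    spark.conf().set("spark.sql.sources.commitProtocolClass",
        "org.apache.spark.sql.carbondata.execution.datasources.CarbonSQLHadoopMapReduceCommitProtocol");
    spark.stop();
  }
}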
carbondata git commit: [HOTFIX-compatibility] Handle Lazy loading with inverted index for ColumnarVectorWrapperDirectWithInvertedIndex
Repository: carbondata Updated Branches: refs/heads/master bcf3e0fd5 -> 94a4f8314 [HOTFIX-compatibility] Handle Lazy loading with inverted index for ColumnarVectorWrapperDirectWithInvertedIndex Problem: Create a store with 1.4 code with an inverted index and read it with vector filling (latest master code); the exception below is thrown from AbstractCarbonColumnarVector: UnsupportedOperationException("Not allowed from here " + getClass().getName()); Cause: during lazy loading with an inverted index, getBlockDataType() was not implemented for ColumnarVectorWrapperDirectWithInvertedIndex, so the implementation has been added. This closes #2870 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/94a4f831 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/94a4f831 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/94a4f831 Branch: refs/heads/master Commit: 94a4f8314068ffd4c0743907752b58879578749b Parents: bcf3e0f Author: ajantha-bhat Authored: Mon Oct 29 12:33:55 2018 +0530 Committer: kumarvishal09 Committed: Wed Oct 31 17:54:03 2018 +0530 -- .../encoding/adaptive/AdaptiveDeltaFloatingCodec.java | 10 ++ .../page/encoding/adaptive/AdaptiveFloatingCodec.java | 10 ++ .../ColumnarVectorWrapperDirectWithInvertedIndex.java | 6 ++ 3 files changed, 26 insertions(+) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/94a4f831/core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveDeltaFloatingCodec.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveDeltaFloatingCodec.java b/core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveDeltaFloatingCodec.java index d73318d..f91ede5 100644 --- a/core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveDeltaFloatingCodec.java +++ b/core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveDeltaFloatingCodec.java @@ -272,6 +272,11 @@ public class AdaptiveDeltaFloatingCodec extends AdaptiveCodec { int shortInt = ByteUtil.valueOf3Bytes(shortIntPage, i * 3); vector.putFloat(i, (max - shortInt) / floatFactor); } +} else if (pageDataType == DataTypes.INT) { + int[] intData = columnPage.getIntPage(); + for (int i = 0; i < pageSize; i++) { +vector.putFloat(i, (max - intData[i]) / floatFactor); + } } else { throw new RuntimeException("internal error: " + this.toString()); } @@ -298,6 +303,11 @@ public class AdaptiveDeltaFloatingCodec extends AdaptiveCodec { for (int i = 0; i < pageSize; i++) { vector.putDouble(i, (max - intData[i]) / factor); } +} else if (pageDataType == DataTypes.LONG) { + long[] longData = columnPage.getLongPage(); + for (int i = 0; i < pageSize; i++) { +vector.putDouble(i, (max - longData[i]) / factor); + } } else { throw new RuntimeException("Unsupported datatype : " + pageDataType); } http://git-wip-us.apache.org/repos/asf/carbondata/blob/94a4f831/core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveFloatingCodec.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveFloatingCodec.java b/core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveFloatingCodec.java index b300ee1..49696eb 100644 ---
a/core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveFloatingCodec.java +++
b/core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveFloatingCodec.java @@ -274,6 +274,11 @@ public class AdaptiveFloatingCodec extends AdaptiveCodec { int shortInt = ByteUtil.valueOf3Bytes(shortIntPage, i * 3); vector.putFloat(i, (shortInt / floatFactor)); } +} else if (pageDataType == DataTypes.INT) { + int[] intData = columnPage.getIntPage(); + for (int i = 0; i < pageSize; i++) { +vector.putFloat(i, (intData[i] / floatFactor)); + } } else { throw new RuntimeException("internal error: " + this.toString()); } @@ -300,6 +305,11 @@ public class AdaptiveFloatingCodec extends AdaptiveCodec { for (int i = 0; i < pageSize; i++) { vector.putDouble(i, (intData[i] / factor)); } +}
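A self-contained sketch of the shape of the wrapper fix, with simplified stand-in types (the real method returns a DataType, not a String, and the field name is an assumption): the inverted-index wrapper must override the throwing default it inherits and delegate to the vector it wraps.

interface ColumnVectorSketch {
  String getBlockDataType(); // stands in for the real DataType-returning method
}

abstract class AbstractVectorSketch implements ColumnVectorSketch {
  // The default inherited by all wrappers, which was hit before the fix:
  public String getBlockDataType() {
    throw new UnsupportedOperationException("Not allowed from here " + getClass().getName());
  }
}

final class InvertedIndexWrapperSketch extends AbstractVectorSketch {
  private final ColumnVectorSketch wrapped;
  InvertedIndexWrapperSketch(ColumnVectorSketch wrapped) { this.wrapped = wrapped; }

  // The fix: delegate to the wrapped vector instead of throwing.
  @Override
  public String getBlockDataType() {
    return wrapped.getBlockDataType();
  }

  public static void main(String[] args) {
    ColumnVectorSketch inner = () -> "INT"; // functional-interface stub for the wrapped vector
    System.out.println(new InvertedIndexWrapperSketch(inner).getBlockDataType()); // prints INT
  }
}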
[1/2] carbondata git commit: [CARBONDATA-3015] Support Lazy load in carbon vector
Repository: carbondata Updated Branches: refs/heads/master 019f5cd06 -> 170c2f56d http://git-wip-us.apache.org/repos/asf/carbondata/blob/170c2f56/integration/spark-datasource/src/main/spark2.3plus/org/apache/spark/sql/CarbonVectorProxy.java -- diff --git a/integration/spark-datasource/src/main/spark2.3plus/org/apache/spark/sql/CarbonVectorProxy.java b/integration/spark-datasource/src/main/spark2.3plus/org/apache/spark/sql/CarbonVectorProxy.java index bd74b05..c8c4e2c 100644 --- a/integration/spark-datasource/src/main/spark2.3plus/org/apache/spark/sql/CarbonVectorProxy.java +++ b/integration/spark-datasource/src/main/spark2.3plus/org/apache/spark/sql/CarbonVectorProxy.java @@ -19,12 +19,16 @@ package org.apache.spark.sql; import java.math.BigInteger; import org.apache.carbondata.core.scan.result.vector.CarbonDictionary; +import org.apache.carbondata.core.scan.scanner.LazyPageLoader; import org.apache.spark.memory.MemoryMode; import org.apache.spark.sql.catalyst.InternalRow; import org.apache.spark.sql.execution.vectorized.WritableColumnVector; import org.apache.spark.sql.types.*; +import org.apache.spark.sql.vectorized.ColumnVector; +import org.apache.spark.sql.vectorized.ColumnarArray; import org.apache.spark.sql.vectorized.ColumnarBatch; +import org.apache.spark.sql.vectorized.ColumnarMap; import org.apache.spark.unsafe.types.CalendarInterval; import org.apache.spark.unsafe.types.UTF8String; @@ -52,23 +56,23 @@ public class CarbonVectorProxy { public CarbonVectorProxy(MemoryMode memMode, int rowNum, StructField[] structFileds) { WritableColumnVector[] columnVectors = ColumnVectorFactory.getColumnVector(memMode, new StructType(structFileds), rowNum); -columnarBatch = new ColumnarBatch(columnVectors); -columnarBatch.setNumRows(rowNum); -columnVectorProxies = new ColumnVectorProxy[columnarBatch.numCols()]; +columnVectorProxies = new ColumnVectorProxy[columnVectors.length]; for (int i = 0; i < columnVectorProxies.length; i++) { -columnVectorProxies[i] = new ColumnVectorProxy(columnarBatch, i); +columnVectorProxies[i] = new ColumnVectorProxy(columnVectors[i]); } +columnarBatch = new ColumnarBatch(columnVectorProxies); +columnarBatch.setNumRows(rowNum); } public CarbonVectorProxy(MemoryMode memMode, StructType outputSchema, int rowNum) { WritableColumnVector[] columnVectors = ColumnVectorFactory .getColumnVector(memMode, outputSchema, rowNum); -columnarBatch = new ColumnarBatch(columnVectors); -columnarBatch.setNumRows(rowNum); -columnVectorProxies = new ColumnVectorProxy[columnarBatch.numCols()]; +columnVectorProxies = new ColumnVectorProxy[columnVectors.length]; for (int i = 0; i < columnVectorProxies.length; i++) { -columnVectorProxies[i] = new ColumnVectorProxy(columnarBatch, i); +columnVectorProxies[i] = new ColumnVectorProxy(columnVectors[i]); } +columnarBatch = new ColumnarBatch(columnVectorProxies); +columnarBatch.setNumRows(rowNum); } /** @@ -86,7 +90,7 @@ public class CarbonVectorProxy { * @return */ public WritableColumnVector column(int ordinal) { -return (WritableColumnVector) columnarBatch.column(ordinal); +return ((ColumnVectorProxy) columnarBatch.column(ordinal)).getVector(); } public ColumnVectorProxy getColumnVector(int ordinal) { @@ -97,12 +101,12 @@ public class CarbonVectorProxy { */ public void reset() { for (int i = 0; i < columnarBatch.numCols(); i++) { -((WritableColumnVector)columnarBatch.column(i)).reset(); +((ColumnVectorProxy) columnarBatch.column(i)).reset(); } } public void resetDictionaryIds(int ordinal) { - 
((WritableColumnVector)columnarBatch.column(ordinal)).getDictionaryIds().reset(); +(((ColumnVectorProxy) columnarBatch.column(ordinal)).getVector()).getDictionaryIds().reset(); } /** @@ -140,65 +144,70 @@ public class CarbonVectorProxy { return columnarBatch.column(ordinal).dataType(); } -public static class ColumnVectorProxy { +public static class ColumnVectorProxy extends ColumnVector { private WritableColumnVector vector; -public ColumnVectorProxy(ColumnarBatch columnarBatch, int ordinal) { -vector = (WritableColumnVector) columnarBatch.column(ordinal); +private LazyPageLoader pageLoad; + +private boolean isLoaded; + +public ColumnVectorProxy(ColumnVector columnVector) { +super(columnVector.dataType()); +vector = (WritableColumnVector) columnVector; } -public void putRowToColumnBatch(int rowId, Object value, int offset) { -DataType t = dataType(offset); +
[2/2] carbondata git commit: [CARBONDATA-3015] Support Lazy load in carbon vector
[CARBONDATA-3015] Support Lazy load in carbon vector Even though we prune pages using min/max, there is a high chance of false positives for filters on high-cardinality columns. To avoid that cost we can use a lazy loading design: it does not read/decompress data and fill the vector immediately when the data-filling call comes from Spark/Presto. It first reads only the required filter columns and gives them back to the execution engine; the engine starts filtering on those column vectors, and only if it finds data that must be read from the projection columns does it read the projection columns and fill their vectors on demand. This is the approach Presto uses, and the same is integrated with Spark 2.3. Older Spark versions cannot take advantage of it because their ColumnVector interfaces are non-extendable. For this purpose, new classes 'LazyBlockletLoad' and 'LazyPageLoad' were added and the carbon vector interfaces were changed. This closes #2823 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/170c2f56 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/170c2f56 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/170c2f56 Branch: refs/heads/master Commit: 170c2f56dc1f9b55444aa727d0e587a207f7b8c7 Parents: 019f5cd Author: ravipesala Authored: Tue Oct 16 18:39:16 2018 +0530 Committer: kumarvishal09 Committed: Sat Oct 27 05:28:54 2018 +0530 -- .../core/constants/CarbonCommonConstants.java | 2 +- .../safe/AbstractNonDictionaryVectorFiller.java | 2 +- .../datastore/page/SafeFixLengthColumnPage.java | 4 +- .../encoding/compress/DirectCompressCodec.java | 5 + .../core/scan/result/BlockletScannedResult.java | 33 ++- .../scan/result/vector/CarbonColumnVector.java | 3 + .../vector/impl/CarbonColumnVectorImpl.java | 5 +- .../AbstractCarbonColumnarVector.java | 46 ++-- .../core/scan/scanner/LazyBlockletLoader.java | 158 .../core/scan/scanner/LazyPageLoader.java | 80 ++ .../scanner/impl/BlockletFilterScanner.java | 77 ++ .../scan/scanner/impl/BlockletFullScanner.java | 5 +- .../presto/CarbonColumnVectorWrapper.java | 4 + .../lucene/LuceneFineGrainDataMapSuite.scala| 2 +- ...imestampNoDictionaryColumnCastTestCase.scala | 2 +- .../vectorreader/ColumnarVectorWrapper.java | 80 +++--- .../ColumnarVectorWrapperDirect.java| 57 +++-- .../datasources/SparkCarbonFileFormat.scala | 2 +- .../org/apache/spark/sql/CarbonVectorProxy.java | 88 +++ .../org/apache/spark/sql/CarbonVectorProxy.java | 249 +-- .../stream/CarbonStreamRecordReader.java| 2 +- .../partition/TestAlterPartitionTable.scala | 4 +- 22 files changed, 630 insertions(+), 280 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/170c2f56/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java index 72da3bd..7df1b7e 100644 --- a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java +++ b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java @@ -1735,7 +1735,7 @@ public final class CarbonCommonConstants { public static final String CARBON_PUSH_ROW_FILTERS_FOR_VECTOR = "carbon.push.rowfilters.for.vector"; - public static final String CARBON_PUSH_ROW_FILTERS_FOR_VECTOR_DEFAULT = "true"; + public static final String CARBON_PUSH_ROW_FILTERS_FOR_VECTOR_DEFAULT = "false"; // // Unused
constants and parameters start here http://git-wip-us.apache.org/repos/asf/carbondata/blob/170c2f56/core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java b/core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java index 2e68648..9626da7 100644 --- a/core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java +++ b/core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java @@ -52,7 +52,7 @@ class NonDictionaryVectorFillerFactory { public static Abs
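A self-contained Java sketch of the lazy loading idea described above (names hypothetical, greatly simplified): the vector holds a page loader and performs the read/decompress work only on the first actual access, so projection columns whose rows are all filtered out are never materialised.

final class LazyPageSketch {
  interface PageLoader { int[] readAndDecompress(); }

  static final class LazyVector {
    private final PageLoader loader;
    private int[] data;      // filled on demand
    private boolean isLoaded;

    LazyVector(PageLoader loader) { this.loader = loader; }

    int getInt(int rowId) {
      if (!isLoaded) {       // first access from the engine triggers the load
        data = loader.readAndDecompress();
        isLoaded = true;
      }
      return data[rowId];
    }
  }

  public static void main(String[] args) {
    LazyVector projection = new LazyVector(() -> new int[] {7, 8, 9});
    // No IO or decompression has happened yet; it happens only if the filter survives:
    System.out.println(projection.getInt(1)); // triggers the load, prints 8
  }
}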
carbondata git commit: [CARBONDATA-3014] Added support for inverted index and delete delta for direct scan queries
Repository: carbondata Updated Branches: refs/heads/master b62b0fd9c -> 71d617955 [CARBONDATA-3014] Added support for inverted index and delete delta for direct scan queries Added new classes to support inverted index and delete delta directly from column vector. ColumnarVectorWrapperDirectWithInvertedIndex ColumnarVectorWrapperDirectWithDeleteDelta ColumnarVectorWrapperDirectWithDeleteDeltaAndInvertedIndex This closes #2822 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/71d61795 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/71d61795 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/71d61795 Branch: refs/heads/master Commit: 71d6179557703718ff0aac099efcc89ee41ed941 Parents: b62b0fd Author: ravipesala Authored: Tue Oct 16 16:37:18 2018 +0530 Committer: kumarvishal09 Committed: Fri Oct 26 18:52:10 2018 +0530 -- ...mpressedDimensionChunkFileBasedReaderV3.java | 12 +- .../safe/AbstractNonDictionaryVectorFiller.java | 6 +- .../SafeFixedLengthDimensionDataChunkStore.java | 11 + ...feVariableLengthDimensionDataChunkStore.java | 10 + .../adaptive/AdaptiveDeltaFloatingCodec.java| 3 + .../adaptive/AdaptiveDeltaIntegralCodec.java| 35 ++- .../adaptive/AdaptiveFloatingCodec.java | 3 + .../adaptive/AdaptiveIntegralCodec.java | 17 +- .../encoding/compress/DirectCompressCodec.java | 16 +- .../datatype/DecimalConverterFactory.java | 42 +++- .../scan/collector/ResultCollectorFactory.java | 11 +- .../executer/RestructureEvaluatorImpl.java | 2 +- ...elRangeGrtrThanEquaToFilterExecuterImpl.java | 14 +- .../scan/result/vector/ColumnVectorInfo.java| 1 + .../AbstractCarbonColumnarVector.java | 133 .../ColumnarVectorWrapperDirectFactory.java | 59 + ...umnarVectorWrapperDirectWithDeleteDelta.java | 216 +++ ...erDirectWithDeleteDeltaAndInvertedIndex.java | 179 +++ ...narVectorWrapperDirectWithInvertedIndex.java | 144 + .../impl/directread/ConvertableVector.java | 30 +++ .../scanner/impl/BlockletFilterScanner.java | 8 +- .../detailquery/CastColumnTestCase.scala| 2 +- .../datasources/SparkCarbonFileFormat.scala | 1 + 23 files changed, 910 insertions(+), 45 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/71d61795/core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/CompressedDimensionChunkFileBasedReaderV3.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/CompressedDimensionChunkFileBasedReaderV3.java b/core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/CompressedDimensionChunkFileBasedReaderV3.java index a9f9338..602e694 100644 --- a/core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/CompressedDimensionChunkFileBasedReaderV3.java +++ b/core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/CompressedDimensionChunkFileBasedReaderV3.java @@ -276,13 +276,19 @@ public class CompressedDimensionChunkFileBasedReaderV3 extends AbstractChunkRead offset += pageMetadata.data_page_length; invertedIndexes = CarbonUtil .getUnCompressColumnIndex(pageMetadata.rowid_page_length, pageData, offset); -// get the reverse index -invertedIndexesReverse = CarbonUtil.getInvertedReverseIndex(invertedIndexes); +if (vectorInfo == null) { + // get the reverse index + invertedIndexesReverse = CarbonUtil.getInvertedReverseIndex(invertedIndexes); +} else { + vectorInfo.invertedIndex = invertedIndexes; +} } BitSet nullBitSet = 
QueryUtil.getNullBitSet(pageMetadata.presence, this.compressor); ColumnPage decodedPage = decodeDimensionByMeta(pageMetadata, pageData, dataOffset, null != rawColumnPage.getLocalDictionary(), vectorInfo, nullBitSet); - decodedPage.setNullBits(nullBitSet); + if (decodedPage != null) { +decodedPage.setNullBits(nullBitSet); + } return new ColumnPageWrapper(decodedPage, rawColumnPage.getLocalDictionary(), invertedIndexes, invertedIndexesReverse, isEncodedWithAdaptiveMeta(pageMetadata), isExplicitSorted); } else { http://git-wip-us.apache.org/repos/asf/carbondata/blob/71d61795/core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/s
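A simplified, hypothetical sketch of the inverted-index wrapper idea: decoded values arrive in sorted order, and invertedIndex[i] maps the sorted position to the actual row id, so the wrapper can scatter values into the vector while filling instead of first materialising a reverse index as the row-level path did.

final class InvertedIndexFillSketch {
  public static void main(String[] args) {
    int[] sortedValues = {10, 20, 30, 40};
    int[] invertedIndex = {2, 0, 3, 1}; // sorted position -> actual row id
    int[] vector = new int[sortedValues.length];
    for (int i = 0; i < sortedValues.length; i++) {
      vector[invertedIndex[i]] = sortedValues[i]; // scatter directly during the fill
    }
    // vector is now in row order: [20, 40, 10, 30]
    System.out.println(java.util.Arrays.toString(vector));
  }
}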
carbondata git commit: [CARBONDATA-3013] Added support for pruning pages for vector direct fill.
Repository: carbondata Updated Branches: refs/heads/master 3d3b6ff16 -> e6d15da74 [CARBONDATA-3013] Added support for pruning pages for vector direct fill. First, apply page-level pruning using the min/max of each page and get the valid pages of the blocklet. Decompress only the valid pages and fill the vector directly, as in the full scan query scenario. To prune pages before decompressing the data, a new method was added to the FilterExecuter class: BitSet prunePages(RawBlockletColumnChunks rawBlockletColumnChunks) throws FilterUnsupportedException, IOException; This method reads the necessary column chunk metadata and prunes the pages according to the min/max metadata. Based on the pruned pages, BlockletScannedResult decompresses and fills the column page data into the vector as described in the full scan PR above. This closes #2820 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/e6d15da7 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/e6d15da7 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/e6d15da7 Branch: refs/heads/master Commit: e6d15da74a3a4d9c0af9d6886b811ac5bb2d89e9 Parents: 3d3b6ff Author: ravipesala Authored: Tue Oct 16 14:53:14 2018 +0530 Committer: kumarvishal09 Committed: Fri Oct 26 16:24:38 2018 +0530 -- .../filter/executer/AndFilterExecuterImpl.java | 15 ++ .../executer/ExcludeFilterExecuterImpl.java | 10 ++ .../filter/executer/FalseFilterExecutor.java| 8 + .../scan/filter/executer/FilterExecuter.java| 6 + .../ImplicitIncludeFilterExecutorImpl.java | 9 + .../executer/IncludeFilterExecuterImpl.java | 87 -- .../filter/executer/OrFilterExecuterImpl.java | 9 + .../executer/RangeValueFilterExecuterImpl.java | 38 + .../executer/RestructureEvaluatorImpl.java | 10 ++ .../executer/RowLevelFilterExecuterImpl.java| 10 ++ .../RowLevelRangeGrtThanFiterExecuterImpl.java | 85 -- ...elRangeGrtrThanEquaToFilterExecuterImpl.java | 88 -- ...velRangeLessThanEqualFilterExecuterImpl.java | 87 -- ...RowLevelRangeLessThanFilterExecuterImpl.java | 86 -- .../filter/executer/TrueFilterExecutor.java | 9 + .../scanner/impl/BlockletFilterScanner.java | 166 ++- 16 files changed, 656 insertions(+), 67 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/e6d15da7/core/src/main/java/org/apache/carbondata/core/scan/filter/executer/AndFilterExecuterImpl.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/scan/filter/executer/AndFilterExecuterImpl.java b/core/src/main/java/org/apache/carbondata/core/scan/filter/executer/AndFilterExecuterImpl.java index d743151..f0feb0e 100644 --- a/core/src/main/java/org/apache/carbondata/core/scan/filter/executer/AndFilterExecuterImpl.java +++ b/core/src/main/java/org/apache/carbondata/core/scan/filter/executer/AndFilterExecuterImpl.java @@ -50,6 +50,21 @@ public class AndFilterExecuterImpl implements FilterExecuter, ImplicitColumnFilt return leftFilters; } + @Override + public BitSet prunePages(RawBlockletColumnChunks rawBlockletColumnChunks) + throws FilterUnsupportedException, IOException { +BitSet leftFilters = leftExecuter.prunePages(rawBlockletColumnChunks); +if (leftFilters.isEmpty()) { + return leftFilters; +} +BitSet rightFilter = rightExecuter.prunePages(rawBlockletColumnChunks); +if (rightFilter.isEmpty()) { + return rightFilter; +} +leftFilters.and(rightFilter); +return leftFilters; + } + @Override public boolean applyFilter(RowIntf value, int dimOrdinalMax) throws FilterUnsupportedException, IOException {
return leftExecuter.applyFilter(value, dimOrdinalMax) && http://git-wip-us.apache.org/repos/asf/carbondata/blob/e6d15da7/core/src/main/java/org/apache/carbondata/core/scan/filter/executer/ExcludeFilterExecuterImpl.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/scan/filter/executer/ExcludeFilterExecuterImpl.java b/core/src/main/java/org/apache/carbondata/core/scan/filter/executer/ExcludeFilterExecuterImpl.java index 15a43c5..fc9fbae 100644 --- a/core/src/main/java/org/apache/carbondata/core/scan/filter/executer/ExcludeFilterExecuterImpl.java +++ b/core/src/main/java/org/apache/carbondata/core/scan/filter/executer/ExcludeFilterExecuterImpl.java @@ -25,6 +25,7 @@ import org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk; import org.apache.carbondata.core.datastore.chunk.impl.MeasureRawColumnChunk; import org.apache.carbondata.core.datastore.page.ColumnPa
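A self-contained sketch of min/max page pruning for a simple equals filter (all names hypothetical): a page stays selected only when the filter value can fall inside its [min, max] range, and AND nodes then intersect their children's BitSets exactly as in the AndFilterExecuterImpl change above.

import java.util.BitSet;

final class PagePruneSketch {
  static BitSet prunePages(int[] pageMin, int[] pageMax, int filterValue) {
    BitSet pages = new BitSet(pageMin.length);
    for (int p = 0; p < pageMin.length; p++) {
      if (filterValue >= pageMin[p] && filterValue <= pageMax[p]) {
        pages.set(p); // page may contain matches -> decompress and scan it
      }
    }
    return pages;     // unset pages are never decompressed
  }

  public static void main(String[] args) {
    System.out.println(prunePages(new int[] {0, 100, 200}, new int[] {99, 199, 299}, 150));
    // prints {1}: only the middle page can contain value 150
  }
}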
[2/3] carbondata git commit: [CARBONDATA-3012] Added support for full scan queries for vector direct fill.
http://git-wip-us.apache.org/repos/asf/carbondata/blob/3d3b6ff1/core/src/main/java/org/apache/carbondata/core/datastore/page/VarLengthColumnPageBase.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/datastore/page/VarLengthColumnPageBase.java b/core/src/main/java/org/apache/carbondata/core/datastore/page/VarLengthColumnPageBase.java index 39b8282..a760b64 100644 --- a/core/src/main/java/org/apache/carbondata/core/datastore/page/VarLengthColumnPageBase.java +++ b/core/src/main/java/org/apache/carbondata/core/datastore/page/VarLengthColumnPageBase.java @@ -124,8 +124,9 @@ public abstract class VarLengthColumnPageBase extends ColumnPage { /** * Create a new column page for decimal page */ - static ColumnPage newDecimalColumnPage(TableSpec.ColumnSpec columnSpec, byte[] lvEncodedBytes, - String compressorName) throws MemoryException { + static ColumnPage newDecimalColumnPage(ColumnPageEncoderMeta meta, + byte[] lvEncodedBytes) throws MemoryException { +TableSpec.ColumnSpec columnSpec = meta.getColumnSpec(); DecimalConverterFactory.DecimalConverter decimalConverter = DecimalConverterFactory.INSTANCE.getDecimalConverter(columnSpec.getPrecision(), columnSpec.getScale()); @@ -133,10 +134,10 @@ public abstract class VarLengthColumnPageBase extends ColumnPage { if (size < 0) { return getLVBytesColumnPage(columnSpec, lvEncodedBytes, DataTypes.createDecimalType(columnSpec.getPrecision(), columnSpec.getScale()), - CarbonCommonConstants.INT_SIZE_IN_BYTE, compressorName); + CarbonCommonConstants.INT_SIZE_IN_BYTE, meta.getCompressorName()); } else { // Here the size is always fixed. - return getDecimalColumnPage(columnSpec, lvEncodedBytes, size, compressorName); + return getDecimalColumnPage(meta, lvEncodedBytes, size); } } @@ -158,8 +159,10 @@ public abstract class VarLengthColumnPageBase extends ColumnPage { lvLength, compressorName); } - private static ColumnPage getDecimalColumnPage(TableSpec.ColumnSpec columnSpec, - byte[] lvEncodedBytes, int size, String compressorName) throws MemoryException { + private static ColumnPage getDecimalColumnPage(ColumnPageEncoderMeta meta, + byte[] lvEncodedBytes, int size) throws MemoryException { +TableSpec.ColumnSpec columnSpec = meta.getColumnSpec(); +String compressorName = meta.getCompressorName(); TableSpec.ColumnSpec spec = TableSpec.ColumnSpec .newInstance(columnSpec.getFieldName(), DataTypes.INT, ColumnType.MEASURE); ColumnPage rowOffset = ColumnPage.newPage( @@ -176,7 +179,7 @@ public abstract class VarLengthColumnPageBase extends ColumnPage { rowOffset.putInt(counter, offset); VarLengthColumnPageBase page; -if (unsafe) { +if (isUnsafeEnabled(meta)) { page = new UnsafeDecimalColumnPage( new ColumnPageEncoderMeta(columnSpec, columnSpec.getSchemaDataType(), compressorName), rowId); http://git-wip-us.apache.org/repos/asf/carbondata/blob/3d3b6ff1/core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/ColumnPageDecoder.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/ColumnPageDecoder.java b/core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/ColumnPageDecoder.java index 4e491c5..d82a873 100644 --- a/core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/ColumnPageDecoder.java +++ b/core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/ColumnPageDecoder.java @@ -18,9 +18,11 @@ package org.apache.carbondata.core.datastore.page.encoding; import java.io.IOException; +import java.util.BitSet; import 
org.apache.carbondata.core.datastore.page.ColumnPage; import org.apache.carbondata.core.memory.MemoryException; +import org.apache.carbondata.core.scan.result.vector.ColumnVectorInfo; public interface ColumnPageDecoder { @@ -29,6 +31,12 @@ public interface ColumnPageDecoder { */ ColumnPage decode(byte[] input, int offset, int length) throws MemoryException, IOException; + /** + * Apply decoding algorithm on input byte array and fill the vector here. + */ + void decodeAndFillVector(byte[] input, int offset, int length, ColumnVectorInfo vectorInfo, + BitSet nullBits, boolean isLVEncoded) throws MemoryException, IOException; + ColumnPage decode(byte[] input, int offset, int length, boolean isLVEncoded) throws MemoryException, IOException; } http://git-wip-us.apache.org/repos/asf/carbondata/blob/3d3b6ff1/core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/ColumnPageEncoderMeta.java -- diff --git
[1/3] carbondata git commit: [CARBONDATA-3012] Added support for full scan queries for vector direct fill.
Repository: carbondata Updated Branches: refs/heads/master e0baa9b9f -> 3d3b6ff16 http://git-wip-us.apache.org/repos/asf/carbondata/blob/3d3b6ff1/core/src/main/java/org/apache/carbondata/core/scan/result/vector/CarbonColumnarBatch.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/scan/result/vector/CarbonColumnarBatch.java b/core/src/main/java/org/apache/carbondata/core/scan/result/vector/CarbonColumnarBatch.java index 803715c..471f9b2 100644 --- a/core/src/main/java/org/apache/carbondata/core/scan/result/vector/CarbonColumnarBatch.java +++ b/core/src/main/java/org/apache/carbondata/core/scan/result/vector/CarbonColumnarBatch.java @@ -56,7 +56,9 @@ public class CarbonColumnarBatch { actualSize = 0; rowCounter = 0; rowsFiltered = 0; -Arrays.fill(filteredRows, false); +if (filteredRows != null) { + Arrays.fill(filteredRows, false); +} for (int i = 0; i < columnVectors.length; i++) { columnVectors[i].reset(); } http://git-wip-us.apache.org/repos/asf/carbondata/blob/3d3b6ff1/core/src/main/java/org/apache/carbondata/core/scan/result/vector/CarbonDictionary.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/scan/result/vector/CarbonDictionary.java b/core/src/main/java/org/apache/carbondata/core/scan/result/vector/CarbonDictionary.java index 50d2ac5..2147c43 100644 --- a/core/src/main/java/org/apache/carbondata/core/scan/result/vector/CarbonDictionary.java +++ b/core/src/main/java/org/apache/carbondata/core/scan/result/vector/CarbonDictionary.java @@ -27,4 +27,6 @@ public interface CarbonDictionary { void setDictionaryUsed(); byte[] getDictionaryValue(int index); + + byte[][] getAllDictionaryValues(); } http://git-wip-us.apache.org/repos/asf/carbondata/blob/3d3b6ff1/core/src/main/java/org/apache/carbondata/core/scan/result/vector/ColumnVectorInfo.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/scan/result/vector/ColumnVectorInfo.java b/core/src/main/java/org/apache/carbondata/core/scan/result/vector/ColumnVectorInfo.java index 59117dd..d127728 100644 --- a/core/src/main/java/org/apache/carbondata/core/scan/result/vector/ColumnVectorInfo.java +++ b/core/src/main/java/org/apache/carbondata/core/scan/result/vector/ColumnVectorInfo.java @@ -16,7 +16,10 @@ */ package org.apache.carbondata.core.scan.result.vector; +import java.util.BitSet; + import org.apache.carbondata.core.keygenerator.directdictionary.DirectDictionaryGenerator; +import org.apache.carbondata.core.metadata.datatype.DecimalConverterFactory; import org.apache.carbondata.core.scan.filter.GenericQueryType; import org.apache.carbondata.core.scan.model.ProjectionDimension; import org.apache.carbondata.core.scan.model.ProjectionMeasure; @@ -32,6 +35,8 @@ public class ColumnVectorInfo implements Comparable { public DirectDictionaryGenerator directDictionaryGenerator; public MeasureDataVectorProcessor.MeasureVectorFiller measureVectorFiller; public GenericQueryType genericQueryType; + public BitSet deletedRows; + public DecimalConverterFactory.DecimalConverter decimalConverter; @Override public int compareTo(ColumnVectorInfo o) { return ordinal - o.ordinal; http://git-wip-us.apache.org/repos/asf/carbondata/blob/3d3b6ff1/core/src/main/java/org/apache/carbondata/core/scan/result/vector/impl/CarbonColumnVectorImpl.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/scan/result/vector/impl/CarbonColumnVectorImpl.java b/core/src/main/java/org/apache/carbondata/core/scan/result/vector/impl/CarbonColumnVectorImpl.java index f8f663f..5dfd6ca 100644 --- 
a/core/src/main/java/org/apache/carbondata/core/scan/result/vector/impl/CarbonColumnVectorImpl.java +++ b/core/src/main/java/org/apache/carbondata/core/scan/result/vector/impl/CarbonColumnVectorImpl.java @@ -146,7 +146,7 @@ public class CarbonColumnVectorImpl implements CarbonColumnVector { } } - @Override public void putBytes(int rowId, byte[] value) { + @Override public void putByteArray(int rowId, byte[] value) { bytes[rowId] = value; } @@ -160,7 +160,7 @@ public class CarbonColumnVectorImpl implements CarbonColumnVector { } } - @Override public void putBytes(int rowId, int offset, int length, byte[] value) { + @Override public void putByteArray(int rowId, int offset, int length, byte[] value) { bytes[rowId] = new byte[length]; System.arraycopy(value, offset, bytes[rowId], 0, length); } @@ -227,6 +227,31 @@ public class CarbonColumnVectorImpl implements CarbonColumnVector { } } + public Object getDataArray() { +if (dataType == DataTypes.BOOLEAN || dataType == DataTypes.BYTE) { + return byteArr;
[3/3] carbondata git commit: [CARBONDATA-3012] Added support for full scan queries for vector direct fill.
[CARBONDATA-3012] Added support for full scan queries for vector direct fill. After decompressing a page in our V3 reader we can immediately fill the data into a vector without any condition checks inside the loops. So here the complete column page data is set into the column vector in a single batch and handed back to Spark/Presto. For this purpose, a new method is added in ColumnPageDecoder: ColumnPage decodeAndFillVector(byte[] input, int offset, int length, ColumnVectorInfo vectorInfo, BitSet nullBits, boolean isLVEncoded) It takes the vector and fills it in a single loop without any checks inside the loop. A new method was also added to DimensionDataChunkStore: void fillVector(int[] invertedIndex, int[] invertedIndexReverse, byte[] data, ColumnVectorInfo vectorInfo); which likewise fills the vector in a single loop without any checks inside the loop. This closes #2818 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/3d3b6ff1 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/3d3b6ff1 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/3d3b6ff1 Branch: refs/heads/master Commit: 3d3b6ff1615e08131f6bcaea23dec0116a18081d Parents: e0baa9b Author: ravipesala Authored: Tue Oct 16 11:30:43 2018 +0530 Committer: kumarvishal09 Committed: Thu Oct 25 22:24:24 2018 +0530 -- .../chunk/impl/DimensionRawColumnChunk.java | 17 ++ .../impl/FixedLengthDimensionColumnPage.java| 29 +- .../chunk/impl/MeasureRawColumnChunk.java | 17 ++ .../impl/VariableLengthDimensionColumnPage.java | 29 +- .../reader/DimensionColumnChunkReader.java | 7 + .../chunk/reader/MeasureColumnChunkReader.java | 7 + .../reader/dimension/AbstractChunkReader.java | 11 + ...essedDimChunkFileBasedPageLevelReaderV3.java | 2 +- ...mpressedDimensionChunkFileBasedReaderV3.java | 78 +++-- .../measure/AbstractMeasureChunkReader.java | 12 + ...CompressedMeasureChunkFileBasedReaderV3.java | 45 ++- ...essedMsrChunkFileBasedPageLevelReaderV3.java | 6 +- .../chunk/store/DimensionChunkStoreFactory.java | 16 +- .../chunk/store/DimensionDataChunkStore.java| 7 + .../impl/LocalDictDimensionDataChunkStore.java | 25 ++ .../safe/AbstractNonDictionaryVectorFiller.java | 282 ++ .../SafeFixedLengthDimensionDataChunkStore.java | 51 +++- ...feVariableLengthDimensionDataChunkStore.java | 17 +- .../UnsafeAbstractDimensionDataChunkStore.java | 6 + .../datastore/columnar/BlockIndexerStorage.java | 5 +- .../BlockIndexerStorageForNoDictionary.java | 3 +- .../columnar/BlockIndexerStorageForShort.java | 3 +- .../core/datastore/columnar/UnBlockIndexer.java | 3 + .../core/datastore/impl/FileReaderImpl.java | 1 + .../core/datastore/page/ColumnPage.java | 130 .../page/ColumnPageValueConverter.java | 3 + .../datastore/page/SafeDecimalColumnPage.java | 25 ++ .../datastore/page/VarLengthColumnPageBase.java | 17 +- .../page/encoding/ColumnPageDecoder.java| 8 + .../page/encoding/ColumnPageEncoderMeta.java| 11 + .../page/encoding/EncodingFactory.java | 44 ++- .../adaptive/AdaptiveDeltaFloatingCodec.java| 82 + .../adaptive/AdaptiveDeltaIntegralCodec.java| 194 +++- .../adaptive/AdaptiveFloatingCodec.java | 84 +- .../adaptive/AdaptiveIntegralCodec.java | 157 ++ .../encoding/compress/DirectCompressCodec.java | 170 ++- .../datastore/page/encoding/rle/RLECodec.java | 9 + .../DateDirectDictionaryGenerator.java | 2 +- .../datatype/DecimalConverterFactory.java | 91 +- .../carbondata/core/mutate/DeleteDeltaVo.java | 4 + .../DictionaryBasedVectorResultCollector.java | 112 +--
.../executor/impl/AbstractQueryExecutor.java| 13 + .../scan/executor/infos/BlockExecutionInfo.java | 13 + .../core/scan/executor/util/QueryUtil.java | 2 +- .../carbondata/core/scan/model/QueryModel.java | 6 +- .../core/scan/result/BlockletScannedResult.java | 76 - .../scan/result/vector/CarbonColumnVector.java | 18 +- .../scan/result/vector/CarbonColumnarBatch.java | 4 +- .../scan/result/vector/CarbonDictionary.java| 2 + .../scan/result/vector/ColumnVectorInfo.java| 5 + .../vector/impl/CarbonColumnVectorImpl.java | 67 - .../vector/impl/CarbonDictionaryImpl.java | 3 + .../scan/scanner/impl/BlockletFullScanner.java | 4 +- .../core/stats/QueryStatisticsModel.java| 13 + .../apache/carbondata/core/util/ByteUtil.java | 8 + .../executer/IncludeFilterExecuterImplTest.java | 6 +- .../carbondata/core/util/CarbonUtilTest.java| 2 +- .../presto/CarbonColumnVectorWrapper.java | 65 +++- .../presto/readers/SliceStreamReader.java | 4 +- .../filterexpr
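A minimal sketch (values hypothetical) of the direct-fill idea behind decodeAndFillVector: decoding and vector filling happen together in one tight pass over the whole page, with no per-row staging or condition checks, here shown for an adaptively encoded floating-point page stored as ints with a decode factor.

final class DirectFillSketch {
  public static void main(String[] args) {
    int[] encodedPage = {105, 210, 315}; // adaptive-encoded page data
    double factor = 100.0;               // decode factor from the page metadata
    double[] vector = new double[encodedPage.length];
    // Single loop over the full page: decode and fill in one pass.
    for (int i = 0; i < encodedPage.length; i++) {
      vector[i] = encodedPage[i] / factor;
    }
    System.out.println(java.util.Arrays.toString(vector)); // [1.05, 2.1, 3.15]
  }
}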
carbondata git commit: [CARBONDATA-3011] Add carbon property to configure vector based row pruning push down
Repository: carbondata Updated Branches: refs/heads/master 9578786b2 -> de6e98b08 [CARBONDATA-3011] Add carbon property to configure vector based row pruning push down Added below configuration in carbon to enable or disable row filter push down for vector. carbon.push.rowfilters.for.vector When enabled complete row filters will be handled by carbon in case of vector. If it is disabled then only page level pruning will be done by carbon and row level filtering will be done by spark for vector. There is no change in flow for non-vector based queries. Default value is true This closes #2818 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/de6e98b0 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/de6e98b0 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/de6e98b0 Branch: refs/heads/master Commit: de6e98b085723811b0894e659c3c4ce9770f7ca2 Parents: 9578786 Author: ravipesala Authored: Tue Oct 16 10:32:18 2018 +0530 Committer: kumarvishal09 Committed: Thu Oct 25 17:28:29 2018 +0530 -- .../core/constants/CarbonCommonConstants.java | 12 +++ .../carbondata/core/scan/model/QueryModel.java | 13 .../carbondata/core/util/CarbonProperties.java | 8 ++ .../carbondata/spark/rdd/CarbonScanRDD.scala| 17 +++- .../strategy/CarbonLateDecodeStrategy.scala | 82 +--- 5 files changed, 120 insertions(+), 12 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/de6e98b0/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java index fa5227b..72da3bd 100644 --- a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java +++ b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java @@ -1725,6 +1725,18 @@ public final class CarbonCommonConstants { */ public static final String CARBON_WRITTEN_BY_APPNAME = "carbon.writtenby.app.name"; + /** + * When enabled complete row filters will be handled by carbon in case of vector. + * If it is disabled then only page level pruning will be done by carbon and row level filtering + * will be done by spark for vector. + * There is no change in flow for non-vector based queries. 
+ */ + @CarbonProperty + public static final String CARBON_PUSH_ROW_FILTERS_FOR_VECTOR = + "carbon.push.rowfilters.for.vector"; + + public static final String CARBON_PUSH_ROW_FILTERS_FOR_VECTOR_DEFAULT = "true"; + // // Unused constants and parameters start here // http://git-wip-us.apache.org/repos/asf/carbondata/blob/de6e98b0/core/src/main/java/org/apache/carbondata/core/scan/model/QueryModel.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/scan/model/QueryModel.java b/core/src/main/java/org/apache/carbondata/core/scan/model/QueryModel.java index d90c35e..0951da0 100644 --- a/core/src/main/java/org/apache/carbondata/core/scan/model/QueryModel.java +++ b/core/src/main/java/org/apache/carbondata/core/scan/model/QueryModel.java @@ -124,6 +124,11 @@ public class QueryModel { private boolean preFetchData = true; + /** + * It fills the vector directly from decoded column page with out any staging and conversions + */ + private boolean isDirectVectorFill; + private QueryModel(CarbonTable carbonTable) { tableBlockInfos = new ArrayList(); invalidSegmentIds = new ArrayList<>(); @@ -406,6 +411,14 @@ public class QueryModel { this.preFetchData = preFetchData; } + public boolean isDirectVectorFill() { +return isDirectVectorFill; + } + + public void setDirectVectorFill(boolean directVectorFill) { +isDirectVectorFill = directVectorFill; + } + @Override public String toString() { return String.format("scan on table %s.%s, %d projection columns with filter (%s)", http://git-wip-us.apache.org/repos/asf/carbondata/blob/de6e98b0/core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java b/core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java index e6d48e5..49d89e7 100644 --- a/core/src/main/java/org/apache
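A usage sketch for the new property (the addProperty API appears verbatim in test code later in this digest): setting it to "false" makes carbon do only page-level pruning for vector reads and leaves row-level filtering to Spark.

import org.apache.carbondata.core.util.CarbonProperties;

public final class RowFilterPushDownSetup {
  public static void main(String[] args) {
    // "false": carbon prunes pages only; Spark applies the row-level filter.
    CarbonProperties.getInstance()
        .addProperty("carbon.push.rowfilters.for.vector", "false");
  }
}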
carbondata git commit: [CARBONDATA-2594] Do not add InvertedIndex in Encoding list for non-sort dimension column #2768
Repository: carbondata Updated Branches: refs/heads/master 8fbd4a5f5 -> 18fbdfc40 [CARBONDATA-2594] Do not add InvertedIndex in Encoding list for non-sort dimension column #2768 Do not add InvertedIndex to the Encoding list for non-sort dimension columns. This closes #2768 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/18fbdfc4 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/18fbdfc4 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/18fbdfc4 Branch: refs/heads/master Commit: 18fbdfc409dc14812c9f384c437a793e9293b32b Parents: 8fbd4a5 Author: Jacky Li Authored: Wed Sep 26 21:31:35 2018 +0800 Committer: kumarvishal09 Committed: Thu Oct 4 16:57:57 2018 +0530 -- .../carbondata/core/metadata/schema/table/TableSchemaBuilder.java | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/18fbdfc4/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/TableSchemaBuilder.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/TableSchemaBuilder.java b/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/TableSchemaBuilder.java index f1be5ca..b5ce725 100644 --- a/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/TableSchemaBuilder.java +++ b/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/TableSchemaBuilder.java @@ -224,7 +224,7 @@ public class TableSchemaBuilder { } } } -if (newColumn.isDimensionColumn()) { +if (newColumn.isDimensionColumn() && newColumn.isSortColumn()) { newColumn.setUseInvertedIndex(true); } if (field.getDataType().isComplexType()) {
carbondata git commit: [HOTFIX] Fixed S3 metrics issue.
Repository: carbondata Updated Branches: refs/heads/master 2081bc87a -> 7d1fcb309 [HOTFIX] Fixed S3 metrics issue. Problem: When data is read from S3, the reported bytes read exceed the total size of the carbondata files. Reason: carbondata uses dataInputStream.skip, which the S3 interface cannot handle properly; it reads in a loop and reads more data than required. Solution: Use FSDataInputStream.seek instead of skip to fix this issue. This closes #2789 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/7d1fcb30 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/7d1fcb30 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/7d1fcb30 Branch: refs/heads/master Commit: 7d1fcb3092a1e9da6c49f17c63c6217892e9e531 Parents: 2081bc8 Author: ravipesala Authored: Fri Sep 28 18:29:08 2018 +0530 Committer: kumarvishal09 Committed: Wed Oct 3 16:08:49 2018 +0530 -- .../datastore/filesystem/AbstractDFSCarbonFile.java | 7 +-- .../apache/carbondata/core/reader/ThriftReader.java | 16 ++-- 2 files changed, 11 insertions(+), 12 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/7d1fcb30/core/src/main/java/org/apache/carbondata/core/datastore/filesystem/AbstractDFSCarbonFile.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/datastore/filesystem/AbstractDFSCarbonFile.java b/core/src/main/java/org/apache/carbondata/core/datastore/filesystem/AbstractDFSCarbonFile.java index b1e476b..c764430 100644 --- a/core/src/main/java/org/apache/carbondata/core/datastore/filesystem/AbstractDFSCarbonFile.java +++ b/core/src/main/java/org/apache/carbondata/core/datastore/filesystem/AbstractDFSCarbonFile.java @@ -327,8 +327,11 @@ public abstract class AbstractDFSCarbonFile implements CarbonFile { CompressionCodec codec = new CompressionCodecFactory(hadoopConf).getCodecByName(codecName); inputStream = codec.createInputStream(inputStream); } - -return new DataInputStream(new BufferedInputStream(inputStream)); +if (bufferSize <= 0 && inputStream instanceof FSDataInputStream) { + return (DataInputStream) inputStream; +} else { + return new DataInputStream(new BufferedInputStream(inputStream)); +} } /** http://git-wip-us.apache.org/repos/asf/carbondata/blob/7d1fcb30/core/src/main/java/org/apache/carbondata/core/reader/ThriftReader.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/reader/ThriftReader.java b/core/src/main/java/org/apache/carbondata/core/reader/ThriftReader.java index 48d8345..f5ecda6 100644 --- a/core/src/main/java/org/apache/carbondata/core/reader/ThriftReader.java +++ b/core/src/main/java/org/apache/carbondata/core/reader/ThriftReader.java @@ -25,6 +25,7 @@ import org.apache.carbondata.core.datastore.impl.FileFactory; import org.apache.carbondata.core.util.CarbonUtil; import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FSDataInputStream; import org.apache.thrift.TBase; import org.apache.thrift.TException; import org.apache.thrift.protocol.TCompactProtocol; @@ -36,10 +37,6 @@ import org.apache.thrift.transport.TIOStreamTransport; */ public class ThriftReader { /** - * buffer size - */ - private static final int bufferSize = 2048; - /** * File containing the objects. */ private String fileName; @@ -101,7 +98,7 @@ public class ThriftReader { public void open() throws IOException { Configuration conf = configuration != null ?
configuration : FileFactory.getConfiguration(); FileFactory.FileType fileType = FileFactory.getFileType(fileName); -dataInputStream = FileFactory.getDataInputStream(fileName, fileType, bufferSize, conf); +dataInputStream = FileFactory.getDataInputStream(fileName, fileType, conf); binaryIn = new TCompactProtocol(new TIOStreamTransport(dataInputStream)); } @@ -109,7 +106,9 @@ public class ThriftReader { * This method will set the position of stream from where data has to be read */ public void setReadOffset(long bytesToSkip) throws IOException { -if (dataInputStream.skip(bytesToSkip) != bytesToSkip) { +if (dataInputStream instanceof FSDataInputStream) { + ((FSDataInputStream)dataInputStream).seek(bytesToSkip); +} else if (dataInputStream.skip(bytesToSkip) != bytesToSkip) { throw new IOException("It doesn't set the offset properly"); } } @@ -118,10 +117,7 @@ public class ThriftReader { * Checks if another objects is available by attempting to read another byte from the stream. */ public boolean hasNext() throws IOException { -
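The fixed offset logic above is small enough to restate as a self-contained sketch; it mirrors the ThriftReader change: a single seek() call when the stream is an FSDataInputStream (instead of S3's looped skip reads), with the verified skip() kept as the fallback for other streams.

import java.io.DataInputStream;
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;

final class OffsetSketch {
  // Mirrors ThriftReader.setReadOffset after the fix.
  static void setReadOffset(DataInputStream dataInputStream, long bytesToSkip) throws IOException {
    if (dataInputStream instanceof FSDataInputStream) {
      // One positioning call; no repeated read-and-discard loops on S3.
      ((FSDataInputStream) dataInputStream).seek(bytesToSkip);
    } else if (dataInputStream.skip(bytesToSkip) != bytesToSkip) {
      throw new IOException("It doesn't set the offset properly");
    }
  }
}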
carbondata git commit: [CARBONDATA-2978] Fixed JVM crash issue when insert into carbon table from other carbon table
Repository: carbondata Updated Branches: refs/heads/master c01636163 -> 9ae91cc5a [CARBONDATA-2978] Fixed JVM crash issue when insert into carbon table from other carbon table Problem: When data is inserted from one carbon table into another carbon table and unsafe load and unsafe query are enabled, the JVM crashes. Reason: When an insert happens from one carbon table to another, the read and the write use the same Spark task and thread, so they get the same task id; at task completion the unsafe memory manager releases all memory acquired by that task even though the load is still using it. Solution: Check the registered listeners and skip the cache clearing. This closes #2773 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/9ae91cc5 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/9ae91cc5 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/9ae91cc5 Branch: refs/heads/master Commit: 9ae91cc5a9d683ef54550cfe7e65c4d63d5e5a24 Parents: c016361 Author: ravipesala Authored: Wed Sep 26 23:04:59 2018 +0530 Committer: kumarvishal09 Committed: Fri Sep 28 19:51:06 2018 +0530 -- .../hadoop/api/CarbonTableOutputFormat.java | 35 + .../InsertIntoNonCarbonTableTestCase.scala | 79 +++- .../carbondata/spark/rdd/CarbonScanRDD.scala| 76 --- .../rdd/InsertTaskCompletionListener.scala | 4 +- .../spark/rdd/QueryTaskCompletionListener.scala | 4 +- .../datasources/SparkCarbonFileFormat.scala | 23 +- .../CarbonTaskCompletionListener.scala | 72 ++ 7 files changed, 246 insertions(+), 47 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/9ae91cc5/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java -- diff --git a/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java b/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java index 28817e9..762983b 100644 --- a/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java +++ b/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java @@ -424,6 +424,8 @@ public class CarbonTableOutputFormat extends FileOutputFormat http://git-wip-us.apache.org/repos/asf/carbondata/blob/9ae91cc5/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/insertQuery/InsertIntoNonCarbonTableTestCase.scala -- diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/insertQuery/InsertIntoNonCarbonTableTestCase.scala b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/insertQuery/InsertIntoNonCarbonTableTestCase.scala index a745672..a3fb11c 100644 --- a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/insertQuery/InsertIntoNonCarbonTableTestCase.scala +++ b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/insertQuery/InsertIntoNonCarbonTableTestCase.scala @@ -18,10 +18,13 @@ */ package org.apache.carbondata.spark.testsuite.insertQuery -import org.apache.spark.sql.Row +import org.apache.spark.sql.{Row, SaveMode} import org.apache.spark.sql.test.util.QueryTest import org.scalatest.BeforeAndAfterAll +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.util.CarbonProperties + class InsertIntoNonCarbonTableTestCase extends QueryTest with BeforeAndAfterAll { override def beforeAll { @@ -64,6 +67,8 @@ class InsertIntoNonCarbonTableTestCase extends QueryTest with BeforeAndAfterAll
"Latest_webTypeDataVerNumber,Latest_operatorsVersion,Latest_phonePADPartitionedVersions," + "Latest_operatorId,gamePointDescription,gamePointId,contractNumber', " + "'bad_records_logger_enable'='false','bad_records_action'='FORCE')") + CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_UNSAFE_IN_QUERY_EXECUTION, "true") + CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_UNSAFE_COLUMN_PAGE, "true") } test("insert into hive") { @@ -102,7 +107,79 @@ class InsertIntoNonCarbonTableTestCase extends QueryTest with BeforeAndAfterAll sql("drop table thive_cond") } + test("jvm crash when insert data from datasource table to session table") { +val spark = sqlContext.sparkSession +import spark.implicits._ + +import scala.util.Random +val r = new Random() +val df = spark.sparkContext.parallelize(1 to 10) + .map(x => (r.nextInt(10), "n
carbondata git commit: [CARBONDATA-2970]lock object creation fix for viewFS
Repository: carbondata Updated Branches: refs/heads/master 5d17ff40b -> 1b4109d5b [CARBONDATA-2970] Lock object creation fix for viewFS Problem: When the default FS is set to ViewFS, drop table and data load fail with exceptions saying a lock (e.g. meta.lock, tablestatus.lock) could not be acquired. This is because, when getting the lock type object, we were not checking for viewfs; the path was treated as a local file system and lock acquisition failed. Solution: Also check for viewFS when resolving the lock object. This closes #2762 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/1b4109d5 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/1b4109d5 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/1b4109d5 Branch: refs/heads/master Commit: 1b4109d5b2badc0c10d5522502bd799c6325263c Parents: 5d17ff4 Author: akashrn5 Authored: Tue Sep 25 18:59:04 2018 +0530 Committer: kumarvishal09 Committed: Thu Sep 27 16:46:11 2018 +0530 -- .../java/org/apache/carbondata/core/locks/CarbonLockFactory.java | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/1b4109d5/core/src/main/java/org/apache/carbondata/core/locks/CarbonLockFactory.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/locks/CarbonLockFactory.java b/core/src/main/java/org/apache/carbondata/core/locks/CarbonLockFactory.java index 91677a6..79bad6c 100644 --- a/core/src/main/java/org/apache/carbondata/core/locks/CarbonLockFactory.java +++ b/core/src/main/java/org/apache/carbondata/core/locks/CarbonLockFactory.java @@ -71,7 +71,8 @@ public class CarbonLockFactory { lockTypeConfigured = CarbonCommonConstants.CARBON_LOCK_TYPE_S3; return new S3FileLock(absoluteLockPath, lockFile); -} else if (absoluteLockPath.startsWith(CarbonCommonConstants.HDFSURL_PREFIX)) { +} else if (absoluteLockPath.startsWith(CarbonCommonConstants.HDFSURL_PREFIX) || absoluteLockPath +.startsWith(CarbonCommonConstants.VIEWFSURL_PREFIX)) { lockTypeConfigured = CarbonCommonConstants.CARBON_LOCK_TYPE_HDFS; return new HdfsFileLock(absoluteLockPath, lockFile); } else {
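A self-contained sketch of the prefix dispatch after the fix, with the constants inlined as plain illustrative strings: a viewfs:// lock path now resolves to the HDFS lock implementation instead of falling through to the local-file lock and failing to acquire.

final class LockDispatchSketch {
  static String lockTypeFor(String absoluteLockPath) {
    if (absoluteLockPath.startsWith("hdfs://") || absoluteLockPath.startsWith("viewfs://")) {
      return "HDFS lock";
    }
    return "local file lock";
  }

  public static void main(String[] args) {
    // Before the fix the viewfs path took the local branch and lock acquisition failed.
    System.out.println(lockTypeFor("viewfs://cluster/warehouse/db/t/meta.lock")); // HDFS lock
  }
}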
carbondata git commit: [CARBONDATA-2968] Single pass load fails 2nd time in Spark submit execution due to port binding error
Repository: carbondata Updated Branches: refs/heads/master e07df44a1 -> 13ecc9e7a

[CARBONDATA-2968] Single pass load fails 2nd time in Spark submit execution due to port binding error

Problem: In a secure cluster setup, single pass load fails in spark-submit after using beeline.
Solution: It was happening because the port variable was not getting updated, so the server never tried the next free port. Modified that part and added a log to display the port number.

This closes #2760

Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/13ecc9e7
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/13ecc9e7
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/13ecc9e7
Branch: refs/heads/master
Commit: 13ecc9e7a0a42ebf2f8417814c20474f3ce489f1
Parents: e07df44
Author: shardul-cr7
Authored: Tue Sep 25 19:55:19 2018 +0530
Committer: kumarvishal09
Committed: Wed Sep 26 14:16:21 2018 +0530

--
.../core/dictionary/server/NonSecureDictionaryServer.java | 3 ++-
.../spark/dictionary/server/SecureDictionaryServer.java| 6 --
2 files changed, 6 insertions(+), 3 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/carbondata/blob/13ecc9e7/core/src/main/java/org/apache/carbondata/core/dictionary/server/NonSecureDictionaryServer.java
--
diff --git a/core/src/main/java/org/apache/carbondata/core/dictionary/server/NonSecureDictionaryServer.java b/core/src/main/java/org/apache/carbondata/core/dictionary/server/NonSecureDictionaryServer.java index 95f3d69..dc2d211 100644 --- a/core/src/main/java/org/apache/carbondata/core/dictionary/server/NonSecureDictionaryServer.java +++ b/core/src/main/java/org/apache/carbondata/core/dictionary/server/NonSecureDictionaryServer.java @@ -109,6 +109,7 @@ public class NonSecureDictionaryServer extends AbstractDictionaryServer }); bootstrap.childOption(ChannelOption.SO_KEEPALIVE, true); String hostToBind = findLocalIpAddress(LOGGER); +//iteratively listening to newports InetSocketAddress address = hostToBind == null ?
new InetSocketAddress(newPort) : new InetSocketAddress(hostToBind, newPort); @@ -119,7 +120,7 @@ public class NonSecureDictionaryServer extends AbstractDictionaryServer this.host = hostToBind; break; } catch (Exception e) { -LOGGER.error(e, "Dictionary Server Failed to bind to port:"); +LOGGER.error(e, "Dictionary Server Failed to bind to port:" + newPort); if (i == 9) { throw new RuntimeException("Dictionary Server Could not bind to any port"); } http://git-wip-us.apache.org/repos/asf/carbondata/blob/13ecc9e7/integration/spark-common/src/main/java/org/apache/carbondata/spark/dictionary/server/SecureDictionaryServer.java -- diff --git a/integration/spark-common/src/main/java/org/apache/carbondata/spark/dictionary/server/SecureDictionaryServer.java b/integration/spark-common/src/main/java/org/apache/carbondata/spark/dictionary/server/SecureDictionaryServer.java index f4948c4..995e520 100644 --- a/integration/spark-common/src/main/java/org/apache/carbondata/spark/dictionary/server/SecureDictionaryServer.java +++ b/integration/spark-common/src/main/java/org/apache/carbondata/spark/dictionary/server/SecureDictionaryServer.java @@ -143,14 +143,16 @@ public class SecureDictionaryServer extends AbstractDictionaryServer implements TransportServerBootstrap bootstrap = new SaslServerBootstrap(transportConf, securityManager); String host = findLocalIpAddress(LOGGER); -context.createServer(host, port, Lists.newArrayList(bootstrap)); +//iteratively listening to newports +context +.createServer(host, newPort, Lists.newArrayList(bootstrap)); LOGGER.audit("Dictionary Server started, Time spent " + (System.currentTimeMillis() - start) + " Listening on port " + newPort); this.port = newPort; this.host = host; break; } catch (Exception e) { -LOGGER.error(e, "Dictionary Server Failed to bind to port:"); +LOGGER.error(e, "Dictionary Server Failed to bind to port: " + newPort); if (i == 9) { throw new RuntimeException("Dictionary Server Could not bind to any port"); }
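The retry loop in both servers amounts to: try to bind, and on failure move to the next port, up to a fixed number of attempts. A self-contained sketch of the same pattern with a plain ServerSocket instead of the Netty/SASL bootstraps used above:

import java.io.IOException;
import java.net.ServerSocket;

final class PortRetryBind {
  /** Try basePort, basePort+1, ... over `attempts` ports; return the bound socket. */
  static ServerSocket bindWithRetry(int basePort, int attempts) {
    for (int i = 0; i < attempts; i++) {
      int newPort = basePort + i; // the crucial part: the port actually changes each try
      try {
        ServerSocket socket = new ServerSocket(newPort);
        System.out.println("Server started, listening on port " + newPort);
        return socket;
      } catch (IOException e) {
        // include the port in the log so a bind failure is diagnosable
        System.err.println("Failed to bind to port: " + newPort);
      }
    }
    throw new RuntimeException("Could not bind to any port");
  }
}

The original bug is the degenerate form of this loop where newPort never changes, so every attempt collides with the same occupied port.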
carbondata git commit: [CARBONDATA-2962] Even after carbon files are copied to the target folder (local/hdfs), they are not deleted from the temp directory
Repository: carbondata Updated Branches: refs/heads/master 2ab2254be -> 49f67153a

[CARBONDATA-2962] Even after carbon files are copied to the target folder (local/hdfs), they are not deleted from the temp directory

Problem: Even after carbon files are copied to the target folder (local/hdfs), they are not deleted from the temp directory.
Solution: After copying carbon data and index files from the temp directory, delete those files.

This closes #2752

Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/49f67153
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/49f67153
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/49f67153
Branch: refs/heads/master
Commit: 49f67153a21e5a0cb5705adeb0f056eef4d3ed25
Parents: 2ab2254
Author: Indhumathi27
Authored: Mon Sep 24 12:28:47 2018 +0530
Committer: kumarvishal09
Committed: Wed Sep 26 12:35:24 2018 +0530

--
.../store/writer/AbstractFactDataWriter.java| 20 ++--
1 file changed, 14 insertions(+), 6 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/carbondata/blob/49f67153/processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java
--
diff --git a/processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java b/processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java index ad0e8e0..4afb3ef 100644 --- a/processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java +++ b/processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java @@ -270,12 +270,18 @@ public abstract class AbstractFactDataWriter implements CarbonFactDataWriter { notifyDataMapBlockEnd(); CarbonUtil.closeStreams(this.fileOutputStream, this.fileChannel); if (!enableDirectlyWriteData2Hdfs) { - if (copyInCurrentThread) { -CarbonUtil.copyCarbonDataFileToCarbonStorePath(carbonDataFileTempPath, -model.getCarbonDataDirectoryPath(), fileSizeInBytes); - } else { -executorServiceSubmitList.add(executorService.submit( -new CompleteHdfsBackendThread(carbonDataFileTempPath))); + try { +if (copyInCurrentThread) { + CarbonUtil.copyCarbonDataFileToCarbonStorePath(carbonDataFileTempPath, + model.getCarbonDataDirectoryPath(), fileSizeInBytes); + FileFactory + .deleteFile(carbonDataFileTempPath, FileFactory.getFileType(carbonDataFileTempPath)); +} else { + executorServiceSubmitList + .add(executorService.submit(new CompleteHdfsBackendThread(carbonDataFileTempPath))); +} + } catch (IOException e) { +LOGGER.error("Failed to delete carbondata file from temp location" + e.getMessage()); } } } @@ -405,6 +411,7 @@ public abstract class AbstractFactDataWriter implements CarbonFactDataWriter { CarbonUtil .copyCarbonDataFileToCarbonStorePath(indexFileName, model.getCarbonDataDirectoryPath(), fileSizeInBytes); + FileFactory.deleteFile(indexFileName, FileFactory.getFileType(indexFileName)); } } @@ -470,6 +477,7 @@ public abstract class AbstractFactDataWriter implements CarbonFactDataWriter { public Void call() throws Exception { CarbonUtil.copyCarbonDataFileToCarbonStorePath(fileName, model.getCarbonDataDirectoryPath(), fileSizeInBytes); + FileFactory.deleteFile(fileName, FileFactory.getFileType(fileName)); return null; } }
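The fix is the common copy-then-delete pattern: remove the temp file only after the copy to the store path has succeeded, and log rather than fail if cleanup goes wrong. A generic sketch with java.nio (the real code goes through CarbonUtil/FileFactory):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class CopyThenDelete {
  static void moveToStore(Path tempFile, Path storeDir) {
    try {
      // 1) copy the finished file out of the temp directory
      Files.copy(tempFile, storeDir.resolve(tempFile.getFileName()),
          StandardCopyOption.REPLACE_EXISTING);
      // 2) only then delete the temp copy, so a failed copy never loses data
      Files.delete(tempFile);
    } catch (IOException e) {
      // cleanup failures are logged, not fatal -- the data is already safe
      System.err.println("Failed to delete file from temp location: " + e.getMessage());
    }
  }
}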
carbondata git commit: [CARBONDATA-2960] SDK Reader fix with projection columns
Repository: carbondata Updated Branches: refs/heads/master e3eb03054 -> 786db2171 [CARBONDATA-2960] SDK Reader fix with projection columns SDK Reader was not working when all projection columns were given. Added exception for Complex child projections too. Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/786db217 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/786db217 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/786db217 Branch: refs/heads/master Commit: 786db217120e1d341e9a7d00ce9576dccd1d96af Parents: e3eb030 Author: Manish Nalla Authored: Fri Sep 21 19:24:01 2018 +0530 Committer: kumarvishal09 Committed: Tue Sep 25 12:38:52 2018 +0530 -- .../hadoop/api/CarbonInputFormat.java | 13 - ...tNonTransactionalCarbonTableForMapType.scala | 53 .../sdk/file/CarbonReaderBuilder.java | 8 +++ 3 files changed, 73 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/786db217/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java -- diff --git a/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java b/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java index 8183335..db93cbd 100644 --- a/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java +++ b/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java @@ -775,9 +775,20 @@ m filterExpression public String[] projectAllColumns(CarbonTable carbonTable) { List colList = carbonTable.getTableInfo().getFactTable().getListOfColumns(); List projectColumn = new ArrayList<>(); +// childCount will recursively count the number of children for any parent +// complex type and add just the parent column name while skipping the child columns. 
+int childDimCount = 0; for (ColumnSchema cols : colList) { if (cols.getSchemaOrdinal() != -1) { -projectColumn.add(cols.getColumnName()); +if (childDimCount == 0) { + projectColumn.add(cols.getColumnName()); +} +if (childDimCount > 0) { + childDimCount--; +} +if (cols.getDataType().isComplexType()) { + childDimCount += cols.getNumberOfChild(); +} } } String[] projectionColumns = new String[projectColumn.size()]; http://git-wip-us.apache.org/repos/asf/carbondata/blob/786db217/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTableForMapType.scala -- diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTableForMapType.scala b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTableForMapType.scala index a6bc224..b060ec1 100644 --- a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTableForMapType.scala +++ b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTableForMapType.scala @@ -20,15 +20,20 @@ package org.apache.carbondata.spark.testsuite.createTable import java.io.File import org.apache.commons.io.FileUtils +import org.apache.hadoop.conf.Configuration import org.apache.spark.sql.Row import org.apache.spark.sql.test.util.QueryTest import org.scalatest.BeforeAndAfterAll +import org.apache.carbondata.sdk.file.CarbonReader + /** * test cases for SDK complex map data type support */ class TestNonTransactionalCarbonTableForMapType extends QueryTest with BeforeAndAfterAll { + private val conf: Configuration = new Configuration(false) + private val nonTransactionalCarbonTable = new TestNonTransactionalCarbonTable private val writerPath = nonTransactionalCarbonTable.writerPath @@ -401,6 +406,54 @@ class TestNonTransactionalCarbonTableForMapType extends QueryTest with BeforeAnd dropSchema } + test("SDK Reader Without Projection Columns"){ +deleteDirectory(writerPath) +val mySchema = + """ +|{ +| "name": "address", +| "type": "record", +| "fields": [ +|{ +| "name": "name", +| "type": "string" +|}, +|{ +| "name": "age", +| "type": "
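The childDimCount bookkeeping above walks a flat column list in which every complex parent is immediately followed by its child columns; a counter of pending children lets the loop keep parent names and skip child entries. A standalone sketch of the same walk over a simplified column model (the Col record is illustrative, not the real ColumnSchema API):

import java.util.ArrayList;
import java.util.List;

final class ProjectAllColumns {
  /** Simplified stand-in for ColumnSchema: name + number of direct children. */
  record Col(String name, int numberOfChildren) {}

  static List<String> parentsOnly(List<Col> flatList) {
    List<String> projection = new ArrayList<>();
    int pendingChildren = 0; // children of a complex parent still to be skipped
    for (Col col : flatList) {
      if (pendingChildren == 0) {
        projection.add(col.name()); // a top-level (parent) column
      } else {
        pendingChildren--;          // a child column: skip it
      }
      pendingChildren += col.numberOfChildren(); // nested children get skipped too
    }
    return projection;
  }

  public static void main(String[] args) {
    // "arr" is a complex column with one child entry following it in the flat list
    System.out.println(parentsOnly(List.of(
        new Col("name", 0), new Col("arr", 1), new Col("arr.val", 0), new Col("age", 0))));
    // -> [name, arr, age]
  }
}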
carbondata git commit: [CARBONDATA-2954] Fix error when the create external table command is fired and the path already exists
Repository: carbondata Updated Branches: refs/heads/master 25d949cfa -> 759cb31f6

[CARBONDATA-2954] Fix error when the create external table command is fired and the path already exists

Problem: Creating an external table with a valid location containing some empty directories alongside .carbondata files was giving an "operation not allowed: invalid datapath provided" error.
Solution: It was happening because, when the location contained an empty directory, the getFilePathExternalFilePath method in CarbonUtil.java returned null as soon as it recursed into that directory. Modified the method to continue checking the remaining files/directories instead.

This closes #2739

Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/759cb31f
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/759cb31f
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/759cb31f
Branch: refs/heads/master
Commit: 759cb31f64c22b1dd67b1b90e2edb89380f36094
Parents: 25d949c
Author: shardul-cr7
Authored: Thu Sep 20 19:42:54 2018 +0530
Committer: kumarvishal09
Committed: Mon Sep 24 15:13:55 2018 +0530

--
.../core/metadata/schema/table/CarbonTable.java | 8 +++-
.../org/apache/carbondata/core/util/CarbonUtil.java | 9 -
.../createTable/TestNonTransactionalCarbonTable.scala | 10 ++
.../apache/carbondata/sdk/file/CarbonReaderTest.java| 12
4 files changed, 37 insertions(+), 2 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/carbondata/blob/759cb31f/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java
--
diff --git a/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java b/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java index c606063..3d04cca 100644 --- a/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java +++ b/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java @@ -261,7 +261,13 @@ public class CarbonTable implements Serializable { CarbonFile[] carbonFiles = tablePath.listFiles(); for (CarbonFile carbonFile : carbonFiles) { if (carbonFile.isDirectory()) { -return getFirstIndexFile(carbonFile); +// if the list has directories that doesn't contain index files, +// continue checking other files/directories in the list. +if (getFirstIndexFile(carbonFile) == null) { + continue; +} else { + return getFirstIndexFile(carbonFile); +} } else if (carbonFile.getName().endsWith(CarbonTablePath.INDEX_FILE_EXT)) { return carbonFile; }

http://git-wip-us.apache.org/repos/asf/carbondata/blob/759cb31f/core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java
--
diff --git a/core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java b/core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java index 5a85b14..03054bf 100644 --- a/core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java +++ b/core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java @@ -2230,9 +2230,16 @@ public final class CarbonUtil { if (dataFile.getName().endsWith(CarbonCommonConstants.FACT_FILE_EXT)) { return dataFile.getAbsolutePath(); } else if (dataFile.isDirectory()) { -return getFilePathExternalFilePath(dataFile.getAbsolutePath(), configuration); +// if the list has directories that doesn't contain data files, +// continue checking other files/directories in the list.
+if (getFilePathExternalFilePath(dataFile.getAbsolutePath(), configuration) == null) { + continue; +} else { + return getFilePathExternalFilePath(dataFile.getAbsolutePath(), configuration); +} } } +//returning null only if the path doesn't have data files. return null; } http://git-wip-us.apache.org/repos/asf/carbondata/blob/759cb31f/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala -- diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala index b80a2f2..f6d12ab 100644 --- a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala +++ b/inte
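The bug pattern in both repaired methods is returning the result of a recursive call unconditionally, which turns one empty subdirectory into a null for the whole search. The fix keeps scanning siblings when recursion finds nothing. A generic sketch of the corrected search with java.io.File:

import java.io.File;

final class FirstDataFile {
  /** Depth-first search for the first .carbondata file; null only if none exists anywhere. */
  static String find(File dir) {
    File[] entries = dir.listFiles();
    if (entries == null) return null;
    for (File entry : entries) {
      if (entry.isFile() && entry.getName().endsWith(".carbondata")) {
        return entry.getAbsolutePath();
      }
      if (entry.isDirectory()) {
        String found = find(entry);
        // Buggy version: `return find(entry);` -- an empty subdirectory would
        // abort the whole search. Fixed version: only return on a real hit.
        if (found != null) {
          return found;
        }
      }
    }
    return null; // genuinely no data file under this path
  }
}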
carbondata git commit: [HOTFIX] Fix partition filter slow issue #2740
Repository: carbondata Updated Branches: refs/heads/master ed8564421 -> 25d949cfa

[HOTFIX] Fix partition filter slow issue #2740

Problem: FileSourceScanExec lists all the partition files from CatalogFileIndex, which spawns an extra job to list files for every query.
Solution: We don't need that file listing at all, since pruning is handled through datamaps, so make the CatalogFileIndex act as a dummy.

This closes #2740

Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/25d949cf
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/25d949cf
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/25d949cf
Branch: refs/heads/master
Commit: 25d949cfa82c9a29fe0e54ddbe54e890cc865b7f
Parents: ed85644
Author: ravipesala
Authored: Thu Sep 20 21:21:47 2018 +0530
Committer: kumarvishal09
Committed: Mon Sep 24 12:54:19 2018 +0530

--
.../execution/datasources/CarbonFileIndex.scala | 14 ++
.../strategy/CarbonLateDecodeStrategy.scala | 15 ++-
2 files changed, 24 insertions(+), 5 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/carbondata/blob/25d949cf/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/CarbonFileIndex.scala
--
diff --git a/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/CarbonFileIndex.scala b/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/CarbonFileIndex.scala index 3a650ec..c57528f 100644 --- a/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/CarbonFileIndex.scala +++ b/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/CarbonFileIndex.scala @@ -51,6 +51,10 @@ class CarbonFileIndex( fileIndex: FileIndex) extends FileIndex with AbstractCarbonFileIndex { + // When this flag is set it just returns empty files during pruning. It is needed for carbon + // session partition flow as we handle directly through datamap pruining.
+ private var actAsDummy = false + override def rootPaths: Seq[Path] = fileIndex.rootPaths override def inputFiles: Array[String] = fileIndex.inputFiles @@ -70,6 +74,9 @@ class CarbonFileIndex( */ override def listFiles(partitionFilters: Seq[Expression], dataFilters: Seq[Expression]): Seq[PartitionDirectory] = { +if (actAsDummy) { + return Seq.empty +} val method = fileIndex.getClass.getMethods.find(_.getName == "listFiles").get val directories = method.invoke( @@ -143,11 +150,18 @@ class CarbonFileIndex( } override def listFiles(filters: Seq[Expression]): Seq[PartitionDirectory] = { +if (actAsDummy) { + return Seq.empty +} val method = fileIndex.getClass.getMethods.find(_.getName == "listFiles").get val directories = method.invoke(fileIndex, filters).asInstanceOf[Seq[PartitionDirectory]] prune(filters, directories) } + + def setDummy(actDummy: Boolean): Unit = { +actAsDummy = actDummy + } } /** http://git-wip-us.apache.org/repos/asf/carbondata/blob/25d949cf/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala -- diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala index 8f128fe..f0184cd 100644 --- a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala +++ b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala @@ -34,7 +34,7 @@ import org.apache.spark.sql.optimizer.{CarbonDecoderRelation, CarbonFilters} import org.apache.spark.sql.sources.{BaseRelation, Filter} import org.apache.spark.sql.types._ import org.apache.spark.sql.CarbonExpressions.{MatchCast => Cast} -import org.apache.spark.sql.carbondata.execution.datasources.CarbonSparkDataSourceUtil +import org.apache.spark.sql.carbondata.execution.datasources.{CarbonFileIndex, CarbonSparkDataSourceUtil} import org.apache.spark.util.CarbonReflectionUtils import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException @@ -704,11 +704,16 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy { val sparkSession = relation.relation.sqlContext.sparkSession relation.catalogTable match { case Some(catalogTable) => -HadoopFsRelation( +val fi
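The mechanism is a wrapper whose listing methods short-circuit to empty when a dummy flag is set. A minimal Java rendering of that wrapper idea (the real class is the Scala CarbonFileIndex shown above; the interface below is illustrative):

import java.util.Collections;
import java.util.List;

final class DummyableFileIndex {
  interface FileIndex { List<String> listFiles(List<String> filters); }

  private final FileIndex delegate;
  private boolean actAsDummy; // when set, pruning happens elsewhere (datamaps)

  DummyableFileIndex(FileIndex delegate) { this.delegate = delegate; }

  void setDummy(boolean dummy) { this.actAsDummy = dummy; }

  List<String> listFiles(List<String> filters) {
    if (actAsDummy) {
      // Short-circuit: returning empty avoids the extra Spark job that a
      // real listing of every partition would trigger per query.
      return Collections.emptyList();
    }
    return delegate.listFiles(filters);
  }
}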
carbondata git commit: [CARBONDATA-2958] Compaction with CarbonProperty 'carbon.enable.page.level.reader.in.compaction' enabled fails as Compressor is null
Repository: carbondata Updated Branches: refs/heads/master 8320918e5 -> ed8564421

[CARBONDATA-2958] Compaction with CarbonProperty 'carbon.enable.page.level.reader.in.compaction' enabled fails as Compressor is null

Problem: When the CarbonProperty 'carbon.enable.page.level.reader.in.compaction' is enabled, compaction fails with a NullPointerException because the compressor is null.
Solution: Set the compressor from the page metadata.

This closes #2745

Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/ed856442
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/ed856442
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/ed856442
Branch: refs/heads/master
Commit: ed856442166a96d1b414336945fb1dbc1d514c4a
Parents: 8320918
Author: Indhumathi27
Authored: Fri Sep 21 15:24:39 2018 +0530
Committer: kumarvishal09
Committed: Mon Sep 24 12:24:28 2018 +0530

--
...essedDimChunkFileBasedPageLevelReaderV3.java | 7 +++
...andardPartitionTableCompactionTestCase.scala | 22
2 files changed, 29 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/carbondata/blob/ed856442/core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/CompressedDimChunkFileBasedPageLevelReaderV3.java
--
diff --git a/core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/CompressedDimChunkFileBasedPageLevelReaderV3.java b/core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/CompressedDimChunkFileBasedPageLevelReaderV3.java index e69984b..6efaf8a 100644 --- a/core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/CompressedDimChunkFileBasedPageLevelReaderV3.java +++ b/core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/CompressedDimChunkFileBasedPageLevelReaderV3.java @@ -23,8 +23,10 @@ import java.nio.ByteBuffer; import org.apache.carbondata.core.datastore.FileReader; import org.apache.carbondata.core.datastore.chunk.DimensionColumnPage; import org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk; +import org.apache.carbondata.core.datastore.compression.CompressorFactory; import org.apache.carbondata.core.memory.MemoryException; import org.apache.carbondata.core.metadata.blocklet.BlockletInfo; +import org.apache.carbondata.core.util.CarbonMetadataUtil; import org.apache.carbondata.core.util.CarbonUtil; import org.apache.carbondata.format.DataChunk2; import org.apache.carbondata.format.DataChunk3; @@ -146,6 +148,11 @@ public class CompressedDimChunkFileBasedPageLevelReaderV3 DataChunk3 dataChunk3 = dimensionRawColumnChunk.getDataChunkV3(); pageMetadata = dataChunk3.getData_chunk_list().get(pageNumber); + +if (compressor == null) { + this.compressor = CompressorFactory.getInstance().getCompressor( + CarbonMetadataUtil.getCompressorNameFromChunkMeta(pageMetadata.getChunk_meta())); +} // calculating the start point of data // as buffer can contain multiple column data, start point will be datachunkoffset + // data chunk length + page offset

http://git-wip-us.apache.org/repos/asf/carbondata/blob/ed856442/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableCompactionTestCase.scala
--
diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableCompactionTestCase.scala
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableCompactionTestCase.scala index 33e761f..23c2aa0 100644 --- a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableCompactionTestCase.scala +++ b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableCompactionTestCase.scala @@ -16,6 +16,7 @@ */ package org.apache.carbondata.spark.testsuite.standardpartition +import org.apache.spark.sql.Row import org.apache.spark.sql.test.util.QueryTest import org.scalatest.BeforeAndAfterAll @@ -183,6 +184,27 @@ class StandardPartitionTableCompactionTestCase extends QueryTest with BeforeAndA sql(s"""alter table compactionupdatepartition compact 'major'""").collect } + test("test compaction when 'carbon.enable.page.level.reader.in.compaction' is set to true") { +sql("DROP TABLE IF EXISTS originTable") +
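The fix above is lazy initialization from metadata that is guaranteed to be present at read time: if the reader was constructed without a compressor, derive one from the chunk's own metadata before decompressing. A sketch of the pattern (the Compressor interface and forName lookup are stand-ins for CompressorFactory):

final class LazyCompressorInit {
  interface Compressor { byte[] unCompress(byte[] data); }

  static final class PageReader {
    private Compressor compressor; // may be null in the page-level reader path

    byte[] readPage(byte[] compressed, String compressorNameFromChunkMeta) {
      if (compressor == null) {
        // Derive the codec from the page's own metadata instead of assuming
        // it was injected at construction time -- this is what removes the NPE.
        compressor = forName(compressorNameFromChunkMeta);
      }
      return compressor.unCompress(compressed);
    }

    private static Compressor forName(String name) {
      // stand-in for CompressorFactory.getInstance().getCompressor(name)
      if ("snappy".equalsIgnoreCase(name)) return data -> data; // placeholder codec
      throw new IllegalArgumentException("unknown compressor: " + name);
    }
  }
}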
carbondata git commit: [CARBONDATA-2950]alter add column of hive table fails from carbon for spark versions above 2.1
Repository: carbondata Updated Branches: refs/heads/master f962e41b7 -> 8320918e5

[CARBONDATA-2950] alter add column of hive table fails from carbon for spark versions above 2.1

Problem: Spark does not support add columns in spark-2.1, but it is supported in 2.2 and above. When an add column command is fired for a hive table in a carbon session on spark versions above 2.1, it throws an error saying the alter operation is unsupported on hive tables.
Solution: When alter add columns is fired for a hive table on spark-2.2 and above, it should not throw any exception; it should pass.

This closes #2735

Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/8320918e
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/8320918e
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/8320918e
Branch: refs/heads/master
Commit: 8320918e55b393fedc946e4543843a72712d9199
Parents: f962e41
Author: akashrn5
Authored: Wed Sep 19 19:51:39 2018 +0530
Committer: kumarvishal09
Committed: Fri Sep 21 21:55:06 2018 +0530

--
.../sdv/generated/AlterTableTestCase.scala | 18 -
.../lucene/LuceneFineGrainDataMapSuite.scala| 27
.../org/apache/carbondata/spark/util/Util.java | 2 +-
.../spark/util/CarbonReflectionUtils.scala | 15 +++
.../sql/execution/strategy/DDLStrategy.scala| 21 +--
5 files changed, 52 insertions(+), 31 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/carbondata/blob/8320918e/integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/AlterTableTestCase.scala
--
diff --git a/integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/AlterTableTestCase.scala b/integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/AlterTableTestCase.scala index 4e53ea3..90fa602 100644 --- a/integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/AlterTableTestCase.scala +++ b/integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/AlterTableTestCase.scala @@ -18,12 +18,14 @@ package org.apache.carbondata.cluster.sdv.generated +import org.apache.spark.SPARK_VERSION import org.apache.spark.sql.Row import org.apache.spark.sql.common.util._ -import org.apache.spark.sql.test.TestQueryExecutor +import org.apache.spark.util.SparkUtil import org.scalatest.BeforeAndAfterAll import org.apache.carbondata.common.constants.LoggerAction +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException import org.apache.carbondata.core.constants.CarbonCommonConstants import org.apache.carbondata.core.util.CarbonProperties @@ -1000,6 +1002,20 @@ class AlterTableTestCase extends QueryTest with BeforeAndAfterAll { sql(s"""drop table if exists uniqdata59""").collect } + test("Alter table add column for hive table for spark version above 2.1") { +sql("drop table if exists alter_hive") +sql("create table alter_hive(name string)") +if(SPARK_VERSION.startsWith("2.1")) { + val exception = intercept[MalformedCarbonCommandException] { +sql("alter table alter_hive add columns(add string)") + } + assert(exception.getMessage.contains("Unsupported alter operation on hive table")) +} else if (SparkUtil.isSparkVersionXandAbove("2.2")) { + sql("alter table alter_hive add columns(add string)") + sql("insert into alter_hive select 'abc','banglore'") +} + } + val prop = CarbonProperties.getInstance() val p1 = prop.getProperty("carbon.horizontal.compaction.enable",
CarbonCommonConstants.defaultIsHorizontalCompactionEnabled) val p2 = prop.getProperty("carbon.horizontal.update.compaction.threshold", CarbonCommonConstants.DEFAULT_UPDATE_DELTAFILE_COUNT_THRESHOLD_IUD_COMPACTION) http://git-wip-us.apache.org/repos/asf/carbondata/blob/8320918e/integration/spark-common-test/src/test/scala/org/apache/carbondata/datamap/lucene/LuceneFineGrainDataMapSuite.scala -- diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/datamap/lucene/LuceneFineGrainDataMapSuite.scala b/integration/spark-common-test/src/test/scala/org/apache/carbondata/datamap/lucene/LuceneFineGrainDataMapSuite.scala index 0c6134b..2e3019a 100644 --- a/integration/spark-common-test/src/test/scala/org/apache/carbondata/datamap/lucene/LuceneFineGrainDataMapSuite.scala +++ b/integration/spark-common-test/src/test/scala/
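Version gating like the check in this test usually compares the leading major.minor components of the runtime version string. A small self-contained sketch of such a check (isVersionAtLeast is illustrative; the real helper is SparkUtil.isSparkVersionXandAbove):

final class SparkVersionGate {
  /** True if `version` (e.g. "2.3.2") is at least `min` (e.g. "2.2"). */
  static boolean isVersionAtLeast(String version, String min) {
    String[] v = version.split("\\.");
    String[] m = min.split("\\.");
    for (int i = 0; i < m.length; i++) {
      int vi = i < v.length ? Integer.parseInt(v[i]) : 0;
      int mi = Integer.parseInt(m[i]);
      if (vi != mi) return vi > mi;
    }
    return true; // equal up to the compared precision
  }

  static void alterAddColumnsOnHiveTable(String sparkVersion) {
    if (!isVersionAtLeast(sparkVersion, "2.2")) {
      // spark-2.1 genuinely cannot add columns to a hive table
      throw new UnsupportedOperationException("Unsupported alter operation on hive table");
    }
    // on 2.2+ fall through to spark's own ALTER TABLE ... ADD COLUMNS handling
  }
}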
carbondata git commit: [CARBONDATA-2953] fixed data load failure with sort columns and wrong query result from another session
Repository: carbondata Updated Branches: refs/heads/master edfcdca0a -> f962e41b7

[CARBONDATA-2953] fixed data load failure with sort columns and wrong query result from another session

Problem: When data load is done with sort columns and two sessions run in parallel, the load fails with exceptions. Follow these steps in session 1: drop table, create table, load data into the table. Then follow this step in session 2: query the table (select * from table limit 1); the query returns a null result instead of the proper result.
Solution: During sorting, the index increment for no-dictionary measure data was not happening correctly, so the code tried to cast the value to a byte array and failed. Also, if the table is dropped from the first session, created again, and then queried from another session, the metastore needs to be updated for the newly created table; since the database in the identifier was None, we were fetching the old table from the default database, whereas it needs to be fetched from the current database.

This closes #2743

Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/f962e41b
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/f962e41b
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/f962e41b
Branch: refs/heads/master
Commit: f962e41b7f2c2dd29ae71ad5e1f7797e3aaec084
Parents: edfcdca
Author: akashrn5
Authored: Thu Sep 20 15:39:01 2018 +0530
Committer: kumarvishal09
Committed: Fri Sep 21 18:46:25 2018 +0530

--
.../execution/command/datamap/CarbonDataMapShowCommand.scala| 2 +-
.../scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala | 5 -
.../processing/loading/partition/impl/RawRowComparator.java | 2 +-
.../sort/sortdata/IntermediateSortTempRowComparator.java| 2 +-
.../carbondata/processing/sort/sortdata/NewRowComparator.java | 2 +-
5 files changed, 8 insertions(+), 5 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/carbondata/blob/f962e41b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonDataMapShowCommand.scala
--
diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonDataMapShowCommand.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonDataMapShowCommand.scala index b583a30..ae33aa8 100644 --- a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonDataMapShowCommand.scala +++ b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonDataMapShowCommand.scala @@ -57,8 +57,8 @@ case class CarbonDataMapShowCommand(tableIdentifier: Option[TableIdentifier]) val dataMapSchemaList: util.List[DataMapSchema] = new util.ArrayList[DataMapSchema]() tableIdentifier match { case Some(table) => -Checker.validateTableExists(table.database, table.table, sparkSession) val carbonTable = CarbonEnv.getCarbonTable(table)(sparkSession) +Checker.validateTableExists(table.database, table.table, sparkSession) if (carbonTable.hasDataMapSchema) { dataMapSchemaList.addAll(carbonTable.getTableInfo.getDataMapSchemaList) }

http://git-wip-us.apache.org/repos/asf/carbondata/blob/f962e41b/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala
--
diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala index 1840c5d..982bbee 100644 ---
a/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala +++ b/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala @@ -580,7 +580,10 @@ class CarbonFileMetastore extends CarbonMetaStore { tableModifiedTimeStore.get(CarbonCommonConstants.DATABASE_DEFAULT_NAME))) { metadata.carbonTables = metadata.carbonTables.filterNot( table => table.getTableName.equalsIgnoreCase(tableIdentifier.table) && - table.getDatabaseName.equalsIgnoreCase(tableIdentifier.database.getOrElse("default"))) + table.getDatabaseName + .equalsIgnoreCase(tableIdentifier.database + .getOrElse(SparkSession.getActiveSession.get.sessionState.catalog + .getCurrentDatabase))) updateSchemasUpdatedTime(lastModifiedTime) isRefreshed = true } http://git-wip-us.apache.org/repos/asf/carbondata/blob/f962e41b/processing/src/main/java/org/apache/carbondata/processing/loading/partition
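The metastore part of the fix is a defaulting rule: when a table identifier carries no database, resolve against the session's current database rather than a hard-coded "default". A tiny sketch of that rule (SessionCatalog here is a stand-in for Spark's catalog):

import java.util.Optional;

final class ResolveDatabase {
  interface SessionCatalog { String getCurrentDatabase(); }

  /** Identifier database if present, else the session's current database. */
  static String resolve(Optional<String> identifierDb, SessionCatalog catalog) {
    // Buggy version: identifierDb.orElse("default") -- wrong whenever the
    // session has switched databases with USE <db>, which is exactly the
    // two-session scenario described above.
    return identifierDb.orElseGet(catalog::getCurrentDatabase);
  }
}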
carbondata git commit: [HOTFIX] Correct metrics and avoid reading twice when prefetch is disabled
Repository: carbondata Updated Branches: refs/heads/master 817230da1 -> b04269b2b

[HOTFIX] Correct metrics and avoid reading twice when prefetch is disabled

When prefetch is disabled, full scan queries read the data twice. This PR removes the extra read.

This closes #2737

Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/b04269b2
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/b04269b2
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/b04269b2
Branch: refs/heads/master
Commit: b04269b2b8d05ce21e2fb4f8ebeab668e902aba7
Parents: 817230d
Author: ravipesala
Authored: Thu Sep 20 14:44:09 2018 +0530
Committer: kumarvishal09
Committed: Fri Sep 21 16:46:15 2018 +0530

--
.../carbondata/core/scan/scanner/impl/BlockletFullScanner.java| 3 ---
1 file changed, 3 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/carbondata/blob/b04269b2/core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFullScanner.java
--
diff --git a/core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFullScanner.java b/core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFullScanner.java index f61a8b1..4ec8cb6 100644 --- a/core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFullScanner.java +++ b/core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFullScanner.java @@ -84,9 +84,6 @@ public class BlockletFullScanner implements BlockletScanner { String blockletId = blockExecutionInfo.getBlockIdString() + CarbonCommonConstants.FILE_SEPARATOR + rawBlockletColumnChunks.getDataBlock().blockletIndex(); scannedResult.setBlockletId(blockletId); -if (!blockExecutionInfo.isPrefetchBlocklet()) { - readBlocklet(rawBlockletColumnChunks); -} DimensionRawColumnChunk[] dimensionRawColumnChunks = rawBlockletColumnChunks.getDimensionRawColumnChunks(); DimensionColumnPage[][] dimensionColumnDataChunks =
carbondata git commit: [HOTFIX] Old stores cannot be read with a new table inferred through the SDK.
Repository: carbondata Updated Branches: refs/heads/master daa91c88e -> 4c692d185

[HOTFIX] Old stores cannot be read with a new table inferred through the SDK.

Problem: In old stores the column schema may be written in a different case, and the fileformat then cannot read the data because the SDK's inferred schema always produces lower-case names.
Solution: Do a case-insensitive check while comparing. This PR also disables prefetch, as it is redundant for fileformat reads and input metrics are not collected properly when a separate thread is used.

This closes #2704

Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/4c692d18
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/4c692d18
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/4c692d18
Branch: refs/heads/master
Commit: 4c692d185c4247e645d94c2d79787744c413817b
Parents: daa91c8
Author: ravipesala
Authored: Mon Sep 10 21:11:18 2018 +0530
Committer: kumarvishal09
Committed: Wed Sep 12 19:16:39 2018 +0530

--
.../apache/carbondata/core/metadata/CarbonMetadata.java | 5 +++--
.../metadata/schema/table/AggregationDataMapSchema.java | 4 ++--
.../core/metadata/schema/table/column/ColumnSchema.java | 2 +-
.../core/scan/executor/impl/AbstractQueryExecutor.java| 6 +-
.../core/scan/executor/util/RestructureUtil.java | 7 ---
.../scan/expression/logical/BinaryLogicalExpression.java | 2 +-
.../apache/carbondata/core/scan/filter/FilterUtil.java| 2 +-
.../org/apache/carbondata/core/scan/model/QueryModel.java | 10 ++
.../apache/carbondata/core/util/BlockletDataMapUtil.java | 2 +-
.../java/org/apache/carbondata/core/util/CarbonUtil.java | 2 +-
.../execution/datasources/SparkCarbonFileFormat.scala | 1 +
11 files changed, 30 insertions(+), 13 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/carbondata/blob/4c692d18/core/src/main/java/org/apache/carbondata/core/metadata/CarbonMetadata.java
--
diff --git a/core/src/main/java/org/apache/carbondata/core/metadata/CarbonMetadata.java b/core/src/main/java/org/apache/carbondata/core/metadata/CarbonMetadata.java index 3f8c12d..850f477 100644 --- a/core/src/main/java/org/apache/carbondata/core/metadata/CarbonMetadata.java +++ b/core/src/main/java/org/apache/carbondata/core/metadata/CarbonMetadata.java @@ -143,7 +143,7 @@ public final class CarbonMetadata { List listOfCarbonDims = carbonTable.getDimensionByTableName(carbonTable.getTableName()); for (CarbonDimension dimension : listOfCarbonDims) { - if (dimension.getColumnId().equals(columnIdentifier)) { + if (dimension.getColumnId().equalsIgnoreCase(columnIdentifier)) { return dimension; } if (dimension.getNumberOfChild() > 0) { @@ -168,7 +168,8 @@ public final class CarbonMetadata { private CarbonDimension getCarbonChildDimsBasedOnColIdentifier(String columnIdentifier, CarbonDimension dimension) { for (int i = 0; i < dimension.getNumberOfChild(); i++) { - if (dimension.getListOfChildDimensions().get(i).getColumnId().equals(columnIdentifier)) { + if (dimension.getListOfChildDimensions().get(i).getColumnId() + .equalsIgnoreCase(columnIdentifier)) { return dimension.getListOfChildDimensions().get(i); } else if (dimension.getListOfChildDimensions().get(i).getNumberOfChild() > 0) { CarbonDimension childDim = getCarbonChildDimsBasedOnColIdentifier(columnIdentifier,

http://git-wip-us.apache.org/repos/asf/carbondata/blob/4c692d18/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/AggregationDataMapSchema.java
--
diff --git a/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/AggregationDataMapSchema.java
b/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/AggregationDataMapSchema.java index 2bb6d18..c8bb5ad 100644 --- a/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/AggregationDataMapSchema.java +++ b/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/AggregationDataMapSchema.java @@ -152,7 +152,7 @@ public class AggregationDataMapSchema extends DataMapSchema { List parentColumnTableRelations = columnSchema.getParentColumnTableRelations(); if (null != parentColumnTableRelations && parentColumnTableRelations.size() == 1 - && parentColumnTableRelations.get(0).getColumnName().equals(columName) && + && parentColumnTableRelations.get(0).getColumnName().equalsIgnoreCase(columName) && columnSchema.getColumnName().endsWith(columName)) { return columnSchema; } @@ -198,7 +19
carbondata git commit: [CARBONDATA-2915] update document links
Repository: carbondata Updated Branches: refs/heads/master a9cc43411 -> 73a5885a4 [CARBONDATA-2915] update document links update document links This closes #2707 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/73a5885a Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/73a5885a Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/73a5885a Branch: refs/heads/master Commit: 73a5885a4a4ab85aab45602bd2c6ab93f40f98dc Parents: a9cc434 Author: Raghunandan S Authored: Tue Sep 11 12:59:53 2018 +0800 Committer: kumarvishal09 Committed: Tue Sep 11 10:47:32 2018 +0530 -- README.md | 9 - docs/dml-of-carbondata.md | 6 +++--- docs/language-manual.md | 2 +- 3 files changed, 8 insertions(+), 9 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/73a5885a/README.md -- diff --git a/README.md b/README.md index 960d4e9..ba2cbf7 100644 --- a/README.md +++ b/README.md @@ -48,8 +48,7 @@ CarbonData is built using Apache Maven, to [build CarbonData](https://github.com * [Quick Start](https://github.com/apache/carbondata/blob/master/docs/quick-start-guide.md) * [CarbonData File Structure](https://github.com/apache/carbondata/blob/master/docs/file-structure-of-carbondata.md) * [Data Types](https://github.com/apache/carbondata/blob/master/docs/supported-data-types-in-carbondata.md) -* [Data Management on CarbonData](https://github.com/apache/carbondata/blob/master/docs/data-management-on-carbondata.md) -* [Cluster Installation and Deployment](https://github.com/apache/carbondata/blob/master/docs/installation-guide.md) +* [Data Management on CarbonData](https://github.com/apache/carbondata/blob/master/docs/language-manual.md) * [Configuring Carbondata](https://github.com/apache/carbondata/blob/master/docs/configuration-parameters.md) * [Streaming Ingestion](https://github.com/apache/carbondata/blob/master/docs/streaming-guide.md) * [SDK Guide](https://github.com/apache/carbondata/blob/master/docs/sdk-guide.md) @@ -60,9 +59,9 @@ CarbonData is built using Apache Maven, to [build CarbonData](https://github.com * [CarbonData Lucene DataMap](https://github.com/apache/carbondata/blob/master/docs/datamap/lucene-datamap-guide.md) * [CarbonData Pre-aggregate DataMap](https://github.com/apache/carbondata/blob/master/docs/datamap/preaggregate-datamap-guide.md) * [CarbonData Timeseries DataMap](https://github.com/apache/carbondata/blob/master/docs/datamap/timeseries-datamap-guide.md) +* [Performance Tuning](https://github.com/apache/carbondata/blob/master/docs/performance-tuning.md) * [FAQ](https://github.com/apache/carbondata/blob/master/docs/faq.md) -* [Trouble Shooting](https://github.com/apache/carbondata/blob/master/docs/troubleshooting.md) -* [Useful Tips](https://github.com/apache/carbondata/blob/master/docs/useful-tips-on-carbondata.md) +* [Use Cases](https://github.com/apache/carbondata/blob/master/docs/usecases.md) ## Other Technical Material * [Apache CarbonData meetup material](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=66850609) @@ -70,7 +69,7 @@ CarbonData is built using Apache Maven, to [build CarbonData](https://github.com ## Fork and Contribute This is an active open source project for everyone, and we are always open to people who want to use this system or contribute to it. -This guide document introduce [how to contribute to CarbonData](https://github.com/apache/carbondata/blob/master/docs/How-to-contribute-to-Apache-CarbonData.md). 
+This guide document introduce [how to contribute to CarbonData](https://github.com/apache/carbondata/blob/master/docs/how-to-contribute-to-apache-carbondata.md). ## Contact us To get involved in CarbonData: http://git-wip-us.apache.org/repos/asf/carbondata/blob/73a5885a/docs/dml-of-carbondata.md -- diff --git a/docs/dml-of-carbondata.md b/docs/dml-of-carbondata.md index 42da655..98bb132 100644 --- a/docs/dml-of-carbondata.md +++ b/docs/dml-of-carbondata.md @@ -46,7 +46,7 @@ CarbonData DML statements are documented here,which includes: | --- | | | [DELIMITER](#delimiter) | Character used to separate the data in the input csv file| | [QUOTECHAR](#quotechar) | Character used to quote the data in the input csv file | -| [COMMENTCHAR](#commentchar) | Character used to comment the rows in the input csv file.Those rows will be skipped from processing | +| [COMMENTCHAR](#commentc
carbondata git commit: [CARBONDATA-2889]Add decoder based fallback mechanism in local dictionary to reduce memory footprint
Repository: carbondata Updated Branches: refs/heads/master 9ebab5748 -> 2ccdbb78c

[CARBONDATA-2889] Add decoder based fallback mechanism in local dictionary to reduce memory footprint

Problem: Currently, when fallback is initiated for a column page in the local dictionary case, we keep both the encoded data and the actual data in memory, form the new column page without dictionary encoding, and only then free the encoded column page. Because of this, the offheap memory footprint increases.
Solution: We can reduce the offheap memory footprint with a decoder-based fallback mechanism. This means there is no need to keep the actual data alongside the encoded data in the encoded column page. We keep only the encoded data; to form a new column page, we uncompress the encoded column page to get the dictionary-encoded data, look up the actual values through the local dictionary generator, put them into the newly created column page, compress it again, and hand it to the consumer for writing the blocklet.

This closes #2662

Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/2ccdbb78
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/2ccdbb78
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/2ccdbb78
Branch: refs/heads/master
Commit: 2ccdbb78c461be8c68770f7732c233c319a65ad1
Parents: 9ebab57
Author: akashrn5
Authored: Mon Aug 20 10:29:26 2018 +0530
Committer: kumarvishal09
Committed: Mon Sep 10 14:24:55 2018 +0530

--
.../core/constants/CarbonCommonConstants.java | 10 ++
.../blocklet/BlockletEncodedColumnPage.java | 42 +-
.../datastore/blocklet/EncodedBlocklet.java | 34 +++--
.../reader/dimension/AbstractChunkReader.java | 15 ---
.../AbstractChunkReaderV2V3Format.java | 12 --
...mpressedDimensionChunkFileBasedReaderV1.java | 2 +-
...mpressedDimensionChunkFileBasedReaderV2.java | 8 +-
...essedDimChunkFileBasedPageLevelReaderV3.java | 4 +-
...mpressedDimensionChunkFileBasedReaderV3.java | 10 +-
.../page/ActualDataBasedFallbackEncoder.java| 67 ++
.../core/datastore/page/ColumnPage.java | 9 +-
.../page/DecoderBasedFallbackEncoder.java | 132 +++
.../page/FallbackColumnPageEncoder.java | 86 
.../datastore/page/LocalDictColumnPage.java | 19 ++-
.../apache/carbondata/core/util/CarbonUtil.java | 73 ++
.../VectorizedCarbonRecordReader.java | 3 +-
.../store/writer/v3/BlockletDataHolder.java | 11 +-
.../writer/v3/CarbonFactDataWriterImplV3.java | 2 +-
18 files changed, 389 insertions(+), 150 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/carbondata/blob/2ccdbb78/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
--
diff --git a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java index 3bdb2f7..7a34c98 100644 --- a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java +++ b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java @@ -881,6 +881,16 @@ public final class CarbonCommonConstants { public static final String LOCAL_DICTIONARY_SYSTEM_ENABLE = "carbon.local.dictionary.enable"; /** + * System property to enable or disable decoder based local dictionary fallback + */ + public static final String LOCAL_DICTIONARY_DECODER_BASED_FALLBACK = + "carbon.local.dictionary.decoder.fallback"; + + /** + * System property to enable or disable decoder based local dictionary fallback default value + */ + public
static final String LOCAL_DICTIONARY_DECODER_BASED_FALLBACK_DEFAULT = "true"; + /** * Threshold value for local dictionary */ public static final String LOCAL_DICTIONARY_THRESHOLD = "local_dictionary_threshold"; http://git-wip-us.apache.org/repos/asf/carbondata/blob/2ccdbb78/core/src/main/java/org/apache/carbondata/core/datastore/blocklet/BlockletEncodedColumnPage.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/datastore/blocklet/BlockletEncodedColumnPage.java b/core/src/main/java/org/apache/carbondata/core/datastore/blocklet/BlockletEncodedColumnPage.java index 8abc0e4..135b1e2 100644 --- a/core/src/main/java/org/apache/carbondata/core/datastore/blocklet/BlockletEncodedColumnPage.java +++ b/core/src/main/java/org/apache/carbondata/core/datastore/blocklet/BlockletEncodedColumnPage.java @@ -26,10 +26,12 @@ import java.util.concurrent.Future; import org.apache.ca
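In sketch form, decoder-based fallback re-materializes actual values page by page from the encoded surrogate keys plus the local dictionary, instead of keeping a second uncompressed copy all along. A simplified model of that decode step (int[] keys and a Map-like dictionary; the real code works on ColumnPage and LocalDictionaryGenerator):

final class DecoderBasedFallback {
  /** Stand-in for the reverse local dictionary: surrogate key -> actual value. */
  interface LocalDictionary { byte[] valueOf(int key); }

  /**
   * Rebuild the plain (no-dictionary) page from an encoded page.
   * `encodedKeys` is the uncompressed, dictionary-encoded data of one page.
   */
  static byte[][] fallbackDecode(int[] encodedKeys, LocalDictionary dict) {
    byte[][] actual = new byte[encodedKeys.length][];
    for (int row = 0; row < encodedKeys.length; row++) {
      // decode each surrogate through the local dictionary -- the only extra
      // memory held is this one page being rebuilt, not a full second copy
      actual[row] = dict.valueOf(encodedKeys[row]);
    }
    return actual; // caller re-encodes/compresses this as a plain column page
  }
}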
carbondata git commit: [CARBONDATA-2895] Fix Query result count is more than actual csv rows with Batch-sort in save to disk (sort temp files) scenario
Repository: carbondata Updated Branches: refs/heads/master 94d2089b2 -> 50248f51b

[CARBONDATA-2895] Fix Query result count is more than actual csv rows with Batch-sort in save to disk (sort temp files) scenario

Problem: Query result mismatch with batch-sort in the save-to-disk (sort temp files) scenario.
Scenario: a) Configure batch sort but give a batch size larger than UnsafeMemoryManager.INSTANCE.getUsableMemory(). b) Load data that is larger than the batch size, and observe that the unsafe memory manager saves to disk because it cannot process one batch in memory. c) The load therefore happens in 2 batches. d) When querying the results, there are more result rows than expected.
Root cause: createSortDataRows() is called for each batch, and files saved to disk while sorting the previous batch were also considered for the current batch.
Solution: Files saved to disk during sorting of a previous batch should not be considered for the current batch. Hence use the batch id as the rangeId field of the sort temp files, so getFilesToMergeSort() selects files of only this batch.

This closes #2664

Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/50248f51
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/50248f51
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/50248f51
Branch: refs/heads/master
Commit: 50248f51bcaf44f37429d2420c6ecf5c815c3770
Parents: 94d2089
Author: ajantha-bhat
Authored: Mon Aug 27 20:55:03 2018 +0530
Committer: kumarvishal09
Committed: Wed Sep 5 20:30:59 2018 +0530

--
.../impl/UnsafeBatchParallelReadMergeSorterImpl.java | 15 +--
1 file changed, 13 insertions(+), 2 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/carbondata/blob/50248f51/processing/src/main/java/org/apache/carbondata/processing/loading/sort/impl/UnsafeBatchParallelReadMergeSorterImpl.java
--
diff --git a/processing/src/main/java/org/apache/carbondata/processing/loading/sort/impl/UnsafeBatchParallelReadMergeSorterImpl.java b/processing/src/main/java/org/apache/carbondata/processing/loading/sort/impl/UnsafeBatchParallelReadMergeSorterImpl.java index 5cb099e..1b1d383 100644 --- a/processing/src/main/java/org/apache/carbondata/processing/loading/sort/impl/UnsafeBatchParallelReadMergeSorterImpl.java +++ b/processing/src/main/java/org/apache/carbondata/processing/loading/sort/impl/UnsafeBatchParallelReadMergeSorterImpl.java @@ -62,12 +62,17 @@ public class UnsafeBatchParallelReadMergeSorterImpl extends AbstractMergeSorter private AtomicLong rowCounter; + /* will be incremented for each batch.
This ID is used in sort temp files name, + to identify files of that batch */ + private AtomicInteger batchId; + public UnsafeBatchParallelReadMergeSorterImpl(AtomicLong rowCounter) { this.rowCounter = rowCounter; } @Override public void initialize(SortParameters sortParameters) { this.sortParameters = sortParameters; +batchId = new AtomicInteger(0); } @@ -172,7 +177,7 @@ public class UnsafeBatchParallelReadMergeSorterImpl extends AbstractMergeSorter } - private static class SortBatchHolder + private class SortBatchHolder extends CarbonIterator { private SortParameters sortParameters; @@ -193,7 +198,7 @@ public class UnsafeBatchParallelReadMergeSorterImpl extends AbstractMergeSorter private final Object lock = new Object(); -public SortBatchHolder(SortParameters sortParameters, int numberOfThreads, +SortBatchHolder(SortParameters sortParameters, int numberOfThreads, ThreadStatusObserver threadStatusObserver) { this.sortParameters = sortParameters.getCopy(); this.iteratorCount = new AtomicInteger(numberOfThreads); @@ -203,6 +208,12 @@ public class UnsafeBatchParallelReadMergeSorterImpl extends AbstractMergeSorter } private void createSortDataRows() { + // For each batch, createSortDataRows() will be called. + // Files saved to disk during sorting of previous batch,should not be considered + // for this batch. + // Hence use batchID as rangeID field of sorttempfiles. + // so getFilesToMergeSort() will select only this batch files. + this.sortParameters.setRangeId(batchId.incrementAndGet()); int inMemoryChunkSizeInMB = CarbonProperties.getInstance().getSortMemoryChunkSizeInMB(); setTempLocation(sortParameters); this.finalMerger = new UnsafeSingleThreadFinalSortFilesMerger(sortParameters,
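Tagging the temp files with the batch id makes merge-file selection a simple filter. A sketch of that selection step (the "sorttemp_<batchId>_<n>" naming scheme below is invented for illustration):

import java.io.File;
import java.util.ArrayList;
import java.util.List;

final class SortTempFileSelection {
  /** Pick only the temp files written for `batchId`. */
  static List<File> filesToMergeSort(File tempDir, int batchId) {
    List<File> selected = new ArrayList<>();
    File[] all = tempDir.listFiles();
    if (all == null) return selected;
    String prefix = "sorttemp_" + batchId + "_";
    for (File f : all) {
      // files left over from earlier batches carry a different id and are
      // skipped, which is exactly what prevents rows from being merged twice
      if (f.getName().startsWith(prefix)) {
        selected.add(f);
      }
    }
    return selected;
  }
}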
carbondata git commit: [HOTFIX] improve sdk multi-thread performance
Repository: carbondata Updated Branches: refs/heads/master af2c469bb -> 94d2089b2

[HOTFIX] improve sdk multi-thread performance

Problem: Currently the SDK writer creates multiple iterators in the multi-thread scenario, but filling the iterators does not happen concurrently because the write path is synchronized at the method level.
Solution: In the SDK multi-thread write scenario, don't synchronize at the method level; synchronize at the iterator level instead. Since each iterator has its own queue, the fills can proceed concurrently. Also, for Avro, the SDK writer cores setting can be used in the input processor step.

This closes #2672

Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/94d2089b
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/94d2089b
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/94d2089b
Branch: refs/heads/master
Commit: 94d2089b246d2e4dc0ea2a673a89553e5eff1e35
Parents: af2c469
Author: ajantha-bhat
Authored: Wed Aug 29 23:11:09 2018 +0530
Committer: kumarvishal09
Committed: Wed Sep 5 20:25:39 2018 +0530

--
.../hadoop/api/CarbonTableOutputFormat.java | 27 +++--
.../loading/DataLoadProcessBuilder.java | 4 +-
.../loading/model/CarbonLoadModel.java | 14 +--
.../loading/steps/InputProcessorStepImpl.java | 7 +-
.../InputProcessorStepWithNoConverterImpl.java | 31 +
.../steps/JsonInputProcessorStepImpl.java | 9 +-
.../util/CarbonDataProcessorUtil.java | 6 +-
.../sdk/file/CarbonWriterBuilder.java | 6 +-
.../sdk/file/ConcurrentAvroSdkWriterTest.java | 116 +++
9 files changed, 162 insertions(+), 58 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/carbondata/blob/94d2089b/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java
--
diff --git a/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java b/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java index 5cc275b..99d8532 100644 --- a/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java +++ b/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java @@ -23,6 +23,7 @@ import java.util.concurrent.ExecutionException; import java.util.concurrent.ExecutorService; import java.util.concurrent.Executors; import java.util.concurrent.Future; +import java.util.concurrent.atomic.AtomicLong; import org.apache.carbondata.core.constants.CarbonCommonConstants; import org.apache.carbondata.core.constants.CarbonLoadOptionConstants; @@ -235,8 +236,8 @@ public class CarbonTableOutputFormat extends FileOutputFormat 0) ? sdkUserCore : 1; +short sdkWriterCores = loadModel.getSdkWriterCores(); +int itrSize = (sdkWriterCores > 0) ? sdkWriterCores : 1; final CarbonOutputIteratorWrapper[] iterators = new CarbonOutputIteratorWrapper[itrSize]; for (int i = 0; i < itrSize; i++) { iterators[i] = new CarbonOutputIteratorWrapper(); @@ -273,7 +274,7 @@ public class CarbonTableOutputFormat extends FileOutputFormat 0) { +if (sdkWriterCores > 0) { // CarbonMultiRecordWriter handles the load balancing of the write rows in round robin.
return new CarbonMultiRecordWriter(iterators, dataLoadExecutor, loadModel, future, executorService); @@ -460,27 +461,31 @@ public class CarbonTableOutputFormat extends FileOutputFormat
http://git-wip-us.apache.org/repos/asf/carbondata/blob/94d2089b/processing/src/main/java/org/apache/carbondata/processing/loading/DataLoadProcessBuilder.java --
diff --git a/processing/src/main/java/org/apache/carbondata/processing/loading/DataLoadProcessBuilder.java b/processing/src/main/java/org/apache/carbondata/processing/loading/DataLoadProcessBuilder.java index 666c598..a628d41 100644 --- a/processing/src/main/java/org/apache/carbondata/processing/loading/DataLoadProcessBuilder.java +++ b/processing/src/main/java/org/apache/carbondata/processing/loading/DataLoadProcessBuilder.java @@ -313,8 +313,8 @@ public final class DataLoadProcessBuilder { } TableSpec tableSpec = new TableSpec(carbonTable); configuration.setTableSpec(tableSpec); -if (loadModel.getSdkUserCores() > 0) { - configuration.setWritingCoresCount(loadModel.getSdkUserCores()); +if (loadModel.getSdkWriterCores() > 0) { + configuration.setWritingCoresCount(loadModel.getSdkWriterCores()); } return configuration; }
http://git-wip-us.apache.org/repos/asf/carbondata/blob/94d2089b/processing/src/main/java/org/apache/carbondata/processing/loading/model/CarbonLoadModel.java -
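A minimal sketch of the locking change (a hypothetical class, not the actual CarbonOutputIteratorWrapper code): the lock scope shrinks from the whole writer to the target iterator's own queue, so writers feeding different iterators no longer serialize on one monitor.

import java.util.ArrayDeque;
import java.util.Deque;

public class IteratorScopedWriter {
  private final Deque<Object[]>[] queues;

  @SuppressWarnings("unchecked")
  public IteratorScopedWriter(int iteratorCount) {
    queues = new Deque[iteratorCount];
    for (int i = 0; i < iteratorCount; i++) {
      queues[i] = new ArrayDeque<>();
    }
  }

  // Synchronize on the one queue being written, not on the whole writer
  // (the old method-level "synchronized" was equivalent to locking "this",
  // which blocked every iterator at once).
  public void write(int iteratorIndex, Object[] row) {
    Deque<Object[]> queue = queues[iteratorIndex];
    synchronized (queue) {
      queue.addLast(row);
    }
  }
}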
carbondata git commit: [CARBONDATA-2898] Fix double boundary condition and clear datamaps issue
Repository: carbondata Updated Branches: refs/heads/master e8ddbbb02 -> de0f54516 [CARBONDATA-2898] Fix double boundary condition and clear datamaps issue
1. DataMaps are not cleared properly, as a temp table is created for each request. Now the datamap is looked up using the table path, both to clear it and to get it. 2. In double value boundary cases, loading fails because carbon does not handle infinity properly. Now a check for infinite values is added. 3. Added validation that sort columns cannot be used while inferring the schema.
This closes #2666
Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/de0f5451 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/de0f5451 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/de0f5451 Branch: refs/heads/master Commit: de0f54516431eb1454a588d48641e0f540127279 Parents: e8ddbbb Author: ravipesala Authored: Tue Aug 28 17:17:38 2018 +0530 Committer: kumarvishal09 Committed: Thu Aug 30 11:53:06 2018 +0530
-- .../core/datamap/DataMapStoreManager.java | 41 -- .../core/datastore/impl/FileFactory.java| 20 + .../statistics/PrimitivePageStatsCollector.java | 14 +++- .../datasources/SparkCarbonFileFormat.scala | 17 ++-- .../datasource/SparkCarbonDataSourceTest.scala | 86 +++- .../loading/model/CarbonLoadModelBuilder.java | 4 +- 6 files changed, 157 insertions(+), 25 deletions(-) --
http://git-wip-us.apache.org/repos/asf/carbondata/blob/de0f5451/core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java --
diff --git a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java index 6e4fb4d..22db211 100644 --- a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java +++ b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java @@ -315,6 +315,13 @@ public final class DataMapStoreManager { String tableUniqueName = table.getAbsoluteTableIdentifier().getCarbonTableIdentifier().getTableUniqueName(); List<TableDataMap> tableIndices = allDataMaps.get(tableUniqueName); +if (tableIndices == null) { + String keyUsingTablePath = getKeyUsingTablePath(table.getTablePath()); + if (keyUsingTablePath != null) { +tableUniqueName = keyUsingTablePath; +tableIndices = allDataMaps.get(tableUniqueName); + } +} TableDataMap dataMap = null; if (tableIndices != null) { dataMap = getTableDataMap(dataMapSchema.getDataMapName(), tableIndices); @@ -341,6 +348,18 @@ public final class DataMapStoreManager { return dataMap; } + private String getKeyUsingTablePath(String tablePath) { +if (tablePath != null) { + // Try get using table path + for (Map.Entry<String, String> entry : tablePathMap.entrySet()) { +if (new Path(entry.getValue()).equals(new Path(tablePath))) { + return entry.getKey(); +} + } +} +return null; + } + /** * Return a new datamap instance and registered in the store manager. * The datamap is created using datamap name, datamap factory class and table identifier.
@@ -379,6 +398,13 @@ public final class DataMapStoreManager { getTableSegmentRefresher(table); List<TableDataMap> tableIndices = allDataMaps.get(tableUniqueName); if (tableIndices == null) { + String keyUsingTablePath = getKeyUsingTablePath(table.getTablePath()); + if (keyUsingTablePath != null) { +tableUniqueName = keyUsingTablePath; +tableIndices = allDataMaps.get(tableUniqueName); + } +} +if (tableIndices == null) { tableIndices = new ArrayList<>(); } @@ -434,14 +460,11 @@ public final class DataMapStoreManager { CarbonTable carbonTable = getCarbonTable(identifier); String tableUniqueName = identifier.getCarbonTableIdentifier().getTableUniqueName(); List<TableDataMap> tableIndices = allDataMaps.get(tableUniqueName); -if (tableIndices == null && identifier.getTablePath() != null) { - // Try get using table path - for (Map.Entry<String, String> entry : tablePathMap.entrySet()) { -if (new Path(entry.getValue()).equals(new Path(identifier.getTablePath()))) { - tableIndices = allDataMaps.get(entry.getKey()); - tableUniqueName = entry.getKey(); - break; -} +if (tableIndices == null) { + String keyUsingTablePath = getKeyUsingTablePath(identifier.getTablePath()); + if (keyUsingTablePath != null) { +tableUniqueName = keyUsingTablePath; +tableIndices = allDataMaps.get(tableUniqueName); } } if (null != carbonTable && tableInd
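A minimal sketch of the double boundary guard described above (a hypothetical class, not the actual PrimitivePageStatsCollector code): reject +/-Infinity before it reaches the running page min/max.

public class DoublePageStats {
  private double min = Double.MAX_VALUE;
  private double max = -Double.MAX_VALUE;

  // Infinite values would make the page min/max meaningless, so fail fast.
  public void update(double value) {
    if (Double.isInfinite(value)) {
      throw new IllegalArgumentException("infinite double value in page stats");
    }
    min = Math.min(min, value);
    max = Math.max(max, value);
  }
}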
carbondata git commit: [CARBONDATA-2887] Fix complex filters on spark carbon file format
Repository: carbondata Updated Branches: refs/heads/master d801548aa -> 2f537b724 [CARBONDATA-2887] Fix complex filters on spark carbon file format
Problem: Filters on complex types are not working with the carbon fileformat, as it tries to push down the not-null filter on complex types to carbon, but carbon does not handle any kind of filter on complex types. Solution: Removed pushdown of all complex-type filters from the carbon fileformat.
This closes #2659
Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/2f537b72 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/2f537b72 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/2f537b72 Branch: refs/heads/master Commit: 2f537b724f6f03ab40c95f7ecc8ebd38f6500099 Parents: d801548 Author: ravipesala Authored: Fri Aug 24 20:43:07 2018 +0530 Committer: kumarvishal09 Committed: Wed Aug 29 13:27:08 2018 +0530
-- .../spark/sql/test/TestQueryExecutor.scala | 1 + .../execution/datasources/CarbonFileIndex.scala | 15 +- .../CarbonFileIndexReplaceRule.scala| 2 +- .../datasources/CarbonSparkDataSourceUtil.scala | 34 ++- .../datasources/SparkCarbonFileFormat.scala | 33 ++- .../src/test/resources/Array.csv| 21 ++ .../spark-datasource/src/test/resources/j2.csv | 1 + .../src/test/resources/structofarray.csv| 21 ++ .../datasource/SparkCarbonDataSourceTest.scala | 267 +-- ...tCreateTableUsingSparkCarbonFileFormat.scala | 9 +- .../sql/carbondata/datasource/TestUtil.scala| 16 +- .../InputProcessorStepWithNoConverterImpl.java | 21 +- 12 files changed, 355 insertions(+), 86 deletions(-) --
http://git-wip-us.apache.org/repos/asf/carbondata/blob/2f537b72/integration/spark-common/src/main/scala/org/apache/spark/sql/test/TestQueryExecutor.scala --
diff --git a/integration/spark-common/src/main/scala/org/apache/spark/sql/test/TestQueryExecutor.scala b/integration/spark-common/src/main/scala/org/apache/spark/sql/test/TestQueryExecutor.scala index d3a20c3..f69a142 100644 --- a/integration/spark-common/src/main/scala/org/apache/spark/sql/test/TestQueryExecutor.scala +++ b/integration/spark-common/src/main/scala/org/apache/spark/sql/test/TestQueryExecutor.scala @@ -153,6 +153,7 @@ object TestQueryExecutor { TestQueryExecutor.projectPath + "/core/target", TestQueryExecutor.projectPath + "/hadoop/target", TestQueryExecutor.projectPath + "/processing/target", +TestQueryExecutor.projectPath + "/integration/spark-datasource/target", TestQueryExecutor.projectPath + "/integration/spark-common/target", TestQueryExecutor.projectPath + "/integration/spark2/target", TestQueryExecutor.projectPath + "/integration/spark-common/target/jars",
http://git-wip-us.apache.org/repos/asf/carbondata/blob/2f537b72/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/CarbonFileIndex.scala --
diff --git a/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/CarbonFileIndex.scala b/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/CarbonFileIndex.scala index 8471181..c330fcb 100644 --- a/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/CarbonFileIndex.scala +++ b/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/CarbonFileIndex.scala @@ -21,14 +21,13 @@ import java.util import scala.collection.JavaConverters._ -import org.apache.hadoop.conf.Configuration import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapred.JobConf import org.apache.hadoop.mapreduce.Job import org.apache.spark.deploy.SparkHadoopUtil import org.apache.spark.sql.SparkSession import org.apache.spark.sql.catalyst.expressions.Expression -import org.apache.spark.sql.execution.datasources.{InMemoryFileIndex, _} +import org.apache.spark.sql.execution.datasources._ import org.apache.spark.sql.types.StructType import org.apache.carbondata.core.datastore.filesystem.{CarbonFile, HDFSCarbonFile} @@ -79,9 +78,9 @@ class CarbonFileIndex( } private def prune(dataFilters: Seq[Expression], - directories: Seq[PartitionDirectory]) = { + directories: Seq[PartitionDirectory]): Seq[PartitionDirectory] = { val tablePath = parameters.get("path") -if (tablePath.nonEmpty) { +if (tablePath.nonEmpty && dataFilters.nonEmpty) { val hadoopConf = sparkSession.sessionState.newHad
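A minimal sketch of the pushdown pruning described above (hypothetical types; the real change works on Spark Filter expressions in CarbonSparkDataSourceUtil): any filter touching a complex-typed column is withheld from the carbon reader and left for the engine to evaluate row by row.

import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class ComplexFilterPruner {
  // Keep only filters whose column is not complex (array/struct/map);
  // the rest are evaluated by the engine after the scan.
  public static List<String> pushableFilters(List<String> filterColumns,
      Set<String> complexColumns) {
    List<String> pushable = new ArrayList<>();
    for (String column : filterColumns) {
      if (!complexColumns.contains(column)) {
        pushable.add(column);
      }
    }
    return pushable;
  }
}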
carbondata git commit: [CARBONDATA-2885] Broadcast Issue and Small file distribution Issue
Repository: carbondata Updated Branches: refs/heads/master f81543e95 -> 1fb1f19f2 [CARBONDATA-2885] Broadcast Issue and Small file distribution Issue
Issue :- 1. For an external table, CarbonRelation's sizeInBytes is wrong (always 0); because of this, join queries are identified for broadcast even when the actual table size is > 10MB (the default broadcast threshold). This makes some join queries fail (tables which should select sortMergeJoin go for broadcast join because of the wrong calculation). 2. If Merge_small_file task distribution is enabled, join queries fail (TPCH); carbon opens many carbon files but they are not getting closed.
Root Cause :- 1. The current relation size calculation is based on the tablestatus file, but since an external table does not have a tablestatus file, zero was always returned. 2. If Merge_small_file task distribution is enabled, carbon opens many carbon files but they are not getting closed.
Solution :- If the table is an external table, calculate the size from the table path. Close the carbon files once the scan is finished.
This closes #2658
Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/1fb1f19f Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/1fb1f19f Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/1fb1f19f Branch: refs/heads/master Commit: 1fb1f19f207eb157711ae0c7a79fd39b883e4621 Parents: f81543e Author: BJangir Authored: Fri Aug 24 14:47:49 2018 +0530 Committer: kumarvishal09 Committed: Mon Aug 27 12:57:22 2018 +0530
-- .../AbstractDetailQueryResultIterator.java | 5 ++ .../apache/spark/sql/hive/CarbonRelation.scala | 65 +++- 2 files changed, 42 insertions(+), 28 deletions(-) --
http://git-wip-us.apache.org/repos/asf/carbondata/blob/1fb1f19f/core/src/main/java/org/apache/carbondata/core/scan/result/iterator/AbstractDetailQueryResultIterator.java --
diff --git a/core/src/main/java/org/apache/carbondata/core/scan/result/iterator/AbstractDetailQueryResultIterator.java b/core/src/main/java/org/apache/carbondata/core/scan/result/iterator/AbstractDetailQueryResultIterator.java index 01aa939..26925d3 100644 --- a/core/src/main/java/org/apache/carbondata/core/scan/result/iterator/AbstractDetailQueryResultIterator.java +++ b/core/src/main/java/org/apache/carbondata/core/scan/result/iterator/AbstractDetailQueryResultIterator.java @@ -254,6 +254,11 @@ public abstract class AbstractDetailQueryResultIterator extends CarbonIterator private DataBlockIterator getDataBlockIterator() { if (blockExecutionInfos.size() > 0) { + try { +fileReader.finish(); + } catch (IOException e) {throw new RuntimeException(e); + } BlockExecutionInfo executionInfo = blockExecutionInfos.get(0); blockExecutionInfos.remove(executionInfo); return new DataBlockIterator(executionInfo, fileReader, batchSize, queryStatisticsModel,
http://git-wip-us.apache.org/repos/asf/carbondata/blob/1fb1f19f/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonRelation.scala --
diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonRelation.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonRelation.scala index f700441..80257b8 100644 --- a/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonRelation.scala +++ b/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonRelation.scala @@ -156,39 +156,48 @@ case class CarbonRelation( private var sizeInBytesLocalValue = 0L def sizeInBytes: Long = { -val tableStatusNewLastUpdatedTime =
SegmentStatusManager.getTableStatusLastModifiedTime( - carbonTable.getAbsoluteTableIdentifier) -if (tableStatusLastUpdateTime != tableStatusNewLastUpdatedTime) { - if (new SegmentStatusManager(carbonTable.getAbsoluteTableIdentifier) -.getValidAndInvalidSegments.getValidSegments.isEmpty) { -sizeInBytesLocalValue = 0L - } else { -val tablePath = carbonTable.getTablePath -val fileType = FileFactory.getFileType(tablePath) -if (FileFactory.isFileExist(tablePath, fileType)) { - // get the valid segments - val segments = new SegmentStatusManager(carbonTable.getAbsoluteTableIdentifier) -.getValidAndInvalidSegments.getValidSegments.asScala - var size = 0L - // for each segment calculate the size - segments.foreach {validSeg => -// for older store -if (null != validSeg.getLoadMetadataDetails.getDataSize && -null != validSeg.getLoadMetadataDetails.getIndexSize) { -
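A minimal sketch of the external-table sizing fallback described above (plain java.io.File instead of carbon's FileFactory abstraction): with no tablestatus file to read, sum the on-disk size of everything under the table path.

import java.io.File;

public class ExternalTableSize {
  // Recursively sum file sizes under the table path.
  public static long sizeOf(File path) {
    if (path.isFile()) {
      return path.length();
    }
    long size = 0L;
    File[] children = path.listFiles();
    if (children != null) {
      for (File child : children) {
        size += sizeOf(child);
      }
    }
    return size;
  }
}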
[3/4] carbondata git commit: [CARBONDATA-2872] Added Spark FileFormat interface implementation in Carbon
http://git-wip-us.apache.org/repos/asf/carbondata/blob/347b8e1d/integration/spark-datasource/pom.xml -- diff --git a/integration/spark-datasource/pom.xml b/integration/spark-datasource/pom.xml new file mode 100644 index 000..38cf629 --- /dev/null +++ b/integration/spark-datasource/pom.xml @@ -0,0 +1,196 @@ + + +http://maven.apache.org/POM/4.0.0; xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance; xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd;> + + 4.0.0 + + +org.apache.carbondata +carbondata-parent +1.5.0-SNAPSHOT +../../pom.xml + + + carbondata-spark-datasource + Apache CarbonData :: Spark Datasource + + +${basedir}/../../dev +true + + + + + org.apache.carbondata + carbondata-hadoop + ${project.version} + + + org.apache.carbondata + carbondata-store-sdk + ${project.version} + + + org.apache.spark + spark-hive-thriftserver_${scala.binary.version} + + + org.apache.spark + spark-repl_${scala.binary.version} + + + junit + junit + test + + + org.scalatest + scalatest_${scala.binary.version} + test + + + org.apache.hadoop + hadoop-aws + ${hadoop.version} + + + com.fasterxml.jackson.core + jackson-core + + + com.fasterxml.jackson.core + jackson-annotations + + + com.fasterxml.jackson.core + jackson-databind + + + + + + +src/test/scala + + +src/resources + + +. + + CARBON_SPARK_INTERFACELogResource.properties + + + + + +org.scala-tools +maven-scala-plugin +2.15.2 + + +compile + + compile + +compile + + +testCompile + + testCompile + +test + + +process-resources + + compile + + + + + +maven-compiler-plugin + + 1.7 + 1.7 + + + +org.apache.maven.plugins +maven-surefire-plugin +2.18 + + + ${project.build.directory}/surefire-reports + -Xmx3g -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=512m + +true + ${carbon.hive.based.metastore} + + false + + + +org.scalatest +scalatest-maven-plugin +1.0 + + + ${project.build.directory}/surefire-reports + . + CarbonTestSuite.txt + ${argLine} -ea -Xmx3g -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=512m + + + + + +true + ${carbon.hive.based.metastore} + + + + +test + + test + + + + + + + + + build-all + +2.2.1 +2.11 +2.11.8 + + + + sdvtest + +true + + + + http://git-wip-us.apache.org/repos/asf/carbondata/blob/347b8e1d/integration/spark-datasource/src/main/scala/org/apache/carbondata/converter/SparkDataTypeConverterImpl.java -- diff --git a/integration/spark-datasource/src/main/scala/org/apache/carbondata/converter/SparkDataTypeConverterImpl.java b/integration/spark-datasource/src/main/scala/org/apache/carbondata/converter/SparkDataTypeConverterImpl.java new file mode 100644 index 000..7e38691 --- /dev/null +++ b/integration/spark-datasource/src/main/scala/org/apache/carbondata/converter/SparkDataTypeConverterImpl.java @@ -0,0 +1,175 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.converter; +
[1/4] carbondata git commit: [CARBONDATA-2872] Added Spark FileFormat interface implementation in Carbon
Repository: carbondata Updated Branches: refs/heads/master 137245057 -> 347b8e1db http://git-wip-us.apache.org/repos/asf/carbondata/blob/347b8e1d/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala -- diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala index 91197fd..d8e8251 100644 --- a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala +++ b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala @@ -34,6 +34,7 @@ import org.apache.spark.sql.optimizer.{CarbonDecoderRelation, CarbonFilters} import org.apache.spark.sql.sources.{BaseRelation, Filter} import org.apache.spark.sql.types._ import org.apache.spark.sql.CarbonExpressions.{MatchCast => Cast} +import org.apache.spark.sql.carbondata.execution.datasources.CarbonSparkDataSourceUtil import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException import org.apache.carbondata.core.constants.CarbonCommonConstants @@ -445,7 +446,7 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy { attrRef match { case Some(attr: AttributeReference) => Some(AttributeReference(attr.name, - CarbonScalaUtil.convertCarbonToSparkDataType(n.getDataType), + CarbonSparkDataSourceUtil.convertCarbonToSparkDataType(n.getDataType), attr.nullable, attr.metadata)(attr.exprId, attr.qualifier)) case _ => None http://git-wip-us.apache.org/repos/asf/carbondata/blob/347b8e1d/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonRelation.scala -- diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonRelation.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonRelation.scala index 80d850b..f700441 100644 --- a/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonRelation.scala +++ b/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonRelation.scala @@ -20,14 +20,12 @@ import java.util.LinkedHashSet import scala.Array.canBuildFrom import scala.collection.JavaConverters._ -import scala.util.parsing.combinator.RegexParsers import org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation import org.apache.spark.sql.catalyst.expressions.AttributeReference import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan} import org.apache.spark.sql.types._ -import org.apache.spark.sql.util.CarbonException -import org.apache.spark.util.{CarbonMetastoreTypes, SparkTypeConverter} +import org.apache.spark.sql.util.{CarbonMetastoreTypes, SparkTypeConverter} import org.apache.carbondata.core.datastore.impl.FileFactory import org.apache.carbondata.core.metadata.datatype.DataTypes http://git-wip-us.apache.org/repos/asf/carbondata/blob/347b8e1d/integration/spark2/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala -- diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala index c052cd7..1ee22b6 100644 --- a/integration/spark2/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala +++ b/integration/spark2/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala @@ -29,6 +29,7 @@ import org.apache.spark.sql.types._ import org.apache.spark.sql.CarbonContainsWith import 
org.apache.spark.sql.CarbonEndsWith import org.apache.spark.sql.CarbonExpressions.{MatchCast => Cast} +import org.apache.spark.sql.carbondata.execution.datasources.CarbonSparkDataSourceUtil import org.apache.spark.sql.catalyst.TableIdentifier import org.apache.spark.sql.hive.CarbonSessionCatalog @@ -46,7 +47,6 @@ import org.apache.carbondata.core.util.CarbonProperties import org.apache.carbondata.core.util.ThreadLocalSessionInfo import org.apache.carbondata.datamap.{TextMatch, TextMatchLimit} import org.apache.carbondata.spark.CarbonAliasDecoderRelation -import org.apache.carbondata.spark.util.CarbonScalaUtil /** @@ -128,13 +128,15 @@ object CarbonFilters { Some(new SparkUnknownExpression(expr.transform { case AttributeReference(name, dataType, _, _) => CarbonBoundReference(new CarbonColumnExpression(name.toString, -CarbonScalaUtil.convertSparkToCarbonDataType(dataType)), dataType, expr.nullable) + CarbonSparkDataSourceUtil.convertSparkToCarbonDataType(dataType)), +dataType,
[4/4] carbondata git commit: [CARBONDATA-2872] Added Spark FileFormat interface implementation in Carbon
[CARBONDATA-2872] Added Spark FileFormat interface implementation in Carbon Added new package carbondata-spark-datasource under /integration/spark-datasource It contains the implementation of Spark's FileFormat and user can use carbon as format in spark For example create table test_table(c1 string, c2 int) using carbon or dataframe.write.format("carbon").saveAsTable("test_table") There are few classes moved to this datasource package as part of refactoring and spark2 and spark-common packages now depends on spark-datasource package. This closes #2647 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/347b8e1d Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/347b8e1d Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/347b8e1d Branch: refs/heads/master Commit: 347b8e1dbaef22fb1773e4771be8db8bad644a57 Parents: 1372450 Author: ravipesala Authored: Wed Aug 22 11:32:21 2018 +0530 Committer: kumarvishal09 Committed: Fri Aug 24 18:31:05 2018 +0530 -- .../core/datamap/DataMapStoreManager.java | 20 + .../carbondata/core/datamap/DataMapUtil.java| 3 + .../core/metadata/AbsoluteTableIdentifier.java | 4 + .../LatestFilesReadCommittedScope.java | 115 +++--- .../executor/impl/AbstractQueryExecutor.java| 2 +- .../apache/carbondata/core/util/CarbonUtil.java | 32 +- .../hadoop/api/CarbonFileInputFormat.java | 5 +- .../hadoop/api/CarbonInputFormat.java | 26 ++ .../hadoop/api/CarbonTableInputFormat.java | 5 +- ...FileInputFormatWithExternalCarbonTable.scala | 4 +- ...tCreateTableUsingSparkCarbonFileFormat.scala | 356 - .../DBLocationCarbonTableTestCase.scala | 18 +- .../iud/UpdateCarbonTableTestCase.scala | 16 +- integration/spark-common/pom.xml| 5 + .../spark/util/SparkDataTypeConverterImpl.java | 219 -- .../org/apache/carbondata/spark/util/Util.java | 73 .../carbondata/spark/rdd/CarbonMergerRDD.scala | 3 +- .../spark/rdd/CarbonScanPartitionRDD.scala | 2 +- .../carbondata/spark/rdd/CarbonScanRDD.scala| 3 +- .../carbondata/spark/rdd/StreamHandoffRDD.scala | 3 +- .../carbondata/spark/util/CarbonScalaUtil.scala | 58 +-- .../spark/util/CarbonMetastoreTypes.scala | 104 - .../apache/spark/util/SparkTypeConverter.scala | 137 --- integration/spark-datasource/pom.xml| 196 + .../converter/SparkDataTypeConverterImpl.java | 175 .../vectorreader/CarbonDictionaryWrapper.java | 44 ++ .../vectorreader/ColumnarVectorWrapper.java | 272 + .../VectorizedCarbonRecordReader.java | 333 .../execution/datasources/CarbonFileIndex.scala | 149 +++ .../CarbonFileIndexReplaceRule.scala| 85 .../datasources/CarbonSparkDataSourceUtil.scala | 251 .../datasources/SparkCarbonFileFormat.scala | 398 +++ .../readsupport/SparkUnsafeRowReadSuport.scala | 44 ++ .../spark/sql/util/CarbonMetastoreTypes.scala | 104 + .../spark/sql/util/SparkTypeConverter.scala | 138 +++ apache.spark.sql.sources.DataSourceRegister | 17 + .../datasource/SparkCarbonDataSourceTest.scala | 302 ++ ...tCreateTableUsingSparkCarbonFileFormat.scala | 326 +++ .../sql/carbondata/datasource/TestUtil.scala| 134 +++ .../vectorreader/CarbonDictionaryWrapper.java | 44 -- .../vectorreader/ColumnarVectorWrapper.java | 272 - .../VectorizedCarbonRecordReader.java | 317 --- .../datamap/IndexDataMapRebuildRDD.scala| 2 +- .../carbondata/stream/StreamJobManager.scala| 4 +- .../spark/sql/CarbonDictionaryDecoder.scala | 2 +- .../spark/sql/SparkUnknownExpression.scala | 5 +- .../management/CarbonLoadDataCommand.scala | 3 +- .../stream/CarbonCreateStreamCommand.scala | 4 +- 
.../datasources/SparkCarbonFileFormat.scala | 291 -- .../datasources/SparkCarbonTableFormat.scala| 2 +- .../strategy/CarbonLateDecodeStrategy.scala | 3 +- .../apache/spark/sql/hive/CarbonRelation.scala | 4 +- .../spark/sql/optimizer/CarbonFilters.scala | 27 +- .../sql/hive/CarbonInMemorySessionState.scala | 8 +- apache.spark.sql.sources.DataSourceRegister | 3 +- .../register/TestRegisterCarbonTable.scala | 22 +- pom.xml | 1 + .../sdk/file/CarbonWriterBuilder.java | 10 +- 58 files changed, 3272 insertions(+), 1933 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/347b8e1d/core/src/main/java/org/apache/carbondata/core/datamap/Data
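A usage sketch for the new datasource in Java (the commit message above shows the SQL and Scala forms); it assumes the carbondata-spark-datasource jar is on the classpath so that "carbon" resolves through DataSourceRegister, and the output path is illustrative.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CarbonFormatExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("carbon-format-example")
        .master("local[*]")
        .getOrCreate();

    // Write a small dataframe using carbon as the format ...
    Dataset<Row> df = spark.range(0, 10).toDF("c2");
    df.write().format("carbon").save("/tmp/carbon_example");

    // ... and read it back through the same FileFormat implementation.
    spark.read().format("carbon").load("/tmp/carbon_example").show();
    spark.stop();
  }
}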
[2/4] carbondata git commit: [CARBONDATA-2872] Added Spark FileFormat interface implementation in Carbon
http://git-wip-us.apache.org/repos/asf/carbondata/blob/347b8e1d/integration/spark-datasource/src/main/scala/org/apache/spark/sql/util/SparkTypeConverter.scala -- diff --git a/integration/spark-datasource/src/main/scala/org/apache/spark/sql/util/SparkTypeConverter.scala b/integration/spark-datasource/src/main/scala/org/apache/spark/sql/util/SparkTypeConverter.scala new file mode 100644 index 000..facb4f1 --- /dev/null +++ b/integration/spark-datasource/src/main/scala/org/apache/spark/sql/util/SparkTypeConverter.scala @@ -0,0 +1,138 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.util + +import java.util.Objects + +import scala.collection.JavaConverters._ + +import org.apache.spark.sql.types +import org.apache.spark.sql.types._ + +import org.apache.carbondata.core.metadata.datatype.{DataTypes => CarbonDataTypes} +import org.apache.carbondata.core.metadata.schema.table.CarbonTable +import org.apache.carbondata.core.metadata.schema.table.column.{CarbonColumn, CarbonDimension, ColumnSchema} + +private[spark] object SparkTypeConverter { + + def createSparkSchema(table: CarbonTable, columns: Seq[String]): StructType = { +Objects.requireNonNull(table) +Objects.requireNonNull(columns) +if (columns.isEmpty) { + throw new IllegalArgumentException("column list is empty") +} +val fields = new java.util.ArrayList[StructField](columns.size) +val allColumns = table.getTableInfo.getFactTable.getListOfColumns.asScala + +// find the column and add it to fields array +columns.foreach { column => + val col = allColumns.find(_.getColumnName.equalsIgnoreCase(column)).getOrElse( +throw new IllegalArgumentException(column + " does not exist") + ) + fields.add(StructField(col.getColumnName, convertCarbonToSparkDataType(col, table))) +} +StructType(fields) + } + + /** + * Converts from carbon datatype to corresponding spark datatype. 
+ */ + def convertCarbonToSparkDataType( + columnSchema: ColumnSchema, + table: CarbonTable): types.DataType = { +if (CarbonDataTypes.isDecimal(columnSchema.getDataType)) { + val scale = columnSchema.getScale + val precision = columnSchema.getPrecision + if (scale == 0 && precision == 0) { +DecimalType(18, 2) + } else { +DecimalType(precision, scale) + } +} else if (CarbonDataTypes.isArrayType(columnSchema.getDataType)) { + CarbonMetastoreTypes +.toDataType(s"array<${ getArrayChildren(table, columnSchema.getColumnName) }>") +} else if (CarbonDataTypes.isStructType(columnSchema.getDataType)) { + CarbonMetastoreTypes +.toDataType(s"struct<${ getStructChildren(table, columnSchema.getColumnName) }>") +} else { + columnSchema.getDataType match { +case CarbonDataTypes.STRING => StringType +case CarbonDataTypes.SHORT => ShortType +case CarbonDataTypes.INT => IntegerType +case CarbonDataTypes.LONG => LongType +case CarbonDataTypes.DOUBLE => DoubleType +case CarbonDataTypes.BOOLEAN => BooleanType +case CarbonDataTypes.TIMESTAMP => TimestampType +case CarbonDataTypes.DATE => DateType + } +} + } + + def getArrayChildren(table: CarbonTable, dimName: String): String = { +table.getChildren(dimName).asScala.map(childDim => { + childDim.getDataType.getName.toLowerCase match { +case "array" => s"array<${ getArrayChildren(table, childDim.getColName) }>" +case "struct" => s"struct<${ getStructChildren(table, childDim.getColName) }>" +case dType => addDecimalScaleAndPrecision(childDim, dType) + } +}).mkString(",") + } + + def getStructChildren(table: CarbonTable, dimName: String): String = { +table.getChildren(dimName).asScala.map(childDim => { + childDim.getDataType.getName.toLowerCase match { +case "array" => s"${ + childDim.getColName.substring(dimName.length + 1) +}:array<${ getArrayChildren(table, childDim.getColName) }>" +case "struct" => s"${ + childDim.getColName.substring(dimName.length + 1) +
carbondata git commit: [HOTFIX] Fixed int overflow and comparison gone wrong during blocklet min/max
Repository: carbondata Updated Branches: refs/heads/master 7158d5203 -> 8affab843 [HOTFIX] Fixed int overflow and comparison gone wrong during blocklet min/max
Problem: While calculating the min/max for a blocklet, it needs to be calculated from all the pages. During that comparison, the difference is typecast to int and overflows, so a negative value can become positive and a positive value negative. That's why the min/max of long columns comes out wrong for bigger values.
Solution: Don't typecast directly; instead check first whether the difference is negative or positive and then return.
Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/8affab84 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/8affab84 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/8affab84 Branch: refs/heads/master Commit: 8affab8433bb4dea70fbb4ea9d3abc7eaf9fd7b2 Parents: 7158d52 Author: ravipesala Authored: Tue Aug 7 21:19:36 2018 +0530 Committer: kumarvishal09 Committed: Thu Aug 9 15:01:12 2018 +0530
-- .../core/util/CarbonMetadataUtil.java | 16 +++- .../core/util/CarbonMetadataUtilTest.java | 39 +--- 2 files changed, 39 insertions(+), 16 deletions(-) --
http://git-wip-us.apache.org/repos/asf/carbondata/blob/8affab84/core/src/main/java/org/apache/carbondata/core/util/CarbonMetadataUtil.java --
diff --git a/core/src/main/java/org/apache/carbondata/core/util/CarbonMetadataUtil.java b/core/src/main/java/org/apache/carbondata/core/util/CarbonMetadataUtil.java index 8fc648b..70443d8 100644 --- a/core/src/main/java/org/apache/carbondata/core/util/CarbonMetadataUtil.java +++ b/core/src/main/java/org/apache/carbondata/core/util/CarbonMetadataUtil.java @@ -368,7 +368,13 @@ public class CarbonMetadataUtil { secondBuffer.put(second); firstBuffer.flip(); secondBuffer.flip(); - return (int) (firstBuffer.getDouble() - secondBuffer.getDouble()); + double compare = firstBuffer.getDouble() - secondBuffer.getDouble(); + if (compare > 0) { +compare = 1; + } else if (compare < 0) { +compare = -1; + } + return (int) compare; } else if (dataType == DataTypes.LONG || dataType == DataTypes.INT || dataType == DataTypes.SHORT) { firstBuffer = ByteBuffer.allocate(8); @@ -377,7 +383,13 @@ public class CarbonMetadataUtil { secondBuffer.put(second); firstBuffer.flip(); secondBuffer.flip(); - return (int) (firstBuffer.getLong() - secondBuffer.getLong()); + long compare = firstBuffer.getLong() - secondBuffer.getLong(); + if (compare > 0) { +compare = 1; + } else if (compare < 0) { +compare = -1; + } + return (int) compare; } else if (DataTypes.isDecimal(dataType)) { return DataTypeUtil.byteToBigDecimal(first).compareTo(DataTypeUtil.byteToBigDecimal(second)); } else {
org.apache.carbondata.core.datastore.page.encoding.EncodedColumnPage; -import org.apache.carbondata.core.datastore.page.key.TablePageKey; -import org.apache.carbondata.core.datastore.page.statistics.PrimitivePageStatsCollector; import org.apache.carbondata.core.metadata.ValueEncoderMeta; +import org.apache.carbondata.core.metadata.datatype.DataTypes; import org.apache.carbondata.core.metadata.index.BlockIndexInfo; import org.apache.carbondata.format.BlockIndex; -import org.apache.carbondata.format.BlockletIndex; import org.apache.carbondata.format.BlockletInfo; -import org.apache.carbondata.format.BlockletInfo3; -import org.apache.carbondata.format.BlockletMinMaxIndex; import org.apache.carbondata.format.ColumnSchema; import org.apache.carbondata.format.DataChunk; -import org.apache.carbondata.format.DataChunk2; import org.apache.carbondata.format.DataType; import org.apache.carbondata.
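A small demonstration of why the old (int) (first - second) comparator was wrong: even when the long subtraction itself does not overflow, the narrowing cast keeps only the low 32 bits and can flip the sign.

public class CompareOverflowDemo {
  public static void main(String[] args) {
    long first = 2147483648L;  // 2^31: clearly first > second
    long second = 0L;
    // The cast truncates 2^31 to Integer.MIN_VALUE, reporting first < second.
    System.out.println((int) (first - second));       // -2147483648 (wrong)
    // Normalizing to -1/0/1 before the cast, as the fix does, is safe.
    System.out.println(Long.compare(first, second));  // 1 (correct)
  }
}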
carbondata git commit: [CARBONDATA-2817] Thread Leak in Update and in No sort flow
Repository: carbondata Updated Branches: refs/heads/master 8f7b594a3 -> 7158d5203 [CARBONDATA-2817] Thread Leak in Update and in No sort flow
Issue :- After the update command is finished, loading threads are not getting stopped.
Root Cause :- In the update flow, DataLoadExecutor's close method is not called, so its executor services are not closed. Also, exceptions are not handled properly in the AFDW class's closeExecutorService(), which causes a thread leak if the job is killed from the Spark UI.
Solution :- Add a task completion listener and call DataLoadExecutor's close method from it. Handle exceptions in closeExecutorService() so that all writer-step threads can be closed.
This closes #2606
Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/7158d520 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/7158d520 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/7158d520 Branch: refs/heads/master Commit: 7158d5203d84feaef23a5bb17a90b67c79ba52d0 Parents: 8f7b594 Author: BJangir Authored: Thu Aug 2 21:51:07 2018 +0530 Committer: kumarvishal09 Committed: Wed Aug 8 17:42:04 2018 +0530
-- .../core/util/BlockletDataMapUtil.java | 4 +- .../carbondata/spark/rdd/UpdateDataLoad.scala | 9 +++- .../CarbonRowDataWriterProcessorStepImpl.java | 52 +--- .../steps/DataWriterBatchProcessorStepImpl.java | 25 -- .../store/writer/AbstractFactDataWriter.java| 16 -- .../writer/v3/CarbonFactDataWriterImplV3.java | 19 +-- 6 files changed, 103 insertions(+), 22 deletions(-) --
http://git-wip-us.apache.org/repos/asf/carbondata/blob/7158d520/core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java --
diff --git a/core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java b/core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java index 68ce1fb..404b426 100644 --- a/core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java +++ b/core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java @@ -115,7 +115,7 @@ public class BlockletDataMapUtil { CarbonTable.updateTableByTableInfo(carbonTable, carbonTable.getTableInfo()); } String blockPath = footer.getBlockInfo().getTableBlockInfo().getFilePath(); - if (null != fileNameToMetaInfoMapping && null == blockMetaInfoMap.get(blockPath)) { + if (null == blockMetaInfoMap.get(blockPath)) { BlockMetaInfo blockMetaInfo = createBlockMetaInfo(fileNameToMetaInfoMapping, blockPath); // if blockMetaInfo is null that means the file has been deleted from the file system.
// This can happen in case IUD scenarios where after deleting or updating the data the @@ -123,8 +123,6 @@ public class BlockletDataMapUtil { if (null != blockMetaInfo) { blockMetaInfoMap.put(blockPath, blockMetaInfo); } - } else { -blockMetaInfoMap.put(blockPath, new BlockMetaInfo(new String[] {},0)); } } return blockMetaInfoMap; http://git-wip-us.apache.org/repos/asf/carbondata/blob/7158d520/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/UpdateDataLoad.scala -- diff --git a/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/UpdateDataLoad.scala b/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/UpdateDataLoad.scala index 2e7c307..f4fdbc1 100644 --- a/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/UpdateDataLoad.scala +++ b/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/UpdateDataLoad.scala @@ -25,8 +25,10 @@ import org.apache.spark.sql.Row import org.apache.carbondata.common.CarbonIterator import org.apache.carbondata.common.logging.LogServiceFactory import org.apache.carbondata.core.statusmanager.{LoadMetadataDetails, SegmentStatus} +import org.apache.carbondata.core.util.ThreadLocalTaskInfo import org.apache.carbondata.processing.loading.{DataLoadExecutor, TableProcessingOperations} import org.apache.carbondata.processing.loading.model.CarbonLoadModel +import org.apache.carbondata.spark.util.CommonUtil /** * Data load in case of update command . @@ -54,7 +56,12 @@ object UpdateDataLoad { loader.initialize() loadMetadataDetails.setSegmentStatus(SegmentStatus.SUCCESS) - new DataLoadExecutor().execute(carbonLoadModel, + val executor = new DataLoadExecutor + TaskContext.get().addTaskCompletionListener { context => +executor.close() + CommonUtil.clearUnsafeMemory(ThreadLocalTaskInfo.getCarbonTask
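A minimal sketch of the shutdown hardening in closeExecutorService() (a hypothetical helper, not the actual AbstractFactDataWriter code): every executor is shut down even if an earlier one fails, so one exception cannot leak the remaining writer threads.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

public class ExecutorCleanup {
  public static void closeAll(ExecutorService... services) {
    RuntimeException firstFailure = null;
    for (ExecutorService service : services) {
      try {
        service.shutdown();
        service.awaitTermination(1, TimeUnit.MINUTES);
      } catch (Exception e) {
        // Remember the first failure but keep closing the rest.
        if (firstFailure == null) {
          firstFailure = new RuntimeException(e);
        }
      }
    }
    if (firstFailure != null) {
      throw firstFailure;
    }
  }
}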
carbondata git commit: [CARBONDATA-2775] Adaptive encoding fails for Unsafe OnHeap, if target datatype is SHORT_INT
Repository: carbondata Updated Branches: refs/heads/master 8d3e8b82c -> 4d95dfcff [CARBONDATA-2775] Adaptive encoding fails for Unsafe OnHeap, if target datatype is SHORT_INT
problem: Adaptive encoding fails for Unsafe OnHeap if the target data type is SHORT_INT.
solution: If ENABLE_OFFHEAP_SORT = false in the carbon properties, UnsafeFixLengthColumnPage.java uses different compression logic, not the raw compression. In that case the conversion for the SHORT_INT data type needs to be handled.
This closes #2546
Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/4d95dfcf Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/4d95dfcf Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/4d95dfcf Branch: refs/heads/master Commit: 4d95dfcff2895ce0aed8ba6f75ce9946ae5172af Parents: 8d3e8b8 Author: ajantha-bhat Authored: Tue Jul 24 12:33:47 2018 +0530 Committer: kumarvishal09 Committed: Sun Jul 29 11:52:30 2018 +0530
-- .../page/UnsafeFixLengthColumnPage.java | 2 + ...UnsafeHeapColumnPageForComplexDataType.scala | 61 2 files changed, 63 insertions(+) --
http://git-wip-us.apache.org/repos/asf/carbondata/blob/4d95dfcf/core/src/main/java/org/apache/carbondata/core/datastore/page/UnsafeFixLengthColumnPage.java --
diff --git a/core/src/main/java/org/apache/carbondata/core/datastore/page/UnsafeFixLengthColumnPage.java b/core/src/main/java/org/apache/carbondata/core/datastore/page/UnsafeFixLengthColumnPage.java index bcb74c0..f75deb6 100644 --- a/core/src/main/java/org/apache/carbondata/core/datastore/page/UnsafeFixLengthColumnPage.java +++ b/core/src/main/java/org/apache/carbondata/core/datastore/page/UnsafeFixLengthColumnPage.java @@ -495,6 +495,8 @@ public class UnsafeFixLengthColumnPage extends ColumnPage { return totalLength / ByteUtil.SIZEOF_BYTE; } else if (dataType == DataTypes.SHORT) { return totalLength / ByteUtil.SIZEOF_SHORT; +} else if (dataType == DataTypes.SHORT_INT) { + return totalLength / ByteUtil.SIZEOF_SHORT_INT; } else if (dataType == DataTypes.INT) { return totalLength / ByteUtil.SIZEOF_INT; } else if (dataType == DataTypes.LONG) {
http://git-wip-us.apache.org/repos/asf/carbondata/blob/4d95dfcf/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestAdaptiveEncodingUnsafeHeapColumnPageForComplexDataType.scala --
diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestAdaptiveEncodingUnsafeHeapColumnPageForComplexDataType.scala b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestAdaptiveEncodingUnsafeHeapColumnPageForComplexDataType.scala new file mode 100644 index 000..acf75c1 --- /dev/null +++ b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestAdaptiveEncodingUnsafeHeapColumnPageForComplexDataType.scala @@ -0,0 +1,61 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License.
You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.integration.spark.testsuite.complexType + +import java.io.File + +import org.apache.spark.sql.test.util.QueryTest +import org.scalatest.BeforeAndAfterAll + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.util.CarbonProperties + +/** + * Test class of Adaptive Encoding UnSafe Column Page with Complex Data type + * + */ + +class TestAdaptiveEncodingUnsafeHeapColumnPageForComplexDataType + extends QueryTest with BeforeAndAfterAll with TestAdaptiveComplexType { + + override def beforeAll(): Unit = { + +new File(CarbonProperties.getInstance().getSystemFolderLocation).delete() +sql("DROP TABLE IF EXISTS adaptive") +CarbonProperties.
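The added branch above in numbers, assuming ByteUtil.SIZEOF_SHORT_INT is 3 (SHORT_INT being carbon's packed 3-byte integer): without it, the page's row count was computed with the wrong divisor.

public class ShortIntRowCount {
  static final int SIZEOF_SHORT_INT = 3;  // bytes per SHORT_INT value

  public static void main(String[] args) {
    int totalLength = 96000;  // illustrative page length in bytes
    // 96000 bytes / 3 bytes per value = 32000 rows in the page.
    System.out.println(totalLength / SIZEOF_SHORT_INT);
  }
}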
carbondata git commit: [CARBONDATA-2753][Compatibility] Row count of page is calculated wrong for old store(V2 store)
Repository: carbondata Updated Branches: refs/heads/master c79fc90d5 -> 8d3e8b82c [CARBONDATA-2753][Compatibility] Row count of page is calculated wrong for old store(V2 store) Row count of page is calculated wrong for V2 store. Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/8d3e8b82 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/8d3e8b82 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/8d3e8b82 Branch: refs/heads/master Commit: 8d3e8b82cbb0d75c66219119c281ed910ac185e6 Parents: c79fc90 Author: dhatchayani Authored: Wed Jul 25 14:41:58 2018 +0530 Committer: kumarvishal09 Committed: Sun Jul 29 11:47:25 2018 +0530 -- .../blockletindex/BlockletDataRefNode.java| 18 +- .../scan/scanner/impl/BlockletFullScanner.java| 9 + 2 files changed, 14 insertions(+), 13 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/8d3e8b82/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataRefNode.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataRefNode.java b/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataRefNode.java index a11ae8d..5681528 100644 --- a/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataRefNode.java +++ b/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataRefNode.java @@ -61,18 +61,26 @@ public class BlockletDataRefNode implements DataRefNode { int numberOfPagesCompletelyFilled = detailInfo.getRowCount(); // no. of rows to a page is 12 in V2 and 32000 in V3, same is handled to get the number // of pages filled - if (blockInfo.getVersion() == ColumnarFormatVersion.V2) { + int lastPageRowCount; + int fullyFilledRowsCount; + if (blockInfo.getVersion() == ColumnarFormatVersion.V2 + || blockInfo.getVersion() == ColumnarFormatVersion.V1) { numberOfPagesCompletelyFilled /= CarbonVersionConstants.NUMBER_OF_ROWS_PER_BLOCKLET_COLUMN_PAGE_DEFAULT_V2; +lastPageRowCount = detailInfo.getRowCount() +% CarbonVersionConstants.NUMBER_OF_ROWS_PER_BLOCKLET_COLUMN_PAGE_DEFAULT_V2; +fullyFilledRowsCount = + CarbonVersionConstants.NUMBER_OF_ROWS_PER_BLOCKLET_COLUMN_PAGE_DEFAULT_V2; } else { numberOfPagesCompletelyFilled /= CarbonV3DataFormatConstants.NUMBER_OF_ROWS_PER_BLOCKLET_COLUMN_PAGE_DEFAULT; +lastPageRowCount = detailInfo.getRowCount() +% CarbonV3DataFormatConstants.NUMBER_OF_ROWS_PER_BLOCKLET_COLUMN_PAGE_DEFAULT; +fullyFilledRowsCount = + CarbonV3DataFormatConstants.NUMBER_OF_ROWS_PER_BLOCKLET_COLUMN_PAGE_DEFAULT; } - int lastPageRowCount = detailInfo.getRowCount() - % CarbonV3DataFormatConstants.NUMBER_OF_ROWS_PER_BLOCKLET_COLUMN_PAGE_DEFAULT; for (int i = 0; i < numberOfPagesCompletelyFilled; i++) { -pageRowCount[i] = - CarbonV3DataFormatConstants.NUMBER_OF_ROWS_PER_BLOCKLET_COLUMN_PAGE_DEFAULT; +pageRowCount[i] = fullyFilledRowsCount; } if (lastPageRowCount > 0) { pageRowCount[pageRowCount.length - 1] = lastPageRowCount; http://git-wip-us.apache.org/repos/asf/carbondata/blob/8d3e8b82/core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFullScanner.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFullScanner.java b/core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFullScanner.java index c3d4df8..f61a8b1 100644 --- a/core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFullScanner.java +++ 
b/core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFullScanner.java @@ -19,7 +19,6 @@ package org.apache.carbondata.core.scan.scanner.impl; import java.io.IOException; import org.apache.carbondata.core.constants.CarbonCommonConstants; -import org.apache.carbondata.core.constants.CarbonV3DataFormatConstants; import org.apache.carbondata.core.datastore.DataRefNode; import org.apache.carbondata.core.datastore.chunk.DimensionColumnPage; import org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk; @@ -123,13 +122,7 @@ public class BlockletFullScanner implements BlockletScanner { if (numberOfRows == null) { numberOfRows = new int[rawBlockletColumnChunks.getDataBlock().numberOfPages()]; for (int i = 0; i < numberOfRows.length; i++) { -nu
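A worked example of the paging arithmetic being fixed: V2 stores 12 rows per page while V3 stores 32000, so both the full-page count and the last-page remainder must use the version-specific constant.

public class PageRowCountExample {
  public static void main(String[] args) {
    int rowCount = 100;      // rows in a V2 blocklet
    int rowsPerPageV2 = 12;  // NUMBER_OF_ROWS_PER_BLOCKLET_COLUMN_PAGE_DEFAULT_V2
    int fullPages = rowCount / rowsPerPageV2;     // 8 completely filled pages
    int lastPageRows = rowCount % rowsPerPageV2;  // 4 rows in the last page
    System.out.println(fullPages + " full pages + last page of " + lastPageRows);
    // The old code used the V3 constant (32000) for the remainder and the
    // per-page fill even on V2 stores, which is the bug this commit fixes.
  }
}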
carbondata git commit: [CARBONDATA-2772] Size based dictionary fallback is failing even though the threshold is not reached.
Repository: carbondata Updated Branches: refs/heads/master f8fa29e64 -> 005db3fa3 [CARBONDATA-2772] Size based dictionary fallback is failing even though the threshold is not reached.
Issue :- Size-based fallback happened even though the threshold was not reached.
RootCause :- The current size calculation is wrong: it is updated for every incoming value instead of only for newly generated dictionary data.
Solution :- The current size should be calculated only for newly generated dictionary data.
This closes #2542
Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/005db3fa Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/005db3fa Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/005db3fa Branch: refs/heads/master Commit: 005db3fa359808d7988b94307a25a49010a42ca6 Parents: f8fa29e Author: BJangir Authored: Mon Jul 23 22:14:12 2018 +0530 Committer: kumarvishal09 Committed: Thu Jul 26 14:09:51 2018 +0530
-- .../MapBasedDictionaryStore.java| 20 ++-- .../ColumnLocalDictionaryGenerator.java | 8 2 files changed, 14 insertions(+), 14 deletions(-) --
http://git-wip-us.apache.org/repos/asf/carbondata/blob/005db3fa/core/src/main/java/org/apache/carbondata/core/localdictionary/dictionaryholder/MapBasedDictionaryStore.java --
diff --git a/core/src/main/java/org/apache/carbondata/core/localdictionary/dictionaryholder/MapBasedDictionaryStore.java b/core/src/main/java/org/apache/carbondata/core/localdictionary/dictionaryholder/MapBasedDictionaryStore.java index 05ca002..7b8617a 100644 --- a/core/src/main/java/org/apache/carbondata/core/localdictionary/dictionaryholder/MapBasedDictionaryStore.java +++ b/core/src/main/java/org/apache/carbondata/core/localdictionary/dictionaryholder/MapBasedDictionaryStore.java @@ -55,6 +55,11 @@ public class MapBasedDictionaryStore implements DictionaryStore { */ private boolean isThresholdReached; + /** + * current datasize + */ + private long currentSize; + public MapBasedDictionaryStore(int dictionaryThreshold) { this.dictionaryThreshold = dictionaryThreshold; this.dictionary = new ConcurrentHashMap<>(); @@ -86,11 +91,9 @@ public class MapBasedDictionaryStore implements DictionaryStore { if (null == value) { // increment the value value = ++lastAssignValue; + currentSize += data.length; // if new value is greater than threshold - if (value > dictionaryThreshold) { -// clear the dictionary -dictionary.clear(); -referenceDictionaryArray = null; + if (value > dictionaryThreshold || currentSize >= Integer.MAX_VALUE) { // set the threshold boolean to true isThresholdReached = true; // throw exception @@ -108,8 +111,13 @@ public class MapBasedDictionaryStore implements DictionaryStore { private void checkIfThresholdReached() throws DictionaryThresholdReachedException { if (isThresholdReached) { + if (currentSize >= Integer.MAX_VALUE) { +throw new DictionaryThresholdReachedException( +"Unable to generate dictionary. Dictionary Size crossed 2GB limit"); + } else { +throw new DictionaryThresholdReachedException( +"Unable to generate dictionary value.
Dictionary threshold reached"); + } } } http://git-wip-us.apache.org/repos/asf/carbondata/blob/005db3fa/core/src/main/java/org/apache/carbondata/core/localdictionary/generator/ColumnLocalDictionaryGenerator.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/localdictionary/generator/ColumnLocalDictionaryGenerator.java b/core/src/main/java/org/apache/carbondata/core/localdictionary/generator/ColumnLocalDictionaryGenerator.java index b0c7275..c55a289 100644 --- a/core/src/main/java/org/apache/carbondata/core/localdictionary/generator/ColumnLocalDictionaryGenerator.java +++ b/core/src/main/java/org/apache/carbondata/core/localdictionary/generator/ColumnLocalDictionaryGenerator.java @@ -33,8 +33,6 @@ public class ColumnLocalDictionaryGenerator implements LocalDictionaryGenerator */ private DictionaryStore dictionaryHolder; - private long currentSize; - public ColumnLocalDictionaryGenerator(int threshold, int lvLength) { // adding 1 to threshold for null value int newThreshold = threshold + 1; @@ -54,7 +52,6 @@ public class ColumnLocalDictionaryGenerator implements LocalDictionaryGenerator