[carbondata] branch master updated: [CARBONDATA-3661] Fix target file size check failure when uploading local files to the carbon store
This is an automated email from the ASF dual-hosted git repository.

ravipesala pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git

The following commit(s) were added to refs/heads/master by this push:
     new e2ddc41  [CARBONDATA-3661] Fix target file size check failure when uploading local files to the carbon store
e2ddc41 is described below

commit e2ddc415e6530d5dae85ecea43e7bb96504df36b
Author: liuzhi <371684...@qq.com>
AuthorDate: Fri Jan 10 12:54:24 2020 +0800

    [CARBONDATA-3661] Fix target file size check failure when uploading local files to the carbon store

    Why is this PR needed?
    Multiple Flink tasks writing carbon data may use the same carbon data file name,
    which makes the target file size check fail when uploading local files to the
    carbon store.

    What changes were proposed in this PR?
    Make each Flink task use a different carbon data file name by using a UUID as
    the write task ID.

    Does this PR introduce any user interface change?
    No

    Is any new testcase added?
    No

    This closes #3573
---
 .../java/org/apache/carbon/flink/CarbonLocalWriter.java      |  1 +
 .../main/java/org/apache/carbon/flink/CarbonS3Writer.java    |  1 +
 .../org/apache/carbondata/sdk/file/CarbonWriterBuilder.java  | 13 +
 3 files changed, 15 insertions(+)

diff --git a/integration/flink/src/main/java/org/apache/carbon/flink/CarbonLocalWriter.java b/integration/flink/src/main/java/org/apache/carbon/flink/CarbonLocalWriter.java
index db88cd4..a8068a3 100644
--- a/integration/flink/src/main/java/org/apache/carbon/flink/CarbonLocalWriter.java
+++ b/integration/flink/src/main/java/org/apache/carbon/flink/CarbonLocalWriter.java
@@ -62,6 +62,7 @@ final class CarbonLocalWriter extends CarbonWriter {
     try {
       final CarbonWriterBuilder writerBuilder =
           org.apache.carbondata.sdk.file.CarbonWriter.builder()
+              .taskNo(UUID.randomUUID().toString().replace("-", ""))
               .outputPath(super.getWritePath(row))
               .writtenBy("flink")
               .withSchemaFile(CarbonTablePath.getSchemaFilePath(table.getTablePath()))
diff --git a/integration/flink/src/main/java/org/apache/carbon/flink/CarbonS3Writer.java b/integration/flink/src/main/java/org/apache/carbon/flink/CarbonS3Writer.java
index ecae32a..d23c668 100644
--- a/integration/flink/src/main/java/org/apache/carbon/flink/CarbonS3Writer.java
+++ b/integration/flink/src/main/java/org/apache/carbon/flink/CarbonS3Writer.java
@@ -65,6 +65,7 @@ final class CarbonS3Writer extends CarbonWriter {
     try {
       final CarbonWriterBuilder writerBuilder =
           org.apache.carbondata.sdk.file.CarbonWriter.builder()
+              .taskNo(UUID.randomUUID().toString().replace("-", ""))
               .outputPath(super.getWritePath(row))
               .writtenBy("flink")
               .withSchemaFile(CarbonTablePath.getSchemaFilePath(table.getTablePath()))
diff --git a/store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java b/store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java
index eb47a8d..cbf899f 100644
--- a/store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java
+++ b/store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java
@@ -152,6 +152,19 @@ public class CarbonWriterBuilder {
   }

   /**
+   * Sets the taskNo for the writer. SDKs running concurrently
+   * will set taskNo in order to avoid conflicts in file names during write.
+   *
+   * @param taskNo is the taskNo the user wants to specify.
+   *               By default it is the system time in nanoseconds.
+   * @return updated CarbonWriterBuilder
+   */
+  public CarbonWriterBuilder taskNo(String taskNo) {
+    this.taskNo = taskNo;
+    return this;
+  }
+
+  /**
    * to set the timestamp in the carbondata and carbonindex index files
    *
    * @param timestamp is a timestamp to be used in the carbondata and carbonindex index files.
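For illustration only, a minimal sketch (not part of the patch) of an SDK client adopting the same UUID-based task ID. The builder calls mirror the diff above; the output path and the remaining writer configuration are hypothetical:

```java
import java.util.UUID;

import org.apache.carbondata.sdk.file.CarbonWriter;
import org.apache.carbondata.sdk.file.CarbonWriterBuilder;

public class UniqueTaskNoExample {
  public static void main(String[] args) {
    // A UUID-based task ID ensures two concurrent writers never produce the
    // same carbondata file name, so the size check on upload cannot compare
    // the local file against an unrelated remote file of the same name.
    String taskId = UUID.randomUUID().toString().replace("-", "");
    CarbonWriterBuilder builder = CarbonWriter.builder()
        .taskNo(taskId)                // method added by this commit
        .outputPath("/tmp/carbon_out") // hypothetical local path
        .writtenBy("flink");
    // ... configure the schema and call build() as usual ...
  }
}
```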
[carbondata] branch master updated: [CARBONDATA-3650] Remove file format V1 and V2 readers
This is an automated email from the ASF dual-hosted git repository.

ravipesala pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git

The following commit(s) were added to refs/heads/master by this push:
     new 5bd345b  [CARBONDATA-3650] Remove file format V1 and V2 readers
5bd345b is described below

commit 5bd345ba0aa1fe822831e7dabfcbfa88ec635614
Author: Jacky Li
AuthorDate: Sun Dec 29 21:55:05 2019 +0800

    [CARBONDATA-3650] Remove file format V1 and V2 readers

    The V1 and V2 file formats are deprecated in CarbonData 2.0.

    This closes #3543
---
 .../core/constants/CarbonVersionConstants.java     |   5 -
 .../impl/VariableLengthDimensionColumnPage.java    |  15 --
 .../chunk/reader/CarbonDataReaderFactory.java      |  30 +--
 .../reader/dimension/AbstractChunkReader.java      |  96 --
 ...rmat.java => AbstractDimensionChunkReader.java} |  48 -
 .../CompressedDimensionChunkFileBasedReaderV1.java | 181 --
 .../CompressedDimensionChunkFileBasedReaderV2.java | 203 -
 ...aderV3.java => DimensionChunkPageReaderV3.java} |   6 +-
 ...edReaderV3.java => DimensionChunkReaderV3.java} |   6 +-
 .../reader/measure/AbstractMeasureChunkReader.java |  86 +++--
 .../AbstractMeasureChunkReaderV2V3Format.java      | 111 ---
 .../CompressedMeasureChunkFileBasedReaderV1.java   | 112 
 .../CompressedMeasureChunkFileBasedReaderV2.java   | 152 ---
 ...ReaderV3.java => MeasureChunkPageReaderV3.java} |   6 +-
 ...asedReaderV3.java => MeasureChunkReaderV3.java} |   6 +-
 .../datastore/page/encoding/EncodingFactory.java   |   8 -
 .../statistics/PrimitivePageStatsCollector.java    |   6 +-
 .../blockletindex/BlockletDataRefNode.java         |  27 +--
 .../core/keygenerator/mdkey/NumberCompressor.java  | 181 --
 .../core/metadata/ColumnarFormatVersion.java       |   4 +-
 .../core/metadata/blocklet/BlockletInfo.java       |  69 ---
 .../core/metadata/datatype/DataType.java           |   2 -
 .../core/metadata/datatype/DataTypes.java          |   5 -
 .../core/metadata/datatype/LegacyLongType.java     |  33 
 .../apache/carbondata/core/util/CarbonUtil.java    |  15 --
 .../core/util/DataFileFooterConverterFactory.java  |   7 +-
 .../apache/carbondata/core/util/DataTypeUtil.java  |   4 -
 .../carbondata/core/util/path/CarbonTablePath.java |   8 +-
 .../mdkey/NumberCompressorUnitTest.java            | 116 
 .../carbondata/core/util/CarbonTestUtil.java       |   3 -
 .../carbondata/core/util/CarbonUtilTest.java       |  10 +-
 .../CarbonV1toV3CompatabilityTestCase.scala        |  98 --
 .../LoadTableWithLocalDictionaryTestCase.scala     |   4 +-
 .../TestNonTransactionalCarbonTable.scala          |   4 +-
 .../LocalDictionarySupportLoadTableTest.scala      |   4 +-
 .../spark/rdd/CarbonDataRDDFactory.scala           |   2 +-
 .../processing/store/CarbonDataWriterFactory.java  |   5 +-
 37 files changed, 164 insertions(+), 1514 deletions(-)

diff --git a/core/src/main/java/org/apache/carbondata/core/constants/CarbonVersionConstants.java b/core/src/main/java/org/apache/carbondata/core/constants/CarbonVersionConstants.java
index 2382bd8..50c8ffd 100644
--- a/core/src/main/java/org/apache/carbondata/core/constants/CarbonVersionConstants.java
+++ b/core/src/main/java/org/apache/carbondata/core/constants/CarbonVersionConstants.java
@@ -50,11 +50,6 @@ public final class CarbonVersionConstants {
    */
   public static final String CARBONDATA_BUILD_DATE;

-  /**
-   * number of rows per blocklet column page default value for V2 version
-   */
-  public static final int NUMBER_OF_ROWS_PER_BLOCKLET_COLUMN_PAGE_DEFAULT_V2 = 12;
-
   static {
     // create input stream for CARBONDATA_VERSION_INFO_FILE
     InputStream resourceStream = Thread.currentThread().getContextClassLoader()
diff --git a/core/src/main/java/org/apache/carbondata/core/datastore/chunk/impl/VariableLengthDimensionColumnPage.java b/core/src/main/java/org/apache/carbondata/core/datastore/chunk/impl/VariableLengthDimensionColumnPage.java
index 2e941b2..2a71934 100644
--- a/core/src/main/java/org/apache/carbondata/core/datastore/chunk/impl/VariableLengthDimensionColumnPage.java
+++ b/core/src/main/java/org/apache/carbondata/core/datastore/chunk/impl/VariableLengthDimensionColumnPage.java
@@ -36,21 +36,6 @@ public class VariableLengthDimensionColumnPage extends AbstractDimensionColumnPa
    * @param invertedIndexReverse reverse inverted index
    * @param numberOfRows number of rows
    * @param dictionary carbon local dictionary for string column.
-   */
-  public VariableLengthDimensionColumnPage(byte[] dataChunks, int[] invertedIndex,
-      int[] invertedIndexReverse, int numberOfRows, DimensionStoreType dimStoreType,
-      CarbonDictionary dictionary, int dataLength) {
-    this(dataChunks, invertedIndex, invertedIndexReverse, numberOfRows
[carbondata] 28/33: [CARBONDATA-3520] CTAS should fail if select query contains duplicate columns
This is an automated email from the ASF dual-hosted git repository.

ravipesala pushed a commit to branch branch-1.6
in repository https://gitbox.apache.org/repos/asf/carbondata.git

commit 21bbc4a5306ff850fcf14b488e4cb0452415213c
Author: Indhumathi27
AuthorDate: Mon Sep 16 16:25:03 2019 +0530

    [CARBONDATA-3520] CTAS should fail if select query contains duplicate columns

    Problem:
    If the select query contains duplicate columns, CTAS created a table with
    only one column, which is wrong.

    Solution:
    Throw an error message if the select query contains duplicate columns.

    This closes #3388
---
 .../createTable/TestCreateTableAsSelect.scala      | 37 ++
 .../sql/parser/CarbonSparkSqlParserUtil.scala      | 23 +++---
 2 files changed, 56 insertions(+), 4 deletions(-)

diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateTableAsSelect.scala b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateTableAsSelect.scala
index 3896061..8e4d8fa 100644
--- a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateTableAsSelect.scala
+++ b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateTableAsSelect.scala
@@ -407,6 +407,43 @@ class TestCreateTableAsSelect extends QueryTest with BeforeAndAfterAll {
     checkAnswer(sql("SELECT * FROM target_table"), Seq(Row("shenzhen", 24.5)))
   }

+  test("test duplicate columns with select query") {
+    sql("DROP TABLE IF EXISTS target_table")
+    sql("DROP TABLE IF EXISTS source_table")
+    // create carbon table and insert data
+    sql(
+      """
+        | CREATE TABLE source_table(
+        | id INT,
+        | name STRING,
+        | city STRING,
+        | age INT)
+        | STORED BY 'carbondata'
+        | """.stripMargin)
+    sql("INSERT INTO source_table SELECT 1,'bob','shenzhen',27")
+    val e = intercept[AnalysisException] {
+      sql(
+        """
+          | CREATE TABLE target_table
+          | STORED BY 'carbondata'
+          | AS
+          | SELECT t1.city, t2.city
+          | FROM source_table t1, source_table t2 where t1.city=t2.city and t1.city = 'shenzhen'
+        """.stripMargin)
+    }
+    e.getMessage().toString.contains("Duplicated column names found in table definition of " +
+      "`target_table`: [\"city\"]")
+    sql(
+      """
+        | CREATE TABLE target_table
+        | STORED BY 'carbondata'
+        | AS
+        | SELECT t1.city as a, t2.city as b
+        | FROM source_table t1, source_table t2 where t1.city=t2.city and t1.city = 'shenzhen'
+      """.stripMargin)
+    checkAnswer(sql("select * from target_table"), Seq(Row("shenzhen", "shenzhen")))
+  }
+
   override def afterAll {
     sql("DROP TABLE IF EXISTS carbon_ctas_test")
     sql("DROP TABLE IF EXISTS parquet_ctas_test")
diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/parser/CarbonSparkSqlParserUtil.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/parser/CarbonSparkSqlParserUtil.scala
index 5c008f2..4d85e88 100644
--- a/integration/spark2/src/main/scala/org/apache/spark/sql/parser/CarbonSparkSqlParserUtil.scala
+++ b/integration/spark2/src/main/scala/org/apache/spark/sql/parser/CarbonSparkSqlParserUtil.scala
@@ -119,6 +119,8 @@ object CarbonSparkSqlParserUtil {
       case _ => // ignore this case
     }
+    val columnNames = fields.map(_.name.get)
+    checkIfDuplicateColumnExists(columns, tableIdentifier, columnNames)
     if (partitionFields.nonEmpty && options.isStreaming) {
       operationNotAllowed("Streaming is not allowed on partitioned table", partitionColumns)
     }
@@ -355,16 +357,29 @@ object CarbonSparkSqlParserUtil {
     // Ensuring whether no duplicate name is used in table definition
     val colNames: Seq[String] = cols.map(_.name)
+    checkIfDuplicateColumnExists(columns, tableIdentifier, colNames)
+    colNames
+  }
+
+  private def checkIfDuplicateColumnExists(columns: ColTypeListContext,
+      tableIdentifier: TableIdentifier,
+      colNames: Seq[String]): Unit = {
     if (colNames.length != colNames.distinct.length) {
       val duplicateColumns = colNames.groupBy(identity).collect {
         case (x, ys) if ys.length > 1 => "\"" + x + "\""
       }
-      operationNotAllowed(s"Duplicated column names found in table definition of " +
-        s"$tableIdentifier: ${ duplicateColumns.mkString("[", ",", "]
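The detection itself reduces to counting names. A self-contained Java miniature of the Scala groupBy check above; class and method names are illustrative:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateColumnCheck {
  // Returns the column names that occur more than once, quoted as in the error message.
  static List<String> findDuplicates(List<String> colNames) {
    Map<String, Integer> counts = new LinkedHashMap<>();
    for (String name : colNames) {
      counts.merge(name, 1, Integer::sum);
    }
    List<String> duplicates = new ArrayList<>();
    for (Map.Entry<String, Integer> entry : counts.entrySet()) {
      if (entry.getValue() > 1) {
        duplicates.add("\"" + entry.getKey() + "\"");
      }
    }
    return duplicates;
  }

  public static void main(String[] args) {
    // SELECT t1.city, t2.city ... yields two output columns both named "city".
    System.out.println(findDuplicates(Arrays.asList("city", "city"))); // ["city"]
  }
}
```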
[carbondata] 18/33: [CARBONDATA-3506] Fix alter table failures on partition table with hive.metastore.disallow.incompatible.col.type.changes as true
This is an automated email from the ASF dual-hosted git repository.

ravipesala pushed a commit to branch branch-1.6
in repository https://gitbox.apache.org/repos/asf/carbondata.git

commit ef26a4a0a556d574cc30c74c674234a4564f34c1
Author: akashrn5
AuthorDate: Wed Aug 28 12:05:13 2019 +0530

    [CARBONDATA-3506] Fix alter table failures on partition table with
    hive.metastore.disallow.incompatible.col.type.changes as true

    Problem:
    In Spark 2.2 and above, when we call alterExternalCatalogForTableWithUpdatedSchema
    to update the new schema to the external catalog during add column, Spark gets the
    catalog table and then itself adds the partition columns (if the table is a
    partition table) to the new data schema sent by carbon. This produces duplicate
    partition columns, so validation fails in hive.

    When the table has only two columns and one of them is a partition column,
    dropping the non-partition column is invalid, because allowing it would leave a
    table whose columns are all partition columns. So with the above property set to
    true, drop column fails to update the hive metastore.

    In Spark 2.2 and above, a datatype change on a partition column with the above
    property set to true also fails, because we do not send the partition column in
    the schema sent to hive for the alter.

    Solution:
    When sending the new schema to Spark to update the catalog, do not send the
    partition columns on Spark 2.2 and above, as Spark takes care of adding partition
    columns to the new schema itself. For the drop scenario above, do not allow the
    drop if, after dropping the specific column, the table would contain only
    partition columns. Block datatype changes on partition columns on Spark 2.2 and
    above.

    This closes #3367
---
 .../StandardPartitionTableQueryTestCase.scala      | 29 +
 .../schema/CarbonAlterTableAddColumnCommand.scala  | 20 +---
 ...nAlterTableColRenameDataTypeChangeCommand.scala | 36 +++---
 .../schema/CarbonAlterTableDropColumnCommand.scala | 35 +
 4 files changed, 99 insertions(+), 21 deletions(-)

diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableQueryTestCase.scala b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableQueryTestCase.scala
index c19c0b9..fb4b511 100644
--- a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableQueryTestCase.scala
+++ b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableQueryTestCase.scala
@@ -21,8 +21,10 @@ import org.apache.spark.sql.execution.strategy.CarbonDataSourceScan
 import org.apache.spark.sql.test.Spark2TestQueryExecutor
 import org.apache.spark.sql.test.util.QueryTest
 import org.apache.spark.sql.{DataFrame, Row}
+import org.apache.spark.util.SparkUtil
 import org.scalatest.BeforeAndAfterAll

+import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException
 import org.apache.carbondata.core.constants.CarbonCommonConstants
 import org.apache.carbondata.core.datastore.impl.FileFactory
 import org.apache.carbondata.core.util.CarbonProperties
@@ -439,18 +441,32 @@ test("Creation of partition table should fail if the colname in table schema and
   test("validate data in partition table after dropping and adding a column") {
     sql("drop table if exists par")
-    sql("create table par(name string) partitioned by (age double) stored by " +
+    sql("create table par(name string, add string) partitioned by (age double) stored by " +
         "'carbondata' TBLPROPERTIES('cache_level'='blocklet')")
-    sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" +
-        s"('header'='false')")
+    sql("insert into par select 'joey','NY',32 union all select 'chandler','NY',32")
     sql("alter table par drop columns(name)")
     sql("alter table par add columns(name string)")
-    sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into table par options" +
-        s"('header'='false')")
-    checkAnswer(sql("select name from par"), Seq(Row("a"),Row("b"), Row(null), Row(null)))
+    sql("insert into par select 'joey','NY',32 union all select 'joey','NY',32")
+    checkAnswer(sql("select name from par"), Seq(Row("NY"),Row("NY"), Row(null), Row(null)))
     sql("drop table if exists par")
   }

+  test("test drop column when after dropping only partition column remains and datatype change on partition column") {
+
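The add-column part of the fix amounts to stripping partition columns before handing the new schema to the catalog, since Spark 2.2+ re-appends them on its own. A hedged Java miniature of that filtering step; all names are illustrative, and the real code works on Carbon's column metadata:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PartitionColumnFilter {
  // Drop partition columns from the schema sent to the external catalog;
  // sending them would make Spark 2.2+ duplicate them and fail hive validation.
  static List<String> dataColumnsOnly(List<String> allColumns, Set<String> partitionColumns) {
    List<String> dataColumns = new ArrayList<>();
    for (String column : allColumns) {
      if (!partitionColumns.contains(column.toLowerCase())) {
        dataColumns.add(column);
      }
    }
    return dataColumns;
  }

  public static void main(String[] args) {
    List<String> schema = Arrays.asList("name", "add", "age");
    Set<String> partitions = new HashSet<>(Arrays.asList("age"));
    System.out.println(dataColumnsOnly(schema, partitions)); // [name, add]
  }
}
```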
[carbondata] 31/33: [CARBONDATA-3527] Fix 'String length cannot exceed 32000 characters' issue when loading data with 'GLOBAL_SORT' from CSV files that include big complex type data
This is an automated email from the ASF dual-hosted git repository.

ravipesala pushed a commit to branch branch-1.6
in repository https://gitbox.apache.org/repos/asf/carbondata.git

commit 93425458a93871fd18b1c3c41da396dbb06c02c8
Author: Zhang Zhichao <441586...@qq.com>
AuthorDate: Wed Sep 25 15:58:35 2019 +0800

    [CARBONDATA-3527] Fix 'String length cannot exceed 32000 characters' issue
    when loading data with 'GLOBAL_SORT' from CSV files that include big complex
    type data

    Problem:
    When complex type data takes more than 32000 characters to represent in a CSV
    file and data is loaded with 'GLOBAL_SORT' from such files, loading throws a
    'String length cannot exceed 32000 characters' exception.

    Cause:
    When 'GLOBAL_SORT' loads data from CSV files, it reads the files and first
    stores the data in a StringArrayRow, where every value is a string. When
    'CarbonScalaUtil.getString' is called in 'NewRddIterator.next', it checks the
    length of every value and throws the 'String length cannot exceed 32000
    characters' exception, even for complex type data that legitimately occupies
    more than 32000 characters in the CSV file.

    Solution:
    In 'FieldConverter.objectToString' (called from 'CarbonScalaUtil.getString'),
    skip the length check when the data type of the field is a complex type.

    This closes #3399
---
 .../src/test/resources/complexdata3.csv            | 10 +
 .../complexType/TestComplexDataType.scala          | 52 ++
 .../spark/rdd/NewCarbonDataLoadRDD.scala           |  6 ++-
 .../carbondata/spark/util/CarbonScalaUtil.scala    |  4 +-
 .../streaming/parser/FieldConverter.scala          | 14 +++---
 5 files changed, 79 insertions(+), 7 deletions(-)

diff --git a/integration/spark-common-test/src/test/resources/complexdata3.csv b/integration/spark-common-test/src/test/resources/complexdata3.csv
new file mode 100644
index 000..63cd44b
--- /dev/null
+++ b/integration/spark-common-test/src/test/resources/complexdata3.csv
@@ -0,0 +1,10 @@
+e01a1773-bd37-40be-a1de-d7e74837a281 (0551)96116063 886 00315 (0551)46819921 853 4 0 1568220618904 50 asp fk 2745000 1 0 0 0 0 -0.19569306\0020.10781755\002-0.06963766\002-0.06576662\002-0.17820272\002-0.01949397\0020.08014756\002-0.05287997\0020.02067086\002-0.11302640\0020.07383678\0020.07296083\0020.11693181\002-0.06988186\0020.05753217\002-0.02308202\002-0.03685183\0020.05840293\0020.03959572\002-0.01631518\0020.05918765\0020.07385136\002-0.05143059\002-0.19158234\0020.13839211\002 [...]
+f72ce5cb-2ea6-423b-8c1f-6dadfd6f52e7 (0551)73382297 853 00314 (0551)73382297 49 9 0 156827510 1559asp fk 5821000 1 0 0 0 0 -0.19569308\0020.10781755\002-0.06963766\002-0.06576661\002-0.17820270\002-0.01949396\0020.08014755\002-0.05287996\0020.02067086\002-0.11302640\0020.07383677\0020.07296082\0020.11693182\002-0.06988187\0020.05753216\002-0.02308202\002-0.03685183\0020.05840293\0020.03959572\002-0.01631517\0020.05918765\0020.07385137\002-0.05143059\002-0.19158235\0020.13839212\00 [...]
+e282ecb5-9be8-4a0e-8faf-d10e535ab877 13396633307 49 00319 13918448986 1 7 0 1568260253193 1150asp fk 3884000 1 0 0 0 0 -0.19569308\0020.10781755\002-0.06963766\002-0.06576661\002-0.17820270\002-0.01949396\0020.08014755\002-0.05287996\0020.02067086\002-0.11302640\0020.07383677\0020.07296082\0020.11693182\002-0.06988187\0020.05753216\002-0.02308202\002-0.03685183\0020.05840293\0020.03959572\002-0.01631517\0020.05918765\0020.07385137\002-0.05143059\002-0.19158235\0020.13839212\002-0.0826 [...]
+01e36a06-b4fd-4638-862c-2785f9e4331b 13924865616 82 00310 0086(021)60080162 82 6 0 1568293725356 2108 asp fk 3152000 1 0 0 0 0 -0.19569308\0020.10781755\002-0.06963766\002-0.06576661\002-0.17820270\002-0.01949396\0020.08014755\002-0.05287996\0020.02067086\002-0.11302640\0020.07383677\0020.07296082\0020.11693182\002-0.06988187\0020.05753216\002-0.02308202\002-0.03685183\0020.05840293\0020.03959572\002-0.01631517\0020.05918765\0020.07385137\002-0.05143059\002-0.19158235\0020.13839212\002 [...]
+a451790d-42f8-48e5-88f4-ba21118e63e6 13326037312 81 00318 (0551)17198025 852 2 0 1568294179731 2116asp fk 1127000 1 0 0 0 0 -0.19569308\0020.10781755\002-0.06963766\002-0.06576661\002-0.17820270\002-0.01949396\0020.08014755\002-0.05287996\0020.02067086\002-0.11302640\0020.07383677\0020.07296082\0020.11693182\002-0.06988187\0020.05753216\002-0.02308202\002-0.03685183\0020.05840293\0020.03959572\002-0.01631517\0020.05918765\0020.07385137\002-0.05143059\002-0.19158235\0020.13839212\002-0 [...]
+9d26e280-4e87-4cb
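A hedged miniature of the converter rule the patch introduces: the 32000-character limit applies to plain string fields only, while complex-type fields pass through untouched. The constant and method names are illustrative; the real logic lives in FieldConverter.objectToString:

```java
public class FieldConverterSketch {
  static final int MAX_STRING_LENGTH = 32000; // illustrative constant

  // The length check applies to plain strings only: a complex value serialized
  // in CSV may legitimately exceed the limit before it is split into its
  // child elements, so it must not be rejected here.
  static String objectToString(String value, boolean isComplexType) {
    if (!isComplexType && value.length() > MAX_STRING_LENGTH) {
      throw new IllegalArgumentException(
          "String length cannot exceed " + MAX_STRING_LENGTH + " characters");
    }
    return value;
  }
}
```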
[carbondata] 15/33: [CARBONDATA-3507] Fix Create Table As Select Failure in Spark-2.3
This is an automated email from the ASF dual-hosted git repository.

ravipesala pushed a commit to branch branch-1.6
in repository https://gitbox.apache.org/repos/asf/carbondata.git

commit d509cd19e3a9249f18c6b8b0ab2bbe19df017e65
Author: manishnalla1994
AuthorDate: Thu Aug 29 12:00:11 2019 +0530

    [CARBONDATA-3507] Fix Create Table As Select Failure in Spark-2.3

    Problem: Create table as select fails with Spark-2.3.

    Cause: When creating the table location path, the function removed the
    "hdfs://" part from the path before storing it, so in later stages the file
    was treated as a local carbon file.

    Solution: Get the original table path without removing the prefix.

    This closes #3368
---
 .../main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala
index b19b11c..684bcbb 100644
--- a/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala
+++ b/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala
@@ -567,9 +567,7 @@ class CarbonFileMetastore extends CarbonMetaStore {
     }
     val tableLocation = catalogTable.storage.locationUri match {
       case tableLoc@Some(uri) =>
-        if (tableLoc.get.isInstanceOf[URI]) {
-          FileFactory.getUpdatedFilePath(tableLoc.get.asInstanceOf[URI].getPath)
-        }
+        FileFactory.getUpdatedFilePath(tableLoc.get.toString)
       case None =>
         CarbonEnv.getTablePath(tableIdentifier.database, tableIdentifier.table)(sparkSession)
     }
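The root cause is reproducible with plain java.net.URI: getPath() silently drops the scheme and authority, which is exactly how the store path lost its "hdfs://" prefix. A small standalone demonstration:

```java
import java.net.URI;

public class UriPrefixDemo {
  public static void main(String[] args) {
    URI tableLoc = URI.create("hdfs://namenode:8020/user/hive/warehouse/t1");
    // getPath() strips the scheme and authority, so downstream code treats
    // the location as a local path:
    System.out.println(tableLoc.getPath()); // /user/hive/warehouse/t1
    // toString() preserves the full location, which is what the fix now
    // passes to FileFactory.getUpdatedFilePath:
    System.out.println(tableLoc);           // hdfs://namenode:8020/user/hive/warehouse/t1
  }
}
```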
[carbondata] 22/33: [HOTFIX] fix incorrect word in index-server doc
This is an automated email from the ASF dual-hosted git repository.

ravipesala pushed a commit to branch branch-1.6
in repository https://gitbox.apache.org/repos/asf/carbondata.git

commit 8ffbc1d6afda2ad77efeb738168b972239c89731
Author: lamber-ken <2217232...@qq.com>
AuthorDate: Tue Sep 17 02:08:40 2019 +0800

    [HOTFIX] fix incorrect word in index-server doc

    Fix an incorrect word in the index-server doc.

    This closes #3390
---
 docs/index-server.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/index-server.md b/docs/index-server.md
index 9253f2a..0b888c4 100644
--- a/docs/index-server.md
+++ b/docs/index-server.md
@@ -191,7 +191,7 @@ that will authenticate the user to access the index server and no other service.

 ## Starting the Server
 ```
-./bin/spark-submit --master [yarn/local] --[o ptional parameters] --class org.apache.carbondata.indexserver.IndexServer [path to carbondata-spark2-.jar]
+./bin/spark-submit --master [yarn/local] --[optional parameters] --class org.apache.carbondata.indexserver.IndexServer [path to carbondata-spark2-.jar]
 ```
 Or
 ```
[carbondata] 11/33: [CARBONDATA-3497] Support to write long string for streaming table
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git commit 99e0c7cb59fdf9d89a797b1b13923b64639dfc30 Author: Zhang Zhichao <441586...@qq.com> AuthorDate: Tue Aug 27 11:32:48 2019 +0800 [CARBONDATA-3497] Support to write long string for streaming table This closes #3366 --- .../hadoop/stream/StreamRecordReader.java | 19 +- .../resources/streamSample_with_long_string.csv| 6 + .../streaming/CarbonAppendableStreamSink.scala | 19 +- .../converter/SparkDataTypeConverterImpl.java | 6 +- .../TestStreamingTableWithLongString.scala | 649 + .../streaming/CarbonStreamRecordWriter.java| 11 +- .../streaming/parser/CSVStreamParserImp.java | 5 +- .../streaming/parser/CarbonStreamParser.java | 3 +- .../streaming/parser/RowStreamParserImp.scala | 11 +- 9 files changed, 715 insertions(+), 14 deletions(-) diff --git a/hadoop/src/main/java/org/apache/carbondata/hadoop/stream/StreamRecordReader.java b/hadoop/src/main/java/org/apache/carbondata/hadoop/stream/StreamRecordReader.java index 75e36be..1e40baa 100644 --- a/hadoop/src/main/java/org/apache/carbondata/hadoop/stream/StreamRecordReader.java +++ b/hadoop/src/main/java/org/apache/carbondata/hadoop/stream/StreamRecordReader.java @@ -81,6 +81,7 @@ public class StreamRecordReader extends RecordReader { protected CarbonTable carbonTable; private CarbonColumn[] storageColumns; private boolean[] isRequired; + private boolean[] dimensionsIsVarcharTypeMap; private DataType[] measureDataTypes; private int dimensionCount; private int measureCount; @@ -163,6 +164,10 @@ public class StreamRecordReader extends RecordReader { .getDirectDictionaryGenerator(storageColumns[i].getDataType()); } } +dimensionsIsVarcharTypeMap = new boolean[dimensionCount]; +for (int i = 0; i < dimensionCount; i++) { + dimensionsIsVarcharTypeMap[i] = storageColumns[i].getDataType() == DataTypes.VARCHAR; +} measureDataTypes = new DataType[measureCount]; for (int i = 0; i < measureCount; i++) { measureDataTypes[i] = storageColumns[dimensionCount + i].getDataType(); @@ -387,7 +392,12 @@ public class StreamRecordReader extends RecordReader { } } else { if (isNoDictColumn[colCount]) { - int v = input.readShort(); + int v = 0; + if (dimensionsIsVarcharTypeMap[colCount]) { +v = input.readInt(); + } else { +v = input.readShort(); + } if (isRequired[colCount]) { byte[] b = input.readBytes(v); if (isFilterRequired[colCount]) { @@ -561,7 +571,12 @@ public class StreamRecordReader extends RecordReader { outputValues[colCount] = CarbonCommonConstants.MEMBER_DEFAULT_VAL_ARRAY; } else { if (isNoDictColumn[colCount]) { - int v = input.readShort(); + int v = 0; + if (dimensionsIsVarcharTypeMap[colCount]) { +v = input.readInt(); + } else { +v = input.readShort(); + } outputValues[colCount] = input.readBytes(v); } else { outputValues[colCount] = input.readInt(); diff --git a/integration/spark-common-test/src/test/resources/streamSample_with_long_string.csv b/integration/spark-common-test/src/test/resources/streamSample_with_long_string.csv new file mode 100644 index 000..b010c07 --- /dev/null +++ b/integration/spark-common-test/src/test/resources/streamSample_with_long_string.csv @@ -0,0 +1,6 @@ +id,name,city,salary,tax,percent,birthday,register,updated,longstr,file +10001,batch_1,city_1,0.1,0.01,80.01,1990-01-01,2010-01-01 10:01:01,2010-01-01 
10:01:01,1abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabca [...] +10002,batch_2,city_2,0.2,0.02,80.02,1990-01-02,2010-01-02 10:01:01,2010-01-02 10:01:01,2abcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabcabca [...] +10003,batch_3,city_3,0.3,0.03,80.03,1990-01-03,2010-01-03 10:01:01,2010-
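The core of this change is the width of the length prefix for no-dictionary values: VARCHAR (long string) columns need a 4-byte length because a 2-byte short caps out at 32767. A hedged sketch of the read side using plain java.io in place of Carbon's stream reader:

```java
import java.io.DataInputStream;
import java.io.IOException;

public class VarcharLengthRead {
  // Read one length-prefixed value from a stream row: VARCHAR columns write an
  // int length (so values may exceed 32767 bytes), other no-dictionary string
  // columns write a short length.
  static byte[] readValue(DataInputStream input, boolean isVarchar) throws IOException {
    int length = isVarchar ? input.readInt() : input.readShort();
    byte[] value = new byte[length];
    input.readFully(value);
    return value;
  }
}
```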
[carbondata] 26/33: [HOTFIX] Fix wrong min/max index of measure
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git commit de81b380b71b111061bf27d8837b50a995631268 Author: QiangCai AuthorDate: Wed Sep 18 21:02:15 2019 +0800 [HOTFIX] Fix wrong min/max index of measure This closes #3394 --- .../carbondata/core/util/CarbonMetadataUtil.java | 64 .../org/apache/carbondata/sdk/file/MinMaxTest.java | 161 + 2 files changed, 188 insertions(+), 37 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/util/CarbonMetadataUtil.java b/core/src/main/java/org/apache/carbondata/core/util/CarbonMetadataUtil.java index f35afc0..7414ab7 100644 --- a/core/src/main/java/org/apache/carbondata/core/util/CarbonMetadataUtil.java +++ b/core/src/main/java/org/apache/carbondata/core/util/CarbonMetadataUtil.java @@ -477,50 +477,40 @@ public class CarbonMetadataUtil { ByteBuffer firstBuffer = null; ByteBuffer secondBuffer = null; if (dataType == DataTypes.BOOLEAN || dataType == DataTypes.BYTE) { - return first[0] - second[0]; + if (first[0] > second[0]) { +return 1; + } else if (first[0] < second[0]) { +return -1; + } + return 0; } else if (dataType == DataTypes.DOUBLE) { - firstBuffer = ByteBuffer.allocate(8); - firstBuffer.put(first); - secondBuffer = ByteBuffer.allocate(8); - secondBuffer.put(second); - firstBuffer.flip(); - secondBuffer.flip(); - double compare = firstBuffer.getDouble() - secondBuffer.getDouble(); - if (compare > 0) { -compare = 1; - } else if (compare < 0) { -compare = -1; + double firstValue = ((ByteBuffer) (ByteBuffer.allocate(8).put(first).flip())).getDouble(); + double secondValue = ((ByteBuffer) (ByteBuffer.allocate(8).put(second).flip())).getDouble(); + if (firstValue > secondValue) { +return 1; + } else if (firstValue < secondValue) { +return -1; } - return (int) compare; + return 0; } else if (dataType == DataTypes.FLOAT) { - firstBuffer = ByteBuffer.allocate(8); - firstBuffer.put(first); - secondBuffer = ByteBuffer.allocate(8); - secondBuffer.put(second); - firstBuffer.flip(); - secondBuffer.flip(); - double compare = firstBuffer.getFloat() - secondBuffer.getFloat(); - if (compare > 0) { -compare = 1; - } else if (compare < 0) { -compare = -1; + float firstValue = ((ByteBuffer) (ByteBuffer.allocate(8).put(first).flip())).getFloat(); + float secondValue = ((ByteBuffer) (ByteBuffer.allocate(8).put(second).flip())).getFloat(); + if (firstValue > secondValue) { +return 1; + } else if (firstValue < secondValue) { +return -1; } - return (int) compare; + return 0; } else if (dataType == DataTypes.LONG || dataType == DataTypes.INT || dataType == DataTypes.SHORT) { - firstBuffer = ByteBuffer.allocate(8); - firstBuffer.put(first); - secondBuffer = ByteBuffer.allocate(8); - secondBuffer.put(second); - firstBuffer.flip(); - secondBuffer.flip(); - long compare = firstBuffer.getLong() - secondBuffer.getLong(); - if (compare > 0) { -compare = 1; - } else if (compare < 0) { -compare = -1; + long firstValue = ((ByteBuffer) (ByteBuffer.allocate(8).put(first).flip())).getLong(); + long secondValue = ((ByteBuffer) (ByteBuffer.allocate(8).put(second).flip())).getLong(); + if (firstValue > secondValue) { +return 1; + } else if (firstValue < secondValue) { +return -1; } - return (int) compare; + return 0; } else if (DataTypes.isDecimal(dataType)) { return DataTypeUtil.byteToBigDecimal(first).compareTo(DataTypeUtil.byteToBigDecimal(second)); } else { diff --git 
a/store/sdk/src/test/java/org/apache/carbondata/sdk/file/MinMaxTest.java b/store/sdk/src/test/java/org/apache/carbondata/sdk/file/MinMaxTest.java new file mode 100644 index 000..c26fdd5 --- /dev/null +++ b/store/sdk/src/test/java/org/apache/carbondata/sdk/file/MinMaxTest.java @@ -0,0 +1,161 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES
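The bug class being fixed here is comparison by subtraction: for longs the difference itself can overflow and flip the sign, which corrupts the min/max ordering. A standalone demonstration of the failure mode and the safe pattern equivalent to the explicit if/else comparisons in the patch:

```java
public class SubtractionCompareBug {
  public static void main(String[] args) {
    long first = Long.MIN_VALUE;
    long second = 1L;
    // Old pattern: subtract, then inspect the sign. The subtraction itself
    // overflows here and wraps around to Long.MAX_VALUE, so the sign is wrong.
    long diff = first - second;
    int buggy = diff > 0 ? 1 : (diff < 0 ? -1 : 0); // yields 1 -- wrong
    // Fixed pattern: explicit comparison, never subtraction.
    int fixed = Long.compare(first, second);        // yields -1 -- correct
    System.out.println("buggy=" + buggy + ", fixed=" + fixed);
  }
}
```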
[carbondata] 10/33: [CARBONDATA-3505] Drop database cascade fix
This is an automated email from the ASF dual-hosted git repository.

ravipesala pushed a commit to branch branch-1.6
in repository https://gitbox.apache.org/repos/asf/carbondata.git

commit 2328707b4477a11b7713aee8f123b780ad48cc25
Author: kunal642
AuthorDate: Tue Aug 27 14:49:58 2019 +0530

    [CARBONDATA-3505] Drop database cascade fix

    Problem: When two databases are created at the same location and one of them
    is dropped, the folder is also deleted from the backend. If we then try to
    drop the second database, it tries to look up the other table, but the schema
    file no longer exists in the backend and the drop fails.

    Solution: Add a check to call CarbonDropDatabaseCommand only if the database
    location exists in the backend.

    This closes #3365
---
 .../main/scala/org/apache/spark/sql/CarbonEnv.scala   | 19 ++-
 .../command/cache/CarbonShowCacheCommand.scala        |  4 ++--
 .../spark/sql/execution/strategy/DDLStrategy.scala    |  4 +++-
 .../apache/spark/sql/hive/CarbonFileMetastore.scala   |  4 ++--
 4 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala
index 1cbd156..f2a52d2 100644
--- a/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala
+++ b/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala
@@ -20,7 +20,7 @@ package org.apache.spark.sql
 import java.util.concurrent.ConcurrentHashMap

 import org.apache.spark.sql.catalyst.TableIdentifier
-import org.apache.spark.sql.catalyst.analysis.NoSuchTableException
+import org.apache.spark.sql.catalyst.analysis.{NoSuchDatabaseException, NoSuchTableException}
 import org.apache.spark.sql.catalyst.catalog.SessionCatalog
 import org.apache.spark.sql.events.{MergeBloomIndexEventListener, MergeIndexEventListener}
 import org.apache.spark.sql.execution.command.cache._
@@ -267,6 +267,23 @@ object CarbonEnv {
   }

   /**
+   * Returns true when the database folder exists in the file system,
+   * false in all other scenarios.
+   */
+  def databaseLocationExists(dbName: String,
+      sparkSession: SparkSession, ifExists: Boolean): Boolean = {
+    try {
+      FileFactory.getCarbonFile(getDatabaseLocation(dbName, sparkSession)).exists()
+    } catch {
+      case e: NoSuchDatabaseException =>
+        if (ifExists) {
+          false
+        } else {
+          throw e
+        }
+    }
+  }
+
+  /**
    * The method returns the database location
    * if carbon.storeLocation does point to spark.sql.warehouse.dir then returns
    * the database locationUri as database location else follows the old behaviour
diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/cache/CarbonShowCacheCommand.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/cache/CarbonShowCacheCommand.scala
index 45e811a..4b7f680 100644
--- a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/cache/CarbonShowCacheCommand.scala
+++ b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/cache/CarbonShowCacheCommand.scala
@@ -443,9 +443,9 @@ case class CarbonShowCacheCommand(tableIdentifier: Option[TableIdentifier],
       case (_, _, sum, provider) =>
         provider.toLowerCase match {
           case `bloomFilterIdentifier` =>
-            allIndexSize += sum
-          case _ =>
             allDatamapSize += sum
+          case _ =>
+            allIndexSize += sum
         }
     }
     (allIndexSize, allDatamapSize)
diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/DDLStrategy.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/DDLStrategy.scala
index 4791687..3ef8cfa 100644
--- a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/DDLStrategy.scala
+++ b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/DDLStrategy.scala
@@ -37,6 +37,7 @@ import org.apache.spark.util.{CarbonReflectionUtils, DataMapUtil, FileUtils, Spa
 import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException
 import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.datastore.impl.FileFactory
 import org.apache.carbondata.core.metadata.schema.table.CarbonTable
 import org.apache.carbondata.core.util.{CarbonProperties, DataTypeUtil, ThreadLocalSessionInfo}
 import org.apache.carbondata.spark.util.Util
@@ -115,7 +116,8 @@ class DDLStrategy(sparkSession: SparkSession) extends SparkStrategy {
           .setConfigurationToCurrentThread(sparkSession.sessionState.newHadoopConf())
         FileUtils.createDatabaseDirectory(dbName, dbLocation, sparkSession.sparkContext)
         ExecutedCommandExec(createDb) :: Nil
-      case drop@DropDatabaseCommand(dbName, ifEx
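The guard introduced here is simply an existence check on the database directory before routing the drop through the carbon command. A hedged Java miniature using the same FileFactory call as the diff:

```java
import org.apache.carbondata.core.datastore.impl.FileFactory;

public class DropDatabaseGuard {
  // Run the carbon-specific drop handling only when the database directory
  // still exists; a second database registered at the same location may have
  // already removed it, in which case plain catalog cleanup is sufficient.
  static boolean shouldRunCarbonDrop(String databaseLocation) {
    return FileFactory.getCarbonFile(databaseLocation).exists();
  }
}
```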
[carbondata] 30/33: [CARBONDATA-3523] Store data file size into index file
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git commit 49e9ea3c6a75532e4f51d924bce4334687597a9c Author: QiangCai AuthorDate: Tue Aug 13 10:25:31 2019 +0800 [CARBONDATA-3523] Store data file size into index file In BlockIndex, the file_size is always zero. We can set the actual value during data loading and use it during the query to improve the query performance. 1. avoid invoking listFiles for each segment 2. avoid invoking getFileStatus for each data file This closes #3356 --- .../core/datastore/block/TableBlockInfo.java | 13 + .../carbondata/core/metadata/index/BlockIndexInfo.java | 18 ++ .../core/util/AbstractDataFileFooterConverter.java | 3 +++ .../carbondata/core/util/BlockletDataMapUtil.java | 17 ++--- .../carbondata/core/util/CarbonMetadataUtil.java | 1 + .../store/writer/AbstractFactDataWriter.java | 5 +++-- .../store/writer/v3/CarbonFactDataWriterImplV3.java| 18 ++ 7 files changed, 62 insertions(+), 13 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/datastore/block/TableBlockInfo.java b/core/src/main/java/org/apache/carbondata/core/datastore/block/TableBlockInfo.java index 25d82f8..4dd1403 100644 --- a/core/src/main/java/org/apache/carbondata/core/datastore/block/TableBlockInfo.java +++ b/core/src/main/java/org/apache/carbondata/core/datastore/block/TableBlockInfo.java @@ -54,6 +54,11 @@ public class TableBlockInfo implements Distributable, Serializable { private String filePath; /** + * file size of the block + */ + private long fileSize; + + /** * block offset in the file */ private long blockOffset; @@ -439,6 +444,14 @@ public class TableBlockInfo implements Distributable, Serializable { this.filePath = filePath; } + public long getFileSize() { +return fileSize; + } + + public void setFileSize(long fileSize) { +this.fileSize = fileSize; + } + public BlockletDetailInfo getDetailInfo() { return detailInfo; } diff --git a/core/src/main/java/org/apache/carbondata/core/metadata/index/BlockIndexInfo.java b/core/src/main/java/org/apache/carbondata/core/metadata/index/BlockIndexInfo.java index ae99ed8..f7f2d3c 100644 --- a/core/src/main/java/org/apache/carbondata/core/metadata/index/BlockIndexInfo.java +++ b/core/src/main/java/org/apache/carbondata/core/metadata/index/BlockIndexInfo.java @@ -51,6 +51,11 @@ public class BlockIndexInfo { private BlockletInfo blockletInfo; /** + * file size + */ + private long fileSize; + + /** * Constructor * * @param numberOfRows number of rows @@ -80,6 +85,12 @@ public class BlockIndexInfo { this.blockletInfo = blockletInfo; } + public BlockIndexInfo(long numberOfRows, String fileName, long offset, + BlockletIndex blockletIndex, BlockletInfo blockletInfo, long fileSize) { +this(numberOfRows, fileName, offset, blockletIndex, blockletInfo); +this.fileSize = fileSize; + } + /** * @return the numberOfRows */ @@ -114,4 +125,11 @@ public class BlockIndexInfo { public BlockletInfo getBlockletInfo() { return blockletInfo; } + + /** + * @return file size + */ + public long getFileSize() { +return fileSize; + } } diff --git a/core/src/main/java/org/apache/carbondata/core/util/AbstractDataFileFooterConverter.java b/core/src/main/java/org/apache/carbondata/core/util/AbstractDataFileFooterConverter.java index 64d30c2..f16a3ae 100644 --- a/core/src/main/java/org/apache/carbondata/core/util/AbstractDataFileFooterConverter.java +++ 
b/core/src/main/java/org/apache/carbondata/core/util/AbstractDataFileFooterConverter.java @@ -244,6 +244,9 @@ public abstract class AbstractDataFileFooterConverter { } fileName = (CarbonCommonConstants.FILE_SEPARATOR + fileName).replaceAll("//", "/"); tableBlockInfo.setFilePath(parentPath + fileName); +if (readBlockIndexInfo.isSetFile_size()) { + tableBlockInfo.setFileSize(readBlockIndexInfo.getFile_size()); +} return tableBlockInfo; } diff --git a/core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java b/core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java index 6cd60a2..5a988c4 100644 --- a/core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java +++ b/core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java @@ -38,9 +38,11 @@ import org.apache.carbondata.common.logging.LogServiceFactory; import org.apache.carbondata.core.constants.CarbonCommonConstants; import org.apache.carbondata.core.datamap.Segment; import org.apache.carbondata.core.datastore.block.SegmentProperties; +import org.apache.carbondata.core.datastore.block.Tab
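On the read side, the new field enables a guarded fast path. A hedged sketch: the thrift accessors match the diff, while the fallback is simplified to a local java.io.File stat for illustration:

```java
import java.io.File;

import org.apache.carbondata.format.BlockIndex;

public class IndexFileSizeResolver {
  // Prefer the size persisted in the index file (new writers set it); fall
  // back to a filesystem stat only for legacy index files without file_size.
  static long resolveFileSize(BlockIndex readBlockIndexInfo, String filePath) {
    if (readBlockIndexInfo.isSetFile_size()) {
      return readBlockIndexInfo.getFile_size(); // no getFileStatus round trip
    }
    return new File(filePath).length(); // simplified legacy fallback
  }
}
```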
[carbondata] 03/33: [CARBONDATA-3493] Initialize Profiler in CarbonEnv
This is an automated email from the ASF dual-hosted git repository.

ravipesala pushed a commit to branch branch-1.6
in repository https://gitbox.apache.org/repos/asf/carbondata.git

commit 90b6c648d3b20e563e685b9aa8c8bafcefbcf3ad
Author: akashrn5
AuthorDate: Wed Jul 31 18:54:41 2019 +0530

    [CARBONDATA-3493] Initialize Profiler in CarbonEnv

    Problem: After enabling "enable.query.statistics", an exception is thrown
    while querying because the profiler is not initialized before setting up the
    RPC endpoint connection.

    Solution: Initialize the Profiler in CarbonEnv before setting up the RPC
    endpoint connection.

    This closes #3342

    Co-authored-by: shivamasn
---
 integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala | 2 ++
 .../spark2/src/main/scala/org/apache/spark/sql/CarbonSession.scala     | 2 --
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala
index c13e7b9..1cbd156 100644
--- a/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala
+++ b/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala
@@ -28,6 +28,7 @@ import org.apache.spark.sql.execution.command.mv._
 import org.apache.spark.sql.execution.command.preaaggregate._
 import org.apache.spark.sql.execution.command.timeseries.TimeSeriesFunction
 import org.apache.spark.sql.hive._
+import org.apache.spark.sql.profiler.Profiler
 import org.apache.carbondata.common.logging.LogServiceFactory
 import org.apache.carbondata.core.constants.CarbonCommonConstants
@@ -121,6 +122,7 @@ class CarbonEnv {
         initialized = true
       }
     }
+    Profiler.initialize(sparkSession.sparkContext)
     LOGGER.info("Initialize CarbonEnv completed...")
   }
 }
diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonSession.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonSession.scala
index 7b1bf4c..deefcd1 100644
--- a/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonSession.scala
+++ b/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonSession.scala
@@ -259,8 +259,6 @@ object CarbonSession {
     }
     options.foreach { case (k, v) => session.sessionState.conf.setConfString(k, v) }
     SparkSession.setDefaultSession(session)
-    // Setup monitor end point and register CarbonMonitorListener
-    Profiler.initialize(sparkContext)
     // Register a successfully instantiated context to the singleton. This should be at the
     // end of the class definition so that the singleton is updated only if there is no
     // exception in the construction of the instance.
[carbondata] 01/33: [CARBONDATA-3480] Fixed unnecessary refresh for table by removing modified mdt file
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git commit 56879f747d150f4efae0f2998a38f4297706bc5e Author: kunal642 AuthorDate: Fri Jul 26 14:52:36 2019 +0530 [CARBONDATA-3480] Fixed unnecessary refresh for table by removing modified mdt file This closes #3339 --- .../carbondata/core/datamap/DataMapFilter.java | 47 +++ .../core/datamap/DataMapStoreManager.java | 14 +- .../carbondata/core/metadata/CarbonMetadata.java | 9 + .../core/metadata/schema/table/CarbonTable.java| 4 +- .../core/metadata/schema/table/TableSchema.java| 4 + .../statusmanager/SegmentUpdateStatusManager.java | 26 -- .../apache/carbondata/core/util/CarbonUtil.java| 1 - .../core/metadata/CarbonMetadataTest.java | 7 +- .../ThriftWrapperSchemaConverterImplTest.java | 4 +- .../metadata/schema/table/CarbonTableTest.java | 8 +- .../table/CarbonTableWithComplexTypesTest.java | 6 +- .../dblocation/DBLocationCarbonTableTestCase.scala | 25 -- .../apache/spark/sql/hive/CarbonSessionUtil.scala | 6 +- .../carbondata/indexserver/IndexServer.scala | 10 +- .../scala/org/apache/spark/sql/CarbonEnv.scala | 51 ++- .../command/datamap/CarbonDropDataMapCommand.scala | 1 - .../management/RefreshCarbonTableCommand.scala | 2 - .../CarbonAlterTableDropPartitionCommand.scala | 12 +- .../CarbonAlterTableSplitPartitionCommand.scala| 3 - .../command/preaaggregate/PreAggregateUtil.scala | 19 +- .../command/table/CarbonDropTableCommand.scala | 13 + .../spark/sql/hive/CarbonFileMetastore.scala | 425 + .../spark/sql/hive/CarbonHiveMetaStore.scala | 10 +- .../apache/spark/sql/hive/CarbonMetaStore.scala| 10 +- .../scala/org/apache/spark/util/CleanFiles.scala | 3 - .../scala/org/apache/spark/util/Compaction.scala | 2 - .../apache/spark/util/DeleteSegmentByDate.scala| 2 - .../org/apache/spark/util/DeleteSegmentById.scala | 2 - .../scala/org/apache/spark/util/TableLoader.scala | 2 - .../apache/spark/sql/hive/CarbonSessionState.scala | 31 +- .../AlterTableColumnRenameTestCase.scala | 4 +- 31 files changed, 322 insertions(+), 441 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapFilter.java b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapFilter.java index c20d0d5..ac4886d 100644 --- a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapFilter.java +++ b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapFilter.java @@ -18,10 +18,15 @@ package org.apache.carbondata.core.datamap; import java.io.Serializable; +import java.util.HashSet; +import java.util.Set; import org.apache.carbondata.core.datastore.block.SegmentProperties; import org.apache.carbondata.core.metadata.schema.table.CarbonTable; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonDimension; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonMeasure; import org.apache.carbondata.core.scan.executor.util.RestructureUtil; +import org.apache.carbondata.core.scan.expression.ColumnExpression; import org.apache.carbondata.core.scan.expression.Expression; import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf; @@ -39,9 +44,51 @@ public class DataMapFilter implements Serializable { public DataMapFilter(CarbonTable table, Expression expression) { this.table = table; this.expression = expression; +if (expression != null) { + checkIfFilterColumnExistsInTable(); +} resolve(); } + private Set extractColumnExpressions(Expression expression) { +Set 
columnExpressionList = new HashSet<>(); +for (Expression expressions: expression.getChildren()) { + if (expressions != null && expressions.getChildren() != null + && expressions.getChildren().size() > 0) { +columnExpressionList.addAll(extractColumnExpressions(expressions)); + } else if (expressions instanceof ColumnExpression) { +columnExpressionList.add(((ColumnExpression) expressions).getColumnName()); + } +} +return columnExpressionList; + } + + private void checkIfFilterColumnExistsInTable() { +Set columnExpressionList = extractColumnExpressions(expression); +for (String colExpression : columnExpressionList) { + if (colExpression.equalsIgnoreCase("positionid")) { +continue; + } + boolean exists = false; + for (CarbonMeasure carbonMeasure : table.getAllMeasures()) { +if (!carbonMeasure.isInvisible() && carbonMeasure.getColName() +.equalsIgnoreCase(colExpression)) { + exists = true; +} + } + for (CarbonDimension carbonDimension : table.getAllDimensions())
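The validation walks the filter expression tree, collects the leaf column names, and rejects names unknown to the table. A self-contained Java miniature of the same idea; the real code operates on Carbon's Expression/ColumnExpression classes:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class FilterColumnValidation {
  // Minimal stand-in for an expression tree node: leaves carry a column name.
  static class Expr {
    final String column;      // non-null only for leaf column references
    final List<Expr> children;
    Expr(String column, Expr... children) {
      this.column = column;
      this.children = Arrays.asList(children);
    }
  }

  // Recursively collect the column names referenced anywhere in the tree.
  static void collectColumns(Expr expr, Set<String> out) {
    if (expr.column != null) {
      out.add(expr.column.toLowerCase());
    }
    for (Expr child : expr.children) {
      collectColumns(child, out);
    }
  }

  static void validate(Expr filter, Set<String> tableColumns) {
    Set<String> used = new HashSet<>();
    collectColumns(filter, used);
    for (String column : used) {
      if (!tableColumns.contains(column)) {
        throw new RuntimeException("Column " + column + " not found in table");
      }
    }
  }
}
```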
[carbondata] 13/33: [CARBONDATA-3452] dictionary include udf handle all the scenarios
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git commit 6f90b28dd3d3f008f33668884807cc63cb6b5db5 Author: ajantha-bhat AuthorDate: Wed Aug 14 20:36:13 2019 +0530 [CARBONDATA-3452] dictionary include udf handle all the scenarios Problem: select query failure when substring on dictionary column with join. Cause: when dictionary include is present, data type is updated to int from string in plan attribute. so substring was unresolved on int column. Join operation try to reference this attribute which is unresolved. Solution: Need to handle this for all the scenarios in CarbonLateDecodeRule This closes #3358 --- .../hadoop/api/CarbonTableOutputFormat.java| 5 +- .../spark/sql/optimizer/CarbonLateDecodeRule.scala | 141 ++--- .../carbondata/query/SubQueryJoinTestSuite.scala | 19 +++ .../processing/util/CarbonDataProcessorUtil.java | 5 +- 4 files changed, 120 insertions(+), 50 deletions(-) diff --git a/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java b/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java index 9ba5e97..16703bf 100644 --- a/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java +++ b/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java @@ -19,6 +19,7 @@ package org.apache.carbondata.hadoop.api; import java.io.IOException; import java.util.List; +import java.util.UUID; import java.util.concurrent.ExecutionException; import java.util.concurrent.ExecutorService; import java.util.concurrent.Executors; @@ -221,8 +222,8 @@ public class CarbonTableOutputFormat extends FileOutputFormat attr + case a@Alias(attr: AttributeReference, _) => a + case others => +// datatype need to change for dictionary columns if only alias +// or attribute ref present. +// If anything else present, no need to change data type. 
+needChangeDatatype = false +others +} +needChangeDatatype + } + private def updateTempDecoder(plan: LogicalPlan, aliasMapOriginal: CarbonAliasDecoderRelation, attrMap: java.util.HashMap[AttributeReferenceWrapper, CarbonDecoderRelation]): @@ -650,44 +665,71 @@ class CarbonLateDecodeRule extends Rule[LogicalPlan] with PredicateHelper { cd case sort: Sort => val sortExprs = sort.order.map { s => - s.transform { -case attr: AttributeReference => - updateDataType(attr, attrMap, allAttrsNotDecode, aliasMap) - }.asInstanceOf[SortOrder] + if (needDataTypeUpdate(s)) { +s.transform { + case attr: AttributeReference => +updateDataType(attr, attrMap, allAttrsNotDecode, aliasMap) +}.asInstanceOf[SortOrder] + } else { +s + } } Sort(sortExprs, sort.global, sort.child) case agg: Aggregate if !agg.child.isInstanceOf[CarbonDictionaryCatalystDecoder] => val aggExps = agg.aggregateExpressions.map { aggExp => - aggExp.transform { -case attr: AttributeReference => - updateDataType(attr, attrMap, allAttrsNotDecode, aliasMap) + if (needDataTypeUpdate(aggExp)) { +aggExp.transform { + case attr: AttributeReference => +updateDataType(attr, attrMap, allAttrsNotDecode, aliasMap) +} + } else { +aggExp } }.asInstanceOf[Seq[NamedExpression]] - val grpExps = agg.groupingExpressions.map { gexp => - gexp.transform { -case attr: AttributeReference => - updateDataType(attr, attrMap, allAttrsNotDecode, aliasMap) + if (needDataTypeUpdate(gexp)) { +gexp.transform { + case attr: AttributeReference => +updateDataType(attr, attrMap, allAttrsNotDecode, aliasMap) +} + } else { +gexp } } Aggregate(grpExps, aggExps, agg.child) case expand: Expand => -val ex = expand.transformExpressions { - case attr: AttributeReference => -updateDataType(attr, attrMap, allAttrsNotDecode, aliasMap) +// can't use needDataTypeUpdate here as argument is of type Expand +var needChangeDatatype: Boolean = true +expand.transformExpressions { + case attr: AttributeReference => attr + case a@Alias(attr: AttributeReference, _) => a + case others => +// datatype need to change for dictionary columns if only alias +// or attribute ref present. +// If anything else present, no need to change data type. +
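The heart of the fix is a guard: rewrite the dictionary column's data type only when the expression is a bare attribute reference or an alias over one; any wrapping function such as substring must leave the type alone, or the expression becomes unresolved. A hedged, heavily simplified miniature of that decision (the real code pattern-matches Catalyst expressions in Scala):

```java
public class DatatypeRewriteGuard {
  // Tiny stand-ins for Catalyst nodes: an attribute reference, an alias over
  // a child expression, and an arbitrary function call such as substring().
  interface Expr {}
  static class AttrRef implements Expr {}
  static class Alias implements Expr {
    final Expr child;
    Alias(Expr child) { this.child = child; }
  }
  static class FuncCall implements Expr {
    final Expr arg;
    FuncCall(Expr arg) { this.arg = arg; }
  }

  // The data type may be rewritten only for a plain attribute or an alias of
  // one; anything else (e.g. substring over the column) keeps the original type.
  static boolean needDataTypeUpdate(Expr expr) {
    if (expr instanceof AttrRef) {
      return true;
    }
    if (expr instanceof Alias) {
      return needDataTypeUpdate(((Alias) expr).child);
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(needDataTypeUpdate(new Alias(new AttrRef())));    // true
    System.out.println(needDataTypeUpdate(new FuncCall(new AttrRef()))); // false
  }
}
```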
[carbondata] 23/33: [CARBONDATA-3489] Optimized the comparator instances in sort
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git commit 9e4664742f04b4995bd8a2ecd465e8797ce7c2d2 Author: Vikram Ahuja AuthorDate: Tue Aug 6 12:21:11 2019 +0530 [CARBONDATA-3489] Optimized the comparator instances in sort Root cause: In case of sorting in the comparator classes(NewRowComparator, RawRowComparator, IntermediateSortTempRowComparator and UnsafeRowComparator) a new SerializableComparator object is been created in the compare method everytime two objects are passed for comparison. Solution: We can reduce the number of SerializeableComparator objects that are been created by storing the SerializeableComparators of primitive datatypes in a map and getting it from the map instead of creating a new SerializeableComparator everytime. This closes #3354 --- .../core/util/comparator/Comparator.java | 49 +++--- .../partition/impl/RawRowComparatorTest.java | 142 .../IntermediateSortTempRowComparatorTest.java | 178 + .../sort/sortdata/NewRowComparatorTest.java| 109 + 4 files changed, 453 insertions(+), 25 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/util/comparator/Comparator.java b/core/src/main/java/org/apache/carbondata/core/util/comparator/Comparator.java index 6981405..d7e8f80 100644 --- a/core/src/main/java/org/apache/carbondata/core/util/comparator/Comparator.java +++ b/core/src/main/java/org/apache/carbondata/core/util/comparator/Comparator.java @@ -25,24 +25,23 @@ import org.apache.carbondata.core.util.ByteUtil; public final class Comparator { + //Comparators are made static so that only one instance is generated + private static final SerializableComparator BOOLEAN = new BooleanSerializableComparator(); + private static final SerializableComparator INT = new IntSerializableComparator(); + private static final SerializableComparator SHORT = new ShortSerializableComparator(); + private static final SerializableComparator DOUBLE = new DoubleSerializableComparator(); + private static final SerializableComparator FLOAT = new FloatSerializableComparator(); + private static final SerializableComparator LONG = new LongSerializableComparator(); + private static final SerializableComparator DECIMAL = new BigDecimalSerializableComparator(); + private static final SerializableComparator BYTE = new ByteArraySerializableComparator(); + public static SerializableComparator getComparator(DataType dataType) { -if (dataType == DataTypes.BOOLEAN) { - return new BooleanSerializableComparator(); -} else if (dataType == DataTypes.INT) { - return new IntSerializableComparator(); -} else if (dataType == DataTypes.SHORT) { - return new ShortSerializableComparator(); -} else if (dataType == DataTypes.DOUBLE) { - return new DoubleSerializableComparator(); -} else if (dataType == DataTypes.FLOAT) { - return new FloatSerializableComparator(); -} else if (dataType == DataTypes.LONG || dataType == DataTypes.DATE -|| dataType == DataTypes.TIMESTAMP) { - return new LongSerializableComparator(); -} else if (DataTypes.isDecimal(dataType)) { - return new BigDecimalSerializableComparator(); +if (dataType == DataTypes.DATE || dataType == DataTypes.TIMESTAMP) { + return LONG; +} else if (dataType == DataTypes.STRING) { + return BYTE; } else { - return new ByteArraySerializableComparator(); + return getComparatorByDataTypeForMeasure(dataType); } } @@ -54,21 +53,21 @@ public final class Comparator { */ public static SerializableComparator 
getComparatorByDataTypeForMeasure(DataType dataType) { if (dataType == DataTypes.BOOLEAN) { - return new BooleanSerializableComparator(); + return BOOLEAN; } else if (dataType == DataTypes.INT) { - return new IntSerializableComparator(); + return INT; } else if (dataType == DataTypes.SHORT) { - return new ShortSerializableComparator(); + return SHORT; } else if (dataType == DataTypes.LONG) { - return new LongSerializableComparator(); + return LONG; } else if (dataType == DataTypes.DOUBLE) { - return new DoubleSerializableComparator(); + return DOUBLE; } else if (dataType == DataTypes.FLOAT) { - return new FloatSerializableComparator(); + return FLOAT; } else if (DataTypes.isDecimal(dataType)) { - return new BigDecimalSerializableComparator(); + return DECIMAL; } else if (dataType == DataTypes.BYTE) { - return new ByteArraySerializableComparator(); + return BYTE; } else { throw new IllegalArgumentException("Unsupported data type: " + dataType.getName()); } @@ -198,4 +197,4 @@ class BigDecimalSerializable
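The pattern above amounts to replacing per-call allocation with shared, stateless singletons. A minimal standalone sketch of the same idea (plain Java, independent of the CarbonData sources):

    import java.util.Comparator;

    public final class CachedComparators {
      // Stateless comparators are safe to share, so one instance per type suffices.
      private static final Comparator<Integer> INT = Integer::compare;
      private static final Comparator<Long> LONG = Long::compare;
      private static final Comparator<Double> DOUBLE = Double::compare;

      private CachedComparators() { }

      // Return the cached instance instead of allocating a new comparator on
      // every call, which is what the compare methods used to do.
      public static Comparator<?> forType(Class<?> type) {
        if (type == Integer.class) {
          return INT;
        } else if (type == Long.class) {
          return LONG;
        } else if (type == Double.class) {
          return DOUBLE;
        }
        throw new IllegalArgumentException("Unsupported type: " + type.getName());
      }
    }

Because the comparators hold no mutable state, sharing them across threads is safe; that is the property the static fields in Comparator.java rely on.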
[carbondata] 25/33: [DOC] Update doc for alter sort_columns
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git commit ab86705f64f460b639e05a110d6a4e13977cc773 Author: QiangCai AuthorDate: Fri Sep 20 09:54:52 2019 +0800 [DOC] Update doc for alter sort_columns This closes #3395 --- docs/ddl-of-carbondata.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/ddl-of-carbondata.md b/docs/ddl-of-carbondata.md index 7ab0e5f..9c9a02f 100644 --- a/docs/ddl-of-carbondata.md +++ b/docs/ddl-of-carbondata.md @@ -817,7 +817,7 @@ Users can specify which columns to include and exclude for local dictionary gene ``` **NOTE:** -* The future version will enhance "custom" compaction to sort the old segment one by one. +* The "custom" compaction supports re-sorting the old segments one by one in version 1.6 or later. * The streaming table is not supported for SORT_COLUMNS modification. * If the inverted index columns are removed from the new SORT_COLUMNS, they will not create the inverted index. But the old configuration of INVERTED_INDEX will be kept.
[carbondata] 06/33: [CARBONDATA-3509] Support disable query prefetch by configuration
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git commit 129b1634aa8eab29e6ec75653a9d9bd451d965b1 Author: ajantha-bhat AuthorDate: Fri Aug 30 11:08:09 2019 +0530 [CARBONDATA-3509] Support disable query prefetch by configuration Support disabling query prefetch by configuration: prefetch runs in an async thread during query and is always enabled in the query flow. If a user wants to disable it, they can use this property and observe the effect in the logs. This closes #3370 --- .../carbondata/core/constants/CarbonCommonConstants.java | 9 + .../core/scan/executor/impl/AbstractQueryExecutor.java | 2 ++ .../apache/carbondata/core/scan/model/QueryModel.java| 4 +++- .../apache/carbondata/core/util/CarbonProperties.java| 16 docs/configuration-parameters.md | 1 + 5 files changed, 31 insertions(+), 1 deletion(-) diff --git a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java index 17b191d..67fa13f 100644 --- a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java +++ b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java @@ -1483,6 +1483,15 @@ public final class CarbonCommonConstants { public static final String CARBON_MAX_EXECUTOR_THREADS_FOR_BLOCK_PRUNING_DEFAULT = "4"; + /* + * whether to enable prefetch for query + */ + @CarbonProperty + public static final String CARBON_QUERY_PREFETCH_ENABLE = + "carbon.query.prefetch.enable"; + + public static final String CARBON_QUERY_PREFETCH_ENABLE_DEFAULT = "true"; + // // Datamap parameter start here // diff --git a/core/src/main/java/org/apache/carbondata/core/scan/executor/impl/AbstractQueryExecutor.java b/core/src/main/java/org/apache/carbondata/core/scan/executor/impl/AbstractQueryExecutor.java index b3d4780..6760e77 100644 --- a/core/src/main/java/org/apache/carbondata/core/scan/executor/impl/AbstractQueryExecutor.java +++ b/core/src/main/java/org/apache/carbondata/core/scan/executor/impl/AbstractQueryExecutor.java @@ -493,7 +493,9 @@ public abstract class AbstractQueryExecutor implements QueryExecutor { segmentProperties.getDimensionOrdinalToChunkMapping().size()); if (queryModel.isReadPageByPage()) { blockExecutionInfo.setPrefetchBlocklet(false); + LOGGER.info("Query prefetch is: false, read page by page"); } else { + LOGGER.info("Query prefetch is: " + queryModel.isPreFetchData()); blockExecutionInfo.setPrefetchBlocklet(queryModel.isPreFetchData()); } // In case of fg datamap it should not go to direct fill.
diff --git a/core/src/main/java/org/apache/carbondata/core/scan/model/QueryModel.java b/core/src/main/java/org/apache/carbondata/core/scan/model/QueryModel.java index 267527f..4d10492 100644 --- a/core/src/main/java/org/apache/carbondata/core/scan/model/QueryModel.java +++ b/core/src/main/java/org/apache/carbondata/core/scan/model/QueryModel.java @@ -34,6 +34,7 @@ import org.apache.carbondata.core.scan.expression.UnknownExpression; import org.apache.carbondata.core.scan.expression.conditional.ConditionalExpression; import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf; import org.apache.carbondata.core.stats.QueryStatisticsRecorder; +import org.apache.carbondata.core.util.CarbonProperties; import org.apache.carbondata.core.util.CarbonUtil; import org.apache.carbondata.core.util.DataTypeConverter; @@ -110,7 +111,7 @@ public class QueryModel { // whether to clear/free unsafe memory or not private boolean freeUnsafeMemory = true; - private boolean preFetchData = true; + private boolean preFetchData; /** * It fills the vector directly from decoded column page with out any staging and conversions. @@ -125,6 +126,7 @@ public class QueryModel { tableBlockInfos = new ArrayList(); this.table = carbonTable; this.queryId = String.valueOf(System.nanoTime()); +this.preFetchData = CarbonProperties.getQueryPrefetchEnable(); } public static QueryModel newInstance(CarbonTable carbonTable) { diff --git a/core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java b/core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java index c60dad8..adf4905 100644 --- a/core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java +++ b/core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java @@ -1773
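For users of this setting, the property can also be set programmatically through CarbonProperties. A small sketch (the key string comes from CARBON_QUERY_PREFETCH_ENABLE in the diff above; the class name DisablePrefetchExample is illustrative):

    import org.apache.carbondata.core.util.CarbonProperties;

    public class DisablePrefetchExample {
      public static void main(String[] args) {
        // Turn off the asynchronous blocklet prefetch for queries.
        // The default is "true"; disable it only to observe the effect in the logs.
        CarbonProperties.getInstance()
            .addProperty("carbon.query.prefetch.enable", "false");
      }
    }

The same key can equally be placed in carbon.properties like any other Carbon configuration.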
[carbondata] 09/33: [CARBONDATA-3502] Select query with UDF having Match expression inside IN expression Fails
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git commit 75e207cf7ac3fa262ba04cc3c8d7f2d902256882 Author: manishnalla1994 AuthorDate: Mon Aug 26 17:25:34 2019 +0530 [CARBONDATA-3502] Select query with UDF having Match expression inside IN expression Fails Problem: A select query with a UDF having a Match expression inside an IN expression fails with an ArrayIndexOutOfBoundsException. Cause: The expression should not be treated as a Match expression; instead it should be treated as a SparkUnknownExpression. Solution: Removed the check for Match expression, as it was only added for Lucene search mode, which is no longer present. This closes #3363 --- .../src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala | 2 -- 1 file changed, 2 deletions(-) diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala index c4415f8..0fd07bb 100644 --- a/integration/spark2/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala +++ b/integration/spark2/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala @@ -425,8 +425,6 @@ object CarbonFilters { new AndExpression(l, r) case strTrim: StringTrim if isStringTrimCompatibleWithCarbon(strTrim) => transformExpression(strTrim) - case s: ScalaUDF => -new MatchExpression(s.children.head.toString()) case _ => new SparkUnknownExpression(expr.transform { case AttributeReference(name, dataType, _, _) =>
[carbondata] 27/33: [CARBONDATA-3473] Fix data size calculation of the last column in CarbonCli
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git commit 57309d70d08675c31975d2a60692835e7a6c22cf Author: Manhua AuthorDate: Wed Jul 17 17:39:29 2019 +0800 [CARBONDATA-3473] Fix data size calculation of the last column in CarbonCli When updating the last column chunk data size, the current code uses columnDataSize.add(fileSizeInBytes - footerSizeInBytes - previousChunkOffset) for every blocklet. This leads to a wrong result when calculating the data size of the last column, especially when a carbon data file has multiple blocklets. In this PR, we fix this problem and modify the calculation by marking the end offset of each blocklet. This closes #3330 --- .../java/org/apache/carbondata/tool/DataFile.java | 32 +++--- .../org/apache/carbondata/tool/CarbonCliTest.java | 6 ++-- 2 files changed, 19 insertions(+), 19 deletions(-) diff --git a/tools/cli/src/main/java/org/apache/carbondata/tool/DataFile.java b/tools/cli/src/main/java/org/apache/carbondata/tool/DataFile.java index e553a78..4ed3945 100644 --- a/tools/cli/src/main/java/org/apache/carbondata/tool/DataFile.java +++ b/tools/cli/src/main/java/org/apache/carbondata/tool/DataFile.java @@ -121,16 +121,21 @@ class DataFile { this.partNo = CarbonTablePath.DataFileUtil.getPartNo(fileName); // calculate blocklet size and column size -// first calculate the header size, it equals the offset of first -// column chunk in first blocklet -long headerSizeInBytes = footer.blocklet_info_list3.get(0).column_data_chunks_offsets.get(0); -long previousOffset = headerSizeInBytes; -for (BlockletInfo3 blockletInfo3 : footer.blocklet_info_list3) { +for (int j = 0; j < footer.getBlocklet_info_list3().size(); j++) { + // remark start and end offset of current blocklet for computing blocklet size + // and chunk data size of the last column + BlockletInfo3 blockletInfo3 = footer.blocklet_info_list3.get(j); + long blockletEndOffset; + if (j != footer.getBlocklet_info_list3().size() - 1) { +// use start offset of next blocklet as end offset of current blocklet +blockletEndOffset = footer.blocklet_info_list3.get(j + 1).column_data_chunks_offsets.get(j); + } else { +// use start offset of footer as end offset of current blocklet if it is the last blocklet +blockletEndOffset = fileSizeInBytes - footerSizeInBytes; + } // calculate blocklet size in bytes - long blockletOffset = blockletInfo3.column_data_chunks_offsets.get(0); - blockletSizeInBytes.add(blockletOffset - previousOffset); - previousOffset = blockletOffset; - + this.blockletSizeInBytes.add( + blockletEndOffset - blockletInfo3.column_data_chunks_offsets.get(0)); // calculate column size in bytes for each column LinkedList columnDataSize = new LinkedList<>(); LinkedList columnMetaSize = new LinkedList<>(); @@ -140,17 +145,12 @@ class DataFile { columnMetaSize.add(blockletInfo3.column_data_chunks_length.get(i).longValue()); previousChunkOffset = blockletInfo3.column_data_chunks_offsets.get(i); } - // last column chunk data size - columnDataSize.add(fileSizeInBytes - footerSizeInBytes - previousChunkOffset); + // update chunk data size of the last column + columnDataSize.add(blockletEndOffset - previousChunkOffset); columnDataSize.removeFirst(); this.columnDataSizeInBytes.add(columnDataSize); this.columnMetaSizeInBytes.add(columnMetaSize); - } -// last blocklet size -blockletSizeInBytes.add( -fileSizeInBytes - footerSizeInBytes - headerSizeInBytes - previousOffset); -this.blockletSizeInBytes.removeFirst();
assert (blockletSizeInBytes.size() == getNumBlocklets()); } diff --git a/tools/cli/src/test/java/org/apache/carbondata/tool/CarbonCliTest.java b/tools/cli/src/test/java/org/apache/carbondata/tool/CarbonCliTest.java index af8d51d..4d89777 100644 --- a/tools/cli/src/test/java/org/apache/carbondata/tool/CarbonCliTest.java +++ b/tools/cli/src/test/java/org/apache/carbondata/tool/CarbonCliTest.java @@ -234,11 +234,11 @@ public class CarbonCliTest { expectedOutput = buildLines( "BLK BLKLT Meta Size Data Size LocalDict DictEntries DictSize AvgPageSize Min% Max% Min Max " , -"00 3.36KB 5.14MB false 00.0B 93.76KB 0.0 100.0 0290 " , +"00 3.36KB 2.57MB false 00.0B 93.76KB 0.0 100.0 0290 " , "01 3.36KB 2.57MB false 00.0B 93.76KB 0.0 100.0 1292 " , -"10 3.36KB 5.14MB false 0
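The corrected offset arithmetic is worth spelling out: a blocklet ends where the next blocklet's first column chunk starts, and the last blocklet ends where the footer starts. A simplified sketch of that rule (hypothetical helper, not the DataFile code):

    import java.util.ArrayList;
    import java.util.List;

    public class BlockletSizeCalc {
      // chunkStartOffsets[i] is the offset of the first column chunk of blocklet i.
      static List<Long> blockletSizes(long[] chunkStartOffsets,
                                      long fileSizeInBytes, long footerSizeInBytes) {
        List<Long> sizes = new ArrayList<>();
        for (int i = 0; i < chunkStartOffsets.length; i++) {
          // End of blocklet i: start of blocklet i+1, or start of the footer for the last one.
          long end = (i + 1 < chunkStartOffsets.length)
              ? chunkStartOffsets[i + 1]
              : fileSizeInBytes - footerSizeInBytes;
          sizes.add(end - chunkStartOffsets[i]);
        }
        return sizes;
      }
    }

The same end offset also closes the last column chunk of each blocklet, which is why the fix replaces the old fileSizeInBytes - footerSizeInBytes - previousChunkOffset term.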
[carbondata] 20/33: [HOTFIX] fix missing quotation marks in datamap doc
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git commit 204f29047b2685d03f27febcec24095f56053a31 Author: lamber-ken <2217232...@qq.com> AuthorDate: Wed Sep 11 21:48:21 2019 +0800 [HOTFIX] fix missing quotation marks in datamap doc This closes #3383 --- docs/datamap/datamap-management.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/datamap/datamap-management.md b/docs/datamap/datamap-management.md index 199cd14..f910559 100644 --- a/docs/datamap/datamap-management.md +++ b/docs/datamap/datamap-management.md @@ -74,7 +74,7 @@ If user perform following command on the main table, system will return failure. `ALTER TABLE RENAME`. Note that adding a new column is supported, and for dropping columns and change datatype command, CarbonData will check whether it will impact the pre-aggregate table, if not, the operation is allowed, otherwise operation will be rejected by throwing exception. -3. Partition management command: `ALTER TABLE ADD/DROP PARTITION +3. Partition management command: `ALTER TABLE ADD/DROP PARTITION`. If user do want to perform above operations on the main table, user can first drop the datamap, perform the operation, and re-create the datamap again.
[carbondata] 12/33: [CARBONDATA-3513] fix 'taskNo' exceeding Long.MAX_VALUE issue when execute major compaction
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git commit 41ae280c16905687c7ea08b3cd05acef9e359c26 Author: changchun wang AuthorDate: Thu Sep 5 16:28:41 2019 +0800 [CARBONDATA-3513] fix 'taskNo' exceeding Long.MAX_VALUE issue when execute major compaction Problem: The major compaction command fails; a java.lang.NumberFormatException is thrown: java.lang.NumberFormatException: For input string: "328812001110" Through code analysis it was found that taskNo is of "long" type, and the taskNo generation algorithm may generate a number bigger than "Long.MAX_VALUE". CARBONDATA-3325 changed the taskNo type to string, but in some places it was still used as long. Solution: Change the taskNo type to string. This closes #3376 --- .../apache/carbondata/processing/merger/AbstractResultProcessor.java| 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/processing/src/main/java/org/apache/carbondata/processing/merger/AbstractResultProcessor.java b/processing/src/main/java/org/apache/carbondata/processing/merger/AbstractResultProcessor.java index f557e9b..951339a 100644 --- a/processing/src/main/java/org/apache/carbondata/processing/merger/AbstractResultProcessor.java +++ b/processing/src/main/java/org/apache/carbondata/processing/merger/AbstractResultProcessor.java @@ -61,7 +61,7 @@ public abstract class AbstractResultProcessor { carbonDataFileAttributes = new CarbonDataFileAttributes(index, loadModel.getFactTimeStamp()); } else { carbonDataFileAttributes = - new CarbonDataFileAttributes(Long.parseLong(loadModel.getTaskNo()), + new CarbonDataFileAttributes(loadModel.getTaskNo(), loadModel.getFactTimeStamp()); } carbonFactDataHandlerModel.setCarbonDataFileAttributes(carbonDataFileAttributes);
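The failure is easy to reproduce in isolation: Long.parseLong throws for any digit string outside the long range, so a generated task number with more digits than Long can hold cannot be parsed. A small illustration (hypothetical value):

    public class TaskNoParseExample {
      public static void main(String[] args) {
        // 20 digits: one more than Long.MAX_VALUE (9223372036854775807) allows.
        String taskNo = "92233720368547758070";
        try {
          Long.parseLong(taskNo); // throws NumberFormatException: For input string: ...
        } catch (NumberFormatException e) {
          System.out.println("cannot parse: " + e.getMessage());
        }
        // The fix keeps taskNo as a String end to end, so no parsing is needed.
      }
    }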
[carbondata] 16/33: [CARBONDATA-3508] Support CG datamap pruning fallback while querying
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git commit 0e2d3e20cad0e7032d959c5f9107249eaa258685 Author: shivamasn AuthorDate: Thu Aug 29 11:49:41 2019 +0530 [CARBONDATA-3508] Support CG datamap pruning fallback while querying Problem: A select query fails when the CG datamap is dropped concurrently while the query is running with a filter on the column on which the datamap is created. Solution: Handle the exception from datamap blocklet pruning if it fails, and consider only the pruned blocklets from the default datamap pruning. This closes #3369 --- .../core/indexstore/BlockletDataMapIndexStore.java | 2 +- .../statusmanager/SegmentUpdateStatusManager.java | 6 ++-- .../datamap/bloom/BloomCoarseGrainDataMap.java | 2 +- .../carbondata/hadoop/api/CarbonInputFormat.java | 32 ++ 4 files changed, 27 insertions(+), 15 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/indexstore/BlockletDataMapIndexStore.java b/core/src/main/java/org/apache/carbondata/core/indexstore/BlockletDataMapIndexStore.java index 32ee9cb..fd549e0 100644 --- a/core/src/main/java/org/apache/carbondata/core/indexstore/BlockletDataMapIndexStore.java +++ b/core/src/main/java/org/apache/carbondata/core/indexstore/BlockletDataMapIndexStore.java @@ -80,7 +80,7 @@ public class BlockletDataMapIndexStore return get(identifierWrapper, null); } - private BlockletDataMapIndexWrapper get(TableBlockIndexUniqueIdentifierWrapper identifierWrapper, + public BlockletDataMapIndexWrapper get(TableBlockIndexUniqueIdentifierWrapper identifierWrapper, Map> segInfoCache) throws IOException { TableBlockIndexUniqueIdentifier identifier = identifierWrapper.getTableBlockIndexUniqueIdentifier(); diff --git a/core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java b/core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java index f7083dc..bc794f4 100644 --- a/core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java +++ b/core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java @@ -27,8 +27,10 @@ import java.io.InputStreamReader; import java.io.OutputStreamWriter; import java.util.ArrayList; import java.util.HashMap; +import java.util.HashSet; import java.util.List; import java.util.Map; +import java.util.Set; import org.apache.carbondata.common.logging.LogServiceFactory; import org.apache.carbondata.core.constants.CarbonCommonConstants; @@ -790,8 +792,8 @@ public class SegmentUpdateStatusManager { final long deltaEndTimestamp = getEndTimeOfDeltaFile(CarbonCommonConstants.DELETE_DELTA_FILE_EXT, block); -List files = -new ArrayList<>(CarbonCommonConstants.DEFAULT_COLLECTION_SIZE); +Set files = +new HashSet<>(CarbonCommonConstants.DEFAULT_COLLECTION_SIZE); for (CarbonFile eachFile : allSegmentFiles) { String fileName = eachFile.getName(); diff --git a/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java b/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java index fea48c3..f931353 100644 --- a/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java +++ b/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java @@ -232,7 +232,7 @@ public class BloomCoarseGrainDataMap extends CoarseGrainDataMap { LOGGER.warn(String.format("HitBlocklets is
empty in bloom filter prune method. " + "bloomQueryModels size is %d, filterShards size if %d", bloomQueryModels.size(), filteredShard.size())); - return null; + return new ArrayList<>(); } return new ArrayList<>(hitBlocklets); } diff --git a/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java b/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java index ac9e11e..45041e4 100644 --- a/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java +++ b/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java @@ -573,19 +573,29 @@ m filterExpression if (cgDataMapExprWrapper != null) { // Prune segments from already pruned blocklets DataMapUtil.pruneSegments(segmentIds, prunedBlocklets); -List cgPrunedBlocklets; +List cgPrunedBlocklets = new ArrayList<>(); +boolean isCGPruneFallback = false; // Again prune with CG datamap. -if (distributedCG && dataMapJob != null) { - cgPrunedBlocklets = DataMapUtil - .executeDataMapJob(carb
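The control flow being added reduces to a try-and-fall-back pattern: attempt CG datamap pruning, and if it throws (for example because the datamap was dropped concurrently), keep the result of the default pruning. A schematic sketch (simplified signatures, not the CarbonInputFormat API):

    import java.util.ArrayList;
    import java.util.List;

    public class CgPruneWithFallback {
      interface Pruner<T> {
        List<T> prune(List<T> blocklets) throws Exception;
      }

      static <T> List<T> prune(List<T> defaultPruned, Pruner<T> cgDataMap) {
        try {
          // Narrow the default result further with the CG datamap.
          return cgDataMap.prune(defaultPruned);
        } catch (Exception e) {
          // The datamap may have been dropped concurrently; fall back to the
          // blocklets already pruned by the default datamap.
          return new ArrayList<>(defaultPruned);
        }
      }
    }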
[carbondata] 05/33: [CARBONDATA-3491] Return updated/deleted rows count when execute update/delete sql
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git commit 6c9bbfe8ec5606262b93b101986352578b2b Author: Zhang Zhichao <441586...@qq.com> AuthorDate: Tue Aug 13 11:00:23 2019 +0800 [CARBONDATA-3491] Return updated/deleted rows count when execute update/delete sql Return updated/deleted rows count when execute update/delete sql This closes #3357 --- .../testsuite/iud/DeleteCarbonTableTestCase.scala | 19 + .../testsuite/iud/UpdateCarbonTableTestCase.scala | 33 ++ .../scala/org/apache/carbondata/spark/KeyVal.scala | 10 +++ .../apache/spark/util/CarbonReflectionUtils.scala | 16 +++ .../apache/spark/sql/CarbonCatalystOperators.scala | 6 ++-- .../mutation/CarbonProjectForDeleteCommand.scala | 21 ++ .../mutation/CarbonProjectForUpdateCommand.scala | 19 - .../command/mutation/DeleteExecution.scala | 27 ++ .../spark/sql/hive/CarbonAnalysisRules.scala | 12 +++- 9 files changed, 129 insertions(+), 34 deletions(-) diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/iud/DeleteCarbonTableTestCase.scala b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/iud/DeleteCarbonTableTestCase.scala index f26283b..4565d7a 100644 --- a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/iud/DeleteCarbonTableTestCase.scala +++ b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/iud/DeleteCarbonTableTestCase.scala @@ -361,6 +361,25 @@ class DeleteCarbonTableTestCase extends QueryTest with BeforeAndAfterAll { sql("drop table if exists decimal_table") } + test("[CARBONDATA-3491] Return updated/deleted rows count when execute update/delete sql") { +sql("drop table if exists test_return_row_count") + +sql("create table test_return_row_count (a string, b string, c string) stored by 'carbondata'").show() +sql("insert into test_return_row_count select 'aaa','bbb','ccc'").show() +sql("insert into test_return_row_count select 'bbb','bbb','ccc'").show() +sql("insert into test_return_row_count select 'ccc','bbb','ccc'").show() +sql("insert into test_return_row_count select 'ccc','bbb','ccc'").show() + +checkAnswer(sql("delete from test_return_row_count where a = 'aaa'"), +Seq(Row(1)) +) +checkAnswer(sql("select * from test_return_row_count"), +Seq(Row("bbb", "bbb", "ccc"), Row("ccc", "bbb", "ccc"), Row("ccc", "bbb", "ccc")) +) + +sql("drop table if exists test_return_row_count").show() + } + override def afterAll { sql("use default") sql("drop database if exists iud_db cascade") diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/iud/UpdateCarbonTableTestCase.scala b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/iud/UpdateCarbonTableTestCase.scala index cf45600..ef18035 100644 --- a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/iud/UpdateCarbonTableTestCase.scala +++ b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/iud/UpdateCarbonTableTestCase.scala @@ -826,6 +826,39 @@ class UpdateCarbonTableTestCase extends QueryTest with BeforeAndAfterAll { sql("""drop table iud.dest11""").show } + test("[CARBONDATA-3491] Return updated/deleted rows count when execute update/delete sql") { +sql("drop table if exists test_return_row_count") +sql("drop table if exists test_return_row_count_source") + +sql("create table 
test_return_row_count (a string, b string, c string) stored by 'carbondata'").show() +sql("insert into test_return_row_count select 'bbb','bbb','ccc'").show() +sql("insert into test_return_row_count select 'ccc','bbb','ccc'").show() +sql("insert into test_return_row_count select 'ccc','bbb','ccc'").show() + +sql("create table test_return_row_count_source (a string, b string, c string) stored by 'carbondata'").show() +sql("insert into test_return_row_count_source select 'aaa','eee','ccc'").show() +sql("insert into test_return_row_count_source select 'bbb','bbb','ccc'").show() +sql("insert into test_return_row_count_source select 'ccc','bbb','ccc'").show() +sql("insert into test_return_row_count_source select 'ccc','bbb','ccc'").show() + +checkAnswer
[carbondata] 08/33: [CARBONDATA-3499] Fix insert failure with customFileProvider
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git commit 1943ddae258c754ab11c466557877378fb9a748e Author: ajantha-bhat AuthorDate: Thu Aug 22 17:42:12 2019 +0530 [CARBONDATA-3499] Fix insert failure with customFileProvider Problem: The exception below is thrown randomly on the first insert when a custom file system is used: IllegalArgumentException("Path belongs to unsupported file system") from FileFactory.getFileType() Cause: DefaultFileTypeProvider.initializeCustomFileProvider is called concurrently during insert. One thread got the provider while another thread did not, because the flag was already set to true; the other thread then failed as it tried the default provider. Solution: Synchronize the initialization of the custom file provider. This closes #3362 --- .../datastore/impl/DefaultFileTypeProvider.java| 31 +- 1 file changed, 19 insertions(+), 12 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/datastore/impl/DefaultFileTypeProvider.java b/core/src/main/java/org/apache/carbondata/core/datastore/impl/DefaultFileTypeProvider.java index cdb1a20..4572cc4 100644 --- a/core/src/main/java/org/apache/carbondata/core/datastore/impl/DefaultFileTypeProvider.java +++ b/core/src/main/java/org/apache/carbondata/core/datastore/impl/DefaultFileTypeProvider.java @@ -43,7 +43,9 @@ public class DefaultFileTypeProvider implements FileTypeInterface { */ protected FileTypeInterface customFileTypeProvider = null; - protected boolean customFileTypeProviderInitialized = false; + protected Boolean customFileTypeProviderInitialized = false; + + private final Object lock = new Object(); public DefaultFileTypeProvider() { } @@ -52,17 +54,22 @@ public class DefaultFileTypeProvider implements FileTypeInterface { * This method is required apart from Constructor to handle the below circular dependency. * CarbonProperties-->FileFactory-->DefaultTypeProvider-->CarbonProperties */ - private void initializeCustomFileprovider() { + private void initializeCustomFileProvider() { if (!customFileTypeProviderInitialized) { - customFileTypeProviderInitialized = true; - String customFileProvider = - CarbonProperties.getInstance().getProperty(CarbonCommonConstants.CUSTOM_FILE_PROVIDER); - if (customFileProvider != null && !customFileProvider.trim().isEmpty()) { -try { - customFileTypeProvider = - (FileTypeInterface) Class.forName(customFileProvider).newInstance(); -} catch (Exception e) { - LOGGER.error("Unable load configured FileTypeInterface class. Ignored.", e); + // This initialization can happen in concurrent threads. + synchronized (lock) { +if (!customFileTypeProviderInitialized) { + String customFileProvider = CarbonProperties.getInstance() + .getProperty(CarbonCommonConstants.CUSTOM_FILE_PROVIDER); + if (customFileProvider != null && !customFileProvider.trim().isEmpty()) { +try { + customFileTypeProvider = + (FileTypeInterface) Class.forName(customFileProvider).newInstance(); +} catch (Exception e) { + LOGGER.error("Unable load configured FileTypeInterface class. Ignored.", e); +} +customFileTypeProviderInitialized = true; + } } } } @@ -77,7 +84,7 @@ public class DefaultFileTypeProvider implements FileTypeInterface { * @return true if supported by the custom */ @Override public boolean isPathSupported(String path) { -initializeCustomFileprovider(); +initializeCustomFileProvider(); if (customFileTypeProvider != null) { return customFileTypeProvider.isPathSupported(path); }
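The synchronization added here is the classic double-checked initialization idiom. A self-contained sketch of the idiom (generic Java; it uses volatile, which the CarbonData class approximates with a Boolean field):

    public final class LazyProvider<T> {
      private final Object lock = new Object();
      private volatile boolean initialized;  // volatile: safe to read outside the lock
      private T instance;

      public T get(java.util.function.Supplier<T> factory) {
        if (!initialized) {                  // fast path: no locking once initialized
          synchronized (lock) {
            if (!initialized) {              // re-check: another thread may have won
              instance = factory.get();      // runs at most once
              initialized = true;            // publish only after creation succeeds
            }
          }
        }
        return instance;
      }
    }

Setting the flag only after the instance is created is the detail that matters: in the original bug, one thread saw the flag already true before the provider existed.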
[carbondata] 33/33: [CARBONDATA-3526]Fix cache issue during update and query
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git commit be9580b5768389d6adf0392fc820fb7e7186bd4c Author: akashrn5 AuthorDate: Thu Sep 12 14:30:48 2019 +0530 [CARBONDATA-3526]Fix cache issue during update and query Problem: When multiple updates happen on a table, the cache is loaded during the update operation, but since the second update triggers horizontal compaction inside the segment, the files already loaded into the cache become invalid. If clean files is then run, the horizontally compacted files are physically deleted, but the cache still contains the old files. So when a select query is fired, it fails with a file-not-found exception. Solution: Once horizontal compaction finishes, new compacted files are generated and the segments inside the cache become invalid, so clear the cache of invalid segments after horizontal compaction. During the drop cache command, clear the cache of the segmentMap as well. This closes #3385 --- .../sql/execution/command/cache/CarbonDropCacheCommand.scala | 8 +++- .../sql/execution/command/mutation/HorizontalCompaction.scala| 9 - 2 files changed, 11 insertions(+), 6 deletions(-) diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/cache/CarbonDropCacheCommand.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/cache/CarbonDropCacheCommand.scala index 1554f6a..7b8e10f 100644 --- a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/cache/CarbonDropCacheCommand.scala +++ b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/cache/CarbonDropCacheCommand.scala @@ -25,7 +25,7 @@ import org.apache.spark.sql.execution.command.MetadataCommand import org.apache.carbondata.common.logging.LogServiceFactory import org.apache.carbondata.core.cache.CacheProvider -import org.apache.carbondata.core.datamap.DataMapUtil +import org.apache.carbondata.core.datamap.{DataMapStoreManager, DataMapUtil} import org.apache.carbondata.core.metadata.schema.table.CarbonTable import org.apache.carbondata.core.util.CarbonProperties import org.apache.carbondata.events.{DropTableCacheEvent, OperationContext, OperationListenerBus} @@ -55,13 +55,11 @@ case class CarbonDropCacheCommand(tableIdentifier: TableIdentifier, internalCall carbonTable.getTableName)) { DataMapUtil.executeClearDataMapJob(carbonTable, DataMapUtil.DISTRIBUTED_JOB_NAME) } else { -val allIndexFiles = CacheUtil.getAllIndexFiles(carbonTable)(sparkSession) // Extract dictionary keys for the table and create cache keys from those val dictKeys: List[String] = CacheUtil.getAllDictCacheKeys(carbonTable) - // Remove elements from cache -val keysToRemove = allIndexFiles ++ dictKeys -cache.removeAll(keysToRemove.asJava) +cache.removeAll(dictKeys.asJava) + DataMapStoreManager.getInstance().clearDataMaps(carbonTable.getAbsoluteTableIdentifier) } } LOGGER.info("Drop cache request served for table " + carbonTable.getTableUniqueName) diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala index fb20e4f..62a3486 100644 --- a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala +++ b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/mutation/HorizontalCompaction.scala @@ -28,7 +28,7 @@ import
org.apache.spark.sql.execution.command.management.CarbonAlterTableCompact import org.apache.spark.sql.util.SparkSQLUtil import org.apache.carbondata.common.logging.LogServiceFactory -import org.apache.carbondata.core.datamap.Segment +import org.apache.carbondata.core.datamap.{DataMapStoreManager, Segment} import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier import org.apache.carbondata.core.metadata.schema.table.CarbonTable import org.apache.carbondata.core.statusmanager.SegmentUpdateStatusManager @@ -106,6 +106,13 @@ object HorizontalCompaction { segmentUpdateStatusManager, deleteTimeStamp, segLists) + +// If there are already index and data files are present for old update operation, then the +// cache will be loaded for those files during current update, but once after horizontal +// compaction is finished, new compacted files are generated, so the segments inside cache are +// now invalid, so clear the cache of invalid segment after horizontal compaction. +DataMapStoreManager.getInstance() + .clearInvalidSegments(carbonTable, segLists.asScala.map(_.getSegmentNo).asJava) } /**
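The underlying rule generalizes: any operation that rewrites files within a segment must evict cache entries keyed on the old files. A schematic sketch of that contract (hypothetical cache shape, not the DataMapStoreManager API):

    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    public class SegmentIndexCache {
      private final Map<String, Object> indexBySegment = new ConcurrentHashMap<>();

      public void put(String segmentNo, Object indexData) {
        indexBySegment.put(segmentNo, indexData);
      }

      // Call right after horizontal compaction: the listed segments now point at
      // newly compacted files, so any cached index for them is stale.
      public void clearInvalidSegments(List<String> compactedSegmentNos) {
        for (String segmentNo : compactedSegmentNos) {
          indexBySegment.remove(segmentNo);
        }
      }
    }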
[carbondata] 17/33: [HOTFIX] Fix NPE on windows
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git commit 631941b566ba015921ef2cdc89434f35d88aef5f Author: Manhua AuthorDate: Thu Sep 5 21:13:28 2019 +0800 [HOTFIX] Fix NPE on windows Analysis: carbon index files are merged but the SegmentFile was not updated, so it fails to get any default datamap for pruning. The reason for the missing update is a path-style mismatch in comparison, e.g. /[your_path_here]/examples/spark2/target/store/default/source/\Fact\Part0\Segment_0 vs D:/[your_path_here]/examples/spark2/target/store/default/source/Fact/Part0/Segment_0 Solution: use the AbsoluteTableIdentifier of the table from CarbonMetadata instead of a newly created object, to keep the path style the same. This closes #3377 --- .../main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala| 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala index 684bcbb..900b69c 100644 --- a/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala +++ b/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala @@ -151,8 +151,8 @@ class CarbonFileMetastore extends CarbonMetaStore { val tables = Option(CarbonMetadata.getInstance.getCarbonTable(database, tableName)) tables match { case Some(t) => -if (isSchemaRefreshed(absIdentifier, sparkSession)) { - readCarbonSchema(absIdentifier, parameters) +if (isSchemaRefreshed(t.getAbsoluteTableIdentifier, sparkSession)) { + readCarbonSchema(t.getAbsoluteTableIdentifier, parameters) } else { CarbonRelation(database, tableName, CarbonSparkUtil.createSparkMeta(t), t) }
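The root cause can be reproduced with two strings: the same segment folder rendered with different separator styles never compares equal, which is why reusing the table's existing AbsoluteTableIdentifier, with its already-normalized path, matters. A minimal illustration (hypothetical paths):

    public class PathStyleMismatch {
      public static void main(String[] args) {
        String fromNewObject = "/store/default/source" + "\\Fact\\Part0\\Segment_0";
        String fromMetadata = "D:/store/default/source" + "/Fact/Part0/Segment_0";
        // Same segment on disk, but a plain string comparison sees two different keys.
        System.out.println(fromNewObject.equals(fromMetadata)); // prints false
      }
    }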
[carbondata] 21/33: [CARBONDATA-3454] optimized index server output for count(*)
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git commit 43a8086e15ec34648c02f9022975e683fc139f25 Author: kunal642 AuthorDate: Thu Jun 27 14:32:11 2019 +0530 [CARBONDATA-3454] optimized index server output for count(*) Optimised the output for count(*) queries so that only a long is sent back to the driver, to reduce the network transfer cost for the index server. This closes #3308 --- .../apache/carbondata/core/datamap/DataMapJob.java | 2 + .../carbondata/core/datamap/DataMapUtil.java | 13 ++- .../core/datamap/DistributableDataMapFormat.java | 34 +-- .../core/indexstore/ExtendedBlocklet.java | 68 - .../core/indexstore/ExtendedBlockletWrapper.java | 27 +++-- .../ExtendedBlockletWrapperContainer.java | 19 ++-- .../carbondata/hadoop/api/CarbonInputFormat.java | 52 -- .../hadoop/api/CarbonTableInputFormat.java | 22 ++-- .../carbondata/indexserver/DataMapJobs.scala | 15 ++- .../indexserver/DistributedCountRDD.scala | 111 + .../indexserver/DistributedPruneRDD.scala | 29 ++ .../indexserver/DistributedRDDUtils.scala | 13 +++ .../carbondata/indexserver/IndexServer.scala | 19 13 files changed, 319 insertions(+), 105 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapJob.java b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapJob.java index 9eafe7c..326282d 100644 --- a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapJob.java +++ b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapJob.java @@ -35,4 +35,6 @@ public interface DataMapJob extends Serializable { List execute(DistributableDataMapFormat dataMapFormat); + Long executeCountJob(DistributableDataMapFormat dataMapFormat); + } diff --git a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapUtil.java b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapUtil.java index dd9debc..bca7409 100644 --- a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapUtil.java +++ b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapUtil.java @@ -230,7 +230,7 @@ public class DataMapUtil { List validSegments, List invalidSegments, DataMapLevel level, List segmentsToBeRefreshed) throws IOException { return executeDataMapJob(carbonTable, resolver, dataMapJob, partitionsToPrune, validSegments, -invalidSegments, level, false, segmentsToBeRefreshed); +invalidSegments, level, false, segmentsToBeRefreshed, false); } /** @@ -241,7 +241,8 @@ public class DataMapUtil { public static List executeDataMapJob(CarbonTable carbonTable, FilterResolverIntf resolver, DataMapJob dataMapJob, List partitionsToPrune, List validSegments, List invalidSegments, DataMapLevel level, - Boolean isFallbackJob, List segmentsToBeRefreshed) throws IOException { + Boolean isFallbackJob, List segmentsToBeRefreshed, boolean isCountJob) + throws IOException { List invalidSegmentNo = new ArrayList<>(); for (Segment segment : invalidSegments) { invalidSegmentNo.add(segment.getSegmentNo()); } @@ -250,9 +251,11 @@ public class DataMapUtil { DistributableDataMapFormat dataMapFormat = new DistributableDataMapFormat(carbonTable, resolver, validSegments, invalidSegmentNo, partitionsToPrune, false, level, isFallbackJob); -List prunedBlocklets = dataMapJob.execute(dataMapFormat); -// Apply expression on the blocklets.
-return prunedBlocklets; +if (isCountJob) { + dataMapFormat.setCountStarJob(); + dataMapFormat.setIsWriteToFile(false); +} +return dataMapJob.execute(dataMapFormat); } public static SegmentStatusManager.ValidAndInvalidSegmentsInfo getValidAndInvalidSegments( diff --git a/core/src/main/java/org/apache/carbondata/core/datamap/DistributableDataMapFormat.java b/core/src/main/java/org/apache/carbondata/core/datamap/DistributableDataMapFormat.java index 8426fcb..b430c5d 100644 --- a/core/src/main/java/org/apache/carbondata/core/datamap/DistributableDataMapFormat.java +++ b/core/src/main/java/org/apache/carbondata/core/datamap/DistributableDataMapFormat.java @@ -28,7 +28,6 @@ import java.util.UUID; import org.apache.carbondata.common.logging.LogServiceFactory; import org.apache.carbondata.core.constants.CarbonCommonConstants; -import org.apache.carbondata.core.datamap.dev.DataMap; import org.apache.carbondata.core.datamap.dev.expr.DataMapDistributableWrapper; import org.apache.carbondata.core.datastore.impl.FileFactory; import org.apache.carbondata.core.indexstore.ExtendedBlocklet; @@ -91,6 +90,8 @@ public class DistributableDataMapFormat extends FileInputFormat validSegments, List invalidSegments, List part
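The shape of the optimization: for count(*), the executors aggregate locally and ship back a single long instead of serializing every pruned ExtendedBlocklet to the driver. A schematic sketch of the two job paths (simplified, not the real DataMapJob interface):

    import java.util.List;

    public class CountJobSketch {
      interface PruneJob<T> {
        List<T> execute();       // full pruning result: costly to transfer
        long executeCountJob();  // count-only path: one long crosses the network
      }

      static <T> long prunedCount(PruneJob<T> job, boolean isCountStarQuery) {
        if (isCountStarQuery) {
          // No blocklet objects are shipped; each executor sums and returns a count.
          return job.executeCountJob();
        }
        return job.execute().size(); // otherwise count the shipped results
      }
    }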
[carbondata] 14/33: [CARBONDATA-3495] Fix Insert into Complex data type of Binary failure with Carbon & SparkFileFormat
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git commit 3aea196715a9954e38dccdc50824e8f6a0a75de3 Author: Indhumathi27 AuthorDate: Thu Aug 22 16:01:49 2019 +0530 [CARBONDATA-3495] Fix Insert into Complex data type of Binary failure with Carbon & SparkFileFormat Problem: Insert into a complex data type (Array/Struct/Map) of binary data type fails with "Invalid data type name", because binary within complex data types is not handled. Solution: Handle the binary data type so that it works with complex data types. This closes #3361 --- .../core/datastore/page/ComplexColumnPage.java | 1 + .../apache/carbondata/core/util/DataTypeUtil.java | 3 + .../src/test/resources/complexbinary.csv | 3 + .../complexType/TestComplexDataType.scala | 114 + .../spark/sql/catalyst/CarbonDDLSqlParser.scala| 2 + .../SparkCarbonDataSourceBinaryTest.scala | 88 .../processing/datatypes/PrimitiveDataType.java| 3 + .../org/apache/carbondata/sdk/file/ImageTest.java | 41 8 files changed, 255 insertions(+) diff --git a/core/src/main/java/org/apache/carbondata/core/datastore/page/ComplexColumnPage.java b/core/src/main/java/org/apache/carbondata/core/datastore/page/ComplexColumnPage.java index 921ae50..c4f8849 100644 --- a/core/src/main/java/org/apache/carbondata/core/datastore/page/ComplexColumnPage.java +++ b/core/src/main/java/org/apache/carbondata/core/datastore/page/ComplexColumnPage.java @@ -124,6 +124,7 @@ public class ComplexColumnPage { DataTypes.isMapType(dataType) || (dataType == DataTypes.STRING) || (dataType == DataTypes.VARCHAR) || +(dataType == DataTypes.BINARY) || (dataType == DataTypes.DATE) || DataTypes.isDecimal(dataType) { // For all these above condition the ColumnPage should be Taken as BYTE_ARRAY diff --git a/core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java b/core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java index 9aea579..adb63cd 100644 --- a/core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java +++ b/core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java @@ -530,6 +530,7 @@ public final class DataTypeUtil { public static boolean isFixedSizeDataType(DataType dataType) { if (dataType == DataTypes.STRING || dataType == DataTypes.VARCHAR || +dataType == DataTypes.BINARY || DataTypes.isDecimal(dataType)) { return false; } else { @@ -1019,6 +1020,8 @@ public final class DataTypeUtil { return DataTypes.BYTE_ARRAY; } else if (DataTypes.BYTE_ARRAY.getName().equalsIgnoreCase(name)) { return DataTypes.BYTE_ARRAY; +} else if (DataTypes.BINARY.getName().equalsIgnoreCase(name)) { + return DataTypes.BINARY; } else if (name.equalsIgnoreCase("decimal")) { return DataTypes.createDefaultDecimalType(); } else if (name.equalsIgnoreCase("array")) { diff --git a/integration/spark-common-test/src/test/resources/complexbinary.csv b/integration/spark-common-test/src/test/resources/complexbinary.csv new file mode 100644 index 000..3870f5f --- /dev/null +++ b/integration/spark-common-test/src/test/resources/complexbinary.csv @@ -0,0 +1,3 @@ +1,true,abc,binary1$binary2,binary1,1 +2,false,abcd,binary11$binary12,binary11,1 +3,true,abcde,binary13$binary13,binary13,1 \ No newline at end of file diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestComplexDataType.scala
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestComplexDataType.scala index b5f77c2..9d6b4d1 100644 --- a/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestComplexDataType.scala +++ b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestComplexDataType.scala @@ -1013,4 +1013,118 @@ class TestComplexDataType extends QueryTest with BeforeAndAfterAll { checkAnswer(sql("select id,name,structField.intval,name,structField.stringval from table1"),Seq(Row(null,"aaa",23,"aaa","bb"))) } + test("test array of binary data type") { +sql("drop table if exists carbon_table") +sql("drop table if exists hive_table") +sql("create table if not exists hive_table(id int, label boolean, name string," + +"binaryField array, autoLabel boolean) row format delimited fields terminated by ','") +sql("insert into hive_table values(1,true,'abc',array('binary'),false)") +
[carbondata] 24/33: [HOTFIX] Remove duplicate case for BYTE_ARRAY
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git commit 336848365c06cc10d6bb39691b35f657be565c10 Author: Manhua AuthorDate: Fri Sep 20 14:26:48 2019 +0800 [HOTFIX] Remove duplicate case for BYTE_ARRAY This closes #3396 --- core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java | 4 1 file changed, 4 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java b/core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java index adb63cd..3e0edb1 100644 --- a/core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java +++ b/core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java @@ -1018,8 +1018,6 @@ public final class DataTypeUtil { return DataTypes.NULL; } else if (DataTypes.BYTE_ARRAY.getName().equalsIgnoreCase(name)) { return DataTypes.BYTE_ARRAY; -} else if (DataTypes.BYTE_ARRAY.getName().equalsIgnoreCase(name)) { - return DataTypes.BYTE_ARRAY; } else if (DataTypes.BINARY.getName().equalsIgnoreCase(name)) { return DataTypes.BINARY; } else if (name.equalsIgnoreCase("decimal")) { @@ -1070,8 +1068,6 @@ public final class DataTypeUtil { return DataTypes.NULL; } else if (DataTypes.BYTE_ARRAY.getName().equalsIgnoreCase(dataType.getName())) { return DataTypes.BYTE_ARRAY; -} else if (DataTypes.BYTE_ARRAY.getName().equalsIgnoreCase(dataType.getName())) { - return DataTypes.BYTE_ARRAY; } else if (DataTypes.BINARY.getName().equalsIgnoreCase(dataType.getName())) { return DataTypes.BINARY; } else if (dataType.getName().equalsIgnoreCase("decimal")) {
[carbondata] 02/33: [CARBONDATA-3494]Fix NullPointerException in drop table and Correct the document formatting
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git commit 7571b99ecb257956a05528a0d2910f66f4fd7daf Author: akashrn5 AuthorDate: Thu Aug 15 17:56:57 2019 +0530 [CARBONDATA-3494]Fix NullPointerException in drop table and Correct the document formatting Problem: The formatting of the document in the index server md file is broken. Also, drop table calls the drop datamap command with force drop set to true; due to this, the table is removed from the metastore and physically. Then, when processData is called for drop table, it tries to create the carbonTable object by reading the schema, which causes a NullPointerException. Solution: Correct the formatting, and skip processData if carbonTable is null. This closes #3359 --- .../org/apache/carbondata/core/datamap/DataMapStoreManager.java| 7 ++- docs/index-server.md | 6 ++ 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java index ce0d6a6..f1f48fa 100644 --- a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java +++ b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java @@ -598,6 +598,11 @@ public final class DataMapStoreManager { */ public void deleteDataMap(AbsoluteTableIdentifier identifier, String dataMapName) { CarbonTable carbonTable = getCarbonTable(identifier); +if (carbonTable == null) { + // If carbon table is null then it means table is already deleted, therefore return without + // doing any further changes. + return; +} String tableUniqueName = identifier.getCarbonTableIdentifier().getTableUniqueName(); if (CarbonProperties.getInstance() .isDistributedPruningEnabled(identifier.getDatabaseName(), identifier.getTableName())) { @@ -613,7 +618,7 @@ public final class DataMapStoreManager { if (tableIndices != null) { int i = 0; for (TableDataMap tableDataMap : tableIndices) { - if (carbonTable != null && tableDataMap != null && dataMapName + if (tableDataMap != null && dataMapName .equalsIgnoreCase(tableDataMap.getDataMapSchema().getDataMapName())) { try { DataMapUtil diff --git a/docs/index-server.md b/docs/index-server.md index 5dd15c5..9253f2a 100644 --- a/docs/index-server.md +++ b/docs/index-server.md @@ -136,11 +136,9 @@ The Index Server is a long running service therefore the 'spark.yarn.keytab' and | Name | Default Value| Description | |:--:|:-:|:--: | | carbon.enable.index.server | false | Enable the use of index server for pruning for the whole application. | -| carbon.index.server.ip |NA | Specify the IP/HOST on which the server is started. Better to - specify the private IP. | +| carbon.index.server.ip |NA | Specify the IP/HOST on which the server is started. Better to specify the private IP. | | carbon.index.server.port | NA | The port on which the index server is started. | -| carbon.disable.index.server.fallback | false | Whether to enable/disable fallback for index server -. Should be used for testing purposes only. Refer: [Fallback](#Fallback)| +| carbon.disable.index.server.fallback | false | Whether to enable/disable fallback for index server. Should be used for testing purposes only. Refer: [Fallback](#Fallback)| |carbon.index.server.max.jobname.length|NA|The max length of the job to show in the index server service UI.
For bigger queries this may impact performance as the whole string would be sent from JDBCServer to IndexServer.|
[carbondata] branch branch-1.6 updated (72169e5 -> be9580b)
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a change to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git.

 from 72169e5 [maven-release-plugin] prepare for next development iteration
 new 56879f7 [CARBONDATA-3480] Fixed unnecessary refresh for table by removing modified mdt file
 new 7571b99 [CARBONDATA-3494]Fix NullPointerException in drop table and Correct the document formatting
 new 90b6c64 [CARBONDATA-3493] Initialize Profiler in CarbonEnv
 new 019c777 [CARBONDATA-3466] Fix NPE for carboncli command
 new 6c9bbfe [CARBONDATA-3491] Return updated/deleted rows count when execute update/delete sql
 new 129b163 [CARBONDATA-3509] Support disable query prefetch by configuration
 new dff8ab3 [HOTFIX] Remove hive-service from carbondata assembly jar
 new 1943dda [CARBONDATA-3499] Fix insert failure with customFileProvider
 new 75e207c [CARBONDATA-3502] Select query with UDF having Match expression inside IN expression Fails
 new 2328707 [CARBONDATA-3505] Drop database cascade fix
 new 99e0c7c [CARBONDATA-3497] Support to write long string for streaming table
 new 41ae280 [CARBONDATA-3513] fix 'taskNo' exceeding Long.MAX_VALUE issue when execute major compaction
 new 6f90b28 [CARBONDATA-3452] dictionary include udf handle all the scenarios
 new 3aea196 [CARBONDATA-3495] Fix Insert into Complex data type of Binary failure with Carbon & SparkFileFormat
 new d509cd1 [CARBONDATA-3507] Fix Create Table As Select Failure in Spark-2.3
 new 0e2d3e2 [CARBONDATA-3508] Support CG datamap pruning fallback while querying
 new 631941b [HOTFIX] Fix NPE on windows
 new ef26a4a [CARBONDATA-3506]Fix alter table failures on partition table with hive.metastore.disallow.incompatible.col.type.changes as true
 new f750b6f [CARBONDATA-3515] Limit local dictionary size to 16MB and allow configuration.
 new 204f290 [HOTFIX] fix missing quotation marks in datamap doc
 new 43a8086 [CARBONDATA-3454] optimized index server output for count(*)
 new 8ffbc1d [HOTFIX] fix incorrect word in index-server doc
 new 9e46647 [CARBONDATA-3489] Optimized the comparator instances in sort
 new 3368483 [HOTFIX] Remove duplicate case for BYTE_ARRAY
 new ab86705 [DOC] Update doc for alter sort_columns
 new de81b38 [HOTFIX] Fix wrong min/max index of measure
 new 57309d7 [CARBONDATA-3473] Fix data size calculation of the last column in CarbonCli
 new 21bbc4a [CARBONDATA-3520] CTAS should fail if select query contains duplicate columns
 new 85dc030 [HOTFIX]Update Documentation for MV datamap
 new 49e9ea3 [CARBONDATA-3523] Store data file size into index file
 new 9342545 [CARBONDATA-3527] Fix 'String length cannot exceed 32000 characters' issue when load data with 'GLOBAL_SORT' from csv files which include big complex type data
 new 00d2fe9 [CARBONDATA-3501] Fix update table with varchar column
 new be9580b [CARBONDATA-3526]Fix cache issue during update and query

The 33 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.
Summary of changes: README.md | 1 + assembly/pom.xml | 1 + .../core/constants/CarbonCommonConstants.java | 20 + .../carbondata/core/datamap/DataMapFilter.java | 47 ++ .../apache/carbondata/core/datamap/DataMapJob.java | 2 + .../core/datamap/DataMapStoreManager.java | 21 +- .../carbondata/core/datamap/DataMapUtil.java | 13 +- .../core/datamap/DistributableDataMapFormat.java | 34 +- .../core/datastore/block/TableBlockInfo.java | 13 + .../datastore/impl/DefaultFileTypeProvider.java| 31 +- .../core/datastore/page/ComplexColumnPage.java | 1 + .../core/indexstore/BlockletDataMapIndexStore.java | 2 +- .../core/indexstore/ExtendedBlocklet.java | 68 ++- .../core/indexstore/ExtendedBlockletWrapper.java | 27 +- .../ExtendedBlockletWrapperContainer.java | 19 +- .../dictionaryholder/MapBasedDictionaryStore.java | 16 +- .../carbondata/core/metadata/CarbonMetadata.java | 9 + .../core/metadata/index/BlockIndexInfo.java| 18 + .../core/metadata/schema/table/CarbonTable.java| 4 +- .../core/metadata/schema/table/TableSchema.java| 4 + .../scan/executor/impl/AbstractQueryExecutor.java | 2 + .../carbondata/core/scan/model/QueryModel.java | 4 +- .../statusmanager/SegmentUpdateStatusManager.java | 32 +- .../core/util/AbstractDataFileFooterConverter.java | 3 + .../carbondata/core/util/BlockletDataMapUtil.java | 17 +- .../carbondata/core/util/CarbonMetadataUtil.java | 65 +--
[carbondata] 19/33: [CARBONDATA-3515] Limit local dictionary size to 16MB and allow configuration.
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git commit f750b6f210ba87923793631f6b4a2cc4f7dbdd3d Author: ajantha-bhat AuthorDate: Tue Sep 10 10:48:26 2019 +0530 [CARBONDATA-3515] Limit local dictionary size to 16MB and allow configuration. Problem: Currently the local dictionary max size is 2GB; because of this, for varchar or long string columns the local dictionary can grow to 2GB. Since the local dictionary is stored in the blocklet, the blocklet size can then exceed 2GB, even though the configured maximum blocklet size is 64MB. In some places integer overflow happens during casting. Solution: Limit the local dictionary size to 16MB and allow configuration; the default size is 4MB. This closes #3380 --- .../core/constants/CarbonCommonConstants.java | 11 ++ .../dictionaryholder/MapBasedDictionaryStore.java | 16 ++-- .../carbondata/core/util/CarbonProperties.java | 43 ++ docs/configuration-parameters.md | 1 + 4 files changed, 68 insertions(+), 3 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java index 67fa13f..ac77582 100644 --- a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java +++ b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java @@ -1209,6 +1209,17 @@ public final class CarbonCommonConstants { public static final String CARBON_ENABLE_RANGE_COMPACTION_DEFAULT = "true"; + @CarbonProperty + /** + * size based threshold for local dictionary in mb. + */ + public static final String CARBON_LOCAL_DICTIONARY_SIZE_THRESHOLD_IN_MB = + "carbon.local.dictionary.size.threshold.inmb"; + + public static final int CARBON_LOCAL_DICTIONARY_SIZE_THRESHOLD_IN_MB_DEFAULT = 4; + + public static final int CARBON_LOCAL_DICTIONARY_SIZE_THRESHOLD_IN_MB_MAX = 16; + // // Query parameter start here // diff --git a/core/src/main/java/org/apache/carbondata/core/localdictionary/dictionaryholder/MapBasedDictionaryStore.java b/core/src/main/java/org/apache/carbondata/core/localdictionary/dictionaryholder/MapBasedDictionaryStore.java index 7b8617a..0a50451 100644 --- a/core/src/main/java/org/apache/carbondata/core/localdictionary/dictionaryholder/MapBasedDictionaryStore.java +++ b/core/src/main/java/org/apache/carbondata/core/localdictionary/dictionaryholder/MapBasedDictionaryStore.java @@ -20,7 +20,9 @@ import java.util.Map; import java.util.concurrent.ConcurrentHashMap; import org.apache.carbondata.core.cache.dictionary.DictionaryByteArrayWrapper; +import org.apache.carbondata.core.constants.CarbonCommonConstants; import org.apache.carbondata.core.localdictionary.exception.DictionaryThresholdReachedException; +import org.apache.carbondata.core.util.CarbonProperties; /** * Map based dictionary holder class, it will use map to hold @@ -51,6 +53,11 @@ public class MapBasedDictionaryStore implements DictionaryStore { private int dictionaryThreshold; /** + * dictionary threshold size in bytes + */ + private long dictionarySizeThresholdInBytes; + + /** * for checking threshold is reached or not */ private boolean isThresholdReached; @@ -62,6 +69,8 @@ public class MapBasedDictionaryStore implements DictionaryStore { public MapBasedDictionaryStore(int dictionaryThreshold) { this.dictionaryThreshold = dictionaryThreshold; +this.dictionarySizeThresholdInBytes = Integer.parseInt(CarbonProperties.getInstance() +
.getProperty(CarbonCommonConstants.CARBON_LOCAL_DICTIONARY_SIZE_THRESHOLD_IN_MB)) << 20; this.dictionary = new ConcurrentHashMap<>(); this.referenceDictionaryArray = new DictionaryByteArrayWrapper[dictionaryThreshold]; } @@ -93,7 +102,7 @@ public class MapBasedDictionaryStore implements DictionaryStore { value = ++lastAssignValue; currentSize += data.length; // if new value is greater than threshold - if (value > dictionaryThreshold || currentSize >= Integer.MAX_VALUE) { + if (value > dictionaryThreshold || currentSize > dictionarySizeThresholdInBytes) { // set the threshold boolean to true isThresholdReached = true; // throw exception @@ -111,9 +120,10 @@ public class MapBasedDictionaryStore implements DictionaryStore { private void checkIfThresholdReached() throws DictionaryThresholdReachedException { if (isThresholdReached) { - if (currentSize &g
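The core of the patch is the MB-to-bytes conversion and the size check performed on every dictionary add. Below is a minimal, self-contained Java sketch of that pattern; the class and method names are hypothetical, and only the shift-by-20 conversion and the threshold comparison mirror the commit:

    import java.util.concurrent.atomic.AtomicLong;

    public class SizeBoundedDictionary {
        // 4 MB default and 16 MB cap, matching the constants introduced above
        static final int DEFAULT_THRESHOLD_MB = 4;

        private final long sizeThresholdInBytes;
        private final AtomicLong currentSize = new AtomicLong();

        SizeBoundedDictionary(int thresholdInMb) {
            // shifting left by 20 multiplies by 2^20, i.e. converts MB to bytes
            this.sizeThresholdInBytes = ((long) thresholdInMb) << 20;
        }

        /** Accounts for one entry; returns false once the size threshold is breached. */
        boolean tryAdd(byte[] data) {
            return currentSize.addAndGet(data.length) <= sizeThresholdInBytes;
        }

        public static void main(String[] args) {
            SizeBoundedDictionary dict = new SizeBoundedDictionary(DEFAULT_THRESHOLD_MB);
            System.out.println(dict.tryAdd(new byte[1024]));    // true: 1 KB is well under 4 MB
            System.out.println(dict.tryAdd(new byte[5 << 20])); // false: pushes past the 4 MB default
        }
    }

Shifting by 20 makes a configured value of 4 become 4,194,304 bytes; capping the property at 16 keeps any single blocklet's local dictionary far below the 64MB blocklet target, which is the whole point of the fix.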
[carbondata] 29/33: [HOTFIX] Update Documentation for MV datamap
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git commit 85dc0304743e74c7abc00d59d4d4b5e5f619d03e Author: Indhumathi27 AuthorDate: Thu Jul 25 17:02:26 2019 +0530 [HOTFIX] Update Documentation for MV datamap This closes #3335 --- README.md | 1 + docs/datamap/mv-datamap-guide.md | 2 ++ 2 files changed, 3 insertions(+) diff --git a/README.md b/README.md index 3226a30..da5b547 100644 --- a/README.md +++ b/README.md @@ -60,6 +60,7 @@ CarbonData is built using Apache Maven, to [build CarbonData](https://github.com * [CarbonData Lucene DataMap](https://github.com/apache/carbondata/blob/master/docs/datamap/lucene-datamap-guide.md) * [CarbonData Pre-aggregate DataMap](https://github.com/apache/carbondata/blob/master/docs/datamap/preaggregate-datamap-guide.md) * [CarbonData Timeseries DataMap](https://github.com/apache/carbondata/blob/master/docs/datamap/timeseries-datamap-guide.md) + * [CarbonData MV DataMap](https://github.com/apache/carbondata/blob/master/docs/datamap/mv-datamap-guide.md) * [SDK Guide](https://github.com/apache/carbondata/blob/master/docs/sdk-guide.md) * [C++ SDK Guide](https://github.com/apache/carbondata/blob/master/docs/csdk-guide.md) * [Performance Tuning](https://github.com/apache/carbondata/blob/master/docs/performance-tuning.md) diff --git a/docs/datamap/mv-datamap-guide.md b/docs/datamap/mv-datamap-guide.md index d22357c..fc1ffd5 100644 --- a/docs/datamap/mv-datamap-guide.md +++ b/docs/datamap/mv-datamap-guide.md @@ -65,6 +65,7 @@ EXPLAIN SELECT a, sum(b) from maintable group by a; CREATE DATAMAP agg_sales ON TABLE sales USING "MV" + DMPROPERTIES('TABLE_BLOCKSIZE'='256 MB','LOCAL_DICTIONARY_ENABLE'='false') AS SELECT country, sex, sum(quantity), avg(price) FROM sales @@ -97,6 +98,7 @@ EXPLAIN SELECT a, sum(b) from maintable group by a; property is inherited from parent table, which allows user to provide different tableproperties for child table * MV creation with limit or union all ctas queries is unsupported + * MV datamap does not support Streaming How MV tables are selected
[carbondata] 32/33: [CARBONDATA-3501] Fix update table with varchar column
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git commit 00d2fe930e713958e052e3a738616a852f1dbfe7 Author: Manhua AuthorDate: Wed Sep 25 11:47:01 2019 +0800 [CARBONDATA-3501] Fix update table with varchar column Problem: An update on a table with a varchar column will throw an exception. Analysis: In the loading part of the update operation, it gets the isVarcharTypeMapping for each column in the order in which the table was created, and this drives the string length check: a column that is not varchar type is not allowed to exceed 32000 characters. However, when changing the update plan in CarbonIUDRule, it first deletes the old expression and appends the new one, which makes the order differ from the table-creation order, so the string length check fails. Solution: Keep the column order as at table creation when modifying the update plan. This closes #3398 --- .../longstring/VarcharDataTypesBasicTestCase.scala | 10 ++ .../command/management/CarbonLoadDataCommand.scala | 2 +- .../org/apache/spark/sql/hive/CarbonAnalysisRules.scala | 4 ++-- .../org/apache/spark/sql/optimizer/CarbonIUDRule.scala | 17 ++--- 4 files changed, 27 insertions(+), 6 deletions(-) diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/longstring/VarcharDataTypesBasicTestCase.scala b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/longstring/VarcharDataTypesBasicTestCase.scala index 4fd2cc0..9719cfc 100644 --- a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/longstring/VarcharDataTypesBasicTestCase.scala +++ b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/longstring/VarcharDataTypesBasicTestCase.scala @@ -389,6 +389,16 @@ class VarcharDataTypesBasicTestCase extends QueryTest with BeforeAndAfterEach wi sql("DROP TABLE IF EXISTS varchar_complex_table") } + + test("update table with long string column") { +prepareTable() +// update non-varchar column +sql(s"update $longStringTable set(id)=(0) where name is not null").show() +// update varchar column +sql(s"update $longStringTable set(description)=('empty') where name is not null").show() +// update non-varchar column +sql(s"update $longStringTable set(description, id)=('sth.', 1) where name is not null").show() + } // ignore this test in CI, because it will need at least 4GB memory to run successfully ignore("Exceed 2GB per column page for varchar datatype") { diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala index 6a03eab..b2f9a1e 100644 --- a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala +++ b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala @@ -1060,7 +1060,7 @@ case class CarbonLoadDataCommand( val dropAttributes = df.logicalPlan.output.dropRight(1) val finalOutput = catalogTable.schema.map { attr => dropAttributes.find { d => -val index = d.name.lastIndexOf("-updatedColumn") +val index = d.name.lastIndexOf(CarbonCommonConstants.UPDATED_COL_EXTENSION) if (index > 0) { d.name.substring(0, index).equalsIgnoreCase(attr.name) } else { diff --git
a/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonAnalysisRules.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonAnalysisRules.scala index 9b923b0..d11bf1e 100644 --- a/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonAnalysisRules.scala +++ b/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonAnalysisRules.scala @@ -122,9 +122,9 @@ case class CarbonIUDAnalysisRule(sparkSession: SparkSession) extends Rule[Logica val renamedProjectList = projectList.zip(columns).map { case (attr, col) => attr match { case UnresolvedAlias(child22, _) => - UnresolvedAlias(Alias(child22, col + "-updatedColumn")()) + UnresolvedAlias(Alias(child22, col + CarbonCommonConstants.UPDATED_COL_EXTENSION)()) case UnresolvedAttribute(_) => - UnresolvedAlias(Alias(attr, col + "-updatedColumn")()) + UnresolvedAlias(Alias(attr, col + CarbonCommonConstants.UPDATED_COL_EXTENSION)()) case _ => attr } } diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/optimiz
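To see why column order matters here, consider a simplified version of the length check the loader performs. The sketch below uses hypothetical names; the one load-bearing detail taken from the commit is that the varchar flags are indexed by table-creation order, so row values must be presented in that same order:

    public class StringLengthChecker {
        static final int MAX_NON_VARCHAR_CHARS = 32000;

        // flags indexed by table-creation order: true means the column is varchar
        private final boolean[] isVarcharTypeMapping;

        StringLengthChecker(boolean[] isVarcharTypeMapping) {
            this.isVarcharTypeMapping = isVarcharTypeMapping;
        }

        void check(String[] row) {
            for (int i = 0; i < row.length; i++) {
                // only non-varchar columns are limited to 32000 characters
                if (!isVarcharTypeMapping[i] && row[i] != null
                        && row[i].length() > MAX_NON_VARCHAR_CHARS) {
                    throw new IllegalArgumentException(
                            "length exceeds 32000 for non-varchar column at index " + i);
                }
            }
        }

        public static void main(String[] args) {
            // column 0 is varchar, column 1 is not (table-creation order)
            StringLengthChecker checker = new StringLengthChecker(new boolean[]{true, false});
            checker.check(new String[]{"x".repeat(40000), "short"}); // passes
            System.out.println("ok");
        }
    }

If an update plan reorders the projection, index i in the row no longer matches index i in the flag array, so a long varchar value can land on a non-varchar slot and fail the 32000-character check; preserving the creation order is exactly what the patch restores.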
[carbondata] 07/33: [HOTFIX] Remove hive-service from carbondata assembly jar
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git commit dff8ab38a986b388182a12f60dd4272659e1cbb6 Author: Zhang Zhichao <441586...@qq.com> AuthorDate: Wed Sep 4 14:36:06 2019 +0800 [HOTFIX] Remove hive-service from carbondata assembly jar Problem: In some environments, a 'No Such Method: registerCurrentOperationLog' exception occurs while executing SQL on the carbon thrift server. Cause: the spark hive thrift module rewrites the class 'org.apache.hive.service.cli.operation.ExecuteStatementOperation' and adds the method 'registerCurrentOperationLog' to it, but when the carbon thrift server starts, it may load the class 'ExecuteStatementOperation' first from the carbondata assembly jar (which includes 'org.apache.hive:hive-service'); that 'ExecuteStatementOperation' from the hive-service jar doesn't have the method 'registerCurrentOperationLog', so it throws NoSuchMethodException. Solution: remove all artifacts of 'org.apache.hive' when assembling the carbondata jar. This closes #3373 --- assembly/pom.xml | 1 + 1 file changed, 1 insertion(+) diff --git a/assembly/pom.xml b/assembly/pom.xml index 12d7e6e..bf729c5 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -107,6 +107,7 @@ org.apache.spark:* org.apache.zookeeper:* org.apache.avro:* + org.apache.hive:* com.google.guava:guava org.xerial.snappy:snappy-java
svn commit: r35419 - /release/carbondata/1.6.0/
Author: ravipesala Date: Wed Aug 28 06:24:37 2019 New Revision: 35419 Log: Upload 1.6.0 Added: release/carbondata/1.6.0/ release/carbondata/1.6.0/apache-carbondata-1.6.0-bin-spark2.1.0-hadoop2.7.2.jar (with props) release/carbondata/1.6.0/apache-carbondata-1.6.0-bin-spark2.1.0-hadoop2.7.2.jar.asc (with props) release/carbondata/1.6.0/apache-carbondata-1.6.0-bin-spark2.1.0-hadoop2.7.2.jar.sha512 (with props) release/carbondata/1.6.0/apache-carbondata-1.6.0-bin-spark2.2.1-hadoop2.7.2.jar (with props) release/carbondata/1.6.0/apache-carbondata-1.6.0-bin-spark2.2.1-hadoop2.7.2.jar.asc (with props) release/carbondata/1.6.0/apache-carbondata-1.6.0-bin-spark2.2.1-hadoop2.7.2.jar.sha512 (with props) release/carbondata/1.6.0/apache-carbondata-1.6.0-bin-spark2.3.2-hadoop2.7.2.jar (with props) release/carbondata/1.6.0/apache-carbondata-1.6.0-bin-spark2.3.2-hadoop2.7.2.jar.asc (with props) release/carbondata/1.6.0/apache-carbondata-1.6.0-bin-spark2.3.2-hadoop2.7.2.jar.sha512 (with props) release/carbondata/1.6.0/apache-carbondata-1.6.0-source-release.asc (with props) release/carbondata/1.6.0/apache-carbondata-1.6.0-source-release.zip (with props) release/carbondata/1.6.0/apache-carbondata-1.6.0-source-release.zip.sha512 (with props) Added: release/carbondata/1.6.0/apache-carbondata-1.6.0-bin-spark2.1.0-hadoop2.7.2.jar == Binary file - no diff available. Propchange: release/carbondata/1.6.0/apache-carbondata-1.6.0-bin-spark2.1.0-hadoop2.7.2.jar -- svn:executable = * Propchange: release/carbondata/1.6.0/apache-carbondata-1.6.0-bin-spark2.1.0-hadoop2.7.2.jar -- svn:mime-type = application/octet-stream Added: release/carbondata/1.6.0/apache-carbondata-1.6.0-bin-spark2.1.0-hadoop2.7.2.jar.asc == --- release/carbondata/1.6.0/apache-carbondata-1.6.0-bin-spark2.1.0-hadoop2.7.2.jar.asc (added) +++ release/carbondata/1.6.0/apache-carbondata-1.6.0-bin-spark2.1.0-hadoop2.7.2.jar.asc Wed Aug 28 06:24:37 2019 @@ -0,0 +1,11 @@ +-BEGIN PGP SIGNATURE- + +iQEzBAEBCAAdFiEEsZE8naWI0MngB++fuw0pZv1r+vAFAl1eFMcACgkQuw0pZv1r ++vC2Ggf/XVkeWV+DUF4szeS1Aw4FDFAi/SncuxA4znoFvZjtbSf8aiaMyS0pe0K5 +OcSC6KsVrDKI/l1C298ezbn4WpMWhlQEunjIlX7etSzviS1zjAaP+rL3lL6CVMHt +9vbXuIMUotRb+XdyEocHvsisMIxzabCqvw/Vouz4kV+IjT35pDpo7Nn3g+MBclBh +1BiKcnQQZ1irBRN63LmaO/oV5IDpVcEouTXri+i0ZF0h/8zzGxFXJ8MHWay/3SjA +EWaHgHLsxWAz9UyO54T3XdqvRjU09EN5TmgmPP3QBHjOFBynCD1Op+dYvEJLxVfq +DDKe56YVPBH2o/3k0aW5PR/lw1RuNw== +=OXRW +-END PGP SIGNATURE- Propchange: release/carbondata/1.6.0/apache-carbondata-1.6.0-bin-spark2.1.0-hadoop2.7.2.jar.asc -- svn:executable = * Added: release/carbondata/1.6.0/apache-carbondata-1.6.0-bin-spark2.1.0-hadoop2.7.2.jar.sha512 == --- release/carbondata/1.6.0/apache-carbondata-1.6.0-bin-spark2.1.0-hadoop2.7.2.jar.sha512 (added) +++ release/carbondata/1.6.0/apache-carbondata-1.6.0-bin-spark2.1.0-hadoop2.7.2.jar.sha512 Wed Aug 28 06:24:37 2019 @@ -0,0 +1 @@ +47d50e9c13d8fd3191788d8bf46d23ef2be40181655dd740ddfd3d53d2e23802645ac978a9a1c69ec1fc6359eee16ac894626c56f5146bd7064e1ffd99009cc0 apache-carbondata-1.6.0-bin-spark2.1.0-hadoop2.7.2.jar Propchange: release/carbondata/1.6.0/apache-carbondata-1.6.0-bin-spark2.1.0-hadoop2.7.2.jar.sha512 -- svn:executable = * Added: release/carbondata/1.6.0/apache-carbondata-1.6.0-bin-spark2.2.1-hadoop2.7.2.jar == Binary file - no diff available. 
Propchange: release/carbondata/1.6.0/apache-carbondata-1.6.0-bin-spark2.2.1-hadoop2.7.2.jar -- svn:executable = * Propchange: release/carbondata/1.6.0/apache-carbondata-1.6.0-bin-spark2.2.1-hadoop2.7.2.jar -- svn:mime-type = application/octet-stream Added: release/carbondata/1.6.0/apache-carbondata-1.6.0-bin-spark2.2.1-hadoop2.7.2.jar.asc == --- release/carbondata/1.6.0/apache-carbondata-1.6.0-bin-spark2.2.1-hadoop2.7.2.jar.asc (added) +++ release/carbondata/1.6.0/apache-carbondata-1.6.0-bin-spark2.2.1-hadoop2.7.2.jar.asc Wed Aug 28 06:24:37 2019 @@ -0,0 +1,11 @@ +-BEGIN PGP SIGNATURE- + +iQEzBAEBCAAdFiEEsZE8naWI0MngB++fuw0pZv1r+vAFAl1eFTYACgkQuw0pZv1r ++vCuJwf
[carbondata] branch master updated: [CARBONDATA-3494] Fix NullPointerException in drop table and Correct the document formatting
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new 499489c [CARBONDATA-3494] Fix NullPointerException in drop table and Correct the document formatting 499489c is described below commit 499489c64d0624308a734f8898a9ef23f4773224 Author: akashrn5 AuthorDate: Thu Aug 15 17:56:57 2019 +0530 [CARBONDATA-3494] Fix NullPointerException in drop table and Correct the document formatting Problem: The formatting of the index server document (md file) is broken. Also, drop table calls the drop datamap command with force drop as true; due to this the table is removed from metadata and physically, and when processData is then called for drop table, it tries to create the carbonTable object by reading the schema, which causes a NullPointerException. Solution: correct the formatting, and skip processData if carbonTable is null. This closes #3359 --- .../org/apache/carbondata/core/datamap/DataMapStoreManager.java| 7 ++- docs/index-server.md | 6 ++ 2 files changed, 8 insertions(+), 5 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java index ce0d6a6..f1f48fa 100644 --- a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java +++ b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java @@ -598,6 +598,11 @@ public final class DataMapStoreManager { */ public void deleteDataMap(AbsoluteTableIdentifier identifier, String dataMapName) { CarbonTable carbonTable = getCarbonTable(identifier); +if (carbonTable == null) { + // If carbon table is null then it means table is already deleted, therefore return without + // doing any further changes. + return; +} String tableUniqueName = identifier.getCarbonTableIdentifier().getTableUniqueName(); if (CarbonProperties.getInstance() .isDistributedPruningEnabled(identifier.getDatabaseName(), identifier.getTableName())) { @@ -613,7 +618,7 @@ public final class DataMapStoreManager { if (tableIndices != null) { int i = 0; for (TableDataMap tableDataMap : tableIndices) { - if (carbonTable != null && tableDataMap != null && dataMapName + if (tableDataMap != null && dataMapName .equalsIgnoreCase(tableDataMap.getDataMapSchema().getDataMapName())) { try { DataMapUtil diff --git a/docs/index-server.md b/docs/index-server.md index 5dd15c5..9253f2a 100644 --- a/docs/index-server.md +++ b/docs/index-server.md @@ -136,11 +136,9 @@ The Index Server is a long running service therefore the 'spark.yarn.keytab' and | Name | Default Value| Description | |:--:|:-:|:--: | | carbon.enable.index.server | false | Enable the use of index server for pruning for the whole application. | -| carbon.index.server.ip |NA | Specify the IP/HOST on which the server is started. Better to - specify the private IP. | +| carbon.index.server.ip |NA | Specify the IP/HOST on which the server is started. Better to specify the private IP. | | carbon.index.server.port | NA | The port on which the index server is started. | -| carbon.disable.index.server.fallback | false | Whether to enable/disable fallback for index server -. Should be used for testing purposes only. Refer: [Fallback](#Fallback)| +| carbon.disable.index.server.fallback | false | Whether to enable/disable fallback for index server. Should be used for testing purposes only. 
Refer: [Fallback](#Fallback)| |carbon.index.server.max.jobname.length|NA|The max length of the job to show in the index server service UI. For bigger queries this may impact performance as the whole string would be sent from JDBCServer to IndexServer.|
[carbondata] branch master updated: [CARBONDATA-3480] Fixed unnecessary refresh for table by removing modified mdt file
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new a5344df [CARBONDATA-3480] Fixed unnecessary refresh for table by removing modified mdt file a5344df is described below commit a5344df2bfe20560324f9a0b1ef92051540e70d8 Author: kunal642 AuthorDate: Fri Jul 26 14:52:36 2019 +0530 [CARBONDATA-3480] Fixed unnecessary refresh for table by removing modified mdt file This closes #3339 --- .../carbondata/core/datamap/DataMapFilter.java | 47 +++ .../core/datamap/DataMapStoreManager.java | 14 +- .../carbondata/core/metadata/CarbonMetadata.java | 9 + .../core/metadata/schema/table/CarbonTable.java| 4 +- .../core/metadata/schema/table/TableSchema.java| 4 + .../statusmanager/SegmentUpdateStatusManager.java | 26 -- .../apache/carbondata/core/util/CarbonUtil.java| 1 - .../core/metadata/CarbonMetadataTest.java | 7 +- .../ThriftWrapperSchemaConverterImplTest.java | 4 +- .../metadata/schema/table/CarbonTableTest.java | 8 +- .../table/CarbonTableWithComplexTypesTest.java | 6 +- .../dblocation/DBLocationCarbonTableTestCase.scala | 25 -- .../apache/spark/sql/hive/CarbonSessionUtil.scala | 6 +- .../carbondata/indexserver/IndexServer.scala | 10 +- .../scala/org/apache/spark/sql/CarbonEnv.scala | 51 ++- .../command/datamap/CarbonDropDataMapCommand.scala | 1 - .../management/RefreshCarbonTableCommand.scala | 2 - .../CarbonAlterTableDropPartitionCommand.scala | 12 +- .../CarbonAlterTableSplitPartitionCommand.scala| 3 - .../command/preaaggregate/PreAggregateUtil.scala | 19 +- .../command/table/CarbonDropTableCommand.scala | 13 + .../spark/sql/hive/CarbonFileMetastore.scala | 425 + .../spark/sql/hive/CarbonHiveMetaStore.scala | 10 +- .../apache/spark/sql/hive/CarbonMetaStore.scala| 10 +- .../scala/org/apache/spark/util/CleanFiles.scala | 3 - .../scala/org/apache/spark/util/Compaction.scala | 2 - .../apache/spark/util/DeleteSegmentByDate.scala| 2 - .../org/apache/spark/util/DeleteSegmentById.scala | 2 - .../scala/org/apache/spark/util/TableLoader.scala | 2 - .../apache/spark/sql/hive/CarbonSessionState.scala | 31 +- .../AlterTableColumnRenameTestCase.scala | 4 +- 31 files changed, 322 insertions(+), 441 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapFilter.java b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapFilter.java index c20d0d5..ac4886d 100644 --- a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapFilter.java +++ b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapFilter.java @@ -18,10 +18,15 @@ package org.apache.carbondata.core.datamap; import java.io.Serializable; +import java.util.HashSet; +import java.util.Set; import org.apache.carbondata.core.datastore.block.SegmentProperties; import org.apache.carbondata.core.metadata.schema.table.CarbonTable; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonDimension; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonMeasure; import org.apache.carbondata.core.scan.executor.util.RestructureUtil; +import org.apache.carbondata.core.scan.expression.ColumnExpression; import org.apache.carbondata.core.scan.expression.Expression; import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf; @@ -39,9 +44,51 @@ public class DataMapFilter implements Serializable { public DataMapFilter(CarbonTable table, Expression expression) { this.table = table; 
this.expression = expression; +if (expression != null) { + checkIfFilterColumnExistsInTable(); +} resolve(); } + private Set extractColumnExpressions(Expression expression) { +Set columnExpressionList = new HashSet<>(); +for (Expression expressions: expression.getChildren()) { + if (expressions != null && expressions.getChildren() != null + && expressions.getChildren().size() > 0) { +columnExpressionList.addAll(extractColumnExpressions(expressions)); + } else if (expressions instanceof ColumnExpression) { +columnExpressionList.add(((ColumnExpression) expressions).getColumnName()); + } +} +return columnExpressionList; + } + + private void checkIfFilterColumnExistsInTable() { +Set columnExpressionList = extractColumnExpressions(expression); +for (String colExpression : columnExpressionList) { + if (colExpression.equalsIgnoreCase("positionid")) { +continue; + } + boolean exists = false; + for (CarbonMeasure carbonMeasure : table.getAllMeasures()) { +if (!carbonMeasure.isInvi
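The extractColumnExpressions method in the diff above is a plain recursive walk over the filter expression tree. A stripped-down sketch of the same idea follows, with a hypothetical Expr interface standing in for CarbonData's Expression class; only the recursion-over-children shape mirrors the commit:

    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    interface Expr {
        List<Expr> children();
    }

    final class ColumnExpr implements Expr {
        final String columnName;
        ColumnExpr(String columnName) { this.columnName = columnName; }
        public List<Expr> children() { return List.of(); }
    }

    final class AndExpr implements Expr {
        final Expr left, right;
        AndExpr(Expr left, Expr right) { this.left = left; this.right = right; }
        public List<Expr> children() { return List.of(left, right); }
    }

    public class FilterColumns {
        static Set<String> extract(Expr e) {
            Set<String> out = new HashSet<>();
            if (e instanceof ColumnExpr) {
                out.add(((ColumnExpr) e).columnName); // leaf: a column reference
            }
            for (Expr child : e.children()) {
                out.addAll(extract(child));           // inner node: recurse
            }
            return out;
        }

        public static void main(String[] args) {
            Expr filter = new AndExpr(new ColumnExpr("a"), new ColumnExpr("b"));
            System.out.println(extract(filter)); // prints [a, b] (set order unspecified)
        }
    }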
[carbondata] branch branch-1.6 updated (79b533f -> a73cadd)
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a change to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git. discard 79b533f [maven-release-plugin] prepare release apache-CarbonData-1.6.0-rc2 This update removed existing revisions from the reference, leaving the reference pointing at a previous point in the repository history. * -- * -- N refs/heads/branch-1.6 (a73cadd) \ O -- O -- O (79b533f) Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever. No new revisions were added by this update. Summary of changes: assembly/pom.xml | 2 +- common/pom.xml| 2 +- core/pom.xml | 2 +- datamap/bloom/pom.xml | 6 -- datamap/examples/pom.xml | 6 -- datamap/lucene/pom.xml| 6 -- datamap/mv/core/pom.xml | 2 +- datamap/mv/plan/pom.xml | 2 +- examples/flink/pom.xml| 2 +- examples/spark2/pom.xml | 2 +- format/pom.xml| 2 +- hadoop/pom.xml| 2 +- integration/hive/pom.xml | 2 +- integration/presto/pom.xml| 2 +- integration/spark-common-test/pom.xml | 14 +++--- integration/spark-common/pom.xml | 2 +- integration/spark-datasource/pom.xml | 2 +- integration/spark2/pom.xml| 2 +- pom.xml | 4 ++-- processing/pom.xml| 2 +- store/sdk/pom.xml | 6 -- streaming/pom.xml | 6 -- tools/cli/pom.xml | 6 -- 23 files changed, 48 insertions(+), 36 deletions(-)
[carbondata] branch branch-1.6 updated (61a5bd3 -> a73cadd)
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a change to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git. from 61a5bd3 [HOTFIX] Fix dictionary include issue with codegen failure add d5db3f4 [CARBONDATA-3488] Check the file size after move local file to carbon path add a73cadd [CARBONDATA-3490] Fix concurrent data load failure with carbondata FileNotFound exception No new revisions were added by this update. Summary of changes: .../apache/carbondata/core/util/CarbonUtil.java| 22 +++--- .../apache/carbondata/spark/util/CommonUtil.scala | 7 +-- 2 files changed, 24 insertions(+), 5 deletions(-)
[carbondata] branch master updated: [CARBONDATA-3490] Fix concurrent data load failure with carbondata FileNotFound exception
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new a73cadd [CARBONDATA-3490] Fix concurrent data load failure with carbondata FileNotFound exception a73cadd is described below commit a73cadda438de57713ffc5fd85a86b4fdb5442c7 Author: ajantha-bhat AuthorDate: Fri Aug 9 10:19:32 2019 +0530 [CARBONDATA-3490] Fix concurrent data load failure with carbondata FileNotFound exception Problem: When two loads happen concurrently, one load cleans the temp directory of the concurrent load. Cause: the temp directory that stores the carbon files is created using System.nanoTime(); due to this, the two loads can have the same store location, and when one load completes it cleans the temp directory, causing a data load failure for the other load. Solution: use a UUID instead of nano time while creating the temp directory, so that each load has a unique directory. This closes #3352 --- .../main/scala/org/apache/carbondata/spark/util/CommonUtil.scala | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala b/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala index 7015279..8d6cdfb 100644 --- a/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala +++ b/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala @@ -21,6 +21,7 @@ import java.io.File import java.math.BigDecimal import java.text.SimpleDateFormat import java.util +import java.util.UUID import java.util.regex.{Matcher, Pattern} import scala.collection.JavaConverters._ @@ -777,8 +778,10 @@ object CommonUtil { val isCarbonUseYarnLocalDir = CarbonProperties.getInstance().getProperty( CarbonCommonConstants.CARBON_LOADING_USE_YARN_LOCAL_DIR, CarbonCommonConstants.CARBON_LOADING_USE_YARN_LOCAL_DIR_DEFAULT).equalsIgnoreCase("true") -val tmpLocationSuffix = - s"${File.separator}carbon${System.nanoTime()}${CarbonCommonConstants.UNDERSCORE}$index" +val tmpLocationSuffix = s"${ File.separator }carbon${ + UUID.randomUUID().toString +.replace("-", "") +}${ CarbonCommonConstants.UNDERSCORE }$index" if (isCarbonUseYarnLocalDir) { val yarnStoreLocations = Util.getConfiguredLocalDirs(SparkEnv.get.conf)
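The shape of the fix is easy to see in isolation. A minimal Java sketch follows; the suffix format and the index parameter are illustrative stand-ins, not the exact CarbonData constants:

    import java.io.File;
    import java.util.UUID;

    public class TempDirNaming {
        // illustrative stand-in for the per-load suffix built in CommonUtil
        static String tmpLocationSuffix(int index) {
            String uuid = UUID.randomUUID().toString().replace("-", "");
            return File.separator + "carbon" + uuid + "_" + index;
        }

        public static void main(String[] args) {
            // two "loads" started at the same instant still get distinct directories
            System.out.println(tmpLocationSuffix(0));
            System.out.println(tmpLocationSuffix(0));
        }
    }

Because UUID.randomUUID() draws 122 random bits, two concurrent loads receiving the same suffix is practically impossible, whereas nanosecond timestamps taken at nearly the same instant by two loads can legitimately collide.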
[carbondata] branch branch-1.6 updated (9724fd4 -> 61a5bd3)
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a change to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git. discard 9724fd4 [maven-release-plugin] prepare for next development iteration omit 9ca7891 [maven-release-plugin] prepare release apache-CarbonData-1.6.0-rc2 omit 80438f7 [HOTFIX] Removed the hive-exec and commons dependency from hive module omit 575b711 [CARBONDATA-3481] Multi-thread pruning fails when datamaps count is just near numOfThreadsForPruning omit 2ebc041 [CARBONDATA-3478]Fix ArrayIndexOutOfBound Exception on compaction after alter operation omit 917e041 [HOTFIX] CLI test case failed during release because of space differences add c8cc92b [CARBONDATA-3478]Fix ArrayIndexOutOfBound Exception on compaction after alter operation add 10f3747 [HOTFIX] CLI test case failed during release because of space differences add 765712a [CARBONDATA-3481] Multi-thread pruning fails when datamaps count is just near numOfThreadsForPruning add d7d70a8 [HOTFIX] Removed the hive-exec and commons dependency from hive module add f005fd4 [CARBONDATA-3477] deal line break chars correctly after 'select' in 'update ... select columns' sql add 35f1501 [CARBONDATA-3483] don't require update.lock and compaction.lock again when execute 'IUD_UPDDEL_DELTA' compaction add e14c817 [CARBONDATA-3485] Data loading is failed from S3 to hdfs table having ~2K carbonfiles add 88ec830 [CARBONDATA-3476] Fix Read time and scan time stats in executor log for filter query add ebe4057 [CARBONDATA-3452] Fix select query failure when substring on dictionary column with join add bbeb974 [CARBONDATA-3487] wrong Input metrics (size/record) displayed in spark UI during insert into add aa67a99 [CARBONDATA-3486] Fix Serialization/Deserialization issue with DataType add 8f0724e [CARBONDATA-3482] Fixed NPE in Concurrent query add 61a5bd3 [HOTFIX] Fix dictionary include issue with codegen failure This update added new revisions after undoing existing revisions. That is to say, some revisions that were in the old version of the branch are not in the new version. This situation occurs when a user --force pushes a change and generates a repository containing something like this: * -- * -- B -- O -- O -- O (9724fd4) \ N -- N -- N refs/heads/branch-1.6 (61a5bd3) You should already have received notification emails for all of the O revisions, and so the following emails describe only the N revisions from the common base, B. Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever. No new revisions were added by this update. 
Summary of changes: assembly/pom.xml | 2 +- common/pom.xml | 2 +- core/pom.xml | 2 +- .../core/constants/CarbonCommonConstants.java | 10 .../block/SegmentPropertiesAndSchemaHolder.java| 21 --- .../core/indexstore/BlockletDataMapIndexStore.java | 2 +- .../indexstore/blockletindex/BlockDataMap.java | 36 .../indexstore/blockletindex/BlockletDataMap.java | 7 +-- .../carbondata/core/scan/filter/FilterUtil.java| 5 +- .../MeasureColumnResolvedFilterInfo.java | 3 +- .../AbstractDetailQueryResultIterator.java | 10 ++-- .../scan/scanner/impl/BlockletFilterScanner.java | 34 .../carbondata/core/util/CarbonProperties.java | 24 .../carbondata/core/util/TaskMetricsMap.java | 21 ++- datamap/bloom/pom.xml | 6 +- datamap/examples/pom.xml | 6 +- datamap/lucene/pom.xml | 6 +- datamap/mv/core/pom.xml| 2 +- datamap/mv/plan/pom.xml| 2 +- examples/flink/pom.xml | 2 +- examples/spark2/pom.xml| 2 +- format/pom.xml | 2 +- hadoop/pom.xml | 2 +- integration/hive/pom.xml | 2 +- integration/presto/pom.xml | 2 +- integration/spark-common-test/pom.xml | 14 ++--- ...ryWithColumnMetCacheAndCacheLevelProperty.scala | 27 + .../iud/HorizontalCompactionTestCase.scala | 64 +- .../testsuite/iud/UpdateCarbonTableTestCase.scala | 42 +- integration/spark-common/pom.xml | 2 +- .../apache/carbondata/spark/InitInputMetrics.java | 2 +- .../spark/load/DataLoadProcessBuilderOnSpark.scala | 2 +- .../apache/carbondata/spark/rdd/CarbonRDD.scala| 4 +- .../carbondata/spark/rdd/CarbonScanRDD.scala | 2 +-
[carbondata] branch master updated: [HOTFIX] Fix dictionary include issue with codegen failure
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new 61a5bd3 [HOTFIX] Fix dictionary include issue with codegen failure 61a5bd3 is described below commit 61a5bd3351cfeaa85528abeb70a8eae9c6521db6 Author: ajantha-bhat AuthorDate: Fri Aug 9 17:33:34 2019 +0530 [HOTFIX] Fix dictionary include issue with codegen failure Problem: when whole-stage codegen is false, a query on a dictionary include column fails. Cause: the data type is not updated for the dictionary include column. Solution: return the updated expression when the data type is changed for the dictionary include column. This closes #3353 --- .../scala/org/apache/spark/sql/optimizer/CarbonLateDecodeRule.scala | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/optimizer/CarbonLateDecodeRule.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/optimizer/CarbonLateDecodeRule.scala index 93773fc..961bf11 100644 --- a/integration/spark2/src/main/scala/org/apache/spark/sql/optimizer/CarbonLateDecodeRule.scala +++ b/integration/spark2/src/main/scala/org/apache/spark/sql/optimizer/CarbonLateDecodeRule.scala @@ -713,9 +713,10 @@ class CarbonLateDecodeRule extends Rule[LogicalPlan] with PredicateHelper { prExp.transform { case attr: AttributeReference => updateDataType(attr, attrMap, allAttrsNotDecode, aliasMap) -} +}.asInstanceOf[NamedExpression] + } else { +prExp } - prExp } Project(prExps, p.child) case wd: Window if relations.nonEmpty =>
[carbondata] branch master updated: [CARBONDATA-3482] Fixed NPE in Concurrent query
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new 8f0724e [CARBONDATA-3482] Fixed NPE in Concurrent query 8f0724e is described below commit 8f0724e4256e960608aa6a0d66593acd2ceaa84e Author: kunal642 AuthorDate: Mon Jul 29 14:31:31 2019 +0530 [CARBONDATA-3482] Fixed NPE in Concurrent query Problem: In the case of concurrent queries, if Q1 is loading the cache and Q2 is removing from the cache, then Q2 may remove the segmentPropertiesIndex which Q1 has allocated and is about to access. This will cause a NullPointerException. Solution: Instead of storing the index in BlockDataMap, keep the reference of the segmentPropertiesWrapper to be used. This closes #3351 --- .../block/SegmentPropertiesAndSchemaHolder.java| 21 ++--- .../core/indexstore/BlockletDataMapIndexStore.java | 2 +- .../indexstore/blockletindex/BlockDataMap.java | 36 -- .../indexstore/blockletindex/BlockletDataMap.java | 7 + ...ryWithColumnMetCacheAndCacheLevelProperty.scala | 27 ++-- 5 files changed, 26 insertions(+), 67 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/datastore/block/SegmentPropertiesAndSchemaHolder.java b/core/src/main/java/org/apache/carbondata/core/datastore/block/SegmentPropertiesAndSchemaHolder.java index f2f2d8c..056a0e7 100644 --- a/core/src/main/java/org/apache/carbondata/core/datastore/block/SegmentPropertiesAndSchemaHolder.java +++ b/core/src/main/java/org/apache/carbondata/core/datastore/block/SegmentPropertiesAndSchemaHolder.java @@ -98,7 +98,7 @@ public class SegmentPropertiesAndSchemaHolder { * @param columnCardinality * @param segmentId */ - public int addSegmentProperties(CarbonTable carbonTable, + public SegmentPropertiesWrapper addSegmentProperties(CarbonTable carbonTable, List columnsInTable, int[] columnCardinality, String segmentId) { SegmentPropertiesAndSchemaHolder.SegmentPropertiesWrapper segmentPropertiesWrapper = new SegmentPropertiesAndSchemaHolder.SegmentPropertiesWrapper(carbonTable, @@ -137,7 +137,7 @@ public class SegmentPropertiesAndSchemaHolder { .addMinMaxColumns(carbonTable); } } -return segmentIdSetAndIndexWrapper.getSegmentPropertiesIndex(); +return getSegmentPropertiesWrapper(segmentIdSetAndIndexWrapper.getSegmentPropertiesIndex()); } /** @@ -222,17 +222,14 @@ public class SegmentPropertiesAndSchemaHolder { * Method to remove the given segment ID * * @param segmentId - * @param segmentPropertiesIndex * @param clearSegmentWrapperFromMap flag to specify whether to clear segmentPropertiesWrapper * from Map if all the segment's using it have become stale */ - public void invalidate(String segmentId, int segmentPropertiesIndex, + public void invalidate(String segmentId, SegmentPropertiesWrapper segmentPropertiesWrapper, boolean clearSegmentWrapperFromMap) { -SegmentPropertiesWrapper segmentPropertiesWrapper = -indexToSegmentPropertiesWrapperMapping.get(segmentPropertiesIndex); -if (null != segmentPropertiesWrapper) { - SegmentIdAndSegmentPropertiesIndexWrapper segmentIdAndSegmentPropertiesIndexWrapper = - segmentPropWrapperToSegmentSetMap.get(segmentPropertiesWrapper); +SegmentIdAndSegmentPropertiesIndexWrapper segmentIdAndSegmentPropertiesIndexWrapper = +segmentPropWrapperToSegmentSetMap.get(segmentPropertiesWrapper); +if (segmentIdAndSegmentPropertiesIndexWrapper != null) { synchronized (getOrCreateTableLock(segmentPropertiesWrapper.getTableIdentifier())) {
segmentIdAndSegmentPropertiesIndexWrapper.removeSegmentId(segmentId); // if after removal of given SegmentId, the segmentIdSet becomes empty that means this @@ -240,14 +237,16 @@ public class SegmentPropertiesAndSchemaHolder { // removed from all the holders if (clearSegmentWrapperFromMap && segmentIdAndSegmentPropertiesIndexWrapper.segmentIdSet .isEmpty()) { - indexToSegmentPropertiesWrapperMapping.remove(segmentPropertiesIndex); + indexToSegmentPropertiesWrapperMapping + .remove(segmentIdAndSegmentPropertiesIndexWrapper.getSegmentPropertiesIndex()); segmentPropWrapperToSegmentSetMap.remove(segmentPropertiesWrapper); } else if (!clearSegmentWrapperFromMap && segmentIdAndSegmentPropertiesIndexWrapper.segmentIdSet.isEmpty()) { // min max columns can very when cache is modified. So even though entry is not required // to be deleted from map clear the column cache so that it can filled again segmentPropertiesWrapper.clear(); - LOGGER.info(
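The race is easiest to see with the shared cache reduced to its essentials. A toy Java sketch of the before and after patterns, with simplified types and hypothetical names:

    import java.util.concurrent.ConcurrentHashMap;

    public class HolderDemo {
        static final ConcurrentHashMap<Integer, Object> cache = new ConcurrentHashMap<>();

        public static void main(String[] args) {
            cache.put(7, "segment-properties-wrapper");

            int rememberedIndex = 7;              // buggy pattern: remember only the key
            Object rememberedRef = cache.get(7);  // fixed pattern: remember the value itself

            cache.remove(7);                      // a concurrent query invalidates the entry

            System.out.println(cache.get(rememberedIndex)); // null, an NPE when dereferenced
            System.out.println(rememberedRef);              // still valid for this reader
        }
    }

Note that the fix does not remove the race on the map itself; it makes the reader immune to it by holding a strong reference to the wrapper for as long as it is needed.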
[carbondata] branch branch-1.6 updated: [HOTFIX] CLI test case failed during release because of space differences
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/branch-1.6 by this push: new 917e041 [HOTFIX] CLI test case failed during release because of space differences 917e041 is described below commit 917e041439282985ec28ff89249db6088e6771df Author: ravipesala AuthorDate: Thu Aug 1 18:12:50 2019 +0530 [HOTFIX] CLI test case failed during release because of space differences The CLI test case fails if the release name is short and without a snapshot suffix, because more padding space is added. That's why the test is changed to check individual contains assertions instead of a batch of lines. This closes #3344 --- .../src/test/java/org/apache/carbondata/tool/CarbonCliTest.java | 9 +++-- 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/tools/cli/src/test/java/org/apache/carbondata/tool/CarbonCliTest.java b/tools/cli/src/test/java/org/apache/carbondata/tool/CarbonCliTest.java index 26901f8..af8d51d 100644 --- a/tools/cli/src/test/java/org/apache/carbondata/tool/CarbonCliTest.java +++ b/tools/cli/src/test/java/org/apache/carbondata/tool/CarbonCliTest.java @@ -241,12 +241,9 @@ public class CarbonCliTest { "20 3.36KB 4.06MB false 00.0B 93.76KB 0.0 100.0 7298 " , "21 2.04KB 1.49MB false 00.0B 89.62KB 0.0 100.0 9299 "); Assert.assertTrue(output.contains(expectedOutput)); - -expectedOutput = buildLines( -"## version Details", -"written_by Version ", -"TestUtil"+ CarbonVersionConstants.CARBONDATA_VERSION+" "); -Assert.assertTrue(output.contains(expectedOutput)); +Assert.assertTrue(output.contains("## version Details")); +Assert.assertTrue(output.contains("written_by Version")); +Assert.assertTrue(output.contains("TestUtil"+ CarbonVersionConstants.CARBONDATA_VERSION)); } @Test
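The brittleness being fixed is general: a multi-line contains check only matches if every padding space lines up. A toy illustration of the difference (plain Java, no JUnit dependency; the sample output string is made up):

    public class ContainsCheck {
        public static void main(String[] args) {
            String output = "## version Details\nwritten_by  Version\nTestUtil    1.6.0\n";

            // brittle: fails whenever the tool pads columns differently per release name
            String block = "written_by Version \nTestUtil1.6.0 ";
            System.out.println(output.contains(block));                // false

            // robust: each fact is asserted on its own, ignoring column widths
            System.out.println(output.contains("## version Details")); // true
            System.out.println(output.contains("written_by"));         // true
            System.out.println(output.contains("TestUtil"));           // true
        }
    }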
[carbondata] branch branch-1.6 updated (6366d9e -> ac2af7c)
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a change to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git. discard 6366d9e [maven-release-plugin] prepare for next development iteration omit 9938633 [maven-release-plugin] prepare release apache-carbondata-1.6.0-rc1 add ee78597 [CARBONDATA-3462][DOC]Added documentation for index server add a77e4fd [HOTFIX] Reset the hive catalog table stats to none even after refresh lookup relation. add 1d0754e [HOTFIX] Fix json to carbon writer add b6e5f69 [HOTFIX] Added taskid as UUID while writing files in fileformat to avoid corrupting. add ec2a731 [HOTFIX] Included MV module in assembly jar add c0d8d34 [HOTFIX] Fixed sk/ak not found for datasource table add c65cc12 [CARBONDATA-3474]Fix validate mvQuery having filter expression and correct error message add ed117f7 [HOTFIX] Fix failing CI test cases add ac2af7c [HOTFIX] Fixed date filter issue for fileformat This update added new revisions after undoing existing revisions. That is to say, some revisions that were in the old version of the branch are not in the new version. This situation occurs when a user --force pushes a change and generates a repository containing something like this: * -- * -- B -- O -- O -- O (6366d9e) \ N -- N -- N refs/heads/branch-1.6 (ac2af7c) You should already have received notification emails for all of the O revisions, and so the following emails describe only the N revisions from the common base, B. Any revisions marked "omit" are not gone; other references still refer to them. Any revisions marked "discard" are gone forever. No new revisions were added by this update. Summary of changes: README.md | 3 +- assembly/pom.xml | 27 +-- common/pom.xml | 2 +- core/pom.xml | 2 +- .../core/constants/CarbonCommonConstants.java | 2 +- .../carbondata/core/util/path/CarbonTablePath.java | 12 +- datamap/bloom/pom.xml | 6 +- datamap/examples/pom.xml | 6 +- datamap/lucene/pom.xml | 6 +- datamap/mv/core/pom.xml| 2 +- .../apache/carbondata/mv/datamap/MVHelper.scala| 13 +- .../org/apache/carbondata/mv/datamap/MVUtil.scala | 3 +- .../carbondata/mv/rewrite/MVCreateTestCase.scala | 2 +- .../mv/rewrite/TestAllOperationsOnMV.scala | 36 +++- datamap/mv/plan/pom.xml| 2 +- docs/index-server.md | 229 + examples/spark2/pom.xml| 2 +- format/pom.xml | 2 +- hadoop/pom.xml | 2 +- .../carbondata/hadoop/api/CarbonInputFormat.java | 4 +- integration/hive/pom.xml | 2 +- integration/presto/pom.xml | 2 +- integration/spark-common-test/pom.xml | 14 +- integration/spark-common/pom.xml | 2 +- integration/spark-datasource/pom.xml | 2 +- .../execution/datasources/CarbonFileIndex.scala| 3 +- .../datasources/CarbonSparkDataSourceUtil.scala| 5 + .../datasources/SparkCarbonFileFormat.scala| 4 +- .../datasource/SparkCarbonDataSourceTest.scala | 13 ++ integration/spark2/pom.xml | 2 +- .../sql/hive/CarbonInMemorySessionState.scala | 28 ++- .../apache/spark/sql/hive/CarbonSessionState.scala | 25 ++- .../apache/spark/sql/hive/CarbonSessionUtil.scala | 69 +-- .../carbondata/indexserver/IndexServer.scala | 9 +- .../scala/org/apache/spark/util/DataMapUtil.scala | 2 +- pom.xml| 80 ++- processing/pom.xml | 2 +- .../partition/spliter/RowResultProcessor.java | 2 +- .../processing/store/CarbonDataFileAttributes.java | 15 +- .../store/CarbonFactDataHandlerModel.java | 2 +- store/sdk/pom.xml | 6 +- .../carbondata/sdk/file/CarbonWriterBuilder.java | 9 +- .../carbondata/sdk/file/CSVCarbonWriterTest.java | 4 +- .../carbondata/sdk/file/CarbonReaderTest.java | 53 + 
streaming/pom.xml | 6 +- .../streaming/CarbonStreamRecordWriter.java| 2 +- tools/cli/pom.xml | 6 +- 47 files changed, 537 insertions(+), 195 deletions(-) create mode 100644 docs/index-server.md
[carbondata] branch master updated: [HOTFIX] Fix failing CI test cases
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new ed117f7 [HOTFIX] Fix failing CI test cases ed117f7 is described below commit ed117f74caae606849d01f5df434804ecc97d8eb Author: kunal642 AuthorDate: Mon Jul 29 21:46:41 2019 +0530 [HOTFIX] Fix failing CI test cases Problem: The bloom and lucene dependencies were removed, due to which mvn downloaded the old jars. Solution: Add the bloom and lucene dependencies to the main pom. This closes #3341 --- pom.xml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/pom.xml b/pom.xml index 3995e43..35317ec 100644 --- a/pom.xml +++ b/pom.xml @@ -108,6 +108,8 @@ store/sdk assembly tools/cli +datamap/bloom +datamap/lucene datamap/mv/plan datamap/mv/core examples/spark2
svn commit: r34885 - in /dev/carbondata/1.6.0-rc1: ./ apache-carbondata-1.6.0-source-release.zip apache-carbondata-1.6.0-source-release.zip.asc apache-carbondata-1.6.0-source-release.zip.sha512
Author: ravipesala Date: Mon Jul 15 14:02:27 2019 New Revision: 34885 Log: Upload 1.6.0-rc1 Added: dev/carbondata/1.6.0-rc1/ dev/carbondata/1.6.0-rc1/apache-carbondata-1.6.0-source-release.zip (with props) dev/carbondata/1.6.0-rc1/apache-carbondata-1.6.0-source-release.zip.asc dev/carbondata/1.6.0-rc1/apache-carbondata-1.6.0-source-release.zip.sha512 Added: dev/carbondata/1.6.0-rc1/apache-carbondata-1.6.0-source-release.zip == Binary file - no diff available. Propchange: dev/carbondata/1.6.0-rc1/apache-carbondata-1.6.0-source-release.zip -- svn:mime-type = application/octet-stream Added: dev/carbondata/1.6.0-rc1/apache-carbondata-1.6.0-source-release.zip.asc == --- dev/carbondata/1.6.0-rc1/apache-carbondata-1.6.0-source-release.zip.asc (added) +++ dev/carbondata/1.6.0-rc1/apache-carbondata-1.6.0-source-release.zip.asc Mon Jul 15 14:02:27 2019 @@ -0,0 +1,16 @@ +-BEGIN PGP SIGNATURE- + +iQIzBAABCgAdFiEER3EpqJTxH7zLwCVHutcqeKexsu4FAl0scX4ACgkQutcqeKex +su7+eRAAh3pmvUpjys9SxZZAftzqog0a5TJG3z0sLzFHj17WD86VE+zcsUj8EskU +zbbiJI76Q17gyt3p6m6+a0fwtN+YMJi8gx5WWmd5HEgkBaSE7zwYblXM/dpnVZke +HYM+RLZVMpPJJ0cc3bzf3JjXEaZzj+pm1s+ewuEka/QyqXzq+CUFzOSD9whFv3Xh +m+MaPGl2CdigQrGdDpLxyRKipmdg3yF/lSexASIB6Ol5VZxqGIwX4WCmHZo0HbkY +GL+3YJFnoExKylxC3Y6pk6gtaWFkmR3lHazHtWlJN+K/tGgG+XqM1Nn2w/wDBMfW +yt1Yla19OeW9GoazLehzojsMorQPRL6+3ZZYa61LUkrdSa5dTtaXaQ+RKGXsmEwk +04Hxgvk+g6eRFCro8AseR45ss4GXvsOQyAEv5Y8szemz/kRcrDk8VYLMtQNyyKGj +Bm26G7X68lMtVmyaju0XdKRraeDD1P5qgFyH0Tj8cYuLBEjCYGMLRHTSoyiOrwZY +0ididPCBR5nsTTb00FhAJfJDwkZ1dTkwJiz74SMtw3Hb4eNKXUKMOHLPJu2tASEm +5vZ+y844NadwvuYaEr5iXrPlYf1f2C9Rhca61ypFFPrhttgABE+W8wRsWmsXMLQO +KK15e036XJYVEqMlA1fT25uLZvohg1cKQVKvpgP5ZUzzreu/k90= +=/c1j +-END PGP SIGNATURE- Added: dev/carbondata/1.6.0-rc1/apache-carbondata-1.6.0-source-release.zip.sha512 == --- dev/carbondata/1.6.0-rc1/apache-carbondata-1.6.0-source-release.zip.sha512 (added) +++ dev/carbondata/1.6.0-rc1/apache-carbondata-1.6.0-source-release.zip.sha512 Mon Jul 15 14:02:27 2019 @@ -0,0 +1 @@ +e9d34a979f91466fc7be4d5807b2f935af395099e81a9fb6597bdfaac8e4cec2edb76ec029409f3ed2c513a032f34545507ea000e96c8b41bab300be4bc8e4de apache-carbondata-1.6.0-source-release.zip
[carbondata] branch branch-1.6 updated: [maven-release-plugin] prepare for next development iteration
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/branch-1.6 by this push: new 6366d9e [maven-release-plugin] prepare for next development iteration 6366d9e is described below commit 6366d9e423fc37a78f10c8376be1a95c64d3bd61 Author: ravipesala AuthorDate: Mon Jul 15 18:28:52 2019 +0530 [maven-release-plugin] prepare for next development iteration --- assembly/pom.xml | 2 +- common/pom.xml| 2 +- core/pom.xml | 2 +- datamap/bloom/pom.xml | 2 +- datamap/examples/pom.xml | 2 +- datamap/lucene/pom.xml| 2 +- datamap/mv/core/pom.xml | 2 +- datamap/mv/plan/pom.xml | 2 +- examples/spark2/pom.xml | 2 +- format/pom.xml| 2 +- hadoop/pom.xml| 2 +- integration/hive/pom.xml | 2 +- integration/presto/pom.xml| 2 +- integration/spark-common-test/pom.xml | 2 +- integration/spark-common/pom.xml | 2 +- integration/spark-datasource/pom.xml | 2 +- integration/spark2/pom.xml| 2 +- pom.xml | 4 ++-- processing/pom.xml| 2 +- store/sdk/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/cli/pom.xml | 2 +- 22 files changed, 23 insertions(+), 23 deletions(-) diff --git a/assembly/pom.xml b/assembly/pom.xml index 7cc80dc..6d6c391 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -22,7 +22,7 @@ org.apache.carbondata carbondata-parent -1.6.0 +1.6.1-SNAPSHOT ../pom.xml diff --git a/common/pom.xml b/common/pom.xml index 8e5ddaa..728314c 100644 --- a/common/pom.xml +++ b/common/pom.xml @@ -22,7 +22,7 @@ org.apache.carbondata carbondata-parent -1.6.0 +1.6.1-SNAPSHOT ../pom.xml diff --git a/core/pom.xml b/core/pom.xml index b39a42e..22982f3 100644 --- a/core/pom.xml +++ b/core/pom.xml @@ -22,7 +22,7 @@ org.apache.carbondata carbondata-parent -1.6.0 +1.6.1-SNAPSHOT ../pom.xml diff --git a/datamap/bloom/pom.xml b/datamap/bloom/pom.xml index a29f77b..8ba7846 100644 --- a/datamap/bloom/pom.xml +++ b/datamap/bloom/pom.xml @@ -4,7 +4,7 @@ org.apache.carbondata carbondata-parent -1.6.0 +1.6.1-SNAPSHOT ../../pom.xml diff --git a/datamap/examples/pom.xml b/datamap/examples/pom.xml index a9c179d..6e3b8ae 100644 --- a/datamap/examples/pom.xml +++ b/datamap/examples/pom.xml @@ -22,7 +22,7 @@ org.apache.carbondata carbondata-parent -1.6.0 +1.6.1-SNAPSHOT ../../pom.xml diff --git a/datamap/lucene/pom.xml b/datamap/lucene/pom.xml index 1a23a52..42a22b2 100644 --- a/datamap/lucene/pom.xml +++ b/datamap/lucene/pom.xml @@ -4,7 +4,7 @@ org.apache.carbondata carbondata-parent -1.6.0 +1.6.1-SNAPSHOT ../../pom.xml diff --git a/datamap/mv/core/pom.xml b/datamap/mv/core/pom.xml index 5cb284d..6af274d 100644 --- a/datamap/mv/core/pom.xml +++ b/datamap/mv/core/pom.xml @@ -22,7 +22,7 @@ org.apache.carbondata carbondata-parent -1.6.0 +1.6.1-SNAPSHOT ../../../pom.xml diff --git a/datamap/mv/plan/pom.xml b/datamap/mv/plan/pom.xml index fe1afb7..4b8c9be 100644 --- a/datamap/mv/plan/pom.xml +++ b/datamap/mv/plan/pom.xml @@ -22,7 +22,7 @@ org.apache.carbondata carbondata-parent -1.6.0 +1.6.1-SNAPSHOT ../../../pom.xml diff --git a/examples/spark2/pom.xml b/examples/spark2/pom.xml index e303406..ad0d3ec 100644 --- a/examples/spark2/pom.xml +++ b/examples/spark2/pom.xml @@ -22,7 +22,7 @@ org.apache.carbondata carbondata-parent -1.6.0 +1.6.1-SNAPSHOT ../../pom.xml diff --git a/format/pom.xml b/format/pom.xml index 51135d8..81aa95b 100644 --- a/format/pom.xml +++ b/format/pom.xml @@ -22,7 +22,7 @@ org.apache.carbondata carbondata-parent -1.6.0 +1.6.1-SNAPSHOT ../pom.xml diff --git 
a/hadoop/pom.xml b/hadoop/pom.xml index 59f515e..bcb5696 100644 --- a/hadoop/pom.xml +++ b/hadoop/pom.xml @@ -22,7 +22,7 @@ org.apache.carbondata carbondata-parent -1.6.0 +1.6.1-SNAPSHOT ../pom.xml diff --git a/integration/hive/pom.xml b/integration/hive/pom.xml index 58b0796..dfa8810 100644 --- a/integration/hive/pom.xml +++ b/integration/hive/pom.xml @@ -22,7 +22,7 @@ org.apache.carbondata carbondata-parent -1.6.0 +1.6.1-SNAPSHOT ../../pom.xml diff --git a/integration/presto/pom.xml b/integration/presto/pom.xml index a2e9ef3..2631605 100644 --- a/integration/presto/pom.xml +++ b/integration/presto/pom.xml @@ -22,7 +22,7
[carbondata] annotated tag apache-carbondata-1.6.0-rc1 created (now 04c1e6b)
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a change to annotated tag apache-carbondata-1.6.0-rc1 in repository https://gitbox.apache.org/repos/asf/carbondata.git. at 04c1e6b (tag) tagging 9938633c6a80407876c7e7fa0ffd455164edff4b (commit) by ravipesala on Mon Jul 15 18:28:31 2019 +0530 - Log - [maven-release-plugin] copy for tag apache-carbondata-1.6.0-rc1 --- No new revisions were added by this update.
[carbondata] annotated tag apache-carbondata-1.6.0-rc1 deleted (was 68bae63)
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a change to annotated tag apache-carbondata-1.6.0-rc1 in repository https://gitbox.apache.org/repos/asf/carbondata.git. *** WARNING: tag apache-carbondata-1.6.0-rc1 was deleted! *** tag was 68bae63 The revisions that were on this annotated tag are still contained in other references; therefore, this change does not discard any commits from the repository.
[carbondata] annotated tag apache-carbondata-1.6.0-rc1 created (now 68bae63)
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a change to annotated tag apache-carbondata-1.6.0-rc1 in repository https://gitbox.apache.org/repos/asf/carbondata.git. at 68bae63 (tag) tagging 9938633c6a80407876c7e7fa0ffd455164edff4b (commit) by ravipesala on Mon Jul 15 18:17:06 2019 +0530 - Log - [maven-release-plugin] copy for tag apache-carbondata-1.6.0-rc1 --- No new revisions were added by this update.
[carbondata] branch branch-1.6 created (now 9938633)
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a change to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git. at 9938633 [maven-release-plugin] prepare release apache-carbondata-1.6.0-rc1 This branch includes the following new commits: new 9938633 [maven-release-plugin] prepare release apache-carbondata-1.6.0-rc1 The 1 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference.
[carbondata] 01/01: [maven-release-plugin] prepare release apache-carbondata-1.6.0-rc1
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch branch-1.6 in repository https://gitbox.apache.org/repos/asf/carbondata.git commit 9938633c6a80407876c7e7fa0ffd455164edff4b Author: ravipesala AuthorDate: Mon Jul 15 18:14:51 2019 +0530 [maven-release-plugin] prepare release apache-carbondata-1.6.0-rc1 --- assembly/pom.xml | 2 +- common/pom.xml| 2 +- core/pom.xml | 2 +- datamap/bloom/pom.xml | 6 ++ datamap/examples/pom.xml | 6 ++ datamap/lucene/pom.xml| 6 ++ datamap/mv/core/pom.xml | 2 +- datamap/mv/plan/pom.xml | 2 +- examples/spark2/pom.xml | 2 +- format/pom.xml| 2 +- hadoop/pom.xml| 2 +- integration/hive/pom.xml | 2 +- integration/presto/pom.xml| 2 +- integration/spark-common-test/pom.xml | 14 +++--- integration/spark-common/pom.xml | 2 +- integration/spark-datasource/pom.xml | 2 +- integration/spark2/pom.xml| 2 +- pom.xml | 4 ++-- processing/pom.xml| 2 +- store/sdk/pom.xml | 6 ++ streaming/pom.xml | 6 ++ tools/cli/pom.xml | 6 ++ 22 files changed, 35 insertions(+), 47 deletions(-) diff --git a/assembly/pom.xml b/assembly/pom.xml index d88c91a..7cc80dc 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -22,7 +22,7 @@ org.apache.carbondata carbondata-parent -1.6.0-SNAPSHOT +1.6.0 ../pom.xml diff --git a/common/pom.xml b/common/pom.xml index 14cd52f..8e5ddaa 100644 --- a/common/pom.xml +++ b/common/pom.xml @@ -22,7 +22,7 @@ org.apache.carbondata carbondata-parent -1.6.0-SNAPSHOT +1.6.0 ../pom.xml diff --git a/core/pom.xml b/core/pom.xml index 41481af..b39a42e 100644 --- a/core/pom.xml +++ b/core/pom.xml @@ -22,7 +22,7 @@ org.apache.carbondata carbondata-parent -1.6.0-SNAPSHOT +1.6.0 ../pom.xml diff --git a/datamap/bloom/pom.xml b/datamap/bloom/pom.xml index 1e8c382..a29f77b 100644 --- a/datamap/bloom/pom.xml +++ b/datamap/bloom/pom.xml @@ -1,12 +1,10 @@ -http://maven.apache.org/POM/4.0.0; - xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance; - xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd;> +http://maven.apache.org/POM/4.0.0; xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance; xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd;> 4.0.0 org.apache.carbondata carbondata-parent -1.6.0-SNAPSHOT +1.6.0 ../../pom.xml diff --git a/datamap/examples/pom.xml b/datamap/examples/pom.xml index 3720a1c..a9c179d 100644 --- a/datamap/examples/pom.xml +++ b/datamap/examples/pom.xml @@ -15,16 +15,14 @@ See the License for the specific language governing permissions and limitations under the License. 
--> -http://maven.apache.org/POM/4.0.0; - xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance; - xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd;> +http://maven.apache.org/POM/4.0.0; xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance; xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd;> 4.0.0 org.apache.carbondata carbondata-parent -1.6.0-SNAPSHOT +1.6.0 ../../pom.xml diff --git a/datamap/lucene/pom.xml b/datamap/lucene/pom.xml index 3e93a83..1a23a52 100644 --- a/datamap/lucene/pom.xml +++ b/datamap/lucene/pom.xml @@ -1,12 +1,10 @@ -http://maven.apache.org/POM/4.0.0; - xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance; - xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd;> +http://maven.apache.org/POM/4.0.0; xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance; xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd;> 4.0.0 org.apache.carbondata carbondata-parent -1.6.0-SNAPSHOT +1.6.0 ../../pom.xml diff --git a/datamap/mv/core/pom.xml b/datamap/mv/core/pom.xml index 0a1f0e2..5cb284d 100644 --- a/datamap/mv/core/pom.xml +++ b/datamap/mv/core/pom.xml @@ -22,7 +22,7 @@ org.apache.carbondata carbondata-parent -1.6.0-SNAPSHOT +1.6.0 ../../../pom.xml diff --git a/datamap/mv/plan/pom.xml b/datamap/mv/plan/pom.xml index 753d48b..fe1afb7 100644 --- a/datamap/mv/plan/pom.xml +++ b/datamap/mv/plan/pom.xml @@ -22,7 +22,7 @@ org.apache.carbondata carbon
[carbondata] branch master updated: [CARBONDATA-3460] Fixed EOFException in CarbonScanRDD
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git

The following commit(s) were added to refs/heads/master by this push:
     new 0902d45  [CARBONDATA-3460] Fixed EOFException in CarbonScanRDD
0902d45 is described below

commit 0902d459a30e0fdd72868b2956eeb1c6b3b06346
Author: kunal642
AuthorDate: Wed Jul 3 10:54:08 2019 +0530

    [CARBONDATA-3460] Fixed EOFException in CarbonScanRDD

    Problem: Delete delta information was not written properly in the OutputStream due to the flag-based writing.

    Solution: Always write the delete delta info; the size of the array is the deciding factor for whether to read further or not.

    This closes #3316
---
 .../core/indexstore/ExtendedBlocklet.java          |  1 -
 .../apache/carbondata/hadoop/CarbonInputSplit.java | 52 +++---
 2 files changed, 26 insertions(+), 27 deletions(-)

diff --git a/core/src/main/java/org/apache/carbondata/core/indexstore/ExtendedBlocklet.java b/core/src/main/java/org/apache/carbondata/core/indexstore/ExtendedBlocklet.java
index d97148d..a85423b 100644
--- a/core/src/main/java/org/apache/carbondata/core/indexstore/ExtendedBlocklet.java
+++ b/core/src/main/java/org/apache/carbondata/core/indexstore/ExtendedBlocklet.java
@@ -177,7 +177,6 @@ public class ExtendedBlocklet extends Blocklet {
     DataOutputStream dos = new DataOutputStream(ebos);
     inputSplit.setFilePath(null);
     inputSplit.setBucketId(null);
-    inputSplit.setWriteDeleteDelta(false);
     if (inputSplit.isBlockCache()) {
       inputSplit.updateFooteroffset();
       inputSplit.updateBlockLength();

diff --git a/core/src/main/java/org/apache/carbondata/hadoop/CarbonInputSplit.java b/core/src/main/java/org/apache/carbondata/hadoop/CarbonInputSplit.java
index da1bc2c..edbfcfe 100644
--- a/core/src/main/java/org/apache/carbondata/hadoop/CarbonInputSplit.java
+++ b/core/src/main/java/org/apache/carbondata/hadoop/CarbonInputSplit.java
@@ -14,6 +14,7 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
+
 package org.apache.carbondata.hadoop;

 import java.io.ByteArrayInputStream;
@@ -150,8 +151,6 @@ public class CarbonInputSplit extends FileSplit
    */
   private int rowCount;

-  private boolean writeDeleteDelta = true;
-
   public CarbonInputSplit() {
     segment = null;
     taskId = "0";
@@ -195,7 +194,13 @@ public class CarbonInputSplit extends FileSplit
     this.version = ColumnarFormatVersion.valueOf(in.readShort());
     // will be removed after count(*) optmization in case of index server
     this.rowCount = in.readInt();
-    this.writeDeleteDelta = in.readBoolean();
+    if (in.readBoolean()) {
+      int numberOfDeleteDeltaFiles = in.readInt();
+      deleteDeltaFiles = new String[numberOfDeleteDeltaFiles];
+      for (int i = 0; i < numberOfDeleteDeltaFiles; i++) {
+        deleteDeltaFiles[i] = in.readUTF();
+      }
+    }
     // after deseralizing required field get the start position of field which will be only used
     // in executor
     int leftoverPosition = underlineStream.getPosition();
@@ -359,7 +364,13 @@ public class CarbonInputSplit extends FileSplit
       this.length = in.readLong();
       this.version = ColumnarFormatVersion.valueOf(in.readShort());
       this.rowCount = in.readInt();
-      this.writeDeleteDelta = in.readBoolean();
+      if (in.readBoolean()) {
+        int numberOfDeleteDeltaFiles = in.readInt();
+        deleteDeltaFiles = new String[numberOfDeleteDeltaFiles];
+        for (int i = 0; i < numberOfDeleteDeltaFiles; i++) {
+          deleteDeltaFiles[i] = in.readUTF();
+        }
+      }
       this.bucketId = in.readUTF();
     }
     this.blockletId = in.readUTF();
@@ -379,13 +390,6 @@ public class CarbonInputSplit extends FileSplit
       validBlockletIds.add((int) in.readShort());
     }
     this.isLegacyStore = in.readBoolean();
-    if (writeDeleteDelta) {
-      int numberOfDeleteDeltaFiles = in.readInt();
-      deleteDeltaFiles = new String[numberOfDeleteDeltaFiles];
-      for (int i = 0; i < numberOfDeleteDeltaFiles; i++) {
-        deleteDeltaFiles[i] = in.readUTF();
-      }
-    }
   }

   @Override public void write(DataOutput out) throws IOException {
@@ -397,11 +401,10 @@ public class CarbonInputSplit extends FileSplit
       out.writeLong(length);
       out.writeShort(version.number());
       out.writeInt(rowCount);
-      out.writeBoolean(writeDeleteDelta);
+      writeDeleteDeltaFile(out);
       out.writeUTF(bucketId);
       out.writeUTF(blockletId);
       out.write(serializeData, offset, actualLen);
-      writeDeleteDeltaFile(out);
       return;
     }
     // please refer writeDetailInfo doc
@@ -419,7 +422,7 @@ public class CarbonInputSplit extends FileSplit
     } else {
       out.writeInt(0);
     }
-    out.writeBoole
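The fix works because the delete delta block is now self-describing: a presence flag followed by an explicit count, so the reader no longer depends on a separately transmitted writeDeleteDelta flag. A minimal Scala sketch of the pattern (helper names are illustrative, not the exact CarbonInputSplit code):

import java.io.{DataInput, DataOutput}

// Write side: presence flag, then count, then the entries themselves.
def writeDeleteDeltaFiles(out: DataOutput, files: Array[String]): Unit = {
  out.writeBoolean(files != null)
  if (files != null) {
    out.writeInt(files.length)
    files.foreach(out.writeUTF)
  }
}

// Read side: the flag and count written above decide how far to read,
// which is what prevents the EOFException during deserialization.
def readDeleteDeltaFiles(in: DataInput): Array[String] =
  if (in.readBoolean()) Array.fill(in.readInt())(in.readUTF()) else null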
[carbondata] branch master updated: [CARBONDATA-3459] Fixed id based distribution for showcache command
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git

The following commit(s) were added to refs/heads/master by this push:
     new a682f98  [CARBONDATA-3459] Fixed id based distribution for showcache command
a682f98 is described below

commit a682f98e885bedd3a3d980223937095861c27607
Author: kunal642
AuthorDate: Wed Jul 3 00:53:43 2019 +0530

    [CARBONDATA-3459] Fixed id based distribution for showcache command

    Problem: Currently tasks are not being fired based on the executor ID because getPreferredLocations was not overridden.

    Solution: Override getPreferredLocations in the ShowCache and InvalidateCacheRDD to fire tasks at the appropriate location.

    This closes #3315
---
 .../carbondata/indexserver/DistributedRDDUtils.scala     |  6 +++---
 .../carbondata/indexserver/DistributedShowCacheRDD.scala |  8 
 .../indexserver/InvalidateSegmentCacheRDD.scala          | 15 ++-
 3 files changed, 25 insertions(+), 4 deletions(-)

diff --git a/integration/spark2/src/main/scala/org/apache/carbondata/indexserver/DistributedRDDUtils.scala b/integration/spark2/src/main/scala/org/apache/carbondata/indexserver/DistributedRDDUtils.scala
index a568153..933ec15 100644
--- a/integration/spark2/src/main/scala/org/apache/carbondata/indexserver/DistributedRDDUtils.scala
+++ b/integration/spark2/src/main/scala/org/apache/carbondata/indexserver/DistributedRDDUtils.scala
@@ -316,10 +316,10 @@ object DistributedRDDUtils {
     if (existingSegmentMapping == null) {
       val newSegmentMapping = new ConcurrentHashMap[String, String]()
       newSegmentMapping.put(segment.getSegmentNo, s"${newHost}_$newExecutor")
-      tableToExecutorMapping.put(tableUniqueName, newSegmentMapping)
+      tableToExecutorMapping.putIfAbsent(tableUniqueName, newSegmentMapping)
     } else {
-      existingSegmentMapping.put(segment.getSegmentNo, s"${newHost}_$newExecutor")
-      tableToExecutorMapping.put(tableUniqueName, existingSegmentMapping)
+      existingSegmentMapping.putIfAbsent(segment.getSegmentNo, s"${newHost}_$newExecutor")
+      tableToExecutorMapping.putIfAbsent(tableUniqueName, existingSegmentMapping)
     }
     s"executor_${newHost}_$newExecutor"
   }

diff --git a/integration/spark2/src/main/scala/org/apache/carbondata/indexserver/DistributedShowCacheRDD.scala b/integration/spark2/src/main/scala/org/apache/carbondata/indexserver/DistributedShowCacheRDD.scala
index 78b7e72..f1707c6 100644
--- a/integration/spark2/src/main/scala/org/apache/carbondata/indexserver/DistributedShowCacheRDD.scala
+++ b/integration/spark2/src/main/scala/org/apache/carbondata/indexserver/DistributedShowCacheRDD.scala
@@ -37,6 +37,14 @@ class DistributedShowCacheRDD(@transient private val ss: SparkSession, tableName
     }
   }.toArray

+  override protected def getPreferredLocations(split: Partition): Seq[String] = {
+    if (split.asInstanceOf[DataMapRDDPartition].getLocations != null) {
+      split.asInstanceOf[DataMapRDDPartition].getLocations.toSeq
+    } else {
+      Seq()
+    }
+  }
+
   override protected def internalGetPartitions: Array[Partition] = {
     executorsList.zipWithIndex.map {
       case (executor, idx) =>

diff --git a/integration/spark2/src/main/scala/org/apache/carbondata/indexserver/InvalidateSegmentCacheRDD.scala b/integration/spark2/src/main/scala/org/apache/carbondata/indexserver/InvalidateSegmentCacheRDD.scala
index c2bd589..750f9d9 100644
--- a/integration/spark2/src/main/scala/org/apache/carbondata/indexserver/InvalidateSegmentCacheRDD.scala
+++ b/integration/spark2/src/main/scala/org/apache/carbondata/indexserver/InvalidateSegmentCacheRDD.scala
@@ -30,7 +30,12 @@ import org.apache.carbondata.spark.rdd.CarbonRDD
 class InvalidateSegmentCacheRDD(@transient private val ss: SparkSession, carbonTable: CarbonTable,
     invalidSegmentIds: List[String]) extends CarbonRDD[String](ss, Nil) {

-  val executorsList: Array[String] = DistributionUtil.getNodeList(ss.sparkContext)
+  val executorsList: Array[String] = DistributionUtil.getExecutors(ss.sparkContext).flatMap {
+    case (host, executors) =>
+      executors.map {
+        executor => s"executor_${host}_$executor"
+      }
+  }.toArray

   override def internalCompute(split: Partition, context: TaskContext): Iterator[String] = {
@@ -38,6 +43,14 @@ class InvalidateSegmentCacheRDD(@transient private val ss: SparkSession, carbonT
     Iterator.empty
   }

+  override protected def getPreferredLocations(split: Partition): Seq[String] = {
+    if (split.asInstanceOf[DataMapRDDPartition].getLocations != null) {
+      split.asInstanceOf[DataMapRDDPartition].getLocations.toSeq
+    } else {
+      Seq()
+    }
+  }
+
   override protected def internalGetPartitions: Array[Partition] = {
     if (invalidSegmentIds.isEmpty) {
       Array()
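For context, getPreferredLocations is the hook Spark's scheduler consults when placing tasks; returning the executor that holds the cached index is what makes the ID-based distribution effective. A generic Scala sketch of the idea (simplified partition type, not the Carbon classes):

import org.apache.spark.{Partition, SparkContext}
import org.apache.spark.rdd.RDD

// A partition that carries the executor locations it should be scheduled on.
class LocatedPartition(idx: Int, val locations: Array[String]) extends Partition {
  override def index: Int = idx
}

abstract class ExecutorPinnedRDD[T: scala.reflect.ClassTag](sc: SparkContext)
  extends RDD[T](sc, Nil) {
  // Without this override Spark schedules tasks on arbitrary executors,
  // and the per-executor cache mapping built by the index server is ignored.
  override protected def getPreferredLocations(split: Partition): Seq[String] = {
    val locs = split.asInstanceOf[LocatedPartition].locations
    if (locs != null) locs.toSeq else Seq.empty
  }
}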
[carbondata] branch master updated: [HOTFIX] Fixed MinMax Based Pruning for Measure column in case of Legacy store
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new b017253 [HOTFIX] Fixed MinMax Based Pruning for Measure column in case of Legacy store b017253 is described below commit b017253f4eb0fb78e8249e895a8a2a4d2ab929da Author: Indhumathi27 AuthorDate: Tue Jul 9 14:01:07 2019 +0530 [HOTFIX] Fixed MinMax Based Pruning for Measure column in case of Legacy store This closes #3320 --- .../core/scan/filter/executer/IncludeFilterExecuterImpl.java | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java b/core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java index 1231aa0..bfa2460 100644 --- a/core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java +++ b/core/src/main/java/org/apache/carbondata/core/scan/filter/executer/IncludeFilterExecuterImpl.java @@ -509,9 +509,12 @@ public class IncludeFilterExecuterImpl implements FilterExecuter { } } else if (isMeasurePresentInCurrentBlock) { chunkIndex = msrColumnEvaluatorInfo.getColumnIndexInMinMaxByteArray(); - isScanRequired = isScanRequired(blkMaxVal[chunkIndex], blkMinVal[chunkIndex], - msrColumnExecutorInfo.getFilterKeys(), - msrColumnEvaluatorInfo.getType()); + if (isMinMaxSet[chunkIndex]) { +isScanRequired = isScanRequired(blkMaxVal[chunkIndex], blkMinVal[chunkIndex], +msrColumnExecutorInfo.getFilterKeys(), msrColumnEvaluatorInfo.getType()); + } else { +isScanRequired = true; + } } if (isScanRequired) {
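The essence of the fix: when a legacy-store block carries no min/max statistics for the measure column, pruning must fall back to scanning instead of skipping the block. A one-method Scala sketch of that conservative default:

// Pruning may only skip a block when statistics prove no row can match;
// with min/max absent (legacy store), assume the block must be scanned.
def isScanRequired(isMinMaxSet: Boolean, minMaxSaysMatch: => Boolean): Boolean =
  if (isMinMaxSet) minMaxSaysMatch else true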
[carbondata] branch master updated: [CARBONDATA-3467] Fix count(*) with filter on string column
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new ebe78dc [CARBONDATA-3467] Fix count(*) with filter on string column ebe78dc is described below commit ebe78dca170773a5f4a37e8146a923b2dc6604a4 Author: Indhumathi27 AuthorDate: Tue Jul 9 09:10:24 2019 +0530 [CARBONDATA-3467] Fix count(*) with filter on string column Problem: count(*) with filter on string column throws Unresolved Exception Solution: Added check for UnresolvedAlias in MVAnalyzer This closes #3319 --- .../org/apache/carbondata/mv/datamap/MVAnalyzerRule.scala | 9 - .../carbondata/mv/rewrite/TestAllOperationsOnMV.scala | 13 - 2 files changed, 20 insertions(+), 2 deletions(-) diff --git a/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVAnalyzerRule.scala b/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVAnalyzerRule.scala index 04bcfbb..edd9c81 100644 --- a/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVAnalyzerRule.scala +++ b/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVAnalyzerRule.scala @@ -70,7 +70,14 @@ class MVAnalyzerRule(sparkSession: SparkSession) extends Rule[LogicalPlan] { plan.transform { case aggregate@Aggregate(grp, aExp, child) => // check for if plan is for dataload for preaggregate table, then skip applying mv -if (aExp.exists(p => p.name.equals("preAggLoad") || p.name.equals("preAgg"))) { +val isPreAggLoad = aExp.exists { p => + if (p.isInstanceOf[UnresolvedAlias]) { +false + } else { +p.name.equals("preAggLoad") || p.name.equals("preAgg") + } +} +if (isPreAggLoad) { needAnalysis = false } Aggregate(grp, aExp, child) diff --git a/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/TestAllOperationsOnMV.scala b/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/TestAllOperationsOnMV.scala index 839a2e6..81ddf38 100644 --- a/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/TestAllOperationsOnMV.scala +++ b/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/TestAllOperationsOnMV.scala @@ -540,6 +540,17 @@ class TestAllOperationsOnMV extends QueryTest with BeforeAndAfterEach { }.getMessage.contains("Operation not allowed on child table.") } + test("test count(*) with filter") { +sql("drop table if exists maintable") +sql("create table maintable(id int, name string, id1 string, id2 string, dob timestamp, doj " + +"timestamp, v1 bigint, v2 bigint, v3 decimal(30,10), v4 decimal(20,10), v5 double, v6 " + +"double ) stored by 'carbondata'") +sql("insert into maintable values(1, 'abc', 'id001', 'id002', '2017-01-01 00:00:00','2017-01-01 " + +"00:00:00', 234, 2242,12.4,23.4,2323,455 )") +checkAnswer(sql("select count(*) from maintable where id1 < id2"), Seq(Row(1))) +sql("drop table if exists maintable") + } + test("drop meta cache on mv datamap table") { sql("drop table IF EXISTS maintable") sql("create table maintable(name string, c_code int, price int) stored by 'carbondata'") @@ -580,6 +591,6 @@ class TestAllOperationsOnMV extends QueryTest with BeforeAndAfterEach { newSet.addAll(oldSet) newSet } - + }
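The crux is that an UnresolvedAlias does not yet have a resolved name, so calling .name on it throws; the rule must type-check before reading the name. A minimal Scala sketch of the guard (method name illustrative):

import org.apache.spark.sql.catalyst.analysis.UnresolvedAlias
import org.apache.spark.sql.catalyst.expressions.NamedExpression

// .name on an UnresolvedAlias throws an UnresolvedException, so match on
// the type first and treat such expressions as "not a preagg marker".
def isPreAggLoad(exprs: Seq[NamedExpression]): Boolean =
  exprs.exists {
    case _: UnresolvedAlias => false
    case e => e.name == "preAggLoad" || e.name == "preAgg"
  }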
[carbondata] branch master updated: [CARBONDATA-3457][MV] Fix Column not found issue with Query having Cast Expression
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git

The following commit(s) were added to refs/heads/master by this push:
     new 771d436  [CARBONDATA-3457][MV] Fix Column not found issue with Query having Cast Expression
771d436 is described below

commit 771d436fe2ed2d34ccf0ee1d8f555af30c382345
Author: Indhumathi27
AuthorDate: Thu Jun 27 17:09:20 2019 +0530

    [CARBONDATA-3457][MV] Fix Column not found issue with Query having Cast Expression

    Problem: For Cast(exp), the alias reference is not included, hence a column not found exception is thrown for the column given inside the cast expression.

    Solution: An alias map has to be created for CAST[EXP] as well, and references should be replaced with the subsumer alias map references.

    This closes #3312
---
 .../carbondata/mv/rewrite/DefaultMatchMaker.scala | 16 ++
 .../carbondata/mv/rewrite/MVCreateTestCase.scala  | 58 ++
 2 files changed, 74 insertions(+)

diff --git a/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/rewrite/DefaultMatchMaker.scala b/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/rewrite/DefaultMatchMaker.scala
index 9a9a2a6..5329608 100644
--- a/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/rewrite/DefaultMatchMaker.scala
+++ b/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/rewrite/DefaultMatchMaker.scala
@@ -53,6 +53,14 @@ abstract class DefaultMatchPattern extends MatchPattern[ModularPlan] {
       (a.child.asInstanceOf[Attribute], a.toAttribute)
     })

+    // Create aliasMap with Expression to alias reference attribute
+    val aliasMapExp =
+      subsumer.outputList.collect {
+        case a: Alias if a.child.isInstanceOf[Expression] &&
+                         !a.child.isInstanceOf[AggregateExpression] =>
+          a.child -> a.toAttribute
+      }.toMap
+
     // Check and replace all alias references with subsumer alias map references.
     val compensation1 = compensation.transform {
       case plan if !plan.skip && plan != subsumer =>
@@ -66,6 +74,14 @@ abstract class DefaultMatchPattern extends MatchPattern[ModularPlan] {
                 exprId = ref.exprId, qualifier = a.qualifier)
             }.getOrElse(a)
+          case a: Expression =>
+            aliasMapExp
+              .get(a)
+              .map { ref =>
+                AttributeReference(
+                  ref.name, ref.dataType)(
+                  exprId = ref.exprId)
+              }.getOrElse(a)
         }
     }

diff --git a/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCreateTestCase.scala b/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCreateTestCase.scala
index 1d259c8..ca6c0c5 100644
--- a/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCreateTestCase.scala
+++ b/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCreateTestCase.scala
@@ -1169,6 +1169,64 @@ class MVCreateTestCase extends QueryTest with BeforeAndAfterAll {
     assert(TestUtil.verifyMVDataMap(analyzed1, "da_cast"))
   }

+  test("test cast of expression with mv") {
+    sql("drop table IF EXISTS maintable")
+    sql("create table maintable (m_month bigint, c_code string, " +
+        "c_country smallint, d_dollar_value double, q_quantity double, u_unit smallint, b_country smallint, i_id int, y_year smallint) stored by 'carbondata'")
+    sql("insert into maintable select 10, 'xxx', 123, 456, 45, 5, 23, 1, 2000")
+    sql("drop datamap if exists da_cast")
+    sql(
+      "create datamap da_cast using 'mv' as select cast(floor((m_month +1000) / 900) * 900 - 2000 AS INT) as a, c_code as abc from maintable")
+    val df1 = sql(
+      " select cast(floor((m_month +1000) / 900) * 900 - 2000 AS INT) as a ,c_code as abc from maintable")
+    val df2 = sql(
+      " select cast(floor((m_month +1000) / 900) * 900 - 2000 AS INT),c_code as abc from maintable")
+    val analyzed1 = df1.queryExecution.analyzed
+    assert(TestUtil.verifyMVDataMap(analyzed1, "da_cast"))
+  }
+
+  test("test cast with & without alias") {
+    sql("drop table IF EXISTS maintable")
+    sql("create table maintable (m_month bigint, c_code string, " +
+        "c_country smallint, d_dollar_value double, q_quantity double, u_unit smallint, b_country smallint, i_id int, y_year smallint) stored by 'carbondata'")
+    sql("insert into maintable select 10, 'xxx', 123, 456, 45, 5, 23, 1, 2000")
+    sql("drop datamap if exists da_cast")
+    sql(
+      "create datamap da_cast using 'mv' as select cast(m_month + 1000 AS INT) as a, c_code as abc from maintable")
+    checkAnswer(sql(
[carbondata] branch master updated: [CARBONDATA-3456] Fix DataLoading to MV table when Yarn-Application is killed
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git

The following commit(s) were added to refs/heads/master by this push:
     new cdf0594  [CARBONDATA-3456] Fix DataLoading to MV table when Yarn-Application is killed
cdf0594 is described below

commit cdf0594cb4fefcec6a892692daca2d73f40ccd19
Author: Indhumathi27
AuthorDate: Thu Jun 27 18:16:04 2019 +0530

    [CARBONDATA-3456] Fix DataLoading to MV table when Yarn-Application is killed

    Problem: When a data load is triggered on the datamap table, a new LoadMetadataDetails entry with SegmentStatus InsertInProgress and its segment mapping info is created; the yarn application is then killed. On the next load, the stale LoadMetadataDetails entry is still in InsertInProgress state, and the main table segments mapped to that entry are not considered for the next load, resulting in a data mismatch between the main table and the datamap table.

    Solution: Clean up the old invalid segment before creating a new entry for the new load.

    This closes #3310
---
 .../carbondata/core/datamap/DataMapProvider.java   | 25 
 .../carbondata/core/datamap/DataMapUtil.java       | 18 ++-
 .../core/datamap/dev/DataMapSyncStatus.java        | 19 ---
 .../carbondata/core/metadata/SegmentFileStore.java |  2 +-
 .../core/statusmanager/SegmentStatusManager.java   | 27 ++
 .../apache/carbondata/core/util/CarbonUtil.java    |  2 +-
 .../bloom/BloomCoarseGrainDataMapFactory.java      |  3 ++-
 .../datamap/lucene/LuceneDataMapFactoryBase.java   |  3 ++-
 .../carbondata/mv/datamap/MVDataMapProvider.scala  |  8 ++-
 .../mv/rewrite/MVIncrementalLoadingTestcase.scala  |  6 +++--
 .../hadoop/api/CarbonOutputCommitter.java          |  5 ++--
 .../hadoop/api/CarbonTableInputFormat.java         |  6 +++--
 .../carbondata/datamap/IndexDataMapProvider.java   |  4 ++--
 .../datamap/PreAggregateDataMapProvider.java       |  4 ++--
 .../datamap/IndexDataMapRebuildRDD.scala           |  3 ++-
 .../spark/rdd/CarbonDataRDDFactory.scala           |  1 +
 .../spark/sql/events/MergeIndexEventListener.scala |  2 +-
 .../sql/execution/command/cache/CacheUtil.scala    |  4 ++--
 .../command/cache/DropCacheEventListeners.scala    |  3 ++-
 .../command/datamap/CarbonDataMapShowCommand.scala |  5 ++--
 .../command/mutation/HorizontalCompaction.scala    |  6 +++--
 .../CarbonAlterTableDropHivePartitionCommand.scala |  2 +-
 .../CarbonAlterTableDropPartitionCommand.scala     |  3 ++-
 .../CarbonAlterTableSplitPartitionCommand.scala    |  3 ++-
 .../org/apache/spark/sql/hive/CarbonRelation.scala |  4 ++--
 .../org/apache/spark/util/MergeIndexUtil.scala     |  2 +-
 .../processing/merger/CarbonDataMergerUtil.java    |  7 +++---
 27 files changed, 120 insertions(+), 57 deletions(-)

diff --git a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapProvider.java b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapProvider.java
index d0b66f3..c320226 100644
--- a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapProvider.java
+++ b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapProvider.java
@@ -129,10 +129,15 @@ public abstract class DataMapProvider {
     }
     String newLoadName = "";
     String segmentMap = "";
-    AbsoluteTableIdentifier dataMapTableAbsoluteTableIdentifier = AbsoluteTableIdentifier
-        .from(dataMapSchema.getRelationIdentifier().getTablePath(),
+    CarbonTable dataMapTable = CarbonTable
+        .buildFromTablePath(dataMapSchema.getRelationIdentifier().getTableName(),
             dataMapSchema.getRelationIdentifier().getDatabaseName(),
-            dataMapSchema.getRelationIdentifier().getTableName());
+            dataMapSchema.getRelationIdentifier().getTablePath(),
+            dataMapSchema.getRelationIdentifier().getTableId());
+    AbsoluteTableIdentifier dataMapTableAbsoluteTableIdentifier =
+        dataMapTable.getAbsoluteTableIdentifier();
+    // Clean up the old invalid segment data before creating a new entry for new load.
+    SegmentStatusManager.deleteLoadsAndUpdateMetadata(dataMapTable, false, null);
     SegmentStatusManager segmentStatusManager =
         new SegmentStatusManager(dataMapTableAbsoluteTableIdentifier);
     Map> segmentMapping = new HashMap<>();
@@ -148,6 +153,15 @@ public abstract class DataMapProvider {
         CarbonTablePath.getMetadataPath(dataMapSchema.getRelationIdentifier().getTablePath());
     LoadMetadataDetails[] loadMetaDataDetails =
         SegmentStatusManager.readLoadMetadata(dataMapTableMetadataPath);
+    // Mark for delete all stale loadMetadetail
+    for (LoadMetadataDetails loadMetadataDetail : loadMetaDataDetails) {
+      if ((loadMetadataDetail.getSegmentStatus() == SegmentStatus.INSERT_IN_PROGRESS
+          || loadMe
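The cleanup boils down to sweeping any load left in an in-progress state by a dead application before the new load entry is created. A hedged Scala sketch over the segment status metadata (getter/setter names taken from the diff above; the surrounding helper is illustrative):

import org.apache.carbondata.core.statusmanager.{LoadMetadataDetails, SegmentStatus}

// A segment still flagged in-progress with no live writer is stale; marking
// it for delete keeps the datamap-to-main-table segment mapping consistent.
def markStaleLoadsForDelete(details: Array[LoadMetadataDetails]): Unit =
  details
    .filter { d =>
      d.getSegmentStatus == SegmentStatus.INSERT_IN_PROGRESS ||
      d.getSegmentStatus == SegmentStatus.INSERT_OVERWRITE_IN_PROGRESS
    }
    .foreach(_.setSegmentStatus(SegmentStatus.MARKED_FOR_DELETE))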
svn commit: r34819 - /release/carbondata/1.5.4/apache-carbondata-1.5.4-source-release.zip.asc
Author: ravipesala Date: Tue Jul 9 16:08:24 2019 New Revision: 34819 Log: Checkin 1.5.4 Added: release/carbondata/1.5.4/apache-carbondata-1.5.4-source-release.zip.asc Added: release/carbondata/1.5.4/apache-carbondata-1.5.4-source-release.zip.asc == --- release/carbondata/1.5.4/apache-carbondata-1.5.4-source-release.zip.asc (added) +++ release/carbondata/1.5.4/apache-carbondata-1.5.4-source-release.zip.asc Tue Jul 9 16:08:24 2019 @@ -0,0 +1,16 @@ +-BEGIN PGP SIGNATURE- + +iQIzBAEBCgAdFiEER3EpqJTxH7zLwCVHutcqeKexsu4FAl0ku8MACgkQutcqeKex +su4Gmw//WcMUGJwO5RgIZWkyBgScoksV/tGfTVyckO8IS0cQcpeTFZ3mzrWkz5Me +8PjGaFvfn687dXV+wOZ2XYLkJB8HmYWhm2uq4ET/7pv2yRkc6BfvJvKA8oSPPcfg +Cbwlc174xQaLWb2a+3rLIT2Q2CCuHy+dc3vL1StZaDibCs7ecDZ+KAf/SMVizYWI +2aialZ0m9xvfIb5d3ENadP+8VcCHzpkdfyDzsNfpLKkYV87C04MKNJHwMRI2wKKd +FNg9PWLkGrPiR5/zWUSmIrcxB5V0SyKRa/7rZdsAgd5oIok3itp8NIUohQNDv7iM +Cvqedq4+Woi1Lm2BRrpx1alwm4cP04iwvzQQXi9YglHzSXbnZd4JN6qbdoLrpV/w +5k2V2x5dPZjWMtRJ/HraL0bCvam7D5ghIUAYvfN5F8c7YUDM28rkDV1aNhSXON8D +YNrf8wJzno3U97q50RmyfU6zkTKC1aV5XwW34ZbSOw9SqTAmY397RjAGnHqcsfNw +NjELgGMPcUmrTDPv+mpXKBNMfFBoKgg09EMy1jyDAmGAhQF5X5rtvzeIbAfprIbG +V+omKApIBHzibq65tw0f5QmhRwrClGOsDnhkbkRxybzzYDFjuocTGiBpTkZ9CNUw +DCXI7o6ZC/8q8zdOi6ACCNIiIzbdQRZoJyVeQmGzBLHa7SryY3Y= +=VnUg +-END PGP SIGNATURE-
svn commit: r34818 - /release/carbondata/1.5.4/apache-carbondata-1.5.4-source-release.zip.asc
Author: ravipesala Date: Tue Jul 9 16:06:19 2019 New Revision: 34818 Log: Checkin 1.5.4 Removed: release/carbondata/1.5.4/apache-carbondata-1.5.4-source-release.zip.asc
svn commit: r34798 - /release/carbondata/1.5.4/apache-carbondata-1.5.4-source-release.zip.asc
Author: ravipesala Date: Mon Jul 8 15:56:07 2019 New Revision: 34798 Log: Checkin 1.5.4 Modified: release/carbondata/1.5.4/apache-carbondata-1.5.4-source-release.zip.asc Modified: release/carbondata/1.5.4/apache-carbondata-1.5.4-source-release.zip.asc == Binary files - no diff available.
[carbondata] branch master updated: [CARBONDATA-3440] Updated alter table DDL to accept upgrade_segments as a compaction type
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new 785cc6c [CARBONDATA-3440] Updated alter table DDL to accept upgrade_segments as a compaction type 785cc6c is described below commit 785cc6cbb9aecd7cc90c892a9479855b9b403be4 Author: kunal642 AuthorDate: Tue Jun 11 19:57:23 2019 +0530 [CARBONDATA-3440] Updated alter table DDL to accept upgrade_segments as a compaction type Updated alter table DDL to accept upgrade_segments as a compaction type. made legacy segment distribution round-robin based. This closes #3277 --- .../core/datamap/DistributableDataMapFormat.java | 38 +-- .../apache/carbondata/core/datamap/Segment.java| 13 - .../core/indexstore/ExtendedBlocklet.java | 1 + .../core/indexstore/ExtendedBlockletWrapper.java | 2 +- .../blockletindex/BlockletDataMapFactory.java | 2 +- .../core/metadata/schema/table/CarbonTable.java| 311 ++--- .../apache/carbondata/core/util/SessionParams.java | 4 +- .../apache/carbondata/hadoop/CarbonInputSplit.java | 44 ++- .../carbondata/hadoop/api/CarbonInputFormat.java | 4 +- .../carbondata/indexserver/DataMapJobs.scala | 30 +- .../indexserver/DistributedPruneRDD.scala | 2 +- .../indexserver/DistributedRDDUtils.scala | 57 +++- .../carbondata/indexserver/IndexServer.scala | 37 ++- .../CarbonAlterTableCompactionCommand.scala| 32 ++- .../restructure/AlterTableUpgradeSegmentTest.scala | 50 .../processing/merger/CompactionType.java | 1 + 16 files changed, 385 insertions(+), 243 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/datamap/DistributableDataMapFormat.java b/core/src/main/java/org/apache/carbondata/core/datamap/DistributableDataMapFormat.java index cdc9e5c..8426fcb 100644 --- a/core/src/main/java/org/apache/carbondata/core/datamap/DistributableDataMapFormat.java +++ b/core/src/main/java/org/apache/carbondata/core/datamap/DistributableDataMapFormat.java @@ -24,6 +24,7 @@ import java.nio.charset.Charset; import java.util.ArrayList; import java.util.Iterator; import java.util.List; +import java.util.UUID; import org.apache.carbondata.common.logging.LogServiceFactory; import org.apache.carbondata.core.constants.CarbonCommonConstants; @@ -84,9 +85,12 @@ public class DistributableDataMapFormat extends FileInputFormat segmentsToLoad = new ArrayList<>(); segmentsToLoad.add(distributable.getDistributable().getSegment()); List blocklets = new ArrayList<>(); -DataMapChooser dataMapChooser = null; -if (null != filterResolverIntf) { - dataMapChooser = new DataMapChooser(table); -} if (dataMapLevel == null) { TableDataMap defaultDataMap = DataMapStoreManager.getInstance() .getDataMap(table, distributable.getDistributable().getDataMapSchema()); dataMaps = defaultDataMap.getTableDataMaps(distributable.getDistributable()); - if (table.isTransactionalTable()) { -blocklets = defaultDataMap.prune(dataMaps, distributable.getDistributable(), -filterResolverIntf, partitions); - } else { -blocklets = defaultDataMap.prune(segmentsToLoad, new DataMapFilter(filterResolverIntf), -partitions); - } + blocklets = defaultDataMap + .prune(segmentsToLoad, new DataMapFilter(filterResolverIntf), partitions); blocklets = DataMapUtil .pruneDataMaps(table, filterResolverIntf, segmentsToLoad, partitions, blocklets, dataMapChooser); @@ -380,10 +374,6 @@ public class DistributableDataMapFormat extends FileInputFormat getValidSegmentIds() { +List validSegments = new 
ArrayList<>(); +for (Segment segment : this.validSegments) { + validSegments.add(segment.getSegmentNo()); +} +return validSegments; + } + + public void createDataMapChooser() throws IOException { +if (null != filterResolverIntf) { + this.dataMapChooser = new DataMapChooser(table); +} + } } diff --git a/core/src/main/java/org/apache/carbondata/core/datamap/Segment.java b/core/src/main/java/org/apache/carbondata/core/datamap/Segment.java index 9370be8..ad80182 100644 --- a/core/src/main/java/org/apache/carbondata/core/datamap/Segment.java +++ b/core/src/main/java/org/apache/carbondata/core/datamap/Segment.java @@ -69,11 +69,6 @@ public class Segment implements Serializable, Writable { private long indexSize = 0; - /** - * Whether to cache the segment data maps in executors or not. - */ - private boolean isCacheable = true; - public Segment() { } @@ -287,14 +282,6 @@ public class Segment implements Seri
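Usage-wise, the new compaction type slots into the existing ALTER TABLE ... COMPACT syntax; a hedged example issued the way the project's tests do (table name and option casing illustrative):

// Rewrites segments laid out with an older format/sort order.
sql("ALTER TABLE legacy_table COMPACT 'UPGRADE_SEGMENTS'")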
[carbondata] branch master updated: [CARBONDATA-3398] Handled show cache for index server and MV
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git

The following commit(s) were added to refs/heads/master by this push:
     new f708efb  [CARBONDATA-3398] Handled show cache for index server and MV
f708efb is described below

commit f708efb183d0247f9d6a46f7dff6bb4507998f3f
Author: kunal642
AuthorDate: Tue May 28 15:30:49 2019 +0530

    [CARBONDATA-3398] Handled show cache for index server and MV

    Added support to show/drop metacache information from the index server.

    Added a TableNotFoundException fix for when dbName and tableName have '_' in their names: while splitting using '_' the dbName was extracted wrongly. Instead, dbName and tableName are now separated by '-' internally for show cache.

    This closes #3245
---
 .../core/datamap/dev/DataMapFactory.java           |   4 +
 .../core/indexstore/BlockletDetailsFetcher.java    |   2 -
 .../blockletindex/BlockletDataMapFactory.java      |  13 +-
 .../bloom/BloomCoarseGrainDataMapFactory.java      |  16 +
 .../hadoop/api/CarbonTableInputFormat.java         |   4 +-
 .../sql/commands/TestCarbonShowCacheCommand.scala  |  35 +-
 .../apache/carbondata/spark/util/CommonUtil.scala  |   9 +-
 .../carbondata/indexserver/DataMapJobs.scala       |   2 +-
 .../indexserver/DistributedShowCacheRDD.scala      |  32 +-
 .../carbondata/indexserver/IndexServer.scala       |   9 +-
 .../scala/org/apache/spark/sql/CarbonEnv.scala     |   2 +-
 .../command/cache/CarbonShowCacheCommand.scala     | 465 ++---
 .../command/cache/ShowCacheEventListeners.scala    |  78 ++--
 .../scala/org/apache/spark/util/DataMapUtil.scala  |   2 +-
 14 files changed, 428 insertions(+), 245 deletions(-)

diff --git a/core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMapFactory.java b/core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMapFactory.java
index 3fa7be6..1116525 100644
--- a/core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMapFactory.java
+++ b/core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMapFactory.java
@@ -192,4 +192,8 @@ public abstract class DataMapFactory {
   public boolean supportRebuild() {
     return false;
   }
+
+  public String getCacheSize() {
+    return null;
+  }
 }

diff --git a/core/src/main/java/org/apache/carbondata/core/indexstore/BlockletDetailsFetcher.java b/core/src/main/java/org/apache/carbondata/core/indexstore/BlockletDetailsFetcher.java
index 5eace3c..ae01e9e 100644
--- a/core/src/main/java/org/apache/carbondata/core/indexstore/BlockletDetailsFetcher.java
+++ b/core/src/main/java/org/apache/carbondata/core/indexstore/BlockletDetailsFetcher.java
@@ -60,6 +60,4 @@ public interface BlockletDetailsFetcher {
    * clears the datamap from cache and segmentMap from executor
    */
   void clear();
-
-  String getCacheSize() throws IOException ;
 }

diff --git a/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java b/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java
index cab1b8b..f928976 100644
--- a/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java
+++ b/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java
@@ -302,14 +302,19 @@ public class BlockletDataMapFactory extends CoarseGrainDataMapFactory
     }
   }

-  @Override public String getCacheSize() throws IOException {
+  @Override
+  public String getCacheSize() {
     long sum = 0L;
     int numOfIndexFiles = 0;
     for (Map.Entry> entry : segmentMap.entrySet()) {
       for (TableBlockIndexUniqueIdentifier tableBlockIndexUniqueIdentifier : entry.getValue()) {
-        sum += cache.get(new TableBlockIndexUniqueIdentifierWrapper(tableBlockIndexUniqueIdentifier,
-            getCarbonTable())).getMemorySize();
-        numOfIndexFiles++;
+        BlockletDataMapIndexWrapper blockletDataMapIndexWrapper = cache.getIfPresent(
+            new TableBlockIndexUniqueIdentifierWrapper(tableBlockIndexUniqueIdentifier,
+                getCarbonTable()));
+        if (blockletDataMapIndexWrapper != null) {
+          sum += blockletDataMapIndexWrapper.getMemorySize();
+          numOfIndexFiles++;
+        }
       }
     }
     return numOfIndexFiles + ":" + sum;

diff --git a/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapFactory.java b/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapFactory.java
index 03599a9..f261871 100644
--- a/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapFactory.java
+++ b/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapFactory.java
@@ -453,4 +453,20 @@ public class BloomCoarseGrainDataMapFactory extends D
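The key change for cache-size reporting is the switch from a loading get() to a non-loading lookup: SHOW METACACHE should count only entries that are actually resident, not pull indexes into memory as a side effect. A sketch of the pattern with a Guava cache standing in for Carbon's index cache:

import com.google.common.cache.{Cache, CacheBuilder}

val cache: Cache[String, java.lang.Long] =
  CacheBuilder.newBuilder().build[String, java.lang.Long]()

// getIfPresent returns null on a miss instead of loading the value, so
// summing sizes here never grows the cache it is merely reporting on.
def residentBytes(keys: Seq[String]): Long =
  keys.flatMap(k => Option(cache.getIfPresent(k))).map(_.longValue).sum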
[carbondata] branch master updated: [CARBONDATA-3409] Fix Concurrent dataloading Issue with mv
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git

The following commit(s) were added to refs/heads/master by this push:
     new 9d02092  [CARBONDATA-3409] Fix Concurrent dataloading Issue with mv
9d02092 is described below

commit 9d0209226eb3be7735da7cd66d88cece0141e7e5
Author: Indhumathi27
AuthorDate: Fri May 31 15:53:01 2019 +0530

    [CARBONDATA-3409] Fix Concurrent dataloading Issue with mv

    Problem: While performing concurrent data loading to an MV datamap, if one of the loads was not able to get the table status lock, it ended up doing a full rebuild, because newLoadName and segmentMap were empty.

    Solution: If a load was not able to take the table status lock, disable the datamap and return.

    This closes #3252
---
 .../main/java/org/apache/carbondata/core/datamap/DataMapProvider.java | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapProvider.java b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapProvider.java
index c4ee49b..6a9d2c5 100644
--- a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapProvider.java
+++ b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapProvider.java
@@ -27,6 +27,7 @@ import org.apache.carbondata.common.logging.LogServiceFactory;
 import org.apache.carbondata.core.constants.CarbonCommonConstants;
 import org.apache.carbondata.core.datamap.dev.DataMapFactory;
 import org.apache.carbondata.core.datamap.status.DataMapSegmentStatusUtil;
+import org.apache.carbondata.core.datamap.status.DataMapStatusManager;
 import org.apache.carbondata.core.locks.ICarbonLock;
 import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier;
 import org.apache.carbondata.core.metadata.schema.table.CarbonTable;
@@ -208,6 +209,8 @@ public abstract class DataMapProvider {
             "Not able to acquire the lock for Table status updation for table " + dataMapSchema
                 .getRelationIdentifier().getDatabaseName() + "." + dataMapSchema
                 .getRelationIdentifier().getTableName());
+        DataMapStatusManager.disableDataMap(dataMapSchema.getDataMapName());
+        return false;
      }
    } finally {
      if (carbonLock.unlock()) {
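The shape of the fix generalizes: a writer that cannot take the status lock must degrade explicitly (disable the datamap) rather than proceed as if no concurrent load existed. A generic Scala sketch with a JVM lock standing in for Carbon's table status lock:

import java.util.concurrent.locks.ReentrantLock

// On lock failure, disable the MV and bail out; a disabled MV simply stops
// being used for query rewrite, which is safer than a spurious full rebuild.
def loadWithStatusLock(lock: ReentrantLock)(doLoad: () => Unit)(disable: () => Unit): Boolean =
  if (lock.tryLock()) {
    try { doLoad(); true } finally { lock.unlock() }
  } else {
    disable()
    false
  }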
[carbondata] branch master updated: [CARBONDATA-3407]Fix distinct, count, Sum query failure when MV is created on single projection column
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git

The following commit(s) were added to refs/heads/master by this push:
     new b0d5a5c  [CARBONDATA-3407]Fix distinct, count, Sum query failure when MV is created on single projection column
b0d5a5c is described below

commit b0d5a5c792d3cde62da164c4c019beefe8cc2608
Author: akashrn5
AuthorDate: Thu May 30 14:15:44 2019 +0530

    [CARBONDATA-3407]Fix distinct, count, Sum query failure when MV is created on single projection column

    Problem: When an MV datamap is created on a single column as a simple projection, sum, distinct and count queries fail during SQL conversion of the modular plan. Basically, there is no case to handle the modular plan when we have a group-by node without alias info that has a select child node which is rewritten.

    Solution: The SQL generation cases should cover this case as well; otherwise the rewritten query would be wrong, as the alias would end up inside the count or aggregate function. The rewritten query should actually be like: SELECT count(limit_fail_dm1_table.limit_fail_designation) AS count(designation) FROM default.limit_fail_dm1_table

    This closes #3249
---
 .../carbondata/mv/datamap/MVAnalyzerRule.scala     |  2 +-
 .../carbondata/mv/rewrite/MVCreateTestCase.scala   | 21 +
 .../carbondata/mv/plans/util/SQLBuildDSL.scala     |  2 +-
 .../carbondata/mv/plans/util/SQLBuilder.scala      |  6 +-
 4 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVAnalyzerRule.scala b/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVAnalyzerRule.scala
index 558a5bb..04bcfbb 100644
--- a/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVAnalyzerRule.scala
+++ b/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVAnalyzerRule.scala
@@ -79,7 +79,7 @@ class MVAnalyzerRule(sparkSession: SparkSession) extends Rule[LogicalPlan] {
         DataMapClassProvider.MV.getShortName).asInstanceOf[SummaryDatasetCatalog]
     if (needAnalysis && catalog != null && isValidPlan(plan, catalog)) {
       val modularPlan = catalog.mvSession.sessionState.rewritePlan(plan).withMVTable
-      if (modularPlan.find (_.rewritten).isDefined) {
+      if (modularPlan.find(_.rewritten).isDefined) {
         val compactSQL = modularPlan.asCompactSQL
         val analyzed = sparkSession.sql(compactSQL).queryExecution.analyzed
         analyzed

diff --git a/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCreateTestCase.scala b/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCreateTestCase.scala
index 25d2542..e025623 100644
--- a/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCreateTestCase.scala
+++ b/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCreateTestCase.scala
@@ -1041,6 +1041,26 @@ class MVCreateTestCase extends QueryTest with BeforeAndAfterAll {
     assert(verifyMVDataMap(analyzed2, "mvlikedm2"))
   }

+  test("test distinct, count, sum on MV with single projection column") {
+    sql("drop table if exists maintable")
+    sql("create table maintable(name string, age int, add string) stored by 'carbondata'")
+    sql("create datamap single_mv using 'mv' as select age from maintable")
+    sql("insert into maintable select 'pheobe',31,'NY'")
+    sql("insert into maintable select 'rachel',32,'NY'")
+    val df1 = sql("select distinct(age) from maintable")
+    val df2 = sql("select sum(age) from maintable")
+    val df3 = sql("select count(age) from maintable")
+    val analyzed1 = df1.queryExecution.analyzed
+    val analyzed2 = df2.queryExecution.analyzed
+    val analyzed3 = df3.queryExecution.analyzed
+    checkAnswer(df1, Seq(Row(31), Row(32)))
+    checkAnswer(df2, Seq(Row(63)))
+    checkAnswer(df3, Seq(Row(2)))
+    assert(TestUtil.verifyMVDataMap(analyzed1, "single_mv"))
+    assert(TestUtil.verifyMVDataMap(analyzed2, "single_mv"))
+    assert(TestUtil.verifyMVDataMap(analyzed3, "single_mv"))
+  }
+
   def verifyMVDataMap(logicalPlan: LogicalPlan, dataMapName: String): Boolean = {
     val tables = logicalPlan collect {
       case l: LogicalRelation => l.catalogTable.get
@@ -1060,6 +1080,7 @@ class MVCreateTestCase extends QueryTest with BeforeAndAfterAll {
     sql("drop table IF EXISTS fact_table_parquet")
     sql("drop table if exists limit_fail")
     sql("drop table IF EXISTS mv_like")
+    sql("drop table IF EXISTS maintable")
   }

   override def afterAll {

diff --git a/datamap/mv/plan/src/main/scala/org/apache/carbondata/mv/plans/util/SQLBuildDSL.scala b
[carbondata] branch master updated: [CARBONDATA-3404] Support CarbonFile API through FileTypeInterface to use custom FileSystem
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git

The following commit(s) were added to refs/heads/master by this push:
     new 85f1b9f  [CARBONDATA-3404] Support CarbonFile API through FileTypeInterface to use custom FileSystem
85f1b9f is described below

commit 85f1b9ff4d459248af56002e1523bcb46bf366e4
Author: KanakaKumar
AuthorDate: Wed May 29 12:39:06 2019 +0530

    [CARBONDATA-3404] Support CarbonFile API through FileTypeInterface to use custom FileSystem

    Currently CarbonData supports only a small set of file systems, such as the HDFS, S3, and VIEWFS schemes. If the user configures a table path on a file system outside the supported set, FileFactory falls back to CarbonLocalFile as the default, which causes errors.

    This PR proposes an API that lets the user extend CarbonFile and override the required methods from AbstractCarbonFile when specific handling is required for operations like renameForce.

    This closes #3246
---
 .../core/constants/CarbonCommonConstants.java      |  5 ++
 .../filesystem/AbstractDFSCarbonFile.java          |  6 +-
 .../core/datastore/filesystem/CarbonFile.java      |  4 +-
 .../core/datastore/filesystem/LocalCarbonFile.java | 10 +--
 .../datastore/impl/DefaultFileTypeProvider.java    | 84 +++---
 .../core/datastore/impl/FileFactory.java           | 82 +
 .../core/datastore/impl/FileTypeInterface.java     | 23 --
 .../carbondata/core/locks/CarbonLockFactory.java   | 11 ++-
 .../core/metadata/schema/SchemaReader.java         |  5 +-
 .../apache/carbondata/core/util/CarbonUtil.java    | 19 ++---
 .../store/impl/FileFactoryImplUnitTest.java        | 55 --
 .../filesystem/store/impl/TestFileProvider.java    | 59 +++
 .../dblocation/DBLocationCarbonTableTestCase.scala |  4 +-
 13 files changed, 282 insertions(+), 85 deletions(-)

diff --git a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
index 8b39343..1201e1a 100644
--- a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
+++ b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
@@ -1601,6 +1601,11 @@ public final class CarbonCommonConstants {
   public static final String S3_SECRET_KEY = "fs.s3.awsSecretAccessKey";

   /**
+   * Configuration Key for custom file provider
+   */
+  public static final String CUSTOM_FILE_PROVIDER = "carbon.fs.custom.file.provider";
+
+  /**
    * FS_DEFAULT_FS
    */
   @CarbonProperty

diff --git a/core/src/main/java/org/apache/carbondata/core/datastore/filesystem/AbstractDFSCarbonFile.java b/core/src/main/java/org/apache/carbondata/core/datastore/filesystem/AbstractDFSCarbonFile.java
index a90648e..1470c05 100644
--- a/core/src/main/java/org/apache/carbondata/core/datastore/filesystem/AbstractDFSCarbonFile.java
+++ b/core/src/main/java/org/apache/carbondata/core/datastore/filesystem/AbstractDFSCarbonFile.java
@@ -404,8 +404,8 @@ public abstract class AbstractDFSCarbonFile implements CarbonFile {
     return new DataOutputStream(new BufferedOutputStream(outputStream));
   }

-  @Override public boolean isFileExist(String filePath, FileFactory.FileType fileType,
-      boolean performFileCheck) throws IOException {
+  @Override public boolean isFileExist(String filePath, boolean performFileCheck)
+      throws IOException {
     filePath = filePath.replace("\\", "/");
     Path path = new Path(filePath);
     FileSystem fs = path.getFileSystem(FileFactory.getConfiguration());
@@ -416,7 +416,7 @@ public abstract class AbstractDFSCarbonFile implements CarbonFile {
     }
   }

-  @Override public boolean isFileExist(String filePath, FileFactory.FileType fileType)
+  @Override public boolean isFileExist(String filePath)
       throws IOException {
     filePath = filePath.replace("\\", "/");
     Path path = new Path(filePath);

diff --git a/core/src/main/java/org/apache/carbondata/core/datastore/filesystem/CarbonFile.java b/core/src/main/java/org/apache/carbondata/core/datastore/filesystem/CarbonFile.java
index be08338..c3c5be5 100644
--- a/core/src/main/java/org/apache/carbondata/core/datastore/filesystem/CarbonFile.java
+++ b/core/src/main/java/org/apache/carbondata/core/datastore/filesystem/CarbonFile.java
@@ -139,10 +139,10 @@ public interface CarbonFile {
   DataOutputStream getDataOutputStream(String path, FileFactory.FileType fileType, int bufferSize,
       String compressor) throws IOException;

-  boolean isFileExist(String filePath, FileFactory.FileType fileType, boolean performFileCheck)
+  boolean isFileExist(String filePath, boolean performFileCheck)
       throws IOException;

-  boolean isFileExist(String filePath, Fi
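Conceptually, the extension point lets a deployment route an otherwise unsupported URI scheme to its own CarbonFile implementation and register it through the new carbon.fs.custom.file.provider property. A hedged Scala sketch (the interface methods and the HDFSCarbonFile constructor signature are assumed, not verified against the final API):

import org.apache.hadoop.conf.Configuration
import org.apache.carbondata.core.datastore.filesystem.{CarbonFile, HDFSCarbonFile}
import org.apache.carbondata.core.datastore.impl.FileTypeInterface

// Routes a hypothetical "myfs://" scheme to an HDFS-backed CarbonFile;
// operations needing special handling (e.g. renameForce) would be
// overridden on a CarbonFile subclass returned here.
class MyFsProvider extends FileTypeInterface {
  override def getCarbonFile(path: String, conf: Configuration): CarbonFile =
    new HDFSCarbonFile(path, conf)
  override def isPathSupported(path: String): Boolean =
    path.startsWith("myfs://")
}
// Registered via: carbon.fs.custom.file.provider=com.example.MyFsProvider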
[carbondata] branch master updated: [CARBONDATA-3350] Enhance custom compaction to resort old single segment by new sort_columns
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new 6fa7fb4 [CARBONDATA-3350] Enhance custom compaction to resort old single segment by new sort_columns 6fa7fb4 is described below commit 6fa7fb4f94ca3082113d0b47b109bdd16cf046a3 Author: QiangCai AuthorDate: Wed May 15 16:46:20 2019 +0800 [CARBONDATA-3350] Enhance custom compaction to resort old single segment by new sort_columns This closes #3202 --- .../blockletindex/BlockletDataMapFactory.java | 2 +- .../TableStatusReadCommittedScope.java | 2 +- .../spark/rdd/CarbonTableCompactor.scala | 21 +++- .../processing/merger/CarbonCompactionUtil.java| 132 + 4 files changed, 128 insertions(+), 29 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java b/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java index 446507f..cab1b8b 100644 --- a/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java +++ b/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java @@ -167,7 +167,7 @@ public class BlockletDataMapFactory extends CoarseGrainDataMapFactory return dataMaps; } - private Set getTableBlockIndexUniqueIdentifiers(Segment segment) + public Set getTableBlockIndexUniqueIdentifiers(Segment segment) throws IOException { Set tableBlockIndexUniqueIdentifiers = segmentMap.get(segment.getSegmentNo()); diff --git a/core/src/main/java/org/apache/carbondata/core/readcommitter/TableStatusReadCommittedScope.java b/core/src/main/java/org/apache/carbondata/core/readcommitter/TableStatusReadCommittedScope.java index 5622efe..e4fd6f4 100644 --- a/core/src/main/java/org/apache/carbondata/core/readcommitter/TableStatusReadCommittedScope.java +++ b/core/src/main/java/org/apache/carbondata/core/readcommitter/TableStatusReadCommittedScope.java @@ -55,7 +55,7 @@ public class TableStatusReadCommittedScope implements ReadCommittedScope { } public TableStatusReadCommittedScope(AbsoluteTableIdentifier identifier, - LoadMetadataDetails[] loadMetadataDetails, Configuration configuration) throws IOException { + LoadMetadataDetails[] loadMetadataDetails, Configuration configuration) { this.identifier = identifier; this.configuration = configuration; this.loadMetadataDetails = loadMetadataDetails; diff --git a/integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonTableCompactor.scala b/integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonTableCompactor.scala index afe2927..4c7dd95 100644 --- a/integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonTableCompactor.scala +++ b/integration/spark2/src/main/scala/org/apache/carbondata/spark/rdd/CarbonTableCompactor.scala @@ -29,13 +29,15 @@ import org.apache.spark.sql.execution.command.{CarbonMergerMapping, CompactionCa import org.apache.spark.util.MergeIndexUtil import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.constants.SortScopeOptions.SortScope import org.apache.carbondata.core.datamap.{DataMapStoreManager, Segment} +import org.apache.carbondata.core.datastore.impl.FileFactory import org.apache.carbondata.core.metadata.SegmentFileStore import org.apache.carbondata.core.statusmanager.{LoadMetadataDetails, SegmentStatusManager} import 
org.apache.carbondata.core.util.path.CarbonTablePath import org.apache.carbondata.events._ import org.apache.carbondata.processing.loading.model.CarbonLoadModel -import org.apache.carbondata.processing.merger.{CarbonDataMergerUtil, CompactionType} +import org.apache.carbondata.processing.merger.{CarbonCompactionUtil, CarbonDataMergerUtil, CompactionType} import org.apache.carbondata.spark.MergeResultImpl /** @@ -50,6 +52,21 @@ class CarbonTableCompactor(carbonLoadModel: CarbonLoadModel, operationContext: OperationContext) extends Compactor(carbonLoadModel, compactionModel, executor, sqlContext, storeLocation) { + private def needSortSingleSegment( + loadsToMerge: java.util.List[LoadMetadataDetails]): Boolean = { +// support to resort old segment with old sort_columns +if (CompactionType.CUSTOM == compactionModel.compactionType && +loadsToMerge.size() == 1 && +SortScope.NO_SORT != compactionModel.carbonTable.getSortScope) { + !CarbonCompactionUtil.isSortedByCurrentSortColumns( +carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable, +loadsToMerge.get(0), +FileFactory.getConfiguration) +} else { + false +} + } + override def executeCo
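In practice the enhancement means a single old segment can be rewritten under the table's current sort_columns via custom compaction, e.g. (segment id illustrative):

// Re-sorts segment 3 with the current sort_columns if it was written
// under an older sort_columns definition; otherwise a plain re-merge.
sql("ALTER TABLE t COMPACT 'CUSTOM' WHERE SEGMENT.ID IN (3)")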
[carbondata] branch master updated: [CARBONDATA-3403]Fix MV is not working for like and filter AND and OR queries
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git

The following commit(s) were added to refs/heads/master by this push:
     new 32f5b50  [CARBONDATA-3403]Fix MV is not working for like and filter AND and OR queries
32f5b50 is described below

commit 32f5b505509731ea1f7ff0fde2c7e25aea4925b4
Author: akashrn5
AuthorDate: Tue May 28 11:55:13 2019 +0530

    [CARBONDATA-3403]Fix MV is not working for like and filter AND and OR queries

    Problem: The MV table is not hit during queries with LIKE and with AND/OR filters. Such queries contain literals, which are case sensitive for fetching the data. But during MV modular plan generation, we register the schema for the datamap by converting the complete datamap query to lower case, which converts even the literals. So after modular plan generation of the user query, during the matching phase of the datamap and user query modular plans, the semantic equals check fails for literals, that is, the AttributeReference type.

    Solution: Do not convert the query to lower case when registering the schema, that is, when adding the preagg function to the query; this handles the MV case. For preaggregate, instead of converting the complete query to lowercase, convert to lower case during ColumnTableRelation generation and createField for preaggregate generation, so it is handled for preaggregate as well.

    This closes #3242
---
 .../carbondata/mv/rewrite/MVCreateTestCase.scala   | 20 
 .../timeseries/TestTimeSeriesCreateTable.scala     |  2 +-
 .../command/preaaggregate/PreAggregateUtil.scala   |  5 +++--
 .../spark/sql/parser/CarbonSpark2SqlParser.scala   |  4 ++--
 4 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCreateTestCase.scala b/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCreateTestCase.scala
index 5e12ad3..25d2542 100644
--- a/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCreateTestCase.scala
+++ b/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCreateTestCase.scala
@@ -1022,6 +1022,25 @@ class MVCreateTestCase extends QueryTest with BeforeAndAfterAll {
     sql("drop table if exists all_table")
   }

+  test(" test MV with like queries and filter queries") {
+    sql("drop table if exists mv_like")
+    sql(
+      "create table mv_like(name string, age int, address string, Country string, id int) stored by 'carbondata'")
+    sql(
+      "create datamap mvlikedm1 using 'mv' as select name,address from mv_like where Country NOT LIKE 'US' group by name,address")
+    sql(
+      "create datamap mvlikedm2 using 'mv' as select name,address,Country from mv_like where Country = 'US' or Country = 'China' group by name,address,Country")
+    sql("insert into mv_like select 'chandler', 32, 'newYork', 'US', 5")
+    val df1 = sql(
+      "select name,address from mv_like where Country NOT LIKE 'US' group by name,address")
+    val analyzed1 = df1.queryExecution.analyzed
+    assert(verifyMVDataMap(analyzed1, "mvlikedm1"))
+    val df2 = sql(
+      "select name,address,Country from mv_like where Country = 'US' or Country = 'China' group by name,address,Country")
+    val analyzed2 = df2.queryExecution.analyzed
+    assert(verifyMVDataMap(analyzed2, "mvlikedm2"))
+  }
+
   def verifyMVDataMap(logicalPlan: LogicalPlan, dataMapName: String): Boolean = {
     val tables = logicalPlan collect {
       case l: LogicalRelation => l.catalogTable.get
@@ -1040,6 +1059,7 @@ class MVCreateTestCase extends QueryTest with BeforeAndAfterAll {
     sql("drop table IF EXISTS fact_streaming_table2")
     sql("drop table IF EXISTS fact_table_parquet")
     sql("drop table if exists limit_fail")
+    sql("drop table IF EXISTS mv_like")
   }

   override def afterAll {

diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/timeseries/TestTimeSeriesCreateTable.scala b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/timeseries/TestTimeSeriesCreateTable.scala
index d68195c..eabe0f5 100644
--- a/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/timeseries/TestTimeSeriesCreateTable.scala
+++ b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/timeseries/TestTimeSeriesCreateTable.scala
@@ -517,7 +517,7 @@ class TestTimeSeriesCreateTable extends QueryTest with BeforeAndAfterAll with Be
         |GROUP BY dataTime
       """.stripMargin)
     }
-    assert(e.getMessage.contains("
[carbondata] branch master updated: [CARBONDATA-3399] Implement executor id based distribution for indexserver
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new fa3e392 [CARBONDATA-3399] Implement executor id based distribution for indexserver fa3e392 is described below commit fa3e392c17ff1867baa6ac1ae918346e76ac1add Author: kunal642 AuthorDate: Mon May 27 12:41:54 2019 +0530 [CARBONDATA-3399] Implement executor id based distribution for indexserver This closes #3237 --- .../apache/spark/sql/hive/DistributionUtil.scala | 8 + .../indexserver/DistributedPruneRDD.scala | 9 +- .../indexserver/DistributedRDDUtils.scala | 218 - .../indexserver/DistributedRDDUtilsTest.scala | 115 +++ 4 files changed, 300 insertions(+), 50 deletions(-) diff --git a/integration/spark-common/src/main/scala/org/apache/spark/sql/hive/DistributionUtil.scala b/integration/spark-common/src/main/scala/org/apache/spark/sql/hive/DistributionUtil.scala index 0861d2b..4256777 100644 --- a/integration/spark-common/src/main/scala/org/apache/spark/sql/hive/DistributionUtil.scala +++ b/integration/spark-common/src/main/scala/org/apache/spark/sql/hive/DistributionUtil.scala @@ -89,6 +89,14 @@ object DistributionUtil { } } + def getExecutors(sparkContext: SparkContext): Map[String, Seq[String]] = { +val bm = sparkContext.env.blockManager +bm.master.getPeers(bm.blockManagerId) + .groupBy(blockManagerId => blockManagerId.host).map { + case (host, blockManagerIds) => (host, blockManagerIds.map(_.executorId)) +} + } + private def getLocalhostIPs = { val iface = NetworkInterface.getNetworkInterfaces var addresses: List[InterfaceAddress] = List.empty diff --git a/integration/spark2/src/main/scala/org/apache/carbondata/indexserver/DistributedPruneRDD.scala b/integration/spark2/src/main/scala/org/apache/carbondata/indexserver/DistributedPruneRDD.scala index d2dab2d..607f923 100644 --- a/integration/spark2/src/main/scala/org/apache/carbondata/indexserver/DistributedPruneRDD.scala +++ b/integration/spark2/src/main/scala/org/apache/carbondata/indexserver/DistributedPruneRDD.scala @@ -38,7 +38,7 @@ import org.apache.carbondata.core.util.CarbonProperties import org.apache.carbondata.spark.rdd.CarbonRDD import org.apache.carbondata.spark.util.CarbonScalaUtil -private[indexserver] class DataMapRDDPartition(rddId: Int, idx: Int, val inputSplit: InputSplit) +class DataMapRDDPartition(rddId: Int, idx: Int, val inputSplit: InputSplit) extends Partition { override def index: Int = idx @@ -50,8 +50,6 @@ private[indexserver] class DistributedPruneRDD(@transient private val ss: SparkS dataMapFormat: DistributableDataMapFormat) extends CarbonRDD[(String, ExtendedBlocklet)](ss, Nil) { - val executorsList: Set[String] = DistributionUtil.getNodeList(ss.sparkContext).toSet - @transient private val LOGGER = LogServiceFactory.getLogService(classOf[DistributedPruneRDD] .getName) @@ -106,7 +104,8 @@ private[indexserver] class DistributedPruneRDD(@transient private val ss: SparkS throw new java.util.NoSuchElementException("End of stream") } havePair = false -val executorIP = SparkEnv.get.blockManager.blockManagerId.host +val executorIP = s"${ SparkEnv.get.blockManager.blockManagerId.host }_${ + SparkEnv.get.blockManager.blockManagerId.executorId}" val value = (executorIP + "_" + cacheSize.toString, reader.getCurrentValue) value } @@ -125,6 +124,8 @@ private[indexserver] class DistributedPruneRDD(@transient private val ss: SparkS f => new DataMapRDDPartition(id, f._2, f._1) 
}.toArray } else { + val executorsList: Map[String, Seq[String]] = DistributionUtil +.getExecutors(ss.sparkContext) val (response, time) = CarbonScalaUtil.logTime { DistributedRDDUtils.getExecutors(splits.toArray, executorsList, dataMapFormat .getCarbonTable.getTableUniqueName, id) diff --git a/integration/spark2/src/main/scala/org/apache/carbondata/indexserver/DistributedRDDUtils.scala b/integration/spark2/src/main/scala/org/apache/carbondata/indexserver/DistributedRDDUtils.scala index c381f80..c7632be 100644 --- a/integration/spark2/src/main/scala/org/apache/carbondata/indexserver/DistributedRDDUtils.scala +++ b/integration/spark2/src/main/scala/org/apache/carbondata/indexserver/DistributedRDDUtils.scala @@ -20,7 +20,6 @@ import java.util.concurrent.ConcurrentHashMap import scala.collection.JavaConverters._ -import org.apache.commons.lang.StringUtils import org.apache.hadoop.mapreduce.InputSplit import org.apache.spark.Partition @@ -29,14 +28,14 @@ import org.apache.carbondata.core.datamap.dev.expr.DataMapDistributableWrapper object Dis
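The effect of the new composite key is easier to see in isolation. A minimal sketch of the host_executorId scheme, assuming executors arrive as the host-to-executor-IDs map that the new getExecutors returns (illustrative, not the RDD code itself):

```scala
// Splits are spread over "host_executorId" keys instead of plain hostnames,
// so two executors on the same host no longer collide on one cache key.
object ExecutorIdDistribution {
  def assign(splits: Seq[String],
             executors: Map[String, Seq[String]]): Map[String, Seq[String]] = {
    // Flatten host -> executorIds into the composite keys used for caching.
    val keys = executors.flatMap {
      case (host, ids) => ids.map(id => s"${host}_$id")
    }.toVector
    // Round-robin the splits across the composite keys.
    splits.zipWithIndex
      .groupBy { case (_, i) => keys(i % keys.size) }
      .map { case (key, pairs) => key -> pairs.map(_._1) }
  }

  def main(args: Array[String]): Unit = {
    val executors = Map("host1" -> Seq("1", "2"), "host2" -> Seq("3"))
    // prints something like: Map(host1_1 -> Vector(s0, s3), host1_2 -> Vector(s1), host2_3 -> Vector(s2))
    println(assign(Seq("s0", "s1", "s2", "s3"), executors))
  }
}
```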
[carbondata] annotated tag apache-carbondata-1.5.4 created (now 5a55d9b)
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a change to annotated tag apache-carbondata-1.5.4 in repository https://gitbox.apache.org/repos/asf/carbondata.git.

      at 5a55d9b (tag)
 tagging 1f2e184b81bef4e861b4dd32be94dc50bada6b68 (commit)
replaces apache-carbondata-1.5.3-rc1
      by ravipesala on Fri May 17 14:27:20 2019 +0530

- Log -----
[maven-release-plugin] copy for tag apache-carbondata-1.5.4-rc1
-----------

No new revisions were added by this update.
[carbondata] annotated tag apache-carbondata-1.5.4-rc1 deleted (was 5a55d9b)
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a change to annotated tag apache-carbondata-1.5.4-rc1 in repository https://gitbox.apache.org/repos/asf/carbondata.git.

*** WARNING: tag apache-carbondata-1.5.4-rc1 was deleted! ***

     tag was 5a55d9b

The revisions that were on this annotated tag are still contained in other references; therefore, this change does not discard any commits from the repository.
svn commit: r34308 - /release/carbondata/1.5.4/
Author: ravipesala Date: Wed May 29 11:27:17 2019 New Revision: 34308 Log: Checkin 1.5.4 Added: release/carbondata/1.5.4/ release/carbondata/1.5.4/apache-carbondata-1.5.4-bin-spark2.1.0-hadoop2.8.3.jar (with props) release/carbondata/1.5.4/apache-carbondata-1.5.4-bin-spark2.1.0-hadoop2.8.3.jar.asc release/carbondata/1.5.4/apache-carbondata-1.5.4-bin-spark2.1.0-hadoop2.8.3.jar.sha512 release/carbondata/1.5.4/apache-carbondata-1.5.4-bin-spark2.2.1-hadoop2.8.3.jar (with props) release/carbondata/1.5.4/apache-carbondata-1.5.4-bin-spark2.2.1-hadoop2.8.3.jar.asc release/carbondata/1.5.4/apache-carbondata-1.5.4-bin-spark2.2.1-hadoop2.8.3.jar.sha512 release/carbondata/1.5.4/apache-carbondata-1.5.4-bin-spark2.3.2-hadoop2.8.3.jar (with props) release/carbondata/1.5.4/apache-carbondata-1.5.4-bin-spark2.3.2-hadoop2.8.3.jar.asc release/carbondata/1.5.4/apache-carbondata-1.5.4-bin-spark2.3.2-hadoop2.8.3.jar.sha512 release/carbondata/1.5.4/apache-carbondata-1.5.4-source-release.zip (with props) release/carbondata/1.5.4/apache-carbondata-1.5.4-source-release.zip.asc (with props) release/carbondata/1.5.4/apache-carbondata-1.5.4-source-release.zip.sha512 Added: release/carbondata/1.5.4/apache-carbondata-1.5.4-bin-spark2.1.0-hadoop2.8.3.jar == Binary file - no diff available. Propchange: release/carbondata/1.5.4/apache-carbondata-1.5.4-bin-spark2.1.0-hadoop2.8.3.jar -- svn:mime-type = application/octet-stream Added: release/carbondata/1.5.4/apache-carbondata-1.5.4-bin-spark2.1.0-hadoop2.8.3.jar.asc == --- release/carbondata/1.5.4/apache-carbondata-1.5.4-bin-spark2.1.0-hadoop2.8.3.jar.asc (added) +++ release/carbondata/1.5.4/apache-carbondata-1.5.4-bin-spark2.1.0-hadoop2.8.3.jar.asc Wed May 29 11:27:17 2019 @@ -0,0 +1,16 @@ +-BEGIN PGP SIGNATURE- + +iQIzBAEBCgAdFiEER3EpqJTxH7zLwCVHutcqeKexsu4FAlzuasgACgkQutcqeKex +su6Hyw//X2dc1cvZRrldHyobNdhqjAqsJHq6rt+YecN4bTQm90gwPKoxClfefDT9 +N0aPmhAqjFFbd+yI8R51aGQyPqyFiew2y/2xsjRUDn8TILUcd040NGfk9HGTep7B +/1KE6REsxQGGfM1a0tY0tn3yMwciKLUpXimwgi8LNmhcXiqjXQJJa0YSnDyM/U+O +gUp5Ne0skI7Q5M8hbsknXcqVmbiWIqquncqUM3qNE84VPcgt04bo6IJWJ35A0vmd +a1zwqtYD7MvP8E2pTuv4F/47u2XqxO/ho+G8qtj6MKp2n/jytw73qkH/N93HIJXc +0KlFVWPuggxgRp1tTY1p0D68hx2L5aIkJOFISQlAMycslaFIq0YcoF4prtc/CrdN +JFDjJ7UFUdaOUmmE9n7R+XilD/usjiC2wxiIl2SELFoO2Gf4fnhbzs4Qdm19LLkP +8ws6tJ3fkXPaRJKC9Vbl0q86UF478/GMHPZN9f7m0P2ulY+GYMxLXTyL/SF2rTt4 +b45S2UEpDwIPPzww4Hq2wvOOZi9eiLGT4+YMfHXGfthI3tCBQMQOP5ccO9SB3PY2 +Ea1/x5WtpxDMVax/zE3/5ZpjInhKXgRo9a0eRW70I0WST+ObxEuXsRoTS5fLPZuZ +fpHpNQlxBct8EMuOz/DnrPva7HhABOm9VCm4zBlW4Zd9XgaCK5k= +=OJKq +-END PGP SIGNATURE- Added: release/carbondata/1.5.4/apache-carbondata-1.5.4-bin-spark2.1.0-hadoop2.8.3.jar.sha512 == --- release/carbondata/1.5.4/apache-carbondata-1.5.4-bin-spark2.1.0-hadoop2.8.3.jar.sha512 (added) +++ release/carbondata/1.5.4/apache-carbondata-1.5.4-bin-spark2.1.0-hadoop2.8.3.jar.sha512 Wed May 29 11:27:17 2019 @@ -0,0 +1 @@ +c107e1d21aaaf2d50c8ef765dfdd99ff62e93cdad942c3598f4d110712ae931dfeab7f0de22090eb6afc8bf6f25af7d174456b7c9749201e9c1c83afd38fe90d apache-carbondata-1.5.4-bin-spark2.1.0-hadoop2.8.3.jar Added: release/carbondata/1.5.4/apache-carbondata-1.5.4-bin-spark2.2.1-hadoop2.8.3.jar == Binary file - no diff available. 
Propchange: release/carbondata/1.5.4/apache-carbondata-1.5.4-bin-spark2.2.1-hadoop2.8.3.jar -- svn:mime-type = application/octet-stream Added: release/carbondata/1.5.4/apache-carbondata-1.5.4-bin-spark2.2.1-hadoop2.8.3.jar.asc == --- release/carbondata/1.5.4/apache-carbondata-1.5.4-bin-spark2.2.1-hadoop2.8.3.jar.asc (added) +++ release/carbondata/1.5.4/apache-carbondata-1.5.4-bin-spark2.2.1-hadoop2.8.3.jar.asc Wed May 29 11:27:17 2019 @@ -0,0 +1,16 @@ +-BEGIN PGP SIGNATURE- + +iQIzBAEBCgAdFiEER3EpqJTxH7zLwCVHutcqeKexsu4FAlzuas8ACgkQutcqeKex +su6nnRAArLDfJCuNiXcrzzJSNxfoNx2WrgBFYsYs+UsQlvmY01/604TvU3jXBwUL +9eg0OK0iDO3Qp7oD0XE/Pzq4ZR8pOwYBgPucEXm9UYBp43cdIAUa+MUZhsYowMJn +hyY99cvT6krS0N+Y6VQAdiC4QWODlUPtD/blqkuEQehHRtUOJvpuQXleg2aBtEVB +rB5E9zLsJ1bSepXyMRXM86BYEUHrN/E037OGMdLrjt+mRK2kc0wwtAKsrD8qfckN +TAFr9vYZbv3EgA+5p+8dGKIbYJrTYJyFxHoDtm9/COwcaGML+6y2aJPZwmxL7B4G +V+xmhKnQ2JandQYu80Gdy94QVKhJ3juG2K6Q+RJza6ZKUhmsdVpZhWewPKrxPlS9 +/0kSE9cxyEYm5ERhlN95xNK/B37LvGZNuSVfFD1IywHR/8CnJj8QJTutFMfpwxM6 +jhNthmGLgpux+ut4wdMruwfV3/rodNYtKjPfJBDMW/z21LG1ory16b+HZfIZBAL1 +DzGsWX8DeN+ZiZpX9+pjETEtUeAyuP2fmrZwWBRwf8gGh9mTe
[carbondata] branch master updated: [CARBONDATA-3402] Fix block complex data type and validate dmproperties for MV
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new 1023ba9 [CARBONDATA-3402] Fix block complex data type and validate dmproperties for MV 1023ba9 is described below commit 1023ba951cc623b8f312e66fa288744705a928de Author: Indhumathi27 AuthorDate: Mon May 27 18:44:33 2019 +0530 [CARBONDATA-3402] Fix block complex data type and validate dmproperties for MV This PR includes, Blocked complex data types with mv Fixed to_date function while creating mv datamap Added inheriting Global dictionary from parent table to child table for preaggregate & mv Validate DMproperties for MV This closes #3241 --- .../apache/carbondata/mv/datamap/MVHelper.scala| 25 +++- .../org/apache/carbondata/mv/datamap/MVUtil.scala | 34 +++-- .../carbondata/mv/rewrite/MVCoalesceTestCase.scala | 16 +-- .../mv/rewrite/MVCountAndCaseTestCase.scala| 9 +- .../carbondata/mv/rewrite/MVCreateTestCase.scala | 137 ++--- .../mv/rewrite/MVIncrementalLoadingTestcase.scala | 37 +++--- .../mv/rewrite/MVMultiJoinTestCase.scala | 11 +- .../carbondata/mv/rewrite/MVRewriteTestCase.scala | 9 +- .../carbondata/mv/rewrite/MVSampleTestCase.scala | 25 ++-- .../carbondata/mv/rewrite/MVTPCDSTestCase.scala| 28 ++--- .../carbondata/mv/rewrite/MVTpchTestCase.scala | 35 +++--- .../mv/rewrite/TestAllOperationsOnMV.scala | 61 + .../mv/rewrite/TestPartitionWithMV.scala | 11 +- .../preaggregate/TestPreAggCreateCommand.scala | 8 +- .../spark/sql/catalyst/CarbonDDLSqlParser.scala| 3 +- .../command/management/CarbonLoadDataCommand.scala | 12 +- .../scala/org/apache/spark/util/DataMapUtil.scala | 58 ++--- 17 files changed, 282 insertions(+), 237 deletions(-) diff --git a/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala b/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala index 8d60a06..57082d7 100644 --- a/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala +++ b/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala @@ -32,6 +32,7 @@ import org.apache.spark.sql.execution.command.{Field, PartitionerField, TableMod import org.apache.spark.sql.execution.command.table.{CarbonCreateTableCommand, CarbonDropTableCommand} import org.apache.spark.sql.execution.datasources.LogicalRelation import org.apache.spark.sql.parser.CarbonSpark2SqlParser +import org.apache.spark.sql.types.{ArrayType, MapType, StructType} import org.apache.spark.util.{DataMapUtil, PartitionUtils} import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException @@ -60,6 +61,7 @@ object MVHelper { s"MV datamap does not support streaming" ) } +MVUtil.validateDMProperty(dmProperties) val updatedQuery = new CarbonSpark2SqlParser().addPreAggFunction(queryString) val query = sparkSession.sql(updatedQuery) val logicalPlan = MVHelper.dropDummFuc(query.queryExecution.analyzed) @@ -71,6 +73,11 @@ object MVHelper { val updatedQueryWithDb = validateMVQuery(sparkSession, logicalPlan) val fullRebuild = isFullReload(logicalPlan) val fields = logicalPlan.output.map { attr => + if (attr.dataType.isInstanceOf[ArrayType] || attr.dataType.isInstanceOf[StructType] || + attr.dataType.isInstanceOf[MapType]) { +throw new UnsupportedOperationException( + s"MV datamap is unsupported for ComplexData type column: " + attr.name) + } val name = updateColumnName(attr) val rawSchema = '`' + name + '`' + ' ' + 
attr.dataType.typeName if (attr.dataType.typeName.startsWith("decimal")) { @@ -312,13 +319,19 @@ object MVHelper { modularPlan.asCompactSQL } + def getUpdatedName(name: String): String = { +val updatedName = name.replace("(", "_") + .replace(")", "") + .replace(" ", "_") + .replace("=", "") + .replace(",", "") + .replace(".", "_") + .replace("`", "") +updatedName + } + def updateColumnName(attr: Attribute): String = { -val name = - attr.name.replace("(", "_") -.replace(")", "") -.replace(" ", "_") -.replace("=", "") -.replace(",", "") +val name = getUpdatedName(attr.name) attr.qualifier.map(qualifier => qualifier + "_" + name).getOrElse(name) } diff --git a/datamap/mv/core/src/main/scala/org/apache/carbondat
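The patch's new getUpdatedName helper is the name-sanitising rule factored out of updateColumnName; standalone, with a sample aggregate column name:

```scala
// Same replacement rules as in the diff: strip the characters that are not
// legal in a carbon column name and map the separators to underscores.
def getUpdatedName(name: String): String =
  name.replace("(", "_")
    .replace(")", "")
    .replace(" ", "_")
    .replace("=", "")
    .replace(",", "")
    .replace(".", "_")
    .replace("`", "")

println(getUpdatedName("sum(a.`price`)")) // prints: sum_a_price
```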
[carbondata] branch master updated: [CARBONDATA-3393] Merge Index Job Failure should not trigger the merge index job again. Exception should be propagated to the caller.
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new 706e8d3 [CARBONDATA-3393] Merge Index Job Failure should not trigger the merge index job again. Exception should be propagated to the caller. 706e8d3 is described below commit 706e8d34c40da97e0d123f58eac3f6da3953f4d0 Author: dhatchayani AuthorDate: Tue May 28 19:29:46 2019 +0530 [CARBONDATA-3393] Merge Index Job Failure should not trigger the merge index job again. Exception should be propagated to the caller. Problem: If the merge index job is failed, the same job is triggered again. Solution: Merge index job exception has to be propagated to the caller. It should not trigger the same job again. Changes: (1) Merge index job failure will not be propagated to the caller. And will only be LOGGED. (2) Implement a new method to write the SegmentFile based on the current load timestamp. This helps in case of merge index failures and writing merge index for old store. This closes #3226 --- .../core/constants/CarbonCommonConstants.java | 12 +++ .../carbondata/core/metadata/SegmentFileStore.java | 21 +++ .../org/apache/spark/rdd/CarbonMergeFilesRDD.scala | 41 +++--- 3 files changed, 62 insertions(+), 12 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java index aa9dd05..311019c 100644 --- a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java +++ b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java @@ -346,6 +346,18 @@ public final class CarbonCommonConstants { public static final String CARBON_MERGE_INDEX_IN_SEGMENT_DEFAULT = "true"; /** + * It is the user defined property to specify whether to throw exception or not in case + * if the MERGE INDEX JOB is failed. 
Default value - TRUE + * TRUE - throws exception and fails the corresponding LOAD job + * FALSE - Logs the exception and continue with the LOAD + */ + @CarbonProperty + public static final String CARBON_MERGE_INDEX_FAILURE_THROW_EXCEPTION = + "carbon.merge.index.failure.throw.exception"; + + public static final String CARBON_MERGE_INDEX_FAILURE_THROW_EXCEPTION_DEFAULT = "true"; + + /** * property to be used for specifying the max byte limit for string/varchar data type till * where storing min/max in data file will be considered */ diff --git a/core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java b/core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java index 69e5dc3..cbf58c7 100644 --- a/core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java +++ b/core/src/main/java/org/apache/carbondata/core/metadata/SegmentFileStore.java @@ -139,12 +139,32 @@ public class SegmentFileStore { */ public static String writeSegmentFile(CarbonTable carbonTable, String segmentId, String UUID) throws IOException { +return writeSegmentFile(carbonTable, segmentId, UUID, null); + } + + /** + * Write segment file to the metadata folder of the table selecting only the current load files + * + * @param carbonTable + * @param segmentId + * @param UUID + * @param currentLoadTimeStamp + * @return + * @throws IOException + */ + public static String writeSegmentFile(CarbonTable carbonTable, String segmentId, String UUID, + final String currentLoadTimeStamp) throws IOException { String tablePath = carbonTable.getTablePath(); boolean supportFlatFolder = carbonTable.isSupportFlatFolder(); String segmentPath = CarbonTablePath.getSegmentPath(tablePath, segmentId); CarbonFile segmentFolder = FileFactory.getCarbonFile(segmentPath); CarbonFile[] indexFiles = segmentFolder.listFiles(new CarbonFileFilter() { @Override public boolean accept(CarbonFile file) { +if (null != currentLoadTimeStamp) { + return file.getName().contains(currentLoadTimeStamp) && ( + file.getName().endsWith(CarbonTablePath.INDEX_FILE_EXT) || file.getName() + .endsWith(CarbonTablePath.MERGE_INDEX_FILE_EXT)); +} return (file.getName().endsWith(CarbonTablePath.INDEX_FILE_EXT) || file.getName() .endsWith(CarbonTablePath.MERGE_INDEX_FILE_EXT)); } @@ -185,6 +205,7 @@ public class SegmentFileStore { return null; } + /** * Move the loaded data from source folder to destination folder. */ diff --git a/integration/spark-common/src/main/scala/org/apache/spark/rdd/CarbonMergeFilesRDD.scala b/integration/spark-common/src/main/scala/org/apac
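The new overload's file filter, sketched in plain Scala. The .carbonindex/.carbonindexmerge suffixes are CarbonData's index-file extensions; when a load timestamp is supplied, only that load's index files survive the filter:

```scala
import java.io.{File, FileFilter}

// Select the index files of a segment directory; with a timestamp, keep only
// the files belonging to that load (useful for merge-index retries and for
// writing merge index on an old store, per the commit message).
def selectIndexFiles(segmentDir: File,
                     currentLoadTimeStamp: Option[String]): Array[File] =
  segmentDir.listFiles(new FileFilter {
    override def accept(f: File): Boolean = {
      val name = f.getName
      val isIndex =
        name.endsWith(".carbonindex") || name.endsWith(".carbonindexmerge")
      // forall == true when no timestamp is given, mirroring the null check.
      isIndex && currentLoadTimeStamp.forall(name.contains(_))
    }
  }) // note: listFiles returns null if segmentDir does not exist
```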
[carbondata] branch master updated: [DOCUMENTATION] Document change for GLOBAL_SORT_PARTITIONS
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new 10cbf4e [DOCUMENTATION] Document change for GLOBAL_SORT_PARTITIONS 10cbf4e is described below commit 10cbf4ec018de4671284e9f6974d05b22609f3a0 Author: manishnalla1994 AuthorDate: Mon May 27 12:09:04 2019 +0530 [DOCUMENTATION] Document change for GLOBAL_SORT_PARTITIONS Documentation change done for Global Sort Partitions during Range Column DataLoad/Compaction. This closes #3234 --- docs/dml-of-carbondata.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/dml-of-carbondata.md b/docs/dml-of-carbondata.md index 6ec0520..3e2a22d 100644 --- a/docs/dml-of-carbondata.md +++ b/docs/dml-of-carbondata.md @@ -281,6 +281,8 @@ CarbonData DML statements are documented here,which includes: If the SORT_SCOPE is defined as GLOBAL_SORT, then user can specify the number of partitions to use while shuffling data for sort using GLOBAL_SORT_PARTITIONS. If it is not configured, or configured less than 1, then it uses the number of map task as reduce task. It is recommended that each reduce task deal with 512MB-1GB data. For RANGE_COLUMN, GLOBAL_SORT_PARTITIONS is used to specify the number of range partitions also. +GLOBAL_SORT_PARTITIONS should be specified optimally during RANGE_COLUMN LOAD because if a higher number is configured then the load time may be less but it will result in creation of more files which would degrade the query and compaction performance. +Conversely, if less partitions are configured then the load performance may degrade due to less use of parallelism but the query and compaction will become faster. Hence the user may choose optimal number depending on the use case. ``` OPTIONS('GLOBAL_SORT_PARTITIONS'='2') ```
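A back-of-envelope helper for picking the value, following the 512 MB-1 GB per reduce task guidance above (illustrative only, not part of CarbonData):

```scala
// Aim for ~768 MB per task, the midpoint of the recommended 512 MB - 1 GB.
def suggestedGlobalSortPartitions(inputBytes: Long): Long = {
  val targetBytesPerTask = 768L * 1024 * 1024
  math.max(1L, math.ceil(inputBytes.toDouble / targetBytesPerTask).toLong)
}

// A 100 GB load suggests ~134 partitions, i.e.
// OPTIONS('GLOBAL_SORT_PARTITIONS'='134')
println(suggestedGlobalSortPartitions(100L * 1024 * 1024 * 1024))
```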
[carbondata] branch master updated: [CARBONDATA-3396] Range Compaction Data Mismatch Fix
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new ce40c64 [CARBONDATA-3396] Range Compaction Data Mismatch Fix ce40c64 is described below commit ce40c64f552d02417400111e9865ff77a05d4fbd Author: manishnalla1994 AuthorDate: Mon May 27 11:41:10 2019 +0530 [CARBONDATA-3396] Range Compaction Data Mismatch Fix Problem : When we have to compact the data second time and the ranges made first time have data in more than one file/blocklet, then while compacting second time if the first blocklet does not contain any record then the other files are also skipped. Also, Global Sort and Local Sort with Range Column were taking different time for same data load and compaction as during write step we give only 1 core to Global Sort. Solution : For the first issue we are reading all the blocklets of a given range and then breaking only when the batch size is full. For the second issue in case of range column both the sort scopes will now take same number of cores and behave similarly. Also changed the number of tasks to be launched during the compaction, now based on the number of tasks during load. This closes #3233 --- .../core/constants/CarbonCommonConstants.java | 4 .../AbstractDetailQueryResultIterator.java | 14 + .../scan/result/iterator/RawResultIterator.java| 11 +-- .../carbondata/core/util/CarbonProperties.java | 23 -- .../carbondata/spark/rdd/CarbonMergerRDD.scala | 18 - .../processing/merger/CarbonCompactionUtil.java| 11 +++ .../store/CarbonFactDataHandlerModel.java | 3 ++- 7 files changed, 53 insertions(+), 31 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java index e78ea17..aa9dd05 100644 --- a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java +++ b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java @@ -1193,10 +1193,6 @@ public final class CarbonCommonConstants { public static final String CARBON_RANGE_COLUMN_SCALE_FACTOR_DEFAULT = "3"; - public static final String CARBON_ENABLE_RANGE_COMPACTION = "carbon.enable.range.compaction"; - - public static final String CARBON_ENABLE_RANGE_COMPACTION_DEFAULT = "false"; - // // Query parameter start here // diff --git a/core/src/main/java/org/apache/carbondata/core/scan/result/iterator/AbstractDetailQueryResultIterator.java b/core/src/main/java/org/apache/carbondata/core/scan/result/iterator/AbstractDetailQueryResultIterator.java index f39e549..d7f2c0b 100644 --- a/core/src/main/java/org/apache/carbondata/core/scan/result/iterator/AbstractDetailQueryResultIterator.java +++ b/core/src/main/java/org/apache/carbondata/core/scan/result/iterator/AbstractDetailQueryResultIterator.java @@ -24,7 +24,6 @@ import java.util.concurrent.ExecutorService; import org.apache.carbondata.common.CarbonIterator; import org.apache.carbondata.common.logging.LogServiceFactory; -import org.apache.carbondata.core.constants.CarbonCommonConstants; import org.apache.carbondata.core.datastore.DataRefNode; import org.apache.carbondata.core.datastore.FileReader; import org.apache.carbondata.core.datastore.block.AbstractIndex; @@ -89,18 +88,7 @@ public abstract class AbstractDetailQueryResultIterator extends CarbonIterato AbstractDetailQueryResultIterator(List infos, QueryModel 
queryModel, ExecutorService execService) { -String batchSizeString = - CarbonProperties.getInstance().getProperty(CarbonCommonConstants.DETAIL_QUERY_BATCH_SIZE); -if (null != batchSizeString) { - try { -batchSize = Integer.parseInt(batchSizeString); - } catch (NumberFormatException ne) { -LOGGER.error("Invalid inmemory records size. Using default value"); -batchSize = CarbonCommonConstants.DETAIL_QUERY_BATCH_SIZE_DEFAULT; - } -} else { - batchSize = CarbonCommonConstants.DETAIL_QUERY_BATCH_SIZE_DEFAULT; -} +batchSize = CarbonProperties.getQueryBatchSize(); this.recorder = queryModel.getStatisticsRecorder(); this.blockExecutionInfos = infos; this.fileReader = FileFactory.getFileHolder( diff --git a/core/src/main/java/org/apache/carbondata/core/scan/result/iterator/RawResultIterator.java b/core/src/main/java/org/apache/carbondata/core/scan/result/iterator/RawResultIterator.java index 4d471b6..911a
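The refactor replaces the inline parsing shown in the hunk with a single CarbonProperties.getQueryBatchSize(). What that lookup amounts to, with the property name and default assumed here ("carbon.detail.batch.size", 100):

```scala
// Parse the batch size once, falling back to the default on a missing or
// malformed value - the same behaviour the removed inline block had.
def getQueryBatchSize(props: java.util.Properties): Int = {
  val default = 100 // assumed DETAIL_QUERY_BATCH_SIZE_DEFAULT
  Option(props.getProperty("carbon.detail.batch.size")) match {
    case Some(s) =>
      try s.trim.toInt
      catch { case _: NumberFormatException => default }
    case None => default
  }
}
```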
[carbondata] branch master updated: [CARBONDATA-3397] Remove SparkUnknown Expression to Index Server
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new 15bae6e [CARBONDATA-3397]Remove SparkUnknown Expression to Index Server 15bae6e is described below commit 15bae6e5848bc83d4a6f65499fe7dacf88f5a67a Author: BJangir AuthorDate: Mon May 27 14:55:39 2019 +0530 [CARBONDATA-3397]Remove SparkUnknown Expression to Index Server Problem if Query has UDF and it is registered to the Main driver Since UDF function will not be available in Index server , query will be failed in Indexserver (with NoClassDefincationFound). Solution UDF are SparkUnkownFilter(RowLevelFilterExecuterImpl) so Remove the SparkUnknown Expression because anyway for pruning we select all blocks. org.apache.carbondata.core.scan.filter.executer.RowLevelFilterExecuterImpl#isScanRequired. Supply all the UDFs functions and it's related lambda expressions to IndexServer also. But it has below issues a. Spark FunctionRegistry is not writable b. sending All functions from Main Server to Index server will be costly(in Size) & no way to find implicit function and explicit user created functions. So Solution 1 is adopted. This closes #3238 --- .../core/datamap/DistributableDataMapFormat.java | 8 .../scan/filter/FilterExpressionProcessor.java | 43 ++ .../carbondata/indexserver/DataMapJobs.scala | 39 3 files changed, 90 insertions(+) diff --git a/core/src/main/java/org/apache/carbondata/core/datamap/DistributableDataMapFormat.java b/core/src/main/java/org/apache/carbondata/core/datamap/DistributableDataMapFormat.java index f76cfec..57540e4 100644 --- a/core/src/main/java/org/apache/carbondata/core/datamap/DistributableDataMapFormat.java +++ b/core/src/main/java/org/apache/carbondata/core/datamap/DistributableDataMapFormat.java @@ -334,4 +334,12 @@ public class DistributableDataMapFormat extends FileInputFormat
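A toy rendering of the adopted solution: before a filter tree is shipped to the index server, sub-expressions the server cannot evaluate (registered UDFs) are replaced by an always-true node, which keeps pruning conservative, exactly the "select all blocks" behaviour of RowLevelFilterExecuterImpl#isScanRequired:

```scala
// Miniature expression tree standing in for Carbon's filter expressions.
sealed trait Expr
case class And(left: Expr, right: Expr) extends Expr
case class Predicate(column: String, value: String) extends Expr
case class Unknown(udfName: String) extends Expr // a Spark UDF the server lacks
case object True extends Expr

def stripUnknown(e: Expr): Expr = e match {
  case And(l, r)  => And(stripUnknown(l), stripUnknown(r))
  case Unknown(_) => True // prune nothing for this sub-expression
  case other      => other
}

// stripUnknown(And(Predicate("c1", "x"), Unknown("myUdf")))
// => And(Predicate("c1", "x"), True)
```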
[carbondata] branch master updated: [CARBONDATA-3400] Support IndexServer for Spark-Shell in secure mode (Kerberos)
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new bf096e1 [CARBONDATA-3400] Support IndexSever for Spark-Shell in secure Mode(kerberos) bf096e1 is described below commit bf096e128f35865c7cd46cd5a5058c8e5227d773 Author: BJangir AuthorDate: Mon May 27 15:26:21 2019 +0530 [CARBONDATA-3400] Support IndexSever for Spark-Shell in secure Mode(kerberos) Problem In spark-shell OR Spark-Submit mode, Application user and IndexServer User are different . Application user is based on Kinit user OR based on spark.yarn.principle user whereas Indexserver user is based on spark.carbon.indexserver.principal . it is possible that both are different as Indexserver should have it's own authentication principle and should not depend on Application principle so that any application's Query(Thrifserver,Spark-shell,Spark-sql,Spark-Submit) can be served from IndexServer. Solution Authenticate the IndexServer by it's own principle and keytab. keytab is required so that long run application (client and indexserver ) does not impacted on token expire. Note:- Spark-default.conf of Thriftserver (beeline), spark-submit ,spark-sql should have both spark.carbon.indexserver.principal and spark.carbon.indexserver.keytab. This closes #3240 --- .../scala/org/apache/carbondata/indexserver/IndexServer.scala| 9 - 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/integration/spark2/src/main/scala/org/apache/carbondata/indexserver/IndexServer.scala b/integration/spark2/src/main/scala/org/apache/carbondata/indexserver/IndexServer.scala index e738fb3..f066095 100644 --- a/integration/spark2/src/main/scala/org/apache/carbondata/indexserver/IndexServer.scala +++ b/integration/spark2/src/main/scala/org/apache/carbondata/indexserver/IndexServer.scala @@ -167,9 +167,16 @@ object IndexServer extends ServerInterface { */ def getClient: ServerInterface = { import org.apache.hadoop.ipc.RPC +val indexServerUser = sparkSession.sparkContext.getConf + .get("spark.carbon.indexserver.principal", "") +val indexServerKeyTab = sparkSession.sparkContext.getConf + .get("spark.carbon.indexserver.keytab", "") +val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(indexServerUser, + indexServerKeyTab) +LOGGER.info("Login successful for user " + indexServerUser); RPC.getProxy(classOf[ServerInterface], RPC.getProtocolVersion(classOf[ServerInterface]), - new InetSocketAddress(serverIp, serverPort), UserGroupInformation.getLoginUser, + new InetSocketAddress(serverIp, serverPort), ugi, FileFactory.getConfiguration, NetUtils.getDefaultSocketFactory(FileFactory.getConfiguration)) } }
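The two properties the client reads, as they might appear in spark-defaults.conf; the principal and keytab values below are placeholders:

```
spark.carbon.indexserver.principal   indexserver/_HOST@EXAMPLE.COM
spark.carbon.indexserver.keytab      /etc/security/keytabs/indexserver.keytab
```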
[carbondata] branch master updated: [CARBONDATA-3364] Support Read from Hive. Queries are giving empty results from hive.
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new fcca6c5 [CARBONDATA-3364] Support Read from Hive. Queries are giving empty results from hive. fcca6c5 is described below commit fcca6c5b661ec02adfa17622e980a0c396bd84c2 Author: dhatchayani AuthorDate: Mon Apr 29 18:52:57 2019 +0530 [CARBONDATA-3364] Support Read from Hive. Queries are giving empty results from hive. This closes #3192 --- .../apache/carbondata/examples/HiveExample.scala | 99 +- .../apache/carbondata/examplesCI/RunExamples.scala | 3 +- integration/hive/pom.xml | 9 +- .../carbondata/hive/CarbonHiveInputSplit.java | 8 +- .../apache/carbondata/hive/CarbonHiveSerDe.java| 2 +- .../carbondata/hive/MapredCarbonInputFormat.java | 20 ++--- .../carbondata/hive/MapredCarbonOutputFormat.java | 12 ++- .../{ => test}/server/HiveEmbeddedServer2.java | 20 ++--- integration/spark-common-test/pom.xml | 6 ++ .../TestCreateHiveTableWithCarbonDS.scala | 4 +- integration/spark-common/pom.xml | 5 ++ .../apache/spark/util/CarbonReflectionUtils.scala | 17 ++-- .../spark/util/DictionaryLRUCacheTestCase.scala| 1 + pom.xml| 1 + 14 files changed, 123 insertions(+), 84 deletions(-) diff --git a/examples/spark2/src/main/scala/org/apache/carbondata/examples/HiveExample.scala b/examples/spark2/src/main/scala/org/apache/carbondata/examples/HiveExample.scala index b50e763..c043076 100644 --- a/examples/spark2/src/main/scala/org/apache/carbondata/examples/HiveExample.scala +++ b/examples/spark2/src/main/scala/org/apache/carbondata/examples/HiveExample.scala @@ -19,33 +19,36 @@ package org.apache.carbondata.examples import java.io.File import java.sql.{DriverManager, ResultSet, Statement} -import org.apache.spark.sql.SparkSession +import org.apache.hadoop.fs.Path +import org.apache.hadoop.fs.permission.{FsAction, FsPermission} import org.apache.carbondata.common.logging.LogServiceFactory -import org.apache.carbondata.core.constants.CarbonCommonConstants -import org.apache.carbondata.core.util.CarbonProperties +import org.apache.carbondata.core.datastore.impl.FileFactory import org.apache.carbondata.examples.util.ExampleUtils -import org.apache.carbondata.hive.server.HiveEmbeddedServer2 +import org.apache.carbondata.hive.test.server.HiveEmbeddedServer2 // scalastyle:off println object HiveExample { private val driverName: String = "org.apache.hive.jdbc.HiveDriver" - def main(args: Array[String]) { -val carbonSession = ExampleUtils.createCarbonSession("HiveExample") -exampleBody(carbonSession, CarbonProperties.getStorePath - + CarbonCommonConstants.FILE_SEPARATOR - + CarbonCommonConstants.DATABASE_DEFAULT_NAME) -carbonSession.stop() + val rootPath = new File(this.getClass.getResource("/").getPath + + "../../../..").getCanonicalPath + private val targetLoc = s"$rootPath/examples/spark2/target" + val metaStoreLoc = s"$targetLoc/metastore_db" + val storeLocation = s"$targetLoc/store" + val logger = LogServiceFactory.getLogService(this.getClass.getCanonicalName) + + def main(args: Array[String]) { +createCarbonTable(storeLocation) +readFromHive System.exit(0) } - def exampleBody(carbonSession: SparkSession, store: String): Unit = { -val logger = LogServiceFactory.getLogService(this.getClass.getCanonicalName) -val rootPath = new File(this.getClass.getResource("/").getPath - + "../../../..").getCanonicalPath + def createCarbonTable(store: String): Unit = { + +val 
carbonSession = ExampleUtils.createCarbonSession("HiveExample") carbonSession.sql("""DROP TABLE IF EXISTS HIVE_CARBON_EXAMPLE""".stripMargin) @@ -56,14 +59,44 @@ object HiveExample { | STORED BY 'carbondata' """.stripMargin) +val inputPath = FileFactory + .getUpdatedFilePath(s"$rootPath/examples/spark2/src/main/resources/sample.csv") + carbonSession.sql( s""" - | LOAD DATA LOCAL INPATH '$rootPath/examples/spark2/src/main/resources/sample.csv' + | LOAD DATA LOCAL INPATH '$inputPath' + | INTO TABLE HIVE_CARBON_EXAMPLE + """.stripMargin) + +carbonSession.sql( + s""" + | LOAD DATA LOCAL INPATH '$inputPath' | INTO TABLE HIVE_CARBON_EXAMPLE """.stripMargin) carbonSession.sql("SELECT * FROM HIVE_CARBON_EXAMPLE").show() +carbonSession.close()
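A minimal sketch of the Hive read path the reworked example exercises: connect over Hive JDBC and query the carbon table loaded above. The URL and port are placeholders; HiveExample gets its port from the embedded HiveServer2:

```scala
import java.sql.DriverManager

object HiveReadSketch {
  def main(args: Array[String]): Unit = {
    // Driver name as used in HiveExample.
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn =
      DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "")
    try {
      val rs = conn.createStatement()
        .executeQuery("SELECT * FROM HIVE_CARBON_EXAMPLE")
      val cols = rs.getMetaData.getColumnCount
      while (rs.next()) {
        println((1 to cols).map(i => rs.getString(i)).mkString(", "))
      }
    } finally conn.close()
  }
}
```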
[carbondata] branch master updated: [CARBONDATA-3395] Fix Exception when concurrent readers built with same split object
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new 36ee528 [CARBONDATA-3395] Fix Exception when concurrent readers built with same split object 36ee528 is described below commit 36ee52836c7bb7bc8e7a4cc6c294d7b77fdba2ee Author: ajantha-bhat AuthorDate: Fri May 24 19:50:57 2019 +0530 [CARBONDATA-3395] Fix Exception when concurrent readers built with same split object problem: Fix Exception when concurrent readers built with same split object cause: In CarbonInputSplit, BlockletDetailInfo and BlockletInfo are made lazy. so, BlockletInfo is prepared during reader builder. so, when two readers work on same split object, the state of this object is changed and leading to array out of bound issue. solution: a) synchronize BlockletInfo creation, b) load BlockletDetailInfo before passing to reader inside getSplit() API itself. c) Failure case get the proper identifier to cleanup the datamaps. d) build_with_splits, need to handle default projection filling if not configured. This closes #3232 --- .../carbondata/core/indexstore/BlockletDetailInfo.java | 6 +- .../carbondata/hadoop/api/CarbonFileInputFormat.java | 16 ++-- .../apache/carbondata/sdk/file/CarbonReaderBuilder.java | 14 ++ 3 files changed, 25 insertions(+), 11 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/indexstore/BlockletDetailInfo.java b/core/src/main/java/org/apache/carbondata/core/indexstore/BlockletDetailInfo.java index a5aa899..af07f09 100644 --- a/core/src/main/java/org/apache/carbondata/core/indexstore/BlockletDetailInfo.java +++ b/core/src/main/java/org/apache/carbondata/core/indexstore/BlockletDetailInfo.java @@ -108,7 +108,11 @@ public class BlockletDetailInfo implements Serializable, Writable { public BlockletInfo getBlockletInfo() { if (null == blockletInfo) { try { -setBlockletInfoFromBinary(); +synchronized (this) { + if (null == blockletInfo) { +setBlockletInfoFromBinary(); + } +} } catch (IOException e) { throw new RuntimeException(e); } diff --git a/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonFileInputFormat.java b/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonFileInputFormat.java index e83f898..1f34c4f 100644 --- a/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonFileInputFormat.java +++ b/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonFileInputFormat.java @@ -200,17 +200,21 @@ public class CarbonFileInputFormat extends CarbonInputFormat implements Se } }); } - if (getColumnProjection(job.getConfiguration()) == null) { -// If the user projection is empty, use default all columns as projections. -// All column name will be filled inside getSplits, so can update only here. -String[] projectionColumns = projectAllColumns(carbonTable); -setColumnProjection(job.getConfiguration(), projectionColumns); - } + setAllColumnProjectionIfNotConfigured(job, carbonTable); return splits; } return null; } + public void setAllColumnProjectionIfNotConfigured(JobContext job, CarbonTable carbonTable) { +if (getColumnProjection(job.getConfiguration()) == null) { + // If the user projection is empty, use default all columns as projections. + // All column name will be filled inside getSplits, so can update only here. 
+ String[] projectionColumns = projectAllColumns(carbonTable); + setColumnProjection(job.getConfiguration(), projectionColumns); +} + } + private List getAllCarbonDataFiles(String tablePath) { List carbonFiles; try { diff --git a/store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java b/store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java index 6ead50d..2db92ea 100644 --- a/store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java +++ b/store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonReaderBuilder.java @@ -358,8 +358,8 @@ public class CarbonReaderBuilder { } } catch (Exception ex) { // Clear the datamap cache as it can get added in getSplits() method - DataMapStoreManager.getInstance() - .clearDataMaps(format.getAbsoluteTableIdentifier(hadoopConf)); + DataMapStoreManager.getInstance().clearDataMaps( + format.getOrCreateCarbonTable((job.getConfiguration())).getAbsoluteTableIdentifier()); throw ex; } } @@ -372,6 +372,8 @@ public class CarbonReaderBuilder { } final Job job = new Job(new JobConf(hadoopConf)); CarbonFileInputFormat format
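The BlockletDetailInfo change is classic double-checked locking around lazy deserialisation. In plain Scala, the same once-only, thread-safe initialisation is what a lazy val gives for free:

```scala
// A lazy val is initialised under a lock exactly once, even when several
// reader threads race on the same object - the property the Java fix
// restores with synchronized plus a second null check.
class BlockletDetails(serialized: Array[Byte]) {
  lazy val blockletInfo: String = // stand-in for setBlockletInfoFromBinary()
    new String(serialized, java.nio.charset.StandardCharsets.UTF_8)
}
```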
[carbondata] branch master updated: [HOTFIX] Fix select * failure when MV datamap is enabled
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new faba657 [HOTFIX]Fix select * failure when MV datamap is enabled faba657 is described below commit faba657becafe3b68fe73af875385c57384dbc8f Author: akashrn5 AuthorDate: Mon May 27 12:28:00 2019 +0530 [HOTFIX]Fix select * failure when MV datamap is enabled Problem: when select * is executed with limit, ColumnPruning rule will remove the project node from the plan during optimization, so child of limit nod eis relation and it fails in modular plan generation Solution: so if child of Limit is relation, then make the select node and make the modular plan This closes #3235 --- .../carbondata/mv/rewrite/MVCreateTestCase.scala | 18 ++ .../carbondata/mv/plans/modular/ModularPatterns.scala | 10 ++ .../mv/plans/util/Logical2ModularExtractions.scala | 7 +++ 3 files changed, 35 insertions(+) diff --git a/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCreateTestCase.scala b/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCreateTestCase.scala index 4f5423e..48f967f 100644 --- a/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCreateTestCase.scala +++ b/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCreateTestCase.scala @@ -953,6 +953,23 @@ class MVCreateTestCase extends QueryTest with BeforeAndAfterAll { sql("drop table if exists all_table") } + test("test select * and distinct when MV is enabled") { +sql("drop table if exists limit_fail") +sql("CREATE TABLE limit_fail (empname String, designation String, doj Timestamp,workgroupcategory int, workgroupcategoryname String, deptno int, deptname String,projectcode int, projectjoindate Timestamp, projectenddate Timestamp,attendance int,utilization int,salary int)STORED BY 'org.apache.carbondata.format'") +sql(s"LOAD DATA local inpath '$resourcesPath/data_big.csv' INTO TABLE limit_fail OPTIONS" + +"('DELIMITER'= ',', 'QUOTECHAR'= '\"')") +sql("create datamap limit_fail_dm1 using 'mv' as select empname,designation from limit_fail") +try { + val df = sql("select distinct(empname) from limit_fail limit 10") + sql("select * from limit_fail limit 10").show() + val analyzed = df.queryExecution.analyzed + assert(verifyMVDataMap(analyzed, "limit_fail_dm1")) +} catch { + case ex: Exception => +assert(false) +} + } + def verifyMVDataMap(logicalPlan: LogicalPlan, dataMapName: String): Boolean = { val tables = logicalPlan collect { case l: LogicalRelation => l.catalogTable.get @@ -970,6 +987,7 @@ class MVCreateTestCase extends QueryTest with BeforeAndAfterAll { sql("drop table IF EXISTS fact_streaming_table1") sql("drop table IF EXISTS fact_streaming_table2") sql("drop table IF EXISTS fact_table_parquet") +sql("drop table if exists limit_fail") } override def afterAll { diff --git a/datamap/mv/plan/src/main/scala/org/apache/carbondata/mv/plans/modular/ModularPatterns.scala b/datamap/mv/plan/src/main/scala/org/apache/carbondata/mv/plans/modular/ModularPatterns.scala index a4116d9..30857c8 100644 --- a/datamap/mv/plan/src/main/scala/org/apache/carbondata/mv/plans/modular/ModularPatterns.scala +++ b/datamap/mv/plan/src/main/scala/org/apache/carbondata/mv/plans/modular/ModularPatterns.scala @@ -19,6 +19,7 @@ package org.apache.carbondata.mv.plans.modular import org.apache.spark.sql.catalyst.expressions.{Expression, NamedExpression, PredicateHelper, _} import 
org.apache.spark.sql.catalyst.plans.logical._ +import org.apache.spark.sql.execution.datasources.LogicalRelation import org.apache.carbondata.mv.plans.{Pattern, _} import org.apache.carbondata.mv.plans.modular.Flags._ @@ -118,6 +119,15 @@ abstract class ModularPatterns extends Modularizer[ModularPlan] { makeSelectModule(output, input, predicate, aliasmap, joinedge, flags, children.map(modularizeLater), Seq(Seq(limitExpr)) ++ fspec1, wspec) +// if select * is with limit, then projection is removed from plan, so send the parent plan +// to ExtractSelectModule to make the select node +case limit@Limit(limitExpr, lr: LogicalRelation) => + val (output, input, predicate, aliasmap, joinedge, children, flags1, + fspec1, wspec) = ExtractSelectModule.unapply(limit).get + val flags = flags1.setFlag(LIMIT) + makeSelectModule(output, input, predicate, aliasmap, joinedge, flags, +children.map(modularizeLater), Seq(Seq(limitExpr)) ++ f
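A toy version of the new rule over a miniature plan type: when ColumnPruning has removed the Project under a Limit, a Select node is re-introduced over the bare relation so modularisation can proceed:

```scala
sealed trait Plan
case class Relation(table: String) extends Plan
case class Select(child: Plan) extends Plan
case class Limit(n: Int, child: Plan) extends Plan

def modularize(p: Plan): Plan = p match {
  // mirrors case Limit(limitExpr, lr: LogicalRelation) in ModularPatterns
  case Limit(n, r: Relation) => Limit(n, Select(r))
  case other                 => other
}

// modularize(Limit(10, Relation("limit_fail")))
// => Limit(10, Select(Relation("limit_fail")))
```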
[carbondata] branch master updated: [CARBONDATA-3387] Support Partition with MV datamap & Show DataMap Status
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new 51235d4 [CARBONDATA-3387] Support Partition with MV datamap & Show DataMap Status 51235d4 is described below commit 51235d4cf239ea0d167623fed5ae339796d56eae Author: Indhumathi27 AuthorDate: Mon May 13 11:08:31 2019 +0530 [CARBONDATA-3387] Support Partition with MV datamap & Show DataMap Status This PR includes, Support Partition with Mv Datamap [Datamap with single parent table] Show DataMap status and ParentTable to Datamap table segment Sync Information with SHOW DATAMAP ddl Optimization for Incremental DataLoad. In case of below scenario we can avoid reloading the MV Maintable segments:0,1,2 MV: 0 => 0,1,2 Now after maintable compaction it will reload the 0.1 segment of maintable to MV, this is avoided by changing the mapping {0,1,2}=>{0.1} This closes #3216 --- .../core/constants/CarbonCommonConstants.java | 2 + .../carbondata/core/datamap/DataMapProvider.java | 64 +- .../core/metadata/schema/table/DataMapSchema.java | 13 + datamap/mv/core/pom.xml| 2 +- .../carbondata/mv/datamap/MVDataMapProvider.scala | 12 +- .../apache/carbondata/mv/datamap/MVHelper.scala| 75 ++- .../org/apache/carbondata/mv/datamap/MVUtil.scala | 3 +- .../mv/rewrite/MVIncrementalLoadingTestcase.scala | 23 + .../mv/rewrite/TestAllOperationsOnMV.scala | 138 - .../mv/rewrite/TestPartitionWithMV.scala | 688 + datamap/mv/plan/pom.xml| 2 +- .../mv/plans/util/BirdcageOptimizer.scala | 4 +- .../testsuite/datamap/TestDataMapCommand.scala | 10 +- ...StandardPartitionWithPreaggregateTestCase.scala | 10 + .../scala/org/apache/spark/sql/CarbonEnv.scala | 5 +- .../datamap/CarbonCreateDataMapCommand.scala | 36 +- .../command/datamap/CarbonDataMapShowCommand.scala | 54 +- .../command/management/CarbonLoadDataCommand.scala | 10 +- .../execution/command/mv/DataMapListeners.scala| 113 +++- .../CarbonAlterTableDropHivePartitionCommand.scala | 4 - .../preaaggregate/PreAggregateListeners.scala | 2 +- .../command/table/CarbonDropTableCommand.scala | 14 +- .../spark/sql/execution/strategy/DDLStrategy.scala | 4 + .../processing/util/CarbonLoaderUtil.java | 43 ++ 24 files changed, 1280 insertions(+), 51 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java index 9375414..e78ea17 100644 --- a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java +++ b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java @@ -2174,4 +2174,6 @@ public final class CarbonCommonConstants { */ public static final String PARENT_TABLES = "parent_tables"; + public static final String LOAD_SYNC_TIME = "load_sync_time"; + } diff --git a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapProvider.java b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapProvider.java index fe2e7dd..c4ee49b 100644 --- a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapProvider.java +++ b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapProvider.java @@ -264,23 +264,52 @@ public abstract class DataMapProvider { } else { for (RelationIdentifier relationIdentifier : relationIdentifiers) { List dataMapTableSegmentList = new ArrayList<>(); +// Get all segments for parent relationIdentifier +List 
mainTableSegmentList = +DataMapUtil.getMainTableValidSegmentList(relationIdentifier); +boolean ifTableStatusUpdateRequired = false; for (LoadMetadataDetails loadMetaDetail : listOfLoadFolderDetails) { if (loadMetaDetail.getSegmentStatus() == SegmentStatus.SUCCESS || loadMetaDetail.getSegmentStatus() == SegmentStatus.INSERT_IN_PROGRESS) { Map> segmentMaps = DataMapSegmentStatusUtil.getSegmentMap(loadMetaDetail.getExtraInfo()); -dataMapTableSegmentList.addAll(segmentMaps.get( -relationIdentifier.getDatabaseName() + CarbonCommonConstants.POINT -+ relationIdentifier.getTableName())); +String mainTableMetaDataPath = + CarbonTablePath.getMetadataPath(relationIdentifier.getTablePath()); +LoadMetadataDetails[] parentTableLoadMetaDataDetails = +SegmentStatusManager.readLoadMetadata(mainTableMetaDataPath); +String table = relationIdentifier.getDa
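The segment-mapping optimisation in miniature, assuming the MV-to-parent mapping and the compaction result are available as plain maps:

```scala
// If every parent segment an MV load points at was merged into one compacted
// segment, remap the MV load to the compacted segment instead of reloading:
// {0,1,2} => {0.1}, as described in the commit message.
def remapAfterCompaction(
    mvToParent: Map[String, Set[String]],
    compacted: Map[Set[String], String]): Map[String, Set[String]] =
  mvToParent.map { case (mvSegment, parents) =>
    mvSegment -> compacted.get(parents).map(Set(_)).getOrElse(parents)
  }

// remapAfterCompaction(Map("0" -> Set("0", "1", "2")),
//                      Map(Set("0", "1", "2") -> "0.1"))
// => Map("0" -> Set("0.1"))
```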
[carbondata] branch master updated: [CARBONDATA-3392] Make LRU mandatory for index server
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new df7339c [CARBONDATA-3392] Make LRU mandatory for index server df7339c is described below commit df7339ce005be48dfb440e4cd02f640d6555e887 Author: kunal642 AuthorDate: Wed May 15 16:40:28 2019 +0530 [CARBONDATA-3392] Make LRU mandatory for index server Background: Currently LRU is optional for the user to configure, but this will raise some concerns in case of index server because the invalid segments have to be constantly removed from the cache in case of update/delete/compaction scenarios. Therefore if clear segment job is failed then the job would not fail bu there has to be a mechanism to prevent that segment from being in cache forever. To prevent the above mentioned scenario LRU cache size for executor is a mandatory property for the index server application. This closes #3222 --- .../carbondata/core/datamap/DataMapUtil.java | 10 +- .../carbondata/core/util/BlockletDataMapUtil.java | 2 +- .../hadoop/api/CarbonTableInputFormat.java | 39 +- .../carbondata/indexserver/DataMapJobs.scala | 18 -- .../indexserver/DistributedPruneRDD.scala | 12 +-- .../carbondata/indexserver/IndexServer.scala | 19 +-- .../spark/rdd/CarbonDataRDDFactory.scala | 10 -- .../sql/execution/command/cache/CacheUtil.scala| 15 +++-- .../command/cache/CarbonShowCacheCommand.scala | 23 - 9 files changed, 86 insertions(+), 62 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapUtil.java b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapUtil.java index e20f19a..2371a10 100644 --- a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapUtil.java +++ b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapUtil.java @@ -115,7 +115,15 @@ public class DataMapUtil { DistributableDataMapFormat dataMapFormat = new DistributableDataMapFormat(carbonTable, validAndInvalidSegmentsInfo.getValidSegments(), invalidSegment, true, dataMapToClear); -dataMapJob.execute(dataMapFormat); +try { + dataMapJob.execute(dataMapFormat); +} catch (Exception e) { + if (dataMapJob.getClass().getName().equalsIgnoreCase(DISTRIBUTED_JOB_NAME)) { +LOGGER.warn("Failed to clear distributed cache.", e); + } else { +throw e; + } +} } public static void executeClearDataMapJob(CarbonTable carbonTable, String jobClassName) diff --git a/core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java b/core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java index c90c3dc..68aad72 100644 --- a/core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java +++ b/core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java @@ -228,7 +228,7 @@ public class BlockletDataMapUtil { List tableBlockIndexUniqueIdentifiers = new ArrayList<>(); String mergeFilePath = identifier.getIndexFilePath() + CarbonCommonConstants.FILE_SEPARATOR + identifier -.getMergeIndexFileName(); +.getIndexFileName(); segmentIndexFileStore.readMergeFile(mergeFilePath); List indexFiles = segmentIndexFileStore.getCarbonMergeFileToIndexFilesMap().get(mergeFilePath); diff --git a/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java b/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java index dd86dcb..274c7ef 100644 --- 
a/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java +++ b/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java @@ -557,22 +557,31 @@ public class CarbonTableInputFormat extends CarbonInputFormat { } if (isIUDTable || isUpdateFlow) { Map blockletToRowCountMap = new HashMap<>(); - if (CarbonProperties.getInstance().isDistributedPruningEnabled(table.getDatabaseName(), - table.getTableName())) { -List extendedBlocklets = CarbonTableInputFormat.convertToCarbonInputSplit( -getDistributedSplit(table, null, partitions, filteredSegment, -allSegments.getInvalidSegments(), toBeCleanedSegments)); -for (InputSplit extendedBlocklet : extendedBlocklets) { - CarbonInputSplit blocklet = (CarbonInputSplit) extendedBlocklet; - String filePath = blocklet.getFilePath(); - String blockName = filePath.substring(filePath.lastIndexOf("/") + 1); - blockletToRowCountMap.put(blocklet.getSegmentId() + "," + blockName, - (
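The fail-fast check this policy implies at index-server startup, sketched; the property name is assumed here ("carbon.max.executor.lru.cache.size"):

```scala
// Refuse to start the index server when the executor LRU size is missing,
// rather than letting invalid segments accumulate in an unbounded cache.
def validateLruConfigured(props: java.util.Properties): Unit =
  if (props.getProperty("carbon.max.executor.lru.cache.size") == null) {
    throw new IllegalArgumentException(
      "carbon.max.executor.lru.cache.size must be set to run the index server")
  }
```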
[carbondata] branch master updated: [CARBONDATA-3357] Support TableProperties from single parent table and restrict alter/delete/partition on mv
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new 2a28dba [CARBONDATA-3357] Support TableProperties from single parent table and restrict alter/delete/partition on mv 2a28dba is described below commit 2a28dba04236ce976984d9cbc398eb8fa517d6f5 Author: Indhumathi27 AuthorDate: Wed Apr 24 01:04:21 2019 +0530 [CARBONDATA-3357] Support TableProperties from single parent table and restrict alter/delete/partition on mv Inherit Table Properties from main table to mv datamap table, if datamap has single parent table, else use default table properties. Restrict Alter/Delete/Partition operations on MV This closes #3184 --- .../core/datamap/DataMapStoreManager.java | 27 +- .../carbondata/core/datamap/DataMapUtil.java | 1 + .../core/metadata/schema/table/CarbonTable.java| 17 -- .../core/metadata/schema/table/DataMapSchema.java | 14 + .../carbondata/mv/datamap/MVDataMapProvider.scala | 19 +- .../apache/carbondata/mv/datamap/MVHelper.scala| 110 ++-- .../org/apache/carbondata/mv/datamap/MVUtil.scala | 287 + .../mv/rewrite/MVCountAndCaseTestCase.scala| 2 - .../carbondata/mv/rewrite/MVCreateTestCase.scala | 29 +-- .../mv/rewrite/MVIncrementalLoadingTestcase.scala | 1 - .../mv/rewrite/MVMultiJoinTestCase.scala | 8 +- .../carbondata/mv/rewrite/MVTpchTestCase.scala | 10 +- .../mv/rewrite/TestAllOperationsOnMV.scala | 255 ++ .../mv/rewrite/matching/TestSQLBatch.scala | 4 +- .../preaggregate/TestPreAggregateLoad.scala| 2 +- .../TestTimeSeriesUnsupportedSuite.scala | 8 +- .../scala/org/apache/spark/sql/CarbonEnv.scala | 9 +- .../command/datamap/CarbonDropDataMapCommand.scala | 9 + .../management/CarbonCleanFilesCommand.scala | 3 +- .../execution/command/mv/DataMapListeners.scala| 146 ++- .../CarbonAlterTableDropHivePartitionCommand.scala | 7 +- .../preaaggregate/PreAggregateListeners.scala | 6 +- .../preaaggregate/PreAggregateTableHelper.scala| 102 +--- .../schema/CarbonAlterTableRenameCommand.scala | 7 +- .../spark/sql/execution/strategy/DDLStrategy.scala | 4 +- .../spark/sql/hive/CarbonAnalysisRules.scala | 10 +- .../scala/org/apache/spark/util/DataMapUtil.scala | 160 27 files changed, 1054 insertions(+), 203 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java index 81b1fb2..89402c2 100644 --- a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java +++ b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java @@ -281,19 +281,22 @@ public final class DataMapStoreManager { dataMapCatalogs = new ConcurrentHashMap<>(); List dataMapSchemas = getAllDataMapSchemas(); for (DataMapSchema schema : dataMapSchemas) { -DataMapCatalog dataMapCatalog = dataMapCatalogs.get(schema.getProviderName()); -if (dataMapCatalog == null) { - dataMapCatalog = dataMapProvider.createDataMapCatalog(); - if (null == dataMapCatalog) { -throw new RuntimeException("Internal Error."); +if (schema.getProviderName() + .equalsIgnoreCase(dataMapProvider.getDataMapSchema().getProviderName())) { + DataMapCatalog dataMapCatalog = dataMapCatalogs.get(schema.getProviderName()); + if (dataMapCatalog == null) { +dataMapCatalog = dataMapProvider.createDataMapCatalog(); +if (null == dataMapCatalog) { + throw new RuntimeException("Internal Error."); +} 
+dataMapCatalogs.put(schema.getProviderName(), dataMapCatalog); + } + try { +dataMapCatalog.registerSchema(schema); + } catch (Exception e) { +// Ignore the schema +LOGGER.error("Error while registering schema", e); } - dataMapCatalogs.put(schema.getProviderName(), dataMapCatalog); -} -try { - dataMapCatalog.registerSchema(schema); -} catch (Exception e) { - // Ignore the schema - LOGGER.error("Error while registering schema", e); } } } diff --git a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapUtil.java b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapUtil.java index 0a604fb..e20f19a 100644 --- a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapUtil.java +++ b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapUtil.java @@ -270,4 +270,5 @@
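The inheritance rule in miniature, assuming table properties are plain string maps:

```scala
// Single parent: the MV table inherits the parent's properties; a join over
// several parents falls back to the defaults, per the commit message.
def mvTableProperties(
    parents: Seq[Map[String, String]],
    defaults: Map[String, String]): Map[String, String] =
  parents match {
    case Seq(single) => defaults ++ single
    case _           => defaults
  }
```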
[carbondata] branch master updated: [CARBONDATA-3384] Fix NullPointerException for update/delete using index server
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new bd16325 [CARBONDATA-3384] Fix NullPointerException for update/delete using index server bd16325 is described below commit bd1632564acb248db7080b9fd5f76b8e8da79101 Author: kunal642 AuthorDate: Wed May 15 11:35:18 2019 +0530 [CARBONDATA-3384] Fix NullPointerException for update/delete using index server Problem: After update the segment cache is cleared from the executor, then in any subsequent query only one index file is considered for creating the BlockUniqueIdentifier. Therefore the query throws NullPointer when accessing the segmentProperties. Solution: Consider all index file for the segment for Identifier creation. This closes #3218 --- .../indexstore/blockletindex/BlockletDataMapFactory.java | 4 ++-- .../carbondata/hadoop/api/CarbonTableInputFormat.java| 4 +++- .../indexserver/InvalidateSegmentCacheRDD.scala | 16 ++-- 3 files changed, 15 insertions(+), 9 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java b/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java index e4a3ad8..446507f 100644 --- a/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java +++ b/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java @@ -344,6 +344,7 @@ public class BlockletDataMapFactory extends CoarseGrainDataMapFactory Set tableBlockIndexUniqueIdentifiers = segmentMap.get(distributable.getSegment().getSegmentNo()); if (tableBlockIndexUniqueIdentifiers == null) { + tableBlockIndexUniqueIdentifiers = new HashSet<>(); Set indexFiles = distributable.getSegment().getCommittedIndexFile().keySet(); for (String indexFile : indexFiles) { CarbonFile carbonFile = FileFactory.getCarbonFile(indexFile); @@ -363,10 +364,9 @@ public class BlockletDataMapFactory extends CoarseGrainDataMapFactory identifiersWrapper.add( new TableBlockIndexUniqueIdentifierWrapper(tableBlockIndexUniqueIdentifier, this.getCarbonTable())); -tableBlockIndexUniqueIdentifiers = new HashSet<>(); tableBlockIndexUniqueIdentifiers.add(tableBlockIndexUniqueIdentifier); -segmentMap.put(distributable.getSegment().getSegmentNo(), tableBlockIndexUniqueIdentifiers); } + segmentMap.put(distributable.getSegment().getSegmentNo(), tableBlockIndexUniqueIdentifiers); } else { for (TableBlockIndexUniqueIdentifier tableBlockIndexUniqueIdentifier : tableBlockIndexUniqueIdentifiers) { diff --git a/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java b/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java index 458c95e..dd86dcb 100644 --- a/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java +++ b/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java @@ -564,7 +564,9 @@ public class CarbonTableInputFormat extends CarbonInputFormat { allSegments.getInvalidSegments(), toBeCleanedSegments)); for (InputSplit extendedBlocklet : extendedBlocklets) { CarbonInputSplit blocklet = (CarbonInputSplit) extendedBlocklet; - blockletToRowCountMap.put(blocklet.getSegmentId() + "," + blocklet.getFilePath(), + String filePath = blocklet.getFilePath(); + String blockName = filePath.substring(filePath.lastIndexOf("/") + 1); + 
blockletToRowCountMap.put(blocklet.getSegmentId() + "," + blockName, (long) blocklet.getDetailInfo().getRowCount()); } } else { diff --git a/integration/spark2/src/main/scala/org/apache/carbondata/indexserver/InvalidateSegmentCacheRDD.scala b/integration/spark2/src/main/scala/org/apache/carbondata/indexserver/InvalidateSegmentCacheRDD.scala index 1aa8cd9..bc83d2f 100644 --- a/integration/spark2/src/main/scala/org/apache/carbondata/indexserver/InvalidateSegmentCacheRDD.scala +++ b/integration/spark2/src/main/scala/org/apache/carbondata/indexserver/InvalidateSegmentCacheRDD.scala @@ -43,12 +43,16 @@ class InvalidateSegmentCacheRDD(@transient private val ss: SparkSession, databas } override protected def internalGetPartitions: Array[Partition] = { -executorsList.zipWithIndex.map { - case (executor, idx) => -// create a dummy split for each executor to accumulate the cache size. -val dummySplit = new CarbonInputSplit() -dummySplit.setLocation(Array(executor)) -
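To make the fix above concrete, here is a simplified sketch (plain Scala with illustrative names, not the actual CarbonData classes) of the corrected pattern: the identifier set is created once before iterating the committed index files, and registered for the segment once, after the loop.

import scala.collection.mutable

case class Identifier(indexFile: String)

val segmentMap = mutable.Map[String, mutable.Set[Identifier]]()

def registerSegment(segmentNo: String, indexFiles: Seq[String]): Unit = {
  // Create the set once for the segment ...
  val identifiers = mutable.Set[Identifier]()
  // ... collect an identifier for every committed index file ...
  indexFiles.foreach(f => identifiers += Identifier(f))
  // ... and register it once, outside the loop. Before the fix, the set was
  // re-created and stored inside the loop, so later lookups could see a set
  // built from a single index file and fail with a NullPointerException.
  segmentMap.put(segmentNo, identifiers)
}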
[carbondata] branch master updated: [HOTFIX] exclude logback from arrow dependency
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git
The following commit(s) were added to refs/heads/master by this push: new df71291 [HOTFIX] exclude logback from arrow dependency
df71291 is described below
commit df71291ec6a87cbf1c3e03cf728959abf2990faf Author: ajantha-bhat AuthorDate: Tue May 21 14:46:20 2019 +0530
[HOTFIX] exclude logback from arrow dependency
logback is a logging framework with a default log level of DEBUG; arrow was importing it as a transitive dependency. Because of this, the log level of every library was set to DEBUG, producing huge logs. It is now excluded from the dependencies.
This closes #3228
--- store/sdk/pom.xml | 48 1 file changed, 48 insertions(+)
diff --git a/store/sdk/pom.xml b/store/sdk/pom.xml index a1d594d..6f04a58 100644 --- a/store/sdk/pom.xml +++ b/store/sdk/pom.xml @@ -49,6 +49,12 @@ org.apache.arrow arrow-format 0.12.0 + + + ch.qos.logback + logback-classic + + org.apache.arrow @@ -56,6 +62,10 @@ 0.12.0 + ch.qos.logback + logback-classic + + io.netty netty-common @@ -71,6 +81,10 @@ 0.12.0 + ch.qos.logback + logback-classic + + io.netty netty-common @@ -84,6 +98,12 @@ org.apache.arrow arrow-plasma 0.12.0 + + + ch.qos.logback + logback-classic + + org.apache.arrow @@ -91,6 +111,10 @@ 0.12.0 + ch.qos.logback + logback-classic + + io.netty netty-buffer @@ -100,21 +124,45 @@ org.apache.arrow arrow-tools 0.12.0 + + + ch.qos.logback + logback-classic + + com.fasterxml.jackson.core jackson-core ${dep.jackson.version} + + + ch.qos.logback + logback-classic + + com.fasterxml.jackson.core jackson-annotations ${dep.jackson.version} + + + ch.qos.logback + logback-classic + + com.fasterxml.jackson.core jackson-databind ${dep.jackson.version} + + + ch.qos.logback + logback-classic + +
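For context, a minimal sketch (assuming SLF4J and logback-classic on the classpath, with no logback.xml configured) of why the stray backend is noisy: logback's default root level is DEBUG, so every library logger becomes debug-enabled.

import org.slf4j.LoggerFactory

object LogLevelCheck {
  def main(args: Array[String]): Unit = {
    // With logback-classic present and unconfigured, this prints "true",
    // which is why all transitive libraries suddenly log at DEBUG.
    val logger = LoggerFactory.getLogger("org.apache.arrow")
    println("debug enabled: " + logger.isDebugEnabled)
  }
}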
[carbondata] branch master updated: [CARBONDATA-3303] Fix MV datamap returning wrong results when using coalesce and fewer group-by columns
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git
The following commit(s) were added to refs/heads/master by this push: new a2b7d20 [CARBONDATA-3303] Fix MV datamap returning wrong results when using coalesce and fewer group-by columns
a2b7d20 is described below
commit a2b7d20339a8ee28e1695e8eac9e1afa2c3a5b03 Author: qiuchenjian <807169...@qq.com> AuthorDate: Tue Feb 26 14:50:26 2019 +0800
[CARBONDATA-3303] Fix MV datamap returning wrong results when using coalesce and fewer group-by columns
Problem: The MV datamap returns wrong results when using coalesce and the query SQL's group-by columns are fewer than the MV SQL's:
create table coalesce_test_main(id int,name string,height int,weight int) using carbondata
insert into coalesce_test_main select 1,'tom',170,130
insert into coalesce_test_main select 2,'tom',170,120
insert into coalesce_test_main select 3,'lily',160,100
create datamap coalesce_test_main_mv using 'mv' as select coalesce(sum(id),0) as sum_id,name as myname,weight from coalesce_test_main group by name,weight
select coalesce(sum(id),0) as sumid,name from coalesce_test_main group by name
The query results: 1 tom 2 tom 3 lily
Solution: When the query SQL's group-by columns are fewer than the MV SQL's and the MV SQL has a coalesce expression, the MV table can't compute the right result, so the MV shouldn't take effect in this scenario.
This closes #3135
--- .../apache/carbondata/mv/datamap/MVHelper.scala | 14 +++- .../carbondata/mv/rewrite/MVCoalesceTestCase.scala | 91 ++ .../carbondata/mv/rewrite/MVRewriteTestCase.scala | 4 +- 3 files changed, 106 insertions(+), 3 deletions(-)
diff --git a/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala b/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala index 6d0b2d3..810449c 100644 --- a/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala +++ b/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala @@ -25,7 +25,7 @@ import scala.collection.mutable.ArrayBuffer import org.apache.spark.sql.{CarbonEnv, CarbonToSparkAdapter, SparkSession} import org.apache.spark.sql.catalyst.TableIdentifier import org.apache.spark.sql.catalyst.catalog.CatalogTable -import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, AttributeReference, Cast, Expression, NamedExpression, ScalaUDF, SortOrder} +import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, AttributeReference, Cast, Coalesce, Expression, NamedExpression, ScalaUDF, SortOrder} import org.apache.spark.sql.catalyst.expressions.aggregate._ import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, Join, LogicalPlan, Project} import org.apache.spark.sql.execution.command.{Field, TableModel, TableNewProcessor} @@ -184,6 +184,18 @@ object MVHelper { if (catalog.isMVWithSameQueryPresent(logicalPlan)) { throw new UnsupportedOperationException("MV with same query present") } + +var expressionValid = true +modularPlan.transformExpressions { + case coal@Coalesce(_) if coal.children.exists( +exp => exp.isInstanceOf[AggregateExpression]) => +expressionValid = false +coal +} + +if (!expressionValid) { + throw new UnsupportedOperationException("MV doesn't support Coalesce") +} } def updateColumnName(attr: Attribute): String = { diff --git a/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCoalesceTestCase.scala
b/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCoalesceTestCase.scala new file mode 100644 index 000..f2a27c7 --- /dev/null +++ b/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCoalesceTestCase.scala @@ -0,0 +1,91 @@ +/* +* Licensed to the Apache Software Foundation (ASF) under one or more +* contributor license agreements. See the NOTICE file distributed with +* this work for additional information regarding copyright ownership. +* The ASF licenses this file to You under the Apache License, Version 2.0 +* (the "License"); you may not use this file except in compliance with +* the License. You may obtain a copy of the License at +* +*http://www.apache.org/licenses/LICENSE-2.0 +* +* Unless required by applicable law or agreed to in writing, software +* distributed under the License is distributed on an "AS IS" BASIS, +* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +* See the License for the specific language governing permissions and +* limitations under the License. +*/ +package org.apache.carbondata.m
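A sketch, in the style of the project's MV test suites, of how the new guard could be exercised (sql and intercept are the usual QueryTest/ScalaTest helpers; the test file above is truncated here, so this is illustrative rather than the committed test):

sql("create table coalesce_test_main(id int, name string, height int, weight int) using carbondata")
// After this fix, an MV whose projection wraps an aggregate in coalesce is
// rejected at creation time instead of silently returning wrong results.
val ex = intercept[UnsupportedOperationException] {
  sql("""create datamap coalesce_test_main_mv using 'mv' as
        | select coalesce(sum(id), 0) as sum_id, name as myname, weight
        | from coalesce_test_main group by name, weight""".stripMargin)
}
assert(ex.getMessage.contains("MV doesn't support Coalesce"))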
[carbondata] branch master updated: [CARBONDATA-3309] MV datamap supports Spark 2.1
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git
The following commit(s) were added to refs/heads/master by this push: new 4d7c8ad [CARBONDATA-3309] MV datamap supports Spark 2.1
4d7c8ad is described below
commit 4d7c8ada98ed15511d0abff349b64522f047344b Author: qiuchenjian <807169...@qq.com> AuthorDate: Sun Mar 17 20:04:48 2019 +0800
[CARBONDATA-3309] MV datamap supports Spark 2.1
[Problem] The MV datamap doesn't support the Spark 2.1 version, so support needs to be added.
[Solution] The modification points are listed below, and all MV test cases pass on Spark 2.1.
Classes we can't access in Spark 2.1: (1) org.apache.spark.internal.Logging (2) org.apache.spark.sql.internal.SQLConf. Solution: create classes that extend the above classes.
Classes that Spark 2.1 doesn't have: (1) org.apache.spark.sql.catalyst.plans.logical.Subquery (2) org.apache.spark.sql.catalyst.catalog.interface.HiveTableRelation. Solution: use CatalogRelation instead, don't use it in LogicalPlanSignatureGenerator, and move the Subquery code to the carbon project. (See the adapter sketch after this list.)
A method we can't access in Spark 2.1: (1) sparkSession.sessionState.catalog.lookupRelation. Solution: add this method to SparkSQLUtil.
Classes whose interfaces changed: (1) org.apache.spark.sql.catalyst.expressions.SortOrder (2) org.apache.spark.sql.catalyst.expressions.Cast (3) org.apache.spark.sql.catalyst.plans.Statistics. Solution: adapt to the new interfaces.
Methods that Spark 2.1 doesn't have: (1) normalizeExprId, canonicalized of org.apache.spark.sql.catalyst.plans.QueryPlan (2) CASE_SENSITIVE of SQLConf (3) STARSCHEMA_DETECTION of SQLConf. Solution: don't use normalizeExprId, canonicalized, CASE_SENSITIVE or STARSCHEMA_DETECTION.
Logical-plan optimization rules that Spark 2.1 doesn't have: (1) SimplifyCreateMapOps (2) SimplifyCreateArrayOps (3) SimplifyCreateStructOps (4) RemoveRedundantProject (5) RemoveRedundantAliases (6) PullupCorrelatedPredicates (7) ReplaceDeduplicateWithAggregate (8) EliminateView. Solution: delete the rules or move their code to the carbon project, and generate the appropriate instances in SparkSQLUtil to adapt to Spark 2.1. Query SQL passes the MV check on Spark 2.1 (CarbonSessionState).
This closes #3150
--- .../carbondata/mv/datamap/MVDataMapProvider.scala | 2 +- .../apache/carbondata/mv/datamap/MVHelper.scala | 2 +- .../apache/carbondata/mv/rewrite/MatchMaker.scala | 2 +- .../mv/rewrite/SummaryDatasetCatalog.scala | 5 +- .../carbondata/mv/rewrite/TestSQLSuite.scala | 4 +- .../carbondata/mv/rewrite/Tpcds_1_4_Suite.scala | 4 +- .../mv/expressions/modular/subquery.scala | 13 ++- .../mv/plans/modular/AggregatePushDown.scala | 8 +- .../carbondata/mv/plans/modular/Harmonizer.scala | 2 +- .../carbondata/mv/plans/modular/ModularPlan.scala | 8 +- .../mv/plans/modular/ModularRelation.scala | 22 +--- .../carbondata/mv/plans/modular/Modularizer.scala | 2 +- .../mv/plans/util/BirdcageOptimizer.scala | 10 +- .../mv/plans/util/Logical2ModularExtractions.scala | 19 +-- .../carbondata/mv/plans/util/SQLBuildDSL.scala | 5 +- .../carbondata/mv/plans/util/SQLBuilder.scala | 9 -- .../carbondata/mv/plans/util/Signature.scala | 2 +- .../carbondata/mv/testutil/Tpcds_1_4_Tables.scala | 4 +- .../carbondata/mv/plans/ModularToSQLSuite.scala | 4 +- .../carbondata/mv/plans/SignatureSuite.scala | 4 +- .../spark/sql/catalyst/analysis/EmptyRule.scala | 26 + .../org/apache/spark/sql/util/SparkSQLUtil.scala | 113 +- .../apache/spark/util/CarbonReflectionUtils.scala | 7 ++ .../src/main/scala/org/apache/spark/Logging.scala | 22 .../main/scala/org/apache/spark/sql/SQLConf.scala | 23 .../apache/spark/sql/CarbonToSparkAdapater.scala | 8 +- .../sql/catalyst/catalog/HiveTableRelation.scala | 56 + .../sql/catalyst/optimizer/MigrateOptimizer.scala | 129 + .../sql/catalyst/plans/logical/Subquery.scala | 28 + .../apache/spark/sql/hive/CarbonSessionState.scala | 19 ++- 30 files changed, 481 insertions(+), 81 deletions(-)
diff --git a/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVDataMapProvider.scala b/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVDataMapProvider.scala index 7108bf8..5ffc46a 100644 --- a/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVDataMapProvider.scala +++ b/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVDataMapProvider.scala @@ -81,7 +81,7 @@ class MVDataMapProvider( val iden
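An illustrative sketch of the utility-method pattern described above (not the project's exact SparkSQLUtil code; the real signature may differ): callers go through one helper instead of touching version-sensitive session internals directly, so only the helper has to change per Spark version.

package org.apache.spark.sql.util

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Placing the helper under org.apache.spark.sql gives it access to session
// internals that are private[sql] on some Spark versions.
object SparkSQLUtilSketch {
  def lookupRelation(spark: SparkSession, name: TableIdentifier): LogicalPlan =
    spark.sessionState.catalog.lookupRelation(name)
}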
[carbondata] branch master updated: [CARBONDATA-3295] Fix MV datamap throwing an exception from its rewrite algorithm when the query SQL has multiple subqueries
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git
The following commit(s) were added to refs/heads/master by this push: new 789c97e [CARBONDATA-3295] Fix MV datamap throwing an exception from its rewrite algorithm when the query SQL has multiple subqueries
789c97e is described below
commit 789c97e4196edbe454bd83730b36cd21f72ce0cd Author: qiuchenjian <807169...@qq.com> AuthorDate: Sun Feb 17 18:52:09 2019 +0800
[CARBONDATA-3295] Fix MV datamap throwing an exception from its rewrite algorithm when the query SQL has multiple subqueries
[Problem] Error: java.lang.UnsupportedOperationException was thrown. java.lang.UnsupportedOperationException at org.apache.carbondata.mv.plans.util.SQLBuildDSL.productArity(SQLBuildDSL.scala:36) at scala.runtime.ScalaRunTime12409anon.(ScalaRunTime.scala:174) at scala.runtime.ScalaRunTime$.typedProductIterator(ScalaRunTime.scala:172)
The failing scenario: create datamap data_table_mv using 'mv' as SELECT STARTTIME,LAYER4ID, COALESCE (SUM(seq),0) AS seq_c, COALESCE (SUM(succ),0) AS succ_c FROM data_table GROUP BY STARTTIME,LAYER4ID
SELECT MT. AS , MT. AS , (CASE WHEN (SUM(COALESCE(seq_c, 0))) = 0 THEN NULL ELSE (CASE WHEN (CAST((SUM(COALESCE(seq_c, 0))) AS int)) = 0 THEN 0 ELSE ((CAST((SUM(COALESCE(succ_c, 0))) AS double)) / (CAST((SUM(COALESCE(seq_c, 0))) AS double))) END) * 100 END) AS rate FROM ( SELECT sum_result.*, H_REGION. FROM (SELECT cast(floor((starttime + 28800) / 3600) * 3600 - 28800 as int) AS , LAYER4ID, COALESCE(SUM(seq), 0) AS seq_c, COALESCE(SUM(succ), 0) AS succ_c FROM data_table WHERE STARTTIME >= 1549866600 AND STARTTIME < 1549899900 GROUP BY cast(floor((STARTTIME + 28800) / 3600) * 3600 - 28800 as int),LAYER4ID )sum_result LEFT JOIN (SELECT l4id AS , l4name AS , l4name AS NAME_2250410101 FROM region GROUP BY l4id, l4name) H_REGION ON sum_result.LAYER4ID = H_REGION. WHERE H_REGION.NAME_2250410101 IS NOT NULL ) MT GROUP BY MT., MT. ORDER BY ASC LIMIT 5000
[Root Cause] // TODO Find a better way to set the rewritten flag, it may fail in some conditions. val mapping = rewrittenPlan.collect { case m: ModularPlan => m } zip updatedDataMapTablePlan.collect { case m: ModularPlan => m } mapping.foreach(f => if (f._1.rewritten) f._2.setRewritten()) This rewrite algorithm has a bug: the collected nodes are not in the same order in some scenarios.
[Solution] Fix the MV rewrite algorithm: we can compare the Select and GroupBy objects between the original plan and the rewritten plan, but because the ModularPlan tree has been changed we can't compare their children, so this PR adds coarseEqual for the comparison.
This closes #3129
--- .../apache/carbondata/mv/datamap/MVHelper.scala | 6 -- .../carbondata/mv/rewrite/MVRewriteTestCase.scala | 96 ++ .../mv/plans/modular/basicOperators.scala | 12 +++ 3 files changed, 108 insertions(+), 6 deletions(-)
diff --git a/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala b/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala index 4c7fbc4..8baa924 100644 --- a/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala +++ b/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala @@ -596,12 +596,6 @@ object MVHelper { case g: GroupBy => MVHelper.updateDataMap(g, rewrite) } - // TODO Find a better way to set the rewritten flag, it may fail in some conditions.
- val mapping = -rewrittenPlan.collect { case m: ModularPlan => m } zip -updatedDataMapTablePlan.collect { case m: ModularPlan => m } - mapping.foreach(f => if (f._1.rewritten) f._2.setRewritten()) - updatedDataMapTablePlan } else { rewrittenPlan diff --git a/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVRewriteTestCase.scala b/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVRewriteTestCase.scala new file mode 100644 index 000..3f5164f --- /dev/null +++ b/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVRewriteTestCase.scala @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance w
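To see why the removed zip-based flag propagation was unsafe, a minimal sketch with simplified plan nodes (plain Scala, not the ModularPlan classes): zipping the pre-order traversals of two different trees only lines nodes up when both trees have exactly the same shape.

sealed trait Plan {
  def children: Seq[Plan]
  def collectAll: Seq[Plan] = this +: children.flatMap(_.collectAll)
}
case class Node(name: String, children: Seq[Plan] = Nil) extends Plan {
  override def toString: String = name
}

val rewrittenPlan = Node("Select", Seq(Node("GroupBy", Seq(Node("Relation")))))
val updatedPlan   = Node("Select", Seq(Node("Relation"))) // a subquery level folded away
// Prints (Select,Select) but then (GroupBy,Relation): a rewritten flag copied
// across such a mismatched pair lands on the wrong node, which is the kind of
// misalignment the multi-subquery query above exposed.
rewrittenPlan.collectAll.zip(updatedPlan.collectAll).foreach(println)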
[carbondata] branch master updated: [CARBONDATA-3294] Fix MV datamap throwing an error when using count(1) and a case-when expression
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git
The following commit(s) were added to refs/heads/master by this push: new 85ec206 [CARBONDATA-3294] Fix MV datamap throwing an error when using count(1) and a case-when expression
85ec206 is described below
commit 85ec206e670f769f6d7875c527941346924eff43 Author: qiuchenjian <807169...@qq.com> AuthorDate: Sat Feb 16 21:45:05 2019 +0800
[CARBONDATA-3294] Fix MV datamap throwing an error when using count(1) and a case-when expression
[Problem] The MV datamap throws an error when using count(1) and a case-when expression; the error is: mismatched input 'FROM' expecting {, 'WHERE', 'GROUP', 'ORDER', 'HAVING', 'LIMIT', 'LATERAL', 'WINDOW', 'UNION', 'EXCEPT', 'MINUS', 'INTERSECT', 'SORT', 'CLUSTER', 'DISTRIBUTE'}(line 2, pos 0) == SQL == SELECT MT.3600, MT.2250410101, countNum, rate FROM ^^^
[Solution] The compacted SQL has an extra 'case when' expression that causes this error, because the window operator had a bug when transforming the logical plan to a modular plan.
This closes #3128
--- .../mv/rewrite/MVCountAndCaseTestCase.scala | 97 ++ .../mv/plans/modular/ModularPatterns.scala | 11 ++- 2 files changed, 104 insertions(+), 4 deletions(-)
diff --git a/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCountAndCaseTestCase.scala b/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCountAndCaseTestCase.scala new file mode 100644 index 000..567d6a9 --- /dev/null +++ b/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCountAndCaseTestCase.scala @@ -0,0 +1,97 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License.
+ */ +package org.apache.carbondata.mv.rewrite + +import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan +import org.apache.spark.sql.execution.datasources.LogicalRelation +import org.apache.spark.sql.test.util.QueryTest +import org.scalatest.BeforeAndAfterAll + +class MVCountAndCaseTestCase extends QueryTest with BeforeAndAfterAll{ + + + override def beforeAll(): Unit = { +drop +sql("create table region(l4id string,l4name string) using carbondata") +sql( + s"""create table data_table( + |starttime int, seq long,succ long,LAYER4ID string,tmp int) + |using carbondata""".stripMargin) + } + + def drop(): Unit ={ +sql("drop table if exists region") +sql("drop table if exists data_table") + } + + test("test mv count and case when expression") { +sql("drop datamap if exists data_table_mv") +sql(s"""create datamap data_table_mv using 'mv' as + | SELECT STARTTIME,LAYER4ID, + | SUM(seq) AS seq_c, + | SUM(succ) AS succ_c + | FROM data_table + | GROUP BY STARTTIME,LAYER4ID""".stripMargin) + +sql("rebuild datamap data_table_mv") + +var frame = sql(s"""SELECT MT.`3600` AS `3600`, + | MT.`2250410101` AS `2250410101`, + | count(1) over() as countNum, + | (CASE WHEN (SUM(COALESCE(seq_c, 0))) = 0 THEN NULL + | ELSE + | (CASE WHEN (CAST((SUM(COALESCE(seq_c, 0))) AS int)) = 0 THEN 0 + | ELSE ((CAST((SUM(COALESCE(succ_c, 0))) AS double)) + | / (CAST((SUM(COALESCE(seq_c, 0))) AS double))) + | END) * 100 + | END) AS rate + | FROM ( + | SELECT sum_result.*, H_REGION.`2250410101` FROM + | (SELECT cast(floor((starttime + 28800) / 3600) * 3600 - 28800 as int) AS `3600`, + | LAYER4ID, + | COALESCE(SUM(seq), 0) AS seq_c, + | COALESCE(SUM(succ), 0) AS succ_c + | FROM data_table + | WHERE STA
[carbondata] branch master updated: [CARBONDATA-3291] Fix MV datamap not taking effect when the same table is joined
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git
The following commit(s) were added to refs/heads/master by this push: new 0482983 [CARBONDATA-3291] Fix MV datamap not taking effect when the same table is joined
0482983 is described below
commit 04829839452d7f56954219b56a6e515239effe61 Author: qiuchenjian <807169...@qq.com> AuthorDate: Wed Feb 13 20:32:42 2019 +0800
[CARBONDATA-3291] Fix MV datamap not taking effect when the same table is joined
[Problem] The MV datamap doesn't take effect when the same table is joined with itself; for the failing scenario, see the test case.
This closes #3125
--- .../carbondata/mv/rewrite/DefaultMatchMaker.scala | 15 +++- .../apache/carbondata/mv/rewrite/Navigator.scala | 51 +--- .../mv/rewrite/MVMultiJoinTestCase.scala | 94 ++ .../mv/plans/modular/ModularRelation.scala | 15 4 files changed, 160 insertions(+), 15 deletions(-)
diff --git a/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/rewrite/DefaultMatchMaker.scala b/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/rewrite/DefaultMatchMaker.scala index cc5cc7b..59d72f8 100644 --- a/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/rewrite/DefaultMatchMaker.scala +++ b/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/rewrite/DefaultMatchMaker.scala @@ -162,8 +162,14 @@ object SelectSelectNoChildDelta extends DefaultMatchPattern with PredicateHelper // are 1-1 correspondence. // Change the following two conditions to more complicated ones if we want to // consider things that combine extrajoin, rejoin, and harmonized relations -val isUniqueRmE = subsumer.children.filter { x => subsumee.children.count(_ == x) != 1 } -val isUniqueEmR = subsumee.children.filter { x => subsumer.children.count(_ == x) != 1 } +val isUniqueRmE = subsumer.children.filter { x => subsumee.children.count{ + case relation: ModularRelation => relation.fineEquals(x) + case other => other == x +} != 1 } +val isUniqueEmR = subsumee.children.filter { x => subsumer.children.count{ + case relation: ModularRelation => relation.fineEquals(x) + case other => other == x +} != 1 } val extrajoin = sel_1a.children.filterNot { child => sel_1q.children.contains(child) } val rejoin = sel_1q.children.filterNot { child => sel_1a.children.contains(child) } @@ -180,7 +186,10 @@ object SelectSelectNoChildDelta extends DefaultMatchPattern with PredicateHelper isPredicateEmdR && isOutputEdR) { val mappings = sel_1a.children.zipWithIndex.map { case (childr, fromIdx) if sel_1q.children.contains(childr) => - val toIndx = sel_1q.children.indexWhere(_ == childr) + val toIndx = sel_1q.children.indexWhere{ +case relation: ModularRelation => relation.fineEquals(childr) +case other => other == childr + } (toIndx -> fromIdx) }
diff --git a/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/rewrite/Navigator.scala b/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/rewrite/Navigator.scala index 76df4c2..905cd17 100644 --- a/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/rewrite/Navigator.scala +++ b/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/rewrite/Navigator.scala @@ -17,11 +17,11 @@ package org.apache.carbondata.mv.rewrite -import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeMap, AttributeSet} +import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeMap} import org.apache.carbondata.mv.expressions.modular._ -import org.apache.carbondata.mv.plans.modular.{GroupBy, ModularPlan,
Select} import org.apache.carbondata.mv.plans.modular +import org.apache.carbondata.mv.plans.modular._ import org.apache.carbondata.mv.session.MVSession private[mv] class Navigator(catalog: SummaryDatasetCatalog, session: MVSession) { @@ -146,21 +146,27 @@ private[mv] class Navigator(catalog: SummaryDatasetCatalog, session: MVSession) val rtables = subsumer.collect { case n: modular.LeafNode => n } val etables = subsumee.collect { case n: modular.LeafNode => n } val pairs = for { - rtable <- rtables - etable <- etables - if rtable == etable -} yield (rtable, etable) + i <- rtables.indices + j <- etables.indices + if rtables(i) == etables(j) && reTablesJoinMatched( +rtables(i), etables(j), subsumer, subsumee, i, j + ) +} yield (rtables(i), etables(j)) pairs.foldLeft(subsumer) { case (curSubsumer, pair) => val mappedOperator = - if (pa
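A simplified sketch (plain Scala, not the MV classes) of the self-join pitfall fixed here: when the same table appears twice, pairing leaf relations by equality alone matches each query-side leaf to the first equal MV-side leaf, losing the join position.

case class Rel(table: String)

val subsumerLeaves = Seq(Rel("fact"), Rel("fact")) // the MV joins "fact" with itself
val subsumeeLeaves = Seq(Rel("fact"), Rel("fact")) // and so does the query

// Equality-only pairing: indexWhere(_ == e) returns 0 for both leaves,
// so both query-side relations bind to the same MV-side relation.
val naive = subsumeeLeaves.map(e => subsumerLeaves.indexWhere(_ == e))
println(naive) // List(0, 0)
// The patch instead walks indices and also checks that the join conditions
// match (fineEquals / reTablesJoinMatched), so each occurrence pairs with its
// positional counterpart and the MV can match self-join queries.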
[carbondata] branch master updated: [CARBONDATA-3367][CARBONDATA-3368] Fix multiple issues in SDK reader
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git
The following commit(s) were added to refs/heads/master by this push: new bd1d774 [CARBONDATA-3367][CARBONDATA-3368] Fix multiple issues in SDK reader
bd1d774 is described below
commit bd1d7745c1f62caedddbc519afaffd354e535b62 Author: ajantha-bhat AuthorDate: Wed Mar 6 16:44:52 2019 +0800
[CARBONDATA-3367][CARBONDATA-3368] Fix multiple issues in SDK reader
Problem: [CARBONDATA-3367] OOM when a huge number of carbondata files are read from the SDK reader. Cause: Currently, one CarbonRecordReader is created for each carbondata file, and a list of CarbonRecordReaders is maintained in the CarbonReader. Even when a CarbonRecordReader is closed, it is not garbage-collected because the list still references the object, so each CarbonRecordReader needs separate memory instead of reusing the previous reader's memory. Solution: Once CarbonRecordReader.close is done, remove it from the list.
Problem: [CARBONDATA-3368] Infer schema from the data file instead of the index file. Cause: In the SDK, when multiple readers are created on the same folder location with different file lists, all the readers refer to the same index file for schema inference, which caused a bottleneck and a JVM crash in the case of JNI calls. Solution: Infer the schema from the data file specified while building the reader.
Problem: Support a list interface for projection. When the SDK is called from other languages, the JNI interface supports only lists, so a list interface for projections is needed.
This closes #3197
--- .../core/metadata/schema/table/CarbonTable.java | 43 ++ .../apache/carbondata/sdk/file/CarbonReader.java | 7 +++- .../carbondata/sdk/file/CarbonReaderBuilder.java | 15 3 files changed, 24 insertions(+), 41 deletions(-)
diff --git a/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java b/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java index c66d1fc..f9ba6f5 100644 --- a/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java +++ b/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java @@ -37,8 +37,6 @@ import org.apache.carbondata.core.datamap.DataMapStoreManager; import org.apache.carbondata.core.datamap.TableDataMap; import org.apache.carbondata.core.datamap.dev.DataMapFactory; import org.apache.carbondata.core.datastore.block.SegmentProperties; -import org.apache.carbondata.core.datastore.filesystem.CarbonFile; -import org.apache.carbondata.core.datastore.impl.FileFactory; import org.apache.carbondata.core.features.TableOperation; import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier; import org.apache.carbondata.core.metadata.CarbonTableIdentifier; @@ -252,12 +250,9 @@ public class CarbonTable implements Serializable { String tableName, Configuration configuration) throws IOException { TableInfo tableInfoInfer = CarbonUtil.buildDummyTableInfo(tablePath, "null", "null"); -CarbonFile carbonFile = getLatestIndexFile(FileFactory.getCarbonFile(tablePath, configuration)); -if (carbonFile == null) { - throw new RuntimeException("Carbon index file not exists."); -} -org.apache.carbondata.format.TableInfo tableInfo = CarbonUtil -.inferSchemaFromIndexFile(carbonFile.getPath(), tableName); +// InferSchema from data file +org.apache.carbondata.format.TableInfo tableInfo = +CarbonUtil.inferSchema(tablePath, tableName, false,
configuration); List columnSchemaList = new ArrayList(); for (org.apache.carbondata.format.ColumnSchema thriftColumnSchema : tableInfo .getFact_table().getTable_columns()) { @@ -271,38 +266,6 @@ public class CarbonTable implements Serializable { return CarbonTable.buildFromTableInfo(tableInfoInfer); } - private static CarbonFile getLatestIndexFile(CarbonFile tablePath) { -CarbonFile[] carbonFiles = tablePath.listFiles(); -CarbonFile latestCarbonIndexFile = null; -long latestIndexFileTimestamp = 0L; -for (CarbonFile carbonFile : carbonFiles) { - if (carbonFile.getName().endsWith(CarbonTablePath.INDEX_FILE_EXT) - && carbonFile.getLastModifiedTime() > latestIndexFileTimestamp) { -latestCarbonIndexFile = carbonFile; -latestIndexFileTimestamp = carbonFile.getLastModifiedTime(); - } else if (carbonFile.isDirectory()) { -// if the list has directories that doesn't contain index files, -// continue checking other files/directories in the list. -if (getLatestIndexFile(carbonFile) == null) { - continue; -} else { -
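A usage sketch of the reader loop these fixes target (builder and reader method names as in the public SDK API; paths and columns are illustrative). With the fix, close() also drops the finished CarbonRecordReader from the internal list, so its memory can be reclaimed before the next file is read:

import org.apache.carbondata.sdk.file.CarbonReader

val reader = CarbonReader.builder("/tmp/carbon_out", "_temp")
  .projection(Array("name", "age")) // a java.util.List overload is added for JNI callers
  .build()
while (reader.hasNext) {
  val row = reader.readNextRow.asInstanceOf[Array[AnyRef]]
  // process the row ...
}
reader.close() // after CARBONDATA-3367 this also releases the record-reader references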
[carbondata] 02/02: [CARBONDATA-3365] Integrate apache arrow vector filling to carbon SDK
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git
commit f5cc9b748830c0251ee70a86aa62d8533762bb87 Author: ajantha-bhat AuthorDate: Tue Feb 26 20:03:41 2019 +0800
[CARBONDATA-3365] Integrate apache arrow vector filling to carbon SDK
By integrating carbon to support filling arrow vectors, content read from carbondata files can be used for analytics in any programming language: an arrow vector filled from the carbon Java SDK can be read by Python, C, C++ and many other languages supported by arrow. This also broadens carbondata's use cases, and carbondata can be used in various applications since arrow is already integrated with many query engines.
This closes #3193
--- .../carbondata/examples/CarbonSessionExample.scala | 180 ++--- .../hadoop/api/CarbonFileInputFormat.java | 20 +-- .../carbondata/hadoop/api/CarbonInputFormat.java | 3 - store/sdk/pom.xml | 31 +++- .../carbondata/sdk/file/ArrowCarbonReader.java | 106 .../apache/carbondata/sdk/file/CarbonReader.java | 10 -- .../carbondata/sdk/file/CarbonReaderBuilder.java | 67 +++- .../carbondata/sdk/file/CarbonSchemaReader.java| 16 ++ .../carbondata/sdk/file/arrow/ArrowConverter.java | 80 +++-- .../sdk/file/arrow/ArrowFieldWriter.java | 45 +- .../carbondata/sdk/file/arrow/ArrowUtils.java | 29 ++-- .../carbondata/sdk/file/arrow/ArrowWriter.java | 6 + .../file/arrow/ExtendedByteArrayOutputStream.java | 39 + .../carbondata/sdk/file/CarbonReaderTest.java | 135 14 files changed, 563 insertions(+), 204 deletions(-)
diff --git a/examples/spark2/src/main/scala/org/apache/carbondata/examples/CarbonSessionExample.scala b/examples/spark2/src/main/scala/org/apache/carbondata/examples/CarbonSessionExample.scala index 3aa761e..b6921f2 100644 --- a/examples/spark2/src/main/scala/org/apache/carbondata/examples/CarbonSessionExample.scala +++ b/examples/spark2/src/main/scala/org/apache/carbondata/examples/CarbonSessionExample.scala @@ -37,7 +37,7 @@ object CarbonSessionExample { s"$rootPath/examples/spark2/src/main/resources/log4j.properties") CarbonProperties.getInstance() - .addProperty(CarbonCommonConstants.ENABLE_QUERY_STATISTICS, "false") + .addProperty(CarbonCommonConstants.ENABLE_QUERY_STATISTICS, "true") val spark = ExampleUtils.createCarbonSession("CarbonSessionExample") spark.sparkContext.setLogLevel("INFO") exampleBody(spark) @@ -49,96 +49,92 @@ object CarbonSessionExample { val rootPath = new File(this.getClass.getResource("/").getPath + "../../../..").getCanonicalPath -//spark.sql("DROP TABLE IF EXISTS source") -// -//// Create table -//spark.sql( -// s""" -// | CREATE TABLE source( -// | shortField SHORT, -// | intField INT, -// | bigintField LONG, -// | doubleField DOUBLE, -// | stringField STRING, -// | timestampField TIMESTAMP, -// | decimalField DECIMAL(18,2), -// | dateField DATE, -// | charField CHAR(5), -// | floatField FLOAT -// | ) -// | STORED AS carbondata -// """.stripMargin) -// -//val path = s"$rootPath/examples/spark2/src/main/resources/data.csv" -// -//// scalastyle:off -//spark.sql( -// s""" -// | LOAD DATA LOCAL INPATH '$path' -// | INTO TABLE source -// | OPTIONS('HEADER'='true', 'COMPLEX_DELIMITER_LEVEL_1'='#') -// """.stripMargin) -//// scalastyle:on -// -//spark.sql( -// s""" -// | SELECT charField, stringField, intField -// | FROM source -// | WHERE stringfield = 'spark' AND decimalField > 40 -// """.stripMargin).show() -// -//spark.sql( -// s""" -// | SELECT * -// | FROM source WHERE
length(stringField) = 5 -// """.stripMargin).show() -// -//spark.sql( -// s""" -// | SELECT * -// | FROM source WHERE date_format(dateField, "-MM-dd") = "2015-07-23" -// """.stripMargin).show() -// -//spark.sql("SELECT count(stringField) FROM source").show() -// -//spark.sql( -// s""" -// | SELECT sum(intField), stringField -// | FROM source -// | GROUP BY stringField -// """.stripMargin).show() -// -//spark.sql( -// s""" -// | SELECT t1.*, t2.* -// | FROM source t1, sou
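For readers unfamiliar with what "filling an arrow vector" means, a minimal standalone Arrow-Java sketch in Scala (arrow-vector 0.12-era API, independent of the new SDK classes, whose exact signatures aren't shown in this diff):

import org.apache.arrow.memory.RootAllocator
import org.apache.arrow.vector.IntVector

val allocator = new RootAllocator(Long.MaxValue)
val vector = new IntVector("intField", allocator)
vector.allocateNew(3)
// Values written here live in Arrow's off-heap columnar format, so any
// Arrow-capable runtime (Python, C++, ...) can consume them without copying.
(0 until 3).foreach(i => vector.setSafe(i, i * 10))
vector.setValueCount(3)
(0 until 3).foreach(i => println(vector.get(i))) // 0, 10, 20
vector.close()
allocator.close()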
[carbondata] branch master updated (894216e -> f5cc9b7)
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git. from 894216e [CARBONDATA-3386] Concurrent Merge index and query is failing new c85a11f [CARBONDATA-3365] Integrate apache arrow vector filling to carbon SDK new f5cc9b7 [CARBONDATA-3365] Integrate apache arrow vector filling to carbon SDK The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: store/sdk/pom.xml | 71 .../carbondata/sdk/file/ArrowCarbonReader.java | 106 ++ .../carbondata/sdk/file/CarbonReaderBuilder.java | 20 +- .../carbondata/sdk/file/CarbonSchemaReader.java| 16 + .../carbondata/sdk/file/arrow/ArrowConverter.java | 135 .../sdk/file/arrow/ArrowFieldWriter.java | 367 + .../carbondata/sdk/file/arrow/ArrowUtils.java | 112 +++ .../carbondata/sdk/file/arrow/ArrowWriter.java | 144 .../file/arrow/ExtendedByteArrayOutputStream.java | 36 +- .../carbondata/sdk/file/CarbonReaderTest.java | 135 10 files changed, 1119 insertions(+), 23 deletions(-) create mode 100644 store/sdk/src/main/java/org/apache/carbondata/sdk/file/ArrowCarbonReader.java create mode 100644 store/sdk/src/main/java/org/apache/carbondata/sdk/file/arrow/ArrowConverter.java create mode 100644 store/sdk/src/main/java/org/apache/carbondata/sdk/file/arrow/ArrowFieldWriter.java create mode 100644 store/sdk/src/main/java/org/apache/carbondata/sdk/file/arrow/ArrowUtils.java create mode 100644 store/sdk/src/main/java/org/apache/carbondata/sdk/file/arrow/ArrowWriter.java copy core/src/main/java/org/apache/carbondata/core/util/ReUsableByteArrayDataOutputStream.java => store/sdk/src/main/java/org/apache/carbondata/sdk/file/arrow/ExtendedByteArrayOutputStream.java (54%)
[carbondata] 01/02: [CARBONDATA-3365] Integrate apache arrow vector filling to carbon SDK
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git
commit c85a11f0f900180dcf36976809c3d244fb24c161 Author: kumarvishal09 AuthorDate: Wed Feb 6 18:10:43 2019 +0530
[CARBONDATA-3365] Integrate apache arrow vector filling to carbon SDK
By integrating carbon to support filling arrow vectors, content read from carbondata files can be used for analytics in any programming language: an arrow vector filled from the carbon Java SDK can be read by Python, C, C++ and many other languages supported by arrow. This also broadens carbondata's use cases, and carbondata can be used in various applications since arrow is already integrated with many query engines.
This closes #3193
--- .../carbondata/examples/CarbonSessionExample.scala | 180 +-- .../hadoop/api/CarbonFileInputFormat.java | 20 +- .../carbondata/hadoop/api/CarbonInputFormat.java | 3 + store/sdk/pom.xml | 50 .../apache/carbondata/sdk/file/CarbonReader.java | 10 + .../carbondata/sdk/file/CarbonReaderBuilder.java | 49 +++ .../carbondata/sdk/file/arrow/ArrowConverter.java | 73 + .../sdk/file/arrow/ArrowFieldWriter.java | 328 + .../carbondata/sdk/file/arrow/ArrowUtils.java | 111 +++ .../carbondata/sdk/file/arrow/ArrowWriter.java | 138 + 10 files changed, 873 insertions(+), 89 deletions(-)
diff --git a/examples/spark2/src/main/scala/org/apache/carbondata/examples/CarbonSessionExample.scala b/examples/spark2/src/main/scala/org/apache/carbondata/examples/CarbonSessionExample.scala index b6921f2..3aa761e 100644 --- a/examples/spark2/src/main/scala/org/apache/carbondata/examples/CarbonSessionExample.scala +++ b/examples/spark2/src/main/scala/org/apache/carbondata/examples/CarbonSessionExample.scala @@ -37,7 +37,7 @@ object CarbonSessionExample { s"$rootPath/examples/spark2/src/main/resources/log4j.properties") CarbonProperties.getInstance() - .addProperty(CarbonCommonConstants.ENABLE_QUERY_STATISTICS, "true") + .addProperty(CarbonCommonConstants.ENABLE_QUERY_STATISTICS, "false") val spark = ExampleUtils.createCarbonSession("CarbonSessionExample") spark.sparkContext.setLogLevel("INFO") exampleBody(spark) @@ -49,92 +49,96 @@ object CarbonSessionExample { val rootPath = new File(this.getClass.getResource("/").getPath + "../../../..").getCanonicalPath -spark.sql("DROP TABLE IF EXISTS source") - -// Create table -spark.sql( - s""" - | CREATE TABLE source( - | shortField SHORT, - | intField INT, - | bigintField LONG, - | doubleField DOUBLE, - | stringField STRING, - | timestampField TIMESTAMP, - | decimalField DECIMAL(18,2), - | dateField DATE, - | charField CHAR(5), - | floatField FLOAT - | ) - | STORED AS carbondata - """.stripMargin) - -val path = s"$rootPath/examples/spark2/src/main/resources/data.csv" - -// scalastyle:off -spark.sql( - s""" - | LOAD DATA LOCAL INPATH '$path' - | INTO TABLE source - | OPTIONS('HEADER'='true', 'COMPLEX_DELIMITER_LEVEL_1'='#') - """.stripMargin) -// scalastyle:on -spark.sql( - s""" - | SELECT charField, stringField, intField - | FROM source - | WHERE stringfield = 'spark' AND decimalField > 40 - """.stripMargin).show() -spark.sql( - s""" - | SELECT * - | FROM source WHERE length(stringField) = 5 - """.stripMargin).show() -spark.sql( - s""" - | SELECT * - | FROM source WHERE date_format(dateField, "-MM-dd") = "2015-07-23" - """.stripMargin).show() -spark.sql("SELECT count(stringField) FROM source").show() -spark.sql( - s""" - | SELECT sum(intField), stringField - | FROM source -
| GROUP BY stringField - """.stripMargin).show() - -spark.sql( - s""" - | SELECT t1.*, t2.* - | FROM source t1, source t2 - | WHERE t1.stringField = t2.stringField - """.stripMargin).show() - -spark.sql( - s""" - | WITH t1 AS ( - | SELECT * FROM source - | UNION ALL - | SELECT * FROM source - | ) - | SELECT t1.*, t2.* - | FROM t1, source t2 - | WHERE t1.stringField = t2.stri
svn commit: r34085 - in /dev/carbondata/1.5.4-rc1: ./ apache-carbondata-1.5.4-source-release.zip apache-carbondata-1.5.4-source-release.zip.asc apache-carbondata-1.5.4-source-release.zip.sha512
Author: ravipesala Date: Fri May 17 13:40:24 2019 New Revision: 34085 Log: Upload 1.5.4-rc1 Added: dev/carbondata/1.5.4-rc1/ dev/carbondata/1.5.4-rc1/apache-carbondata-1.5.4-source-release.zip (with props) dev/carbondata/1.5.4-rc1/apache-carbondata-1.5.4-source-release.zip.asc (with props) dev/carbondata/1.5.4-rc1/apache-carbondata-1.5.4-source-release.zip.sha512 Added: dev/carbondata/1.5.4-rc1/apache-carbondata-1.5.4-source-release.zip == Binary file - no diff available. Propchange: dev/carbondata/1.5.4-rc1/apache-carbondata-1.5.4-source-release.zip -- svn:mime-type = application/octet-stream Added: dev/carbondata/1.5.4-rc1/apache-carbondata-1.5.4-source-release.zip.asc == Binary file - no diff available. Propchange: dev/carbondata/1.5.4-rc1/apache-carbondata-1.5.4-source-release.zip.asc -- svn:mime-type = application/octet-stream Added: dev/carbondata/1.5.4-rc1/apache-carbondata-1.5.4-source-release.zip.sha512 == --- dev/carbondata/1.5.4-rc1/apache-carbondata-1.5.4-source-release.zip.sha512 (added) +++ dev/carbondata/1.5.4-rc1/apache-carbondata-1.5.4-source-release.zip.sha512 Fri May 17 13:40:24 2019 @@ -0,0 +1 @@ +505d02818bae28b2cad475b49960c4c28fcf8cbbe171b0cc79139005db0de167395ed92e3db011cfdf6598a5a5f13a81cc085501c762d32bbe35a430e7b9fee8 apache-carbondata-1.5.4-source-release.zip
[carbondata] branch branch-1.5 updated: [maven-release-plugin] prepare for next development iteration
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a commit to branch branch-1.5 in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/branch-1.5 by this push: new 31a2d14 [maven-release-plugin] prepare for next development iteration 31a2d14 is described below commit 31a2d1432b23e8350f319d0ac88bfffada3a74d4 Author: ravipesala AuthorDate: Fri May 17 14:27:36 2019 +0530 [maven-release-plugin] prepare for next development iteration --- assembly/pom.xml | 2 +- common/pom.xml| 2 +- core/pom.xml | 2 +- datamap/bloom/pom.xml | 2 +- datamap/examples/pom.xml | 2 +- datamap/lucene/pom.xml| 2 +- datamap/mv/core/pom.xml | 2 +- datamap/mv/plan/pom.xml | 2 +- examples/spark2/pom.xml | 2 +- format/pom.xml| 2 +- hadoop/pom.xml| 2 +- integration/hive/pom.xml | 2 +- integration/presto/pom.xml| 2 +- integration/spark-common-test/pom.xml | 2 +- integration/spark-common/pom.xml | 2 +- integration/spark-datasource/pom.xml | 2 +- integration/spark2/pom.xml| 2 +- pom.xml | 4 ++-- processing/pom.xml| 2 +- store/sdk/pom.xml | 2 +- streaming/pom.xml | 2 +- tools/cli/pom.xml | 2 +- 22 files changed, 23 insertions(+), 23 deletions(-) diff --git a/assembly/pom.xml b/assembly/pom.xml index 7206b0d..f1586cb 100644 --- a/assembly/pom.xml +++ b/assembly/pom.xml @@ -22,7 +22,7 @@ org.apache.carbondata carbondata-parent -1.5.4 +1.5.5-SNAPSHOT ../pom.xml diff --git a/common/pom.xml b/common/pom.xml index 5fa7df8..4844768 100644 --- a/common/pom.xml +++ b/common/pom.xml @@ -22,7 +22,7 @@ org.apache.carbondata carbondata-parent -1.5.4 +1.5.5-SNAPSHOT ../pom.xml diff --git a/core/pom.xml b/core/pom.xml index 56cfaf5..ec55faf 100644 --- a/core/pom.xml +++ b/core/pom.xml @@ -22,7 +22,7 @@ org.apache.carbondata carbondata-parent -1.5.4 +1.5.5-SNAPSHOT ../pom.xml diff --git a/datamap/bloom/pom.xml b/datamap/bloom/pom.xml index fdc2f62..ab5e29c 100644 --- a/datamap/bloom/pom.xml +++ b/datamap/bloom/pom.xml @@ -4,7 +4,7 @@ org.apache.carbondata carbondata-parent -1.5.4 +1.5.5-SNAPSHOT ../../pom.xml diff --git a/datamap/examples/pom.xml b/datamap/examples/pom.xml index 08693f0..0c9d804 100644 --- a/datamap/examples/pom.xml +++ b/datamap/examples/pom.xml @@ -22,7 +22,7 @@ org.apache.carbondata carbondata-parent -1.5.4 +1.5.5-SNAPSHOT ../../pom.xml diff --git a/datamap/lucene/pom.xml b/datamap/lucene/pom.xml index dfd09f6..ee06416 100644 --- a/datamap/lucene/pom.xml +++ b/datamap/lucene/pom.xml @@ -4,7 +4,7 @@ org.apache.carbondata carbondata-parent -1.5.4 +1.5.5-SNAPSHOT ../../pom.xml diff --git a/datamap/mv/core/pom.xml b/datamap/mv/core/pom.xml index 9ee517c..b92dc0e 100644 --- a/datamap/mv/core/pom.xml +++ b/datamap/mv/core/pom.xml @@ -22,7 +22,7 @@ org.apache.carbondata carbondata-parent -1.5.4 +1.5.5-SNAPSHOT ../../../pom.xml diff --git a/datamap/mv/plan/pom.xml b/datamap/mv/plan/pom.xml index 4ee274e..3d18384 100644 --- a/datamap/mv/plan/pom.xml +++ b/datamap/mv/plan/pom.xml @@ -22,7 +22,7 @@ org.apache.carbondata carbondata-parent -1.5.4 +1.5.5-SNAPSHOT ../../../pom.xml diff --git a/examples/spark2/pom.xml b/examples/spark2/pom.xml index 1bc9247..88a99f6 100644 --- a/examples/spark2/pom.xml +++ b/examples/spark2/pom.xml @@ -22,7 +22,7 @@ org.apache.carbondata carbondata-parent -1.5.4 +1.5.5-SNAPSHOT ../../pom.xml diff --git a/format/pom.xml b/format/pom.xml index 3b4bcee..45875d7 100644 --- a/format/pom.xml +++ b/format/pom.xml @@ -22,7 +22,7 @@ org.apache.carbondata carbondata-parent -1.5.4 +1.5.5-SNAPSHOT ../pom.xml diff --git 
a/hadoop/pom.xml b/hadoop/pom.xml index 6780f07..9bfc789 100644 --- a/hadoop/pom.xml +++ b/hadoop/pom.xml @@ -22,7 +22,7 @@ org.apache.carbondata carbondata-parent -1.5.4 +1.5.5-SNAPSHOT ../pom.xml diff --git a/integration/hive/pom.xml b/integration/hive/pom.xml index 649eec4..a990b44 100644 --- a/integration/hive/pom.xml +++ b/integration/hive/pom.xml @@ -22,7 +22,7 @@ org.apache.carbondata carbondata-parent -1.5.4 +1.5.5-SNAPSHOT ../../pom.xml diff --git a/integration/presto/pom.xml b/integration/presto/pom.xml index a4a9aba..4dacce1 100644 --- a/integration/presto/pom.xml +++ b/integration/presto/pom.xml @@ -22,7 +22,7
[carbondata] annotated tag apache-carbondata-1.5.4-rc1 created (now 5a55d9b)
This is an automated email from the ASF dual-hosted git repository. ravipesala pushed a change to annotated tag apache-carbondata-1.5.4-rc1 in repository https://gitbox.apache.org/repos/asf/carbondata.git. at 5a55d9b (tag) tagging 1f2e184b81bef4e861b4dd32be94dc50bada6b68 (commit) replaces apache-carbondata-1.5.3-rc1 by ravipesala on Fri May 17 14:27:20 2019 +0530 - Log - [maven-release-plugin] copy for tag apache-carbondata-1.5.4-rc1 --- No new revisions were added by this update.