[carbondata] branch master updated: [DOC] Fix the spell mistake of enable.unsafe.in.query.processing
This is an automated email from the ASF dual-hosted git repository. xuchuanyin pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git

The following commit(s) were added to refs/heads/master by this push:
new 9854f20 [DOC] Fix the spell mistake of enable.unsafe.in.query.processing
9854f20 is described below

commit 9854f2033b4bbac0698f8cadd6c5872ab0423c2d
Author: qiuchenjian <807169...@qq.com>
AuthorDate: Sat Mar 23 09:28:32 2019 +0800

[DOC] Fix the spell mistake of enable.unsafe.in.query.processing

Fix the spell mistake of enable.unsafe.in.query.processing

This closes #3160
---
docs/usecases.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/usecases.md b/docs/usecases.md
index 0cfcf85..8ff4975 100644
--- a/docs/usecases.md
+++ b/docs/usecases.md
@@ -148,8 +148,8 @@ Use all columns are no-dictionary as the cardinality is high.
 | Compaction | carbon.number.of.cores.while.compacting | 12 | Higher number of cores can improve the compaction speed.Data size is huge.Compaction need to use more threads to speed up the process |
 | Compaction | carbon.enable.auto.load.merge | FALSE | Doing auto minor compaction is costly process as data size is huge.Perform manual compaction when the cluster is less loaded |
 | Query | carbon.enable.vector.reader | true| To fetch results faster, supporting spark vector processing will speed up the query |
-| Query | enable.unsafe.in.query.procressing | true| Data that needs to be scanned in huge which in turn generates more short lived Java objects. This cause pressure of GC.using unsafe and offheap will reduce the GC overhead |
-| Query | use.offheap.in.query.processing | true| Data that needs to be scanned in huge which in turn generates more short lived Java objects. This cause pressure of GC.using unsafe and offheap will reduce the GC overhead.offheap can be accessed through java unsafe.hence enable.unsafe.in.query.procressing needs to be true |
+| Query | enable.unsafe.in.query.processing | true| Data that needs to be scanned in huge which in turn generates more short lived Java objects. This cause pressure of GC.using unsafe and offheap will reduce the GC overhead |
+| Query | use.offheap.in.query.processing | true| Data that needs to be scanned in huge which in turn generates more short lived Java objects. This cause pressure of GC.using unsafe and offheap will reduce the GC overhead.offheap can be accessed through java unsafe.hence enable.unsafe.in.query.processing needs to be true |
 | Query | enable.unsafe.columnpage| TRUE| Keep the column pages in offheap memory so that the memory overhead due to java object is less and also reduces GC pressure. |
 | Query | carbon.unsafe.working.memory.in.mb | 10240 | Amount of memory to use for offheap operations, you can increase this memory based on the data size |
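The four query-side settings recommended in the hunk above go into the `carbon.properties` file; a minimal sketch using the values from the use-case table (file location and surrounding settings are deployment-specific):

```properties
# query-time unsafe/offheap tuning, values taken from the use-case table above
enable.unsafe.in.query.processing=true
use.offheap.in.query.processing=true
enable.unsafe.columnpage=true
carbon.unsafe.working.memory.in.mb=10240
```

Note that `enable.unsafe.in.query.processing` must be true for the offheap setting to take effect, as the corrected table row explains.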
[carbondata] branch master updated: [DOC] Update the doc of "Show DataMap"
This is an automated email from the ASF dual-hosted git repository. xuchuanyin pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git

The following commit(s) were added to refs/heads/master by this push:
new 1825861 [DOC] Update the doc of "Show DataMap"
1825861 is described below

commit 182586164283c6f5dc32a1c9b488fbc6b25d44d7
Author: qiuchenjian <807169...@qq.com>
AuthorDate: Wed Jan 30 09:23:21 2019 +0800

[DOC] Update the doc of "Show DataMap"

update the doc of showing datamap

This closes #3117
---
docs/datamap/datamap-management.md | 1 +
1 file changed, 1 insertion(+)

diff --git a/docs/datamap/datamap-management.md b/docs/datamap/datamap-management.md
index 0dc4718..087c70a 100644
--- a/docs/datamap/datamap-management.md
+++ b/docs/datamap/datamap-management.md
@@ -141,6 +141,7 @@ There is a SHOW DATAMAPS command, when this is issued, system will read all data
 - DataMapName
 - DataMapProviderName like mv, preaggreagte, timeseries, etc
 - Associated Table
+- DataMap Properties

### Compaction on DataMap
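As a usage sketch (table name and output values are hypothetical; exact syntax per the doc hunk above), the command now also reports the datamap properties alongside the columns it already listed:

```sql
SHOW DATAMAPS;
-- DataMapName | DataMapProviderName | Associated Table | DataMap Properties
-- agg_sales   | preaggregate        | default.sales    | ...
```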
[carbondata] branch master updated: [CARBONDATA-3281] Add validation for the size of the LRU cache
This is an automated email from the ASF dual-hosted git repository. xuchuanyin pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git

The following commit(s) were added to refs/heads/master by this push:
new e443a94 [CARBONDATA-3281] Add validation for the size of the LRU cache
e443a94 is described below

commit e443a943c63b0981603179d75b07d4006a925109
Author: litt
AuthorDate: Tue Jan 29 16:34:40 2019 +0800

[CARBONDATA-3281] Add validation for the size of the LRU cache

If the LRU cache is configured bigger than the JVM Xmx size, use CARBON_MAX_LRU_CACHE_SIZE_DEFAULT instead. With an LRU cache bigger than the Xmx size, querying a big table with too many carbon files may cause "Error: java.io.IOException: Problem in loading segment blocks: GC overhead limit exceeded (state=,code=0)" and the JDBC server will restart.

This closes #3118
---
.../carbondata/core/cache/CarbonLRUCache.java | 32 ++
.../core/constants/CarbonCommonConstants.java | 5
.../carbondata/core/cache/CarbonLRUCacheTest.java | 7 +
3 files changed, 44 insertions(+)

diff --git a/core/src/main/java/org/apache/carbondata/core/cache/CarbonLRUCache.java b/core/src/main/java/org/apache/carbondata/core/cache/CarbonLRUCache.java
index 0c75173..3371d0d 100644
--- a/core/src/main/java/org/apache/carbondata/core/cache/CarbonLRUCache.java
+++ b/core/src/main/java/org/apache/carbondata/core/cache/CarbonLRUCache.java
@@ -70,6 +70,16 @@ public final class CarbonLRUCache {
         + CarbonCommonConstants.CARBON_MAX_LRU_CACHE_SIZE_DEFAULT);
     lruCacheMemorySize = Long.parseLong(defaultPropertyName);
   }
+
+  // if lru cache is bigger than jvm max heap then set part size of max heap (60% default)
+  if (isBeyondMaxMemory()) {
+    double changeSize = getPartOfXmx();
+    LOGGER.warn("Configured LRU size " + lruCacheMemorySize
+        + "MB exceeds the max size of JVM heap. Carbon will fallback to use "
+        + changeSize + " MB instead");
+    lruCacheMemorySize = (long) changeSize;
+  }
+
   initCache();
   if (lruCacheMemorySize > 0) {
     LOGGER.info("Configured LRU cache size is " + lruCacheMemorySize + " MB");
@@ -326,4 +336,26 @@ public final class CarbonLRUCache {
   public Map getCacheMap() {
     return lruCacheMap;
   }
+
+  /**
+   * Check if LRU cache setting is bigger than max memory of jvm.
+   * if LRU cache is bigger than max memory of jvm when query for a big segments table,
+   * may cause JDBC server crash.
+   * @return true LRU cache is bigger than max memory of jvm, false otherwise
+   */
+  private boolean isBeyondMaxMemory() {
+    long mSize = Runtime.getRuntime().maxMemory();
+    long lruSize = lruCacheMemorySize * BYTE_CONVERSION_CONSTANT;
+    return lruSize >= mSize;
+  }
+
+  /**
+   * when LRU cache is bigger than max heap of jvm.
+   * set to part of max heap size, use CARBON_LRU_CACHE_PERCENT_OVER_MAX_SIZE default 60%.
+   * @return the LRU cache size
+   */
+  private double getPartOfXmx() {
+    long mSizeMB = Runtime.getRuntime().maxMemory() / BYTE_CONVERSION_CONSTANT;
+    return mSizeMB * CarbonCommonConstants.CARBON_LRU_CACHE_PERCENT_OVER_MAX_SIZE;
+  }
 }

diff --git a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
index f5c07a4..69374ad 100644
--- a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
+++ b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
@@ -1257,6 +1257,11 @@ public final class CarbonCommonConstants {
   public static final String CARBON_MAX_LRU_CACHE_SIZE_DEFAULT = "-1";

   /**
+   * when LRU cache if beyond the jvm max memory size,set 60% percent of max size
+   */
+  public static final double CARBON_LRU_CACHE_PERCENT_OVER_MAX_SIZE = 0.6d;
+
+  /**
    * property to enable min max during filter query
    */
   @CarbonProperty

diff --git a/core/src/test/java/org/apache/carbondata/core/cache/CarbonLRUCacheTest.java b/core/src/test/java/org/apache/carbondata/core/cache/CarbonLRUCacheTest.java
index 0493655..8ef6684 100644
--- a/core/src/test/java/org/apache/carbondata/core/cache/CarbonLRUCacheTest.java
+++ b/core/src/test/java/org/apache/carbondata/core/cache/CarbonLRUCacheTest.java
@@ -60,6 +60,13 @@ public class CarbonLRUCacheTest {
     assertNull(carbonLRUCache.get("Column2"));
   }

+  @Test public void testBiggerThanMaxSizeConfiguration() {
+    CarbonLRUCache carbonLRUCacheForConfig =
+        new CarbonLRUCache("prop2", "20");//200GB
+    assertTrue(carbonLRUCacheForConfig.put("Column1", cacheable, 10L));
+    assertFalse(carbonLRUCacheForConfig.put("Column2", cacheable, 107374182400L)
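The fallback logic this commit adds (cap the configured LRU size at 60% of the JVM max heap) can be sketched as a standalone method. The constant and helper names below are hypothetical stand-ins for `CARBON_LRU_CACHE_PERCENT_OVER_MAX_SIZE`, `isBeyondMaxMemory()`, and `getPartOfXmx()` in the diff:

```java
public class LruCacheGuard {
    // stand-in for CarbonCommonConstants.CARBON_LRU_CACHE_PERCENT_OVER_MAX_SIZE
    static final double PERCENT_OVER_MAX = 0.6d;
    // stand-in for BYTE_CONVERSION_CONSTANT (MB -> bytes)
    static final long MB = 1024L * 1024L;

    /** Return the configured size, or 60% of the max heap when it meets/exceeds the heap. */
    static long effectiveLruSizeMb(long configuredMb, long maxHeapBytes) {
        if (configuredMb * MB >= maxHeapBytes) {          // isBeyondMaxMemory()
            long maxHeapMb = maxHeapBytes / MB;
            return (long) (maxHeapMb * PERCENT_OVER_MAX); // getPartOfXmx()
        }
        return configuredMb;
    }

    public static void main(String[] args) {
        long heap = 4096L * MB; // pretend the JVM was started with -Xmx4g
        // 200 GB configured against a 4 GB heap falls back to 60% of the heap
        System.out.println(effectiveLruSizeMb(204800L, heap)); // 2457
        System.out.println(effectiveLruSizeMb(1024L, heap));   // 1024
    }
}
```

The `>=` comparison mirrors `isBeyondMaxMemory()`: a cache exactly as large as the heap is also capped, since the heap must hold more than the cache.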
[carbondata] branch master updated: [CARBONDATA-3305] Support show metacache command to list the cache sizes for all tables
This is an automated email from the ASF dual-hosted git repository. xuchuanyin pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new 718be37 [CARBONDATA-3305] Support show metacache command to list the cache sizes for all tables 718be37 is described below commit 718be37295a55de3317191118bf74720e4de800f Author: QiangCai AuthorDate: Thu Jan 17 15:39:33 2019 +0800 [CARBONDATA-3305] Support show metacache command to list the cache sizes for all tables >>> SHOW METACACHE +++--++---+ |Database|Table |Index size|Datamap size|Dictionary size| +++--++---+ |ALL |ALL |842 Bytes |982 Bytes |80.34 KB | |default |ALL |842 Bytes |982 Bytes |80.34 KB | |default |t1 |225 Bytes |982 Bytes |0 | |default |t1_dpagg|259 Bytes |0 |0 | |default |t2 |358 Bytes |0 |80.34 KB | +++--++---+ >>> SHOW METACACHE FOR TABLE t1 +--+-+--+ |Field |Size |Comment | +--+-+--+ |Index |225 Bytes|1/1 index files cached| |Dictionary|0| | |dpagg |259 Bytes|preaggregate | |dblom |982 Bytes|bloomfilter | +--+-+--+ >>> SHOW METACACHE FOR TABLE t2 +--+-+--+ |Field |Size |Comment | +--+-+--+ |Index |358 Bytes|2/2 index files cached| |Dictionary|80.34 KB | | +--+-+--+ This closes #3078 --- .../carbondata/core/cache/CacheProvider.java | 4 + .../carbondata/core/cache/CarbonLRUCache.java | 4 + docs/ddl-of-carbondata.md | 21 ++ .../sql/commands/TestCarbonShowCacheCommand.scala | 163 +++ .../spark/sql/catalyst/CarbonDDLSqlParser.scala| 1 + .../command/cache/CarbonShowCacheCommand.scala | 312 + .../spark/sql/parser/CarbonSpark2SqlParser.scala | 12 +- 7 files changed, 515 insertions(+), 2 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/cache/CacheProvider.java b/core/src/main/java/org/apache/carbondata/core/cache/CacheProvider.java index 99b1693..deb48e2 100644 --- a/core/src/main/java/org/apache/carbondata/core/cache/CacheProvider.java +++ 
b/core/src/main/java/org/apache/carbondata/core/cache/CacheProvider.java @@ -195,4 +195,8 @@ public class CacheProvider { } cacheTypeToCacheMap.clear(); } + + public CarbonLRUCache getCarbonCache() { +return carbonLRUCache; + } } diff --git a/core/src/main/java/org/apache/carbondata/core/cache/CarbonLRUCache.java b/core/src/main/java/org/apache/carbondata/core/cache/CarbonLRUCache.java index 87254e3..74ff8a0 100644 --- a/core/src/main/java/org/apache/carbondata/core/cache/CarbonLRUCache.java +++ b/core/src/main/java/org/apache/carbondata/core/cache/CarbonLRUCache.java @@ -305,4 +305,8 @@ public final class CarbonLRUCache { lruCacheMap.clear(); } } + + public Map getCacheMap() { +return lruCacheMap; + } } diff --git a/docs/ddl-of-carbondata.md b/docs/ddl-of-carbondata.md index 0d0e5bd..3476475 100644 --- a/docs/ddl-of-carbondata.md +++ b/docs/ddl-of-carbondata.md @@ -67,6 +67,7 @@ CarbonData DDL statements are documented here,which includes: * [SPLIT PARTITION](#split-a-partition) * [DROP PARTITION](#drop-a-partition) * [BUCKETING](#bucketing) +* [CACHE](#cache) ## CREATE TABLE @@ -1088,4 +1089,24 @@ Users can specify which columns to include and exclude for local dictionary gene TBLPROPERTIES ('BUCKETNUMBER'='4', 'BUCKETCOLUMNS'='productName') ``` +## CACHE + CarbonData internally uses LRU caching to improve the performance. The user can get information + about current cache used status in memory through the following command: + + ```sql + SHOW METADATA + ``` + + This shows the overall memory consumed in the cache by categories - index files, dictionary and + datamaps. This also shows the cache usage by all the tables and children tables in the current + database. + + ```sql + SHOW METADATA ON TABLE tableName + ``` + + This shows detailed information on cache usage by the table `tableName` and its carbonindex files, + its dictionary files, its datamaps and children tables. + + This command is not allowed on child tables. 
diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/sql/commands/TestCarbonShowCacheCommand.scala b/integr
[carbondata] branch master updated: [CARBONDATA-2447] Block update operation on range/list/hash partition table
This is an automated email from the ASF dual-hosted git repository. xuchuanyin pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git

The following commit(s) were added to refs/heads/master by this push:
new 1470f78 [CARBONDATA-2447] Block update operation on range/list/hash partition table
1470f78 is described below

commit 1470f786451c841841cee21cd7d4a3bdf95ecc76
Author: qiuchenjian <807169...@qq.com>
AuthorDate: Tue Jan 22 10:17:00 2019 +0800

[CARBONDATA-2447] Block update operation on range/list/hash partition table

[Problem] When updating data on a range partition table, data may be lost or the update may fail; see the JIRA or the new test case.

[Cause] A range partition table takes the taskNo in the file name as the partitionId. On update, the taskNo keeps increasing and no longer matches the partitionId.

[Solution]
(1) When querying the range partition table, don't match the partitionId --- this loses the meaning of partitioning
(2) Have the range partition table use a directory or a separate part of the file name as the partitionId --- this is not necessary; standard partitions are suggested instead
(3) Range partition tables don't support the update operation --- this PR uses this method

This closes #3091
---
.../partition/TestUpdateForPartitionTable.scala | 71 ++
.../mutation/CarbonProjectForUpdateCommand.scala | 9 ++-
2 files changed, 79 insertions(+), 1 deletion(-)

diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/partition/TestUpdateForPartitionTable.scala b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/partition/TestUpdateForPartitionTable.scala
new file mode 100644
index 000..14dab1e
--- /dev/null
+++ b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/partition/TestUpdateForPartitionTable.scala
@@ -0,0 +1,71 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.
See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.carbondata.spark.testsuite.partition + +import org.apache.spark.sql.test.util.QueryTest +import org.scalatest. BeforeAndAfterAll + +class TestUpdateForPartitionTable extends QueryTest with BeforeAndAfterAll { + + override def beforeAll: Unit = { +dropTable + +sql("create table test_range_partition_table (id int) partitioned by (name string) " + + "stored by 'carbondata' TBLPROPERTIES('PARTITION_TYPE' = 'RANGE','RANGE_INFO' = 'a,e,f')") +sql("create table test_hive_partition_table (id int) partitioned by (name string) " + + "stored by 'carbondata'") +sql("create table test_hash_partition_table (id int) partitioned by (name string) " + + "stored by 'carbondata' TBLPROPERTIES('PARTITION_TYPE' = 'HASH','NUM_PARTITIONS' = '2')") +sql("create table test_list_partition_table (id int) partitioned by (name string) " + + "stored by 'carbondata' TBLPROPERTIES('PARTITION_TYPE' = 'LIST','LIST_INFO' = 'a,e,f')") + } + + def dropTable = { +sql("drop table if exists test_hash_partition_table") +sql("drop table if exists test_list_partition_table") +sql("drop table if exists test_range_partition_table") +sql("drop table if exists test_hive_partition_table") + } + + + test ("test update for unsupported partition table") { +val updateTables = Array( + 
"test_range_partition_table", + "test_list_partition_table", + "test_hash_partition_table") + +updateTables.foreach(table => { + sql("insert into " + table + " select 1,'b' ") + val ex = intercept[UnsupportedOperationException] { +sql("update " + table + " set (name) = ('c') where id = 1").collect() + } + assertResult("Unsupported update operation for range/hash/list partition table")(ex.getMessage) +}) + + } + + test ("test update for hive(standard) partition table") { + +sql("insert into test_hive_partition_table select 1,'b' ") +sql(&q
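The check the commit adds to `CarbonProjectForUpdateCommand` can be sketched as follows. The enum and method here are hypothetical simplifications; only the thrown message is taken from the test above:

```java
public class PartitionUpdateGuard {
    enum PartitionType { RANGE, HASH, LIST, NATIVE_HIVE, NONE }

    /** Reject UPDATE on the legacy custom-partition schemes; standard partitions pass. */
    static void validateUpdate(PartitionType type) {
        if (type == PartitionType.RANGE || type == PartitionType.HASH
                || type == PartitionType.LIST) {
            throw new UnsupportedOperationException(
                "Unsupported update operation for range/hash/list partition table");
        }
    }

    public static void main(String[] args) {
        validateUpdate(PartitionType.NATIVE_HIVE); // standard (hive) partition: allowed
        try {
            validateUpdate(PartitionType.RANGE);   // legacy range partition: blocked
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This matches the test's expectation: the three legacy schemes throw, while the standard hive partition table still accepts updates.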
[carbondata] branch master updated: [CARBONDATA-3107] Optimize error/exception coding for better debugging
This is an automated email from the ASF dual-hosted git repository. xuchuanyin pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git

The following commit(s) were added to refs/heads/master by this push:
new 9672a10 [CARBONDATA-3107] Optimize error/exception coding for better debugging
9672a10 is described below

commit 9672a1032a77502f67dbc81e6e2758bfe2eaeebf
Author: Manhua
AuthorDate: Tue Oct 30 09:44:59 2018 +0800

[CARBONDATA-3107] Optimize error/exception coding for better debugging

Some error logs in carbon carry only a single-line conclusion message (like "Dataload failed"); looking into the code shows a newly created exception without the original exception, so more work has to be done to find the root cause. To better locate the root cause when carbon fails, this PR proposes to keep the original throwable when wrapping it into another exception, and to log the stack trace along with the error message.

Changes in this PR follow these rules (`e` is an exception):

| Code Sample | Problem | Suggested Modification |
| --- | --- | --- |
| `LOGGER.error(e);` | no stack trace (e is taken as message instead of throwable) | `LOGGER.error(e.getMessage(), e);` |
| `LOGGER.error(e.getMessage());` | no stack trace | `LOGGER.error(e.getMessage(), e);` |
| `catch ... throw new Exception("Error occur")` | useless message, no stack trace | `throw new Exception(e)` |
| `catch ... throw new Exception(e.getMessage())` | no stack trace | `throw new Exception(e)` |
| `catch ... throw new Exception(e.getMessage(), e)` | no need to call `getMessage()` | `throw new Exception(e)` |
| `catch ... throw new Exception("Error occur: " + e.getMessage(), e)` | useless message | `throw new Exception(e)` |
| `catch ... throw new Exception("DataLoad fail: " + e.getMessage())` | no stack trace | `throw new Exception("DataLoad fail: " + e.getMessage(), e)` |

Some exceptions, such as MalformedCarbonCommandException, InterruptException, NumberFormatException, InvalidLoadOptionException and NoRetryException, do not have a constructor taking `Throwable`, so this PR does not change them.

This closes #2878
---
.../cache/dictionary/ForwardDictionaryCache.java | 2 +-
.../cache/dictionary/ReverseDictionaryCache.java | 2 +-
.../core/datamap/DataMapStoreManager.java | 2 +-
.../carbondata/core/datamap/DataMapUtil.java | 2 +-
.../filesystem/AbstractDFSCarbonFile.java | 14 ++---
.../datastore/filesystem/AlluxioCarbonFile.java | 2 +-
.../core/datastore/filesystem/HDFSCarbonFile.java | 2 +-
.../core/datastore/filesystem/LocalCarbonFile.java | 2 +-
.../core/datastore/filesystem/S3CarbonFile.java | 2 +-
.../datastore/filesystem/ViewFSCarbonFile.java | 2 +-
.../core/datastore/impl/FileFactory.java | 4 ++--
.../client/NonSecureDictionaryClient.java | 2 +-
.../client/NonSecureDictionaryClientHandler.java | 4 ++--
.../generator/TableDictionaryGenerator.java | 2 +-
.../server/NonSecureDictionaryServerHandler.java | 2 +-
.../service/AbstractDictionaryServer.java | 8
.../core/indexstore/BlockletDataMapIndexStore.java | 4 ++--
.../core/indexstore/BlockletDetailInfo.java | 2 +-
.../timestamp/DateDirectDictionaryGenerator.java | 4 ++--
.../carbondata/core/locks/ZookeeperInit.java | 2 +-
.../core/metadata/schema/table/CarbonTable.java | 2 +-
.../carbondata/core/mutate/CarbonUpdateUtil.java | 4 ++--
.../core/reader/CarbonDeleteFilesDataReader.java | 14 ++---
.../carbondata/core/scan/filter/FilterUtil.java | 16 +++
.../core/scan/result/BlockletScannedResult.java | 6 +++---
.../AbstractDetailQueryResultIterator.java | 2 +-
.../core/statusmanager/LoadMetadataDetails.java | 6 +++---
.../core/statusmanager/SegmentStatusManager.java | 6 +++---
.../carbondata/core/util/CarbonProperties.java | 2 +-
.../apache/carbondata/core/util/CarbonUtil.java| 8 .../apache/carbondata/core/util/DataTypeUtil.java | 8 .../core/util/ObjectSerializationUtil.java | 4 ++-- .../carbondata/core/util/path/HDFSLeaseUtils.java | 2 +- .../datastore/filesystem/HDFSCarbonFileTest.java | 2 +- .../core/load/LoadMetadataDetailsUnitTest.java | 2 +- .../bloom/BloomCoarseGrainDataMapFactory.java | 2 +- .../datamap/lucene/LuceneFineGrainDataMap.java | 6 +++--- .../lucene/LuceneFineGrainDataMapFactory.java | 2 +- .../hive/CarbonDictionaryDecodeReadSupport.java| 2 +- .../carbondata/hive/MapredCarbonInputFormat.java | 2 +- .../carbondata/presto/impl/CarbonTableReader.java | 12 +-- .../client/SecureDicti
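The last rule in the table above (keep the custom message but also pass `e`, so the cause's stack trace survives) can be sketched standalone; the names here are illustrative, not CarbonData APIs:

```java
import java.io.IOException;

public class WrapDemo {
    static void load() {
        try {
            throw new IOException("disk full"); // simulate a low-level failure
        } catch (IOException e) {
            // pass the original throwable as the cause so its stack trace survives
            throw new RuntimeException("DataLoad fail: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        try {
            load();
        } catch (RuntimeException r) {
            // the cause chain still points at the original IOException
            System.out.println(r.getMessage());            // DataLoad fail: disk full
            System.out.println(r.getCause().getMessage()); // disk full
        }
    }
}
```

Had `load()` thrown `new RuntimeException("DataLoad fail: " + e.getMessage())` instead, `getCause()` would return null and the IOException's stack trace would be lost, which is exactly the debugging gap this commit closes.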
[carbondata] branch master updated: [CARBONDATA-3278] Remove duplicate code to get filter string of date/timestamp
This is an automated email from the ASF dual-hosted git repository. xuchuanyin pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git The following commit(s) were added to refs/heads/master by this push: new d84f721 [CARBONDATA-3278] Remove duplicate code to get filter string of date/timestamp d84f721 is described below commit d84f721e847ed33ece605213b2c71971f215b733 Author: Manhua AuthorDate: Mon Jan 28 15:55:45 2019 +0800 [CARBONDATA-3278] Remove duplicate code to get filter string of date/timestamp Remove duplicated code to get filter string of date/timestamp by method `ExpressionResult.getString()` instead. This closes #3109 --- .../datamap/bloom/BloomCoarseGrainDataMap.java | 41 +++--- 1 file changed, 5 insertions(+), 36 deletions(-) diff --git a/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java b/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java index 4459fc5..fea48c3 100644 --- a/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java +++ b/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java @@ -19,17 +19,13 @@ package org.apache.carbondata.datamap.bloom; import java.io.IOException; import java.io.UnsupportedEncodingException; -import java.text.DateFormat; -import java.text.SimpleDateFormat; import java.util.ArrayList; import java.util.Arrays; -import java.util.Date; import java.util.HashMap; import java.util.HashSet; import java.util.List; import java.util.Map; import java.util.Set; -import java.util.TimeZone; import java.util.concurrent.ConcurrentHashMap; import org.apache.carbondata.common.annotations.InterfaceAudience; @@ -57,6 +53,7 @@ import org.apache.carbondata.core.scan.expression.LiteralExpression; import org.apache.carbondata.core.scan.expression.conditional.EqualToExpression; import 
org.apache.carbondata.core.scan.expression.conditional.InExpression; import org.apache.carbondata.core.scan.expression.conditional.ListExpression; +import org.apache.carbondata.core.scan.expression.exception.FilterIllegalMemberException; import org.apache.carbondata.core.scan.expression.logical.AndExpression; import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf; import org.apache.carbondata.core.util.CarbonProperties; @@ -303,35 +300,6 @@ public class BloomCoarseGrainDataMap extends CoarseGrainDataMap { return queryModels; } - /** - * Here preprocessed NULL and date/timestamp data type. - * - * Note that if the datatype is date/timestamp, the expressionValue is long type. - */ - private Object getLiteralExpValue(LiteralExpression le) { -Object expressionValue = le.getLiteralExpValue(); -Object literalValue; - -if (null == expressionValue) { - literalValue = null; -} else if (le.getLiteralExpDataType() == DataTypes.DATE) { - DateFormat format = new SimpleDateFormat(CarbonCommonConstants.CARBON_DATE_DEFAULT_FORMAT); - // the below settings are set statically according to DateDirectDirectionaryGenerator - format.setLenient(false); - format.setTimeZone(TimeZone.getTimeZone("GMT")); - literalValue = format.format(new Date((long) expressionValue / 1000)); -} else if (le.getLiteralExpDataType() == DataTypes.TIMESTAMP) { - DateFormat format = - new SimpleDateFormat(CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT); - // the below settings are set statically according to TimeStampDirectDirectionaryGenerator - format.setLenient(false); - literalValue = format.format(new Date((long) expressionValue / 1000)); -} else { - literalValue = expressionValue; -} -return literalValue; - } - private BloomQueryModel buildQueryModelForEqual(ColumnExpression ce, LiteralExpression le) throws DictionaryGenerationException, UnsupportedEncodingException { @@ -358,11 +326,12 @@ public class BloomCoarseGrainDataMap extends CoarseGrainDataMap { private byte[] 
getInternalFilterValue(CarbonColumn carbonColumn, LiteralExpression le) throws DictionaryGenerationException, UnsupportedEncodingException { -Object filterLiteralValue = getLiteralExpValue(le); // convert the filter value to string and apply converters on it to get carbon internal value String strFilterValue = null; -if (null != filterLiteralValue) { - strFilterValue = String.valueOf(filterLiteralValue); +try { + strFilterValue = le.getExpressionResult().getString(); +} catch (FilterIllegalMemberException e) { + throw new RuntimeException("Error while resolving filter expression", e); } Object convertedValue = this.name2Converters.get(carbonColumn.getColName()).convert(
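The removed helper formatted date literals (stored internally as microseconds, hence the `/ 1000`) into strings with a lenient-disabled GMT formatter. A minimal standalone sketch of that conversion, with the `"yyyy-MM-dd"` pattern as a hypothetical stand-in for `CARBON_DATE_DEFAULT_FORMAT`:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class DateLiteralDemo {
    public static void main(String[] args) {
        // stand-in pattern for CarbonCommonConstants.CARBON_DATE_DEFAULT_FORMAT
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
        format.setLenient(false);                          // reject ambiguous dates
        format.setTimeZone(TimeZone.getTimeZone("GMT"));   // match the dictionary generator
        // literal values arrive in microseconds; Date wants milliseconds
        long micros = 1_548_633_600_000_000L; // 2019-01-28 00:00:00 GMT
        System.out.println(format.format(new Date(micros / 1000))); // 2019-01-28
    }
}
```

The commit replaces this hand-rolled formatting with `le.getExpressionResult().getString()`, which centralizes the same conversion in one place.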
carbondata git commit: [CARBONDATA-3181][BloomDataMap] Fix access field error for BitSet in bloom filter
Repository: carbondata
Updated Branches: refs/heads/master d7ff3e688 -> 96ce00758

[CARBONDATA-3181][BloomDataMap] Fix access field error for BitSet in bloom filter

Problem: java.lang.IllegalAccessError is thrown when querying on a bloom filter without compress on CarbonThriftServer.

Analysis: A similar problem occurred when getting/setting the BitSet in CarbonBloomFilter, where reflection was used to solve it. We could do the same here, but since the BitSet is already set, an easier way is to call the super-class method, which avoids accessing the field from CarbonBloomFilter.

Solution: If the bloom filter is not compressed, call the super method to test membership.

This closes #3000

Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/96ce0075
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/96ce0075
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/96ce0075
Branch: refs/heads/master
Commit: 96ce00758c3f1153f7ed5dccb43533be55dd1e5e
Parents: d7ff3e6
Author: Manhua
Authored: Wed Dec 19 11:30:46 2018 +0800
Committer: xuchuanyin
Committed: Thu Dec 20 19:08:37 2018 +0800
--
.../hadoop/util/bloom/CarbonBloomFilter.java | 20
1 file changed, 8 insertions(+), 12 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/carbondata/blob/96ce0075/datamap/bloom/src/main/java/org/apache/hadoop/util/bloom/CarbonBloomFilter.java
--
diff --git a/datamap/bloom/src/main/java/org/apache/hadoop/util/bloom/CarbonBloomFilter.java b/datamap/bloom/src/main/java/org/apache/hadoop/util/bloom/CarbonBloomFilter.java
index 4b111df..fb0c779 100644
--- a/datamap/bloom/src/main/java/org/apache/hadoop/util/bloom/CarbonBloomFilter.java
+++ b/datamap/bloom/src/main/java/org/apache/hadoop/util/bloom/CarbonBloomFilter.java
@@ -49,27 +49,23 @@ public class CarbonBloomFilter extends BloomFilter {
   @Override public boolean membershipTest(Key key) {
-    if (key == null) {
-      throw new NullPointerException("key cannot be null");
-    }
-
-    int[] h = hash.hash(key);
-    hash.clear();
     if (compress) {
       // If it is compressed check in roaring bitmap
+      if (key == null) {
+        throw new NullPointerException("key cannot be null");
+      }
+      int[] h = hash.hash(key);
+      hash.clear();
       for (int i = 0; i < nbHash; i++) {
         if (!bitmap.contains(h[i])) {
           return false;
         }
       }
+      return true;
     } else {
-      for (int i = 0; i < nbHash; i++) {
-        if (!bits.get(h[i])) {
-          return false;
-        }
-      }
+      // call super method to avoid IllegalAccessError for `bits` field
+      return super.membershipTest(key);
     }
-    return true;
   }

 @Override
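The removed `else` branch tested membership against the `bits` BitSet directly, which is what `super.membershipTest(key)` now does on the filter's behalf. A minimal sketch of that loop, with precomputed hash positions standing in for `hash.hash(key)`:

```java
import java.util.BitSet;

public class BloomSketch {
    // bloom membership over a BitSet: the key is present only if every
    // hash position is set; any clear bit proves definite absence
    static boolean membershipTest(BitSet bits, int[] hashes) {
        for (int h : hashes) {
            if (!bits.get(h)) {
                return false;
            }
        }
        return true; // possibly present (bloom filters allow false positives)
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet(64);
        bits.set(3); bits.set(17); bits.set(42);
        System.out.println(membershipTest(bits, new int[]{3, 17, 42})); // true
        System.out.println(membershipTest(bits, new int[]{3, 18}));     // false
    }
}
```

Delegating to the superclass works here because `bits` is package-private in Hadoop's `BloomFilter`; reading it from a subclass loaded in a different context triggers the IllegalAccessError the commit fixes, while a call through the inherited public method does not.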
carbondata git commit: [HOTFIX] replace apache common log with carbondata log4j
Repository: carbondata Updated Branches: refs/heads/master c0d858139 -> e01916b6b [HOTFIX] replace apache common log with carbondata log4j replace apache common log with carbondata log4j This closes #2999 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/e01916b6 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/e01916b6 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/e01916b6 Branch: refs/heads/master Commit: e01916b6bff6daf6ccbc359d44baeabfadf6a0bf Parents: c0d8581 Author: brijoobopanna Authored: Tue Dec 18 16:15:13 2018 +0530 Committer: xuchuanyin Committed: Thu Dec 20 14:23:15 2018 +0800 -- .../org/apache/carbondata/core/datamap/TableDataMap.java | 7 --- .../indexstore/blockletindex/BlockletDataMapFactory.java | 3 --- .../org/apache/carbondata/core/util/BlockletDataMapUtil.java | 7 --- .../apache/carbondata/core/util/ObjectSerializationUtil.java | 8 +--- .../org/apache/carbondata/hadoop/api/CarbonInputFormat.java | 7 --- .../apache/carbondata/hadoop/api/CarbonTableInputFormat.java | 7 --- .../carbondata/hadoop/api/CarbonTableOutputFormat.java | 7 --- .../apache/carbondata/hive/server/HiveEmbeddedServer2.java | 7 --- .../loading/iterator/CarbonOutputIteratorWrapper.java| 7 --- 9 files changed, 33 insertions(+), 27 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/e01916b6/core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java b/core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java index 4de7449..86390e8 100644 --- a/core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java +++ b/core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java @@ -29,6 +29,7 @@ import java.util.concurrent.Future; import java.util.concurrent.TimeUnit; import org.apache.carbondata.common.annotations.InterfaceAudience; 
+import org.apache.carbondata.common.logging.LogServiceFactory; import org.apache.carbondata.core.constants.CarbonCommonConstants; import org.apache.carbondata.core.datamap.dev.BlockletSerializer; import org.apache.carbondata.core.datamap.dev.DataMap; @@ -50,8 +51,7 @@ import org.apache.carbondata.events.Event; import org.apache.carbondata.events.OperationContext; import org.apache.carbondata.events.OperationEventListener; -import org.apache.commons.logging.Log; -import org.apache.commons.logging.LogFactory; +import org.apache.log4j.Logger; /** * Index at the table level, user can add any number of DataMap for one table, by @@ -74,7 +74,8 @@ public final class TableDataMap extends OperationEventListener { private SegmentPropertiesFetcher segmentPropertiesFetcher; - private static final Log LOG = LogFactory.getLog(TableDataMap.class); + private static final Logger LOG = + LogServiceFactory.getLogService(TableDataMap.class.getName()); /** * It is called to initialize and load the required table datamap metadata. 
http://git-wip-us.apache.org/repos/asf/carbondata/blob/e01916b6/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java b/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java index 096a5e3..5892f78 100644 --- a/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java +++ b/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java @@ -54,8 +54,6 @@ import org.apache.carbondata.core.util.BlockletDataMapUtil; import org.apache.carbondata.core.util.path.CarbonTablePath; import org.apache.carbondata.events.Event; -import org.apache.commons.logging.Log; -import org.apache.commons.logging.LogFactory; import org.apache.hadoop.fs.FileSystem; import org.apache.hadoop.fs.LocatedFileStatus; import org.apache.hadoop.fs.Path; @@ -67,7 +65,6 @@ import org.apache.hadoop.fs.RemoteIterator; public class BlockletDataMapFactory extends CoarseGrainDataMapFactory implements BlockletDetailsFetcher, SegmentPropertiesFetcher, CacheableDataMap { - private static final Log LOG = LogFactory.getLog(BlockletDataMapFactory.class); private static final String NAME = "clustered.btree.blocklet"; /** * variable for cache level BLOCKLET http://git-wip-us.apache.org/repos/asf/carbondata/blob/e01916b6/core/src/main/java/org/apache/carbond
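The migration applied across these files follows one pattern: stop obtaining loggers from Apache Commons Logging (`LogFactory.getLog`) and go through CarbonData's own `LogServiceFactory`, which hands back an `org.apache.log4j.Logger`. A minimal runnable sketch of that indirection is below; the `LogServiceFactory` class here is a hypothetical stand-in backed by `java.util.logging` so the example has no external dependencies — the real one lives in `org.apache.carbondata.common.logging` and returns a log4j `Logger`.

```java
import java.util.logging.Logger;

public class TableDataMapSketch {
  // Before: private static final Log LOG = LogFactory.getLog(TableDataMap.class);
  // After: obtain the logger through the project's own factory, so the logging
  // backend can be swapped in a single place instead of at every call site.
  private static final Logger LOG =
      LogServiceFactory.getLogService(TableDataMapSketch.class.getName());

  public static void main(String[] args) {
    // Prints the class name that was used to look up the logger.
    System.out.println("logger name: " + LOG.getName());
  }
}

// Hypothetical stand-in for org.apache.carbondata.common.logging.LogServiceFactory.
// java.util.logging is used here only to keep the sketch self-contained.
final class LogServiceFactory {
  private LogServiceFactory() { }

  static Logger getLogService(String className) {
    return Logger.getLogger(className);
  }
}
```

Call sites such as `LOG.info(...)` are unchanged by the swap; only the field declaration and imports move, which is why the diff touches each file in the same three places.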
carbondata git commit: [CARBONDATA-3166] Updated Document and added Column Compressor used in Describe Formatted
Repository: carbondata Updated Branches: refs/heads/master 82adc50e7 -> ebdd5486e [CARBONDATA-3166]Updated Document and added Column Compressor used in Describe Formatted Updated Document and added column compressor used in Describe Formatted Command This closes #2986 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/ebdd5486 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/ebdd5486 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/ebdd5486 Branch: refs/heads/master Commit: ebdd5486e309e606efa1ddeb7d2b6d62f2315541 Parents: 82adc50 Author: shardul-cr7 Authored: Thu Dec 13 14:12:18 2018 +0530 Committer: xuchuanyin Committed: Fri Dec 14 19:52:34 2018 +0800 -- docs/configuration-parameters.md | 2 +- .../execution/command/table/CarbonDescribeFormattedCommand.scala | 4 +++- 2 files changed, 4 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/ebdd5486/docs/configuration-parameters.md -- diff --git a/docs/configuration-parameters.md b/docs/configuration-parameters.md index 4aa2929..db21c6a 100644 --- a/docs/configuration-parameters.md +++ b/docs/configuration-parameters.md @@ -91,7 +91,7 @@ This section provides the details of all the configurations required for the Car | carbon.dictionary.server.port | 2030 | Single Pass Loading enables single job to finish data loading with dictionary generation on the fly. It enhances performance in the scenarios where the subsequent data loading after initial load involves fewer incremental updates on the dictionary. Single pass loading can be enabled using the option ***carbon.options.single.pass***. When this option is specified, a dictionary server will be internally started to handle the dictionary generation and query requests. This configuration specifies the port on which the server need to listen for incoming requests. 
Port value ranges between 0-65535 | | carbon.load.directWriteToStorePath.enabled | false | During data load, all the carbondata files are written to local disk and finally copied to the target store location in HDFS/S3. Enabling this parameter will make carbondata files to be written directly onto target HDFS/S3 location bypassing the local disk.**NOTE:** Writing directly to HDFS/S3 saves local disk IO(once for writing the files and again for copying to HDFS/S3) there by improving the performance. But the drawback is when data loading fails or the application crashes, unwanted carbondata files will remain in the target HDFS/S3 location until it is cleared during next data load or by running *CLEAN FILES* DDL command | | carbon.options.serialization.null.format | \N | Based on the business scenarios, some columns might need to be loaded with null values. As null value cannot be written in csv files, some special characters might be adopted to specify null values. This configuration can be used to specify the null values format in the data being loaded. | -| carbon.column.compressor | snappy | CarbonData will compress the column values using the compressor specified by this configuration. Currently CarbonData supports 'snappy' and 'zstd' compressors. | +| carbon.column.compressor | snappy | CarbonData will compress the column values using the compressor specified by this configuration. Currently CarbonData supports 'snappy', 'zstd' and 'gzip' compressors. | | carbon.minmax.allowed.byte.count | 200 | CarbonData will write the min max values for string/varchar types column using the byte count specified by this configuration. Max value is 1000 bytes(500 characters) and Min value is 10 bytes(5 characters). **NOTE:** This property is useful for reducing the store size thereby improving the query performance but can lead to query degradation if value is not configured properly. 
| | ## Compaction Configuration http://git-wip-us.apache.org/repos/asf/carbondata/blob/ebdd5486/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDescribeFormattedCommand.scala -- diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDescribeFormattedCommand.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDescribeFormattedCommand.scala index 151359e..2d560df 100644 --- a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDescribeFormattedCommand.scala +++ b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonDescribeFormattedCommand.scala @@ -92,7 +92,9 @@ private[sql] case cl
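With this change, `gzip` joins `snappy` and `zstd` as a valid value for `carbon.column.compressor`. As a hedged sketch of how a deployment would opt out of the default (the property name and value set are confirmed by the diff above; the file location follows the usual `carbon.properties` convention):

```properties
# carbon.properties — system-wide default for the column compressor.
# Supported values after this change: snappy (default), zstd, gzip.
carbon.column.compressor=zstd
```

Per the CarbonData documentation the same key can also be set per table via `TBLPROPERTIES`; treat that usage as an assumption here, since only the system-level property appears in this diff.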
carbondata git commit: [CARBONDATA-3133] Update the document to add spark 2.3.2 and datamap mv compiling method
Repository: carbondata Updated Branches: refs/heads/master 7584bf7d5 -> 295734cc8 [CARBONDATA-3133] Update the document to add spark 2.3.2 and datamap mv compiling method Update the document to add spark 2.3.2 and add datamap mv compiling method. This closes #2955 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/295734cc Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/295734cc Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/295734cc Branch: refs/heads/master Commit: 295734cc89dc5bae5538b500d674938cbfb75e9f Parents: 7584bf7 Author: Jonathan.Wei <252637...@qq.com> Authored: Tue Nov 27 15:20:48 2018 +0800 Committer: xuchuanyin Committed: Wed Nov 28 14:49:45 2018 +0800 -- build/README.md | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/295734cc/build/README.md -- diff --git a/build/README.md b/build/README.md index 028c03a..f361a6e 100644 --- a/build/README.md +++ b/build/README.md @@ -29,9 +29,12 @@ Build with different supported versions of Spark, by default using Spark 2.2.1 t ``` mvn -DskipTests -Pspark-2.1 -Dspark.version=2.1.0 clean package mvn -DskipTests -Pspark-2.2 -Dspark.version=2.2.1 clean package +mvn -DskipTests -Pspark-2.3 -Dspark.version=2.3.2 clean package ``` -Note: If you are working in Windows environment, remember to add `-Pwindows` while building the project. +Note: + - If you are working in Windows environment, remember to add `-Pwindows` while building the project. + - The mv feature is not compiled by default. If you want to use this feature, remember to add `-Pmv` while building the project. ## For contributors : To build the format code after any changes, please follow the below command. Note:Need install Apache Thrift 0.9.3
carbondata git commit: [HOTFIX] Improve log message in CarbonWriterBuilder
Repository: carbondata Updated Branches: refs/heads/master c2ae98744 -> 3b8de320d [HOTFIX] Improve log message in CarbonWriterBuilder In master the log message is not proper: AppName is not set, please use writtenBy() API to set the App Namewhich is using SDK This closes #2920 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/3b8de320 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/3b8de320 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/3b8de320 Branch: refs/heads/master Commit: 3b8de320d7092470e6d58ad3dcee594e3ae7ecc8 Parents: c2ae987 Author: Jacky Li Authored: Wed Nov 14 20:54:33 2018 +0800 Committer: xuchuanyin Committed: Wed Nov 21 18:32:26 2018 +0800 -- .../org/apache/carbondata/sdk/file/CarbonWriterBuilder.java| 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/3b8de320/store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java -- diff --git a/store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java b/store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java index 917d4dc..1ca5b74 100644 --- a/store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java +++ b/store/sdk/src/main/java/org/apache/carbondata/sdk/file/CarbonWriterBuilder.java @@ -438,13 +438,13 @@ public class CarbonWriterBuilder { Objects.requireNonNull(path, "path should not be null"); if (this.writerType == null) { throw new IOException( - "Writer type is not set, use withCsvInput() or withAvroInput() or withJsonInput() " + "'writerType' must be set, use withCsvInput() or withAvroInput() or withJsonInput() " + "API based on input"); } if (this.writtenByApp == null || this.writtenByApp.isEmpty()) { throw new RuntimeException( - "AppName is not set, please use writtenBy() API to set the App Name" - + "which is using SDK"); + "'writtenBy' must be 
set when writing carbon files, use writtenBy() API to " + + "set it, it can be the name of the application which is using the SDK"); } CarbonLoadModel loadModel = buildLoadModel(schema); loadModel.setSdkWriterCores(numOfThreads);
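The fix above is a fail-fast validation in `CarbonWriterBuilder.build()` with a clearer, correctly spaced message (the old one concatenated "App Name" and "which is using SDK" without a space). A minimal hypothetical builder sketch of the same pattern — this is not the real SDK class, just the validation shape:

```java
public class WriterBuilderSketch {
  private String writtenByApp;

  // Mirrors the SDK's writtenBy() API: record which application writes the files.
  public WriterBuilderSketch writtenBy(String appName) {
    this.writtenByApp = appName;
    return this;
  }

  // Mirrors the validation in build(): reject missing mandatory state early,
  // with a message that tells the caller exactly which API to use.
  public void build() {
    if (writtenByApp == null || writtenByApp.isEmpty()) {
      throw new RuntimeException(
          "'writtenBy' must be set when writing carbon files, use writtenBy() API to "
              + "set it, it can be the name of the application which is using the SDK");
    }
  }

  public static void main(String[] args) {
    try {
      new WriterBuilderSketch().build();  // no writtenBy() call: should throw
    } catch (RuntimeException e) {
      System.out.println(e.getMessage());
    }
    new WriterBuilderSketch().writtenBy("my-app").build();  // passes validation
  }
}
```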
carbondata git commit: [CARBONDATA-3031] refining usage of numberofcores in CarbonProperties
Repository: carbondata Updated Branches: refs/heads/master 0c02e9801 -> 39e8e3da5 [CARBONDATA-3031] refining usage of numberofcores in CarbonProperties 1. many places use the function 'getNumOfCores' of CarbonProperties which returns the loading cores. 2. so if we still use the value in scene like 'query' or 'compaction' , it will be confused. This closes #2907 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/39e8e3da Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/39e8e3da Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/39e8e3da Branch: refs/heads/master Commit: 39e8e3da5c4cca7ff022991c4b5f8808988356d5 Parents: 0c02e98 Author: Sssan520 Authored: Mon Jul 2 19:12:24 2018 +0800 Committer: xuchuanyin Committed: Sat Nov 17 13:49:48 2018 +0800 -- .../dictionary/AbstractDictionaryCache.java | 2 +- .../generator/TableDictionaryGenerator.java | 2 +- .../reader/CarbonDeleteFilesDataReader.java | 6 +++- .../carbondata/core/util/CarbonProperties.java | 35 +++- .../CarbonAlterTableDropPartitionCommand.scala | 4 +-- .../CarbonAlterTableSplitPartitionCommand.scala | 6 ++-- .../loading/CarbonDataLoadConfiguration.java| 10 ++ .../loading/DataLoadProcessBuilder.java | 2 ++ .../processing/merger/CarbonDataMergerUtil.java | 3 +- .../merger/CompactionResultSortProcessor.java | 4 ++- .../sort/sortdata/SortParameters.java | 8 +++-- .../store/CarbonFactDataHandlerModel.java | 13 ++-- .../util/CarbonDataProcessorUtil.java | 2 +- 13 files changed, 62 insertions(+), 35 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/39e8e3da/core/src/main/java/org/apache/carbondata/core/cache/dictionary/AbstractDictionaryCache.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/cache/dictionary/AbstractDictionaryCache.java b/core/src/main/java/org/apache/carbondata/core/cache/dictionary/AbstractDictionaryCache.java index 83c7237..36d5f98 100644 --- 
a/core/src/main/java/org/apache/carbondata/core/cache/dictionary/AbstractDictionaryCache.java +++ b/core/src/main/java/org/apache/carbondata/core/cache/dictionary/AbstractDictionaryCache.java @@ -70,7 +70,7 @@ public abstract class AbstractDictionaryCachehttp://git-wip-us.apache.org/repos/asf/carbondata/blob/39e8e3da/core/src/main/java/org/apache/carbondata/core/dictionary/generator/TableDictionaryGenerator.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/dictionary/generator/TableDictionaryGenerator.java b/core/src/main/java/org/apache/carbondata/core/dictionary/generator/TableDictionaryGenerator.java index 33a91d8..003ab5a 100644 --- a/core/src/main/java/org/apache/carbondata/core/dictionary/generator/TableDictionaryGenerator.java +++ b/core/src/main/java/org/apache/carbondata/core/dictionary/generator/TableDictionaryGenerator.java @@ -78,7 +78,7 @@ public class TableDictionaryGenerator } @Override public void writeDictionaryData() { -int numOfCores = CarbonProperties.getInstance().getNumberOfCores(); +int numOfCores = CarbonProperties.getInstance().getNumberOfLoadingCores(); long start = System.currentTimeMillis(); ExecutorService executorService = Executors.newFixedThreadPool(numOfCores); for (final DictionaryGenerator generator : columnMap.values()) { http://git-wip-us.apache.org/repos/asf/carbondata/blob/39e8e3da/core/src/main/java/org/apache/carbondata/core/reader/CarbonDeleteFilesDataReader.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/reader/CarbonDeleteFilesDataReader.java b/core/src/main/java/org/apache/carbondata/core/reader/CarbonDeleteFilesDataReader.java index 32eb60d..ee87a75 100644 --- a/core/src/main/java/org/apache/carbondata/core/reader/CarbonDeleteFilesDataReader.java +++ b/core/src/main/java/org/apache/carbondata/core/reader/CarbonDeleteFilesDataReader.java @@ -60,12 +60,16 @@ public class CarbonDeleteFilesDataReader { initThreadPoolSize(); } + public CarbonDeleteFilesDataReader(int thread_pool_size) { 
+this.thread_pool_size = thread_pool_size; + } + /** * This method will initialize the thread pool size to be used for creating the * max number of threads for a job */ private void initThreadPoolSize() { -thread_pool_size = CarbonProperties.getInstance().getNumberOfCores(); +thread_pool_size = CarbonProperties.getInstance().getNumberOfLoadingCores(); } /** http://git-wip-us.apache.org/repos/asf/carbondata/blob/39e8e3da/core/src/main/java/org/apa
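The refactoring above replaces the single ambiguous `getNumberOfCores()` with scenario-specific accessors such as `getNumberOfLoadingCores()`, so a compaction or query path cannot silently reuse the loading value. A self-contained sketch of that idea — property keys match the documented CarbonData names, but the class and defaults here are illustrative, not the real `CarbonProperties`:

```java
import java.util.Properties;

public class CorePropertiesSketch {
  private final Properties props = new Properties();

  void addProperty(String key, String value) {
    props.setProperty(key, value);
  }

  // Parse a core count, falling back to the default on bad input rather than
  // failing, as CarbonProperties does for misconfigured values.
  private int getCores(String key, int defaultValue) {
    try {
      return Integer.parseInt(props.getProperty(key, String.valueOf(defaultValue)));
    } catch (NumberFormatException e) {
      return defaultValue;
    }
  }

  // One explicit accessor per scenario, so call sites document their intent.
  int getNumberOfLoadingCores() {
    return getCores("carbon.number.of.cores.while.loading", 2);
  }

  int getNumberOfCompactingCores() {
    return getCores("carbon.number.of.cores.while.compacting", 2);
  }

  public static void main(String[] args) {
    CorePropertiesSketch p = new CorePropertiesSketch();
    p.addProperty("carbon.number.of.cores.while.compacting", "12");
    // Only the compaction value was configured; loading keeps its default.
    System.out.println(p.getNumberOfLoadingCores() + " " + p.getNumberOfCompactingCores());
    // prints "2 12"
  }
}
```

The `CarbonDeleteFilesDataReader` change in the diff complements this: a constructor that accepts the pool size directly lets callers inject a scenario-appropriate value instead of always reading the loading default.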
[1/2] carbondata git commit: [CARBONDATA-3087] Improve DESC FORMATTED output
Repository: carbondata Updated Branches: refs/heads/master ceb135175 -> 851dd2c88 http://git-wip-us.apache.org/repos/asf/carbondata/blob/851dd2c8/integration/spark2/src/test/scala/org/apache/spark/sql/GetDataSizeAndIndexSizeTest.scala -- diff --git a/integration/spark2/src/test/scala/org/apache/spark/sql/GetDataSizeAndIndexSizeTest.scala b/integration/spark2/src/test/scala/org/apache/spark/sql/GetDataSizeAndIndexSizeTest.scala index 03ec3a1..563206f 100644 --- a/integration/spark2/src/test/scala/org/apache/spark/sql/GetDataSizeAndIndexSizeTest.scala +++ b/integration/spark2/src/test/scala/org/apache/spark/sql/GetDataSizeAndIndexSizeTest.scala @@ -17,7 +17,10 @@ package org.apache.spark.sql +import java.util.Date + import org.apache.spark.sql.test.util.QueryTest + import org.apache.carbondata.core.constants.CarbonCommonConstants import org.scalatest.BeforeAndAfterAll @@ -59,7 +62,7 @@ class GetDataSizeAndIndexSizeTest extends QueryTest with BeforeAndAfterAll { .filter(row => row.getString(0).contains(CarbonCommonConstants.TABLE_DATA_SIZE) || row.getString(0).contains(CarbonCommonConstants.TABLE_INDEX_SIZE)) assert(res1.length == 2) -res1.foreach(row => assert(row.getString(1).trim.toLong > 0)) +res1.foreach(row => assert(row.getString(1).trim.substring(0, 2).toDouble > 0)) } test("get data size and index size after major compaction") { @@ -73,7 +76,7 @@ class GetDataSizeAndIndexSizeTest extends QueryTest with BeforeAndAfterAll { .filter(row => row.getString(0).contains(CarbonCommonConstants.TABLE_DATA_SIZE) || row.getString(0).contains(CarbonCommonConstants.TABLE_INDEX_SIZE)) assert(res2.length == 2) -res2.foreach(row => assert(row.getString(1).trim.toLong > 0)) +res2.foreach(row => assert(row.getString(1).trim.substring(0, 2).toDouble > 0)) } test("get data size and index size after minor compaction") { @@ -91,7 +94,7 @@ class GetDataSizeAndIndexSizeTest extends QueryTest with BeforeAndAfterAll { .filter(row => 
row.getString(0).contains(CarbonCommonConstants.TABLE_DATA_SIZE) || row.getString(0).contains(CarbonCommonConstants.TABLE_INDEX_SIZE)) assert(res3.length == 2) -res3.foreach(row => assert(row.getString(1).trim.toLong > 0)) +res3.foreach(row => assert(row.getString(1).trim.substring(0, 2).toDouble > 0)) } test("get data size and index size after insert into") { @@ -105,7 +108,7 @@ class GetDataSizeAndIndexSizeTest extends QueryTest with BeforeAndAfterAll { .filter(row => row.getString(0).contains(CarbonCommonConstants.TABLE_DATA_SIZE) || row.getString(0).contains(CarbonCommonConstants.TABLE_INDEX_SIZE)) assert(res4.length == 2) -res4.foreach(row => assert(row.getString(1).trim.toLong > 0)) +res4.foreach(row => assert(row.getString(1).trim.substring(0, 2).toDouble > 0)) } test("get data size and index size after insert overwrite") { @@ -119,7 +122,7 @@ class GetDataSizeAndIndexSizeTest extends QueryTest with BeforeAndAfterAll { .filter(row => row.getString(0).contains(CarbonCommonConstants.TABLE_DATA_SIZE) || row.getString(0).contains(CarbonCommonConstants.TABLE_INDEX_SIZE)) assert(res5.length == 2) -res5.foreach(row => assert(row.getString(1).trim.toLong > 0)) +res5.foreach(row => assert(row.getString(1).trim.substring(0, 2).toDouble > 0)) } test("get data size and index size for empty table") { @@ -128,15 +131,14 @@ class GetDataSizeAndIndexSizeTest extends QueryTest with BeforeAndAfterAll { .filter(row => row.getString(0).contains(CarbonCommonConstants.TABLE_DATA_SIZE) || row.getString(0).contains(CarbonCommonConstants.TABLE_INDEX_SIZE)) assert(res6.length == 2) -res6.foreach(row => assert(row.getString(1).trim.toLong == 0)) +res6.foreach(row => assert(row.getString(1).trim.substring(0, 2).toDouble == 0)) } test("get last update time for empty table") { sql("CREATE TABLE tableSize9 (empno int, workgroupcategory string, deptno int, projectcode int, attendance int) STORED BY 'org.apache.carbondata.format'") val res7 = sql("DESCRIBE FORMATTED tableSize9").collect() - 
.filter(row => row.getString(0).contains(CarbonCommonConstants.LAST_UPDATE_TIME)) + .filter(row => row.getString(0).contains("Last Update")) assert(res7.length == 1) -res7.foreach(row => assert(row.getString(1).trim.toLong == 0)) } test("get last update time for unempty table") { @@ -144,9 +146,8 @@ class GetDataSizeAndIndexSizeTest extends QueryTest with BeforeAndAfterAll { sql(s"""LOAD DATA local inpath '$resourcesPath/data.csv' INTO TABLE tableSize10 OPTIONS ('DELIMITER'= ',', 'QUOTECHAR'= '\"', 'FILEHEADER'='')""") val res8 = sql("DESCRIBE FORMATTED tableSize10").collect() - .filter(row => row.getString(0).contains(CarbonCommonConstants.LAST_UPDATE_TIME)) +
[2/2] carbondata git commit: [CARBONDATA-3087] Improve DESC FORMATTED output
[CARBONDATA-3087] Improve DESC FORMATTED output Change output of DESC FORMATTED This closes #2908 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/851dd2c8 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/851dd2c8 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/851dd2c8 Branch: refs/heads/master Commit: 851dd2c884895762af305fa668ad80c151ba89bd Parents: ceb1351 Author: Jacky Li Authored: Thu Nov 8 20:11:28 2018 +0800 Committer: xuchuanyin Committed: Thu Nov 15 16:48:04 2018 +0800 -- .../core/constants/CarbonCommonConstants.java | 13 +- .../core/constants/SortScopeOptions.java| 49 +++ .../core/metadata/schema/table/CarbonTable.java | 75 - .../core/metadata/schema/table/TableInfo.java | 4 +- .../apache/carbondata/core/util/CarbonUtil.java | 3 + .../TestNoInvertedIndexLoadAndQuery.scala | 7 +- .../preaggregate/TestPreAggCreateCommand.scala | 7 +- ...ithColumnMetCacheAndCacheLevelProperty.scala | 12 +- ...ithColumnMetCacheAndCacheLevelProperty.scala | 8 +- .../TestCreateTableWithCompactionOptions.scala | 20 -- .../TestNonTransactionalCarbonTable.scala | 52 +-- .../testsuite/dataload/TestLoadDataFrame.scala | 4 +- .../describeTable/TestDescribeTable.scala | 12 +- .../LocalDictionarySupportCreateTableTest.scala | 8 +- .../testsuite/sortcolumns/TestSortColumns.scala | 3 +- .../sql/commands/StoredAsCarbondataSuite.scala | 2 +- .../sql/commands/UsingCarbondataSuite.scala | 2 +- .../spark/rdd/NewCarbonDataLoadRDD.scala| 2 +- .../apache/spark/sql/test/util/QueryTest.scala | 2 +- .../spark/rdd/CarbonDataRDDFactory.scala| 3 +- .../management/CarbonLoadDataCommand.scala | 3 +- .../table/CarbonDescribeFormattedCommand.scala | 329 ++- .../carbondata/TestStreamingTableOpName.scala | 2 +- .../AlterTableValidationTestCase.scala | 6 +- .../vectorreader/AddColumnTestCases.scala | 38 +-- .../vectorreader/ChangeDataTypeTestCases.scala | 6 +- .../vectorreader/DropColumnTestCases.scala | 
6 +- .../spark/sql/GetDataSizeAndIndexSizeTest.scala | 25 +- .../loading/DataLoadProcessBuilder.java | 2 +- .../loading/model/CarbonLoadModelBuilder.java | 2 +- .../loading/sort/SortScopeOptions.java | 49 --- .../processing/loading/sort/SorterFactory.java | 1 + .../store/CarbonFactDataHandlerModel.java | 2 +- .../writer/v3/CarbonFactDataWriterImplV3.java | 2 +- .../util/CarbonDataProcessorUtil.java | 2 +- 35 files changed, 390 insertions(+), 373 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/851dd2c8/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java index 259f84e..b75648e 100644 --- a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java +++ b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java @@ -62,11 +62,6 @@ public final class CarbonCommonConstants { public static final int BLOCKLET_SIZE_MAX_VAL = 1200; /** - * default block size in MB - */ - public static final String BLOCK_SIZE_DEFAULT_VAL = "1024"; - - /** * min block size in MB */ public static final int BLOCK_SIZE_MIN_VAL = 1; @@ -438,8 +433,16 @@ public final class CarbonCommonConstants { public static final String COLUMN_PROPERTIES = "columnproperties"; // table block size in MB public static final String TABLE_BLOCKSIZE = "table_blocksize"; + + // default block size in MB + public static final String TABLE_BLOCK_SIZE_DEFAULT = "1024"; + // table blocklet size in MB public static final String TABLE_BLOCKLET_SIZE = "table_blocklet_size"; + + // default blocklet size value in MB + public static final String TABLE_BLOCKLET_SIZE_DEFAULT = "64"; + /** * set in column level to disable inverted index * @Deprecated :This property is deprecated, it is kept just for compatibility 
http://git-wip-us.apache.org/repos/asf/carbondata/blob/851dd2c8/core/src/main/java/org/apache/carbondata/core/constants/SortScopeOptions.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/constants/SortScopeOptions.java b/core/src/main/java/org/apache/carbondata/core/constants/SortScopeOptions.java new file mode 100644 index 000..281a27e --- /dev/null ++
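The test updates above switch from `row.getString(1).trim.toLong` to `row.getString(1).trim.substring(0, 2).toDouble`, which suggests DESC FORMATTED now renders data and index sizes in a human-readable form (e.g. "1.50 KB") rather than as a raw byte count. The formatter below is a hypothetical sketch of such rendering, included only to illustrate why the assertions changed — the real formatting lives elsewhere in CarbonData and may differ:

```java
import java.util.Locale;

public class ByteSizeSketch {
  // Render a byte count with a scaled unit, e.g. 1536 -> "1.50 KB".
  static String humanReadable(long bytes) {
    String[] units = {"B", "KB", "MB", "GB", "TB"};
    double value = bytes;
    int unit = 0;
    while (value >= 1024 && unit < units.length - 1) {
      value /= 1024;
      unit++;
    }
    // Locale.ROOT keeps the decimal point stable across environments.
    return String.format(Locale.ROOT, "%.2f %s", value, units[unit]);
  }

  public static void main(String[] args) {
    System.out.println(humanReadable(1536));  // prints "1.50 KB"
    System.out.println(humanReadable(0));     // prints "0.00 B"
  }
}
```

On output like "1.50 KB", Scala's `substring(0, 2)` yields "1." and `"1.".toDouble` parses to 1.0, so the `> 0` and `== 0` checks in the updated tests still hold.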
[2/2] carbondata git commit: [HOTFIX] Remove search mode module
[HOTFIX] Remove search mode module Remove search mode module This closes #2904 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/10918484 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/10918484 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/10918484 Branch: refs/heads/master Commit: 10918484853953528f5756b1b6bdb4a3a2d3b068 Parents: 6f3b9d3 Author: Jacky Li Authored: Fri Nov 9 11:45:50 2018 +0800 Committer: xuchuanyin Committed: Tue Nov 13 19:59:28 2018 +0800 -- .../core/constants/CarbonCommonConstants.java | 65 .../scan/executor/QueryExecutorFactory.java | 17 +- .../impl/SearchModeDetailQueryExecutor.java | 86 - .../SearchModeVectorDetailQueryExecutor.java| 91 -- .../AbstractSearchModeResultIterator.java | 139 .../iterator/SearchModeResultIterator.java | 53 --- .../SearchModeVectorResultIterator.java | 49 --- .../carbondata/core/util/CarbonProperties.java | 57 .../carbondata/core/util/SessionParams.java | 1 - .../benchmark/ConcurrentQueryBenchmark.scala| 59 +--- .../carbondata/examples/S3UsingSDkExample.scala | 3 +- .../carbondata/examples/SearchModeExample.scala | 194 --- integration/spark-common-test/pom.xml | 10 - ...eneFineGrainDataMapWithSearchModeSuite.scala | 325 --- .../detailquery/SearchModeTestCase.scala| 154 - .../carbondata/spark/rdd/CarbonScanRDD.scala| 15 +- integration/spark2/pom.xml | 5 - .../carbondata/store/SparkCarbonStore.scala | 63 .../org/apache/spark/sql/CarbonSession.scala| 95 -- .../execution/command/CarbonHiveCommands.scala | 12 +- .../bloom/BloomCoarseGrainDataMapSuite.scala| 15 - pom.xml | 7 - store/search/pom.xml| 112 --- .../store/worker/SearchRequestHandler.java | 244 -- .../apache/carbondata/store/worker/Status.java | 28 -- .../scala/org/apache/spark/rpc/Master.scala | 291 - .../scala/org/apache/spark/rpc/RpcUtil.scala| 56 .../scala/org/apache/spark/rpc/Scheduler.scala | 139 .../scala/org/apache/spark/rpc/Worker.scala | 118 --- 
.../org/apache/spark/search/Registry.scala | 51 --- .../org/apache/spark/search/Searcher.scala | 79 - .../carbondata/store/SearchServiceTest.java | 37 --- .../org/apache/spark/rpc/SchedulerSuite.scala | 154 - 33 files changed, 25 insertions(+), 2799 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/10918484/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java index fc26404..6edfd66 100644 --- a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java +++ b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java @@ -1380,71 +1380,6 @@ public final class CarbonCommonConstants { public static final String CARBON_HEAP_MEMORY_POOLING_THRESHOLD_BYTES_DEFAULT = "1048576"; /** - * If set to true, will use CarbonReader to do distributed scan directly instead of using - * compute framework like spark, thus avoiding limitation of compute framework like SQL - * optimizer and task scheduling overhead. - */ - @InterfaceStability.Unstable - @CarbonProperty(dynamicConfigurable = true) - public static final String CARBON_SEARCH_MODE_ENABLE = "carbon.search.enabled"; - - public static final String CARBON_SEARCH_MODE_ENABLE_DEFAULT = "false"; - - /** - * It's timeout threshold of carbon search query - */ - @InterfaceStability.Unstable - @CarbonProperty - public static final String CARBON_SEARCH_QUERY_TIMEOUT = "carbon.search.query.timeout"; - - /** - * Default value is 10 seconds - */ - public static final String CARBON_SEARCH_QUERY_TIMEOUT_DEFAULT = "10s"; - - /** - * The size of thread pool used for reading files in Work for search mode. 
By default, - * it is number of cores in Worker - */ - @InterfaceStability.Unstable - @CarbonProperty - public static final String CARBON_SEARCH_MODE_SCAN_THREAD = "carbon.search.scan.thread"; - - /** - * In search mode, Master will listen on this port for worker registration. - * If Master failed to start service with this port, it will try to increment the port number - * and try to bind again, until it is success - */ - @Int
[1/2] carbondata git commit: [HOTFIX] Remove search mode module
Repository: carbondata Updated Branches: refs/heads/master 6f3b9d3b9 -> 109184848 http://git-wip-us.apache.org/repos/asf/carbondata/blob/10918484/integration/spark2/src/test/scala/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapSuite.scala -- diff --git a/integration/spark2/src/test/scala/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapSuite.scala b/integration/spark2/src/test/scala/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapSuite.scala index 3b5b5ca..4985718 100644 --- a/integration/spark2/src/test/scala/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapSuite.scala +++ b/integration/spark2/src/test/scala/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapSuite.scala @@ -294,17 +294,6 @@ class BloomCoarseGrainDataMapSuite extends QueryTest with BeforeAndAfterAll with val expectedAnswer1 = sql(s"select * from $normalTable where id = 1").collect() val expectedAnswer2 = sql(s"select * from $normalTable where city in ('city_999')").collect() -carbonSession.startSearchMode() -assert(carbonSession.isSearchModeEnabled) - -checkAnswer( - sql(s"select * from $bloomDMSampleTable where id = 1"), expectedAnswer1) -checkAnswer( - sql(s"select * from $bloomDMSampleTable where city in ('city_999')"), expectedAnswer2) - -carbonSession.stopSearchMode() -assert(!carbonSession.isSearchModeEnabled) - sql(s"DROP TABLE IF EXISTS $normalTable") sql(s"DROP TABLE IF EXISTS $bloomDMSampleTable") } @@ -975,10 +964,6 @@ class BloomCoarseGrainDataMapSuite extends QueryTest with BeforeAndAfterAll with } override protected def afterAll(): Unit = { -// in case of search mode test case failed, stop search mode again -if (carbonSession.isSearchModeEnabled) { - carbonSession.stopSearchMode() -} deleteFile(bigFile) deleteFile(smallFile) sql(s"DROP TABLE IF EXISTS $normalTable") http://git-wip-us.apache.org/repos/asf/carbondata/blob/10918484/pom.xml -- diff --git a/pom.xml b/pom.xml index b61c59e..8e67c26 100644 --- a/pom.xml +++ b/pom.xml @@ 
-104,7 +104,6 @@ integration/spark-common-test datamap/examples store/sdk -store/search assembly tools/cli @@ -536,8 +535,6 @@ ${basedir}/streaming/src/main/java ${basedir}/streaming/src/main/scala ${basedir}/store/sdk/src/main/java - ${basedir}/store/search/src/main/scala - ${basedir}/store/search/src/main/java ${basedir}/datamap/bloom/src/main/java ${basedir}/datamap/lucene/src/main/java @@ -599,8 +596,6 @@ ${basedir}/streaming/src/main/java ${basedir}/streaming/src/main/scala ${basedir}/store/sdk/src/main/java - ${basedir}/store/search/src/main/scala - ${basedir}/store/search/src/main/java ${basedir}/datamap/bloom/src/main/java ${basedir}/datamap/lucene/src/main/java @@ -658,8 +653,6 @@ ${basedir}/streaming/src/main/java ${basedir}/streaming/src/main/scala ${basedir}/store/sdk/src/main/java - ${basedir}/store/search/src/main/scala - ${basedir}/store/search/src/main/java ${basedir}/datamap/bloom/src/main/java ${basedir}/datamap/lucene/src/main/java http://git-wip-us.apache.org/repos/asf/carbondata/blob/10918484/store/search/pom.xml -- diff --git a/store/search/pom.xml b/store/search/pom.xml deleted file mode 100644 index 6b84be9..000 --- a/store/search/pom.xml +++ /dev/null @@ -1,112 +0,0 @@ -http://maven.apache.org/POM/4.0.0; - xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance; - xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd;> - - 4.0.0 - - -org.apache.carbondata -carbondata-parent -1.6.0-SNAPSHOT -../../pom.xml - - - carbondata-search - Apache CarbonData :: Search - - -${basedir}/../../dev - - - - - org.apache.carbondata - carbondata-hadoop - ${project.version} - - - org.apache.spark - spark-core_${scala.binary.version} - ${spark.version} - - - junit - junit - test - - - org.scalatest - scalatest_${scala.binary.version} - test - - - - -src/test/scala - - -org.apache.maven.plugins -maven-compiler-plugin - - 1.7 - 1.7 - - - -org.scala-tools -maven-scala-plugin -2.15.2 - - -
carbondata git commit: [HOTFIX] change log level for data loading
Repository: carbondata Updated Branches: refs/heads/master 6707db689 -> 6f3b9d3b9 [HOTFIX] change log level for data loading In the current data loading path, many logs meant for debugging are written at INFO level; to reduce their volume, this PR changes them to DEBUG level. This closes #2911 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/6f3b9d3b Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/6f3b9d3b Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/6f3b9d3b Branch: refs/heads/master Commit: 6f3b9d3b99ca55a6b29e951d89e17bb60eee1f03 Parents: 6707db6 Author: Jacky Li Authored: Fri Nov 9 14:49:15 2018 +0800 Committer: xuchuanyin Committed: Fri Nov 9 17:06:13 2018 +0800 -- .../core/metadata/schema/table/TableInfo.java | 9 +++-- .../apache/carbondata/core/util/CarbonUtil.java | 39 +++- .../loading/AbstractDataLoadProcessorStep.java | 4 +- .../processing/loading/DataLoadExecutor.java| 2 - .../CarbonRowDataWriterProcessorStepImpl.java | 12 +++--- .../steps/DataWriterProcessorStepImpl.java | 12 +++--- .../store/CarbonFactDataHandlerColumnar.java| 35 -- .../store/writer/AbstractFactDataWriter.java| 2 +- .../writer/v3/CarbonFactDataWriterImplV3.java | 7 +++- .../util/CarbonDataProcessorUtil.java | 1 - 10 files changed, 73 insertions(+), 50 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/6f3b9d3b/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/TableInfo.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/TableInfo.java b/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/TableInfo.java index b3e9e7e..3e50586 100644 --- a/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/TableInfo.java +++ b/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/TableInfo.java @@ -258,9 +258,12 @@ public class TableInfo implements Serializable,
Writable { } if (null == tableBlockSize) { tableBlockSize = CarbonCommonConstants.BLOCK_SIZE_DEFAULT_VAL; - LOGGER.info("Table block size not specified for " + getTableUniqueName() - + ". Therefore considering the default value " - + CarbonCommonConstants.BLOCK_SIZE_DEFAULT_VAL + " MB"); + if (LOGGER.isDebugEnabled()) { +LOGGER.debug( +"Table block size not specified for " + getTableUniqueName() + +". Therefore considering the default value " + +CarbonCommonConstants.BLOCK_SIZE_DEFAULT_VAL + " MB"); + } } return Integer.parseInt(tableBlockSize); } http://git-wip-us.apache.org/repos/asf/carbondata/blob/6f3b9d3b/core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java b/core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java index 1840ba0..2fa6260 100644 --- a/core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java +++ b/core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java @@ -2474,7 +2474,7 @@ public final class CarbonUtil { lockAcquired = carbonLock.lockWithRetries(); } if (lockAcquired) { - LOGGER.info("Acquired lock for table for table status updation"); + LOGGER.debug("Acquired lock for table for table status updation"); String metadataPath = carbonTable.getMetadataPath(); LoadMetadataDetails[] loadMetadataDetails = SegmentStatusManager.readLoadMetadata(metadataPath); @@ -2488,7 +2488,7 @@ public final class CarbonUtil { // If it is old segment, need to calculate data size and index size again if (null == dsize || null == isize) { needUpdate = true; -LOGGER.info("It is an old segment, need calculate data size and index size again"); +LOGGER.debug("It is an old segment, need calculate data size and index size again"); HashMap map = CarbonUtil.getDataSizeAndIndexSize( identifier.getTablePath(), loadMetadataDetail.getLoadName()); dsize = String.valueOf(map.get(CarbonCommonConstants.CARBON_TOTAL_DATA_SIZE)); @@ -2524,7 +2524,7 @@ public 
final class CarbonUtil { } } finally { if (carbonLock.unlock()) { - LOGGER.info("Table unlocked successfully after table status upda
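The pattern this commit applies throughout — wrapping message construction in a debug-enabled check so the string concatenation is skipped entirely when debug logging is off — can be sketched outside CarbonData. The class and method names below are hypothetical, and `java.util.logging`'s FINE level stands in for log4j's DEBUG; only the guard idiom itself is taken from the diff.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustrative sketch of the commit's guard idiom (not CarbonData code).
public class DebugGuardSketch {
  private static final Logger LOGGER = Logger.getLogger(DebugGuardSketch.class.getName());

  // Builds the message only when called -- the concatenation is the cost the guard avoids.
  static String defaultBlockSizeMessage(String tableUniqueName, int defaultSizeMb) {
    return "Table block size not specified for " + tableUniqueName
        + ". Therefore considering the default value " + defaultSizeMb + " MB";
  }

  public static void main(String[] args) {
    // Guard first, analogous to the diff's LOGGER.isDebugEnabled() check:
    if (LOGGER.isLoggable(Level.FINE)) {
      LOGGER.fine(defaultBlockSizeMessage("streaming_sink", 1024));
    }
    System.out.println(defaultBlockSizeMessage("streaming_sink", 1024));
  }
}
```

The guard matters because log statements like these sit on hot data-loading paths: without it, the argument expression is evaluated on every call even when the logger discards the result.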
carbondata git commit: [DOC] Update streaming-guide.md
Repository: carbondata Updated Branches: refs/heads/master aa2747722 -> 5344b781e [DOC] Update streaming-guide.md Correct StreamSQL description This closes #2910 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/5344b781 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/5344b781 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/5344b781 Branch: refs/heads/master Commit: 5344b781e818be6285143d9af5343f391a4042d4 Parents: aa27477 Author: Jacky Li Authored: Fri Nov 9 12:00:07 2018 +0800 Committer: xuchuanyin Committed: Fri Nov 9 17:01:14 2018 +0800 -- docs/streaming-guide.md | 40 ++-- 1 file changed, 30 insertions(+), 10 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/5344b781/docs/streaming-guide.md -- diff --git a/docs/streaming-guide.md b/docs/streaming-guide.md index 714b07a..0987ed2 100644 --- a/docs/streaming-guide.md +++ b/docs/streaming-guide.md @@ -31,9 +31,10 @@ - [StreamSQL](#streamsql) - [Defining Streaming Table](#streaming-table) - [Streaming Job Management](#streaming-job-management) -- [START STREAM](#start-stream) -- [STOP STREAM](#stop-stream) +- [CREATE STREAM](#create-stream) +- [DROP STREAM](#drop-stream) - [SHOW STREAMS](#show-streams) +- [CLOSE STREAM](#close-stream) ## Quick example Download and unzip spark-2.2.0-bin-hadoop2.7.tgz, and export $SPARK_HOME @@ -333,7 +334,7 @@ Following example shows how to start a streaming ingest job sql( """ -|START STREAM job123 ON TABLE sink +|CREATE STREAM job123 ON TABLE sink |STMPROPERTIES( | 'trigger'='ProcessingTime', | 'interval'='1 seconds') @@ -343,7 +344,7 @@ Following example shows how to start a streaming ingest job | WHERE id % 2 = 1 """.stripMargin) -sql("STOP STREAM job123") +sql("DROP STREAM job123") sql("SHOW STREAMS [ON TABLE tableName]") ``` @@ -360,13 +361,13 @@ These two tables are normal carbon tables, they can be queried independently. 
As above example shown: -- `START STREAM jobName ON TABLE tableName` is used to start a streaming ingest job. -- `STOP STREAM jobName` is used to stop a streaming job by its name +- `CREATE STREAM jobName ON TABLE tableName` is used to start a streaming ingest job. +- `DROP STREAM jobName` is used to stop a streaming job by its name - `SHOW STREAMS [ON TABLE tableName]` is used to print streaming job information -# START STREAM +# CREATE STREAM When this is issued, carbon will start a structured streaming job to do the streaming ingestion. Before launching the job, system will validate: @@ -424,11 +425,25 @@ For Kafka data source, create the source table by: ) ``` +- Then CREATE STREAM can be used to start the streaming ingest job from source table to sink table +``` +CREATE STREAM job123 ON TABLE sink +STMPROPERTIES( +'trigger'='ProcessingTime', + 'interval'='10 seconds' +) +AS + SELECT * + FROM source + WHERE id % 2 = 1 +``` -# STOP STREAM - -When this is issued, the streaming job will be stopped immediately. It will fail if the jobName specified is not exist. +# DROP STREAM +When `DROP STREAM` is issued, the streaming job will be stopped immediately. It will fail if the jobName specified is not exist. +``` +DROP STREAM job123 +``` # SHOW STREAMS @@ -441,4 +456,9 @@ When this is issued, the streaming job will be stopped immediately. It will fail `SHOW STREAMS` command will show all stream jobs in the system. +# ALTER TABLE CLOSE STREAM + +When the streaming application is stopped, and user want to manually trigger data conversion from carbon streaming files to columnar files, one can use +`ALTER TABLE sink COMPACT 'CLOSE_STREAMING';` +
[4/6] carbondata git commit: [CARBONDATA-3064] Support separate audit log
http://git-wip-us.apache.org/repos/asf/carbondata/blob/aa274772/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableRenameCommand.scala -- diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableRenameCommand.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableRenameCommand.scala index a1c68a3..c64f50b 100644 --- a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableRenameCommand.scala +++ b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableRenameCommand.scala @@ -28,7 +28,6 @@ import org.apache.spark.util.AlterTableUtil import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException import org.apache.carbondata.common.logging.LogServiceFactory -import org.apache.carbondata.common.logging.impl.Audit import org.apache.carbondata.core.datamap.DataMapStoreManager import org.apache.carbondata.core.exception.ConcurrentOperationException import org.apache.carbondata.core.features.TableOperation @@ -48,6 +47,8 @@ private[sql] case class CarbonAlterTableRenameCommand( val newTableIdentifier = alterTableRenameModel.newTableIdentifier val oldDatabaseName = oldTableIdentifier.database .getOrElse(sparkSession.catalog.currentDatabase) +setAuditTable(oldDatabaseName, oldTableIdentifier.table) +setAuditInfo(Map("newName" -> alterTableRenameModel.newTableIdentifier.table)) val newDatabaseName = newTableIdentifier.database .getOrElse(sparkSession.catalog.currentDatabase) if (!oldDatabaseName.equalsIgnoreCase(newDatabaseName)) { @@ -60,15 +61,12 @@ private[sql] case class CarbonAlterTableRenameCommand( } val oldTableName = oldTableIdentifier.table.toLowerCase val newTableName = newTableIdentifier.table.toLowerCase -Audit.log(LOGGER, s"Rename table request has been received for $oldDatabaseName.$oldTableName") LOGGER.info(s"Rename 
table request has been received for $oldDatabaseName.$oldTableName") val metastore = CarbonEnv.getInstance(sparkSession).carbonMetastore val relation: CarbonRelation = metastore.lookupRelation(oldTableIdentifier.database, oldTableName)(sparkSession) .asInstanceOf[CarbonRelation] if (relation == null) { - Audit.log(LOGGER, s"Rename table request has failed. " + - s"Table $oldDatabaseName.$oldTableName does not exist") throwMetadataException(oldDatabaseName, oldTableName, "Table does not exist") } @@ -162,13 +160,11 @@ private[sql] case class CarbonAlterTableRenameCommand( OperationListenerBus.getInstance().fireEvent(alterTableRenamePostEvent, operationContext) sparkSession.catalog.refreshTable(newIdentifier.quotedString) - Audit.log(LOGGER, s"Table $oldTableName has been successfully renamed to $newTableName") LOGGER.info(s"Table $oldTableName has been successfully renamed to $newTableName") } catch { case e: ConcurrentOperationException => throw e case e: Exception => -LOGGER.error("Rename table failed: " + e.getMessage, e) if (carbonTable != null) { AlterTableUtil.revertRenameTableChanges( newTableName, @@ -182,4 +178,5 @@ private[sql] case class CarbonAlterTableRenameCommand( Seq.empty } + override protected def opName: String = "ALTER TABLE RENAME TABLE" } http://git-wip-us.apache.org/repos/asf/carbondata/blob/aa274772/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableSetCommand.scala -- diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableSetCommand.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableSetCommand.scala index 51c0e6e..b1e7e33 100644 --- a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableSetCommand.scala +++ b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableSetCommand.scala @@ -29,17 +29,18 @@ 
private[sql] case class CarbonAlterTableSetCommand( isView: Boolean) extends MetadataCommand { - override def run(sparkSession: SparkSession): Seq[Row] = { -processMetadata(sparkSession) - } - override def processMetadata(sparkSession: SparkSession): Seq[Row] = { + setAuditTable(tableIdentifier.database.getOrElse(sparkSession.catalog.currentDatabase), + tableIdentifier.table) AlterTableUtil.modifyTableProperties( tableIdentifier, properties, Nil, set = true)(sparkSession,
[3/6] carbondata git commit: [CARBONDATA-3064] Support separate audit log
http://git-wip-us.apache.org/repos/asf/carbondata/blob/aa274772/integration/spark2/src/test/scala/org/apache/spark/carbondata/TestStreamingTableOpName.scala -- diff --git a/integration/spark2/src/test/scala/org/apache/spark/carbondata/TestStreamingTableOpName.scala b/integration/spark2/src/test/scala/org/apache/spark/carbondata/TestStreamingTableOpName.scala new file mode 100644 index 000..d789f5c --- /dev/null +++ b/integration/spark2/src/test/scala/org/apache/spark/carbondata/TestStreamingTableOpName.scala @@ -0,0 +1,2647 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.carbondata + +import java.io.{File, PrintWriter} +import java.math.BigDecimal +import java.net.{BindException, ServerSocket} +import java.sql.{Date, Timestamp} +import java.util.concurrent.Executors + +import scala.collection.mutable + +import org.apache.spark.rdd.RDD +import org.apache.spark.sql._ +import org.apache.spark.sql.catalyst.TableIdentifier +import org.apache.spark.sql.catalyst.analysis.NoSuchTableException +import org.apache.spark.sql.hive.CarbonRelation +import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery} +import org.apache.spark.sql.test.util.QueryTest +import org.scalatest.BeforeAndAfterAll + +import org.apache.carbondata.common.exceptions.NoSuchStreamException +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.datastore.impl.FileFactory +import org.apache.carbondata.core.metadata.schema.datamap.DataMapClassProvider.TIMESERIES +import org.apache.carbondata.core.metadata.schema.table.CarbonTable +import org.apache.carbondata.core.statusmanager.{FileFormat, SegmentStatus} +import org.apache.carbondata.core.util.CarbonProperties +import org.apache.carbondata.core.util.path.CarbonTablePath +import org.apache.carbondata.spark.exception.ProcessMetaDataException +import org.apache.carbondata.spark.rdd.CarbonScanRDD +import org.apache.carbondata.streaming.parser.CarbonStreamParser + +class TestStreamingTableOpName extends QueryTest with BeforeAndAfterAll { + + private val spark = sqlContext.sparkSession + private val dataFilePath = s"$resourcesPath/streamSample.csv" + def currentPath: String = new File(this.getClass.getResource("/").getPath + "../../") +.getCanonicalPath + val badRecordFilePath: File =new File(currentPath + "/target/test/badRecords") + + override def beforeAll { +badRecordFilePath.delete() +badRecordFilePath.mkdirs() 
+CarbonProperties.getInstance().addProperty( + CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, + CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT) +CarbonProperties.getInstance().addProperty( + CarbonCommonConstants.CARBON_DATE_FORMAT, + CarbonCommonConstants.CARBON_DATE_DEFAULT_FORMAT) +sql("DROP DATABASE IF EXISTS streaming CASCADE") +sql("CREATE DATABASE streaming") +sql("USE streaming") +sql( + """ +| CREATE TABLE source( +|c1 string, +|c2 int, +|c3 string, +|c5 string +| ) STORED BY 'org.apache.carbondata.format' +| TBLPROPERTIES ('streaming' = 'true') + """.stripMargin) +sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/IUD/dest.csv' INTO TABLE source""") + +dropTable() + +// 1. normal table not support streaming ingest +createTable(tableName = "batch_table", streaming = false, withBatchLoad = true) + +// 2. streaming table with different input source +// file source +createTable(tableName = "stream_table_file", streaming = true, withBatchLoad = true) + +// 3. streaming table with bad records +createTable(tableName = "bad_record_fail", streaming = true, withBatchLoad = true) + +// 4. streaming frequency check +createTable(tableName = "stream_table_1s", streaming = true, withBatchLoad = true) + +// 5. streaming table execute batch loading +// 6. detail query +// 8. compaction +// full scan + filter scan + aggregate query +createTable(tableName = "stream_table_filter",
[6/6] carbondata git commit: [CARBONDATA-3064] Support separate audit log
[CARBONDATA-3064] Support separate audit log A new audit log is implemented as follows: 1. A framework is added for carbon commands to record the audit log automatically (see command/package.scala). 2. Audit logs are output by Auditor.java; a log4j config example is provided in the Auditor.java file comment. 3. The old audit log is removed. This closes #2885 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/aa274772 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/aa274772 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/aa274772 Branch: refs/heads/master Commit: aa27477228d4fd0c51da3f2f8f588e41e06a9519 Parents: 6093a32 Author: Jacky Li Authored: Wed Oct 31 14:49:38 2018 +0800 Committer: xuchuanyin Committed: Fri Nov 9 08:47:55 2018 +0800 -- .../carbondata/common/logging/impl/Audit.java | 49 - .../logging/ft/LoggingServiceTest_FT.java | 93 - .../status/DiskBasedDataMapStatusProvider.java |2 - .../client/NonSecureDictionaryClient.java |3 +- .../NonSecureDictionaryClientHandler.java |3 +- .../IncrementalColumnDictionaryGenerator.java |3 +- .../statusmanager/SegmentStatusManager.java | 14 +- .../carbondata/core/util/SessionParams.java | 18 +- .../carbondata/mv/datamap/MVAnalyzerRule.scala |2 - .../examples/sql/CarbonSessionExample.java | 137 - .../examples/sql/JavaCarbonSessionExample.java | 94 + .../examples/CarbonSessionExample.scala | 44 +- .../carbondata/examplesCI/RunExamples.scala |5 + .../client/SecureDictionaryClient.java |3 +- .../server/SecureDictionaryServer.java |3 +- .../org/apache/carbondata/api/CarbonStore.scala |9 +- .../carbondata/spark/rdd/PartitionDropper.scala |8 - .../spark/rdd/PartitionSplitter.scala |6 - .../carbondata/spark/rdd/StreamHandoffRDD.scala | 12 +- .../command/carbonTableSchemaCommon.scala | 21 +- .../sql/test/ResourceRegisterAndCopier.scala|4 +- ...CreateCarbonSourceTableAsSelectCommand.scala | 13 +-
.../datamap/CarbonMergeBloomIndexFilesRDD.scala |2 +- .../spark/rdd/AggregateDataMapCompactor.scala |4 - .../spark/rdd/CarbonDataRDDFactory.scala| 71 +- .../spark/rdd/CarbonTableCompactor.scala| 12 - .../carbondata/stream/StreamJobManager.scala| 13 +- .../events/MergeBloomIndexEventListener.scala |3 +- .../sql/events/MergeIndexEventListener.scala| 17 +- .../datamap/CarbonCreateDataMapCommand.scala| 12 +- .../datamap/CarbonDataMapRebuildCommand.scala |3 + .../datamap/CarbonDataMapShowCommand.scala |3 + .../datamap/CarbonDropDataMapCommand.scala |6 +- .../CarbonAlterTableCompactionCommand.scala | 13 +- .../CarbonAlterTableFinishStreaming.scala |3 + .../management/CarbonCleanFilesCommand.scala|7 +- .../command/management/CarbonCliCommand.scala |4 + .../CarbonDeleteLoadByIdCommand.scala |5 +- .../CarbonDeleteLoadByLoadDateCommand.scala |5 +- .../management/CarbonInsertIntoCommand.scala| 13 +- .../management/CarbonLoadDataCommand.scala | 355 +-- .../management/CarbonShowLoadsCommand.scala |3 + .../management/RefreshCarbonTableCommand.scala | 18 +- .../CarbonProjectForDeleteCommand.scala |9 +- .../CarbonProjectForUpdateCommand.scala |4 + .../command/mutation/DeleteExecution.scala | 12 +- .../command/mutation/HorizontalCompaction.scala |6 - .../spark/sql/execution/command/package.scala | 82 +- ...arbonAlterTableAddHivePartitionCommand.scala |3 + ...rbonAlterTableDropHivePartitionCommand.scala |3 + .../CarbonAlterTableDropPartitionCommand.scala |8 +- .../CarbonAlterTableSplitPartitionCommand.scala |9 +- .../CarbonShowCarbonPartitionsCommand.scala |3 + .../CarbonAlterTableAddColumnCommand.scala |9 +- .../CarbonAlterTableDataTypeChangeCommand.scala | 16 +- .../CarbonAlterTableDropColumnCommand.scala |8 +- .../schema/CarbonAlterTableRenameCommand.scala |9 +- .../schema/CarbonAlterTableSetCommand.scala |9 +- .../schema/CarbonAlterTableUnsetCommand.scala | 11 +- .../schema/CarbonGetTableDetailCommand.scala|2 + .../stream/CarbonCreateStreamCommand.scala |3 + 
.../stream/CarbonDropStreamCommand.scala|3 + .../stream/CarbonShowStreamsCommand.scala |3 + .../CarbonCreateTableAsSelectCommand.scala | 17 +- .../table/CarbonCreateTableCommand.scala| 17 +- .../table/CarbonDescribeFormattedCommand.scala |3 + .../command/table/CarbonDropTableCommand.scala |6 +- .../command/table/CarbonExplainCommand.scala|4 +- .../command/table/CarbonShowTablesCommand.scala |1 + .../strategy
[2/6] carbondata git commit: [CARBONDATA-3064] Support separate audit log
http://git-wip-us.apache.org/repos/asf/carbondata/blob/aa274772/integration/spark2/src/test/scala/org/apache/spark/carbondata/TestStreamingTableOperation.scala -- diff --git a/integration/spark2/src/test/scala/org/apache/spark/carbondata/TestStreamingTableOperation.scala b/integration/spark2/src/test/scala/org/apache/spark/carbondata/TestStreamingTableOperation.scala deleted file mode 100644 index 62c0221..000 --- a/integration/spark2/src/test/scala/org/apache/spark/carbondata/TestStreamingTableOperation.scala +++ /dev/null @@ -1,2647 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one or more - * contributor license agreements. See the NOTICE file distributed with - * this work for additional information regarding copyright ownership. - * The ASF licenses this file to You under the Apache License, Version 2.0 - * (the "License"); you may not use this file except in compliance with - * the License. You may obtain a copy of the License at - * - *http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ - -package org.apache.spark.carbondata - -import java.io.{File, PrintWriter} -import java.math.BigDecimal -import java.net.{BindException, ServerSocket} -import java.sql.{Date, Timestamp} -import java.util.concurrent.Executors - -import scala.collection.mutable - -import org.apache.spark.rdd.RDD -import org.apache.spark.sql._ -import org.apache.spark.sql.catalyst.TableIdentifier -import org.apache.spark.sql.catalyst.analysis.NoSuchTableException -import org.apache.spark.sql.hive.CarbonRelation -import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery} -import org.apache.spark.sql.test.util.QueryTest -import org.scalatest.BeforeAndAfterAll - -import org.apache.carbondata.common.exceptions.NoSuchStreamException -import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException -import org.apache.carbondata.core.constants.CarbonCommonConstants -import org.apache.carbondata.core.datastore.impl.FileFactory -import org.apache.carbondata.core.metadata.schema.datamap.DataMapClassProvider.TIMESERIES -import org.apache.carbondata.core.metadata.schema.table.CarbonTable -import org.apache.carbondata.core.statusmanager.{FileFormat, SegmentStatus} -import org.apache.carbondata.core.util.CarbonProperties -import org.apache.carbondata.core.util.path.CarbonTablePath -import org.apache.carbondata.spark.exception.ProcessMetaDataException -import org.apache.carbondata.spark.rdd.CarbonScanRDD -import org.apache.carbondata.streaming.parser.CarbonStreamParser - -class TestStreamingTableOperation extends QueryTest with BeforeAndAfterAll { - - private val spark = sqlContext.sparkSession - private val dataFilePath = s"$resourcesPath/streamSample.csv" - def currentPath: String = new File(this.getClass.getResource("/").getPath + "../../") -.getCanonicalPath - val badRecordFilePath: File =new File(currentPath + "/target/test/badRecords") - - override def beforeAll { -badRecordFilePath.delete() -badRecordFilePath.mkdirs() 
-CarbonProperties.getInstance().addProperty( - CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, - CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT) -CarbonProperties.getInstance().addProperty( - CarbonCommonConstants.CARBON_DATE_FORMAT, - CarbonCommonConstants.CARBON_DATE_DEFAULT_FORMAT) -sql("DROP DATABASE IF EXISTS streaming CASCADE") -sql("CREATE DATABASE streaming") -sql("USE streaming") -sql( - """ -| CREATE TABLE source( -|c1 string, -|c2 int, -|c3 string, -|c5 string -| ) STORED BY 'org.apache.carbondata.format' -| TBLPROPERTIES ('streaming' = 'true') - """.stripMargin) -sql(s"""LOAD DATA LOCAL INPATH '$resourcesPath/IUD/dest.csv' INTO TABLE source""") - -dropTable() - -// 1. normal table not support streaming ingest -createTable(tableName = "batch_table", streaming = false, withBatchLoad = true) - -// 2. streaming table with different input source -// file source -createTable(tableName = "stream_table_file", streaming = true, withBatchLoad = true) - -// 3. streaming table with bad records -createTable(tableName = "bad_record_fail", streaming = true, withBatchLoad = true) - -// 4. streaming frequency check -createTable(tableName = "stream_table_1s", streaming = true, withBatchLoad = true) - -// 5. streaming table execute batch loading -// 6. detail query -// 8. compaction -// full scan + filter scan + aggregate query -createTable(tableName =
[1/6] carbondata git commit: [CARBONDATA-3064] Support separate audit log
Repository: carbondata Updated Branches: refs/heads/master 6093a32d7 -> aa2747722 http://git-wip-us.apache.org/repos/asf/carbondata/blob/aa274772/processing/src/main/java/org/apache/carbondata/processing/util/Auditor.java -- diff --git a/processing/src/main/java/org/apache/carbondata/processing/util/Auditor.java b/processing/src/main/java/org/apache/carbondata/processing/util/Auditor.java new file mode 100644 index 000..e811c59 --- /dev/null +++ b/processing/src/main/java/org/apache/carbondata/processing/util/Auditor.java @@ -0,0 +1,222 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.processing.util; + +import java.io.IOException; +import java.util.Date; +import java.util.HashMap; +import java.util.Map; +import java.util.Objects; + +import org.apache.carbondata.common.annotations.InterfaceAudience; +import org.apache.carbondata.common.logging.impl.AuditLevel; + +import com.google.gson.Gson; +import com.google.gson.GsonBuilder; +import org.apache.commons.lang3.time.FastDateFormat; +import org.apache.hadoop.security.UserGroupInformation; +import org.apache.log4j.Logger; + +/** + * Audit logger. + * User can configure log4j to log to a separate file. 
For example + * + * log4j.logger.carbon.audit=DEBUG, audit + * log4j.appender.audit=org.apache.log4j.FileAppender + * log4j.appender.audit.File=/opt/logs/audit.out + * log4j.appender.audit.Threshold=AUDIT + * log4j.appender.audit.Append=false + * log4j.appender.audit.layout=org.apache.log4j.PatternLayout + * log4j.appender.audit.layout.ConversionPattern=%m%n + */ +@InterfaceAudience.Internal +public class Auditor { + private static final Logger LOGGER = Logger.getLogger("carbon.audit"); + private static final Gson gson = new GsonBuilder().disableHtmlEscaping().create(); + private static String username; + + static { +try { + username = UserGroupInformation.getCurrentUser().getShortUserName(); +} catch (IOException e) { + username = "unknown"; +} + } + + /** + * call this method to record audit log when operation is triggered + * @param opName operation name + * @param opId operation unique id + */ + public static void logOperationStart(String opName, String opId) { +Objects.requireNonNull(opName); +Objects.requireNonNull(opId); +OpStartMessage message = new OpStartMessage(opName, opId); +Gson gson = new GsonBuilder().disableHtmlEscaping().create(); +String json = gson.toJson(message); +LOGGER.log(AuditLevel.AUDIT, json); + } + + /** + * call this method to record audit log when operation finished + * @param opName operation name + * @param opId operation unique id + * @param success true if operation success + * @param table carbon dbName and tableName + * @param opTime elapse time in Ms for this operation + * @param extraInfo extra information to include in the audit log + */ + public static void logOperationEnd(String opName, String opId, boolean success, String table, + String opTime, Map extraInfo) { +Objects.requireNonNull(opName); +Objects.requireNonNull(opId); +Objects.requireNonNull(opTime); +OpEndMessage message = new OpEndMessage(opName, opId, table, opTime, +success ? OpStatus.SUCCESS : OpStatus.FAILED, +extraInfo != null ? 
extraInfo : new HashMap()); +String json = gson.toJson(message); +LOGGER.log(AuditLevel.AUDIT, json); + } + + private enum OpStatus { +// operation started +START, + +// operation succeed +SUCCESS, + +// operation failed +FAILED + } + + // log message for operation start, it is written as a JSON record in the audit log + private static class OpStartMessage { +private String time; +private String username; +private String opName; +private String opId; +private OpStatus opStatus; + +OpStartMessage(String opName, String opId) { + FastDateFormat format = + FastDateFormat.getDateTimeInstance(FastDateFormat.LONG, FastDateFormat.LONG); + this.time = format.format(new Date()); + this.username = Auditor.username; + this.opName = opName; + this.opId =
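The Auditor source above serializes each operation event (via Gson) into a single-line JSON record carrying the `OpStartMessage` fields: time, username, opName, opId, opStatus. A dependency-free sketch of that record shape follows; the class and method names are hypothetical, and plain string building replaces Gson so the example runs standalone.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical stand-in for Auditor's OpStartMessage serialization. Field
// order mirrors the OpStartMessage fields shown in the commit's diff.
public class AuditRecordSketch {
  static String startRecord(String username, String opName, String opId) {
    String time = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new Date());
    return "{\"time\":\"" + time + "\",\"username\":\"" + username
        + "\",\"opName\":\"" + opName + "\",\"opId\":\"" + opId
        + "\",\"opStatus\":\"START\"}";
  }

  public static void main(String[] args) {
    // One JSON object per line keeps the audit file grep-friendly and easy
    // to parse, which is why the commit's PatternLayout is just "%m%n".
    System.out.println(startRecord("unknown", "CREATE TABLE", "op-123"));
  }
}
```

With the log4j configuration from the file comment (a dedicated appender on the `carbon.audit` logger at the custom AUDIT level), each such line lands in the separate audit file rather than the main application log.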
[5/6] carbondata git commit: [CARBONDATA-3064] Support separate audit log
http://git-wip-us.apache.org/repos/asf/carbondata/blob/aa274772/integration/spark2/src/main/scala/org/apache/spark/sql/events/MergeIndexEventListener.scala -- diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/events/MergeIndexEventListener.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/events/MergeIndexEventListener.scala index c8c9a47..35b73d6 100644 --- a/integration/spark2/src/main/scala/org/apache/spark/sql/events/MergeIndexEventListener.scala +++ b/integration/spark2/src/main/scala/org/apache/spark/sql/events/MergeIndexEventListener.scala @@ -29,7 +29,6 @@ import org.apache.spark.sql.SparkSession import org.apache.spark.sql.util.CarbonException import org.apache.carbondata.common.logging.LogServiceFactory -import org.apache.carbondata.common.logging.impl.Audit import org.apache.carbondata.core.constants.CarbonCommonConstants import org.apache.carbondata.core.datamap.Segment import org.apache.carbondata.core.locks.{CarbonLockFactory, LockUsage} @@ -39,7 +38,6 @@ import org.apache.carbondata.core.statusmanager.SegmentStatusManager import org.apache.carbondata.events.{AlterTableCompactionPostEvent, AlterTableMergeIndexEvent, Event, OperationContext, OperationEventListener} import org.apache.carbondata.processing.loading.events.LoadEvents.LoadTablePostExecutionEvent import org.apache.carbondata.processing.merger.CarbonDataMergerUtil -import org.apache.carbondata.spark.util.CommonUtil class MergeIndexEventListener extends OperationEventListener with Logging { val LOGGER = LogServiceFactory.getLogService(this.getClass.getCanonicalName) @@ -47,7 +45,7 @@ class MergeIndexEventListener extends OperationEventListener with Logging { override def onEvent(event: Event, operationContext: OperationContext): Unit = { event match { case preStatusUpdateEvent: LoadTablePostExecutionEvent => -Audit.log(LOGGER, "Load post status event-listener called for merge index") +LOGGER.info("Load post status event-listener called for merge index") val 
loadModel = preStatusUpdateEvent.getCarbonLoadModel val carbonTable = loadModel.getCarbonDataLoadSchema.getCarbonTable val compactedSegments = loadModel.getMergedSegmentIds @@ -73,7 +71,7 @@ class MergeIndexEventListener extends OperationEventListener with Logging { } } case alterTableCompactionPostEvent: AlterTableCompactionPostEvent => -Audit.log(LOGGER, "Merge index for compaction called") +LOGGER.info("Merge index for compaction called") val carbonTable = alterTableCompactionPostEvent.carbonTable val mergedLoads = alterTableCompactionPostEvent.compactedLoads val sparkSession = alterTableCompactionPostEvent.sparkSession @@ -84,8 +82,6 @@ class MergeIndexEventListener extends OperationEventListener with Logging { val carbonMainTable = alterTableMergeIndexEvent.carbonTable val sparkSession = alterTableMergeIndexEvent.sparkSession if (!carbonMainTable.isStreamingSink) { - Audit.log(LOGGER, s"Compaction request received for table " + - s"${ carbonMainTable.getDatabaseName }.${ carbonMainTable.getTableName }") LOGGER.info(s"Merge Index request received for table " + s"${ carbonMainTable.getDatabaseName }.${ carbonMainTable.getTableName }") val lock = CarbonLockFactory.getCarbonLockObj( @@ -130,16 +126,11 @@ class MergeIndexEventListener extends OperationEventListener with Logging { clearBlockDataMapCache(carbonMainTable, validSegmentIds) val requestMessage = "Compaction request completed for table " + s"${ carbonMainTable.getDatabaseName }.${ carbonMainTable.getTableName }" - Audit.log(LOGGER, requestMessage) LOGGER.info(requestMessage) } else { val lockMessage = "Not able to acquire the compaction lock for table " + -s"${ carbonMainTable.getDatabaseName }.${ - carbonMainTable -.getTableName -}" - - Audit.log(LOGGER, lockMessage) +s"${ carbonMainTable.getDatabaseName }." + +s"${ carbonMainTable.getTableName}" LOGGER.error(lockMessage) CarbonException.analysisException( "Table is already locked for compaction. 
Please try after some time.") http://git-wip-us.apache.org/repos/asf/carbondata/blob/aa274772/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonCreateDataMapCommand.scala -- diff --git
carbondata git commit: [CARBONDATA-3078] Disable explain collector for count star query without filter
Repository: carbondata Updated Branches: refs/heads/master a3a83dcad -> b6ff4672b [CARBONDATA-3078] Disable explain collector for count star query without filter An issue was found with count star queries without filter in the explain command. This is a special case that uses a different plan. Since there is no useful block/blocklet pruning information for a count star query without filter, the explain collector is disabled, which avoids the exception reported in https://issues.apache.org/jira/browse/CARBONDATA-3078 This closes #2900 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/b6ff4672 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/b6ff4672 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/b6ff4672 Branch: refs/heads/master Commit: b6ff4672be7bd25ab40144feb801be9e20069244 Parents: a3a83dc Author: Manhua Authored: Mon Nov 5 20:17:59 2018 +0800 Committer: xuchuanyin Committed: Tue Nov 6 19:58:24 2018 +0800 -- .../carbondata/hadoop/api/CarbonTableInputFormat.java| 9 + .../spark/testsuite/filterexpr/CountStarTestCase.scala | 11 +++ 2 files changed, 20 insertions(+) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/b6ff4672/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java -- diff --git a/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java b/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java index ba3accf..86cbfec 100644 --- a/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java +++ b/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java @@ -43,6 +43,7 @@ import org.apache.carbondata.core.mutate.CarbonUpdateUtil; import org.apache.carbondata.core.mutate.SegmentUpdateDetails; import org.apache.carbondata.core.mutate.UpdateVO; import org.apache.carbondata.core.mutate.data.BlockMappingVO; +import
org.apache.carbondata.core.profiler.ExplainCollector; import org.apache.carbondata.core.readcommitter.LatestFilesReadCommittedScope; import org.apache.carbondata.core.readcommitter.ReadCommittedScope; import org.apache.carbondata.core.readcommitter.TableStatusReadCommittedScope; @@ -575,6 +576,14 @@ public class CarbonTableInputFormat extends CarbonInputFormat { */ public BlockMappingVO getBlockRowCount(Job job, CarbonTable table, List partitions) throws IOException { +// Normal query flow goes to CarbonInputFormat#getPrunedBlocklets and initialize the +// pruning info for table we queried. But here count star query without filter uses a different +// query plan, and no pruning info is initialized. When it calls default data map to +// prune(with a null filter), exception will occur during setting pruning info. +// Considering no useful information about block/blocklet pruning for such query +// (actually no pruning), so we disable explain collector here +ExplainCollector.remove(); + AbsoluteTableIdentifier identifier = table.getAbsoluteTableIdentifier(); TableDataMap blockletMap = DataMapStoreManager.getInstance().getDefaultDataMap(table); http://git-wip-us.apache.org/repos/asf/carbondata/blob/b6ff4672/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/filterexpr/CountStarTestCase.scala -- diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/filterexpr/CountStarTestCase.scala b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/filterexpr/CountStarTestCase.scala index f26d0e7..18ad1d7 100644 --- a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/filterexpr/CountStarTestCase.scala +++ b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/filterexpr/CountStarTestCase.scala @@ -54,6 +54,17 @@ class CountStarTestCase extends QueryTest with BeforeAndAfterAll { ) } + test("explain select count star 
without filter") { +CarbonProperties.getInstance() + .addProperty(CarbonCommonConstants.ENABLE_QUERY_STATISTICS, "true") + +sql("explain select count(*) from filterTimestampDataType").collect() + +CarbonProperties.getInstance() + .addProperty(CarbonCommonConstants.ENABLE_QUERY_STATISTICS, +CarbonCommonConstants.ENABLE_QUERY_STATISTICS_DEFAULT) + } + override def afterAll { sql("drop table if exists filtertestTables") sql("drop table if exists filterTimestampDataType")
carbondata git commit: [CARBONDATA-3074] Change default sort temp compressor to snappy
Repository: carbondata Updated Branches: refs/heads/master 789055e19 -> c9fb4bc06 [CARBONDATA-3074] Change default sort temp compressor to snappy The sort temp compressor used to be set as empty, which means that Carbondata will not compress the sort temp files. This PR changes the default value to snappy. Some experiments in a local cluster show that setting the compressor to 'snappy' slightly enhances the loading performance and reduces lots of disk IO during data loading. This closes #2894 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/c9fb4bc0 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/c9fb4bc0 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/c9fb4bc0 Branch: refs/heads/master Commit: c9fb4bc064c96649fc200ed13bd2db0bcda6ed56 Parents: 789055e Author: Manhua Authored: Mon Nov 5 17:28:07 2018 +0800 Committer: xuchuanyin Committed: Mon Nov 5 20:35:25 2018 +0800 -- .../core/constants/CarbonCommonConstants.java | 6 +++--- docs/configuration-parameters.md| 3 +-- docs/performance-tuning.md | 2 +- .../dataload/TestLoadWithSortTempCompressed.scala | 16 4 files changed, 21 insertions(+), 6 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/c9fb4bc0/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java index bf4f7e5..9484bb4 100644 --- a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java +++ b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java @@ -873,10 +873,10 @@ public final class CarbonCommonConstants { public static final String CARBON_SORT_TEMP_COMPRESSOR = "carbon.sort.temp.compressor"; /** - * The optional values are 'SNAPPY','GZIP','BZIP2','LZ4','ZSTD'.
- * By default, empty means that Carbondata will not compress the sort temp files. + * The optional values are 'SNAPPY','GZIP','BZIP2','LZ4','ZSTD' and empty. + * Specially, empty means that Carbondata will not compress the sort temp files. */ - public static final String CARBON_SORT_TEMP_COMPRESSOR_DEFAULT = ""; + public static final String CARBON_SORT_TEMP_COMPRESSOR_DEFAULT = "SNAPPY"; /** * Which storage level to persist rdd when sort_scope=global_sort */ http://git-wip-us.apache.org/repos/asf/carbondata/blob/c9fb4bc0/docs/configuration-parameters.md -- diff --git a/docs/configuration-parameters.md b/docs/configuration-parameters.md index 2a3748c..5a4dea6 100644 --- a/docs/configuration-parameters.md +++ b/docs/configuration-parameters.md @@ -79,12 +79,11 @@ This section provides the details of all the configurations required for the Car | enable.inmemory.merge.sort | false | CarbonData sorts and writes data to intermediate files to limit the memory usage. These intermediate files needs to be sorted again using merge sort before writing to the final carbondata file.Performing merge sort in memory would increase the sorting performance at the cost of increased memory footprint. This Configuration specifies to do in-memory merge sort or to do file based merge sort. | | carbon.sort.storage.inmemory.size.inmb | 512 | CarbonData writes every ***carbon.sort.size*** number of records to intermediate temp files during data loading to ensure memory footprint is within limits. When ***enable.unsafe.sort*** configuration is enabled, instead of using ***carbon.sort.size*** which is based on rows count, size occupied in memory is used to determine when to flush data pages to intermediate temp files. This configuration determines the memory to be used for storing data pages in memory. 
**NOTE:** Configuring a higher value ensures more data is maintained in memory and hence increases data loading performance due to reduced or no IO.Based on the memory availability in the nodes of the cluster, configure the values accordingly. | | carbon.load.sortmemory.spill.percentage | 0 | During data loading, some data pages are kept in memory upto memory configured in ***carbon.sort.storage.inmemory.size.inmb*** beyond which they are spilled to disk as intermediate temporary sort files. This configuration determines after what percentage data needs to be spilled to disk. **NOTE:** Without this configuration, when the data pages occupy upto configured memory, new data pages would be dumped to disk and old p
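The trade-off `carbon.sort.temp.compressor` makes, streaming sorted intermediate rows through a compressor on the way to disk and decompressing them during merge, can be sketched with JDK classes. GZIP stands in here for snappy (snappy needs an external library); the helper names are ours, not CarbonData's.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative sketch of compressed sort-temp files: fewer bytes hit the
// disk during data loading, at the cost of extra CPU for (de)compression.
public class SortTempCompression {

  // Write one batch of already-sorted rows through a compressing stream.
  static void writeCompressed(Path file, List<String> sortedRows) throws IOException {
    try (Writer w = new OutputStreamWriter(
        new GZIPOutputStream(Files.newOutputStream(file)), StandardCharsets.UTF_8)) {
      for (String row : sortedRows) {
        w.write(row);
        w.write('\n');
      }
    }
  }

  // Read the batch back for the merge-sort phase.
  static List<String> readCompressed(Path file) throws IOException {
    try (BufferedReader r = new BufferedReader(new InputStreamReader(
        new GZIPInputStream(Files.newInputStream(file)), StandardCharsets.UTF_8))) {
      List<String> rows = new ArrayList<>();
      String line;
      while ((line = r.readLine()) != null) {
        rows.add(line);
      }
      return rows;
    }
  }
}
```

As the docs note, compression helps most when disk IO is the bottleneck; on CPU-bound loads the extra cycles can offset the IO savings.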
carbondata git commit: [HOTFIX] Throw original exception in thread pool
Repository: carbondata Updated Branches: refs/heads/master 934216de1 -> 469c52f5d [HOTFIX] Throw original exception in thread pool If an exception occurs in Callable.run in the thread pool, the original exception should be thrown instead of a new one; discarding the original makes debugging hard. This closes #2887 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/469c52f5 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/469c52f5 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/469c52f5 Branch: refs/heads/master Commit: 469c52f5d4c18579e2a6ed4c3bb35691cf01937b Parents: 934216d Author: Jacky Li Authored: Wed Oct 31 20:16:11 2018 +0800 Committer: xuchuanyin Committed: Mon Nov 5 09:15:53 2018 +0800 -- .../spark/rdd/NewCarbonDataLoadRDD.scala| 2 + .../carbondata/spark/rdd/PartitionDropper.scala | 7 +- .../spark/rdd/PartitionSplitter.scala | 4 +- .../steps/DataWriterProcessorStepImpl.java | 68 ++-- 4 files changed, 27 insertions(+), 54 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/469c52f5/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/NewCarbonDataLoadRDD.scala -- diff --git a/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/NewCarbonDataLoadRDD.scala b/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/NewCarbonDataLoadRDD.scala index 041dc1c..0b6a2a9 100644 --- a/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/NewCarbonDataLoadRDD.scala +++ b/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/NewCarbonDataLoadRDD.scala @@ -156,6 +156,8 @@ class NewCarbonDataLoadRDD[K, V]( logInfo("Bad Record Found") case e: Exception => loadMetadataDetails.setSegmentStatus(SegmentStatus.LOAD_FAILURE) + executionErrors.failureCauses = FailureCauses.EXECUTOR_FAILURE + executionErrors.errorMsg = e.getMessage logInfo("DataLoad failure", e)
LOGGER.error(e) throw e http://git-wip-us.apache.org/repos/asf/carbondata/blob/469c52f5/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/PartitionDropper.scala -- diff --git a/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/PartitionDropper.scala b/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/PartitionDropper.scala index 6911b0b..353a478 100644 --- a/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/PartitionDropper.scala +++ b/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/PartitionDropper.scala @@ -102,8 +102,8 @@ object PartitionDropper { Seq(partitionId, targetPartitionId).toList, dbName, tableName, partitionInfo) } catch { -case e: IOException => sys.error(s"Exception while delete original carbon files " + - e.getMessage) +case e: IOException => + throw new IOException("Exception while delete original carbon files ", e) } Audit.log(logger, s"Drop Partition request completed for table " + s"${ dbName }.${ tableName }") @@ -111,7 +111,8 @@ object PartitionDropper { s"${ dbName }.${ tableName }") } } catch { -case e: Exception => sys.error(s"Exception in dropping partition action: ${ e.getMessage }") +case e: Exception => + throw new RuntimeException("Exception in dropping partition action", e) } } else { PartitionUtils.deleteOriginalCarbonFile(alterPartitionModel, absoluteTableIdentifier, http://git-wip-us.apache.org/repos/asf/carbondata/blob/469c52f5/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/PartitionSplitter.scala -- diff --git a/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/PartitionSplitter.scala b/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/PartitionSplitter.scala index ca9f049..369ad51 100644 --- a/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/PartitionSplitter.scala +++ 
b/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/PartitionSplitter.scala @@ -87,8 +87,8 @@ object PartitionSplitter { deleteOriginalCarbonFile(alterPartitionMo
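The point of replacing `sys.error(msg + e.getMessage)` with `throw new RuntimeException(msg, e)` is that only the latter keeps the original exception, and its stack trace, reachable via `getCause()`. A small Java illustration (the failing helper below is hypothetical):

```java
import java.io.IOException;

// Demonstrates cause-chaining: wrapping an exception with the two-argument
// constructor preserves the original failure for debugging, whereas building
// a new error from just getMessage() discards the stack trace.
public class CausePreservation {

  // Hypothetical stand-in for the file-deletion step that can fail.
  static void dropPartitionFiles() throws IOException {
    throw new IOException("disk failure while deleting original carbon files");
  }

  public static RuntimeException wrapWithCause() {
    try {
      dropPartitionFiles();
      return null; // unreachable: dropPartitionFiles always throws here
    } catch (IOException e) {
      // The caller (or the thread pool's uncaught-exception handler) can walk
      // getCause() back to the original IOException.
      return new RuntimeException("Exception in dropping partition action", e);
    }
  }
}
```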
carbondata git commit: [CARBONDATA-3068] fixed cannot load data from hdfs files without hdfs prefix
Repository: carbondata Updated Branches: refs/heads/master ac94dbadc -> 269f4c378 [CARBONDATA-3068] fixed cannot load data from hdfs files without hdfs prefix sql: LOAD DATA INPATH '/tmp/test.csv' INTO TABLE test OPTIONS('QUOTECHAR'='"','TIMESTAMPFORMAT'='/MM/dd HH:mm:ss'); error: org.apache.carbondata.processing.exception.DataLoadingException: The input file does not exist: /tmp/test.csv (state=,code=0) but the file "test.csv" is in hdfs path, and hadoop conf "core-site.xml" has the property: fs.defaultFS hdfs://master:9000 This closes #2891 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/269f4c37 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/269f4c37 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/269f4c37 Branch: refs/heads/master Commit: 269f4c378bad1294430b7d1662469e300f0e04ed Parents: ac94dba Author: Sssan520 Authored: Mon Jul 2 19:12:24 2018 +0800 Committer: xuchuanyin Committed: Fri Nov 2 09:10:35 2018 +0800 -- .../src/main/scala/org/apache/spark/util/FileUtils.scala | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/269f4c37/integration/spark-common/src/main/scala/org/apache/spark/util/FileUtils.scala -- diff --git a/integration/spark-common/src/main/scala/org/apache/spark/util/FileUtils.scala b/integration/spark-common/src/main/scala/org/apache/spark/util/FileUtils.scala index db63f6e..3c7fbbb 100644 --- a/integration/spark-common/src/main/scala/org/apache/spark/util/FileUtils.scala +++ b/integration/spark-common/src/main/scala/org/apache/spark/util/FileUtils.scala @@ -75,7 +75,7 @@ object FileUtils { val filePaths = inputPath.split(",") for (i <- 0 until filePaths.size) { val filePath = CarbonUtil.checkAndAppendHDFSUrl(filePaths(i)) -val carbonFile = FileFactory.getCarbonFile(filePaths(i), hadoopConf) +val carbonFile = FileFactory.getCarbonFile(filePath, hadoopConf) if 
(!carbonFile.exists()) { throw new DataLoadingException( s"The input file does not exist: ${CarbonUtil.removeAKSK(filePaths(i))}" )
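The fix is a one-character change: the code computed the HDFS-qualified path via `CarbonUtil.checkAndAppendHDFSUrl` but then passed the original, unqualified path to `FileFactory.getCarbonFile`. The idea behind appending the URL, resolving a scheme-less path against `fs.defaultFS`, can be sketched as below; this helper is a simplification for illustration, not CarbonData's actual implementation.

```java
import java.net.URI;

// Simplified sketch: a load path without a scheme ("/tmp/test.csv") is
// resolved against the configured fs.defaultFS (e.g. "hdfs://master:9000"),
// so the file is looked up on HDFS rather than the local filesystem.
public class DefaultFsPrefix {

  static String appendDefaultFs(String path, String fsDefaultFs) {
    URI uri = URI.create(path);
    if (uri.getScheme() == null) {
      // No scheme: prepend the default filesystem, trimming trailing slashes.
      return fsDefaultFs.replaceAll("/+$", "") + path;
    }
    // Already fully qualified (hdfs://, s3a://, file://, ...): leave as-is.
    return path;
  }
}
```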
carbondata git commit: [CARBONDATA-3041]Optimize load minimum size strategy for data loading
Repository: carbondata Updated Branches: refs/heads/master db5da530e -> e2c517e3f [CARBONDATA-3041] Optimize load minimum size strategy for data loading This PR modifies the following points: 1. Delete the system property carbon.load.min.size.enabled and change load_min_size_inmb into a table property; the property can also be specified in the load option. 2. Support alter table xxx set TBLPROPERTIES('load_min_size_inmb'='256') 3. If a table was created with the property load_min_size_inmb, display it via the desc formatted command. This closes #2864 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/e2c517e3 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/e2c517e3 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/e2c517e3 Branch: refs/heads/master Commit: e2c517e3f9225b3b3ff9e55bcfee0d73fe943f01 Parents: db5da53 Author: ndwangsen Authored: Sat Oct 27 10:38:48 2018 +0800 Committer: xuchuanyin Committed: Mon Oct 29 16:20:56 2018 +0800 -- .../core/constants/CarbonCommonConstants.java | 3 +- .../constants/CarbonLoadOptionConstants.java| 10 - .../carbondata/core/util/CarbonProperties.java | 11 - docs/configuration-parameters.md| 1 - docs/ddl-of-carbondata.md | 19 +- .../dataload/TestTableLoadMinSize.scala | 62 +- .../carbondata/spark/util/CommonUtil.scala | 28 + .../spark/sql/catalyst/CarbonDDLSqlParser.scala | 3 + .../spark/rdd/CarbonDataRDDFactory.scala| 20 +- .../table/CarbonDescribeFormattedCommand.scala | 6 + .../org/apache/spark/util/AlterTableUtil.scala | 20 +- .../loading/model/CarbonLoadModelBuilder.java | 829 ++- .../processing/loading/model/LoadOption.java| 4 +- .../processing/util/CarbonLoaderUtil.java | 4 +- 14 files changed, 555 insertions(+), 465 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/e2c517e3/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java -- diff --git
a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java index 37e2aa1..bf4f7e5 100644 --- a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java +++ b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java @@ -980,8 +980,7 @@ public final class CarbonCommonConstants { */ @CarbonProperty public static final String CARBON_LOAD_MIN_SIZE_INMB = "load_min_size_inmb"; - public static final String CARBON_LOAD_MIN_NODE_SIZE_INMB_DEFAULT = "256"; - + public static final String CARBON_LOAD_MIN_SIZE_INMB_DEFAULT = "0"; /** * the node minimum load data default value */ http://git-wip-us.apache.org/repos/asf/carbondata/blob/e2c517e3/core/src/main/java/org/apache/carbondata/core/constants/CarbonLoadOptionConstants.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/constants/CarbonLoadOptionConstants.java b/core/src/main/java/org/apache/carbondata/core/constants/CarbonLoadOptionConstants.java index 82485ca..0d98cf4 100644 --- a/core/src/main/java/org/apache/carbondata/core/constants/CarbonLoadOptionConstants.java +++ b/core/src/main/java/org/apache/carbondata/core/constants/CarbonLoadOptionConstants.java @@ -151,14 +151,4 @@ public final class CarbonLoadOptionConstants { public static final String CARBON_LOAD_SORT_MEMORY_SPILL_PERCENTAGE = "carbon.load.sortmemory.spill.percentage"; public static final String CARBON_LOAD_SORT_MEMORY_SPILL_PERCENTAGE_DEFAULT = "0"; - - /** - * if loading data is too small, the original loading method will produce many small files. - * enable set the node load minimum amount of data,avoid producing many small files. - * This option is especially useful when you encounter a lot of small amounts of data. 
- */ - @CarbonProperty - public static final String ENABLE_CARBON_LOAD_NODE_DATA_MIN_SIZE - = "carbon.load.min.size.enabled"; - public static final String ENABLE_CARBON_LOAD_NODE_DATA_MIN_SIZE_DEFAULT = "false"; } http://git-wip-us.apache.org/repos/asf/carbondata/blob/e2c517e3/core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java b/core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java index 49d89e7
carbondata git commit: [CARBONDATA-3050][Doc] Remove unused parameter doc
Repository: carbondata Updated Branches: refs/heads/master 9de946673 -> 46d7595f4 [CARBONDATA-3050][Doc] Remove unused parameter doc Remove documentation of parameter 'carbon.use.multiple.temp.dir' Related PR: #2824 - removed the parameter 'carbon.use.multiple.temp.dir' This closes #2866 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/46d7595f Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/46d7595f Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/46d7595f Branch: refs/heads/master Commit: 46d7595f4bd8583cfc533820a92477f20765859c Parents: 9de9466 Author: Manhua Authored: Mon Oct 29 10:08:15 2018 +0800 Committer: xuchuanyin Committed: Mon Oct 29 14:15:29 2018 +0800 -- docs/configuration-parameters.md | 1 - 1 file changed, 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/46d7595f/docs/configuration-parameters.md -- diff --git a/docs/configuration-parameters.md b/docs/configuration-parameters.md index 7a6dcab..de98c8d 100644 --- a/docs/configuration-parameters.md +++ b/docs/configuration-parameters.md @@ -84,7 +84,6 @@ This section provides the details of all the configurations required for the Car | carbon.cutOffTimestamp | (none) | CarbonData has capability to generate the Dictionary values for the timestamp columns from the data itself without the need to store the computed dictionary values. This configuration sets the start date for calculating the timestamp. Java counts the number of milliseconds from start of "1970-01-01 00:00:00". This property is used to customize the start of position. For example "2000-01-01 00:00:00". **NOTE:** The date must be in the form ***carbon.timestamp.format***. CarbonData supports storing data for upto 68 years.For example, if the cut-off time is 1970-01-01 05:30:00, then data upto 2038-01-01 05:30:00 will be supported by CarbonData. 
| | carbon.timegranularity | SECOND | The configuration is used to specify the data granularity level such as DAY, HOUR, MINUTE, or SECOND. This helps to store more than 68 years of data into CarbonData. | | carbon.use.local.dir | true | CarbonData,during data loading, writes files to local temp directories before copying the files to HDFS. This configuration is used to specify whether CarbonData can write locally to tmp directory of the container or to the YARN application directory. | -| carbon.use.multiple.temp.dir | false | When multiple disks are present in the system, YARN is generally configured with multiple disks to be used as temp directories for managing the containers. This configuration specifies whether to use multiple YARN local directories during data loading for disk IO load balancing.Enable ***carbon.use.local.dir*** for this configuration to take effect. **NOTE:** Data Loading is an IO intensive operation whose performance can be limited by the disk IO threshold, particularly during multi table concurrent data load.Configuring this parameter, balances the disk IO across multiple disks there by improving the over all load performance. | | carbon.sort.temp.compressor | (none) | CarbonData writes every ***carbon.sort.size*** number of records to intermediate temp files during data loading to ensure memory footprint is within limits. These temporary files can be compressed and written in order to save the storage space. This configuration specifies the name of compressor to be used to compress the intermediate sort temp files during sort procedure in data loading. The valid values are 'SNAPPY','GZIP','BZIP2','LZ4','ZSTD' and empty. By default, empty means that Carbondata will not compress the sort temp files. 
**NOTE:** Compressor will be useful if you encounter disk bottleneck.Since the data needs to be compressed and decompressed,it involves additional CPU cycles,but is compensated by the high IO throughput due to less data to be written or read from the disks. | | carbon.load.skewedDataOptimization.enabled | false | During data loading,CarbonData would divide the number of blocks equally so as to ensure all executors process same number of blocks. This mechanism satisfies most of the scenarios and ensures maximum parallel processing for optimal data loading performance.In some business scenarios, there might be scenarios where the size of blocks vary significantly and hence some executors would have to do more work if they get blocks containing more data. This configuration enables size based block allocation strategy for data loading. When loading, carbondata will use file size based block allocation strategy for task distribution. It will make sure that all the executors process the same size of data.**NOTE
carbondata git commit: [CARBONDATA_3025]make cli compilable with java 1.7
Repository: carbondata Updated Branches: refs/heads/master e6d15da74 -> b62b0fd9c [CARBONDATA_3025]make cli compilable with java 1.7 This commit removes some code in jdk1.8 style to make it compilable with jdk1.7 This closes #2853 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/b62b0fd9 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/b62b0fd9 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/b62b0fd9 Branch: refs/heads/master Commit: b62b0fd9ce0122735a24b1359e601af1e5ccb6b9 Parents: e6d15da Author: akashrn5 Authored: Wed Oct 24 17:35:00 2018 +0530 Committer: xuchuanyin Committed: Fri Oct 26 19:19:06 2018 +0800 -- .../apache/carbondata/tool/FileCollector.java | 20 +++--- .../apache/carbondata/tool/ScanBenchmark.java | 64 2 files changed, 48 insertions(+), 36 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/b62b0fd9/tools/cli/src/main/java/org/apache/carbondata/tool/FileCollector.java -- diff --git a/tools/cli/src/main/java/org/apache/carbondata/tool/FileCollector.java b/tools/cli/src/main/java/org/apache/carbondata/tool/FileCollector.java index aa48b93..b2ff061 100644 --- a/tools/cli/src/main/java/org/apache/carbondata/tool/FileCollector.java +++ b/tools/cli/src/main/java/org/apache/carbondata/tool/FileCollector.java @@ -18,11 +18,7 @@ package org.apache.carbondata.tool; import java.io.IOException; -import java.util.ArrayList; -import java.util.HashSet; -import java.util.LinkedHashMap; -import java.util.List; -import java.util.Set; +import java.util.*; import org.apache.carbondata.common.Strings; import org.apache.carbondata.core.datastore.filesystem.CarbonFile; @@ -77,13 +73,17 @@ class FileCollector { } } -unsortedFiles.sort((o1, o2) -> { - if (o1.getShardName().equalsIgnoreCase(o2.getShardName())) { -return Integer.parseInt(o1.getPartNo()) - Integer.parseInt(o2.getPartNo()); - } else { -return 
o1.getShardName().compareTo(o2.getShardName()); + +Collections.sort(unsortedFiles, new Comparator() { + @Override public int compare(DataFile o1, DataFile o2) { +if (o1.getShardName().equalsIgnoreCase(o2.getShardName())) { + return Integer.parseInt(o1.getPartNo()) - Integer.parseInt(o2.getPartNo()); +} else { + return o1.getShardName().compareTo(o2.getShardName()); +} } }); + for (DataFile collectedFile : unsortedFiles) { this.dataFiles.put(collectedFile.getFilePath(), collectedFile); } http://git-wip-us.apache.org/repos/asf/carbondata/blob/b62b0fd9/tools/cli/src/main/java/org/apache/carbondata/tool/ScanBenchmark.java -- diff --git a/tools/cli/src/main/java/org/apache/carbondata/tool/ScanBenchmark.java b/tools/cli/src/main/java/org/apache/carbondata/tool/ScanBenchmark.java index af5bdb3..ddb9652 100644 --- a/tools/cli/src/main/java/org/apache/carbondata/tool/ScanBenchmark.java +++ b/tools/cli/src/main/java/org/apache/carbondata/tool/ScanBenchmark.java @@ -72,57 +72,69 @@ class ScanBenchmark implements Command { } outPuts.add("\n## Benchmark"); -AtomicReference fileHeaderRef = new AtomicReference<>(); -AtomicReference fileFoorterRef = new AtomicReference<>(); -AtomicReference convertedFooterRef = new AtomicReference<>(); +final AtomicReference fileHeaderRef = new AtomicReference<>(); +final AtomicReference fileFoorterRef = new AtomicReference<>(); +final AtomicReference convertedFooterRef = new AtomicReference<>(); // benchmark read header and footer time -benchmarkOperation("ReadHeaderAndFooter", () -> { - fileHeaderRef.set(file.readHeader()); - fileFoorterRef.set(file.readFooter()); +benchmarkOperation("ReadHeaderAndFooter", new Operation() { + @Override public void run() throws IOException, MemoryException { +fileHeaderRef.set(file.readHeader()); +fileFoorterRef.set(file.readFooter()); + } }); -FileHeader fileHeader = fileHeaderRef.get(); -FileFooter3 fileFooter = fileFoorterRef.get(); +final FileHeader fileHeader = fileHeaderRef.get(); +final FileFooter3 
fileFooter = fileFoorterRef.get(); // benchmark convert footer -benchmarkOperation("ConvertFooter", () -> { - convertFooter(fileHeader, fileFooter); +benchmarkOperation("ConvertFooter", new Operation() { + @Override public void run() throws IOException, MemoryException { +convertFooter(fileHeader, fileFooter); + } });
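The FileCollector change above replaces a Java 8 lambda with a Java 7-compatible anonymous `Comparator`, ordering data files by shard name first and then numerically by part number. Here is the same rewrite as a self-contained example; the two-field `DataFile` is a stand-in for CarbonData's class.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Java 7-compatible sort: an anonymous Comparator instead of a lambda.
public class Java7Sort {

  static class DataFile {
    final String shardName;
    final String partNo;

    DataFile(String shardName, String partNo) {
      this.shardName = shardName;
      this.partNo = partNo;
    }
  }

  static void sortFiles(List<DataFile> files) {
    Collections.sort(files, new Comparator<DataFile>() {
      @Override public int compare(DataFile o1, DataFile o2) {
        if (o1.shardName.equalsIgnoreCase(o2.shardName)) {
          // Numeric compare, so "2" sorts before "10".
          return Integer.parseInt(o1.partNo) - Integer.parseInt(o2.partNo);
        }
        return o1.shardName.compareTo(o2.shardName);
      }
    });
  }
}
```

Note the numeric comparison on `partNo`: a plain string compare would order "10" before "2", which is why both the lambda and this rewrite parse the part number.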
carbondata git commit: [CARBONDATA-3040][BloomDataMap] Fix bug for merging bloom index
Repository: carbondata Updated Branches: refs/heads/master e4806b9a0 -> 33a6dc2ac [CARBONDATA-3040][BloomDataMap] Fix bug for merging bloom index Problem There is a bug which causes query failure when we create two bloom datamaps on the same table with data. Analyze Since we already have data, each create datamap will trigger a rebuild datamap task and then trigger bloom index file merging. By debugging, we found the first datamap's bloom index files would be merged two times, and the second merge left the bloom index file empty. The procedure goes as below: 1. create table 2. load data 3. create bloom datamap1: rebuild datamap1 for existing data; the event listener is triggered to merge index files for all bloom datamaps (currently only datamap1) 4. create bloom datamap2: rebuild datamap2 for existing data; the event listener is triggered to merge index files for all bloom datamaps (currently datamap1 and datamap2) Because the event does not carry information about which datamap was rebuilt, it always merges all bloom datamaps. So datamap1's bloom index files would be merged 2 times, but only a mergeShard folder remained when the second merge ran, so there was no file input for merging and the final merged bloom index files were empty. Solution Send the datamap name in the rebuild event as a filter and only merge bloom index files for the specific datamap. Also add a check for whether mergeShard already exists before merging.
This closes #2851 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/33a6dc2a Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/33a6dc2a Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/33a6dc2a Branch: refs/heads/master Commit: 33a6dc2ac996cbb0bfb4f354d7fc80b297d652bb Parents: e4806b9 Author: Manhua Authored: Wed Oct 24 16:20:13 2018 +0800 Committer: xuchuanyin Committed: Thu Oct 25 14:17:29 2018 +0800 -- .../datamap/bloom/BloomIndexFileStore.java | 20 +++- .../carbondata/events/DataMapEvents.scala | 10 +- .../datamap/IndexDataMapRebuildRDD.scala| 2 +- .../spark/rdd/CarbonTableCompactor.scala| 2 +- .../events/MergeBloomIndexEventListener.scala | 10 -- .../management/CarbonLoadDataCommand.scala | 2 +- 6 files changed, 35 insertions(+), 11 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/33a6dc2a/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomIndexFileStore.java -- diff --git a/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomIndexFileStore.java b/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomIndexFileStore.java index 17813ba..3d6ad9b 100644 --- a/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomIndexFileStore.java +++ b/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomIndexFileStore.java @@ -60,6 +60,9 @@ public class BloomIndexFileStore { public static void mergeBloomIndexFile(String dmSegmentPathString, List indexCols) { + +// Step 1. 
check current folders + // get all shard paths of old store CarbonFile segmentPath = FileFactory.getCarbonFile(dmSegmentPathString, FileFactory.getFileType(dmSegmentPathString)); @@ -72,6 +75,9 @@ public class BloomIndexFileStore { String mergeShardPath = dmSegmentPathString + File.separator + MERGE_BLOOM_INDEX_SHARD_NAME; String mergeInprogressFile = dmSegmentPathString + File.separator + MERGE_INPROGRESS_FILE; + +// Step 2. prepare for fail-safe merging + try { // delete mergeShard folder if exists if (FileFactory.isFileExist(mergeShardPath)) { @@ -87,10 +93,12 @@ public class BloomIndexFileStore { throw new RuntimeException("Failed to create directory " + mergeShardPath); } } catch (IOException e) { - LOGGER.error("Error occurs while create directory " + mergeShardPath, e); - throw new RuntimeException("Error occurs while create directory " + mergeShardPath); + throw new RuntimeException(e); } +// Step 3. merge index files +// Query won't use mergeShard until MERGE_INPROGRESS_FILE is deleted + // for each index column, merge the bloomindex files from all shards into one for (String indexCol: indexCols) { String mergeIndexFile = getMergeBloomIndexFile(mergeShardPath, indexCol); @@ -115,15 +123,17 @@ public class BloomIndexFileStore { } } catch (IOException e) { LOGGER.error("Error occurs while merge bloom index file of column: " + indexCol, e); -// delete merge shard of bloom index for this segment when failed +
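The two parts of the fix described above — filtering by the datamap name carried in the rebuild event, and refusing to merge again when the mergeShard folder already exists — can be sketched as follows. This is a minimal illustration with hypothetical helper names; only the filtering idea and the mergeShard existence check come from the commit:

```java
import java.io.File;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class MergeGuardSketch {
  // Hypothetical listener-side filter: merge index files only for the
  // datamap named in the rebuild event, instead of every bloom datamap.
  static List<String> datamapsToMerge(List<String> allBloomDatamaps, String rebuiltDatamap) {
    if (rebuiltDatamap == null) {
      // e.g. a data-load event concerns all bloom datamaps
      return allBloomDatamaps;
    }
    return allBloomDatamaps.contains(rebuiltDatamap)
        ? Arrays.asList(rebuiltDatamap)
        : Collections.<String>emptyList();
  }

  // Hypothetical fail-safe check: skip merging when mergeShard already exists,
  // so a second merge cannot wipe out the first merge's output.
  static boolean shouldMerge(File segmentDir) {
    return !new File(segmentDir, "mergeShard").exists();
  }

  public static void main(String[] args) {
    List<String> all = Arrays.asList("datamap1", "datamap2");
    System.out.println(datamapsToMerge(all, "datamap1"));
    System.out.println(shouldMerge(new File("/tmp/some-segment")));
  }
}
```

With this guard in place, re-firing the merge event for datamap1 after its shards are already merged becomes a no-op instead of producing empty index files.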
carbondata git commit: [CARBONDATA-3002] Fix some spelling errors and remove the data after the test case finishes running
Repository: carbondata Updated Branches: refs/heads/master 1be990f66 -> f810389fa [CARBONDATA-3002] Fix some spelling errors and remove the data after the test case finishes running 1. Fix spelling error: change retrive to retrieve. 2. Remove the data after the test case finishes by deleting the table and database. 3. Name the dummy table with a UUID to avoid errors when there are multiple CarbonReader instances. This closes #2811 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/f810389f Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/f810389f Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/f810389f Branch: refs/heads/master Commit: f810389faf5ddf938b40e2c9a7be35d4a12d901e Parents: 1be990f Author: xubo245 Authored: Thu Oct 11 18:13:14 2018 +0800 Committer: xuchuanyin Committed: Fri Oct 19 16:37:27 2018 +0800 -- .../java/org/apache/carbondata/hadoop/CarbonRecordReader.java | 2 +- .../carbondata/presto/PrestoCarbonVectorizedRecordReader.java | 2 +- .../spark/vectorreader/VectorizedCarbonRecordReader.java | 2 +- .../spark/testsuite/partition/TestAlterPartitionTable.scala | 2 ++ .../java/org/apache/carbondata/sdk/file/CarbonReader.java | 7 +++ 5 files changed, 8 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/f810389f/hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonRecordReader.java -- diff --git a/hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonRecordReader.java b/hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonRecordReader.java index 0d38906..d447320 100644 --- a/hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonRecordReader.java +++ b/hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonRecordReader.java @@ -117,7 +117,7 @@ public class CarbonRecordReader extends AbstractRecordReader { } @Override public float getProgress() throws IOException, InterruptedException { -// TODO : Implement it based on total number of rows it is 
going to retrive. +// TODO : Implement it based on total number of rows it is going to retrieve. return 0; } http://git-wip-us.apache.org/repos/asf/carbondata/blob/f810389f/integration/presto/src/main/java/org/apache/carbondata/presto/PrestoCarbonVectorizedRecordReader.java -- diff --git a/integration/presto/src/main/java/org/apache/carbondata/presto/PrestoCarbonVectorizedRecordReader.java b/integration/presto/src/main/java/org/apache/carbondata/presto/PrestoCarbonVectorizedRecordReader.java index 9935b54..4e2d36c 100644 --- a/integration/presto/src/main/java/org/apache/carbondata/presto/PrestoCarbonVectorizedRecordReader.java +++ b/integration/presto/src/main/java/org/apache/carbondata/presto/PrestoCarbonVectorizedRecordReader.java @@ -169,7 +169,7 @@ class PrestoCarbonVectorizedRecordReader extends AbstractRecordReader { } @Override public float getProgress() throws IOException, InterruptedException { -// TODO : Implement it based on total number of rows it is going to retrive. +// TODO : Implement it based on total number of rows it is going to retrieve. 
return 0; } http://git-wip-us.apache.org/repos/asf/carbondata/blob/f810389f/integration/spark-datasource/src/main/scala/org/apache/carbondata/spark/vectorreader/VectorizedCarbonRecordReader.java -- diff --git a/integration/spark-datasource/src/main/scala/org/apache/carbondata/spark/vectorreader/VectorizedCarbonRecordReader.java b/integration/spark-datasource/src/main/scala/org/apache/carbondata/spark/vectorreader/VectorizedCarbonRecordReader.java index 779c62f..839a8a0 100644 --- a/integration/spark-datasource/src/main/scala/org/apache/carbondata/spark/vectorreader/VectorizedCarbonRecordReader.java +++ b/integration/spark-datasource/src/main/scala/org/apache/carbondata/spark/vectorreader/VectorizedCarbonRecordReader.java @@ -215,7 +215,7 @@ public class VectorizedCarbonRecordReader extends AbstractRecordReader { @Override public float getProgress() throws IOException, InterruptedException { -// TODO : Implement it based on total number of rows it is going to retrive. +// TODO : Implement it based on total number of rows it is going to retrieve. return 0; } http://git-wip-us.apache.org/repos/asf/carbondata/blob/f810389f/integration/spark2/src/test/scala/org/apache/carbondata/spark/testsuite/partition/TestAlterPartitionTable.scala -- diff --git a/integration/spark2/src/test/scala/org/apache/carbondata/spark/testsuite/partition/TestAlterPartitionTable.scala
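Point 3 of the commit message — naming the dummy table with a UUID so that concurrent CarbonReader instances don't collide on the same name — amounts to something like the sketch below. The helper name and prefix are hypothetical; only the UUID idea comes from the commit:

```java
import java.util.UUID;

public class UniqueTableNameSketch {
  // Appends a random UUID so two readers created at the same time never
  // register the same temporary/dummy table name (dashes swapped for
  // underscores to keep the name identifier-safe).
  static String dummyTableName(String prefix) {
    return prefix + "_" + UUID.randomUUID().toString().replace("-", "_");
  }

  public static void main(String[] args) {
    System.out.println(dummyTableName("_tempTable"));
  }
}
```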
[1/6] carbondata git commit: [CARBONDATA-3024] Refactor to use log4j Logger directly
Repository: carbondata Updated Branches: refs/heads/master 15d38260c -> 06adb5a03 http://git-wip-us.apache.org/repos/asf/carbondata/blob/06adb5a0/processing/src/main/java/org/apache/carbondata/processing/loading/steps/DataWriterBatchProcessorStepImpl.java -- diff --git a/processing/src/main/java/org/apache/carbondata/processing/loading/steps/DataWriterBatchProcessorStepImpl.java b/processing/src/main/java/org/apache/carbondata/processing/loading/steps/DataWriterBatchProcessorStepImpl.java index 26ae2d7..694b345 100644 --- a/processing/src/main/java/org/apache/carbondata/processing/loading/steps/DataWriterBatchProcessorStepImpl.java +++ b/processing/src/main/java/org/apache/carbondata/processing/loading/steps/DataWriterBatchProcessorStepImpl.java @@ -20,7 +20,6 @@ import java.io.IOException; import java.util.Iterator; import java.util.Map; -import org.apache.carbondata.common.logging.LogService; import org.apache.carbondata.common.logging.LogServiceFactory; import org.apache.carbondata.core.datastore.exception.CarbonDataWriterException; import org.apache.carbondata.core.datastore.row.CarbonRow; @@ -40,6 +39,8 @@ import org.apache.carbondata.processing.store.CarbonFactHandler; import org.apache.carbondata.processing.store.CarbonFactHandlerFactory; import org.apache.carbondata.processing.util.CarbonDataProcessorUtil; +import org.apache.log4j.Logger; + /** * It reads data from batch of sorted files(it could be in-memory/disk based files) * which are generated in previous sort step. And it writes data to carbondata file. 
@@ -47,7 +48,7 @@ import org.apache.carbondata.processing.util.CarbonDataProcessorUtil; */ public class DataWriterBatchProcessorStepImpl extends AbstractDataLoadProcessorStep { - private static final LogService LOGGER = + private static final Logger LOGGER = LogServiceFactory.getLogService(DataWriterBatchProcessorStepImpl.class.getName()); private Map localDictionaryGeneratorMap; @@ -112,7 +113,7 @@ public class DataWriterBatchProcessorStepImpl extends AbstractDataLoadProcessorS i++; } } catch (Exception e) { - LOGGER.error(e, "Failed for table: " + tableName + " in DataWriterBatchProcessorStepImpl"); + LOGGER.error("Failed for table: " + tableName + " in DataWriterBatchProcessorStepImpl", e); if (e.getCause() instanceof BadRecordFoundException) { throw new BadRecordFoundException(e.getCause().getMessage()); } @@ -132,7 +133,7 @@ public class DataWriterBatchProcessorStepImpl extends AbstractDataLoadProcessorS } catch (Exception e) { // if throw exception from here dataHandler will not be closed. 
// so just holding exception and later throwing exception - LOGGER.error(e, "Failed for table: " + tableName + " in finishing data handler"); + LOGGER.error("Failed for table: " + tableName + " in finishing data handler", e); exception = new CarbonDataWriterException( "Failed for table: " + tableName + " in finishing data handler", e); } http://git-wip-us.apache.org/repos/asf/carbondata/blob/06adb5a0/processing/src/main/java/org/apache/carbondata/processing/loading/steps/DataWriterProcessorStepImpl.java -- diff --git a/processing/src/main/java/org/apache/carbondata/processing/loading/steps/DataWriterProcessorStepImpl.java b/processing/src/main/java/org/apache/carbondata/processing/loading/steps/DataWriterProcessorStepImpl.java index cc038b9..3d704c9 100644 --- a/processing/src/main/java/org/apache/carbondata/processing/loading/steps/DataWriterProcessorStepImpl.java +++ b/processing/src/main/java/org/apache/carbondata/processing/loading/steps/DataWriterProcessorStepImpl.java @@ -29,7 +29,6 @@ import java.util.concurrent.Executors; import java.util.concurrent.Future; import java.util.concurrent.TimeUnit; -import org.apache.carbondata.common.logging.LogService; import org.apache.carbondata.common.logging.LogServiceFactory; import org.apache.carbondata.core.datastore.exception.CarbonDataWriterException; import org.apache.carbondata.core.datastore.row.CarbonRow; @@ -50,13 +49,15 @@ import org.apache.carbondata.processing.store.CarbonFactHandler; import org.apache.carbondata.processing.store.CarbonFactHandlerFactory; import org.apache.carbondata.processing.util.CarbonDataProcessorUtil; +import org.apache.log4j.Logger; + /** * It reads data from sorted files which are generated in previous sort step. * And it writes data to carbondata file. 
It also generates mdk key while writing to carbondata file */ public class DataWriterProcessorStepImpl extends AbstractDataLoadProcessorStep { - private static final LogService LOGGER = + private static final Logger LOGGER = LogServiceFactory.getLogService(DataWriterProcessorStepImpl.class.getName()); private long readCounter; @@
[5/6] carbondata git commit: [CARBONDATA-3024] Refactor to use log4j Logger directly
http://git-wip-us.apache.org/repos/asf/carbondata/blob/06adb5a0/core/src/main/java/org/apache/carbondata/core/dictionary/server/NonSecureDictionaryServerHandler.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/dictionary/server/NonSecureDictionaryServerHandler.java b/core/src/main/java/org/apache/carbondata/core/dictionary/server/NonSecureDictionaryServerHandler.java index 82efe80..0f076a4 100644 --- a/core/src/main/java/org/apache/carbondata/core/dictionary/server/NonSecureDictionaryServerHandler.java +++ b/core/src/main/java/org/apache/carbondata/core/dictionary/server/NonSecureDictionaryServerHandler.java @@ -16,7 +16,6 @@ */ package org.apache.carbondata.core.dictionary.server; -import org.apache.carbondata.common.logging.LogService; import org.apache.carbondata.common.logging.LogServiceFactory; import org.apache.carbondata.core.dictionary.generator.ServerDictionaryGenerator; import org.apache.carbondata.core.dictionary.generator.key.DictionaryMessage; @@ -26,6 +25,7 @@ import io.netty.buffer.ByteBuf; import io.netty.channel.ChannelHandler; import io.netty.channel.ChannelHandlerContext; import io.netty.channel.ChannelInboundHandlerAdapter; +import org.apache.log4j.Logger; /** * Handler for Dictionary server. 
@@ -33,7 +33,7 @@ import io.netty.channel.ChannelInboundHandlerAdapter; @ChannelHandler.Sharable public class NonSecureDictionaryServerHandler extends ChannelInboundHandlerAdapter { - private static final LogService LOGGER = + private static final Logger LOGGER = LogServiceFactory.getLogService(NonSecureDictionaryServerHandler.class.getName()); /** @@ -77,7 +77,7 @@ import io.netty.channel.ChannelInboundHandlerAdapter; * @param cause */ @Override public void exceptionCaught(ChannelHandlerContext ctx, Throwable cause) { -LOGGER.error(cause, "exceptionCaught"); +LOGGER.error("exceptionCaught", cause); ctx.close(); } http://git-wip-us.apache.org/repos/asf/carbondata/blob/06adb5a0/core/src/main/java/org/apache/carbondata/core/dictionary/service/AbstractDictionaryServer.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/dictionary/service/AbstractDictionaryServer.java b/core/src/main/java/org/apache/carbondata/core/dictionary/service/AbstractDictionaryServer.java index 754f253..5703051 100644 --- a/core/src/main/java/org/apache/carbondata/core/dictionary/service/AbstractDictionaryServer.java +++ b/core/src/main/java/org/apache/carbondata/core/dictionary/service/AbstractDictionaryServer.java @@ -27,13 +27,12 @@ import java.util.Collections; import java.util.Enumeration; import java.util.List; -import org.apache.carbondata.common.logging.LogService; - import org.apache.commons.lang3.SystemUtils; +import org.apache.log4j.Logger; public abstract class AbstractDictionaryServer { - public String findLocalIpAddress(LogService LOGGER) { + public String findLocalIpAddress(Logger LOGGER) { try { String defaultIpOverride = System.getenv("SPARK_LOCAL_IP"); if (defaultIpOverride != null) { http://git-wip-us.apache.org/repos/asf/carbondata/blob/06adb5a0/core/src/main/java/org/apache/carbondata/core/fileoperations/AtomicFileOperationsImpl.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/fileoperations/AtomicFileOperationsImpl.java 
b/core/src/main/java/org/apache/carbondata/core/fileoperations/AtomicFileOperationsImpl.java index f9f8647..13f10d7 100644 --- a/core/src/main/java/org/apache/carbondata/core/fileoperations/AtomicFileOperationsImpl.java +++ b/core/src/main/java/org/apache/carbondata/core/fileoperations/AtomicFileOperationsImpl.java @@ -21,7 +21,6 @@ import java.io.DataInputStream; import java.io.DataOutputStream; import java.io.IOException; -import org.apache.carbondata.common.logging.LogService; import org.apache.carbondata.common.logging.LogServiceFactory; import org.apache.carbondata.core.constants.CarbonCommonConstants; import org.apache.carbondata.core.datastore.filesystem.CarbonFile; @@ -29,12 +28,14 @@ import org.apache.carbondata.core.datastore.impl.FileFactory; import org.apache.carbondata.core.datastore.impl.FileFactory.FileType; import org.apache.carbondata.core.util.CarbonUtil; +import org.apache.log4j.Logger; + class AtomicFileOperationsImpl implements AtomicFileOperations { /** * Logger instance */ - private static final LogService LOGGER = + private static final Logger LOGGER = LogServiceFactory.getLogService(AtomicFileOperationsImpl.class.getName()); private String filePath; http://git-wip-us.apache.org/repos/asf/carbondata/blob/06adb5a0/core/src/main/java/org/apache/carbondata/core/indexstore/BlockletDataMapIndexStore.java
[2/6] carbondata git commit: [CARBONDATA-3024] Refactor to use log4j Logger directly
http://git-wip-us.apache.org/repos/asf/carbondata/blob/06adb5a0/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableAddColumnCommand.scala -- diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableAddColumnCommand.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableAddColumnCommand.scala index 22ff5c4..1f1e7bd 100644 --- a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableAddColumnCommand.scala +++ b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableAddColumnCommand.scala @@ -25,7 +25,8 @@ import org.apache.spark.sql.hive.CarbonSessionCatalog import org.apache.spark.util.AlterTableUtil import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException -import org.apache.carbondata.common.logging.{LogService, LogServiceFactory} +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.common.logging.impl.Audit import org.apache.carbondata.core.features.TableOperation import org.apache.carbondata.core.locks.{ICarbonLock, LockUsage} import org.apache.carbondata.core.metadata.converter.ThriftWrapperSchemaConverterImpl @@ -39,11 +40,11 @@ private[sql] case class CarbonAlterTableAddColumnCommand( extends MetadataCommand { override def processMetadata(sparkSession: SparkSession): Seq[Row] = { -val LOGGER: LogService = LogServiceFactory.getLogService(this.getClass.getCanonicalName) +val LOGGER = LogServiceFactory.getLogService(this.getClass.getCanonicalName) val tableName = alterTableAddColumnsModel.tableName val dbName = alterTableAddColumnsModel.databaseName .getOrElse(sparkSession.catalog.currentDatabase) -LOGGER.audit(s"Alter table add columns request has been received for $dbName.$tableName") +Audit.log(LOGGER, s"Alter table add columns request has been received for 
$dbName.$tableName") val locksToBeAcquired = List(LockUsage.METADATA_LOCK, LockUsage.COMPACTION_LOCK) var locks = List.empty[ICarbonLock] var timeStamp = 0L @@ -104,10 +105,10 @@ private[sql] case class CarbonAlterTableAddColumnCommand( carbonTable, alterTableAddColumnsModel) OperationListenerBus.getInstance.fireEvent(alterTablePostExecutionEvent, operationContext) LOGGER.info(s"Alter table for add columns is successful for table $dbName.$tableName") - LOGGER.audit(s"Alter table for add columns is successful for table $dbName.$tableName") + Audit.log(LOGGER, s"Alter table for add columns is successful for table $dbName.$tableName") } catch { case e: Exception => -LOGGER.error(e, "Alter table add columns failed") +LOGGER.error("Alter table add columns failed", e) if (newCols.nonEmpty) { LOGGER.info("Cleaning up the dictionary files as alter table add operation failed") new AlterTableDropColumnRDD(sparkSession, http://git-wip-us.apache.org/repos/asf/carbondata/blob/06adb5a0/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableDataTypeChangeCommand.scala -- diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableDataTypeChangeCommand.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableDataTypeChangeCommand.scala index 9ce79e9..716b9c9 100644 --- a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableDataTypeChangeCommand.scala +++ b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableDataTypeChangeCommand.scala @@ -25,7 +25,8 @@ import org.apache.spark.sql.hive.CarbonSessionCatalog import org.apache.spark.util.AlterTableUtil import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException -import org.apache.carbondata.common.logging.{LogService, LogServiceFactory} +import 
org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.common.logging.impl.Audit import org.apache.carbondata.core.features.TableOperation import org.apache.carbondata.core.locks.{ICarbonLock, LockUsage} import org.apache.carbondata.core.metadata.converter.ThriftWrapperSchemaConverterImpl @@ -40,11 +41,12 @@ private[sql] case class CarbonAlterTableDataTypeChangeCommand( extends MetadataCommand { override def processMetadata(sparkSession: SparkSession): Seq[Row] = { -val LOGGER: LogService = LogServiceFactory.getLogService(this.getClass.getCanonicalName) +val LOGGER = LogServiceFactory.getLogService(this.getClass.getCanonicalName) val tableName =
[6/6] carbondata git commit: [CARBONDATA-3024] Refactor to use log4j Logger directly
[CARBONDATA-3024] Refactor to use log4j Logger directly Currently CarbonData's log prints the line number of StandardLogService itself, which is not good for maintainability; a better way is to use the log4j Logger directly so that it prints the line number of where we are actually logging. This closes #2827 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/06adb5a0 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/06adb5a0 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/06adb5a0 Branch: refs/heads/master Commit: 06adb5a0376255c15f8c393257d6db0736e35f31 Parents: 15d3826 Author: Jacky Li Authored: Wed Oct 17 20:27:15 2018 +0800 Committer: xuchuanyin Committed: Thu Oct 18 09:56:12 2018 +0800 -- .../carbondata/common/logging/LogService.java | 80 ++- .../common/logging/LogServiceFactory.java | 6 +- .../carbondata/common/logging/impl/Audit.java | 49 .../common/logging/impl/StandardLogService.java | 236 --- .../logging/LogServiceFactoryTest_UT.java | 7 +- .../logging/ft/LoggingServiceTest_FT.java | 9 +- .../logging/impl/StandardLogServiceTest_UT.java | 140 --- .../carbondata/core/cache/CacheProvider.java| 5 +- .../carbondata/core/cache/CarbonLRUCache.java | 5 +- .../dictionary/ForwardDictionaryCache.java | 5 +- .../dictionary/ManageDictionaryAndBTree.java| 5 +- .../dictionary/ReverseDictionaryCache.java | 5 +- .../core/constants/CarbonVersionConstants.java | 9 +- .../core/datamap/DataMapStoreManager.java | 10 +- .../carbondata/core/datamap/DataMapUtil.java| 4 +- .../status/DiskBasedDataMapStatusProvider.java | 17 +- .../block/SegmentPropertiesAndSchemaHolder.java | 5 +- .../blocklet/BlockletEncodedColumnPage.java | 5 +- .../datastore/compression/SnappyCompressor.java | 32 +-- .../filesystem/AbstractDFSCarbonFile.java | 4 +- .../datastore/filesystem/AlluxioCarbonFile.java | 6 +- .../datastore/filesystem/HDFSCarbonFile.java| 4 +- .../datastore/filesystem/LocalCarbonFile.java | 7 +- 
.../core/datastore/filesystem/S3CarbonFile.java | 4 +- .../datastore/filesystem/ViewFSCarbonFile.java | 4 +- .../core/datastore/impl/FileFactory.java| 4 +- .../datastore/page/LocalDictColumnPage.java | 9 +- .../page/encoding/ColumnPageEncoder.java| 5 +- .../client/NonSecureDictionaryClient.java | 7 +- .../NonSecureDictionaryClientHandler.java | 13 +- .../IncrementalColumnDictionaryGenerator.java | 7 +- .../generator/TableDictionaryGenerator.java | 5 +- .../server/NonSecureDictionaryServer.java | 6 +- .../NonSecureDictionaryServerHandler.java | 6 +- .../service/AbstractDictionaryServer.java | 5 +- .../AtomicFileOperationsImpl.java | 5 +- .../indexstore/BlockletDataMapIndexStore.java | 4 +- .../core/indexstore/BlockletDetailInfo.java | 8 +- .../indexstore/blockletindex/BlockDataMap.java | 13 +- .../blockletindex/SegmentIndexFileStore.java| 4 +- .../DateDirectDictionaryGenerator.java | 5 +- .../TimeStampDirectDictionaryGenerator.java | 5 +- .../core/locks/CarbonLockFactory.java | 5 +- .../carbondata/core/locks/CarbonLockUtil.java | 4 +- .../carbondata/core/locks/HdfsFileLock.java | 5 +- .../carbondata/core/locks/LocalFileLock.java| 5 +- .../carbondata/core/locks/S3FileLock.java | 7 +- .../carbondata/core/locks/ZooKeeperLocking.java | 10 +- .../carbondata/core/locks/ZookeeperInit.java| 4 +- .../core/memory/IntPointerBuffer.java | 5 +- .../core/memory/UnsafeMemoryManager.java| 5 +- .../core/memory/UnsafeSortMemoryManager.java| 5 +- .../core/metadata/SegmentFileStore.java | 4 +- .../core/metadata/schema/table/CarbonTable.java | 4 +- .../core/metadata/schema/table/TableInfo.java | 5 +- .../core/mutate/CarbonUpdateUtil.java | 4 +- .../core/mutate/DeleteDeltaBlockDetails.java| 5 +- .../core/mutate/SegmentUpdateDetails.java | 5 +- .../reader/CarbonDeleteDeltaFileReaderImpl.java | 8 - .../reader/CarbonDeleteFilesDataReader.java | 4 +- .../CarbonDictionarySortIndexReaderImpl.java| 6 +- .../scan/collector/ResultCollectorFactory.java | 16 +- 
.../RestructureBasedRawResultCollector.java | 8 - .../executor/impl/AbstractQueryExecutor.java| 8 +- .../impl/SearchModeDetailQueryExecutor.java | 4 +- .../SearchModeVectorDetailQueryExecutor.java| 4 +- .../expression/RangeExpressionEvaluator.java| 5 +- .../scan/filter/FilterExpressionProcessor.java | 5 +- .../carbondata/core/scan/filter/FilterUtil.java | 6 +- .../executer/RowLevelFilterExecuterImpl.java| 5 +- .../core/scan/model
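The maintainability problem this commit message describes — a logging wrapper hides the real call site, so the reported location is always inside the wrapper — can be demonstrated with plain stack-trace inspection. This is a self-contained sketch, not CarbonData's actual StandardLogService; log4j's own caller detection works analogously but is not reproduced here:

```java
public class CallSiteSketch {
  // Naive call-site detection: report the immediate caller's method name.
  // Frame [0] is detectCallSite itself, frame [1] is whoever called it.
  static String detectCallSite() {
    StackTraceElement caller = new Throwable().getStackTrace()[1];
    return caller.getMethodName();
  }

  // A wrapper in the spirit of the old LogService: the call goes through this
  // extra frame, so call-site detection sees the wrapper, not the real caller.
  static String callSiteThroughWrapper() {
    return detectCallSite();
  }

  public static void main(String[] args) {
    // Calling the detector directly reports the real caller ...
    System.out.println(detectCallSite());         // "main"
    // ... but going through the wrapper reports the wrapper instead,
    // which is why the commit drops the wrapper and uses log4j directly.
    System.out.println(callSiteThroughWrapper()); // "callSiteThroughWrapper"
  }
}
```

After the refactor, `LOGGER.error("msg", e)` goes straight to the log4j `Logger`, so the logged location is the line in the calling class rather than a line inside `StandardLogService`.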
[4/6] carbondata git commit: [CARBONDATA-3024] Refactor to use log4j Logger directly
http://git-wip-us.apache.org/repos/asf/carbondata/blob/06adb5a0/core/src/main/java/org/apache/carbondata/core/util/ObjectSizeCalculator.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/util/ObjectSizeCalculator.java b/core/src/main/java/org/apache/carbondata/core/util/ObjectSizeCalculator.java index 513e786..3d63560 100644 --- a/core/src/main/java/org/apache/carbondata/core/util/ObjectSizeCalculator.java +++ b/core/src/main/java/org/apache/carbondata/core/util/ObjectSizeCalculator.java @@ -19,9 +19,10 @@ package org.apache.carbondata.core.util; import java.lang.reflect.Method; -import org.apache.carbondata.common.logging.LogService; import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.log4j.Logger; + /** * This wrapper class is created so that core doesnt have direct dependency on spark * TODO: Need to have carbon implementation if carbon needs to be used without spark @@ -30,7 +31,7 @@ public final class ObjectSizeCalculator { /** * Logger object for the class */ - private static final LogService LOGGER = + private static final Logger LOGGER = LogServiceFactory.getLogService(ObjectSizeCalculator.class.getName()); /** @@ -63,7 +64,7 @@ public final class ObjectSizeCalculator { } catch (Throwable ex) { // throwable is being caught as external interface is being invoked through reflection // and runtime exceptions might get thrown - LOGGER.error(ex, "Could not access method SizeEstimator:estimate.Returning default value"); + LOGGER.error("Could not access method SizeEstimator:estimate.Returning default value", ex); methodAccessible = false; return defValue; } http://git-wip-us.apache.org/repos/asf/carbondata/blob/06adb5a0/core/src/main/java/org/apache/carbondata/core/util/SessionParams.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/util/SessionParams.java b/core/src/main/java/org/apache/carbondata/core/util/SessionParams.java index 931e106..027e6cb 100644 --- 
a/core/src/main/java/org/apache/carbondata/core/util/SessionParams.java +++ b/core/src/main/java/org/apache/carbondata/core/util/SessionParams.java @@ -23,8 +23,8 @@ import java.util.Map; import java.util.concurrent.ConcurrentHashMap; import org.apache.carbondata.common.constants.LoggerAction; -import org.apache.carbondata.common.logging.LogService; import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.common.logging.impl.Audit; import org.apache.carbondata.core.cache.CacheProvider; import org.apache.carbondata.core.constants.CarbonCommonConstants; import org.apache.carbondata.core.constants.CarbonCommonConstantsInternal; @@ -56,12 +56,14 @@ import static org.apache.carbondata.core.constants.CarbonLoadOptionConstants.CAR import static org.apache.carbondata.core.constants.CarbonLoadOptionConstants.CARBON_OPTIONS_TIMESTAMPFORMAT; import static org.apache.carbondata.core.constants.CarbonV3DataFormatConstants.BLOCKLET_SIZE_IN_MB; +import org.apache.log4j.Logger; + /** * This class maintains carbon session params */ public class SessionParams implements Serializable, Cloneable { - private static final LogService LOGGER = + private static final Logger LOGGER = LogServiceFactory.getLogService(CacheProvider.class.getName()); private static final long serialVersionUID = -7801994600594915264L; @@ -124,7 +126,8 @@ public class SessionParams implements Serializable, Cloneable { value = value.toUpperCase(); } if (doAuditing) { -LOGGER.audit("The key " + key + " with value " + value + " added in the session param"); +Audit.log(LOGGER, +"The key " + key + " with value " + value + " added in the session param"); } sProps.put(key, value); } http://git-wip-us.apache.org/repos/asf/carbondata/blob/06adb5a0/core/src/main/java/org/apache/carbondata/core/util/TaskMetricsMap.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/util/TaskMetricsMap.java b/core/src/main/java/org/apache/carbondata/core/util/TaskMetricsMap.java index 
16dacb2..196fd64 100644 --- a/core/src/main/java/org/apache/carbondata/core/util/TaskMetricsMap.java +++ b/core/src/main/java/org/apache/carbondata/core/util/TaskMetricsMap.java @@ -23,17 +23,17 @@ import java.util.Map; import java.util.concurrent.ConcurrentHashMap; import java.util.concurrent.CopyOnWriteArrayList; -import org.apache.carbondata.common.logging.LogService; import org.apache.carbondata.common.logging.LogServiceFactory; import org.apache.hadoop.fs.FileSystem; +import org.apache.log4j.Logger; /** * This class maintains task level metrics info for all spawned child threads and parent task thread */
[3/6] carbondata git commit: [CARBONDATA-3024] Refactor to use log4j Logger directly
http://git-wip-us.apache.org/repos/asf/carbondata/blob/06adb5a0/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/StreamHandoffRDD.scala -- diff --git a/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/StreamHandoffRDD.scala b/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/StreamHandoffRDD.scala index 57b2e44..3f0eb71 100644 --- a/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/StreamHandoffRDD.scala +++ b/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/StreamHandoffRDD.scala @@ -28,7 +28,9 @@ import org.apache.spark.{Partition, SerializableWritable, SparkContext, TaskCont import org.apache.spark.sql.SparkSession import org.apache.spark.sql.catalyst.expressions.GenericInternalRow +import org.apache.carbondata.api.CarbonStore.LOGGER import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.common.logging.impl.Audit import org.apache.carbondata.converter.SparkDataTypeConverterImpl import org.apache.carbondata.core.datamap.Segment import org.apache.carbondata.core.datastore.block.SegmentProperties @@ -335,7 +337,7 @@ object StreamHandoffRDD { } catch { case ex: Exception => loadStatus = SegmentStatus.LOAD_FAILURE -LOGGER.error(ex, s"Handoff failed on streaming segment $handoffSegmenId") +LOGGER.error(s"Handoff failed on streaming segment $handoffSegmenId", ex) errorMessage = errorMessage + ": " + ex.getCause.getMessage LOGGER.error(errorMessage) } @@ -345,7 +347,7 @@ object StreamHandoffRDD { LOGGER.info("starting clean up**") CarbonLoaderUtil.deleteSegment(carbonLoadModel, carbonLoadModel.getSegmentId.toInt) LOGGER.info("clean up done**") - LOGGER.audit(s"Handoff is failed for " + + Audit.log(LOGGER, s"Handoff is failed for " + s"${ carbonLoadModel.getDatabaseName }.${ carbonLoadModel.getTableName }") LOGGER.error("Cannot write load metadata file as handoff failed") throw new Exception(errorMessage) @@ -367,7 +369,7 
@@ object StreamHandoffRDD { .fireEvent(loadTablePostStatusUpdateEvent, operationContext) if (!done) { val errorMessage = "Handoff failed due to failure in table status updation." -LOGGER.audit("Handoff is failed for " + +Audit.log(LOGGER, "Handoff is failed for " + s"${ carbonLoadModel.getDatabaseName }.${ carbonLoadModel.getTableName }") LOGGER.error("Handoff failed due to failure in table status updation.") throw new Exception(errorMessage) http://git-wip-us.apache.org/repos/asf/carbondata/blob/06adb5a0/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CarbonScalaUtil.scala -- diff --git a/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CarbonScalaUtil.scala b/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CarbonScalaUtil.scala index 2cc2a5b..1e8d148 100644 --- a/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CarbonScalaUtil.scala +++ b/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CarbonScalaUtil.scala @@ -27,6 +27,7 @@ import scala.collection.mutable import scala.util.Try import com.univocity.parsers.common.TextParsingException +import org.apache.log4j.Logger import org.apache.spark.SparkException import org.apache.spark.sql._ import org.apache.spark.sql.carbondata.execution.datasources.CarbonSparkDataSourceUtil @@ -363,7 +364,7 @@ object CarbonScalaUtil { /** * Retrieve error message from exception */ - def retrieveAndLogErrorMsg(ex: Throwable, logger: LogService): (String, String) = { + def retrieveAndLogErrorMsg(ex: Throwable, logger: Logger): (String, String) = { var errorMessage = "DataLoad failure" var executorMessage = "" if (ex != null) { http://git-wip-us.apache.org/repos/asf/carbondata/blob/06adb5a0/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala -- diff --git a/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala 
b/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala index 67c4c9b..704382f 100644 --- a/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala +++ b/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala @@ -696,17 +696,17 @@ object GlobalDictionaryUtil { } catch { case ex: Exception => if (ex.getCause != null &&
carbondata git commit: [CARBONDATA-2974] Fixed multiple expressions issue on datamap chooser and bloom datamap
Repository: carbondata Updated Branches: refs/heads/master 8427771fc -> 8284d9ed1 [CARBONDATA-2974] Fixed multiple expressions issue on datamap chooser and bloom datamap The DataMap framework provides a mechanism to composite expressions and forward them to the corresponding datamap; in this way, the datamap can handle the pruning in batch. But currently the expressions the framework forwards include ones that the datamap cannot support, so here we optimize the datamap chooser: we composite the supported expressions and wrap them into an AndExpression. These are exactly the expressions the datamap wants. The bloomfilter datamap is changed accordingly to handle the AndExpression. This closes #2767 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/8284d9ed Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/8284d9ed Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/8284d9ed Branch: refs/heads/master Commit: 8284d9ed1fe60d8881788656b7f78c055f76e453 Parents: 8427771 Author: ravipesala Authored: Wed Sep 26 16:56:03 2018 +0530 Committer: xuchuanyin Committed: Fri Sep 28 16:46:49 2018 +0800 -- .../carbondata/core/datamap/DataMapChooser.java | 76 ++-- .../datamap/bloom/BloomCoarseGrainDataMap.java | 8 ++- .../bloom/BloomCoarseGrainDataMapSuite.scala| 62 +++- 3 files changed, 106 insertions(+), 40 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/8284d9ed/core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java index 68696cf..3b6537c 100644 --- a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java +++ b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java @@ -39,6 +39,7 @@ import org.apache.carbondata.core.scan.expression.logical.AndExpression;
import org.apache.carbondata.core.scan.expression.logical.OrExpression; import org.apache.carbondata.core.scan.filter.intf.ExpressionType; import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf; +import org.apache.carbondata.core.scan.filter.resolver.resolverinfo.TrueConditionalResolverImpl; /** * This chooser does 2 jobs. @@ -123,9 +124,11 @@ public class DataMapChooser { if (resolverIntf != null) { Expression expression = resolverIntf.getFilterExpression(); List datamaps = level == DataMapLevel.CG ? cgDataMaps : fgDataMaps; - ExpressionTuple tuple = selectDataMap(expression, datamaps, resolverIntf); - if (tuple.dataMapExprWrapper != null) { -return tuple.dataMapExprWrapper; + if (datamaps.size() > 0) { +ExpressionTuple tuple = selectDataMap(expression, datamaps, resolverIntf); +if (tuple.dataMapExprWrapper != null) { + return tuple.dataMapExprWrapper; +} } } return null; @@ -177,34 +180,35 @@ public class DataMapChooser { // If both left and right has datamap then we can either merge both datamaps to single // datamap if possible. Otherwise apply AND expression. if (left.dataMapExprWrapper != null && right.dataMapExprWrapper != null) { -filterExpressionTypes.add( - left.dataMapExprWrapper.getFilterResolverIntf().getFilterExpression() -.getFilterExpressionType()); -filterExpressionTypes.add( - right.dataMapExprWrapper.getFilterResolverIntf().getFilterExpression() -.getFilterExpressionType()); +filterExpressionTypes.addAll(left.filterExpressionTypes); +filterExpressionTypes.addAll(right.filterExpressionTypes); List columnExpressions = new ArrayList<>(); columnExpressions.addAll(left.columnExpressions); columnExpressions.addAll(right.columnExpressions); // Check if we can merge them to single datamap. 
TableDataMap dataMap = chooseDataMap(allDataMap, columnExpressions, filterExpressionTypes); +TrueConditionalResolverImpl resolver = new TrueConditionalResolverImpl( +new AndExpression(left.expression, right.expression), false, +true); if (dataMap != null) { ExpressionTuple tuple = new ExpressionTuple(); tuple.columnExpressions = columnExpressions; - tuple.dataMapExprWrapper = new DataMapExprWrapperImpl(dataMap, filterResolverIntf); + tuple.dataMapExprWrapper = new DataMapExprWrapperImp
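The chooser change above can be pictured with a tiny model: collect only the child expressions the datamap supports and fold them into a single AND tree that is then handed to the datamap. This is a minimal, hypothetical sketch; the `Expr`, `Col`, and `And` classes below are illustrative stand-ins, not CarbonData's real Expression or DataMapChooser types.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical mini-model of the chooser idea: keep only the child
// expressions a datamap supports and fold them into one AND tree.
public class ChooserSketch {
    interface Expr {}
    static class Col implements Expr {
        final String name;
        Col(String name) { this.name = name; }
        public String toString() { return name; }
    }
    static class And implements Expr {
        final Expr left, right;
        And(Expr l, Expr r) { left = l; right = r; }
        public String toString() { return "(" + left + " AND " + right + ")"; }
    }

    // Wrap the supported children into a left-deep AND expression;
    // return null when the datamap supports none of them.
    static Expr compositeSupported(List<Expr> children, Predicate<Expr> supported) {
        Expr result = null;
        for (Expr child : children) {
            if (!supported.test(child)) continue;
            result = (result == null) ? child : new And(result, child);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Expr> children = Arrays.<Expr>asList(
            new Col("a"), new Col("unsupported"), new Col("b"));
        Expr combined = compositeSupported(children,
            c -> !c.toString().equals("unsupported"));
        System.out.println(combined); // (a AND b)
    }
}
```

The unsupported child is simply not forwarded, which mirrors the commit's goal of handing the datamap only expressions it can actually prune on.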
carbondata git commit: [CARBONDATA-2971] Add shard info of blocklet for debugging
Repository: carbondata Updated Branches: refs/heads/master 3cd8b947c -> 5c0da31a5 [CARBONDATA-2971] Add shard info of blocklet for debugging add toString method to print both shard name and blocklet id for debugging. This closes #2765 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/5c0da31a Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/5c0da31a Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/5c0da31a Branch: refs/heads/master Commit: 5c0da31a5a0afaf707455fa80ac431a082a57ec9 Parents: 3cd8b94 Author: Manhua Authored: Wed Sep 26 10:34:54 2018 +0800 Committer: xuchuanyin Committed: Thu Sep 27 11:37:56 2018 +0800 -- .../carbondata/core/indexstore/Blocklet.java| 21 .../blockletindex/BlockletDataMapFactory.java | 2 +- 2 files changed, 18 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/5c0da31a/core/src/main/java/org/apache/carbondata/core/indexstore/Blocklet.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/indexstore/Blocklet.java b/core/src/main/java/org/apache/carbondata/core/indexstore/Blocklet.java index c6e1681..3270d08 100644 --- a/core/src/main/java/org/apache/carbondata/core/indexstore/Blocklet.java +++ b/core/src/main/java/org/apache/carbondata/core/indexstore/Blocklet.java @@ -65,17 +65,20 @@ public class Blocklet implements Writable,Serializable { return filePath; } - @Override public void write(DataOutput out) throws IOException { + @Override + public void write(DataOutput out) throws IOException { out.writeUTF(filePath); out.writeUTF(blockletId); } - @Override public void readFields(DataInput in) throws IOException { + @Override + public void readFields(DataInput in) throws IOException { filePath = in.readUTF(); blockletId = in.readUTF(); } - @Override public boolean equals(Object o) { + @Override + public boolean equals(Object o) { if (this == o) return true; if (o == null || getClass() != 
o.getClass()) return false; @@ -92,7 +95,17 @@ public class Blocklet implements Writable,Serializable { blocklet.blockletId == null; } - @Override public int hashCode() { + @Override + public String toString() { +final StringBuffer sb = new StringBuffer("Blocklet{"); +sb.append("filePath='").append(filePath).append('\''); +sb.append(", blockletId='").append(blockletId).append('\''); +sb.append('}'); +return sb.toString(); + } + + @Override + public int hashCode() { int result = filePath != null ? filePath.hashCode() : 0; result = 31 * result; if (compareBlockletIdForObjectMatching) { http://git-wip-us.apache.org/repos/asf/carbondata/blob/5c0da31a/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java b/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java index e16c3cd..096a5e3 100644 --- a/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java +++ b/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataMapFactory.java @@ -252,7 +252,7 @@ public class BlockletDataMapFactory extends CoarseGrainDataMapFactory } } } -throw new IOException("Blocklet with blockid " + blocklet.getBlockletId() + " not found "); +throw new IOException("Blocklet not found: " + blocklet.toString()); }
carbondata git commit: [CARBONDATA-2965] support Benchmark command in CarbonCli [Forced Update!]
Repository: carbondata Updated Branches: refs/heads/master 23a9e7c5f -> e07df44a1 (forced update) [CARBONDATA-2965] support Benchmark command in CarbonCli A new command called "benchmark" is added to the CarbonCli tool to output the scan performance of the specified file and column. Example usage: ```bash shell>java -jar carbondata-cli.jar org.apache.carbondata.CarbonCli -cmd benchmark -p hdfs://carbon1:9000/carbon.store/tpchcarbon_base/lineitem/ -a -c l_comment ``` This will output the scan time of the l_comment column for the first file in the input folder and print (use the -f option to provide a data file instead of a folder): ``` ReadHeaderAndFooter takes 12,598 us ConvertFooter takes 4,712 us ReadAllMetaAndConvertFooter takes 8,039 us Scan column 'l_comment' Blocklet#0: ColumnChunkIO takes 222,609 us Blocklet#0: DecompressPage takes 111,985 us Blocklet#1: ColumnChunkIO takes 186,522 us Blocklet#1: DecompressPage takes 89,132 us Blocklet#2: ColumnChunkIO takes 209,129 us Blocklet#2: DecompressPage takes 84,051 us ``` This closes #2755 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/e07df44a Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/e07df44a Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/e07df44a Branch: refs/heads/master Commit: e07df44a1db52304c54ab4e379f28b0f026449fd Parents: 49f6715 Author: Jacky Li Authored: Sun Sep 23 00:01:04 2018 +0800 Committer: xuchuanyin Committed: Wed Sep 26 15:47:37 2018 +0800 -- .../core/util/DataFileFooterConverterV3.java| 6 +- pom.xml | 7 +- tools/cli/pom.xml | 5 + .../org/apache/carbondata/tool/CarbonCli.java | 90 .../org/apache/carbondata/tool/Command.java | 28 +++ .../org/apache/carbondata/tool/DataFile.java| 94 +++-- .../org/apache/carbondata/tool/DataSummary.java | 188 ++--- .../apache/carbondata/tool/FileCollector.java | 147 + .../apache/carbondata/tool/ScanBenchmark.java | 205 +++
.../apache/carbondata/tool/CarbonCliTest.java | 94 + 10 files changed, 622 insertions(+), 242 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/e07df44a/core/src/main/java/org/apache/carbondata/core/util/DataFileFooterConverterV3.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/util/DataFileFooterConverterV3.java b/core/src/main/java/org/apache/carbondata/core/util/DataFileFooterConverterV3.java index 41e22fd..438e3e3 100644 --- a/core/src/main/java/org/apache/carbondata/core/util/DataFileFooterConverterV3.java +++ b/core/src/main/java/org/apache/carbondata/core/util/DataFileFooterConverterV3.java @@ -59,12 +59,16 @@ public class DataFileFooterConverterV3 extends AbstractDataFileFooterConverter { */ @Override public DataFileFooter readDataFileFooter(TableBlockInfo tableBlockInfo) throws IOException { -DataFileFooter dataFileFooter = new DataFileFooter(); CarbonHeaderReader carbonHeaderReader = new CarbonHeaderReader(tableBlockInfo.getFilePath()); FileHeader fileHeader = carbonHeaderReader.readHeader(); CarbonFooterReaderV3 reader = new CarbonFooterReaderV3(tableBlockInfo.getFilePath(), tableBlockInfo.getBlockOffset()); FileFooter3 footer = reader.readFooterVersion3(); +return convertDataFileFooter(fileHeader, footer); + } + + public DataFileFooter convertDataFileFooter(FileHeader fileHeader, FileFooter3 footer) { +DataFileFooter dataFileFooter = new DataFileFooter(); dataFileFooter.setVersionId(ColumnarFormatVersion.valueOf((short) fileHeader.getVersion())); dataFileFooter.setNumberOfRows(footer.getNum_rows()); dataFileFooter.setSegmentInfo(getSegmentInfo(footer.getSegment_info())); http://git-wip-us.apache.org/repos/asf/carbondata/blob/e07df44a/pom.xml -- diff --git a/pom.xml b/pom.xml index eff438b..00a5287 100644 --- a/pom.xml +++ b/pom.xml @@ -106,6 +106,7 @@ store/sdk store/search assembly +tools/cli @@ -718,12 +719,6 @@ datamap/mv/core - - tools - -tools/cli - - 
http://git-wip-us.apache.org/repos/asf/carbondata/blob/e07df44a/tools/cli/pom.xml -- diff --git a/tools/cli/pom.xml b/tools/cli/pom.xml index 0d00438..60e69dc 100644 --- a/tools/cli/pom.xml +++ b/tools/cli/pom.xml @@ -25,6 +25,11 @@ ${project.version} + javax.servlet + servlet-api + 2.5 + + junit junit test
carbondata git commit: [CARBONDATA-2965] support Benchmark command in CarbonCli [Forced Update!]
Repository: carbondata Updated Branches: refs/heads/master 83f28800b -> 23a9e7c5f (forced update) [CARBONDATA-2965] support Benchmark command in CarbonCli A new command called "benchmark" is added to the CarbonCli tool to output the scan performance of the specified file and column. Example usage: ```bash shell>java -jar carbondata-cli.jar org.apache.carbondata.CarbonCli -cmd benchmark -p hdfs://carbon1:9000/carbon.store/tpchcarbon_base/lineitem/ -a -c l_comment ``` This will output the scan time of the l_comment column for the first file in the input folder and print (use the -f option to provide a data file instead of a folder): ``` ReadHeaderAndFooter takes 12,598 us ConvertFooter takes 4,712 us ReadAllMetaAndConvertFooter takes 8,039 us Scan column 'l_comment' Blocklet#0: ColumnChunkIO takes 222,609 us Blocklet#0: DecompressPage takes 111,985 us Blocklet#1: ColumnChunkIO takes 186,522 us Blocklet#1: DecompressPage takes 89,132 us Blocklet#2: ColumnChunkIO takes 209,129 us Blocklet#2: DecompressPage takes 84,051 us ``` This closes #2755 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/23a9e7c5 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/23a9e7c5 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/23a9e7c5 Branch: refs/heads/master Commit: 23a9e7c5fde0b801b6c475949f26e2bbc05e9255 Parents: 49f6715 Author: Jacky Li Authored: Sun Sep 23 00:01:04 2018 +0800 Committer: xuchuanyin Committed: Wed Sep 26 15:45:52 2018 +0800 -- .../core/util/DataFileFooterConverterV3.java| 6 +- pom.xml | 7 +- tools/cli/pom.xml | 5 + .../org/apache/carbondata/tool/CarbonCli.java | 90 .../org/apache/carbondata/tool/Command.java | 28 +++ .../org/apache/carbondata/tool/DataFile.java| 94 +++-- .../org/apache/carbondata/tool/DataSummary.java | 188 ++--- .../apache/carbondata/tool/FileCollector.java | 147 + .../apache/carbondata/tool/ScanBenchmark.java | 205 +++
.../apache/carbondata/tool/CarbonCliTest.java | 94 + 10 files changed, 622 insertions(+), 242 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/23a9e7c5/core/src/main/java/org/apache/carbondata/core/util/DataFileFooterConverterV3.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/util/DataFileFooterConverterV3.java b/core/src/main/java/org/apache/carbondata/core/util/DataFileFooterConverterV3.java index 41e22fd..438e3e3 100644 --- a/core/src/main/java/org/apache/carbondata/core/util/DataFileFooterConverterV3.java +++ b/core/src/main/java/org/apache/carbondata/core/util/DataFileFooterConverterV3.java @@ -59,12 +59,16 @@ public class DataFileFooterConverterV3 extends AbstractDataFileFooterConverter { */ @Override public DataFileFooter readDataFileFooter(TableBlockInfo tableBlockInfo) throws IOException { -DataFileFooter dataFileFooter = new DataFileFooter(); CarbonHeaderReader carbonHeaderReader = new CarbonHeaderReader(tableBlockInfo.getFilePath()); FileHeader fileHeader = carbonHeaderReader.readHeader(); CarbonFooterReaderV3 reader = new CarbonFooterReaderV3(tableBlockInfo.getFilePath(), tableBlockInfo.getBlockOffset()); FileFooter3 footer = reader.readFooterVersion3(); +return convertDataFileFooter(fileHeader, footer); + } + + public DataFileFooter convertDataFileFooter(FileHeader fileHeader, FileFooter3 footer) { +DataFileFooter dataFileFooter = new DataFileFooter(); dataFileFooter.setVersionId(ColumnarFormatVersion.valueOf((short) fileHeader.getVersion())); dataFileFooter.setNumberOfRows(footer.getNum_rows()); dataFileFooter.setSegmentInfo(getSegmentInfo(footer.getSegment_info())); http://git-wip-us.apache.org/repos/asf/carbondata/blob/23a9e7c5/pom.xml -- diff --git a/pom.xml b/pom.xml index eff438b..00a5287 100644 --- a/pom.xml +++ b/pom.xml @@ -106,6 +106,7 @@ store/sdk store/search assembly +tools/cli @@ -718,12 +719,6 @@ datamap/mv/core - - tools - -tools/cli - - 
http://git-wip-us.apache.org/repos/asf/carbondata/blob/23a9e7c5/tools/cli/pom.xml -- diff --git a/tools/cli/pom.xml b/tools/cli/pom.xml index 0d00438..60e69dc 100644 --- a/tools/cli/pom.xml +++ b/tools/cli/pom.xml @@ -25,6 +25,11 @@ ${project.version} + javax.servlet + servlet-api + 2.5 + + junit junit test
carbondata git commit: support Benchmark command in CarbonCli
Repository: carbondata Updated Branches: refs/heads/master 49f67153a -> 83f28800b support Benchmark command in CarbonCli A new command called "benchmark" is added to the CarbonCli tool to output the scan performance of the specified file and column. Example usage: ```bash shell>java -jar carbondata-cli.jar org.apache.carbondata.CarbonCli -cmd benchmark -p hdfs://carbon1:9000/carbon.store/tpchcarbon_base/lineitem/ -a -c l_comment ``` This will output the scan time of the l_comment column for the first file in the input folder and print (use the -f option to provide a data file instead of a folder): ``` ReadHeaderAndFooter takes 12,598 us ConvertFooter takes 4,712 us ReadAllMetaAndConvertFooter takes 8,039 us Scan column 'l_comment' Blocklet#0: ColumnChunkIO takes 222,609 us Blocklet#0: DecompressPage takes 111,985 us Blocklet#1: ColumnChunkIO takes 186,522 us Blocklet#1: DecompressPage takes 89,132 us Blocklet#2: ColumnChunkIO takes 209,129 us Blocklet#2: DecompressPage takes 84,051 us ``` Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/83f28800 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/83f28800 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/83f28800 Branch: refs/heads/master Commit: 83f28800bfc152fd1d9bfebef758ee4f4a8659db Parents: 49f6715 Author: Jacky Li Authored: Sun Sep 23 00:01:04 2018 +0800 Committer: xuchuanyin Committed: Wed Sep 26 15:44:00 2018 +0800 -- .../core/util/DataFileFooterConverterV3.java| 6 +- pom.xml | 7 +- tools/cli/pom.xml | 5 + .../org/apache/carbondata/tool/CarbonCli.java | 90 .../org/apache/carbondata/tool/Command.java | 28 +++ .../org/apache/carbondata/tool/DataFile.java| 94 +++-- .../org/apache/carbondata/tool/DataSummary.java | 188 ++--- .../apache/carbondata/tool/FileCollector.java | 147 + .../apache/carbondata/tool/ScanBenchmark.java | 205 +++ .../apache/carbondata/tool/CarbonCliTest.java | 94 + 10 files changed, 622 insertions(+),
242 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/83f28800/core/src/main/java/org/apache/carbondata/core/util/DataFileFooterConverterV3.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/util/DataFileFooterConverterV3.java b/core/src/main/java/org/apache/carbondata/core/util/DataFileFooterConverterV3.java index 41e22fd..438e3e3 100644 --- a/core/src/main/java/org/apache/carbondata/core/util/DataFileFooterConverterV3.java +++ b/core/src/main/java/org/apache/carbondata/core/util/DataFileFooterConverterV3.java @@ -59,12 +59,16 @@ public class DataFileFooterConverterV3 extends AbstractDataFileFooterConverter { */ @Override public DataFileFooter readDataFileFooter(TableBlockInfo tableBlockInfo) throws IOException { -DataFileFooter dataFileFooter = new DataFileFooter(); CarbonHeaderReader carbonHeaderReader = new CarbonHeaderReader(tableBlockInfo.getFilePath()); FileHeader fileHeader = carbonHeaderReader.readHeader(); CarbonFooterReaderV3 reader = new CarbonFooterReaderV3(tableBlockInfo.getFilePath(), tableBlockInfo.getBlockOffset()); FileFooter3 footer = reader.readFooterVersion3(); +return convertDataFileFooter(fileHeader, footer); + } + + public DataFileFooter convertDataFileFooter(FileHeader fileHeader, FileFooter3 footer) { +DataFileFooter dataFileFooter = new DataFileFooter(); dataFileFooter.setVersionId(ColumnarFormatVersion.valueOf((short) fileHeader.getVersion())); dataFileFooter.setNumberOfRows(footer.getNum_rows()); dataFileFooter.setSegmentInfo(getSegmentInfo(footer.getSegment_info())); http://git-wip-us.apache.org/repos/asf/carbondata/blob/83f28800/pom.xml -- diff --git a/pom.xml b/pom.xml index eff438b..00a5287 100644 --- a/pom.xml +++ b/pom.xml @@ -106,6 +106,7 @@ store/sdk store/search assembly +tools/cli @@ -718,12 +719,6 @@ datamap/mv/core - - tools - -tools/cli - - http://git-wip-us.apache.org/repos/asf/carbondata/blob/83f28800/tools/cli/pom.xml -- diff --git a/tools/cli/pom.xml b/tools/cli/pom.xml index 
0d00438..60e69dc 100644 --- a/tools/cli/pom.xml +++ b/tools/cli/pom.xml @@ -25,6 +25,11 @@ ${project.version} + javax.servlet + servlet-api + 2.5 + + junit junit test http://git-wip-us.apache.org/repos/asf/carbonda
carbondata git commit: [CARBONDATA-2783][BloomDataMap][Doc] Update document for bloom filter datamap
Repository: carbondata Updated Branches: refs/heads/master 316e9de65 -> 232f6ced3 [CARBONDATA-2783][BloomDataMap][Doc] Update document for bloom filter datamap add example for enable/disable datamap This closes #2554 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/232f6ced Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/232f6ced Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/232f6ced Branch: refs/heads/master Commit: 232f6ced3944b437539ae9f1ad05c62441d44f46 Parents: 316e9de Author: Manhua Authored: Wed Jul 25 16:51:49 2018 +0800 Committer: xuchuanyin Committed: Thu Jul 26 09:15:17 2018 +0800 -- docs/datamap/bloomfilter-datamap-guide.md | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/232f6ced/docs/datamap/bloomfilter-datamap-guide.md -- diff --git a/docs/datamap/bloomfilter-datamap-guide.md b/docs/datamap/bloomfilter-datamap-guide.md index 0fb3ba2..325a508 100644 --- a/docs/datamap/bloomfilter-datamap-guide.md +++ b/docs/datamap/bloomfilter-datamap-guide.md @@ -26,7 +26,16 @@ Showing all DataMaps on this table SHOW DATAMAP ON TABLE main_table ``` -It will show all DataMaps created on main table. + +Disable Datamap +> The datamap by default is enabled. To support tuning on query, we can disable a specific datamap during query to observe whether we can gain performance enhancement from it. This will only take effect current session. + + ``` + // disable the datamap + SET carbon.datamap.visible.dbName.tableName.dataMapName = false + // enable the datamap + SET carbon.datamap.visible.dbName.tableName.dataMapName = true + ``` ## BloomFilter DataMap Introduction
carbondata git commit: [CARBONDATA-2774][BloomDataMap] Exception should be thrown if expression do not satisfy bloomFilter's requirement
Repository: carbondata Updated Branches: refs/heads/master 895239490 -> b853963b2 [CARBONDATA-2774][BloomDataMap] Exception should be thrown if expression do not satisfy bloomFilter's requirement If we query on a string column with a bloom index using a number as the filter value, we get a wrong result. This is because the DDL wraps the filter with a datatype conversion, and DataMapChooser chooses the datamap by searching the column deep down through all children of the filter expression. However, the bloom filter requires the expression to be simply column =/in filterValue, so bloom fails to build any query model and hits no blocklet, such that we get an empty result. To avoid returning a wrong answer silently, we now throw an exception instead of only logging an error. Users who want the correct result can fix the SQL to use the corresponding data type or disable the bloom datamap. This closes #2545 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/b853963b Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/b853963b Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/b853963b Branch: refs/heads/master Commit: b853963b262689491c3eafb81654c72b35966165 Parents: 8952394 Author: Manhua Authored: Tue Jul 24 14:51:18 2018 +0800 Committer: xuchuanyin Committed: Tue Jul 24 16:45:26 2018 +0800 -- .../carbondata/datamap/bloom/BloomCoarseGrainDataMap.java| 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/b853963b/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java -- diff --git a/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java b/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java index 26db300..be531d6 100644 --- a/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java +++
b/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMap.java @@ -203,7 +203,9 @@ public class BloomCoarseGrainDataMap extends CoarseGrainDataMap { } return queryModels; } else { -LOGGER.warn("BloomFilter can only support the 'equal' filter like 'Col = PlainValue'"); +String errorMsg = "BloomFilter can only support the 'equal' filter like 'Col = PlainValue'"; +LOGGER.warn(errorMsg); +throw new RuntimeException(errorMsg); } } else if (expression instanceof InExpression) { Expression left = ((InExpression) expression).getLeft(); @@ -226,7 +228,9 @@ public class BloomCoarseGrainDataMap extends CoarseGrainDataMap { } return queryModels; } else { -LOGGER.warn("BloomFilter can only support the 'in' filter like 'Col in (PlainValues)'"); +String errorMsg = "BloomFilter can only support the 'in' filter like 'Col in PlainValue'"; +LOGGER.warn(errorMsg); +throw new RuntimeException(errorMsg); } }
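The rule this commit enforces — a bloom datamap only accepts filters of the shape `column = plainValue` or `column IN (plainValues)`, and anything else now throws instead of silently returning an empty result — can be sketched with a hypothetical mini-model. The `Expr` and `Kind` types below are illustrative, not CarbonData's actual Expression classes.

```java
// Hypothetical mini-model of the bloom filter's supported-expression check;
// only the rule (equal/in on a plain column and plain literal) comes from
// the commit above.
public class BloomFilterCheck {
    enum Kind { EQUALS, IN, OTHER }

    static class Expr {
        final Kind kind;
        final String column;        // non-null when one side is a plain column
        final boolean plainLiteral; // true when the other side is a plain (unconverted) literal
        Expr(Kind k, String c, boolean p) { kind = k; column = c; plainLiteral = p; }
    }

    // Mirrors the commit's behavior change: unsupported shapes throw rather
    // than silently producing no query model (and hence an empty result).
    static String supportedColumn(Expr e) {
        boolean ok = (e.kind == Kind.EQUALS || e.kind == Kind.IN)
            && e.column != null && e.plainLiteral;
        if (!ok) {
            throw new RuntimeException(
                "BloomFilter can only support the 'equal'/'in' filter like 'Col = PlainValue'");
        }
        return e.column;
    }

    public static void main(String[] args) {
        System.out.println(supportedColumn(new Expr(Kind.EQUALS, "city", true)));
        try {
            // e.g. a filter wrapped in a datatype conversion is not a plain literal
            supportedColumn(new Expr(Kind.EQUALS, "city", false));
        } catch (RuntimeException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing loudly here is the design point: a silent empty result looks like valid output, while an exception tells the user to fix the literal's data type or disable the bloom datamap.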
carbondata git commit: [CARBONDATA-2618][32K] Split to multiple pages if varchar column page exceeds 2GB/snappy limits
Repository: carbondata Updated Branches: refs/heads/master 42a80564c -> 6f1767b5a [CARBONDATA-2618][32K] Split to multiple pages if varchar column page exceeds 2GB/snappy limits Dynamic column page size is decided by the long string columns. A varchar column page uses SafeVarLengthColumnPage/UnsafeVarLengthColumnPage to store data and is encoded using HighCardDictDimensionIndexCodec, which calls getByteArrayPage() on the column page and flattens it into a byte[] for compression. Limited by the array index, we can only put Integer.MAX_VALUE bytes in a page. Another limitation comes from the Compressor. Currently we use snappy as the default compressor, and it calls the MaxCompressedLength method to estimate the result size when preparing the output. For safety, the estimate is oversized: 32 + source_len + source_len/6. So the maximum number of bytes snappy can compress is (2GB-32)*6/7 ≈ 1.71GB. The size of a row does not exceed 2MB since UnsafeSortDataRows uses a 2MB byte[] as its row buffer, so we can stop adding more rows here once any long string column reaches this limit.
This closes #2464 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/6f1767b5 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/6f1767b5 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/6f1767b5 Branch: refs/heads/master Commit: 6f1767b5a3bf7d13053210623efdeb7512bcdedb Parents: 42a8056 Author: Manhua Authored: Mon Jul 9 16:42:08 2018 +0800 Committer: xuchuanyin Committed: Tue Jul 24 14:40:24 2018 +0800 -- .../datastore/compression/SnappyCompressor.java | 3 ++ .../store/CarbonFactDataHandlerColumnar.java| 53 ++-- .../store/CarbonFactDataHandlerModel.java | 44 3 files changed, 97 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/6f1767b5/core/src/main/java/org/apache/carbondata/core/datastore/compression/SnappyCompressor.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/datastore/compression/SnappyCompressor.java b/core/src/main/java/org/apache/carbondata/core/datastore/compression/SnappyCompressor.java index 65244d2..bd740b2 100644 --- a/core/src/main/java/org/apache/carbondata/core/datastore/compression/SnappyCompressor.java +++ b/core/src/main/java/org/apache/carbondata/core/datastore/compression/SnappyCompressor.java @@ -31,6 +31,9 @@ public class SnappyCompressor implements Compressor { private static final LogService LOGGER = LogServiceFactory.getLogService(SnappyCompressor.class.getName()); + // snappy estimate max compressed length as 32 + source_len + source_len/6 + public static final int MAX_BYTE_TO_COMPRESS = (int)((Integer.MAX_VALUE - 32) / 7.0 * 6); + private final SnappyNative snappyNative; public SnappyCompressor() { http://git-wip-us.apache.org/repos/asf/carbondata/blob/6f1767b5/processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerColumnar.java -- diff --git a/processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerColumnar.java 
b/processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerColumnar.java index 97737d0..94511cf 100644 --- a/processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerColumnar.java +++ b/processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerColumnar.java @@ -34,8 +34,10 @@ import org.apache.carbondata.common.logging.LogService; import org.apache.carbondata.common.logging.LogServiceFactory; import org.apache.carbondata.core.constants.CarbonCommonConstants; import org.apache.carbondata.core.constants.CarbonV3DataFormatConstants; +import org.apache.carbondata.core.datastore.compression.SnappyCompressor; import org.apache.carbondata.core.datastore.exception.CarbonDataWriterException; import org.apache.carbondata.core.datastore.row.CarbonRow; +import org.apache.carbondata.core.datastore.row.WriteStepRowUtil; import org.apache.carbondata.core.keygenerator.KeyGenException; import org.apache.carbondata.core.keygenerator.columnar.impl.MultiDimKeyVarLengthEquiSplitGenerator; import org.apache.carbondata.core.memory.MemoryException; @@ -84,6 +86,7 @@ public class CarbonFactDataHandlerColumnar implements CarbonFactHandler { private ExecutorService consumerExecutorService; private List> consumerExecutorServiceTaskList; private List dataRows; + private int[] varcharColumnSizeInByte; /** * semaphore which will used for managing node holder objects */ @@ -191,7 +194,7 @@ public class CarbonFactDataHandlerColumnar impl
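The arithmetic behind the snappy limit in the commit above can be checked directly. Only the worst-case estimate formula `32 + source_len + source_len/6` and the bound `(Integer.MAX_VALUE - 32) / 7.0 * 6` come from the commit; the class and method names below are illustrative.

```java
// Sketch verifying the snappy size limit described in CARBONDATA-2618.
public class SnappyLimit {
    // snappy's worst-case estimate for the compressed size of sourceLen bytes
    static long maxCompressedLength(long sourceLen) {
        return 32 + sourceLen + sourceLen / 6;
    }

    // largest source length whose worst-case estimate still fits in a Java
    // int (~2GB), mirroring the commit's MAX_BYTE_TO_COMPRESS constant
    static int maxBytesToCompress() {
        return (int) ((Integer.MAX_VALUE - 32) / 7.0 * 6);
    }

    public static void main(String[] args) {
        int max = maxBytesToCompress();
        System.out.println(max + " bytes ~= "
            + max / (1024.0 * 1024 * 1024) + " GiB");
        // the estimate for this source length must still fit in an int
        System.out.println(maxCompressedLength(max) <= Integer.MAX_VALUE);
    }
}
```

Running this shows the limit is about 1.71 GiB, matching the commit message's (2GB-32)*6/7 figure; a page approaching this size must therefore be split.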
carbondata git commit: [CARBONDATA-2682][32K] Fix create table with long_string_columns property bugs
Repository: carbondata Updated Branches: refs/heads/master 43285bbd1 -> 45960f4a8 [CARBONDATA-2682][32K] fix create table with long_string_columns properties bugs This PR fixes create table with long_string_columns bugs which are: 1.create table with columns both in long_string_columns and partition or no_inverted_index property should be blocked. 2.create table with duplicate columns in long_string_column property should be blocked. This closes #2506 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/45960f4a Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/45960f4a Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/45960f4a Branch: refs/heads/master Commit: 45960f4a8d45c99d917ea9aec3579dc3b2345af4 Parents: 43285bb Author: Sssan520 Authored: Thu Jul 12 19:58:23 2018 +0800 Committer: xuchuanyin Committed: Mon Jul 23 14:53:27 2018 +0800 -- .../VarcharDataTypesBasicTestCase.scala | 66 +-- .../spark/sql/catalyst/CarbonDDLSqlParser.scala | 88 +--- 2 files changed, 137 insertions(+), 17 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/45960f4a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/longstring/VarcharDataTypesBasicTestCase.scala -- diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/longstring/VarcharDataTypesBasicTestCase.scala b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/longstring/VarcharDataTypesBasicTestCase.scala index cb7cd81..4b8187f 100644 --- a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/longstring/VarcharDataTypesBasicTestCase.scala +++ b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/longstring/VarcharDataTypesBasicTestCase.scala @@ -25,6 +25,7 @@ import org.apache.spark.sql.test.util.QueryTest import 
org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType} import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach} +import org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException import org.apache.carbondata.core.constants.CarbonCommonConstants import org.apache.carbondata.core.metadata.CarbonMetadata import org.apache.carbondata.core.metadata.datatype.DataTypes @@ -79,7 +80,7 @@ class VarcharDataTypesBasicTestCase extends QueryTest with BeforeAndAfterEach wi } test("long string columns cannot be dictionary include") { -val exceptionCaught = intercept[Exception] { +val exceptionCaught = intercept[MalformedCarbonCommandException] { sql( s""" | CREATE TABLE if not exists $longStringTable( @@ -93,7 +94,7 @@ class VarcharDataTypesBasicTestCase extends QueryTest with BeforeAndAfterEach wi } test("long string columns cannot be dictionay exclude") { -val exceptionCaught = intercept[Exception] { +val exceptionCaught = intercept[MalformedCarbonCommandException] { sql( s""" | CREATE TABLE if not exists $longStringTable( @@ -107,7 +108,7 @@ class VarcharDataTypesBasicTestCase extends QueryTest with BeforeAndAfterEach wi } test("long string columns cannot be sort_columns") { -val exceptionCaught = intercept[Exception] { +val exceptionCaught = intercept[MalformedCarbonCommandException] { sql( s""" | CREATE TABLE if not exists $longStringTable( @@ -121,20 +122,73 @@ class VarcharDataTypesBasicTestCase extends QueryTest with BeforeAndAfterEach wi } test("long string columns can only be string columns") { -val exceptionCaught = intercept[Exception] { +val exceptionCaught = intercept[MalformedCarbonCommandException] { sql( s""" | CREATE TABLE if not exists $longStringTable( | id INT, name STRING, description STRING, address STRING, note STRING | ) STORED BY 'carbondata' | TBLPROPERTIES('LONG_STRING_COLUMNS'='id, note') - |""". 
- stripMargin) + |""".stripMargin) } assert(exceptionCaught.getMessage.contains("long_string_columns: id")) assert(exceptionCaught.getMessage.contains("its data type is not string")) } + test("long string columns cannot contain duplicate columns") { +val exceptionCaught = intercept[MalformedCarbonCommandException] { + sql( +s""" + | CREATE TABLE if not exists $longStringTable( + | i
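One of the two fixes above blocks duplicate columns in the `LONG_STRING_COLUMNS` table property. A sketch of such a validation, assuming (as Carbon DDL does elsewhere) that column matching is case-insensitive; the class and method names are illustrative:

```java
import java.util.HashSet;
import java.util.Set;

public class LongStringColumnCheck {
    // Return the first duplicated column name in a comma-separated
    // LONG_STRING_COLUMNS value, or null if there are no duplicates.
    static String findDuplicate(String longStringColumns) {
        Set<String> seen = new HashSet<>();
        for (String col : longStringColumns.split(",")) {
            String normalized = col.trim().toLowerCase();
            if (!seen.add(normalized)) {
                return normalized;  // first duplicate found
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(findDuplicate("note, Note"));        // duplicate
        System.out.println(findDuplicate("description, note")); // ok
    }
}
```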
carbondata git commit: [CARBONDATA-2757][BloomDataMap] Fix bug when building bloomfilter on measure column
Repository: carbondata Updated Branches: refs/heads/master 9f42fbf33 -> 7551cc696 [CARBONDATA-2757][BloomDataMap] Fix bug when building bloomfilter on measure column 1. support to get raw data from decimal column page when building datamap in loading process 2. convert decimal column to java datatype when rebuilding bloom datamap from query result 3. convert boolean to byte as carbon wants 4. fix bugs when measure column is null This closes #2526 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/7551cc69 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/7551cc69 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/7551cc69 Branch: refs/heads/master Commit: 7551cc69657b725d2c5566e9718f9bebb42d14f5 Parents: 9f42fbf Author: Manhua Authored: Thu Jul 19 16:26:18 2018 +0800 Committer: xuchuanyin Committed: Sun Jul 22 14:01:47 2018 +0800 -- .../core/datastore/page/ColumnPage.java | 3 + .../core/datastore/page/DecimalColumnPage.java | 48 ++ .../datastore/page/SafeDecimalColumnPage.java | 21 --- .../datastore/page/UnsafeDecimalColumnPage.java | 23 --- .../bloom/AbstractBloomDataMapWriter.java | 10 +- .../datamap/bloom/BloomCoarseGrainDataMap.java | 12 +- .../datamap/bloom/DataConvertUtil.java | 22 ++- .../datamap/IndexDataMapRebuildRDD.scala| 14 +- .../bloom/BloomCoarseGrainDataMapSuite.scala| 169 +++ 9 files changed, 270 insertions(+), 52 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/7551cc69/core/src/main/java/org/apache/carbondata/core/datastore/page/ColumnPage.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/datastore/page/ColumnPage.java b/core/src/main/java/org/apache/carbondata/core/datastore/page/ColumnPage.java index ea250cf..75e47de 100644 --- a/core/src/main/java/org/apache/carbondata/core/datastore/page/ColumnPage.java +++ b/core/src/main/java/org/apache/carbondata/core/datastore/page/ColumnPage.java @@ -525,6 +525,9 @@ 
public abstract class ColumnPage { result = getBoolean(rowId); } else if (dataType == DataTypes.BYTE) { result = getByte(rowId); + if (columnSpec.getSchemaDataType() == DataTypes.BOOLEAN) { +result = BooleanConvert.byte2Boolean((byte)result); + } } else if (dataType == DataTypes.SHORT) { result = getShort(rowId); } else if (dataType == DataTypes.INT) { http://git-wip-us.apache.org/repos/asf/carbondata/blob/7551cc69/core/src/main/java/org/apache/carbondata/core/datastore/page/DecimalColumnPage.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/datastore/page/DecimalColumnPage.java b/core/src/main/java/org/apache/carbondata/core/datastore/page/DecimalColumnPage.java index 2624223..368a289 100644 --- a/core/src/main/java/org/apache/carbondata/core/datastore/page/DecimalColumnPage.java +++ b/core/src/main/java/org/apache/carbondata/core/datastore/page/DecimalColumnPage.java @@ -17,8 +17,11 @@ package org.apache.carbondata.core.datastore.page; +import java.math.BigDecimal; + import org.apache.carbondata.core.datastore.TableSpec; import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.metadata.datatype.DataTypes; import org.apache.carbondata.core.metadata.datatype.DecimalConverterFactory; /** @@ -106,4 +109,49 @@ public abstract class DecimalColumnPage extends VarLengthColumnPageBase { throw new UnsupportedOperationException("invalid data type: " + dataType); } + // used for building datamap in loading process + private BigDecimal getDecimalFromRawData(int rowId) { +long value; +switch (decimalConverter.getDecimalConverterType()) { + case DECIMAL_INT: +value = getInt(rowId); +break; + case DECIMAL_LONG: +value = getLong(rowId); +break; + default: +value = getByte(rowId); +} +return decimalConverter.getDecimal(value); + } + + private BigDecimal getDecimalFromDecompressData(int rowId) { +long value; +if (dataType == DataTypes.BYTE) { + value = getByte(rowId); +} else if (dataType == DataTypes.SHORT) { + value = 
getShort(rowId); +} else if (dataType == DataTypes.SHORT_INT) { + value = getShortInt(rowId); +} else if (dataType == DataTypes.INT) { + value = getInt(rowId); +} else if (dataType == DataTypes.LONG) { + value = getLong(rowId); +} else { + return decimalConverter.getDecimal(getBytes(rowId)); +} +return decimalConverter.getDecimal(value); + } + + @Override + public BigDecimal getDeci
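The `DecimalColumnPage` additions rebuild a `BigDecimal` from the compact integral value that the `DECIMAL_INT`/`DECIMAL_LONG` converters store. A minimal sketch of that reconstruction, assuming the converter semantics match `java.math` (unscaled value plus column scale); names are illustrative:

```java
import java.math.BigDecimal;

public class DecimalFromRaw {
    // Rebuild a decimal from its unscaled integral representation
    // and the column's declared scale.
    static BigDecimal decimalFromUnscaled(long unscaled, int scale) {
        return BigDecimal.valueOf(unscaled, scale);
    }

    public static void main(String[] args) {
        // unscaled 12345 at scale 2 represents 123.45
        System.out.println(decimalFromUnscaled(12345L, 2));
    }
}
```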
carbondata git commit: [CARBONDATA-2747][Lucene] Fix Lucene datamap choosing and DataMapDistributable building
Repository: carbondata Updated Branches: refs/heads/master 4a37e05ca -> 46f0c8517 [CARBONDATA-2747][Lucene] Fix Lucene datamap choosing and DataMapDistributable building 1. choose lucene datamap for query column 2. build DataMapDistributable only for target datamap This closes #2519 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/46f0c851 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/46f0c851 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/46f0c851 Branch: refs/heads/master Commit: 46f0c8517d4e79a402ff6dc8a077f3d3955f39b5 Parents: 4a37e05 Author: Manhua Authored: Wed Jul 18 10:14:40 2018 +0800 Committer: xuchuanyin Committed: Fri Jul 20 15:04:25 2018 +0800 -- .../carbondata/core/datamap/DataMapChooser.java | 14 +++ .../bloom/BloomCoarseGrainDataMapFactory.java | 1 - .../lucene/LuceneDataMapFactoryBase.java| 25 +++- .../lucene/LuceneFineGrainDataMapSuite.scala| 9 +-- 4 files changed, 30 insertions(+), 19 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/46f0c851/core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java index cf5dffd..68696cf 100644 --- a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java +++ b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapChooser.java @@ -34,6 +34,7 @@ import org.apache.carbondata.core.datamap.status.DataMapStatusManager; import org.apache.carbondata.core.metadata.schema.table.CarbonTable; import org.apache.carbondata.core.scan.expression.ColumnExpression; import org.apache.carbondata.core.scan.expression.Expression; +import org.apache.carbondata.core.scan.expression.MatchExpression; import org.apache.carbondata.core.scan.expression.logical.AndExpression; import 
org.apache.carbondata.core.scan.expression.logical.OrExpression; import org.apache.carbondata.core.scan.filter.intf.ExpressionType; @@ -269,6 +270,14 @@ public class DataMapChooser { List columnExpressions) { if (expression instanceof ColumnExpression) { columnExpressions.add((ColumnExpression) expression); +} else if (expression instanceof MatchExpression) { + // this is a special case for lucene + // build a fake ColumnExpression to filter datamaps which contain target column + // a Lucene query string is alike "column:query term" + String[] queryItems = expression.getString().split(":", 2); + if (queryItems.length == 2) { +columnExpressions.add(new ColumnExpression(queryItems[0], null)); + } } else if (expression != null) { List children = expression.getChildren(); if (children != null && children.size() > 0) { @@ -303,11 +312,6 @@ public class DataMapChooser { */ private boolean contains(DataMapMeta mapMeta, List columnExpressions, Set expressionTypes) { -if (mapMeta.getOptimizedOperation().contains(ExpressionType.TEXT_MATCH) && -expressionTypes.contains(ExpressionType.TEXT_MATCH)) { - // TODO: fix it with right logic - return true; -} if (mapMeta.getIndexedColumns().size() == 0 || columnExpressions.size() == 0) { return false; } http://git-wip-us.apache.org/repos/asf/carbondata/blob/46f0c851/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapFactory.java -- diff --git a/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapFactory.java b/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapFactory.java index 4b5bc7c..652e1fc 100644 --- a/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapFactory.java +++ b/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapFactory.java @@ -278,7 +278,6 @@ public class BloomCoarseGrainDataMapFactory extends DataMapFactory 0) { for (TableDataMap dataMap : 
dataMaps) { -// different from lucene, bloom only get corresponding directory of current datamap if (dataMap.getDataMapSchema().getDataMapName().equals(this.dataMapName)) { List indexFiles; String dmPath = CarbonTablePath.getDataMapStorePath(tablePath, segmentId, http://git-wip-us.apache.org/repos/asf/carbondata/blob/46f0c851/datamap/lucene/src/main/java/org/apache/carbondata/da
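The `DataMapChooser` fix extracts the target column from a Lucene query string of the form `"column:query term"` by splitting on the first `:` only. A small sketch of that parsing (class name illustrative):

```java
public class LuceneQueryColumn {
    // Extract the column name from a Lucene-style query string "column:query term".
    // Returns null when the string has no ':' separator.
    static String targetColumn(String luceneQuery) {
        String[] items = luceneQuery.split(":", 2);  // limit 2: only the first ':' splits
        return items.length == 2 ? items[0] : null;
    }

    public static void main(String[] args) {
        System.out.println(targetColumn("name:n10"));
        System.out.println(targetColumn("city:a:b")); // rest of the query stays intact
    }
}
```

The limit argument matters: without it, a query term that itself contains `:` would be split apart.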
carbondata git commit: [CARBONDATA-2746][BloomDataMap] Fix bug when getting datamap file when table has multiple datamaps
Repository: carbondata Updated Branches: refs/heads/master a16289786 -> 4612e0031 [CARBONDATA-2746][BloomDataMap] Fix bug for getting datamap file when table has multiple datamaps Currently, if table has multiple bloom datamap and carbon is set to use distributed datamap, query will throw an exception when accessing the index file, because carbon gets all the datamaps but sets them with same datamap schema. The error is appeared when getting the full path of bloom index by concating index directory and index column. This PR fix this problem by filter the index directories of target datamap when using distributed datamap. Test shows that lucene is not affected by this. On the other hand, lucene gets wrong result if we apply this filter This closes #2512 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/4612e003 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/4612e003 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/4612e003 Branch: refs/heads/master Commit: 4612e003186ccc6bae89443043bd0db3463f8fc1 Parents: a162897 Author: Manhua Authored: Mon Jul 16 19:29:07 2018 +0800 Committer: xuchuanyin Committed: Wed Jul 18 09:10:22 2018 +0800 -- .../bloom/BloomCoarseGrainDataMapFactory.java | 27 +++-- .../lucene/LuceneFineGrainDataMapSuite.scala| 7 .../bloom/BloomCoarseGrainDataMapSuite.scala| 40 3 files changed, 62 insertions(+), 12 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/4612e003/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapFactory.java -- diff --git a/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapFactory.java b/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapFactory.java index 35ebd20..4b5bc7c 100644 --- a/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapFactory.java +++ 
b/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapFactory.java @@ -278,18 +278,21 @@ public class BloomCoarseGrainDataMapFactory extends DataMapFactory 0) { for (TableDataMap dataMap : dataMaps) { -List indexFiles; -String dmPath = CarbonTablePath -.getDataMapStorePath(tablePath, segmentId, dataMap.getDataMapSchema().getDataMapName()); -FileFactory.FileType fileType = FileFactory.getFileType(dmPath); -final CarbonFile dirPath = FileFactory.getCarbonFile(dmPath, fileType); -indexFiles = Arrays.asList(dirPath.listFiles(new CarbonFileFilter() { - @Override - public boolean accept(CarbonFile file) { -return file.isDirectory(); - } -})); -indexDirs.addAll(indexFiles); +// different from lucene, bloom only get corresponding directory of current datamap +if (dataMap.getDataMapSchema().getDataMapName().equals(this.dataMapName)) { + List indexFiles; + String dmPath = CarbonTablePath.getDataMapStorePath(tablePath, segmentId, + dataMap.getDataMapSchema().getDataMapName()); + FileFactory.FileType fileType = FileFactory.getFileType(dmPath); + final CarbonFile dirPath = FileFactory.getCarbonFile(dmPath, fileType); + indexFiles = Arrays.asList(dirPath.listFiles(new CarbonFileFilter() { +@Override +public boolean accept(CarbonFile file) { + return file.isDirectory(); +} + })); + indexDirs.addAll(indexFiles); +} } } return indexDirs.toArray(new CarbonFile[0]); http://git-wip-us.apache.org/repos/asf/carbondata/blob/4612e003/integration/spark-common-test/src/test/scala/org/apache/carbondata/datamap/lucene/LuceneFineGrainDataMapSuite.scala -- diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/datamap/lucene/LuceneFineGrainDataMapSuite.scala b/integration/spark-common-test/src/test/scala/org/apache/carbondata/datamap/lucene/LuceneFineGrainDataMapSuite.scala index 657a3eb..aebbde4 100644 --- a/integration/spark-common-test/src/test/scala/org/apache/carbondata/datamap/lucene/LuceneFineGrainDataMapSuite.scala +++ 
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/datamap/lucene/LuceneFineGrainDataMapSuite.scala @@ -34,6 +34,10 @@ import org.apache.carbondata.core.datamap.status.DataMapStatusManager class LuceneFineGrainDataMapSuite extends QueryTest with BeforeAndAfterAll { + val originDistributedDatamapStatus = CarbonProperties.getInstance().getPrope
carbondata git commit: [CARBONDATA-2724][DataMap] Block creating datamap on table with V1 or V2 format data
Repository: carbondata Updated Branches: refs/heads/master 81038f55e -> a16289786 [CARBONDATA-2724][DataMap]Unsupported create datamap on table with V1 or V2 format data block creating datamap on carbon table with V1 or V2 format Currently the version info is read from carbon data file This closes #2488 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/a1628978 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/a1628978 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/a1628978 Branch: refs/heads/master Commit: a162897862c92947ea8fd63713b7dbe6098f3b13 Parents: 81038f5 Author: ndwangsen Authored: Wed Jul 11 17:41:25 2018 +0800 Committer: xuchuanyin Committed: Tue Jul 17 23:35:50 2018 +0800 -- .../apache/carbondata/core/util/CarbonUtil.java | 51 .../datamap/CarbonCreateDataMapCommand.scala| 8 ++- 2 files changed, 58 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/a1628978/core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java b/core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java index 9796696..642fe8e 100644 --- a/core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java +++ b/core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java @@ -88,6 +88,7 @@ import org.apache.carbondata.core.util.path.CarbonTablePath; import org.apache.carbondata.format.BlockletHeader; import org.apache.carbondata.format.DataChunk2; import org.apache.carbondata.format.DataChunk3; +import org.apache.carbondata.format.FileHeader; import com.google.gson.Gson; import com.google.gson.GsonBuilder; @@ -3184,4 +3185,54 @@ public final class CarbonUtil { } return columnLocalDictGenMap; } + + /** + * This method get the carbon file format version + * + * @param carbonTable + * carbon Table + */ + public static ColumnarFormatVersion 
getFormatVersion(CarbonTable carbonTable) + throws IOException { +String storePath = null; +// if the carbontable is support flat folder +boolean supportFlatFolder = carbonTable.isSupportFlatFolder(); +if (supportFlatFolder) { + storePath = carbonTable.getTablePath(); +} else { + // get the valid segments + SegmentStatusManager segmentStatusManager = + new SegmentStatusManager(carbonTable.getAbsoluteTableIdentifier()); + SegmentStatusManager.ValidAndInvalidSegmentsInfo validAndInvalidSegmentsInfo = + segmentStatusManager.getValidAndInvalidSegments(); + List validSegments = validAndInvalidSegmentsInfo.getValidSegments(); + CarbonProperties carbonProperties = CarbonProperties.getInstance(); + if (validSegments.isEmpty()) { +return carbonProperties.getFormatVersion(); + } + storePath = carbonTable.getSegmentPath(validSegments.get(0).getSegmentNo()); +} + +CarbonFile[] carbonFiles = FileFactory +.getCarbonFile(storePath) +.listFiles(new CarbonFileFilter() { + @Override + public boolean accept(CarbonFile file) { +if (file == null) { + return false; +} +return file.getName().endsWith("carbondata"); + } +}); +if (carbonFiles == null || carbonFiles.length < 1) { + return CarbonProperties.getInstance().getFormatVersion(); +} + +CarbonFile carbonFile = carbonFiles[0]; +// get the carbon file header +CarbonHeaderReader headerReader = new CarbonHeaderReader(carbonFile.getCanonicalPath()); +FileHeader fileHeader = headerReader.readHeader(); +int version = fileHeader.getVersion(); +return ColumnarFormatVersion.valueOf((short)version); + } } http://git-wip-us.apache.org/repos/asf/carbondata/blob/a1628978/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonCreateDataMapCommand.scala -- diff --git a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonCreateDataMapCommand.scala b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonCreateDataMapCommand.scala index 7600160..336793e 
100644 --- a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonCreateDataMapCommand.scala +++ b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonCreateDataMapCommand.scala @@ -26,9 +26,10 @@ import org.apache.carbondata.common.exceptions.sql.{MalformedCarbonComma
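The new `CarbonUtil.getFormatVersion` flow lists a segment directory, keeps only data files, and reads the version from the first one's header, falling back to the configured default when no data file exists yet. A hypothetical sketch of that flow with the file-header read stubbed out (names and the stubbed return value are illustrative, not the CarbonData API):

```java
import java.util.Arrays;
import java.util.List;

public class FormatVersionSketch {
    // Pick the first carbondata file in the segment and report where the
    // version would be read from; otherwise fall back to the default version.
    static String detectVersion(List<String> segmentFiles, String defaultVersion) {
        for (String name : segmentFiles) {
            if (name.endsWith("carbondata")) {
                // in CarbonData this would be CarbonHeaderReader.readHeader().getVersion()
                return "version-of:" + name;
            }
        }
        return defaultVersion;
    }

    public static void main(String[] args) {
        System.out.println(detectVersion(
            Arrays.asList("x.carbonindex", "part-0.carbondata"), "V3"));
        System.out.println(detectVersion(Arrays.<String>asList(), "V3"));
    }
}
```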
carbondata git commit: [CARBONDATA-2727][BloomDataMap] Support creating bloom datamap on newly added column
Repository: carbondata Updated Branches: refs/heads/master aec47e06f -> 81038f55e [CARBONDATA-2727][BloomDataMap] Support create bloom datamap on newly added column Add a result collector with rowId infomation for datamap rebuild if table schema is changed; Use keygenerator to retrieve surrogate value of dictIndexColumn from query result; This closes #2490 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/81038f55 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/81038f55 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/81038f55 Branch: refs/heads/master Commit: 81038f55ef9a582f82305378988f603ded76e524 Parents: aec47e0 Author: Manhua Authored: Wed Jul 11 19:39:31 2018 +0800 Committer: xuchuanyin Committed: Tue Jul 17 23:31:43 2018 +0800 -- .../scan/collector/ResultCollectorFactory.java | 31 ++--- ...RowIdRestructureBasedRawResultCollector.java | 138 +++ .../bloom/AbstractBloomDataMapWriter.java | 72 +- .../bloom/BloomCoarseGrainDataMapFactory.java | 2 +- .../datamap/bloom/BloomDataMapBuilder.java | 8 ++ .../datamap/bloom/BloomDataMapWriter.java | 72 ++ .../datamap/IndexDataMapRebuildRDD.scala| 131 +++--- .../bloom/BloomCoarseGrainDataMapSuite.scala| 96 + 8 files changed, 413 insertions(+), 137 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/81038f55/core/src/main/java/org/apache/carbondata/core/scan/collector/ResultCollectorFactory.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/scan/collector/ResultCollectorFactory.java b/core/src/main/java/org/apache/carbondata/core/scan/collector/ResultCollectorFactory.java index ea4afd1..e0a0b90 100644 --- a/core/src/main/java/org/apache/carbondata/core/scan/collector/ResultCollectorFactory.java +++ b/core/src/main/java/org/apache/carbondata/core/scan/collector/ResultCollectorFactory.java @@ -18,15 +18,7 @@ package org.apache.carbondata.core.scan.collector; import 
org.apache.carbondata.common.logging.LogService; import org.apache.carbondata.common.logging.LogServiceFactory; -import org.apache.carbondata.core.scan.collector.impl.AbstractScannedResultCollector; -import org.apache.carbondata.core.scan.collector.impl.DictionaryBasedResultCollector; -import org.apache.carbondata.core.scan.collector.impl.DictionaryBasedVectorResultCollector; -import org.apache.carbondata.core.scan.collector.impl.RawBasedResultCollector; -import org.apache.carbondata.core.scan.collector.impl.RestructureBasedDictionaryResultCollector; -import org.apache.carbondata.core.scan.collector.impl.RestructureBasedRawResultCollector; -import org.apache.carbondata.core.scan.collector.impl.RestructureBasedVectorResultCollector; -import org.apache.carbondata.core.scan.collector.impl.RowIdBasedResultCollector; -import org.apache.carbondata.core.scan.collector.impl.RowIdRawBasedResultCollector; +import org.apache.carbondata.core.scan.collector.impl.*; import org.apache.carbondata.core.scan.executor.infos.BlockExecutionInfo; /** @@ -51,14 +43,21 @@ public class ResultCollectorFactory { AbstractScannedResultCollector scannerResultAggregator = null; if (blockExecutionInfo.isRawRecordDetailQuery()) { if (blockExecutionInfo.isRestructuredBlock()) { -LOGGER.info("Restructure based raw collector is used to scan and collect the data"); -scannerResultAggregator = new RestructureBasedRawResultCollector(blockExecutionInfo); - } else if (blockExecutionInfo.isRequiredRowId()) { -LOGGER.info("RowId based raw collector is used to scan and collect the data"); -scannerResultAggregator = new RowIdRawBasedResultCollector(blockExecutionInfo); +if (blockExecutionInfo.isRequiredRowId()) { + LOGGER.info("RowId Restructure based raw ollector is used to scan and collect the data"); + scannerResultAggregator = new RowIdRestructureBasedRawResultCollector(blockExecutionInfo); +} else { + LOGGER.info("Restructure based raw collector is used to scan and collect the data"); + 
scannerResultAggregator = new RestructureBasedRawResultCollector(blockExecutionInfo); +} } else { -LOGGER.info("Row based raw collector is used to scan and collect the data"); -scannerResultAggregator = new RawBasedResultCollector(blockExecutionInfo); +if (blockExecutionInfo.isRequiredRowId()) { + LOGGER.info("RowId based raw collector is used to scan and collect the data"); + scannerResultAggregator = new RowIdRawBasedResultCollector(blockExecutio
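The `ResultCollectorFactory` fix consults both flags, `isRestructuredBlock` and `isRequiredRowId`, before picking a raw collector, instead of letting "restructured" shadow the rowId case. A sketch of the four-way decision, returning the collector names from the diff above (the method itself is illustrative):

```java
public class CollectorChooser {
    // Mirrors the nested choice for raw record detail queries after the fix.
    static String chooseRawCollector(boolean restructured, boolean rowIdRequired) {
        if (restructured) {
            return rowIdRequired ? "RowIdRestructureBasedRawResultCollector"
                                 : "RestructureBasedRawResultCollector";
        }
        return rowIdRequired ? "RowIdRawBasedResultCollector"
                             : "RawBasedResultCollector";
    }

    public static void main(String[] args) {
        System.out.println(chooseRawCollector(true, true));
        System.out.println(chooseRawCollector(false, false));
    }
}
```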
carbondata git commit: [CARBONDATA-2698][CARBONDATA-2700][CARBONDATA-2732][BloomDataMap] Block some operations of bloomfilter datamap
Repository: carbondata Updated Branches: refs/heads/master 8e7895715 -> 1c4358e89 [CARBONDATA-2698][CARBONDATA-2700][CARBONDATA-2732][BloomDataMap] block some operations of bloomfilter datamap 1.Block create bloomfilter datamap index on column which its datatype is complex type; 2.Block change datatype for bloomfilter index datamap; 3.Block dropping index columns for bloomfilter index datamap This closes #2505 Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/1c4358e8 Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/1c4358e8 Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/1c4358e8 Branch: refs/heads/master Commit: 1c4358e89f5cba1132e9512107d3a0cb22087b7b Parents: 8e78957 Author: Sssan520 Authored: Mon Jul 16 10:59:43 2018 +0800 Committer: xuchuanyin Committed: Tue Jul 17 16:34:14 2018 +0800 -- .../core/datamap/dev/DataMapFactory.java| 13 +++ .../core/metadata/schema/table/CarbonTable.java | 13 ++- .../bloom/BloomCoarseGrainDataMapFactory.java | 37 +++- .../datamap/CarbonCreateDataMapCommand.scala| 10 ++ .../CarbonAlterTableDataTypeChangeCommand.scala | 3 +- .../CarbonAlterTableDropColumnCommand.scala | 3 +- .../bloom/BloomCoarseGrainDataMapSuite.scala| 99 7 files changed, 171 insertions(+), 7 deletions(-) -- http://git-wip-us.apache.org/repos/asf/carbondata/blob/1c4358e8/core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMapFactory.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMapFactory.java b/core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMapFactory.java index 0889f8b..ab0f8ea 100644 --- a/core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMapFactory.java +++ b/core/src/main/java/org/apache/carbondata/core/datamap/dev/DataMapFactory.java @@ -144,4 +144,17 @@ public abstract class DataMapFactory { } } + /** + * whether to block operation on corresponding table or column. 
+ * For example, bloomfilter datamap will block changing datatype for bloomindex column. + * By default it will not block any operation. + * + * @param operation table operation + * @param targets objects which the operation impact on + * @return true the operation will be blocked;false the operation will not be blocked + */ + public boolean isOperationBlocked(TableOperation operation, Object... targets) { +return false; + } + } http://git-wip-us.apache.org/repos/asf/carbondata/blob/1c4358e8/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java -- diff --git a/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java b/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java index 71256d4..995f943 100644 --- a/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java +++ b/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java @@ -1054,11 +1054,12 @@ public class CarbonTable implements Serializable { /** * methods returns true if operation is allowed for the corresponding datamap or not * if this operation makes datamap stale it is not allowed - * @param carbonTable - * @param operation - * @return + * @param carbonTable carbontable to be operated + * @param operation which operation on the table,such as drop column,change datatype. + * @param targets objects which the operation impact on,such as column + * @return true allow;false not allow */ - public boolean canAllow(CarbonTable carbonTable, TableOperation operation) { + public boolean canAllow(CarbonTable carbonTable, TableOperation operation, Object... 
targets) { try { List datamaps = DataMapStoreManager.getInstance().getAllDataMap(carbonTable); if (!datamaps.isEmpty()) { @@ -1069,6 +1070,10 @@ public class CarbonTable implements Serializable { if (factoryClass.willBecomeStale(operation)) { return false; } + // check whether the operation is blocked for datamap + if (factoryClass.isOperationBlocked(operation, targets)) { +return false; + } } } } catch (Exception e) { http://git-wip-us.apache.org/repos/asf/carbondata/blob/1c4358e8/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapFactory.java -- diff --git a/datamap/bloom/src/main/java/org/apache/carbondata/datamap/bloom/BloomCoarseGrainDataMapFactory.java b/datamap/bloom/src/m
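The new `isOperationBlocked` hook lets a datamap factory veto table operations that would invalidate it; the bloomfilter factory, for example, blocks changing the datatype of (or dropping) an index column. A hypothetical illustration of such an override (the enum values and method shape here are simplified stand-ins, not the actual CarbonData `TableOperation` API):

```java
import java.util.Arrays;
import java.util.List;

public class BlockOperationSketch {
    enum TableOperation { ALTER_CHANGE_DATATYPE, ALTER_DROP_COLUMN, UPDATE }

    // Block schema-altering operations when they target one of this
    // datamap's index columns; allow everything else by default.
    static boolean isOperationBlocked(TableOperation op,
                                      List<String> indexColumns,
                                      String targetColumn) {
        switch (op) {
            case ALTER_CHANGE_DATATYPE:
            case ALTER_DROP_COLUMN:
                return indexColumns.contains(targetColumn);
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        List<String> indexed = Arrays.asList("city");
        System.out.println(isOperationBlocked(
            TableOperation.ALTER_CHANGE_DATATYPE, indexed, "city"));
        System.out.println(isOperationBlocked(
            TableOperation.ALTER_CHANGE_DATATYPE, indexed, "name"));
    }
}
```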
carbondata git commit: [CARBONDATA-2719] Block update and delete on table having datamaps
Repository: carbondata
Updated Branches: refs/heads/master 84102a22a -> 56e7dad7b

[CARBONDATA-2719] Block update and delete on table having datamaps

Update and delete operations must be blocked on a table which has datamaps.

This closes #2483

Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/56e7dad7
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/56e7dad7
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/56e7dad7
Branch: refs/heads/master
Commit: 56e7dad7b18b6d5946ccdc49c0d264384225d231
Parents: 84102a2
Author: ndwangsen
Authored: Wed Jul 11 11:52:09 2018 +0800
Committer: xuchuanyin
Committed: Fri Jul 13 09:50:56 2018 +0800
--
 .../lucene/LuceneFineGrainDataMapSuite.scala  |  8 ++--
 .../iud/DeleteCarbonTableTestCase.scala       | 44 +++
 .../TestInsertAndOtherCommandConcurrent.scala | 12 +++--
 .../iud/UpdateCarbonTableTestCase.scala       | 46 
 .../spark/sql/hive/CarbonAnalysisRules.scala  | 37 +++-
 5 files changed, 138 insertions(+), 9 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/carbondata/blob/56e7dad7/integration/spark-common-test/src/test/scala/org/apache/carbondata/datamap/lucene/LuceneFineGrainDataMapSuite.scala
--
diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/datamap/lucene/LuceneFineGrainDataMapSuite.scala b/integration/spark-common-test/src/test/scala/org/apache/carbondata/datamap/lucene/LuceneFineGrainDataMapSuite.scala
index fd55145..657a3eb 100644
--- a/integration/spark-common-test/src/test/scala/org/apache/carbondata/datamap/lucene/LuceneFineGrainDataMapSuite.scala
+++ b/integration/spark-common-test/src/test/scala/org/apache/carbondata/datamap/lucene/LuceneFineGrainDataMapSuite.scala
@@ -641,15 +641,15 @@ class LuceneFineGrainDataMapSuite extends QueryTest with BeforeAndAfterAll {
     assert(ex4.getMessage.contains("alter table drop column is not supported"))
     sql(s"LOAD DATA LOCAL INPATH '$file2' INTO TABLE datamap_test7 OPTIONS('header'='false')")
-    val ex5 = intercept[MalformedCarbonCommandException] {
+    val ex5 = intercept[UnsupportedOperationException] {
       sql("UPDATE datamap_test7 d set(d.city)=('luc') where d.name='n10'").show()
     }
-    assert(ex5.getMessage.contains("update operation is not supported"))
+    assert(ex5.getMessage.contains("Update operation is not supported"))
-    val ex6 = intercept[MalformedCarbonCommandException] {
+    val ex6 = intercept[UnsupportedOperationException] {
       sql("delete from datamap_test7 where name = 'n10'").show()
     }
-    assert(ex6.getMessage.contains("delete operation is not supported"))
+    assert(ex6.getMessage.contains("Delete operation is not supported"))
   }

   test("test lucene fine grain multiple data map on table") {

http://git-wip-us.apache.org/repos/asf/carbondata/blob/56e7dad7/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/iud/DeleteCarbonTableTestCase.scala
--
diff --git a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/iud/DeleteCarbonTableTestCase.scala b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/iud/DeleteCarbonTableTestCase.scala
index 64aae1d..de93229 100644
--- a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/iud/DeleteCarbonTableTestCase.scala
+++ b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/iud/DeleteCarbonTableTestCase.scala
@@ -298,6 +298,50 @@ class DeleteCarbonTableTestCase extends QueryTest with BeforeAndAfterAll {
   }

+  test("block deleting records from table which has preaggregate datamap") {
+    sql("drop table if exists test_dm_main")
+    sql("drop table if exists test_dm_main_preagg1")
+
+    sql("create table test_dm_main (a string, b string, c string) stored by 'carbondata'")
+    sql("insert into test_dm_main select 'aaa','bbb','ccc'")
+    sql("insert into test_dm_main select 'bbb','bbb','ccc'")
+    sql("insert into test_dm_main select 'ccc','bbb','ccc'")
+
+    sql(
+      "create datamap preagg1 on table test_dm_main using 'preaggregate' as select" +
+      " a,sum(b) from test_dm_main group by a")
+
+    assert(intercept[UnsupportedOperationException] {
+      sql("delete from test_dm_main_preagg1 where test_dm_main_a = 'bbb'")
+    }.getMessage.contains("Delete operation is not supported"))
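The rule this commit adds to CarbonAnalysisRules rejects update/delete when the target table has datamaps, throwing UnsupportedOperationException with the messages the tests above assert on. A minimal, self-contained sketch of that check — the class name, method, and message wording here are illustrative, not CarbonData's actual API:

```java
import java.util.List;

// Hypothetical sketch of the mutation guard added in CarbonAnalysisRules:
// an update or delete on a table that has datamaps must be rejected.
public class BlockMutationOnDataMapTable {

    /**
     * Throws UnsupportedOperationException when the table has any datamap,
     * mirroring the behavior asserted by the test cases in this commit.
     *
     * @param tableName    name of the target table
     * @param dataMapNames datamaps registered on the table (empty = none)
     * @param operation    "Update" or "Delete"
     */
    public static void validateMutation(String tableName,
                                        List<String> dataMapNames,
                                        String operation) {
        if (!dataMapNames.isEmpty()) {
            throw new UnsupportedOperationException(
                operation + " operation is not supported for table which has datamaps: "
                + tableName);
        }
    }
}
```

A table without datamaps passes the check silently; a table carrying, say, a preaggregate datamap triggers the exception, which is what `intercept[UnsupportedOperationException]` catches in the Scala tests.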
carbondata git commit: [HOTFIX][CARBONDATA-2716][DataMap] fix bug for loading datamap [Forced Update!]
Repository: carbondata
Updated Branches: refs/heads/master 0fb1e02a9 -> 84102a22a (forced update)

[HOTFIX][CARBONDATA-2716][DataMap] fix bug for loading datamap

In some scenarios, the carbonTable parameter passed to getCarbonFactDataHandlerModel may differ from the table held by the load model.

This closes #2497

Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/84102a22
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/84102a22
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/84102a22
Branch: refs/heads/master
Commit: 84102a22accd3d24d52faccc747a62887c13d502
Parents: 9d7a9a2
Author: Manhua
Authored: Thu Jul 12 16:47:15 2018 +0800
Committer: xuchuanyin
Committed: Fri Jul 13 09:25:25 2018 +0800
--
 .../carbondata/processing/store/CarbonFactDataHandlerModel.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/carbondata/blob/84102a22/processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java
--
diff --git a/processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java b/processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java
index 63e47f0..ca75b8c 100644
--- a/processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java
+++ b/processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java
@@ -332,7 +332,7 @@ public class CarbonFactDataHandlerModel {
         new TableSpec(loadModel.getCarbonDataLoadSchema().getCarbonTable());
     DataMapWriterListener listener = new DataMapWriterListener();
     listener.registerAllWriter(
-        loadModel.getCarbonDataLoadSchema().getCarbonTable(),
+        carbonTable,
         loadModel.getSegmentId(),
         CarbonTablePath.getShardName(
             CarbonTablePath.DataFileUtil.getTaskIdFromTaskNo(loadModel.getTaskNo()),
carbondata git commit: [HOTFIX][CARBONDATA-2716][DataMap] fix bug for loading datamap
Repository: carbondata
Updated Branches: refs/heads/master 9d7a9a2a9 -> 0fb1e02a9

[HOTFIX][CARBONDATA-2716][DataMap] fix bug for loading datamap

In some scenarios, the carbonTable parameter passed to getCarbonFactDataHandlerModel may differ from the table held by the load model.

Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/0fb1e02a
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/0fb1e02a
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/0fb1e02a
Branch: refs/heads/master
Commit: 0fb1e02a94ed1032598c99807516e34e0ed63e00
Parents: 9d7a9a2
Author: Manhua
Authored: Thu Jul 12 16:47:15 2018 +0800
Committer: xuchuanyin
Committed: Fri Jul 13 09:22:20 2018 +0800
--
 .../carbondata/processing/store/CarbonFactDataHandlerModel.java | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/carbondata/blob/0fb1e02a/processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java
--
diff --git a/processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java b/processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java
index 63e47f0..ca75b8c 100644
--- a/processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java
+++ b/processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java
@@ -332,7 +332,7 @@ public class CarbonFactDataHandlerModel {
         new TableSpec(loadModel.getCarbonDataLoadSchema().getCarbonTable());
     DataMapWriterListener listener = new DataMapWriterListener();
     listener.registerAllWriter(
-        loadModel.getCarbonDataLoadSchema().getCarbonTable(),
+        carbonTable,
         loadModel.getSegmentId(),
         CarbonTablePath.getShardName(
             CarbonTablePath.DataFileUtil.getTaskIdFromTaskNo(loadModel.getTaskNo()),
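The one-line fix above replaces `loadModel.getCarbonDataLoadSchema().getCarbonTable()` with the `carbonTable` method parameter, so the DataMapWriterListener is registered against the table the caller actually passed in. A self-contained sketch of why that distinction matters — all class and method names below are simplified stand-ins, not CarbonData's real types:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the CARBONDATA-2716 fix: when the table passed as a
// parameter differs from the one held by the load model, the writer listener
// must be registered with the parameter, or datamap loading targets the
// wrong table.
public class DataMapRegistrationSketch {

    static class CarbonTable {
        final String name;
        CarbonTable(String name) { this.name = name; }
    }

    static class LoadModel {
        final CarbonTable table;
        LoadModel(CarbonTable table) { this.table = table; }
        CarbonTable getCarbonTable() { return table; }
    }

    static class WriterListener {
        final List<String> registeredTables = new ArrayList<>();
        void registerAllWriter(CarbonTable table) { registeredTables.add(table.name); }
    }

    /**
     * Mirrors the fixed code path: register against the carbonTable parameter.
     * Before the fix this used loadModel.getCarbonTable(), which can be a
     * different table in some scenarios.
     */
    static WriterListener buildListener(CarbonTable carbonTable, LoadModel loadModel) {
        WriterListener listener = new WriterListener();
        listener.registerAllWriter(carbonTable);
        return listener;
    }
}
```

With the pre-fix code, a caller building a handler model for one table while the load model still referenced another would silently register datamap writers on the load model's table; the fix keeps registration consistent with the caller's intent.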