[GitHub] carbondata pull request #2848: [CARBONDATA-3036] Cache Columns And Refresh T...
GitHub user manishnalla1994 opened a pull request: https://github.com/apache/carbondata/pull/2848 [CARBONDATA-3036] Cache Columns And Refresh Table Issue Fix Refresh Table issue: the Refresh Table command was behaving in a case-sensitive manner. Cache Columns issue: results were inconsistent when cache columns are set but the min/max limit is exceeded and the columns are dictionary excluded. Fix 1: the path for the carbon file was built from the table name exactly as given in the query (lowercase/uppercase); it is now lowercased. Fix 2: the MinMaxFlag array was not set according to the columns to be cached, giving inconsistent results; it is now built from the min/max values array for only the columns given in Cache Columns. - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [x] Testing done - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/manishnalla1994/carbondata RefreshAndCacheColumnsFix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2848.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2848 commit 7158960c750cf6ed7243e1c7c4bbc44fe158326c Author: Manish Nalla Date: 2018-10-24T05:45:15Z CacheAndRefreshIsuueFix ---
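For illustration only, a minimal sketch of Fix 1 with hypothetical names (buildTablePath and its parameters are not from the patch): the path builder lowercases the identifiers taken from the REFRESH TABLE query so the lookup matches the lowercase directory written at table creation.

```java
// Hypothetical sketch of Fix 1; class, method and parameter names are illustrative.
final class TablePathSketch {
  // The table directory on disk is created in lowercase, so the identifier from
  // the REFRESH TABLE query must be lowercased before the path is assembled,
  // regardless of how the user cased it.
  static String buildTablePath(String storePath, String dbName, String tableName) {
    return storePath + "/" + dbName.toLowerCase() + "/" + tableName.toLowerCase();
  }
}
```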
[GitHub] carbondata issue #2845: [CARBONDATA-3039] Fix Custom Deterministic Expressio...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2845 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/981/ ---
[jira] [Created] (CARBONDATA-3039) Fix Custom Deterministic Expression for rand() UDF
Indhumathi Muthumurugesh created CARBONDATA-3039: Summary: Fix Custom Deterministic Expression for rand() UDF Key: CARBONDATA-3039 URL: https://issues.apache.org/jira/browse/CARBONDATA-3039 Project: CarbonData Issue Type: Improvement Reporter: Indhumathi Muthumurugesh -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2814: [WIP][CARBONDATA-3001] configurable page size in MB
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2814 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1193/ ---
[GitHub] carbondata pull request #2842: [CARBONDATA-3032] Remove carbon.blocklet.size...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2842#discussion_r227635359 --- Diff: docs/sdk-guide.md --- @@ -24,7 +24,8 @@ CarbonData provides SDK to facilitate # SDK Writer -In the carbon jars package, there exist a carbondata-store-sdk-x.x.x-SNAPSHOT.jar, including SDK writer and reader. +In the carbon jars package, there exist a carbondata-store-sdk-x.x.x-SNAPSHOT.jar, including SDK writer and reader. +If you want to use SDK, it needs other carbon jar or you can use carbondata-sdk.jar. --- End diff -- Using carbondata-store-sdk-x.x.x-SNAPSHOT.jar alone is not enough; the user needs the other carbon jars as well. But if the user uses carbondata-sdk.jar, that alone is enough. ---
[GitHub] carbondata issue #2814: [WIP][CARBONDATA-3001] configurable page size in MB
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2814 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9246/ ---
[jira] [Resolved] (CARBONDATA-3008) make yarn-local and multiple dir for temp data enable by default
[ https://issues.apache.org/jira/browse/CARBONDATA-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-3008. -- Resolution: Fixed Fix Version/s: 1.5.1 > make yarn-local and multiple dir for temp data enable by default > > > Key: CARBONDATA-3008 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3008 > Project: CarbonData > Issue Type: Improvement > Reporter: xuchuanyin > Priority: Minor > Fix For: 1.5.1 > > Time Spent: 3h > Remaining Estimate: 0h > > About a year ago, we introduced 'multiple dirs for temp data during data > loading' to solve the disk hotspot problem. After about one year's usage in > production environments, this feature has proven to be effective and correct. So > here I propose to enable the related parameters by default. The related > parameters are: > `carbon.use.local.dir` : currently it is `false` by default; we will turn it > to `true` by default; > `carbon.user.multiple.dir` : currently it is `false` by default; we will turn > it to `true` by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
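For context, these knobs can also be set programmatically through CarbonProperties; a sketch follows, with the property keys spelled exactly as quoted in the JIRA description (the spelling of the second key follows the JIRA text and may differ from the actual constant in the code).

```java
import org.apache.carbondata.core.util.CarbonProperties;

public class TempDirDefaults {
  public static void main(String[] args) {
    // Prior to 1.5.1 both keys defaulted to "false"; this mirrors the new defaults.
    CarbonProperties props = CarbonProperties.getInstance();
    props.addProperty("carbon.use.local.dir", "true");     // use YARN local dirs for temp data
    props.addProperty("carbon.user.multiple.dir", "true"); // spread temp files across multiple dirs
  }
}
```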
[GitHub] carbondata pull request #2824: [CARBONDATA-3008] Optimize default value for ...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2824 ---
[GitHub] carbondata issue #2824: [CARBONDATA-3008] Optimize default value for multipl...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2824 LGTM ---
[GitHub] carbondata issue #2814: [WIP][CARBONDATA-3001] configurable page size in MB
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2814 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/980/ ---
[GitHub] carbondata pull request #2843: [CARBONDATA-3034] Carding parameters,Organize...
Github user ndwangsen commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2843#discussion_r227626868 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -23,86 +23,26 @@ import org.apache.carbondata.core.util.CarbonProperty; public final class CarbonCommonConstants { - /** - * surrogate value of null - */ - public static final int DICT_VALUE_NULL = 1; - /** - * surrogate value of null for direct dictionary - */ - public static final int DIRECT_DICT_VALUE_NULL = 1; - /** - * integer size in bytes - */ - public static final int INT_SIZE_IN_BYTE = 4; - /** - * short size in bytes - */ - public static final int SHORT_SIZE_IN_BYTE = 2; - /** - * DOUBLE size in bytes - */ - public static final int DOUBLE_SIZE_IN_BYTE = 8; - /** - * LONG size in bytes - */ - public static final int LONG_SIZE_IN_BYTE = 8; - /** - * byte to KB conversion factor - */ - public static final int BYTE_TO_KB_CONVERSION_FACTOR = 1024; - /** - * BYTE_ENCODING - */ - public static final String BYTE_ENCODING = "ISO-8859-1"; - /** - * measure meta data file name - */ - public static final String MEASURE_METADATA_FILE_NAME = "/msrMetaData_"; - - /** - * set the segment ids to query from the table - */ - public static final String CARBON_INPUT_SEGMENTS = "carbon.input.segments."; - - /** - * key prefix for set command. 'carbon.datamap.visible.dbName.tableName.dmName = false' means - * that the query on 'dbName.table' will not use the datamap 'dmName' - */ - @InterfaceStability.Unstable - public static final String CARBON_DATAMAP_VISIBLE = "carbon.datamap.visible."; - - /** - * Fetch and validate the segments. - * Used for aggregate table load as segment validation is not required. - */ - public static final String VALIDATE_CARBON_INPUT_SEGMENTS = "validate.carbon.input.segments."; + private CarbonCommonConstants() { + } /** --- End diff -- OK, I will modify it. ---
[jira] [Updated] (CARBONDATA-3038) Refactor dynamic configuration
[ https://issues.apache.org/jira/browse/CARBONDATA-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li updated CARBONDATA-3038: - Fix Version/s: 1.5.1 Description: Refactor dynamic configuration for carbon: 1. Decide and collect all dynamic configurations which can be SET in a carbondata application, such as in beeline. 2. For every dynamic configuration, use an annotation to tag them (re-use the CarbonProperty annotation but change its name). This annotation should be used for validation when the user invokes the SET command. Summary: Refactor dynamic configuration (was: Refactor dynamic confiugration) > Refactor dynamic configuration > -- > > Key: CARBONDATA-3038 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3038 > Project: CarbonData > Issue Type: Improvement > Reporter: Jacky Li > Priority: Major > Fix For: 1.5.1 > > > Refactor dynamic configuration for carbon: > 1. Decide and collect all dynamic configurations which can be SET in a > carbondata application, such as in beeline. > 2. For every dynamic configuration, use an annotation to tag them (re-use > the CarbonProperty annotation but change its name). This annotation should be > used for validation when the user invokes the SET command. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3038) Refactor dynamic confiugration
Jacky Li created CARBONDATA-3038: Summary: Refactor dynamic confiugration Key: CARBONDATA-3038 URL: https://issues.apache.org/jira/browse/CARBONDATA-3038 Project: CarbonData Issue Type: Improvement Reporter: Jacky Li -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata pull request #2824: [CARBONDATA-3008] Optimize default value for ...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2824#discussion_r227625617 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/datasources/SparkCarbonTableFormat.scala --- @@ -172,33 +172,8 @@ with Serializable { dataSchema: StructType, context: TaskAttemptContext): OutputWriter = { val model = CarbonTableOutputFormat.getLoadModel(context.getConfiguration) -val isCarbonUseMultiDir = CarbonProperties.getInstance().isUseMultiTempDir -var storeLocation: Array[String] = Array[String]() -val isCarbonUseLocalDir = CarbonProperties.getInstance() - .getProperty("carbon.use.local.dir", "false").equalsIgnoreCase("true") - - val taskNumber = generateTaskNumber(path, context, model.getSegmentId) -val tmpLocationSuffix = - File.separator + "carbon" + System.nanoTime() + File.separator + taskNumber -if (isCarbonUseLocalDir) { - val yarnStoreLocations = Util.getConfiguredLocalDirs(SparkEnv.get.conf) - if (!isCarbonUseMultiDir && null != yarnStoreLocations && yarnStoreLocations.nonEmpty) { -// use single dir -storeLocation = storeLocation :+ - (yarnStoreLocations(Random.nextInt(yarnStoreLocations.length)) + tmpLocationSuffix) -if (storeLocation == null || storeLocation.isEmpty) { - storeLocation = storeLocation :+ -(System.getProperty("java.io.tmpdir") + tmpLocationSuffix) -} - } else { -// use all the yarn dirs -storeLocation = yarnStoreLocations.map(_ + tmpLocationSuffix) - } -} else { - storeLocation = -storeLocation :+ (System.getProperty("java.io.tmpdir") + tmpLocationSuffix) -} +val storeLocation = CommonUtil.getTempStoreLocations(taskNumber) --- End diff -- @jackylk @QiangCai I've debugged and reviewed the code again and found it works as expected: all the temp locations were cleared. The `TempStoreLocations` generated at the beginning of data loading are the same as those at the close of `CarbonTableOutputFormat`, in which these locations will be cleared. ---
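For readers following the refactor, here is a rough Java rendering of the behavior the removed Scala block had, which a consolidated helper like `CommonUtil.getTempStoreLocations` is expected to preserve. The signature and explicit parameters below are assumptions for illustration; the real helper reads the flags from CarbonProperties.

```java
import java.io.File;
import java.util.Random;

final class TempStoreLocationsSketch {
  // Mirrors the removed Scala logic: prefer the configured YARN local dirs
  // (one at random, or all of them when multi-dir is enabled), otherwise fall
  // back to java.io.tmpdir. Every location gets a task-unique suffix so it can
  // be located and cleared when the output format closes.
  static String[] getTempStoreLocations(String taskNumber, String[] yarnDirs,
      boolean useLocalDir, boolean useMultiDir) {
    String suffix = File.separator + "carbon" + System.nanoTime() + File.separator + taskNumber;
    if (useLocalDir && yarnDirs != null && yarnDirs.length > 0) {
      if (!useMultiDir) {
        // single randomly chosen YARN dir
        return new String[] { yarnDirs[new Random().nextInt(yarnDirs.length)] + suffix };
      }
      // all YARN dirs
      String[] locations = new String[yarnDirs.length];
      for (int i = 0; i < yarnDirs.length; i++) {
        locations[i] = yarnDirs[i] + suffix;
      }
      return locations;
    }
    return new String[] { System.getProperty("java.io.tmpdir") + suffix };
  }
}
```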
[GitHub] carbondata pull request #2843: [CARBONDATA-3034] Carding parameters,Organize...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2843#discussion_r227624952 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -23,86 +23,26 @@ import org.apache.carbondata.core.util.CarbonProperty; public final class CarbonCommonConstants { - /** - * surrogate value of null - */ - public static final int DICT_VALUE_NULL = 1; - /** - * surrogate value of null for direct dictionary - */ - public static final int DIRECT_DICT_VALUE_NULL = 1; - /** - * integer size in bytes - */ - public static final int INT_SIZE_IN_BYTE = 4; - /** - * short size in bytes - */ - public static final int SHORT_SIZE_IN_BYTE = 2; - /** - * DOUBLE size in bytes - */ - public static final int DOUBLE_SIZE_IN_BYTE = 8; - /** - * LONG size in bytes - */ - public static final int LONG_SIZE_IN_BYTE = 8; - /** - * byte to KB conversion factor - */ - public static final int BYTE_TO_KB_CONVERSION_FACTOR = 1024; - /** - * BYTE_ENCODING - */ - public static final String BYTE_ENCODING = "ISO-8859-1"; - /** - * measure meta data file name - */ - public static final String MEASURE_METADATA_FILE_NAME = "/msrMetaData_"; - - /** - * set the segment ids to query from the table - */ - public static final String CARBON_INPUT_SEGMENTS = "carbon.input.segments."; - - /** - * key prefix for set command. 'carbon.datamap.visible.dbName.tableName.dmName = false' means - * that the query on 'dbName.table' will not use the datamap 'dmName' - */ - @InterfaceStability.Unstable - public static final String CARBON_DATAMAP_VISIBLE = "carbon.datamap.visible."; - - /** - * Fetch and validate the segments. - * Used for aggregate table load as segment validation is not required. - */ - public static final String VALIDATE_CARBON_INPUT_SEGMENTS = "validate.carbon.input.segments."; + private CarbonCommonConstants() { + } /** --- End diff -- To make this description easier to find, I suggest using: ``` // // System level property start here // // System level property is the global property for CarbonData // application, these properties are stored in a singleton instance // so that all processing logic in CarbonData uses the same // property value ``` And you can write similar descriptions for the table level properties and others ---
[jira] [Resolved] (CARBONDATA-3030) Remove no use parameter in test case
[ https://issues.apache.org/jira/browse/CARBONDATA-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jacky Li resolved CARBONDATA-3030. -- Resolution: Fixed Fix Version/s: 1.5.1 > Remove no use parameter in test case > > > Key: CARBONDATA-3030 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3030 > Project: CarbonData > Issue Type: Improvement >Reporter: xubo245 >Assignee: xubo245 >Priority: Major > Fix For: 1.5.1 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Remove no use parameter in test case > 1. remove persistSchema parameter in SDK test case > 2. remove isTransactional parameter in SDK test case > because https://github.com/apache/carbondata/pull/2749 remove the parameter > in SDK carbonWriter -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata pull request #2839: [CARBONDATA-3030] Remove no use parameter in ...
Github user asfgit closed the pull request at: https://github.com/apache/carbondata/pull/2839 ---
[GitHub] carbondata issue #2839: [CARBONDATA-3030] Remove no use parameter in test ca...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2839 LGTM, the coverage decrease of 0.005% is very small for the whole project ---
[GitHub] carbondata pull request #2836: [CARBONDATA-3027] Increase unsafe working mem...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2836#discussion_r227623050 --- Diff: store/sdk/src/main/resources/log4j.properties --- @@ -0,0 +1,11 @@ +# Root logger option +log4j.rootLogger=INFO,stdout + + +# Redirect log messages to console +log4j.appender.debug=org.apache.log4j.RollingFileAppender +log4j.appender.stdout=org.apache.log4j.ConsoleAppender +log4j.appender.stdout.Target=System.out +log4j.appender.stdout.layout=org.apache.log4j.PatternLayout +log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n --- End diff -- Now that we are using the log4j Logger directly, you can also add %C in the pattern to print the class name ---
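For reference, a conversion pattern with the suggested `%C` (caller class name) might look like the line below; this is a sketch of the suggestion, not the merged file. Note that in log4j 1.x `%C` is resolved by stack inspection, so it costs more at runtime than the logger category `%c`.

```
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %C{1}:%L - %m%n
```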
[GitHub] carbondata pull request #2836: [CARBONDATA-3027] Increase unsafe working mem...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2836#discussion_r227622952 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -1234,7 +1234,7 @@ @CarbonProperty public static final String UNSAFE_WORKING_MEMORY_IN_MB = "carbon.unsafe.working.memory.in.mb"; - public static final String UNSAFE_WORKING_MEMORY_IN_MB_DEFAULT = "512"; + public static final String UNSAFE_WORKING_MEMORY_IN_MB_DEFAULT = "1024"; --- End diff -- Can you describe the issue you encountered using the original default value? ---
[GitHub] carbondata issue #2836: [CARBONDATA-3027] Increase unsafe working memory def...
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2836 Can you describe an issue you encountered using the original default value? ---
[GitHub] carbondata pull request #2823: [CARBONDATA-3015] Support Lazy load in carbon...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2823#discussion_r227621527 --- Diff: integration/spark-datasource/src/main/spark2.1andspark2.2/org/apache/spark/sql/CarbonVectorProxy.java --- @@ -150,127 +140,189 @@ public void reset() { columnarBatch.reset(); } -public void putRowToColumnBatch(int rowId, Object value, int offset) { -org.apache.spark.sql.types.DataType t = dataType(offset); -if (null == value) { -putNull(rowId, offset); -} else { -if (t == org.apache.spark.sql.types.DataTypes.BooleanType) { -putBoolean(rowId, (boolean) value, offset); -} else if (t == org.apache.spark.sql.types.DataTypes.ByteType) { -putByte(rowId, (byte) value, offset); -} else if (t == org.apache.spark.sql.types.DataTypes.ShortType) { -putShort(rowId, (short) value, offset); -} else if (t == org.apache.spark.sql.types.DataTypes.IntegerType) { -putInt(rowId, (int) value, offset); -} else if (t == org.apache.spark.sql.types.DataTypes.LongType) { -putLong(rowId, (long) value, offset); -} else if (t == org.apache.spark.sql.types.DataTypes.FloatType) { -putFloat(rowId, (float) value, offset); -} else if (t == org.apache.spark.sql.types.DataTypes.DoubleType) { -putDouble(rowId, (double) value, offset); -} else if (t == org.apache.spark.sql.types.DataTypes.StringType) { -UTF8String v = (UTF8String) value; -putByteArray(rowId, v.getBytes(), offset); -} else if (t instanceof org.apache.spark.sql.types.DecimalType) { -DecimalType dt = (DecimalType) t; -Decimal d = Decimal.fromDecimal(value); -if (dt.precision() <= Decimal.MAX_INT_DIGITS()) { -putInt(rowId, (int) d.toUnscaledLong(), offset); -} else if (dt.precision() <= Decimal.MAX_LONG_DIGITS()) { -putLong(rowId, d.toUnscaledLong(), offset); -} else { -final BigInteger integer = d.toJavaBigDecimal().unscaledValue(); -byte[] bytes = integer.toByteArray(); -putByteArray(rowId, bytes, 0, bytes.length, offset); + +public static class ColumnVectorProxy { + +private ColumnVector vector; + +public ColumnVectorProxy(ColumnarBatch columnarBatch, int ordinal) { +this.vector = columnarBatch.column(ordinal); +} + +public void putRowToColumnBatch(int rowId, Object value, int offset) { +org.apache.spark.sql.types.DataType t = dataType(offset); --- End diff -- It seems the offset param is not used in dataType, and please change the function name of dataType ---
[GitHub] carbondata pull request #2823: [CARBONDATA-3015] Support Lazy load in carbon...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2823#discussion_r227620816 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/scanner/LazyBlockletLoad.java --- @@ -0,0 +1,158 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.scan.scanner; + +import java.io.IOException; + +import org.apache.carbondata.core.datastore.FileReader; +import org.apache.carbondata.core.datastore.chunk.AbstractRawColumnChunk; +import org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk; +import org.apache.carbondata.core.datastore.chunk.impl.MeasureRawColumnChunk; +import org.apache.carbondata.core.scan.executor.infos.BlockExecutionInfo; +import org.apache.carbondata.core.scan.processor.RawBlockletColumnChunks; +import org.apache.carbondata.core.stats.QueryStatistic; +import org.apache.carbondata.core.stats.QueryStatisticsConstants; +import org.apache.carbondata.core.stats.QueryStatisticsModel; + +/** + * Reads the blocklet column chunks lazily, it means it reads the column chunks from disk when + * execution engine wants to access it. + * It is useful in case of filter queries with high cardinality columns. + */ +public class LazyBlockletLoad { + + private RawBlockletColumnChunks rawBlockletColumnChunks; + + private BlockExecutionInfo blockExecutionInfo; + + private LazyChunkWrapper[] dimLazyWrapperChunks; + + private LazyChunkWrapper[] msrLazyWrapperChunks; --- End diff -- can we unify the processing of `dimLazyWrapperChunks` and `msrLazyWrapperChunks` so that we can use one flow for them? ---
[GitHub] carbondata pull request #2823: [CARBONDATA-3015] Support Lazy load in carbon...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2823#discussion_r227620665 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/vector/impl/directread/AbstractCarbonColumnarVector.java --- @@ -130,4 +131,9 @@ public CarbonColumnVector getDictionaryVector() { public void convert() { // Do nothing } + + @Override + public void setLazyPage(LazyPageLoad lazyPage) { +throw new UnsupportedOperationException("Not allowed from here"); --- End diff -- Put the class name in the message, it is easier for debugging ---
[GitHub] carbondata pull request #2823: [CARBONDATA-3015] Support Lazy load in carbon...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2823#discussion_r227620364 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/BlockletScannedResult.java --- @@ -145,6 +147,8 @@ protected QueryStatisticsModel queryStatisticsModel; + protected LazyBlockletLoad lazyBlockletLoad; --- End diff -- Actually I am confused by the name xxxLoad; why is it called load? I am wondering whether there is a common name for this technique as used in Presto. ---
[GitHub] carbondata pull request #2823: [CARBONDATA-3015] Support Lazy load in carbon...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2823#discussion_r227620200 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java --- @@ -48,7 +48,11 @@ public int getLengthFromBuffer(ByteBuffer buffer) { public static AbstractNonDictionaryVectorFiller getVectorFiller(DataType type, int lengthSize, int numberOfRows) { if (type == DataTypes.STRING) { - return new StringVectorFiller(lengthSize, numberOfRows); + if (lengthSize > 2) { --- End diff -- 2 is a magic number; can you change it to a constant or add a function to make it more readable? ---
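A sketch of the suggested cleanup; the constant and helper names below are illustrative, not from the patch.

```java
final class LengthFieldSizes {
  // A length field stored as a short occupies 2 bytes; wider fields need the
  // int-based filler. Naming the constant removes the magic number.
  static final int SHORT_LENGTH_SIZE_IN_BYTES = 2;

  static boolean needsIntBasedFiller(int lengthSize) {
    return lengthSize > SHORT_LENGTH_SIZE_IN_BYTES;
  }
}
```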
[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r227619567 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/model/QueryModel.java --- @@ -124,6 +124,11 @@ private boolean preFetchData = true; + /** + * It fills the vector directly from decoded column page with out any staging and conversions --- End diff -- "It fills the vector", can you give more detail for which vector? and describe how spark/presto is integrated with this? ---
[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r227619641 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/BlockletScannedResult.java --- @@ -72,6 +72,11 @@ */ private int[] pageFilteredRowCount; + /** + * Filtered pages to be decoded and loaded to vector. + */ + private int[] pagesFiltered; --- End diff -- ```suggestion private int[] pagesIdFiltered; ``` ---
[GitHub] carbondata issue #2814: [WIP][CARBONDATA-3001] configurable page size in MB
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2814 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/979/ ---
[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r227619247 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/executor/impl/AbstractQueryExecutor.java --- @@ -478,6 +478,17 @@ private BlockExecutionInfo getBlockExecutionInfoForBlock(QueryModel queryModel, } else { blockExecutionInfo.setPrefetchBlocklet(queryModel.isPreFetchData()); } +// In case of fg datamap it should not go to direct fill. +boolean fgDataMapPathPresent = false; +for (TableBlockInfo blockInfo : queryModel.getTableBlockInfos()) { + fgDataMapPathPresent = blockInfo.getDataMapWriterPath() != null; + if (fgDataMapPathPresent) { +break; --- End diff -- Is it possible to call queryModel.setDirectVectorFill directly? ---
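A hedged sketch of that suggestion: set the flag on the query model inside the loop and exit early, rather than tracking a local boolean. Types and getters are taken from the diff; `setDirectVectorFill` is taken from the comment above and assumed to exist on QueryModel.

```java
import org.apache.carbondata.core.datastore.block.TableBlockInfo;
import org.apache.carbondata.core.scan.model.QueryModel;

final class DirectFillGuard {
  // In case of an FG datamap, direct vector fill must be switched off.
  static void disableDirectFillIfFgDataMapPresent(QueryModel queryModel) {
    for (TableBlockInfo blockInfo : queryModel.getTableBlockInfos()) {
      if (blockInfo.getDataMapWriterPath() != null) {
        queryModel.setDirectVectorFill(false);
        return;
      }
    }
  }
}
```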
[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r227619046 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedVectorResultCollector.java --- @@ -198,4 +219,48 @@ void fillColumnVectorDetails(CarbonColumnarBatch columnarBatch, int rowCounter, } } + private void collectResultInColumnarBatchDirect(BlockletScannedResult scannedResult, --- End diff -- add comment for this function ---
[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r227618801 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveIntegralCodec.java --- @@ -248,6 +269,143 @@ public double decodeDouble(float value) { public double decodeDouble(double value) { throw new RuntimeException("internal error: " + debugInfo()); } + +@Override +public void decodeAndFillVector(ColumnPage columnPage, ColumnVectorInfo vectorInfo) { + CarbonColumnVector vector = vectorInfo.vector; + BitSet nullBits = columnPage.getNullBits(); + DataType vectorDataType = vector.getType(); + DataType pageDataType = columnPage.getDataType(); + int pageSize = columnPage.getPageSize(); + BitSet deletedRows = vectorInfo.deletedRows; + fillVector(columnPage, vector, vectorDataType, pageDataType, pageSize, vectorInfo); + if (deletedRows == null || deletedRows.isEmpty()) { +for (int i = nullBits.nextSetBit(0); i >= 0; i = nullBits.nextSetBit(i + 1)) { + vector.putNull(i); +} + } +} + +private void fillVector(ColumnPage columnPage, CarbonColumnVector vector, +DataType vectorDataType, DataType pageDataType, int pageSize, ColumnVectorInfo vectorInfo) { + if (pageDataType == DataTypes.BOOLEAN || pageDataType == DataTypes.BYTE) { +byte[] byteData = columnPage.getBytePage(); +if (vectorDataType == DataTypes.SHORT) { + for (int i = 0; i < pageSize; i++) { +vector.putShort(i, (short) byteData[i]); + } +} else if (vectorDataType == DataTypes.INT) { + for (int i = 0; i < pageSize; i++) { +vector.putInt(i, (int) byteData[i]); + } +} else if (vectorDataType == DataTypes.LONG) { + for (int i = 0; i < pageSize; i++) { +vector.putLong(i, byteData[i]); + } +} else if (vectorDataType == DataTypes.TIMESTAMP) { + for (int i = 0; i < pageSize; i++) { +vector.putLong(i, byteData[i] * 1000); + } +} else if (vectorDataType == DataTypes.BOOLEAN) { + vector.putBytes(0, pageSize, byteData, 0); + --- End diff -- remove empty line ---
[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r227618507 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/ColumnPageDecoder.java --- @@ -29,6 +31,12 @@ */ ColumnPage decode(byte[] input, int offset, int length) throws MemoryException, IOException; + /** + * Apply decoding algorithm on input byte array and fill the vector here. + */ + void decodeAndFillVector(byte[] input, int offset, int length, ColumnVectorInfo vectorInfo, + BitSet nullBits, boolean isLVEncoded) throws MemoryException, IOException; --- End diff -- I feel it is not good to add `isLVEncoded` just for LV encoding; can we pass a more generic parameter, since this is a common class for all decoders? ---
[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r227618222 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/EncodingFactory.java --- @@ -66,6 +66,14 @@ public abstract ColumnPageEncoder createEncoder(TableSpec.ColumnSpec columnSpec, */ public ColumnPageDecoder createDecoder(List encodings, List encoderMetas, String compressor) throws IOException { +return createDecoder(encodings, encoderMetas, compressor, false); + } + + /** + * Return new decoder based on encoder metadata read from file --- End diff -- In the comment, can you describe what is the behavior when `fullVectorFill` is true? ---
[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r227617936 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/ColumnPageEncoderMeta.java --- @@ -49,6 +49,8 @@ // Make it protected for RLEEncoderMeta protected String compressorName; + private transient boolean fillCompleteVector; --- End diff -- add comment for this variable ---
[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r227618017 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/VarLengthColumnPageBase.java --- @@ -176,7 +179,7 @@ private static ColumnPage getDecimalColumnPage(TableSpec.ColumnSpec columnSpec, rowOffset.putInt(counter, offset); VarLengthColumnPageBase page; -if (unsafe) { +if (unsafe && !meta.isFillCompleteVector()) { --- End diff -- Many places check like this; can we make a function for it, with a proper function name, to make it more readable? ---
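A sketch of the extraction being asked for; the helper and class names are illustrative.

```java
import org.apache.carbondata.core.datastore.page.encoding.ColumnPageEncoderMeta;

final class ColumnPageStoreChoice {
  // Collapses the repeated `unsafe && !meta.isFillCompleteVector()` checks into
  // one well-named predicate: unsafe (off-heap) storage only pays off when the
  // page is not being drained straight into a vector.
  static boolean useUnsafeStore(boolean unsafe, ColumnPageEncoderMeta meta) {
    return unsafe && !meta.isFillCompleteVector();
  }
}
```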
[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r227617725 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/SafeDecimalColumnPage.java --- @@ -193,6 +193,30 @@ public void convertValue(ColumnPageValueConverter codec) { } } + @Override public byte[] getBytePage() { --- End diff -- move Override to previous line ---
[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r227617413 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java --- @@ -0,0 +1,278 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.datastore.chunk.store.impl.safe; + +import java.nio.ByteBuffer; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.metadata.datatype.DataTypes; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector; +import org.apache.carbondata.core.util.ByteUtil; +import org.apache.carbondata.core.util.DataTypeUtil; + +public abstract class AbstractNonDictionaryVectorFiller { + + protected int lengthSize; + protected int numberOfRows; + + public AbstractNonDictionaryVectorFiller(int lengthSize, int numberOfRows) { +this.lengthSize = lengthSize; +this.numberOfRows = numberOfRows; + } + + public abstract void fillVector(byte[] data, CarbonColumnVector vector, ByteBuffer buffer); + + public int getLengthFromBuffer(ByteBuffer buffer) { +return buffer.getShort(); + } +} + +class NonDictionaryVectorFillerFactory { + + public static AbstractNonDictionaryVectorFiller getVectorFiller(DataType type, int lengthSize, + int numberOfRows) { +if (type == DataTypes.STRING) { + return new StringVectorFiller(lengthSize, numberOfRows); +} else if (type == DataTypes.VARCHAR) { + return new LongStringVectorFiller(lengthSize, numberOfRows); +} else if (type == DataTypes.TIMESTAMP) { + return new TimeStampVectorFiller(lengthSize, numberOfRows); +} else if (type == DataTypes.BOOLEAN) { + return new BooleanVectorFiller(lengthSize, numberOfRows); +} else if (type == DataTypes.SHORT) { + return new ShortVectorFiller(lengthSize, numberOfRows); +} else if (type == DataTypes.INT) { + return new IntVectorFiller(lengthSize, numberOfRows); +} else if (type == DataTypes.LONG) { + return new LongVectorFiller(lengthSize, numberOfRows); +} else { + throw new UnsupportedOperationException("Not supported datatype : " + type); +} + + } + +} + +class StringVectorFiller extends AbstractNonDictionaryVectorFiller { + + public StringVectorFiller(int lengthSize, int numberOfRows) { +super(lengthSize, numberOfRows); + } + + @Override + public void fillVector(byte[] data, CarbonColumnVector vector, ByteBuffer buffer) { +// start position will be used to store the current data position +int startOffset = 0; +// as first position will be start from length of bytes as data is stored first in the memory +// block we need to skip first two bytes this is because first two bytes will be length of the +// data which we 
have to skip +int currentOffset = lengthSize; +ByteUtil.UnsafeComparer comparator = ByteUtil.UnsafeComparer.INSTANCE; +for (int i = 0; i < numberOfRows - 1; i++) { + buffer.position(startOffset); + startOffset += getLengthFromBuffer(buffer) + lengthSize; + int length = startOffset - (currentOffset); + if (comparator.equals(CarbonCommonConstants.MEMBER_DEFAULT_VAL_ARRAY, 0, + CarbonCommonConstants.MEMBER_DEFAULT_VAL_ARRAY.length, data, currentOffset, length)) { +vector.putNull(i); + } else { +vector.putByteArray(i, currentOffset, length, data); + } + currentOffset = startOffset + lengthSize; +} +// Handle last row +int length = (data.length - currentOffset); +if (comparator.equals(CarbonCommonConstants.MEMBER_DEFAULT_VAL_ARRAY, 0, +
[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r227617094 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java --- @@ -0,0 +1,278 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.datastore.chunk.store.impl.safe; + +import java.nio.ByteBuffer; + +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.metadata.datatype.DataType; +import org.apache.carbondata.core.metadata.datatype.DataTypes; +import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector; +import org.apache.carbondata.core.util.ByteUtil; +import org.apache.carbondata.core.util.DataTypeUtil; + +public abstract class AbstractNonDictionaryVectorFiller { --- End diff -- For public class, please add interface annotation ---
[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r227617004 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -1845,6 +1845,18 @@ public static final int CARBON_MINMAX_ALLOWED_BYTE_COUNT_MIN = 10; public static final int CARBON_MINMAX_ALLOWED_BYTE_COUNT_MAX = 1000; + /** + * When enabled complete row filters will be handled by carbon in case of vector. + * If it is disabled then only page level pruning will be done by carbon and row level filtering + * will be done by spark for vector. + * There is no change in flow for non-vector based queries. --- End diff -- Can you also add in which cases it is suggested to set this to false, since the default is true? ---
[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r227616799 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/impl/VariableLengthDimensionColumnPage.java --- @@ -54,10 +75,15 @@ public VariableLengthDimensionColumnPage(byte[] dataChunks, int[] invertedIndex, } dataChunkStore = DimensionChunkStoreFactory.INSTANCE .getDimensionChunkStore(0, isExplicitSorted, numberOfRows, totalSize, dimStoreType, -dictionary); -dataChunkStore.putArray(invertedIndex, invertedIndexReverse, dataChunks); +dictionary, vectorInfo != null); +if (vectorInfo != null) { + dataChunkStore.fillVector(invertedIndex, invertedIndexReverse, dataChunks, vectorInfo); +} else { + dataChunkStore.putArray(invertedIndex, invertedIndexReverse, dataChunks); +} } + --- End diff -- remove this ---
[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2819#discussion_r227616722 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/chunk/impl/MeasureRawColumnChunk.java --- @@ -105,6 +106,22 @@ public ColumnPage convertToColumnPageWithOutCache(int index) { } } + /** + * Convert raw data with specified page number processed to DimensionColumnDataChunk and fill the + * vector + * + * @param pageNumber page number to decode and fill the vector + * @param vectorInfo vector to be filled with column page + */ + public void convertToColumnPageAndFillVector(int pageNumber, ColumnVectorInfo vectorInfo) { +assert pageNumber < pagesCount; +try { + chunkReader.decodeColumnPageAndFillVector(this, pageNumber, vectorInfo); +} catch (IOException | MemoryException e) { + throw new RuntimeException(e); --- End diff -- Why not throw e directly? ---
[GitHub] carbondata pull request #2847: [WIP]Support Gzip as column compressor
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2847#discussion_r227615426 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java --- @@ -0,0 +1,201 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.datastore.compression; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.DoubleBuffer; +import java.nio.FloatBuffer; +import java.nio.IntBuffer; +import java.nio.LongBuffer; +import java.nio.ShortBuffer; + +import org.apache.carbondata.core.util.ByteUtil; + +import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream; +import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream; + +public class GzipCompressor implements Compressor { + + public GzipCompressor() { + } + + @Override public String getName() { +return "gzip"; + } + + /* + * Method called for compressing the data and + * return a byte array + */ + private byte[] compressData(byte[] data) { + +ByteArrayOutputStream bt = new ByteArrayOutputStream(); +try { + GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bt); + try { +gzos.write(data); + } catch (IOException e) { +e.printStackTrace(); + } finally { +gzos.close(); + } +} catch (IOException e) { + e.printStackTrace(); +} + +return bt.toByteArray(); --- End diff -- why `bt` is still open? ---
[GitHub] carbondata pull request #2847: [WIP]Support Gzip as column compressor
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2847#discussion_r227615561 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java --- @@ -0,0 +1,201 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.datastore.compression; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.DoubleBuffer; +import java.nio.FloatBuffer; +import java.nio.IntBuffer; +import java.nio.LongBuffer; +import java.nio.ShortBuffer; + +import org.apache.carbondata.core.util.ByteUtil; + +import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream; +import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream; + +public class GzipCompressor implements Compressor { + + public GzipCompressor() { + } + + @Override public String getName() { +return "gzip"; + } + + /* + * Method called for compressing the data and + * return a byte array + */ + private byte[] compressData(byte[] data) { + +ByteArrayOutputStream bt = new ByteArrayOutputStream(); +try { + GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bt); + try { +gzos.write(data); + } catch (IOException e) { +e.printStackTrace(); + } finally { +gzos.close(); + } +} catch (IOException e) { + e.printStackTrace(); +} + +return bt.toByteArray(); + } + + /* + * Method called for decompressing the data and + * return a byte array + */ + private byte[] decompressData(byte[] data) { + +ByteArrayInputStream bt = new ByteArrayInputStream(data); +ByteArrayOutputStream bot = new ByteArrayOutputStream(); + +try { + GzipCompressorInputStream gzis = new GzipCompressorInputStream(bt); + byte[] buffer = new byte[1024]; + int len; + + while ((len = gzis.read(buffer)) != -1) { +bot.write(buffer, 0, len); + } + +} catch (IOException e) { + e.printStackTrace(); --- End diff -- please optimize the logging! ---
[GitHub] carbondata pull request #2847: [WIP]Support Gzip as column compressor
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2847#discussion_r227615489 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java --- @@ -0,0 +1,201 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.datastore.compression; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.DoubleBuffer; +import java.nio.FloatBuffer; +import java.nio.IntBuffer; +import java.nio.LongBuffer; +import java.nio.ShortBuffer; + +import org.apache.carbondata.core.util.ByteUtil; + +import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream; +import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream; + +public class GzipCompressor implements Compressor { + + public GzipCompressor() { + } + + @Override public String getName() { +return "gzip"; + } + + /* + * Method called for compressing the data and + * return a byte array + */ + private byte[] compressData(byte[] data) { + +ByteArrayOutputStream bt = new ByteArrayOutputStream(); +try { + GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bt); + try { +gzos.write(data); + } catch (IOException e) { +e.printStackTrace(); --- End diff -- please optimize the logging! ---
[GitHub] carbondata pull request #2847: [WIP]Support Gzip as column compressor
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2847#discussion_r227615581 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java --- @@ -0,0 +1,201 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.datastore.compression; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.DoubleBuffer; +import java.nio.FloatBuffer; +import java.nio.IntBuffer; +import java.nio.LongBuffer; +import java.nio.ShortBuffer; + +import org.apache.carbondata.core.util.ByteUtil; + +import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream; +import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream; + +public class GzipCompressor implements Compressor { + + public GzipCompressor() { + } + + @Override public String getName() { +return "gzip"; + } + + /* + * Method called for compressing the data and + * return a byte array + */ + private byte[] compressData(byte[] data) { + +ByteArrayOutputStream bt = new ByteArrayOutputStream(); +try { + GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bt); + try { +gzos.write(data); + } catch (IOException e) { +e.printStackTrace(); + } finally { +gzos.close(); + } +} catch (IOException e) { + e.printStackTrace(); +} + +return bt.toByteArray(); + } + + /* + * Method called for decompressing the data and + * return a byte array + */ + private byte[] decompressData(byte[] data) { + +ByteArrayInputStream bt = new ByteArrayInputStream(data); +ByteArrayOutputStream bot = new ByteArrayOutputStream(); + +try { + GzipCompressorInputStream gzis = new GzipCompressorInputStream(bt); + byte[] buffer = new byte[1024]; + int len; + + while ((len = gzis.read(buffer)) != -1) { +bot.write(buffer, 0, len); + } + +} catch (IOException e) { + e.printStackTrace(); +} + +return bot.toByteArray(); --- End diff -- `bot` not closed ---
[GitHub] carbondata pull request #2847: [WIP]Support Gzip as column compressor
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2847#discussion_r227615513 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java --- @@ -0,0 +1,201 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.datastore.compression; + +import java.io.ByteArrayInputStream; +import java.io.ByteArrayOutputStream; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.DoubleBuffer; +import java.nio.FloatBuffer; +import java.nio.IntBuffer; +import java.nio.LongBuffer; +import java.nio.ShortBuffer; + +import org.apache.carbondata.core.util.ByteUtil; + +import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream; +import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream; + +public class GzipCompressor implements Compressor { + + public GzipCompressor() { + } + + @Override public String getName() { +return "gzip"; + } + + /* + * Method called for compressing the data and + * return a byte array + */ + private byte[] compressData(byte[] data) { + +ByteArrayOutputStream bt = new ByteArrayOutputStream(); +try { + GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bt); + try { +gzos.write(data); + } catch (IOException e) { +e.printStackTrace(); + } finally { +gzos.close(); + } +} catch (IOException e) { + e.printStackTrace(); --- End diff -- please optimize the logging! ---
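Taken together, these review comments point at the swallowed IOExceptions and unmanaged streams in compressData/decompressData. Below is a minimal sketch, not the PR's final code, of how both methods could use try-with-resources and propagate failures instead of calling e.printStackTrace().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;

final class GzipCodecSketch {
  static byte[] compressData(byte[] data) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    // try-with-resources finishes and closes the gzip stream before toByteArray,
    // and failures propagate instead of being printed and ignored
    try (GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bos)) {
      gzos.write(data);
    } catch (IOException e) {
      throw new RuntimeException("Error compressing with gzip", e);
    }
    return bos.toByteArray();
  }

  static byte[] decompressData(byte[] data) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GzipCompressorInputStream gzis =
        new GzipCompressorInputStream(new ByteArrayInputStream(data))) {
      byte[] buffer = new byte[1024];
      int len;
      while ((len = gzis.read(buffer)) != -1) {
        bos.write(buffer, 0, len);
      }
    } catch (IOException e) {
      throw new RuntimeException("Error decompressing with gzip", e);
    }
    return bos.toByteArray();
  }
}
```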
[GitHub] carbondata pull request #2847: [WIP]Support Gzip as column compressor
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2847#discussion_r227615128

--- Diff: core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java ---
@@ -0,0 +1,201 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.compression;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.DoubleBuffer;
+import java.nio.FloatBuffer;
+import java.nio.IntBuffer;
+import java.nio.LongBuffer;
+import java.nio.ShortBuffer;
+
+import org.apache.carbondata.core.util.ByteUtil;
+
+import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
+import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
+
+public class GzipCompressor implements Compressor {
+
+  public GzipCompressor() {
+  }
+
+  @Override public String getName() {
+    return "gzip";
+  }
+
+  /*
+   * Method called for compressing the data and
+   * return a byte array
+   */
+  private byte[] compressData(byte[] data) {
+    ByteArrayOutputStream bt = new ByteArrayOutputStream();
+    try {
+      GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bt);
+      try {
+        gzos.write(data);
+      } catch (IOException e) {
+        e.printStackTrace();
+      } finally {
+        gzos.close();
+      }
+    } catch (IOException e) {
+      e.printStackTrace();
+    }
+    return bt.toByteArray();
+  }
+
+  /*
+   * Method called for decompressing the data and
+   * return a byte array
+   */
+  private byte[] decompressData(byte[] data) {
+    ByteArrayInputStream bt = new ByteArrayInputStream(data);
+    ByteArrayOutputStream bot = new ByteArrayOutputStream();
+    try {
+      GzipCompressorInputStream gzis = new GzipCompressorInputStream(bt);
+      byte[] buffer = new byte[1024];
+      int len;
+      while ((len = gzis.read(buffer)) != -1) {
+        bot.write(buffer, 0, len);
+      }
+    } catch (IOException e) {
+      e.printStackTrace();
+    }
+    return bot.toByteArray();
+  }
+
+  @Override public byte[] compressByte(byte[] unCompInput) {
+    return compressData(unCompInput);
+  }
+
+  @Override public byte[] compressByte(byte[] unCompInput, int byteSize) {
+    return compressData(unCompInput);
+  }
+
+  @Override public byte[] unCompressByte(byte[] compInput) {
+    return decompressData(compInput);
+  }
+
+  @Override public byte[] unCompressByte(byte[] compInput, int offset, int length) {
+    byte[] data = new byte[length];
+    System.arraycopy(compInput, offset, data, 0, length);
+    return decompressData(data);
+  }
+
+  @Override public byte[] compressShort(short[] unCompInput) {
+    ByteBuffer unCompBuffer = ByteBuffer.allocate(unCompInput.length * ByteUtil.SIZEOF_SHORT);
+    unCompBuffer.asShortBuffer().put(unCompInput);
+    return compressData(unCompBuffer.array());
+  }
+
+  @Override public short[] unCompressShort(byte[] compInput, int offset, int length) {
+    byte[] unCompArray = unCompressByte(compInput, offset, length);
+    ShortBuffer unCompBuffer = ByteBuffer.wrap(unCompArray).asShortBuffer();
+    short[] shorts = new short[unCompArray.length / ByteUtil.SIZEOF_SHORT];
+    unCompBuffer.get(shorts);
+    return shorts;
+  }
+
+  @Override public byte[] compressInt(int[] unCompInput) {
+    ByteBuffer unCompBuffer = ByteBuffer.allocate(unCompInput.length * ByteUtil.SIZEOF_INT);
+    unCompBuffer.asIntBuffer().put(unCompInput);
+    return compressData(unCompBuffer.array());
+  }
+
+  @Override public int[] unCompressInt(byte[]
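One issue worth flagging in the diff above: compressData and decompressData catch IOException and only call printStackTrace, so a failed write or read silently returns an empty or truncated array to the caller. A minimal standalone sketch of the same gzip round-trip, assuming only commons-compress on the classpath (the class name and sample data are illustrative, not part of the PR), that lets failures propagate instead:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

import org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;

public class GzipRoundTrip {
  public static void main(String[] args) throws IOException {
    byte[] input = "some column page bytes".getBytes(StandardCharsets.UTF_8);

    // Compress: try-with-resources closes the stream (and writes the gzip
    // trailer) even if write() throws, and any IOException propagates.
    ByteArrayOutputStream compressed = new ByteArrayOutputStream();
    try (GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(compressed)) {
      gzos.write(input);
    }

    // Decompress with a fixed-size buffer, as the PR does.
    ByteArrayOutputStream restored = new ByteArrayOutputStream();
    try (GzipCompressorInputStream gzis =
        new GzipCompressorInputStream(new ByteArrayInputStream(compressed.toByteArray()))) {
      byte[] buffer = new byte[1024];
      int len;
      while ((len = gzis.read(buffer)) != -1) {
        restored.write(buffer, 0, len);
      }
    }

    System.out.println(Arrays.equals(input, restored.toByteArray())); // prints true
  }
}
```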
[GitHub] carbondata issue #2846: [WIP] Added direct fill
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2846 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1191/ ---
[GitHub] carbondata issue #2846: [WIP] Added direct fill
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2846 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/978/ ---
[GitHub] carbondata issue #2846: [WIP] Added direct fill
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2846 Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9244/ ---
[GitHub] carbondata issue #2846: [WIP] Added direct fill
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2846 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1190/ ---
[GitHub] carbondata issue #2846: [WIP] Added direct fill
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2846 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/977/ ---
[GitHub] carbondata issue #2846: [WIP] Added direct fill
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2846 Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9243/ ---
[GitHub] carbondata issue #2814: [WIP][CARBONDATA-3001] configurable page size in MB
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2814 Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9242/ ---
[GitHub] carbondata issue #2814: [WIP][CARBONDATA-3001] configurable page size in MB
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2814 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1189/ ---
[GitHub] carbondata issue #2847: [WIP]Support Gzip as column compressor
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2847 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1187/ ---
[GitHub] carbondata issue #2845: [WIP] Rand function issue
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2845 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1188/ ---
[GitHub] carbondata issue #2845: [WIP] Rand function issue
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2845 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9241/ ---
[GitHub] carbondata issue #2814: [WIP][CARBONDATA-3001] configurable page size in MB
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2814 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/976/ ---
[GitHub] carbondata issue #2806: [CARBONDATA-2998] Refresh column schema for old stor...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2806 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9240/ ---
[GitHub] carbondata issue #2822: [CARBONDATA-3014] Added support for inverted index a...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2822 Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9239/ ---
[GitHub] carbondata issue #2823: [CARBONDATA-3015] Support Lazy load in carbon vector
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2823 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1184/ ---
[GitHub] carbondata issue #2845: [WIP] Rand function issue
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2845 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/975/ ---
[GitHub] carbondata issue #2806: [CARBONDATA-2998] Refresh column schema for old stor...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2806 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1186/ ---
[GitHub] carbondata issue #2847: [WIP]Support Gzip as column compressor
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2847 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/974/ ---
[GitHub] carbondata issue #2823: [CARBONDATA-3015] Support Lazy load in carbon vector
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2823 Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9238/ ---
[GitHub] carbondata issue #2822: [CARBONDATA-3014] Added support for inverted index a...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2822 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1185/ ---
[GitHub] carbondata issue #2847: [WIP]Support Gzip as column compressor
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2847 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9235/ ---
[GitHub] carbondata issue #2841: [WIP] Unsafe fallback to heap and unsafe query fix
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2841 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1183/ ---
[GitHub] carbondata issue #2841: [WIP] Unsafe fallback to heap and unsafe query fix
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2841 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9237/ ---
[GitHub] carbondata issue #2806: [CARBONDATA-2998] Refresh column schema for old stor...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2806 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9234/ ---
[GitHub] carbondata issue #2845: [WIP] Rand function issue
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2845 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1182/ ---
[GitHub] carbondata issue #2822: [CARBONDATA-3014] Added support for inverted index a...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2822 Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9233/ ---
[GitHub] carbondata issue #2829: [CARBONDATA-3025]add more metadata in carbon file fo...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2829 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1179/ ---
[GitHub] carbondata issue #2845: [WIP] Rand function issue
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2845 Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9236/ ---
[GitHub] carbondata issue #2806: [CARBONDATA-2998] Refresh column schema for old stor...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2806 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/973/ ---
[GitHub] carbondata issue #2822: [CARBONDATA-3014] Added support for inverted index a...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2822 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/972/ ---
[GitHub] carbondata issue #2823: [CARBONDATA-3015] Support Lazy load in carbon vector
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2823 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/971/ ---
[GitHub] carbondata issue #2826: [CARBONDATA-3023] Alter add column issue with SORT_C...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2826 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/970/ ---
[GitHub] carbondata issue #2829: [CARBONDATA-3025]add more metadata in carbon file fo...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2829 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/969/ ---
[GitHub] carbondata issue #2830: [CARBONDATA-3025]Added CLI enhancements
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2830 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/968/ ---
[GitHub] carbondata issue #2841: [WIP] Unsafe fallback to heap and unsafe query fix
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2841 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/967/ ---
[GitHub] carbondata issue #2845: [WIP] Rand function issue
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/2845 Can you explain why this PR is `Rand` related? I just cannot find any code changes related to `Rand`. ---
[GitHub] carbondata issue #2845: [WIP] Rand function issue
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2845 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/966/ ---
[GitHub] carbondata issue #2823: [CARBONDATA-3015] Support Lazy load in carbon vector
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2823 Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9232/ ---
[GitHub] carbondata issue #2826: [CARBONDATA-3023] Alter add column issue with SORT_C...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2826 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1176/ ---
[GitHub] carbondata pull request #2751: [CARBONDATA-2946] Add bloomindex version info...
Github user xuchuanyin closed the pull request at: https://github.com/apache/carbondata/pull/2751 ---
[GitHub] carbondata issue #2826: [CARBONDATA-3023] Alter add column issue with SORT_C...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2826 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9231/ ---
[GitHub] carbondata issue #2830: [CARBONDATA-3025]Added CLI enhancements
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2830 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1178/ ---
[GitHub] carbondata issue #2829: [CARBONDATA-3025]add more metadata in carbon file fo...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2829 Build Success with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9230/ ---
[GitHub] carbondata issue #2830: [CARBONDATA-3025]Added CLI enhancements
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2830 Build Failed with Spark 2.3.1, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/9229/ ---
[GitHub] carbondata pull request #2847: [WIP]Support Gzip as column compressor
GitHub user shardul-cr7 opened a pull request: https://github.com/apache/carbondata/pull/2847

[WIP]Support Gzip as column compressor

Gzip compresses to a smaller file size than Snappy but takes more load time. Data generated by tpch-dbgen (lineitem).

**Load Performance Comparisons (Compression)**

*Test Case 1* *File Size 3.9G* *Records ~30M*

| Codec Used | Load Time | File Size After Load |
| -- | -- | -- |
| Snappy | 156s | 101M |
| Zstd | 153s | 2.2M |
| Gzip | 163s | 12.1M |

*Test Case 2* *File Size 7.8G* *Records ~60M*

| Codec Used | Load Time | File Size After Load |
| -- | -- | -- |
| Snappy | 336s | 203.6M |
| Zstd | 352s | 4.3M |
| Gzip | 354s | 12.1M |

**Query Performance (Decompression)**

*Test Case 1*

| Codec Used | Full Scan Time |
| -- | -- |
| Snappy | 16.108s |
| Zstd | 14.595s |
| Gzip | 14.313s |

*Test Case 2*

| Codec Used | Full Scan Time |
| -- | -- |
| Snappy | 23.559s |
| Zstd | 23.913s |
| Gzip | 26.741s |

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:

- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [x] Testing done: added some test cases
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shardul-cr7/carbondata b010

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2847.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #2847

commit 6ad88ccc5663353d16372d91878d7efb223b16d6
Author: shardul-cr7
Date: 2018-10-23T11:57:47Z

[WIP]Support Gzip

---
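For context on how the codec above would be selected: CarbonData already routes the compressor choice through the carbon.column.compressor property, which the existing snappy and zstd codecs use. Assuming this PR registers "gzip" as an accepted value for that property (an assumption, since the PR is still WIP and may wire it differently), enabling it programmatically would look roughly like:

```java
import org.apache.carbondata.core.util.CarbonProperties;

public class EnableGzipCompressor {
  public static void main(String[] args) {
    // Assumption: "gzip" becomes a valid value once this PR merges; the
    // currently accepted values are built-in codecs such as "snappy" and "zstd".
    CarbonProperties.getInstance()
        .addProperty("carbon.column.compressor", "gzip");
  }
}
```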
[GitHub] carbondata issue #2806: [CARBONDATA-2998] Refresh column schema for old stor...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2806 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1174/ ---
[GitHub] carbondata issue #2845: [WIP] Rand function issue
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/2845 What is this for? ---
[GitHub] carbondata issue #2842: [CARBONDATA-3032] Remove carbon.blocklet.size from p...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/2842 For 'carbon.blocklet.size', why not remove this property from the code now? ---
[GitHub] carbondata pull request #2842: [CARBONDATA-3032] Remove carbon.blocklet.size...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2842#discussion_r227356312

--- Diff: docs/sdk-guide.md ---
@@ -24,7 +24,8 @@ CarbonData provides SDK to facilitate

# SDK Writer

-In the carbon jars package, there exist a carbondata-store-sdk-x.x.x-SNAPSHOT.jar, including SDK writer and reader.
+In the carbon jars package, there exist a carbondata-store-sdk-x.x.x-SNAPSHOT.jar, including SDK writer and reader.
+If you want to use SDK, it needs other carbon jar or you can use carbondata-sdk.jar.
--- End diff --

What does this mean? The user can either:
1. use only the SDK jar, or
2. use other carbon jars instead (SDK jar not included?)

Is my understanding correct? ---
[GitHub] carbondata issue #2751: [CARBONDATA-2946] Add bloomindex version info file f...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/2751 Since we will leave this problem as it is, I'll close this PR now. ---
[GitHub] carbondata pull request #2824: [CARBONDATA-3008] Optimize default value for ...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2824#discussion_r227352023

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/datasources/SparkCarbonTableFormat.scala ---
@@ -172,33 +172,8 @@ with Serializable {
       dataSchema: StructType,
       context: TaskAttemptContext): OutputWriter = {
     val model = CarbonTableOutputFormat.getLoadModel(context.getConfiguration)
-    val isCarbonUseMultiDir = CarbonProperties.getInstance().isUseMultiTempDir
-    var storeLocation: Array[String] = Array[String]()
-    val isCarbonUseLocalDir = CarbonProperties.getInstance()
-      .getProperty("carbon.use.local.dir", "false").equalsIgnoreCase("true")
-
     val taskNumber = generateTaskNumber(path, context, model.getSegmentId)
-    val tmpLocationSuffix =
-      File.separator + "carbon" + System.nanoTime() + File.separator + taskNumber
-    if (isCarbonUseLocalDir) {
-      val yarnStoreLocations = Util.getConfiguredLocalDirs(SparkEnv.get.conf)
-      if (!isCarbonUseMultiDir && null != yarnStoreLocations && yarnStoreLocations.nonEmpty) {
-        // use single dir
-        storeLocation = storeLocation :+
-          (yarnStoreLocations(Random.nextInt(yarnStoreLocations.length)) + tmpLocationSuffix)
-        if (storeLocation == null || storeLocation.isEmpty) {
-          storeLocation = storeLocation :+
-            (System.getProperty("java.io.tmpdir") + tmpLocationSuffix)
-        }
-      } else {
-        // use all the yarn dirs
-        storeLocation = yarnStoreLocations.map(_ + tmpLocationSuffix)
-      }
-    } else {
-      storeLocation =
-        storeLocation :+ (System.getProperty("java.io.tmpdir") + tmpLocationSuffix)
-    }
+    val storeLocation = CommonUtil.getTempStoreLocations(taskNumber)
--- End diff --

Yes, I also noticed this. The suffix has no meaning; it is just used to separate each thread's output. I think it was a hand-made mistake -- that is why we need to extract this code, to avoid problems like this. ---
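To make the extraction concrete, here is a rough, self-contained sketch of the logic that the new CommonUtil.getTempStoreLocations call consolidates. It is transliterated into Java from the removed Scala block above; the class name and explicit parameters are hypothetical, not the actual CommonUtil signature:

```java
import java.io.File;
import java.util.Random;

public final class TempStoreLocator {
  private TempStoreLocator() { }

  // Mirrors the removed block: pick one random local dir, all local dirs,
  // or java.io.tmpdir as the fallback, each with the same per-task suffix.
  public static String[] getTempStoreLocations(String taskNumber, String[] localDirs,
      boolean useLocalDir, boolean useMultiDir) {
    String suffix = File.separator + "carbon" + System.nanoTime()
        + File.separator + taskNumber;
    if (useLocalDir && localDirs != null && localDirs.length > 0) {
      if (useMultiDir) {
        String[] locations = new String[localDirs.length];
        for (int i = 0; i < localDirs.length; i++) {
          locations[i] = localDirs[i] + suffix;
        }
        return locations;
      }
      String dir = localDirs[new Random().nextInt(localDirs.length)];
      return new String[] { dir + suffix };
    }
    return new String[] { System.getProperty("java.io.tmpdir") + suffix };
  }
}
```

With the suffix built in exactly one place, every writer thread derives its temp path the same way, which is the copy-paste risk the reviewer is pointing at.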