[GitHub] carbondata pull request #2848: [CARBONDATA-3036] Cache Columns And Refresh T...

2018-10-23 Thread manishnalla1994
GitHub user manishnalla1994 opened a pull request:

https://github.com/apache/carbondata/pull/2848

[CARBONDATA-3036] Cache Columns And Refresh Table Issue Fix

Refresh Table issue: the Refresh Table command was acting in a case-sensitive manner.
Cache Columns issue: results were inconsistent when the cached columns are set but the
min/max limit is exceeded and the columns are dictionary-excluded.

Fix 1: the path for the carbon file was being built from the table name exactly as
given in the query (lowercase/uppercase). It is now normalized to lowercase.
Fix 2: the MinMaxFlag array was not set according to the columns to be cached,
giving inconsistent results. It is now built to match the min/max values array for
exactly the columns given in Cache Columns. A sketch of Fix 1 follows below.
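
For illustration, a minimal sketch of the idea behind Fix 1 (the class and method
names here are hypothetical, not the actual patch):

```java
import java.io.File;
import java.util.Locale;

// Hypothetical illustration of Fix 1: normalize the table name from the query
// to lowercase before building the table path, so REFRESH TABLE is no longer
// case sensitive.
public class TablePathExample {
  static String resolveTablePath(String databaseLocation, String queryTableName) {
    return databaseLocation + File.separator + queryTableName.toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) {
    // Both spellings resolve to the same on-disk path.
    System.out.println(resolveTablePath("/warehouse/db", "MyTable"));
    System.out.println(resolveTablePath("/warehouse/db", "mytable"));
  }
}
```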

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [x] Testing done
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishnalla1994/carbondata 
RefreshAndCacheColumnsFix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2848.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2848


commit 7158960c750cf6ed7243e1c7c4bbc44fe158326c
Author: Manish Nalla 
Date:   2018-10-24T05:45:15Z

CacheAndRefreshIsuueFix




---


[GitHub] carbondata issue #2845: [CARBONDATA-3039] Fix Custom Deterministic Expressio...

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2845
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/981/



---


[jira] [Created] (CARBONDATA-3039) Fix Custom Deterministic Expression for rand() UDF

2018-10-23 Thread Indhumathi Muthumurugesh (JIRA)
Indhumathi Muthumurugesh created CARBONDATA-3039:


 Summary: Fix Custom Deterministic Expression for rand() UDF
 Key: CARBONDATA-3039
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3039
 Project: CarbonData
  Issue Type: Improvement
Reporter: Indhumathi Muthumurugesh








[GitHub] carbondata issue #2814: [WIP][CARBONDATA-3001] configurable page size in MB

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2814
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1193/



---


[GitHub] carbondata pull request #2842: [CARBONDATA-3032] Remove carbon.blocklet.size...

2018-10-23 Thread xubo245
Github user xubo245 commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2842#discussion_r227635359
  
--- Diff: docs/sdk-guide.md ---
@@ -24,7 +24,8 @@ CarbonData provides SDK to facilitate
 
 # SDK Writer
 
-In the carbon jars package, there exist a carbondata-store-sdk-x.x.x-SNAPSHOT.jar, including SDK writer and reader.
+In the carbon jars package, there exist a carbondata-store-sdk-x.x.x-SNAPSHOT.jar, including SDK writer and reader.
+If you want to use SDK, it needs other carbon jar or you can use carbondata-sdk.jar.
--- End diff --

 Using carbondata-store-sdk-x.x.x-SNAPSHOT.jar alone is not enough; it needs 
the other carbon jars as well.

But if the user uses carbondata-sdk.jar, that alone is enough.



---


[GitHub] carbondata issue #2814: [WIP][CARBONDATA-3001] configurable page size in MB

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2814
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9246/



---


[jira] [Resolved] (CARBONDATA-3008) make yarn-local and multiple dir for temp data enable by default

2018-10-23 Thread Jacky Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li resolved CARBONDATA-3008.
--
   Resolution: Fixed
Fix Version/s: 1.5.1

> make yarn-local and multiple dir for temp data enable by default
> 
>
> Key: CARBONDATA-3008
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3008
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: xuchuanyin
>Priority: Minor
> Fix For: 1.5.1
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> About a year ago, we introduced 'multiple dirs for temp data during data 
> loading' to solve the disk hotspot problem. After about one year of usage in 
> production environments, this feature has proven effective and correct. So 
> here I propose to enable the related parameters by default. The related 
> parameters are:
> `carbon.use.local.dir` : currently `false` by default; we will turn it 
> to `true` by default;
> `carbon.user.multiple.dir` : currently `false` by default; we will turn 
> it to `true` by default.
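
For reference, a minimal sketch of how a user enables these two flags explicitly
today via CarbonProperties (the property keys are the ones from the description
above; once the defaults change, these calls become unnecessary):

```java
import org.apache.carbondata.core.util.CarbonProperties;

// Illustration only: enabling the two flags explicitly, which is what users
// have to do before this change. After the change, true is the default.
public class TempDirDefaultsExample {
  public static void main(String[] args) {
    CarbonProperties props = CarbonProperties.getInstance();
    props.addProperty("carbon.use.local.dir", "true");      // write temp data to YARN local dirs
    props.addProperty("carbon.user.multiple.dir", "true");  // spread temp data across all local dirs
  }
}
```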





[GitHub] carbondata pull request #2824: [CARBONDATA-3008] Optimize default value for ...

2018-10-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2824


---


[GitHub] carbondata issue #2824: [CARBONDATA-3008] Optimize default value for multipl...

2018-10-23 Thread jackylk
Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/2824
  
LGTM


---


[GitHub] carbondata issue #2814: [WIP][CARBONDATA-3001] configurable page size in MB

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2814
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/980/



---


[GitHub] carbondata pull request #2843: [CARBONDATA-3034] Carding parameters,Organize...

2018-10-23 Thread ndwangsen
Github user ndwangsen commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2843#discussion_r227626868
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 ---
@@ -23,86 +23,26 @@
 import org.apache.carbondata.core.util.CarbonProperty;
 
 public final class CarbonCommonConstants {
-  /**
-   * surrogate value of null
-   */
-  public static final int DICT_VALUE_NULL = 1;
-  /**
-   * surrogate value of null for direct dictionary
-   */
-  public static final int DIRECT_DICT_VALUE_NULL = 1;
-  /**
-   * integer size in bytes
-   */
-  public static final int INT_SIZE_IN_BYTE = 4;
-  /**
-   * short size in bytes
-   */
-  public static final int SHORT_SIZE_IN_BYTE = 2;
-  /**
-   * DOUBLE size in bytes
-   */
-  public static final int DOUBLE_SIZE_IN_BYTE = 8;
-  /**
-   * LONG size in bytes
-   */
-  public static final int LONG_SIZE_IN_BYTE = 8;
-  /**
-   * byte to KB conversion factor
-   */
-  public static final int BYTE_TO_KB_CONVERSION_FACTOR = 1024;
-  /**
-   * BYTE_ENCODING
-   */
-  public static final String BYTE_ENCODING = "ISO-8859-1";
-  /**
-   * measure meta data file name
-   */
-  public static final String MEASURE_METADATA_FILE_NAME = "/msrMetaData_";
-
-  /**
-   * set the segment ids to query from the table
-   */
-  public static final String CARBON_INPUT_SEGMENTS = 
"carbon.input.segments.";
-
-  /**
-   * key prefix for set command. 
'carbon.datamap.visible.dbName.tableName.dmName = false' means
-   * that the query on 'dbName.table' will not use the datamap 'dmName'
-   */
-  @InterfaceStability.Unstable
-  public static final String CARBON_DATAMAP_VISIBLE = 
"carbon.datamap.visible.";
-
-  /**
-   * Fetch and validate the segments.
-   * Used for aggregate table load as segment validation is not required.
-   */
-  public static final String VALIDATE_CARBON_INPUT_SEGMENTS = 
"validate.carbon.input.segments.";
 
+  private CarbonCommonConstants() {
+  }
   /**
--- End diff --

 OK, I will modify it.


---


[jira] [Updated] (CARBONDATA-3038) Refactor dynamic configuration

2018-10-23 Thread Jacky Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li updated CARBONDATA-3038:
-
Fix Version/s: 1.5.1
  Description: 
Refactor dynamic configuration for carbon:
1. Decide and collect all dynamic configurations which can be SET in a carbondata 
application, for example in beeline.
2. For every dynamic configuration, use an annotation to tag it (re-use the 
CarbonProperty annotation but change its name). This annotation should be used 
for validation when the user invokes the SET command.
  Summary: Refactor dynamic configuration  (was: Refactor dynamic 
confiugration)

> Refactor dynamic configuration
> --
>
> Key: CARBONDATA-3038
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3038
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Jacky Li
>Priority: Major
> Fix For: 1.5.1
>
>
> Refactor dynamic configuration for carbon:
> 1. Decide and collect all dynamic configurations which can be SET in a 
> carbondata application, for example in beeline.
> 2. For every dynamic configuration, use an annotation to tag it (re-use 
> the CarbonProperty annotation but change its name). This annotation should be 
> used for validation when the user invokes the SET command.
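
For illustration, a minimal self-contained sketch of the tagging idea in point 2
(all names here, including the DynamicConfiguration annotation, are hypothetical
and not the final CarbonData API):

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Illustrative only: an annotation marking properties that may be changed at
// runtime via SET, plus a reflective check usable for SET command validation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface DynamicConfiguration {
}

class ConstantsExample {
  @DynamicConfiguration
  public static final String ENABLE_UNSAFE = "enable.unsafe.sort";

  // Not annotated: cannot be changed via SET.
  public static final String STORE_LOCATION = "carbon.storelocation";
}

public class SetCommandValidationExample {
  // Returns true if some String constant carrying this property key is
  // annotated as dynamically configurable.
  static boolean isDynamicallySettable(String key) throws IllegalAccessException {
    for (Field field : ConstantsExample.class.getFields()) {
      if (field.getType() == String.class
          && key.equals(field.get(null))
          && field.isAnnotationPresent(DynamicConfiguration.class)) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) throws IllegalAccessException {
    System.out.println(isDynamicallySettable("enable.unsafe.sort"));   // true
    System.out.println(isDynamicallySettable("carbon.storelocation")); // false
  }
}
```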





[jira] [Created] (CARBONDATA-3038) Refactor dynamic confiugration

2018-10-23 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-3038:


 Summary: Refactor dynamic confiugration
 Key: CARBONDATA-3038
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3038
 Project: CarbonData
  Issue Type: Improvement
Reporter: Jacky Li








[GitHub] carbondata pull request #2824: [CARBONDATA-3008] Optimize default value for ...

2018-10-23 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2824#discussion_r227625617
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/datasources/SparkCarbonTableFormat.scala
 ---
@@ -172,33 +172,8 @@ with Serializable {
   dataSchema: StructType,
   context: TaskAttemptContext): OutputWriter = {
 val model = CarbonTableOutputFormat.getLoadModel(context.getConfiguration)
-val isCarbonUseMultiDir = CarbonProperties.getInstance().isUseMultiTempDir
-var storeLocation: Array[String] = Array[String]()
-val isCarbonUseLocalDir = CarbonProperties.getInstance()
-  .getProperty("carbon.use.local.dir", "false").equalsIgnoreCase("true")
-
-
 val taskNumber = generateTaskNumber(path, context, model.getSegmentId)
-val tmpLocationSuffix =
-  File.separator + "carbon" + System.nanoTime() + File.separator + taskNumber
-if (isCarbonUseLocalDir) {
-  val yarnStoreLocations = Util.getConfiguredLocalDirs(SparkEnv.get.conf)
-  if (!isCarbonUseMultiDir && null != yarnStoreLocations && yarnStoreLocations.nonEmpty) {
-    // use single dir
-    storeLocation = storeLocation :+
-      (yarnStoreLocations(Random.nextInt(yarnStoreLocations.length)) + tmpLocationSuffix)
-    if (storeLocation == null || storeLocation.isEmpty) {
-      storeLocation = storeLocation :+
-        (System.getProperty("java.io.tmpdir") + tmpLocationSuffix)
-    }
-  } else {
-    // use all the yarn dirs
-    storeLocation = yarnStoreLocations.map(_ + tmpLocationSuffix)
-  }
-} else {
-  storeLocation =
-    storeLocation :+ (System.getProperty("java.io.tmpdir") + tmpLocationSuffix)
-}
+val storeLocation = CommonUtil.getTempStoreLocations(taskNumber)
--- End diff --

@jackylk @QiangCai 
I've debugged and reviewed the code again and found it works as expected: 
all the temp locations were cleared.

The `TempStoreLocations` generated at the beginning of data loading are exactly 
the same as those at the close of `CarbonTableOutputFormat`, where these 
locations are cleared.


---


[GitHub] carbondata pull request #2843: [CARBONDATA-3034] Carding parameters,Organize...

2018-10-23 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2843#discussion_r227624952
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 ---
@@ -23,86 +23,26 @@
 import org.apache.carbondata.core.util.CarbonProperty;
 
 public final class CarbonCommonConstants {
-  /**
-   * surrogate value of null
-   */
-  public static final int DICT_VALUE_NULL = 1;
-  /**
-   * surrogate value of null for direct dictionary
-   */
-  public static final int DIRECT_DICT_VALUE_NULL = 1;
-  /**
-   * integer size in bytes
-   */
-  public static final int INT_SIZE_IN_BYTE = 4;
-  /**
-   * short size in bytes
-   */
-  public static final int SHORT_SIZE_IN_BYTE = 2;
-  /**
-   * DOUBLE size in bytes
-   */
-  public static final int DOUBLE_SIZE_IN_BYTE = 8;
-  /**
-   * LONG size in bytes
-   */
-  public static final int LONG_SIZE_IN_BYTE = 8;
-  /**
-   * byte to KB conversion factor
-   */
-  public static final int BYTE_TO_KB_CONVERSION_FACTOR = 1024;
-  /**
-   * BYTE_ENCODING
-   */
-  public static final String BYTE_ENCODING = "ISO-8859-1";
-  /**
-   * measure meta data file name
-   */
-  public static final String MEASURE_METADATA_FILE_NAME = "/msrMetaData_";
-
-  /**
-   * set the segment ids to query from the table
-   */
-  public static final String CARBON_INPUT_SEGMENTS = 
"carbon.input.segments.";
-
-  /**
-   * key prefix for set command. 
'carbon.datamap.visible.dbName.tableName.dmName = false' means
-   * that the query on 'dbName.table' will not use the datamap 'dmName'
-   */
-  @InterfaceStability.Unstable
-  public static final String CARBON_DATAMAP_VISIBLE = 
"carbon.datamap.visible.";
-
-  /**
-   * Fetch and validate the segments.
-   * Used for aggregate table load as segment validation is not required.
-   */
-  public static final String VALIDATE_CARBON_INPUT_SEGMENTS = 
"validate.carbon.input.segments.";
 
+  private CarbonCommonConstants() {
+  }
   /**
--- End diff --

 To make this description easier to find, I suggest using:
```
////////////////////////////////////////////////////////////////
// System level property starts here
////////////////////////////////////////////////////////////////

// System level property is the global property for the CarbonData
// application; these properties are stored in a singleton instance
// so that all processing logic in CarbonData uses the same
// property value
```

And you can write similar descriptions for the table level properties and the others.


---


[jira] [Resolved] (CARBONDATA-3030) Remove no use parameter in test case

2018-10-23 Thread Jacky Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li resolved CARBONDATA-3030.
--
   Resolution: Fixed
Fix Version/s: 1.5.1

> Remove no use parameter in test case
> 
>
> Key: CARBONDATA-3030
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3030
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: xubo245
>Assignee: xubo245
>Priority: Major
> Fix For: 1.5.1
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Remove unused parameters in test cases:
> 1. remove the persistSchema parameter in SDK test cases
> 2. remove the isTransactional parameter in SDK test cases,
>  because https://github.com/apache/carbondata/pull/2749 removed the parameter 
> in the SDK carbonWriter





[GitHub] carbondata pull request #2839: [CARBONDATA-3030] Remove no use parameter in ...

2018-10-23 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/2839


---


[GitHub] carbondata issue #2839: [CARBONDATA-3030] Remove no use parameter in test ca...

2018-10-23 Thread jackylk
Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/2839
  
LGTM, the coverage decrease of 0.005% is very small for the whole project


---


[GitHub] carbondata pull request #2836: [CARBONDATA-3027] Increase unsafe working mem...

2018-10-23 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2836#discussion_r227623050
  
--- Diff: store/sdk/src/main/resources/log4j.properties ---
@@ -0,0 +1,11 @@
+# Root logger option
+log4j.rootLogger=INFO,stdout
+
+
+# Redirect log messages to console
+log4j.appender.debug=org.apache.log4j.RollingFileAppender
+log4j.appender.stdout=org.apache.log4j.ConsoleAppender
+log4j.appender.stdout.Target=System.out
+log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
+log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
--- End diff --

Now we are using the log4j Logger directly; you can also add %C in the pattern 
to print the caller's class name, for example:
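
A possible pattern (illustrative only; note that %C derives the caller class from
the call stack, so it is more expensive than %c):

```
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %C{1}:%L - %m%n
```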


---


[GitHub] carbondata pull request #2836: [CARBONDATA-3027] Increase unsafe working mem...

2018-10-23 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2836#discussion_r227622952
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 ---
@@ -1234,7 +1234,7 @@
 
   @CarbonProperty
   public static final String UNSAFE_WORKING_MEMORY_IN_MB = 
"carbon.unsafe.working.memory.in.mb";
-  public static final String UNSAFE_WORKING_MEMORY_IN_MB_DEFAULT = "512";
+  public static final String UNSAFE_WORKING_MEMORY_IN_MB_DEFAULT = "1024";
--- End diff --

Can you describe the issue you encountered using the original default value?


---


[GitHub] carbondata issue #2836: [CARBONDATA-3027] Increase unsafe working memory def...

2018-10-23 Thread jackylk
Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/2836
  
Can you describe an issue you encountered using the original default value?


---


[GitHub] carbondata pull request #2823: [CARBONDATA-3015] Support Lazy load in carbon...

2018-10-23 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2823#discussion_r227621527
  
--- Diff: 
integration/spark-datasource/src/main/spark2.1andspark2.2/org/apache/spark/sql/CarbonVectorProxy.java
 ---
@@ -150,127 +140,189 @@ public void reset() {
 columnarBatch.reset();
 }
 
-        public void putRowToColumnBatch(int rowId, Object value, int offset) {
-            org.apache.spark.sql.types.DataType t = dataType(offset);
-            if (null == value) {
-                putNull(rowId, offset);
-            } else {
-                if (t == org.apache.spark.sql.types.DataTypes.BooleanType) {
-                    putBoolean(rowId, (boolean) value, offset);
-                } else if (t == org.apache.spark.sql.types.DataTypes.ByteType) {
-                    putByte(rowId, (byte) value, offset);
-                } else if (t == org.apache.spark.sql.types.DataTypes.ShortType) {
-                    putShort(rowId, (short) value, offset);
-                } else if (t == org.apache.spark.sql.types.DataTypes.IntegerType) {
-                    putInt(rowId, (int) value, offset);
-                } else if (t == org.apache.spark.sql.types.DataTypes.LongType) {
-                    putLong(rowId, (long) value, offset);
-                } else if (t == org.apache.spark.sql.types.DataTypes.FloatType) {
-                    putFloat(rowId, (float) value, offset);
-                } else if (t == org.apache.spark.sql.types.DataTypes.DoubleType) {
-                    putDouble(rowId, (double) value, offset);
-                } else if (t == org.apache.spark.sql.types.DataTypes.StringType) {
-                    UTF8String v = (UTF8String) value;
-                    putByteArray(rowId, v.getBytes(), offset);
-                } else if (t instanceof org.apache.spark.sql.types.DecimalType) {
-                    DecimalType dt = (DecimalType) t;
-                    Decimal d = Decimal.fromDecimal(value);
-                    if (dt.precision() <= Decimal.MAX_INT_DIGITS()) {
-                        putInt(rowId, (int) d.toUnscaledLong(), offset);
-                    } else if (dt.precision() <= Decimal.MAX_LONG_DIGITS()) {
-                        putLong(rowId, d.toUnscaledLong(), offset);
-                    } else {
-                        final BigInteger integer = d.toJavaBigDecimal().unscaledValue();
-                        byte[] bytes = integer.toByteArray();
-                        putByteArray(rowId, bytes, 0, bytes.length, offset);
+
+        public static class ColumnVectorProxy {
+
+            private ColumnVector vector;
+
+            public ColumnVectorProxy(ColumnarBatch columnarBatch, int ordinal) {
+                this.vector = columnarBatch.column(ordinal);
+            }
+
+            public void putRowToColumnBatch(int rowId, Object value, int offset) {
+                org.apache.spark.sql.types.DataType t = dataType(offset);
--- End diff --

It seems the offset param is not used in dataType; also, please change the 
function name of dataType.


---


[GitHub] carbondata pull request #2823: [CARBONDATA-3015] Support Lazy load in carbon...

2018-10-23 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2823#discussion_r227620816
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/scanner/LazyBlockletLoad.java
 ---
@@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.scan.scanner;
+
+import java.io.IOException;
+
+import org.apache.carbondata.core.datastore.FileReader;
+import org.apache.carbondata.core.datastore.chunk.AbstractRawColumnChunk;
+import 
org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk;
+import 
org.apache.carbondata.core.datastore.chunk.impl.MeasureRawColumnChunk;
+import org.apache.carbondata.core.scan.executor.infos.BlockExecutionInfo;
+import org.apache.carbondata.core.scan.processor.RawBlockletColumnChunks;
+import org.apache.carbondata.core.stats.QueryStatistic;
+import org.apache.carbondata.core.stats.QueryStatisticsConstants;
+import org.apache.carbondata.core.stats.QueryStatisticsModel;
+
+/**
+ * Reads the blocklet column chunks lazily, it means it reads the column 
chunks from disk when
+ * execution engine wants to access it.
+ * It is useful in case of filter queries with high cardinality columns.
+ */
+public class LazyBlockletLoad {
+
+  private RawBlockletColumnChunks rawBlockletColumnChunks;
+
+  private BlockExecutionInfo blockExecutionInfo;
+
+  private LazyChunkWrapper[] dimLazyWrapperChunks;
+
+  private LazyChunkWrapper[] msrLazyWrapperChunks;
--- End diff --

can we unify the processing of `dimLazyWrapperChunks` and 
`msrLazyWrapperChunks` so that we can use one flow for them?


---


[GitHub] carbondata pull request #2823: [CARBONDATA-3015] Support Lazy load in carbon...

2018-10-23 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2823#discussion_r227620665
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/result/vector/impl/directread/AbstractCarbonColumnarVector.java
 ---
@@ -130,4 +131,9 @@ public CarbonColumnVector getDictionaryVector() {
   public void convert() {
 // Do nothing
   }
+
+  @Override
+  public void setLazyPage(LazyPageLoad lazyPage) {
+throw new UnsupportedOperationException("Not allowed from here");
--- End diff --

Put the class name in the message, it is easier for debugging


---


[GitHub] carbondata pull request #2823: [CARBONDATA-3015] Support Lazy load in carbon...

2018-10-23 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2823#discussion_r227620364
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/result/BlockletScannedResult.java
 ---
@@ -145,6 +147,8 @@
 
   protected QueryStatisticsModel queryStatisticsModel;
 
+  protected LazyBlockletLoad lazyBlockletLoad;
--- End diff --

Actually I am confused by the name xxxLoad: why is it called load? I am 
wondering whether there is a common name in Presto for this technique?

---


[GitHub] carbondata pull request #2823: [CARBONDATA-3015] Support Lazy load in carbon...

2018-10-23 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2823#discussion_r227620200
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java
 ---
@@ -48,7 +48,11 @@ public int getLengthFromBuffer(ByteBuffer buffer) {
   public static AbstractNonDictionaryVectorFiller getVectorFiller(DataType type, int lengthSize,
       int numberOfRows) {
     if (type == DataTypes.STRING) {
-      return new StringVectorFiller(lengthSize, numberOfRows);
+      if (lengthSize > 2) {
--- End diff --

2 is a magic number; can you change it to a constant or add a function to make 
it more readable? For example:
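
For illustration, a sketch of what a named constant could look like (the constant
name is hypothetical):

```java
// Illustrative refactor of the magic number: 2 is the width in bytes of the
// short value that stores each entry's length.
public class LengthSizeExample {
  private static final int SHORT_LENGTH_SIZE_IN_BYTES = 2;

  static boolean isLengthStoredAsInt(int lengthSize) {
    return lengthSize > SHORT_LENGTH_SIZE_IN_BYTES;
  }

  public static void main(String[] args) {
    System.out.println(isLengthStoredAsInt(4)); // true: length stored as int
    System.out.println(isLengthStoredAsInt(2)); // false: length stored as short
  }
}
```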


---


[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...

2018-10-23 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2819#discussion_r227619567
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/model/QueryModel.java ---
@@ -124,6 +124,11 @@
 
   private boolean preFetchData = true;
 
+  /**
+   * It fills the vector directly from decoded column page with out any 
staging and conversions
--- End diff --

"It fills the vector", can you give more detail for which vector? and 
describe how spark/presto is integrated with this?


---


[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...

2018-10-23 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2819#discussion_r227619641
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/result/BlockletScannedResult.java
 ---
@@ -72,6 +72,11 @@
*/
   private int[] pageFilteredRowCount;
 
+  /**
+   * Filtered pages to be decoded and loaded to vector.
+   */
+  private int[] pagesFiltered;
--- End diff --

```suggestion
  private int[] pagesIdFiltered;
```


---


[GitHub] carbondata issue #2814: [WIP][CARBONDATA-3001] configurable page size in MB

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2814
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/979/



---


[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...

2018-10-23 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2819#discussion_r227619247
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/executor/impl/AbstractQueryExecutor.java
 ---
@@ -478,6 +478,17 @@ private BlockExecutionInfo 
getBlockExecutionInfoForBlock(QueryModel queryModel,
 } else {
   blockExecutionInfo.setPrefetchBlocklet(queryModel.isPreFetchData());
 }
+// In case of fg datamap it should not go to direct fill.
+boolean fgDataMapPathPresent = false;
+for (TableBlockInfo blockInfo : queryModel.getTableBlockInfos()) {
+  fgDataMapPathPresent = blockInfo.getDataMapWriterPath() != null;
+  if (fgDataMapPathPresent) {
+break;
--- End diff --

Is it possible to set queryModel.setDirectVectorFill directly?


---


[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...

2018-10-23 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2819#discussion_r227619046
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/scan/collector/impl/DictionaryBasedVectorResultCollector.java
 ---
@@ -198,4 +219,48 @@ void fillColumnVectorDetails(CarbonColumnarBatch 
columnarBatch, int rowCounter,
 }
   }
 
+  private void collectResultInColumnarBatchDirect(BlockletScannedResult 
scannedResult,
--- End diff --

add comment for this function


---


[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...

2018-10-23 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2819#discussion_r227618801
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveIntegralCodec.java
 ---
@@ -248,6 +269,143 @@ public double decodeDouble(float value) {
 public double decodeDouble(double value) {
   throw new RuntimeException("internal error: " + debugInfo());
 }
+
+    @Override
+    public void decodeAndFillVector(ColumnPage columnPage, ColumnVectorInfo vectorInfo) {
+      CarbonColumnVector vector = vectorInfo.vector;
+      BitSet nullBits = columnPage.getNullBits();
+      DataType vectorDataType = vector.getType();
+      DataType pageDataType = columnPage.getDataType();
+      int pageSize = columnPage.getPageSize();
+      BitSet deletedRows = vectorInfo.deletedRows;
+      fillVector(columnPage, vector, vectorDataType, pageDataType, pageSize, vectorInfo);
+      if (deletedRows == null || deletedRows.isEmpty()) {
+        for (int i = nullBits.nextSetBit(0); i >= 0; i = nullBits.nextSetBit(i + 1)) {
+          vector.putNull(i);
+        }
+      }
+    }
+
+    private void fillVector(ColumnPage columnPage, CarbonColumnVector vector,
+        DataType vectorDataType, DataType pageDataType, int pageSize, ColumnVectorInfo vectorInfo) {
+      if (pageDataType == DataTypes.BOOLEAN || pageDataType == DataTypes.BYTE) {
+        byte[] byteData = columnPage.getBytePage();
+        if (vectorDataType == DataTypes.SHORT) {
+          for (int i = 0; i < pageSize; i++) {
+            vector.putShort(i, (short) byteData[i]);
+          }
+        } else if (vectorDataType == DataTypes.INT) {
+          for (int i = 0; i < pageSize; i++) {
+            vector.putInt(i, (int) byteData[i]);
+          }
+        } else if (vectorDataType == DataTypes.LONG) {
+          for (int i = 0; i < pageSize; i++) {
+            vector.putLong(i, byteData[i]);
+          }
+        } else if (vectorDataType == DataTypes.TIMESTAMP) {
+          for (int i = 0; i < pageSize; i++) {
+            vector.putLong(i, byteData[i] * 1000);
+          }
+        } else if (vectorDataType == DataTypes.BOOLEAN) {
+          vector.putBytes(0, pageSize, byteData, 0);
+
--- End diff --

remove empty line


---


[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...

2018-10-23 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2819#discussion_r227618507
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/ColumnPageDecoder.java
 ---
@@ -29,6 +31,12 @@
*/
   ColumnPage decode(byte[] input, int offset, int length) throws 
MemoryException, IOException;
 
+  /**
+   *  Apply decoding algorithm on input byte array and fill the vector 
here.
+   */
+  void decodeAndFillVector(byte[] input, int offset, int length, 
ColumnVectorInfo vectorInfo,
+  BitSet nullBits, boolean isLVEncoded) throws MemoryException, 
IOException;
--- End diff --

I feel it is not good to add `isLVEncoded` just for the LV-encoded case; can we 
pass a more generic parameter, since this is a common class for all decoders?


---


[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...

2018-10-23 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2819#discussion_r227618222
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/EncodingFactory.java
 ---
@@ -66,6 +66,14 @@ public abstract ColumnPageEncoder 
createEncoder(TableSpec.ColumnSpec columnSpec,
*/
   public ColumnPageDecoder createDecoder(List<Encoding> encodings,
       List<ByteBuffer> encoderMetas, String compressor) throws IOException {
+return createDecoder(encodings, encoderMetas, compressor, false);
+  }
+
+  /**
+   * Return new decoder based on encoder metadata read from file
--- End diff --

In the comment, can you describe what is the behavior when `fullVectorFill` 
is true?


---


[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...

2018-10-23 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2819#discussion_r227617936
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/ColumnPageEncoderMeta.java
 ---
@@ -49,6 +49,8 @@
   // Make it protected for RLEEncoderMeta
   protected String compressorName;
 
+  private transient boolean fillCompleteVector;
--- End diff --

add comment for this variable


---


[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...

2018-10-23 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2819#discussion_r227618017
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/VarLengthColumnPageBase.java
 ---
@@ -176,7 +179,7 @@ private static ColumnPage 
getDecimalColumnPage(TableSpec.ColumnSpec columnSpec,
 rowOffset.putInt(counter, offset);
 
 VarLengthColumnPageBase page;
-if (unsafe) {
+if (unsafe && !meta.isFillCompleteVector()) {
--- End diff --

Many places check like this; can we make a function for it, with a proper 
function name, to make it more readable? For example:
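
For illustration, a hedged sketch of such a helper (the name useUnsafeStorage is
hypothetical, not the actual patch):

```java
public class ColumnPageStorageExample {
  // Hypothetical helper wrapping the repeated `unsafe && !meta.isFillCompleteVector()`
  // check behind a descriptive name.
  static boolean useUnsafeStorage(boolean unsafeEnabled, boolean fillCompleteVector) {
    return unsafeEnabled && !fillCompleteVector;
  }

  public static void main(String[] args) {
    System.out.println(useUnsafeStorage(true, false)); // true: use off-heap page store
    System.out.println(useUnsafeStorage(true, true));  // false: decode directly into vector
  }
}
```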


---


[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...

2018-10-23 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2819#discussion_r227617725
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/page/SafeDecimalColumnPage.java
 ---
@@ -193,6 +193,30 @@ public void convertValue(ColumnPageValueConverter 
codec) {
 }
   }
 
+  @Override public byte[] getBytePage() {
--- End diff --

move Override to previous line


---


[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...

2018-10-23 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2819#discussion_r227617413
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java
 ---
@@ -0,0 +1,278 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.chunk.store.impl.safe;
+
+import java.nio.ByteBuffer;
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector;
+import org.apache.carbondata.core.util.ByteUtil;
+import org.apache.carbondata.core.util.DataTypeUtil;
+
+public abstract class AbstractNonDictionaryVectorFiller {
+
+  protected int lengthSize;
+  protected int numberOfRows;
+
+  public AbstractNonDictionaryVectorFiller(int lengthSize, int numberOfRows) {
+    this.lengthSize = lengthSize;
+    this.numberOfRows = numberOfRows;
+  }
+
+  public abstract void fillVector(byte[] data, CarbonColumnVector vector, ByteBuffer buffer);
+
+  public int getLengthFromBuffer(ByteBuffer buffer) {
+    return buffer.getShort();
+  }
+}
+
+class NonDictionaryVectorFillerFactory {
+
+  public static AbstractNonDictionaryVectorFiller getVectorFiller(DataType type, int lengthSize,
+      int numberOfRows) {
+    if (type == DataTypes.STRING) {
+      return new StringVectorFiller(lengthSize, numberOfRows);
+    } else if (type == DataTypes.VARCHAR) {
+      return new LongStringVectorFiller(lengthSize, numberOfRows);
+    } else if (type == DataTypes.TIMESTAMP) {
+      return new TimeStampVectorFiller(lengthSize, numberOfRows);
+    } else if (type == DataTypes.BOOLEAN) {
+      return new BooleanVectorFiller(lengthSize, numberOfRows);
+    } else if (type == DataTypes.SHORT) {
+      return new ShortVectorFiller(lengthSize, numberOfRows);
+    } else if (type == DataTypes.INT) {
+      return new IntVectorFiller(lengthSize, numberOfRows);
+    } else if (type == DataTypes.LONG) {
+      return new LongVectorFiller(lengthSize, numberOfRows);
+    } else {
+      throw new UnsupportedOperationException("Not supported datatype : " + type);
+    }
+
+  }
+
+}
+
+class StringVectorFiller extends AbstractNonDictionaryVectorFiller {
+
+  public StringVectorFiller(int lengthSize, int numberOfRows) {
+    super(lengthSize, numberOfRows);
+  }
+
+  @Override
+  public void fillVector(byte[] data, CarbonColumnVector vector, ByteBuffer buffer) {
+    // start position will be used to store the current data position
+    int startOffset = 0;
+    // as first position will be start from length of bytes as data is stored first in the memory
+    // block we need to skip first two bytes this is because first two bytes will be length of the
+    // data which we have to skip
+    int currentOffset = lengthSize;
+    ByteUtil.UnsafeComparer comparator = ByteUtil.UnsafeComparer.INSTANCE;
+    for (int i = 0; i < numberOfRows - 1; i++) {
+      buffer.position(startOffset);
+      startOffset += getLengthFromBuffer(buffer) + lengthSize;
+      int length = startOffset - (currentOffset);
+      if (comparator.equals(CarbonCommonConstants.MEMBER_DEFAULT_VAL_ARRAY, 0,
+          CarbonCommonConstants.MEMBER_DEFAULT_VAL_ARRAY.length, data, currentOffset, length)) {
+        vector.putNull(i);
+      } else {
+        vector.putByteArray(i, currentOffset, length, data);
+      }
+      currentOffset = startOffset + lengthSize;
+    }
+    // Handle last row
+    int length = (data.length - currentOffset);
+    if (comparator.equals(CarbonCommonConstants.MEMBER_DEFAULT_VAL_ARRAY, 0,

[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...

2018-10-23 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2819#discussion_r227617094
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java
 ---
@@ -0,0 +1,278 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.chunk.store.impl.safe;
+
+import java.nio.ByteBuffer;
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
+import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector;
+import org.apache.carbondata.core.util.ByteUtil;
+import org.apache.carbondata.core.util.DataTypeUtil;
+
+public abstract class AbstractNonDictionaryVectorFiller {
--- End diff --

For public class, please add interface annotation


---


[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...

2018-10-23 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2819#discussion_r227617004
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 ---
@@ -1845,6 +1845,18 @@
   public static final int CARBON_MINMAX_ALLOWED_BYTE_COUNT_MIN = 10;
   public static final int CARBON_MINMAX_ALLOWED_BYTE_COUNT_MAX = 1000;
 
+  /**
+   * When enabled complete row filters will be handled by carbon in case 
of vector.
+   * If it is disabled then only page level pruning will be done by carbon 
and row level filtering
+   * will be done by spark for vector.
+   * There is no change in flow for non-vector based queries.
--- End diff --

Can you also add in which cases it is suggested to set this to false, since 
the default is true?


---


[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...

2018-10-23 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2819#discussion_r227616799
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/chunk/impl/VariableLengthDimensionColumnPage.java
 ---
@@ -54,10 +75,15 @@ public VariableLengthDimensionColumnPage(byte[] 
dataChunks, int[] invertedIndex,
 }
 dataChunkStore = DimensionChunkStoreFactory.INSTANCE
 .getDimensionChunkStore(0, isExplicitSorted, numberOfRows, 
totalSize, dimStoreType,
-dictionary);
-dataChunkStore.putArray(invertedIndex, invertedIndexReverse, 
dataChunks);
+dictionary, vectorInfo != null);
+if (vectorInfo != null) {
+  dataChunkStore.fillVector(invertedIndex, invertedIndexReverse, 
dataChunks, vectorInfo);
+} else {
+  dataChunkStore.putArray(invertedIndex, invertedIndexReverse, 
dataChunks);
+}
   }
 
+
--- End diff --

remove this


---


[GitHub] carbondata pull request #2819: [CARBONDATA-3012] Added support for full scan...

2018-10-23 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2819#discussion_r227616722
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/chunk/impl/MeasureRawColumnChunk.java
 ---
@@ -105,6 +106,22 @@ public ColumnPage convertToColumnPageWithOutCache(int 
index) {
 }
   }
 
+  /**
+   * Convert raw data with specified page number processed to 
DimensionColumnDataChunk and fill the
+   * vector
+   *
+   * @param pageNumber page number to decode and fill the vector
+   * @param vectorInfo vector to be filled with column page
+   */
+  public void convertToColumnPageAndFillVector(int pageNumber, 
ColumnVectorInfo vectorInfo) {
+assert pageNumber < pagesCount;
+try {
+  chunkReader.decodeColumnPageAndFillVector(this, pageNumber, 
vectorInfo);
+} catch (IOException | MemoryException e) {
+  throw new RuntimeException(e);
--- End diff --

Why not throw e directly?


---


[GitHub] carbondata pull request #2847: [WIP]Support Gzip as column compressor

2018-10-23 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2847#discussion_r227615426
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java
 ---
@@ -0,0 +1,201 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.compression;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.DoubleBuffer;
+import java.nio.FloatBuffer;
+import java.nio.IntBuffer;
+import java.nio.LongBuffer;
+import java.nio.ShortBuffer;
+
+import org.apache.carbondata.core.util.ByteUtil;
+
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
+
+public class GzipCompressor implements Compressor {
+
+  public GzipCompressor() {
+  }
+
+  @Override public String getName() {
+return "gzip";
+  }
+
+  /*
+   * Method called for compressing the data and
+   * return a byte array
+   */
+  private byte[] compressData(byte[] data) {
+
+ByteArrayOutputStream bt = new ByteArrayOutputStream();
+try {
+  GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bt);
+  try {
+gzos.write(data);
+  } catch (IOException e) {
+e.printStackTrace();
+  } finally {
+gzos.close();
+  }
+} catch (IOException e) {
+  e.printStackTrace();
+}
+
+return bt.toByteArray();
--- End diff --

Why is `bt` still open?
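
For illustration, a try-with-resources sketch of compressData (an assumed rework,
not the actual patch): closing the gzip stream flushes its trailer into bt, and
ByteArrayOutputStream.close() itself is a no-op, so bt needs no explicit close.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;

public class GzipCompressExample {
  // Sketch: the gzip stream is always closed (flushing its trailer into bt),
  // and IO failures surface to the caller instead of being swallowed by
  // printStackTrace.
  static byte[] compressData(byte[] data) throws IOException {
    ByteArrayOutputStream bt = new ByteArrayOutputStream();
    try (GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bt)) {
      gzos.write(data);
    }
    return bt.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] compressed = compressData("hello carbondata".getBytes("UTF-8"));
    System.out.println(compressed.length + " bytes after gzip");
  }
}
```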


---


[GitHub] carbondata pull request #2847: [WIP]Support Gzip as column compressor

2018-10-23 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2847#discussion_r227615561
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java
 ---
@@ -0,0 +1,201 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.compression;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.DoubleBuffer;
+import java.nio.FloatBuffer;
+import java.nio.IntBuffer;
+import java.nio.LongBuffer;
+import java.nio.ShortBuffer;
+
+import org.apache.carbondata.core.util.ByteUtil;
+
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
+
+public class GzipCompressor implements Compressor {
+
+  public GzipCompressor() {
+  }
+
+  @Override public String getName() {
+return "gzip";
+  }
+
+  /*
+   * Method called for compressing the data and
+   * return a byte array
+   */
+  private byte[] compressData(byte[] data) {
+
+ByteArrayOutputStream bt = new ByteArrayOutputStream();
+try {
+  GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bt);
+  try {
+gzos.write(data);
+  } catch (IOException e) {
+e.printStackTrace();
+  } finally {
+gzos.close();
+  }
+} catch (IOException e) {
+  e.printStackTrace();
+}
+
+return bt.toByteArray();
+  }
+
+  /*
+   * Method called for decompressing the data and
+   * return a byte array
+   */
+  private byte[] decompressData(byte[] data) {
+
+ByteArrayInputStream bt = new ByteArrayInputStream(data);
+ByteArrayOutputStream bot = new ByteArrayOutputStream();
+
+try {
+  GzipCompressorInputStream gzis = new GzipCompressorInputStream(bt);
+  byte[] buffer = new byte[1024];
+  int len;
+
+  while ((len = gzis.read(buffer)) != -1) {
+bot.write(buffer, 0, len);
+  }
+
+} catch (IOException e) {
+  e.printStackTrace();
--- End diff --

Please optimize the logging (use a logger instead of printStackTrace); for example:
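
For illustration, a sketch assuming CarbonData's usual LogService/LogServiceFactory
API (an assumption; adapt if the signatures differ):

```java
import java.io.IOException;

import org.apache.carbondata.common.logging.LogService;
import org.apache.carbondata.common.logging.LogServiceFactory;

public class CompressorLoggingExample {
  // Sketch: replace e.printStackTrace() with a proper logger call so failures
  // end up in the configured log instead of stderr.
  private static final LogService LOGGER =
      LogServiceFactory.getLogService(CompressorLoggingExample.class.getName());

  static void handleFailure(IOException e) {
    LOGGER.error(e, "Error while decompressing gzip data");
  }
}
```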


---


[GitHub] carbondata pull request #2847: [WIP]Support Gzip as column compressor

2018-10-23 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2847#discussion_r227615489
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java
 ---
@@ -0,0 +1,201 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.compression;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.DoubleBuffer;
+import java.nio.FloatBuffer;
+import java.nio.IntBuffer;
+import java.nio.LongBuffer;
+import java.nio.ShortBuffer;
+
+import org.apache.carbondata.core.util.ByteUtil;
+
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
+
+public class GzipCompressor implements Compressor {
+
+  public GzipCompressor() {
+  }
+
+  @Override public String getName() {
+return "gzip";
+  }
+
+  /*
+   * Method called for compressing the data and
+   * return a byte array
+   */
+  private byte[] compressData(byte[] data) {
+
+ByteArrayOutputStream bt = new ByteArrayOutputStream();
+try {
+  GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bt);
+  try {
+gzos.write(data);
+  } catch (IOException e) {
+e.printStackTrace();
--- End diff --

please optimize the logging!


---


[GitHub] carbondata pull request #2847: [WIP]Support Gzip as column compressor

2018-10-23 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2847#discussion_r227615581
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java
 ---
@@ -0,0 +1,201 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.compression;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.DoubleBuffer;
+import java.nio.FloatBuffer;
+import java.nio.IntBuffer;
+import java.nio.LongBuffer;
+import java.nio.ShortBuffer;
+
+import org.apache.carbondata.core.util.ByteUtil;
+
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
+
+public class GzipCompressor implements Compressor {
+
+  public GzipCompressor() {
+  }
+
+  @Override public String getName() {
+return "gzip";
+  }
+
+  /*
+   * Method called for compressing the data and
+   * return a byte array
+   */
+  private byte[] compressData(byte[] data) {
+
+ByteArrayOutputStream bt = new ByteArrayOutputStream();
+try {
+  GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bt);
+  try {
+gzos.write(data);
+  } catch (IOException e) {
+e.printStackTrace();
+  } finally {
+gzos.close();
+  }
+} catch (IOException e) {
+  e.printStackTrace();
+}
+
+return bt.toByteArray();
+  }
+
+  /*
+   * Method called for decompressing the data and
+   * return a byte array
+   */
+  private byte[] decompressData(byte[] data) {
+
+ByteArrayInputStream bt = new ByteArrayInputStream(data);
+ByteArrayOutputStream bot = new ByteArrayOutputStream();
+
+try {
+  GzipCompressorInputStream gzis = new GzipCompressorInputStream(bt);
+  byte[] buffer = new byte[1024];
+  int len;
+
+  while ((len = gzis.read(buffer)) != -1) {
+bot.write(buffer, 0, len);
+  }
+
+} catch (IOException e) {
+  e.printStackTrace();
+}
+
+return bot.toByteArray();
--- End diff --

`bot` is not closed


---


[GitHub] carbondata pull request #2847: [WIP]Support Gzip as column compressor

2018-10-23 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2847#discussion_r227615513
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java
 ---
@@ -0,0 +1,201 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.compression;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.DoubleBuffer;
+import java.nio.FloatBuffer;
+import java.nio.IntBuffer;
+import java.nio.LongBuffer;
+import java.nio.ShortBuffer;
+
+import org.apache.carbondata.core.util.ByteUtil;
+
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
+
+public class GzipCompressor implements Compressor {
+
+  public GzipCompressor() {
+  }
+
+  @Override public String getName() {
+return "gzip";
+  }
+
+  /*
+   * Method called for compressing the data and
+   * return a byte array
+   */
+  private byte[] compressData(byte[] data) {
+
+ByteArrayOutputStream bt = new ByteArrayOutputStream();
+try {
+  GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bt);
+  try {
+gzos.write(data);
+  } catch (IOException e) {
+e.printStackTrace();
+  } finally {
+gzos.close();
+  }
+} catch (IOException e) {
+  e.printStackTrace();
--- End diff --

please optimize the logging!
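
One option is to stop swallowing the exception and surface it to the caller. A sketch (whether to wrap in a RuntimeException or route it through CarbonData's logger is a design choice left open here):

```java
private byte[] compressData(byte[] data) {
  ByteArrayOutputStream bt = new ByteArrayOutputStream();
  try (GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bt)) {
    gzos.write(data);
  } catch (IOException e) {
    // propagate the failure instead of printing a stack trace and
    // returning a silently truncated byte array
    throw new RuntimeException("Error during gzip compression", e);
  }
  // gzos is closed (and the gzip trailer flushed) before reading the bytes
  return bt.toByteArray();
}
```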


---


[GitHub] carbondata pull request #2847: [WIP]Support Gzip as column compressor

2018-10-23 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2847#discussion_r227615128
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java
 ---
@@ -0,0 +1,201 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.compression;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.DoubleBuffer;
+import java.nio.FloatBuffer;
+import java.nio.IntBuffer;
+import java.nio.LongBuffer;
+import java.nio.ShortBuffer;
+
+import org.apache.carbondata.core.util.ByteUtil;
+
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream;
+import 
org.apache.commons.compress.compressors.gzip.GzipCompressorOutputStream;
+
+public class GzipCompressor implements Compressor {
+
+  public GzipCompressor() {
+  }
+
+  @Override public String getName() {
+return "gzip";
+  }
+
+  /*
+   * Method called for compressing the data and
+   * return a byte array
+   */
+  private byte[] compressData(byte[] data) {
+
+ByteArrayOutputStream bt = new ByteArrayOutputStream();
+try {
+  GzipCompressorOutputStream gzos = new GzipCompressorOutputStream(bt);
+  try {
+gzos.write(data);
+  } catch (IOException e) {
+e.printStackTrace();
+  } finally {
+gzos.close();
+  }
+} catch (IOException e) {
+  e.printStackTrace();
+}
+
+return bt.toByteArray();
+  }
+
+  /*
+   * Method called for decompressing the data and
+   * return a byte array
+   */
+  private byte[] decompressData(byte[] data) {
+
+ByteArrayInputStream bt = new ByteArrayInputStream(data);
+ByteArrayOutputStream bot = new ByteArrayOutputStream();
+
+try {
+  GzipCompressorInputStream gzis = new GzipCompressorInputStream(bt);
+  byte[] buffer = new byte[1024];
+  int len;
+
+  while ((len = gzis.read(buffer)) != -1) {
+bot.write(buffer, 0, len);
+  }
+
+} catch (IOException e) {
+  e.printStackTrace();
+}
+
+return bot.toByteArray();
+  }
+
+  @Override public byte[] compressByte(byte[] unCompInput) {
+return compressData(unCompInput);
+  }
+
+  @Override public byte[] compressByte(byte[] unCompInput, int byteSize) {
+return compressData(unCompInput);
+  }
+
+  @Override public byte[] unCompressByte(byte[] compInput) {
+return decompressData(compInput);
+  }
+
+  @Override public byte[] unCompressByte(byte[] compInput, int offset, int 
length) {
+byte[] data = new byte[length];
+System.arraycopy(compInput, offset, data, 0, length);
+return decompressData(data);
+  }
+
+  @Override public byte[] compressShort(short[] unCompInput) {
+ByteBuffer unCompBuffer = ByteBuffer.allocate(unCompInput.length * 
ByteUtil.SIZEOF_SHORT);
+unCompBuffer.asShortBuffer().put(unCompInput);
+return compressData(unCompBuffer.array());
+  }
+
+  @Override public short[] unCompressShort(byte[] compInput, int offset, 
int length) {
+byte[] unCompArray = unCompressByte(compInput, offset, length);
+ShortBuffer unCompBuffer = 
ByteBuffer.wrap(unCompArray).asShortBuffer();
+short[] shorts = new short[unCompArray.length / ByteUtil.SIZEOF_SHORT];
+unCompBuffer.get(shorts);
+return shorts;
+  }
+
+  @Override public byte[] compressInt(int[] unCompInput) {
+ByteBuffer unCompBuffer = ByteBuffer.allocate(unCompInput.length * 
ByteUtil.SIZEOF_INT);
+unCompBuffer.asIntBuffer().put(unCompInput);
+return compressData(unCompBuffer.array());
+  }
+
+  @Override public int[] unCompressInt(byte[] 

[GitHub] carbondata issue #2846: [WIP] Added direct fill

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2846
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1191/



---


[GitHub] carbondata issue #2846: [WIP] Added direct fill

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2846
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/978/



---


[GitHub] carbondata issue #2846: [WIP] Added direct fill

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2846
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9244/



---


[GitHub] carbondata issue #2846: [WIP] Added direct fill

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2846
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1190/



---


[GitHub] carbondata issue #2846: [WIP] Added direct fill

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2846
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/977/



---


[GitHub] carbondata issue #2846: [WIP] Added direct fill

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2846
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9243/



---


[GitHub] carbondata issue #2814: [WIP][CARBONDATA-3001] configurable page size in MB

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2814
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9242/



---


[GitHub] carbondata issue #2814: [WIP][CARBONDATA-3001] configurable page size in MB

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2814
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1189/



---


[GitHub] carbondata issue #2847: [WIP]Support Gzip as column compressor

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2847
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1187/



---


[GitHub] carbondata issue #2845: [WIP] Rand function issue

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2845
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1188/



---


[GitHub] carbondata issue #2845: [WIP] Rand function issue

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2845
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9241/



---


[GitHub] carbondata issue #2814: [WIP][CARBONDATA-3001] configurable page size in MB

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2814
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/976/



---


[GitHub] carbondata issue #2806: [CARBONDATA-2998] Refresh column schema for old stor...

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2806
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9240/



---


[GitHub] carbondata issue #2822: [CARBONDATA-3014] Added support for inverted index a...

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2822
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9239/



---


[GitHub] carbondata issue #2823: [CARBONDATA-3015] Support Lazy load in carbon vector

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2823
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1184/



---


[GitHub] carbondata issue #2845: [WIP] Rand function issue

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2845
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/975/



---


[GitHub] carbondata issue #2806: [CARBONDATA-2998] Refresh column schema for old stor...

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2806
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1186/



---


[GitHub] carbondata issue #2847: [WIP]Support Gzip as column compressor

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2847
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/974/



---


[GitHub] carbondata issue #2823: [CARBONDATA-3015] Support Lazy load in carbon vector

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2823
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9238/



---


[GitHub] carbondata issue #2822: [CARBONDATA-3014] Added support for inverted index a...

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2822
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1185/



---


[GitHub] carbondata issue #2847: [WIP]Support Gzip as column compressor

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2847
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9235/



---


[GitHub] carbondata issue #2841: [WIP] Unsafe fallback to heap and unsafe query fix

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2841
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1183/



---


[GitHub] carbondata issue #2841: [WIP] Unsafe fallback to heap and unsafe query fix

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2841
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9237/



---


[GitHub] carbondata issue #2806: [CARBONDATA-2998] Refresh column schema for old stor...

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2806
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9234/



---


[GitHub] carbondata issue #2845: [WIP] Rand function issue

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2845
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1182/



---


[GitHub] carbondata issue #2822: [CARBONDATA-3014] Added support for inverted index a...

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2822
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9233/



---


[GitHub] carbondata issue #2829: [CARBONDATA-3025]add more metadata in carbon file fo...

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2829
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1179/



---


[GitHub] carbondata issue #2845: [WIP] Rand function issue

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2845
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9236/



---


[GitHub] carbondata issue #2806: [CARBONDATA-2998] Refresh column schema for old stor...

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2806
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/973/



---


[GitHub] carbondata issue #2822: [CARBONDATA-3014] Added support for inverted index a...

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2822
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/972/



---


[GitHub] carbondata issue #2823: [CARBONDATA-3015] Support Lazy load in carbon vector

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2823
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/971/



---


[GitHub] carbondata issue #2826: [CARBONDATA-3023] Alter add column issue with SORT_C...

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2826
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/970/



---


[GitHub] carbondata issue #2829: [CARBONDATA-3025]add more metadata in carbon file fo...

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2829
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/969/



---


[GitHub] carbondata issue #2830: [CARBONDATA-3025]Added CLI enhancements

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2830
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/968/



---


[GitHub] carbondata issue #2841: [WIP] Unsafe fallback to heap and unsafe query fix

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2841
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/967/



---


[GitHub] carbondata issue #2845: [WIP] Rand function issue

2018-10-23 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2845
  
Can you explain why this PR is `Rand` related? I just cannot find any code 
changes related to `Rand`.


---


[GitHub] carbondata issue #2845: [WIP] Rand function issue

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2845
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/966/



---


[GitHub] carbondata issue #2823: [CARBONDATA-3015] Support Lazy load in carbon vector

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2823
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9232/



---


[GitHub] carbondata issue #2826: [CARBONDATA-3023] Alter add column issue with SORT_C...

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2826
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1176/



---


[GitHub] carbondata pull request #2751: [CARBONDATA-2946] Add bloomindex version info...

2018-10-23 Thread xuchuanyin
Github user xuchuanyin closed the pull request at:

https://github.com/apache/carbondata/pull/2751


---


[GitHub] carbondata issue #2826: [CARBONDATA-3023] Alter add column issue with SORT_C...

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2826
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9231/



---


[GitHub] carbondata issue #2830: [CARBONDATA-3025]Added CLI enhancements

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2830
  
Build Failed with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1178/



---


[GitHub] carbondata issue #2829: [CARBONDATA-3025]add more metadata in carbon file fo...

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2829
  
Build Success with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9230/



---


[GitHub] carbondata issue #2830: [CARBONDATA-3025]Added CLI enhancements

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2830
  
Build Failed  with Spark 2.3.1, Please check CI 
http://136.243.101.176:8080/job/carbondataprbuilder2.3/9229/



---


[GitHub] carbondata pull request #2847: [WIP]Support Gzip as column compressor

2018-10-23 Thread shardul-cr7
GitHub user shardul-cr7 opened a pull request:

https://github.com/apache/carbondata/pull/2847

[WIP]Support Gzip as column compressor

Gzip-compressed files are smaller than Snappy-compressed ones, but the load takes more time.

Data generated by tpch-dbgen (lineitem table)

**Load Performance Comparisons (Compression)**

*Test Case 1*
*File Size 3.9G*
*Records ~30M*

| Codec Used | Load Time | File Size After Load |
| -- | -- | -- |
| Snappy | 156s | 101M |
| Zstd | 153s | 2.2M |
| Gzip | 163s | 12.1M |

*Test Case 2*
*File Size 7.8G*
*Records ~60M*

| Codec Used | Load Time | File Size After Load |
| -- | -- | -- |
| Snappy | 336s | 203.6M |
| Zstd | 352s | 4.3M |
| Gzip | 354s | 12.1M |

**Query Performance (Decompression)**

*Test Case 1*

| Codec Used | Full Scan Time |
| -- | -- |
| Snappy | 16.108s |
| Zstd | 14.595s |
| Gzip | 14.313s |

*Test Case 2*

| Codec Used | Full Scan Time |
| -- | -- |
| Snappy | 23.559s |
| Zstd | 23.913s |
| Gzip | 26.741s |
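
As a usage note: assuming the gzip codec plugs into the same `carbon.column.compressor` property that snappy and zstd use today (an assumption, not necessarily this PR's final wiring), enabling it system-wide would look roughly like:

```java
import org.apache.carbondata.core.util.CarbonProperties;

public class EnableGzipCompressor {
  public static void main(String[] args) {
    // set the default column compressor for subsequent loads;
    // a per-table override via TBLPROPERTIES ('carbon.column.compressor'='gzip')
    // should work the same way once the codec is registered (assumption)
    CarbonProperties.getInstance()
        .addProperty("carbon.column.compressor", "gzip");
  }
}
```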

Be sure to do all of the following checklist to help us incorporate 
your contribution quickly and easily:

 - [ ] Any interfaces changed?
 
 - [ ] Any backward compatibility impacted?
 
 - [ ] Document update required?

 - [x] Testing done
  Added some test cases
   
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shardul-cr7/carbondata b010

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/2847.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2847


commit 6ad88ccc5663353d16372d91878d7efb223b16d6
Author: shardul-cr7 
Date:   2018-10-23T11:57:47Z

[WIP]Support Gzip




---


[GitHub] carbondata issue #2806: [CARBONDATA-2998] Refresh column schema for old stor...

2018-10-23 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/2806
  
Build Success with Spark 2.2.1, Please check CI 
http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/1174/



---


[GitHub] carbondata issue #2845: [WIP] Rand function issue

2018-10-23 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2845
  
What is this for?


---


[GitHub] carbondata issue #2842: [CARBONDATA-3032] Remove carbon.blocklet.size from p...

2018-10-23 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2842
  
For 'carbon.blocklet.size', why not remove this property from the code now?


---


[GitHub] carbondata pull request #2842: [CARBONDATA-3032] Remove carbon.blocklet.size...

2018-10-23 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2842#discussion_r227356312
  
--- Diff: docs/sdk-guide.md ---
@@ -24,7 +24,8 @@ CarbonData provides SDK to facilitate
 
 # SDK Writer
 
-In the carbon jars package, there exist a 
carbondata-store-sdk-x.x.x-SNAPSHOT.jar, including SDK writer and reader.
+In the carbon jars package, there exist a 
carbondata-store-sdk-x.x.x-SNAPSHOT.jar, including SDK writer and reader. 
+If you want to use SDK, it needs other carbon jar or you can use 
carbondata-sdk.jar.
--- End diff --

What does this mean?

The user can either:
1. use only the SDK jar, or
2. use the other carbon jars instead (SDK jar not included)?

Is my understanding correct?


---


[GitHub] carbondata issue #2751: [CARBONDATA-2946] Add bloomindex version info file f...

2018-10-23 Thread xuchuanyin
Github user xuchuanyin commented on the issue:

https://github.com/apache/carbondata/pull/2751
  
Since we will leave this problem as it is, I'll close this PR now.


---


[GitHub] carbondata pull request #2824: [CARBONDATA-3008] Optimize default value for ...

2018-10-23 Thread xuchuanyin
Github user xuchuanyin commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/2824#discussion_r227352023
  
--- Diff: 
integration/spark2/src/main/scala/org/apache/spark/sql/execution/datasources/SparkCarbonTableFormat.scala
 ---
@@ -172,33 +172,8 @@ with Serializable {
   dataSchema: StructType,
   context: TaskAttemptContext): OutputWriter = {
 val model = 
CarbonTableOutputFormat.getLoadModel(context.getConfiguration)
-val isCarbonUseMultiDir = 
CarbonProperties.getInstance().isUseMultiTempDir
-var storeLocation: Array[String] = Array[String]()
-val isCarbonUseLocalDir = CarbonProperties.getInstance()
-  .getProperty("carbon.use.local.dir", 
"false").equalsIgnoreCase("true")
-
-
 val taskNumber = generateTaskNumber(path, context, 
model.getSegmentId)
-val tmpLocationSuffix =
-  File.separator + "carbon" + System.nanoTime() + File.separator + 
taskNumber
-if (isCarbonUseLocalDir) {
-  val yarnStoreLocations = 
Util.getConfiguredLocalDirs(SparkEnv.get.conf)
-  if (!isCarbonUseMultiDir && null != yarnStoreLocations && 
yarnStoreLocations.nonEmpty) {
-// use single dir
-storeLocation = storeLocation :+
-  
(yarnStoreLocations(Random.nextInt(yarnStoreLocations.length)) + 
tmpLocationSuffix)
-if (storeLocation == null || storeLocation.isEmpty) {
-  storeLocation = storeLocation :+
-(System.getProperty("java.io.tmpdir") + tmpLocationSuffix)
-}
-  } else {
-// use all the yarn dirs
-storeLocation = yarnStoreLocations.map(_ + tmpLocationSuffix)
-  }
-} else {
-  storeLocation =
-storeLocation :+ (System.getProperty("java.io.tmpdir") + 
tmpLocationSuffix)
-}
+val storeLocation = CommonUtil.getTempStoreLocations(taskNumber)
--- End diff --

Yes, I also noticed this.
The suffix has no meaning; it is just used to separate each task's output.
I think it was a mistake made by hand -- that's why we need to extract this code 
into a common place, to avoid problems like this.
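
As a tiny illustration of what that suffix does (sketch only, following the names in the diff):

```java
import java.io.File;

public class TempStoreSuffix {
  // System.nanoTime() plus the task number keeps concurrent tasks'
  // temp directories disjoint; the value itself carries no meaning
  static String tmpLocationSuffix(String taskNumber) {
    return File.separator + "carbon" + System.nanoTime()
        + File.separator + taskNumber;
  }
}
```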


---

