[carbondata] branch master updated: [CARBONDATA-3830] Support Array and Struct of all primitive type reading from presto

2020-08-30 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
 new ed7e049  [CARBONDATA-3830] Support Array and Struct of all primitive 
type reading from presto
ed7e049 is described below

commit ed7e04961c9e4cf038276b154feb9a2f3a105457
Author: ajantha-bhat 
AuthorDate: Thu Aug 13 22:05:15 2020 +0530

[CARBONDATA-3830] Support Array and Struct of all primitive type reading 
from presto

Why is this PR needed?
Currently, Presto cannot read complex data type stores: sometimes it gives 
empty results and sometimes an exception.

What changes were proposed in this PR?
Supported all 13 complex primitive types (including binary; refer to the added 
test case) with non-nested array and struct data types.

Supported complex types in the direct vector filling flow:
Currently, the Spark integration of CarbonData uses row-level filling for 
complex types instead of vector filling, but Presto supports only vector 
reading, so complex types need to be supported in vector filling.

Supported complex primitive vector handling in the DIRECT_COMPRESS and 
ADAPTIVE_CODEC flows:
Every complex primitive type is encoded as either DIRECT_COMPRESS or 
ADAPTIVE_CODEC; a legacy encoding is never used. Because of this, string, 
varchar (with/without local dictionary), binary, and date vector filling 
need to be handled in DIRECT_COMPRESS. The parent column also comes as 
DIRECT_COMPRESS, and its data is extracted from the parent column page here.

Supported a vector stack in the complex column's vectorInfo to store all the 
children vectors.

Keep a list of children vectors inside CarbonColumnVectorImpl.java.

Support ComplexStreamReader to fill the Presto ROW (struct) block and ARRAY 
block.

Handle null value filling by wrapping children vectors with 
ColumnarVectorWrapperDirect.
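
A minimal sketch of that layout (plain Java with illustrative names, not the 
actual CarbonColumnVector / ColumnarVectorWrapperDirect API): the parent 
vector keeps one child vector per struct field, children are filled 
column-wise, and only the parent's null bits decide whether a row is assembled.

import java.util.ArrayList;
import java.util.List;

class ComplexVectorSketch {
  private final boolean[] nulls;       // null bitmap of the parent (struct/array) column
  private final Object[] data;         // used only by leaf (primitive) vectors
  private final List<ComplexVectorSketch> children = new ArrayList<>();

  ComplexVectorSketch(int capacity) {
    this.nulls = new boolean[capacity];
    this.data = new Object[capacity];
  }

  void addChild(ComplexVectorSketch child) { children.add(child); }
  List<ComplexVectorSketch> getChildren()  { return children; }

  void putValue(int row, Object value)     { data[row] = value; }
  void putNull(int row)                    { nulls[row] = true; }
  boolean isNullAt(int row)                { return nulls[row]; }

  // Assemble one struct row for the reader: null children yield null fields,
  // mirroring how null filling is delegated to the wrapped child vectors.
  Object[] structRow(int row) {
    if (isNullAt(row)) {
      return null;
    }
    Object[] fields = new Object[children.size()];
    for (int i = 0; i < children.size(); i++) {
      ComplexVectorSketch child = children.get(i);
      fields[i] = child.isNullAt(row) ? null : child.data[row];
    }
    return fields;
  }
}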

Limitations / next work:
Some pending TODOs are:

Local dictionary needs to be handled for string / varchar columns, as the 
DIRECT_COMPRESS flow does not have that handling
Can support map of all primitive types
Can support multi-level nested arrays and structs

Does this PR introduce any user interface change?
No

Is any new testcase added?
Yes [Added test cases for all 13 primitive types with array and struct, null 
values, and more than one page of data]

This closes #3887

Co-authored-by: akkio-97 
---
 .../dimension/v3/DimensionChunkReaderV3.java   |   5 +
 .../impl/LocalDictDimensionDataChunkStore.java |   8 +-
 .../SafeFixedLengthDimensionDataChunkStore.java|   3 +-
 .../SafeVariableLengthDimensionDataChunkStore.java |   4 +-
 .../adaptive/AdaptiveDeltaFloatingCodec.java   |  45 ++-
 .../adaptive/AdaptiveDeltaIntegralCodec.java   |  70 ++--
 .../encoding/adaptive/AdaptiveFloatingCodec.java   |  47 ++-
 .../encoding/adaptive/AdaptiveIntegralCodec.java   |  56 ++--
 .../encoding/compress/DirectCompressCodec.java | 227 +
 .../metadata/datatype/DecimalConverterFactory.java | 109 +--
 .../impl/DictionaryBasedVectorResultCollector.java |  18 +-
 .../scan/executor/impl/AbstractQueryExecutor.java  |  22 +-
 .../core/scan/result/BlockletScannedResult.java|  41 ++-
 .../scan/result/vector/CarbonColumnVector.java |  18 ++
 .../core/scan/result/vector/ColumnVectorInfo.java  |  25 ++
 .../result/vector/impl/CarbonColumnVectorImpl.java |  54 
 .../ColumnarVectorWrapperDirectFactory.java|  12 +-
 ...ColumnarVectorWrapperDirectWithDeleteDelta.java |   6 +
 .../presto/CarbonColumnVectorWrapper.java  |   4 +
 .../carbondata/presto/CarbonVectorBatch.java   |  19 +-
 .../presto/ColumnarVectorWrapperDirect.java|  20 +-
 .../presto/PrestoCarbonVectorizedRecordReader.java |  24 +-
 .../presto/readers/ComplexTypeStreamReader.java| 196 +++
 .../presto/readers/SliceStreamReader.java  |   2 +-
 .../PrestoTestNonTransactionalTableFiles.scala | 358 -
 .../processing/datatypes/PrimitiveDataType.java|  12 +-
 26 files changed, 1168 insertions(+), 237 deletions(-)

diff --git 
a/core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/DimensionChunkReaderV3.java
 
b/core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/DimensionChunkReaderV3.java
index d53c9d3..2538687 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/DimensionChunkReaderV3.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/DimensionChunkReaderV3.java
@@ -257,6 +257,11 @@ public class DimensionChunkReaderV3 extends 
AbstractDimensionChunkReader {
   .decodeAndFillVector(pageData.array(), offset, 
pageMetadata.data_page_length, vectorInfo,
   nullBitSet

[carbondata] branch master updated: [CARBONDATA-3555] Move filter related methods under DataMapFilter

2019-11-19 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
 new 488a547  [CARBONDATA-3555] Move filter related methods under 
DataMapFilter
488a547 is described below

commit 488a5470d2a172019f8c608bb0fab0b78ed14bdc
Author: kunal642 
AuthorDate: Wed Oct 23 16:14:38 2019 +0530

[CARBONDATA-3555] Move filter related methods under DataMapFilter

1. This PR makes DataMapFilter a filter holder for the filter expression and 
the FilterResolver objects (a minimal sketch of the holder idea follows this list).
2. All the major APIs can now accept a DataMapFilter as an argument.
3. Moved all the filter-resolving methods inside DataMapFilter for ease of use.
4. Fixed a Datasource issue where invalid datamaps were getting pruned.
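
A hedged sketch of the holder idea (types are simplified and the resolve step 
is a placeholder, not the real FilterExpressionProcessor; only the field names 
echo the diff further down):

import java.io.Serializable;

public class FilterHolderSketch implements Serializable {
  private static final long serialVersionUID = 1L;

  // not serialized; re-attached on the executor side
  private transient Object table;

  // the raw filter expression, kept in a serializable string form
  private String serializedExpression;

  // cached result of resolving the expression; resolved lazily and reused
  private transient Object resolvedFilter;

  public FilterHolderSketch(Object table, String serializedExpression) {
    this.table = table;
    this.serializedExpression = serializedExpression;
  }

  public Object getResolvedFilter() {
    if (resolvedFilter == null) {
      // placeholder for the filter-resolving logic that now lives inside DataMapFilter
      resolvedFilter = serializedExpression.trim();
    }
    return resolvedFilter;
  }
}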

This closes #3419
---
 .../carbondata/core/datamap/DataMapFilter.java | 110 +
 .../core/datamap/DataMapStoreManager.java  |  15 ++-
 .../apache/carbondata/core/datamap/Segment.java|   5 +
 .../carbondata/core/datamap/TableDataMap.java  |   2 +-
 .../indexstore/blockletindex/BlockDataMap.java |  20 +---
 .../core/metadata/schema/table/CarbonTable.java|  23 -
 .../core/metadata/schema/table/TableInfo.java  |   4 +
 .../scan/executor/impl/AbstractQueryExecutor.java  |  38 ---
 .../carbondata/core/scan/model/QueryModel.java |  29 ++
 .../core/scan/model/QueryModelBuilder.java |  30 +++---
 dev/findbugs-exclude.xml   |   8 ++
 .../hadoop/api/CarbonFileInputFormat.java  |  12 ++-
 .../carbondata/hadoop/api/CarbonInputFormat.java   |  36 ---
 .../hadoop/api/CarbonTableInputFormat.java |  33 ---
 .../hadoop/stream/StreamRecordReader.java  |   4 +-
 .../carbondata/hadoop/testutil/StoreCreator.java   |   3 +-
 .../hadoop/util/CarbonInputFormatUtil.java |   7 +-
 .../hadoop/ft/CarbonTableInputFormatTest.java  |  12 ++-
 .../carbondata/presto/CarbondataPageSource.java|  11 ++-
 .../carbondata/presto/impl/CarbonTableReader.java  |   9 +-
 ...ryWithColumnMetCacheAndCacheLevelProperty.scala |   7 +-
 .../filterexpr/FilterProcessorTestCase.scala   |   7 ++
 .../filterexpr/TestImplicitFilterExpression.scala  |   5 +-
 .../carbondata/spark/rdd/CarbonScanRDD.scala   |  38 ---
 .../command/carbonTableSchemaCommon.scala  |   5 +-
 .../vectorreader/VectorizedCarbonRecordReader.java |   4 +-
 .../execution/datasources/CarbonFileIndex.scala|   6 +-
 .../datasources/SparkCarbonFileFormat.scala|  12 ++-
 .../apache/carbondata/store/SparkCarbonStore.scala |   4 +-
 .../spark/sql/CarbonDatasourceHadoopRelation.scala |   7 +-
 .../command/management/CarbonAddLoadCommand.scala  |   1 +
 .../strategy/CarbonLateDecodeStrategy.scala|   2 +
 .../merger/CarbonCompactionExecutor.java   |   4 +-
 .../carbondata/sdk/file/CarbonReaderBuilder.java   |   4 +-
 .../carbondata/sdk/file/CarbonSchemaReader.java|  52 --
 .../apache/carbondata/store/LocalCarbonStore.java  |   4 +-
 36 files changed, 322 insertions(+), 251 deletions(-)

diff --git 
a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapFilter.java 
b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapFilter.java
index 46f37db..23805e2 100644
--- a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapFilter.java
+++ b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapFilter.java
@@ -19,6 +19,7 @@ package org.apache.carbondata.core.datamap;
 
 import java.io.IOException;
 import java.io.Serializable;
+import java.util.ArrayList;
 import java.util.HashSet;
 import java.util.Set;
 
@@ -29,7 +30,11 @@ import 
org.apache.carbondata.core.metadata.schema.table.column.CarbonMeasure;
 import org.apache.carbondata.core.scan.executor.util.RestructureUtil;
 import org.apache.carbondata.core.scan.expression.ColumnExpression;
 import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.scan.filter.FilterExpressionProcessor;
+import org.apache.carbondata.core.scan.filter.intf.FilterOptimizer;
+import org.apache.carbondata.core.scan.filter.optimizer.RangeFilterOptmizer;
 import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf;
+import org.apache.carbondata.core.scan.model.QueryModel;
 import org.apache.carbondata.core.util.ObjectSerializationUtil;
 
 /**
@@ -37,7 +42,9 @@ import 
org.apache.carbondata.core.util.ObjectSerializationUtil;
  */
 public class DataMapFilter implements Serializable {
 
-  private CarbonTable table;
+  private static final long serialVersionUID = 6276855832288220240L;
+
+  private transient CarbonTable table;
 
   private Expression expression;
 
@@ -45,9 +52,16 @@ public class DataMapFilter implements Serializable {
 
   private String serializedExpression;
 
+  private

[carbondata] branch master updated: [CARBONDATA-3584] Fix Select Query failure for Boolean dictionary column when Codegen is disabled

2019-11-18 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
 new 10149eb  [CARBONDATA-3584] Fix Select Query failure for Boolean 
dictionary column when Codegen is disabled
10149eb is described below

commit 10149eb9f58dce578702cc7e7266671201198412
Author: Indhumathi27 
AuthorDate: Fri Nov 15 16:51:28 2019 +0530

[CARBONDATA-3584] Fix Select Query failure for Boolean dictionary column 
when Codegen is disabled

Problem:
Select query fails for boolean dictionary column with CastException when 
codegen is disabled.

Solution:
Added a Boolean case in getDataBasedOnDataType and Boolean decoding in the 
CodegenContext.

This closes #3463
---
 .../org/apache/carbondata/core/util/DataTypeUtil.java   |  6 ++
 .../org/apache/spark/sql/CarbonDictionaryDecoder.scala  | 16 
 .../booleantype/BooleanDataTypesBaseTest.scala  | 17 +
 3 files changed, 39 insertions(+)

diff --git 
a/core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java 
b/core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java
index 660c705..f138323 100644
--- a/core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java
+++ b/core/src/main/java/org/apache/carbondata/core/util/DataTypeUtil.java
@@ -715,6 +715,12 @@ public final class DataTypeUtil {
   javaDecVal = 
javaDecVal.setScale(dimension.getColumnSchema().getScale());
 }
 return 
getDataTypeConverter().convertFromBigDecimalToDecimal(javaDecVal);
+  } else if (dataType == DataTypes.BOOLEAN) {
+String data8 = new String(dataInBytes, 
CarbonCommonConstants.DEFAULT_CHARSET_CLASS);
+if (data8.isEmpty()) {
+  return null;
+}
+return BooleanConvert.parseBoolean(data8);
   } else {
 return getDataTypeConverter().convertFromByteToUTF8String(dataInBytes);
   }
diff --git 
a/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonDictionaryDecoder.scala
 
b/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonDictionaryDecoder.scala
index 3b20c2f..9b9d7a6 100644
--- 
a/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonDictionaryDecoder.scala
+++ 
b/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonDictionaryDecoder.scala
@@ -231,6 +231,17 @@ case class CarbonDictionaryDecoder(
|  tuple.setValue(UTF8String.fromBytes((byte[])tuple.getValue()));
|  return tuple;
|}""".stripMargin)
+  val decodeBool = ctx.freshName("deDictBool")
+  ctx.addNewFunction(decodeStr,
+s"""
+   |private org.apache.spark.sql.DictTuple $decodeBool(
+   |  org.apache.spark.sql.ForwardDictionaryWrapper dict, int surg)
+   | throws java.io.IOException {
+   |  org.apache.spark.sql.DictTuple tuple = $decodeDictionary(dict, 
surg);
+   |  tuple.setValue(Boolean.parseBoolean(new 
String((byte[])tuple.getValue(),
+   |  
org.apache.carbondata.core.constants.CarbonCommonConstants.DEFAULT_CHARSET_CLASS)));
+   |  return tuple;
+   |}""".stripMargin)
 
 
   val resultVars = exprs.zipWithIndex.map { case (expr, index) =>
@@ -271,6 +282,11 @@ case class CarbonDictionaryDecoder(
 |org.apache.spark.sql.DictTuple $value = 
$decodeLong($dictRef, ${ ev.value });
  """.stripMargin
 ExprCode(code, s"$value.getIsNull()", 
s"((Long)$value.getValue())")
+  case CarbonDataTypes.BOOLEAN => code +=
+s"""
+   |org.apache.spark.sql.DictTuple $value = 
$decodeBool($dictRef, ${ ev.value });
+ """.stripMargin
+ExprCode(code, s"$value.getIsNull()", 
s"((Boolean)$value.getValue())")
   case _ => code +=
 s"""
|org.apache.spark.sql.DictTuple $value = 
$decodeStr($dictRef, ${ev.value});
diff --git 
a/integration/spark2/src/test/scala/org/apache/carbondata/spark/testsuite/booleantype/BooleanDataTypesBaseTest.scala
 
b/integration/spark2/src/test/scala/org/apache/carbondata/spark/testsuite/booleantype/BooleanDataTypesBaseTest.scala
index c0087a8..82894d4 100644
--- 
a/integration/spark2/src/test/scala/org/apache/carbondata/spark/testsuite/booleantype/BooleanDataTypesBaseTest.scala
+++ 
b/integration/spark2/src/test/scala/org/apache/carbondata/spark/testsuite/booleantype/BooleanDataTypesBaseTest.scala
@@ -154,4 +154,21 @@ class BooleanDataTypesBaseTest extends QueryTest with 
BeforeAndAfterEach with Be
 sql("delete from carbon_table where cc=true")
 checkAnswer(sql("select COUNT(

svn commit: r36459 - /release/carbondata/1.6.1/

2019-10-24 Thread kumarvishal09
Author: kumarvishal09
Date: Thu Oct 24 09:47:09 2019
New Revision: 36459

Log:
Upload 1.6.1 release

Added:
release/carbondata/1.6.1/

release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.1.0-hadoop2.7.2.jar 
  (with props)

release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.1.0-hadoop2.7.2.jar.asc

release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.1.0-hadoop2.7.2.jar.sha512

release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.2.1-hadoop2.7.2.jar 
  (with props)

release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.2.1-hadoop2.7.2.jar.asc

release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.2.1-hadoop2.7.2.jar.sha512

release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.3.2-hadoop2.7.2.jar 
  (with props)

release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.3.2-hadoop2.7.2.jar.asc

release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.3.2-hadoop2.7.2.jar.sha512
release/carbondata/1.6.1/apache-carbondata-1.6.1-source-release.zip   (with 
props)
release/carbondata/1.6.1/apache-carbondata-1.6.1-source-release.zip.asc
release/carbondata/1.6.1/apache-carbondata-1.6.1-source-release.zip.sha512

Added: 
release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.1.0-hadoop2.7.2.jar
==
Binary file - no diff available.

Propchange: 
release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.1.0-hadoop2.7.2.jar
--
svn:mime-type = application/octet-stream

Added: 
release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.1.0-hadoop2.7.2.jar.asc
==
--- 
release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.1.0-hadoop2.7.2.jar.asc
 (added)
+++ 
release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.1.0-hadoop2.7.2.jar.asc
 Thu Oct 24 09:47:09 2019
@@ -0,0 +1,11 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQEzBAEBCAAdFiEEsZE8naWI0MngB++fuw0pZv1r+vAFAl2vKLEACgkQuw0pZv1r
++vA60ggAlOIODNcQfQK2fIP3BjQx7PBr1xljZ9D2W9zgY1O75r4N1mQ6shjimC4S
+xLo+MVNOK2eT69lhNALo5a7ZgXjNS8oLNce2lvS7gaast7XT6SwwpGBK45mPQo32
+nuoB8C6MXTejSOliut948WTLNrF4WJ6VRCXunDmwHVkGKjb3qife1uRQhNiBd9yI
+OqmdfgyPbRy0r9PVNGj5VJ5iEZT+QYzNs85MGgKeQ+dTlnhouCYs3NbEasybmOlX
+hR4QB3cregt3rINV2hW5T2bszYe2Td79XVY57UkLs2X1/kCrnkxYhC4zUb3aJA93
+5LF/FBL4GaDNVFXamHXPRGHmEsSO0w==
+=VYhJ
+-----END PGP SIGNATURE-----

Added: 
release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.1.0-hadoop2.7.2.jar.sha512
==
--- 
release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.1.0-hadoop2.7.2.jar.sha512
 (added)
+++ 
release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.1.0-hadoop2.7.2.jar.sha512
 Thu Oct 24 09:47:09 2019
@@ -0,0 +1 @@
+d5a30bf1ff13e8f4381fb78df7f1b3a6b660643f4922cf3318c445462c0ba3a48db0d7000acf0a4aa5129ac5c686299213182f6a51dc56a16b0585c8823e23aa
  apache-carbondata-1.6.1-bin-spark2.1.0-hadoop2.7.2.jar

Added: 
release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.2.1-hadoop2.7.2.jar
==
Binary file - no diff available.

Propchange: 
release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.2.1-hadoop2.7.2.jar
--
svn:mime-type = application/octet-stream

Added: 
release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.2.1-hadoop2.7.2.jar.asc
==
--- 
release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.2.1-hadoop2.7.2.jar.asc
 (added)
+++ 
release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.2.1-hadoop2.7.2.jar.asc
 Thu Oct 24 09:47:09 2019
@@ -0,0 +1,11 @@
+-----BEGIN PGP SIGNATURE-----
+
+iQEzBAEBCAAdFiEEsZE8naWI0MngB++fuw0pZv1r+vAFAl2vKMQACgkQuw0pZv1r
++vBPJwgAi8K3iVqMayaVabHrMDqnbARLEv6sEmXs20YvNJDGwUIXAyCD8KfClvwq
+1v9tBezdjLy7jtgdTXyW6wvs8aCKnXunh/xmyJ8fESgIzbDfTaX2va6NW2latLTP
+SgrYcf1GDc2+/hv9Po5x0+yZNWhzDZniuSiQSMGIUWXNRUzxPk2scx7Ak/M+JUDv
+MqUfFVVe2Ec9X7HCeBQO25Ar6DX7d2vWcTBps2GvVMmNA273ZLMTIrvgUtmXn8Bi
+O3nTt6z9K6NcZZOpl8u6RBcWXLzazb15LQ7IER8g/UtOi2+hxwjli8pek8aHoqmd
++rv8aIKK0yWJ0deRWJ65T8waMIECuA==
+=q/zn
+-----END PGP SIGNATURE-----

Added: 
release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.2.1-hadoop2.7.2.jar.sha512
==
--- 
release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.2.1-hadoop2.7.2.jar.sha512
 (added)
+++ 
release/carbondata/1.6.1/apache-carbondata-1.6.1-bin-spark2.2.1-hadoop2.7.2.jar.sha512
 Thu Oct 24 09:47:09 2019
@@ -0,0 +1

[carbondata] branch master updated: [CARBONDATA-3454] optimized index server output for count(*)

2019-09-16 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
 new 41ac71a  [CARBONDATA-3454] optimized index server output for count(*)
41ac71a is described below

commit 41ac71a7ef96a6725ee9b6a8f26bf4836bd535f9
Author: kunal642 
AuthorDate: Thu Jun 27 14:32:11 2019 +0530

[CARBONDATA-3454] optimized index server output for count(*)

Optimised the output for count(*) queries so that only a long is sent back 
to the driver, reducing the network transfer cost for the index server.
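
A minimal sketch of the idea (interface and method names are illustrative, not 
the DataMapJob API): for a plain count(*), each pruning task returns only a 
row count, and the driver sums longs instead of deserializing blocklet metadata.

import java.util.List;

interface PruneTask {
  List<String> prunedBlocklets();   // full metadata, expensive to ship back
  long rowCount();                  // cheap summary, enough for count(*)
}

final class PruneJobSketch {
  static long executeCountJob(List<PruneTask> tasks) {
    long total = 0;
    for (PruneTask task : tasks) {
      total += task.rowCount();     // only longs cross the network
    }
    return total;
  }
}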

This closes #3308
---
 .../apache/carbondata/core/datamap/DataMapJob.java |   2 +
 .../carbondata/core/datamap/DataMapUtil.java   |  13 ++-
 .../core/datamap/DistributableDataMapFormat.java   |  34 +--
 .../core/indexstore/ExtendedBlocklet.java  |  68 -
 .../core/indexstore/ExtendedBlockletWrapper.java   |  27 +++--
 .../ExtendedBlockletWrapperContainer.java  |  19 ++--
 .../carbondata/hadoop/api/CarbonInputFormat.java   |  52 --
 .../hadoop/api/CarbonTableInputFormat.java |  22 ++--
 .../carbondata/indexserver/DataMapJobs.scala   |  15 ++-
 .../indexserver/DistributedCountRDD.scala  | 111 +
 .../indexserver/DistributedPruneRDD.scala  |  29 ++
 .../indexserver/DistributedRDDUtils.scala  |  13 +++
 .../carbondata/indexserver/IndexServer.scala   |  19 
 13 files changed, 319 insertions(+), 105 deletions(-)

diff --git 
a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapJob.java 
b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapJob.java
index 9eafe7c..326282d 100644
--- a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapJob.java
+++ b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapJob.java
@@ -35,4 +35,6 @@ public interface DataMapJob extends Serializable {
 
   List execute(DistributableDataMapFormat dataMapFormat);
 
+  Long executeCountJob(DistributableDataMapFormat dataMapFormat);
+
 }
diff --git 
a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapUtil.java 
b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapUtil.java
index dd9debc..bca7409 100644
--- a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapUtil.java
+++ b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapUtil.java
@@ -230,7 +230,7 @@ public class DataMapUtil {
   List validSegments, List invalidSegments, DataMapLevel 
level,
   List segmentsToBeRefreshed) throws IOException {
 return executeDataMapJob(carbonTable, resolver, dataMapJob, 
partitionsToPrune, validSegments,
-invalidSegments, level, false, segmentsToBeRefreshed);
+invalidSegments, level, false, segmentsToBeRefreshed, false);
   }
 
   /**
@@ -241,7 +241,8 @@ public class DataMapUtil {
   public static List executeDataMapJob(CarbonTable 
carbonTable,
   FilterResolverIntf resolver, DataMapJob dataMapJob, List 
partitionsToPrune,
   List validSegments, List invalidSegments, DataMapLevel 
level,
-  Boolean isFallbackJob, List segmentsToBeRefreshed) throws 
IOException {
+  Boolean isFallbackJob, List segmentsToBeRefreshed, boolean 
isCountJob)
+  throws IOException {
 List invalidSegmentNo = new ArrayList<>();
 for (Segment segment : invalidSegments) {
   invalidSegmentNo.add(segment.getSegmentNo());
@@ -250,9 +251,11 @@ public class DataMapUtil {
 DistributableDataMapFormat dataMapFormat =
 new DistributableDataMapFormat(carbonTable, resolver, validSegments, 
invalidSegmentNo,
 partitionsToPrune, false, level, isFallbackJob);
-List prunedBlocklets = dataMapJob.execute(dataMapFormat);
-// Apply expression on the blocklets.
-return prunedBlocklets;
+if (isCountJob) {
+  dataMapFormat.setCountStarJob();
+  dataMapFormat.setIsWriteToFile(false);
+}
+return dataMapJob.execute(dataMapFormat);
   }
 
   public static SegmentStatusManager.ValidAndInvalidSegmentsInfo 
getValidAndInvalidSegments(
diff --git 
a/core/src/main/java/org/apache/carbondata/core/datamap/DistributableDataMapFormat.java
 
b/core/src/main/java/org/apache/carbondata/core/datamap/DistributableDataMapFormat.java
index 8426fcb..b430c5d 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/datamap/DistributableDataMapFormat.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/datamap/DistributableDataMapFormat.java
@@ -28,7 +28,6 @@ import java.util.UUID;
 
 import org.apache.carbondata.common.logging.LogServiceFactory;
 import org.apache.carbondata.core.constants.CarbonCommonConstants;
-import org.apache.carbondata.core.datamap.dev.DataMap;
 import org.apache.carbondata.core.datamap.dev.expr.DataMapDistributableWrapper;
 import org.apache.carbondata.core.datastore.impl.FileFactory;
 

[carbondata] branch master updated: [CARBONDATA-3515] Limit local dictionary size to 16MB and allow configuration.

2019-09-12 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
 new da525ec  [CARBONDATA-3515] Limit local dictionary size to 16MB and 
allow configuration.
da525ec is described below

commit da525ece20f6606f8b2113ca32b7acb82f0698fd
Author: ajantha-bhat 
AuthorDate: Tue Sep 10 10:48:26 2019 +0530

[CARBONDATA-3515] Limit local dictionary size to 16MB and allow 
configuration.

Problem: Currently the local dictionary max size is 2GB, so for varchar 
columns or long string columns the local dictionary can grow to 2GB. Because 
the local dictionary is stored in the blocklet, the blocklet size can exceed 
2GB even though the configured maximum blocklet size is 64MB, and in some 
places integer overflow happens during casting.

Solution: Limit local dictionary size to 16MB and allow configuration. The 
default size is 4MB.
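
A small, hedged sketch of the size-based cut-off (class and fields are 
illustrative; the real logic lives in MapBasedDictionaryStore, see the diff 
below): the threshold configured in MB is widened to a long before shifting, 
which also avoids the casting overflow mentioned above.

final class LocalDictLimiterSketch {
  private final int maxKeys;        // existing key-count threshold
  private final long maxBytes;      // new size-based threshold
  private long currentBytes;
  private int keys;

  LocalDictLimiterSketch(int maxKeys, int thresholdInMb) {
    this.maxKeys = maxKeys;
    this.maxBytes = ((long) thresholdInMb) << 20;   // MB -> bytes, widened to long first
  }

  // Returns false once adding this value would cross either threshold,
  // so the caller can fall back to plain (non-dictionary) encoding.
  boolean tryAdd(byte[] value) {
    if (keys + 1 > maxKeys || currentBytes + value.length > maxBytes) {
      return false;
    }
    keys++;
    currentBytes += value.length;
    return true;
  }
}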

This closes #3380
---
 .../core/constants/CarbonCommonConstants.java  | 11 ++
 .../dictionaryholder/MapBasedDictionaryStore.java  | 16 ++--
 .../carbondata/core/util/CarbonProperties.java | 43 ++
 docs/configuration-parameters.md   |  1 +
 4 files changed, 68 insertions(+), 3 deletions(-)

diff --git 
a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 
b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
index 67fa13f..ac77582 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
@@ -1209,6 +1209,17 @@ public final class CarbonCommonConstants {
 
   public static final String CARBON_ENABLE_RANGE_COMPACTION_DEFAULT = "true";
 
+  @CarbonProperty
+  /**
+   * size based threshold for local dictionary in mb.
+   */
+  public static final String CARBON_LOCAL_DICTIONARY_SIZE_THRESHOLD_IN_MB =
+  "carbon.local.dictionary.size.threshold.inmb";
+
+  public static final int CARBON_LOCAL_DICTIONARY_SIZE_THRESHOLD_IN_MB_DEFAULT 
= 4;
+
+  public static final int CARBON_LOCAL_DICTIONARY_SIZE_THRESHOLD_IN_MB_MAX = 
16;
+
   
//
   // Query parameter start here
   
//
diff --git 
a/core/src/main/java/org/apache/carbondata/core/localdictionary/dictionaryholder/MapBasedDictionaryStore.java
 
b/core/src/main/java/org/apache/carbondata/core/localdictionary/dictionaryholder/MapBasedDictionaryStore.java
index 7b8617a..0a50451 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/localdictionary/dictionaryholder/MapBasedDictionaryStore.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/localdictionary/dictionaryholder/MapBasedDictionaryStore.java
@@ -20,7 +20,9 @@ import java.util.Map;
 import java.util.concurrent.ConcurrentHashMap;
 
 import org.apache.carbondata.core.cache.dictionary.DictionaryByteArrayWrapper;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
 import 
org.apache.carbondata.core.localdictionary.exception.DictionaryThresholdReachedException;
+import org.apache.carbondata.core.util.CarbonProperties;
 
 /**
  * Map based dictionary holder class, it will use map to hold
@@ -51,6 +53,11 @@ public class MapBasedDictionaryStore implements 
DictionaryStore {
   private int dictionaryThreshold;
 
   /**
+   * dictionary threshold size in bytes
+   */
+  private long dictionarySizeThresholdInBytes;
+
+  /**
* for checking threshold is reached or not
*/
   private boolean isThresholdReached;
@@ -62,6 +69,8 @@ public class MapBasedDictionaryStore implements 
DictionaryStore {
 
   public MapBasedDictionaryStore(int dictionaryThreshold) {
 this.dictionaryThreshold = dictionaryThreshold;
+this.dictionarySizeThresholdInBytes = 
Integer.parseInt(CarbonProperties.getInstance()
+
.getProperty(CarbonCommonConstants.CARBON_LOCAL_DICTIONARY_SIZE_THRESHOLD_IN_MB))
 << 20;
 this.dictionary = new ConcurrentHashMap<>();
 this.referenceDictionaryArray = new 
DictionaryByteArrayWrapper[dictionaryThreshold];
   }
@@ -93,7 +102,7 @@ public class MapBasedDictionaryStore implements 
DictionaryStore {
   value = ++lastAssignValue;
   currentSize += data.length;
   // if new value is greater than threshold
-  if (value > dictionaryThreshold || currentSize >= Integer.MAX_VALUE) 
{
+  if (value > dictionaryThreshold || currentSize > 
dictionarySizeThresholdInBytes) {
 // set the threshold boolean to true
 isThresholdReached = true;
 // throw exception
@@ -111,9 +120,10 @@ public class MapBasedDictionaryStore imp

[carbondata] branch master updated: [CARBONDATA-3506]Fix alter table failures on partition table with hive.metastore.disallow.incompatible.col.type.changes as true

2019-09-12 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
 new 26f2c77  [CARBONDATA-3506]Fix alter table failures on partition table 
with hive.metastore.disallow.incompatible.col.type.changes as true
26f2c77 is described below

commit 26f2c778e5b8c10b2249862877250afdd0062a41
Author: akashrn5 
AuthorDate: Wed Aug 28 12:05:13 2019 +0530

[CARBONDATA-3506]Fix alter table failures on partition table with 
hive.metastore.disallow.incompatible.col.type.changes as true

Problem:
On Spark 2.2 and above, when we call alterExternalCatalogForTableWithUpdatedSchema 
to update the new schema in the external catalog during add column, Spark gets 
the catalog table and itself adds the partition columns (if the table is a 
partition table) to the new data schema sent by Carbon. This results in 
duplicate partition columns, so validation fails in Hive.
When the table has only two columns and one of them is a partition column, 
dropping the non-partition column is invalid, because allowing it would leave a 
table whose columns are all partition columns. So with the above property set 
to true, drop column fails to update the Hive metastore.
On Spark 2.2 and above, a datatype change on a partition column also fails with 
the above property set to true, because we do not send the partition column in 
the schema alter to Hive.

Solution:
When sending the new schema to Spark to update in the catalog on Spark 2.2 and 
above, do not send the partition columns; Spark takes care of adding the 
partition columns to the new schema sent by us.
In the drop scenario above, do not allow the drop if, after dropping the 
specific column, the table would contain only partition columns.
Block datatype changes on partition columns on Spark 2.2 and above.
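
A hedged sketch of both checks (plain Java with illustrative names, not the 
actual Scala command classes): partition columns are stripped before the 
schema is handed to the catalog, and a drop is rejected if only partition 
columns would remain.

import java.util.List;
import java.util.stream.Collectors;

final class AlterSchemaHelperSketch {

  // Columns handed to the catalog on Spark 2.2+: partition columns are stripped,
  // since Spark re-appends them to the schema itself.
  static List<String> nonPartitionColumns(List<String> newSchemaColumns,
      List<String> partitionColumns) {
    return newSchemaColumns.stream()
        .filter(column -> !partitionColumns.contains(column))
        .collect(Collectors.toList());
  }

  // Drop validation: reject the drop if only partition columns would remain.
  static void validateDrop(List<String> columnsAfterDrop, List<String> partitionColumns) {
    if (partitionColumns.containsAll(columnsAfterDrop)) {
      throw new IllegalArgumentException(
          "Cannot drop column: table would be left with only partition columns");
    }
  }
}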

This closes #3367
---
 .../StandardPartitionTableQueryTestCase.scala  | 29 +
 .../schema/CarbonAlterTableAddColumnCommand.scala  | 20 +---
 ...nAlterTableColRenameDataTypeChangeCommand.scala | 36 +++---
 .../schema/CarbonAlterTableDropColumnCommand.scala | 35 +
 4 files changed, 99 insertions(+), 21 deletions(-)

diff --git 
a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableQueryTestCase.scala
 
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableQueryTestCase.scala
index c19c0b9..fb4b511 100644
--- 
a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableQueryTestCase.scala
+++ 
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableQueryTestCase.scala
@@ -21,8 +21,10 @@ import 
org.apache.spark.sql.execution.strategy.CarbonDataSourceScan
 import org.apache.spark.sql.test.Spark2TestQueryExecutor
 import org.apache.spark.sql.test.util.QueryTest
 import org.apache.spark.sql.{DataFrame, Row}
+import org.apache.spark.util.SparkUtil
 import org.scalatest.BeforeAndAfterAll
 
+import 
org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException
 import org.apache.carbondata.core.constants.CarbonCommonConstants
 import org.apache.carbondata.core.datastore.impl.FileFactory
 import org.apache.carbondata.core.util.CarbonProperties
@@ -439,18 +441,32 @@ test("Creation of partition table should fail if the 
colname in table schema and
 
   test("validate data in partition table after dropping and adding a column") {
 sql("drop table if exists par")
-sql("create table par(name string) partitioned by (age double) stored by " 
+
+sql("create table par(name string, add string) partitioned by (age double) 
stored by " +
   "'carbondata' TBLPROPERTIES('cache_level'='blocklet')")
-sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into 
table par options" +
-s"('header'='false')")
+sql("insert into par select 'joey','NY',32 union all select 
'chandler','NY',32")
 sql("alter table par drop columns(name)")
 sql("alter table par add columns(name string)")
-sql(s"load data local inpath '$resourcesPath/uniqwithoutheader.csv' into 
table par options" +
-s"('header'='false')")
-checkAnswer(sql("select name from par"), Seq(Row("a"),Row("b"), Row(null), 
Row(null)))
+sql("insert into par select 'joey','NY',32 union all select 
'joey','NY',32")
+checkAnswer(sql("select name from par"), Seq(Row("

[carbondata] branch master updated: [CARBONDATA-3505] Drop database cascade fix

2019-09-05 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
 new f3685a5  [CARBONDATA-3505] Drop database cascade fix
f3685a5 is described below

commit f3685a53ec70a0987f022bc1f479658810cf3755
Author: kunal642 
AuthorDate: Tue Aug 27 14:49:58 2019 +0530

[CARBONDATA-3505] Drop database cascade fix

Problem: When two databases are created on the same location and one of them is 
dropped, the folder is also deleted from the backend. If we then try to drop 
the second database, it would try to look up the other table, but the schema 
file would no longer exist in the backend and the drop would fail.

Solution: Add a check to call CarbonDropDatabaseCommand only if the database
location exists in the backend.

This closes #3365
---
 .../main/scala/org/apache/spark/sql/CarbonEnv.scala   | 19 ++-
 .../command/cache/CarbonShowCacheCommand.scala|  4 ++--
 .../spark/sql/execution/strategy/DDLStrategy.scala|  4 +++-
 .../apache/spark/sql/hive/CarbonFileMetastore.scala   |  4 ++--
 4 files changed, 25 insertions(+), 6 deletions(-)

diff --git 
a/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala 
b/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala
index 1cbd156..f2a52d2 100644
--- a/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala
+++ b/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala
@@ -20,7 +20,7 @@ package org.apache.spark.sql
 import java.util.concurrent.ConcurrentHashMap
 
 import org.apache.spark.sql.catalyst.TableIdentifier
-import org.apache.spark.sql.catalyst.analysis.NoSuchTableException
+import org.apache.spark.sql.catalyst.analysis.{NoSuchDatabaseException, 
NoSuchTableException}
 import org.apache.spark.sql.catalyst.catalog.SessionCatalog
 import org.apache.spark.sql.events.{MergeBloomIndexEventListener, 
MergeIndexEventListener}
 import org.apache.spark.sql.execution.command.cache._
@@ -267,6 +267,23 @@ object CarbonEnv {
   }
 
   /**
+   * Returns true with the database folder exists in file system. False in all 
other scenarios.
+   */
+  def databaseLocationExists(dbName: String,
+  sparkSession: SparkSession, ifExists: Boolean): Boolean = {
+try {
+  FileFactory.getCarbonFile(getDatabaseLocation(dbName, 
sparkSession)).exists()
+} catch {
+  case e: NoSuchDatabaseException =>
+if (ifExists) {
+  false
+} else {
+  throw e
+}
+}
+  }
+
+  /**
* The method returns the database location
* if carbon.storeLocation does  point to spark.sql.warehouse.dir then 
returns
* the database locationUri as database location else follows the old 
behaviour
diff --git 
a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/cache/CarbonShowCacheCommand.scala
 
b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/cache/CarbonShowCacheCommand.scala
index 45e811a..4b7f680 100644
--- 
a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/cache/CarbonShowCacheCommand.scala
+++ 
b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/cache/CarbonShowCacheCommand.scala
@@ -443,9 +443,9 @@ case class CarbonShowCacheCommand(tableIdentifier: 
Option[TableIdentifier],
   case (_, _, sum, provider) =>
 provider.toLowerCase match {
   case `bloomFilterIdentifier` =>
-allIndexSize += sum
-  case _ =>
 allDatamapSize += sum
+  case _ =>
+allIndexSize += sum
 }
 }
 (allIndexSize, allDatamapSize)
diff --git 
a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/DDLStrategy.scala
 
b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/DDLStrategy.scala
index 4791687..3ef8cfa 100644
--- 
a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/DDLStrategy.scala
+++ 
b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/DDLStrategy.scala
@@ -37,6 +37,7 @@ import org.apache.spark.util.{CarbonReflectionUtils, 
DataMapUtil, FileUtils, Spa
 
 import 
org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException
 import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.datastore.impl.FileFactory
 import org.apache.carbondata.core.metadata.schema.table.CarbonTable
 import org.apache.carbondata.core.util.{CarbonProperties, DataTypeUtil, 
ThreadLocalSessionInfo}
 import org.apache.carbondata.spark.util.Util
@@ -115,7 +116,8 @@ class DDLStrategy(sparkSession: SparkSession) extends 
SparkStrategy {
   
.setConfigurationToCurrentThread(sparkSession.sessionState.newHadoopConf())
 FileUtils.createDatabaseD

[carbondata] branch feature/DistributedIndexServer deleted (was 7f05e69)

2019-08-02 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a change to branch feature/DistributedIndexServer
in repository https://gitbox.apache.org/repos/asf/carbondata.git.


 was 7f05e69  [HOTFIX]fixed loading issue for legacy store

The revisions that were on this branch are still contained in
other references; therefore, this change does not discard any commits
from the repository.



[carbondata] branch branch-1.6.0 deleted (was d7d70a8)

2019-08-02 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a change to branch branch-1.6.0
in repository https://gitbox.apache.org/repos/asf/carbondata.git.


 was d7d70a8  [HOTFIX] Removed the hive-exec and commons dependency from 
hive module

The revisions that were on this branch are still contained in
other references; therefore, this change does not discard any commits
from the repository.



[carbondata] 03/03: [HOTFIX] Removed the hive-exec and commons dependency from hive module

2019-08-02 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a commit to branch branch-1.6
in repository https://gitbox.apache.org/repos/asf/carbondata.git

commit 80438f75379cd3754cb31a42a372aeb36e4d61e7
Author: ravipesala 
AuthorDate: Fri Aug 2 11:15:05 2019 +0530

[HOTFIX] Removed the hive-exec and commons dependency from hive module

Removed the hive-exec and commons dependencies from the hive module, as Spark 
has its own hive-exec. Because of the external hive-exec dependency, some 
tests were failing.

This closes #3347
---
 integration/spark-common/pom.xml | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/integration/spark-common/pom.xml b/integration/spark-common/pom.xml
index df683e0..a12992d 100644
--- a/integration/spark-common/pom.xml
+++ b/integration/spark-common/pom.xml
@@ -39,6 +39,16 @@
       <groupId>org.apache.carbondata</groupId>
       <artifactId>carbondata-hive</artifactId>
       <version>${project.version}</version>
+      <exclusions>
+        <exclusion>
+          <groupId>org.apache.commons</groupId>
+          <artifactId>*</artifactId>
+        </exclusion>
+        <exclusion>
+          <groupId>org.apache.hive</groupId>
+          <artifactId>hive-exec</artifactId>
+        </exclusion>
+      </exclusions>
     </dependency>
     <dependency>
       <groupId>org.apache.carbondata</groupId>



[carbondata] 01/03: [CARBONDATA-3478]Fix ArrayIndexOutOfBound Exception on compaction after alter operation

2019-08-02 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a commit to branch branch-1.6
in repository https://gitbox.apache.org/repos/asf/carbondata.git

commit 2ebc0413ee03645659e49b8c4d41969ee444b9aa
Author: Indhumathi27 
AuthorDate: Fri Jul 26 16:51:32 2019 +0530

[CARBONDATA-3478]Fix ArrayIndexOutOfBound Exception on compaction after 
alter operation

Problem:
In case of alter add, drop, or rename operations, restructuredBlockExists will 
be true.
Currently, to get the RawResultIterator for a block, we check whether the block 
has column drift by comparing SegmentProperties with the column-drift columns.
SegmentProperties is formed based on restructuredBlockExists: if 
restructuredBlockExists is true, we take the current column schema to form 
SegmentProperties; otherwise, we use the data file footer's column schema.

In the example given in CARBONDATA-3478, we use the current column schema to 
form SegmentProperties for both blocks, as restructuredBlockExists is true.
Hence, while iterating block 1, it throws an ArrayIndexOutOfBound exception, 
as it uses RawResultIterator instead of ColumnDriftRawResultIterator.

Solution:
Use the schema from the data file footer of each block to check whether it has 
column drift.
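
A small illustration of the fix's shape (type and method names are 
placeholders, not the CarbonCompactionExecutor API): the drift check is made 
per block against that block's own footer schema, and the iterator kind is 
chosen accordingly.

import java.util.List;

final class IteratorChoiceSketch {
  interface Block {
    List<String> footerColumnSchema();   // schema recorded in this block's data file footer
  }

  static boolean hasColumnDrift(Block block, List<String> currentColumnSchema) {
    // drift: the block was written with a column layout different from the current one
    return !block.footerColumnSchema().equals(currentColumnSchema);
  }

  static String chooseIterator(Block block, List<String> currentColumnSchema) {
    // ColumnDriftRawResultIterator restructures rows while iterating;
    // the plain RawResultIterator assumes the layouts already match
    return hasColumnDrift(block, currentColumnSchema)
        ? "ColumnDriftRawResultIterator"
        : "RawResultIterator";
  }
}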

This closes #3337
---
 .../AlterTableColumnRenameTestCase.scala   | 54 ++
 .../merger/CarbonCompactionExecutor.java   |  9 +++-
 2 files changed, 61 insertions(+), 2 deletions(-)

diff --git 
a/integration/spark2/src/test/scala/org/apache/spark/carbondata/restructure/vectorreader/AlterTableColumnRenameTestCase.scala
 
b/integration/spark2/src/test/scala/org/apache/spark/carbondata/restructure/vectorreader/AlterTableColumnRenameTestCase.scala
index d927724..dd1fa0f 100644
--- 
a/integration/spark2/src/test/scala/org/apache/spark/carbondata/restructure/vectorreader/AlterTableColumnRenameTestCase.scala
+++ 
b/integration/spark2/src/test/scala/org/apache/spark/carbondata/restructure/vectorreader/AlterTableColumnRenameTestCase.scala
@@ -320,12 +320,66 @@ class AlterTableColumnRenameTestCase extends 
Spark2QueryTest with BeforeAndAfter
 }
   }
 
+  test("test compaction after table rename and alter set tblproerties") {
+sql("DROP TABLE IF EXISTS test_rename")
+sql("DROP TABLE IF EXISTS test_rename_compact")
+sql(
+  "CREATE TABLE test_rename (empno int, empname String, designation 
String, doj Timestamp, " +
+  "workgroupcategory int, workgroupcategoryname String, deptno int, 
deptname String, " +
+  "projectcode int, projectjoindate Timestamp, projectenddate 
Timestamp,attendance int," +
+  "utilization int,salary int) STORED BY 'org.apache.carbondata.format'")
+sql(
+  s"""LOAD DATA LOCAL INPATH '$resourcesPath/data.csv' INTO TABLE 
test_rename OPTIONS
+ |('DELIMITER'= ',', 'QUOTECHAR'= '\"')""".stripMargin)
+sql("alter table test_rename rename to test_rename_compact")
+sql("alter table test_rename_compact set 
tblproperties('sort_columns'='deptno,projectcode', 'sort_scope'='local_sort')")
+sql(
+  s"""LOAD DATA LOCAL INPATH '$resourcesPath/data.csv' INTO TABLE 
test_rename_compact OPTIONS
+ |('DELIMITER'= ',', 'QUOTECHAR'= '\"')""".stripMargin)
+val res1 = sql("select * from test_rename_compact")
+sql("alter table test_rename_compact compact 'major'")
+val res2 = sql("select * from test_rename_compact")
+assert(res1.collectAsList().containsAll(res2.collectAsList()))
+checkExistence(sql("show segments for table test_rename_compact"), true, 
"Compacted")
+sql("DROP TABLE IF EXISTS test_rename")
+sql("DROP TABLE IF EXISTS test_rename_compact")
+  }
+
+  test("test compaction after alter set tblproerties- add and drop") {
+sql("DROP TABLE IF EXISTS test_alter")
+sql(
+  "CREATE TABLE test_alter (empno int, empname String, designation String, 
doj Timestamp, " +
+  "workgroupcategory int, workgroupcategoryname String, deptno int, 
deptname String, " +
+  "projectcode int, projectjoindate Timestamp, projectenddate 
Timestamp,attendance int," +
+  "utilization int,salary int) STORED BY 'org.apache.carbondata.format'")
+sql(
+  s"""LOAD DATA LOCAL INPATH '$resourcesPath/data.csv' INTO TABLE 
test_alter OPTIONS
+ |('DELIMITER'= ',', 'QUOTECHAR'= '\"')""".stripMargin)
+sql("alter table test_alter set 
tblproperties('sort_columns'='deptno,projectcode', 'sort_scope'='local_sort')")
+sql("alter table test_alter drop columns(deptno)")
+sql(
+  s"""LOAD DAT

[carbondata] branch branch-1.6 updated (917e041 -> 80438f7)

2019-08-02 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a change to branch branch-1.6
in repository https://gitbox.apache.org/repos/asf/carbondata.git.


from 917e041  [HOTFIX] CLI test case failed during release because of space 
differences
 new 2ebc041  [CARBONDATA-3478]Fix ArrayIndexOutOfBound Exception on 
compaction after alter operation
 new 575b711  [CARBONDATA-3481] Multi-thread pruning fails when datamaps 
count is just near numOfThreadsForPruning
 new 80438f7  [HOTFIX] Removed the hive-exec and commons dependency from 
hive module

The 3 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../carbondata/core/datamap/TableDataMap.java  | 12 +++--
 integration/spark-common/pom.xml   | 10 
 .../AlterTableColumnRenameTestCase.scala   | 54 ++
 .../merger/CarbonCompactionExecutor.java   |  9 +++-
 4 files changed, 80 insertions(+), 5 deletions(-)



[carbondata] 02/03: [CARBONDATA-3481] Multi-thread pruning fails when datamaps count is just near numOfThreadsForPruning

2019-08-02 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a commit to branch branch-1.6
in repository https://gitbox.apache.org/repos/asf/carbondata.git

commit 575b7116e5cc0a7c25e17794a462a6ecdf4afb24
Author: ajantha-bhat 
AuthorDate: Thu Jul 25 18:50:19 2019 +0530

[CARBONDATA-3481] Multi-thread pruning fails when datamaps count is just 
near numOfThreadsForPruning

Cause: When the datamap count is just near numOfThreadsForPruning, the '>=' 
check means the last thread may not get any datamaps to prune. Hence an array 
index out of bounds exception is thrown in this scenario. There is no issue 
with a higher number of datamaps.

Solution: In this scenario, launch threads based on the actual distribution, 
not on the configured value.
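
A self-contained sketch of the grouping rule (generic Java, independent of 
TableDataMap): work items are split into at most the configured number of 
buckets, and the thread count is then capped at the number of buckets actually 
produced, so no thread ends up with an empty slice.

import java.util.ArrayList;
import java.util.List;

final class PruneSchedulerSketch {
  static <T> List<List<T>> group(List<T> items, int configuredThreads) {
    // ceil division keeps the bucket count at or below configuredThreads
    int perThread = Math.max(1, (items.size() + configuredThreads - 1) / configuredThreads);
    List<List<T>> buckets = new ArrayList<>();
    for (int i = 0; i < items.size(); i += perThread) {
      buckets.add(items.subList(i, Math.min(items.size(), i + perThread)));
    }
    return buckets;
  }

  static <T> int threadsToLaunch(List<List<T>> buckets, int configuredThreads) {
    // launch only as many threads as there are non-empty buckets
    return Math.min(buckets.size(), configuredThreads);
  }
}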

This closes #3336
---
 .../org/apache/carbondata/core/datamap/TableDataMap.java | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git 
a/core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java 
b/core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java
index 33fc3b1..ecdd586 100644
--- a/core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java
+++ b/core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java
@@ -207,9 +207,6 @@ public final class TableDataMap extends 
OperationEventListener {
  */
 
 int numOfThreadsForPruning = CarbonProperties.getNumOfThreadsForPruning();
-LOG.info(
-"Number of threads selected for multi-thread block pruning is " + 
numOfThreadsForPruning
-+ ". total files: " + totalFiles + ". total segments: " + 
segments.size());
 int filesPerEachThread = totalFiles / numOfThreadsForPruning;
 int prev;
 int filesCount = 0;
@@ -254,6 +251,15 @@ public final class TableDataMap extends 
OperationEventListener {
   // this should not happen
   throw new RuntimeException(" not all the files processed ");
 }
+if (datamapListForEachThread.size() < numOfThreadsForPruning) {
+  // If the total datamaps fitted in lesser number of threads than 
numOfThreadsForPruning.
+  // Launch only that many threads where datamaps are fitted while 
grouping.
+  LOG.info("Datamaps is distributed in " + datamapListForEachThread.size() 
+ " threads");
+  numOfThreadsForPruning = datamapListForEachThread.size();
+}
+LOG.info(
+"Number of threads selected for multi-thread block pruning is " + 
numOfThreadsForPruning
++ ". total files: " + totalFiles + ". total segments: " + 
segments.size());
 List> results = new ArrayList<>(numOfThreadsForPruning);
 final Map> prunedBlockletMap =
 new ConcurrentHashMap<>(segments.size());



[carbondata] branch branch-1.6.0 created (now d7d70a8)

2019-08-02 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a change to branch branch-1.6.0
in repository https://gitbox.apache.org/repos/asf/carbondata.git.


  at d7d70a8  [HOTFIX] Removed the hive-exec and commons dependency from 
hive module

No new revisions were added by this update.



[carbondata] branch master updated: [HOTFIX] Removed the hive-exec and commons dependency from hive module

2019-08-02 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
 new d7d70a8  [HOTFIX] Removed the hive-exec and commons dependency from 
hive module
d7d70a8 is described below

commit d7d70a83d68ac1578f611e9a6a8b3af1c426d5d7
Author: ravipesala 
AuthorDate: Fri Aug 2 11:15:05 2019 +0530

[HOTFIX] Removed the hive-exec and commons dependency from hive module

Removed the hive-exec and commons dependencies from the hive module, as Spark 
has its own hive-exec. Because of the external hive-exec dependency, some 
tests were failing.

This closes #3347
---
 integration/spark-common/pom.xml | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/integration/spark-common/pom.xml b/integration/spark-common/pom.xml
index df683e0..a12992d 100644
--- a/integration/spark-common/pom.xml
+++ b/integration/spark-common/pom.xml
@@ -39,6 +39,16 @@
       <groupId>org.apache.carbondata</groupId>
       <artifactId>carbondata-hive</artifactId>
       <version>${project.version}</version>
+      <exclusions>
+        <exclusion>
+          <groupId>org.apache.commons</groupId>
+          <artifactId>*</artifactId>
+        </exclusion>
+        <exclusion>
+          <groupId>org.apache.hive</groupId>
+          <artifactId>hive-exec</artifactId>
+        </exclusion>
+      </exclusions>
     </dependency>
     <dependency>
       <groupId>org.apache.carbondata</groupId>



[carbondata] branch master updated: [CARBONDATA-3449] Synchronize the initialization of listeners in case of concurrent scenarios

2019-06-26 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
 new b98e183  [CARBONDATA-3449] Synchronize the initialization of listeners 
in case of concurrent scenarios
b98e183 is described below

commit b98e183f1546f577880c414b1c1649264ff2fd7d
Author: manishnalla1994 
AuthorDate: Sat Jun 22 11:27:41 2019 +0530

[CARBONDATA-3449] Synchronize the initialization of listeners in case of 
concurrent scenarios

Problem: Initialization of listeners in case of concurrent scenarios is not 
synchronized.

Solution: Changed the function to a val, so the initialization is handled by 
Scala and init occurs only once.

This closes #3304
---
 .../main/java/org/apache/carbondata/events/OperationListenerBus.java | 2 +-
 .../org/apache/spark/sql/hive/CarbonInMemorySessionState.scala   | 2 +-
 .../org/apache/spark/sql/hive/CarbonSessionState.scala   | 2 +-
 .../spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala   | 5 +++--
 .../main/spark2.1/org/apache/spark/sql/hive/CarbonSessionState.scala | 2 +-
 5 files changed, 7 insertions(+), 6 deletions(-)

diff --git 
a/core/src/main/java/org/apache/carbondata/events/OperationListenerBus.java 
b/core/src/main/java/org/apache/carbondata/events/OperationListenerBus.java
index 5f9a05c..3f652e2 100644
--- a/core/src/main/java/org/apache/carbondata/events/OperationListenerBus.java
+++ b/core/src/main/java/org/apache/carbondata/events/OperationListenerBus.java
@@ -53,7 +53,7 @@ public class OperationListenerBus {
* @param eventClass
* @param operationEventListener
*/
-  public OperationListenerBus addListener(Class eventClass,
+  public synchronized OperationListenerBus addListener(Class 
eventClass,
   OperationEventListener operationEventListener) {
 
 String eventType = eventClass.getName();
diff --git 
a/integration/spark2/src/main/commonTo2.2And2.3/org/apache/spark/sql/hive/CarbonInMemorySessionState.scala
 
b/integration/spark2/src/main/commonTo2.2And2.3/org/apache/spark/sql/hive/CarbonInMemorySessionState.scala
index e286fba..5dfb16d 100644
--- 
a/integration/spark2/src/main/commonTo2.2And2.3/org/apache/spark/sql/hive/CarbonInMemorySessionState.scala
+++ 
b/integration/spark2/src/main/commonTo2.2And2.3/org/apache/spark/sql/hive/CarbonInMemorySessionState.scala
@@ -146,7 +146,7 @@ class InMemorySessionCatalog(
   }
 
   // Initialize all listeners to the Operation bus.
-  CarbonEnv.initListeners()
+  CarbonEnv.init
 
   def getThriftTableInfo(tablePath: String): TableInfo = {
 val tableMetadataFile = CarbonTablePath.getSchemaFilePath(tablePath)
diff --git 
a/integration/spark2/src/main/commonTo2.2And2.3/org/apache/spark/sql/hive/CarbonSessionState.scala
 
b/integration/spark2/src/main/commonTo2.2And2.3/org/apache/spark/sql/hive/CarbonSessionState.scala
index 08cf3cc..f991a78 100644
--- 
a/integration/spark2/src/main/commonTo2.2And2.3/org/apache/spark/sql/hive/CarbonSessionState.scala
+++ 
b/integration/spark2/src/main/commonTo2.2And2.3/org/apache/spark/sql/hive/CarbonSessionState.scala
@@ -83,7 +83,7 @@ class CarbonHiveSessionCatalog(
   }
 
   // Initialize all listeners to the Operation bus.
-  CarbonEnv.initListeners()
+  CarbonEnv.init
 
   override def lookupRelation(name: TableIdentifier): LogicalPlan = {
 val rtnRelation = super.lookupRelation(name)
diff --git 
a/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala 
b/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala
index 094d298..e7a6d65 100644
--- a/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala
+++ b/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonEnv.scala
@@ -149,9 +149,10 @@ object CarbonEnv {
* Method
* 1. To initialize Listeners to their respective events in the 
OperationListenerBus
* 2. To register common listeners
-   *
+   * 3. Only initialize once for all the listeners in case of concurrent 
scenarios we have given
+   * val, as val initializes once
*/
-  def init(sparkSession: SparkSession): Unit = {
+  val init = {
 initListeners
   }
 
diff --git 
a/integration/spark2/src/main/spark2.1/org/apache/spark/sql/hive/CarbonSessionState.scala
 
b/integration/spark2/src/main/spark2.1/org/apache/spark/sql/hive/CarbonSessionState.scala
index 5caa4dd..26f778e 100644
--- 
a/integration/spark2/src/main/spark2.1/org/apache/spark/sql/hive/CarbonSessionState.scala
+++ 
b/integration/spark2/src/main/spark2.1/org/apache/spark/sql/hive/CarbonSessionState.scala
@@ -108,7 +108,7 @@ class CarbonHiveSessionCatalog(
   }
 
   // Initialize all listeners to the Operation bus.
-  CarbonEnv.init(sparkSession)
+  CarbonEnv.init
 
   /**
* This method will invalidate carbonrelation from cache if carbon table is 
updated in



[carbondata] branch master updated: [CARBONDATA-3448] Fix wrong results in preaggregate query with spark adaptive execution

2019-06-25 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
 new 9b2ef53  [CARBONDATA-3448] Fix wrong results in preaggregate query 
with spark adaptive execution
9b2ef53 is described below

commit 9b2ef53aef9a823f8007b7d3b042f634e7d874ca
Author: ajantha-bhat 
AuthorDate: Fri Jun 21 10:35:06 2019 +0530

[CARBONDATA-3448] Fix wrong results in preaggregate query with spark 
adaptive execution

Problem: Wrong results in a preaggregate query when Spark adaptive execution 
is enabled:

Spark2TestQueryExecutor.conf.set(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key, "true")

Cause: For preaggregate, the segment info is set into a ThreadLocal. When 
adaptive execution is used, Spark calls getInternalPartition in another thread 
where the updated segment conf is not set, so the updated segments are not 
used.

Solution: CarbonScanRDD already holds the sessionInfo; use it instead of 
taking the session info from the current thread.
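
A self-contained illustration of the underlying pitfall (plain Java, not 
CarbonData code): a value set in a ThreadLocal on the query thread is not 
visible from a worker thread spawned later, whereas a value captured into the 
object handed across threads (as CarbonScanRDD does with its session info) is.

public final class ThreadLocalPitfall {
  private static final ThreadLocal<String> SESSION = new ThreadLocal<>();

  public static void main(String[] args) throws InterruptedException {
    SESSION.set("validateSegments=false");          // set on the query thread

    Thread worker = new Thread(() ->
        System.out.println("via ThreadLocal: " + SESSION.get()));   // prints null
    worker.start();
    worker.join();

    final String captured = SESSION.get();          // capture before handing off
    Thread worker2 = new Thread(() ->
        System.out.println("via captured field: " + captured));     // prints the value
    worker2.start();
    worker2.join();
  }
}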

This closes #3303
---
 .../preaggregate/TestPreAggregateLoad.scala| 29 ++
 .../carbondata/spark/rdd/CarbonScanRDD.scala   | 16 +---
 2 files changed, 41 insertions(+), 4 deletions(-)

diff --git 
a/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/preaggregate/TestPreAggregateLoad.scala
 
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/preaggregate/TestPreAggregateLoad.scala
index 7ba8300..75d71ec 100644
--- 
a/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/preaggregate/TestPreAggregateLoad.scala
+++ 
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/preaggregate/TestPreAggregateLoad.scala
@@ -18,6 +18,8 @@
 package org.apache.carbondata.integration.spark.testsuite.preaggregate
 
 import org.apache.spark.sql.Row
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.test.Spark2TestQueryExecutor
 import org.apache.spark.util.SparkUtil4Test
 import org.scalatest.{BeforeAndAfterAll, BeforeAndAfterEach}
 
@@ -298,6 +300,33 @@ class TestPreAggregateLoad extends SparkQueryTest with 
BeforeAndAfterAll with Be
 checkAnswer(sql("select * from maintable_preagg_sum"), Row(1, 52, "xyz"))
   }
 
+  test("test pregarregate with spark adaptive execution ") {
+if (Spark2TestQueryExecutor.spark.version.startsWith("2.3")) {
+  // enable adaptive execution
+  Spark2TestQueryExecutor.conf.set(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key, 
"true")
+}
+sql("DROP TABLE IF EXISTS maintable")
+sql(
+  """
+| CREATE TABLE maintable(id int, name string, city string, age int)
+| STORED BY 'org.apache.carbondata.format'
+  """.stripMargin)
+sql(
+  s"""create datamap preagg_sum on table maintable using 'preaggregate' as 
select id, sum(age) from maintable group by id,name"""
+.stripMargin)
+sql(s"insert into maintable values(1, 'xyz', 'bengaluru', 20)")
+sql(s"insert into maintable values(1, 'xyz', 'bengaluru', 30)")
+
+checkAnswer(sql("select id, sum(age) from maintable group by id, name"), 
Row(1, 50))
+sql("drop datamap preagg_sum on table maintable")
+sql("drop table maintable")
+if (Spark2TestQueryExecutor.spark.version.startsWith("2.3")) {
+  // disable adaptive execution
+  Spark2TestQueryExecutor.conf.set(SQLConf.ADAPTIVE_EXECUTION_ENABLED.key, 
"false")
+}
+  }
+
+
 test("check load and select for avg double datatype") {
   sql("drop table if exists maintbl ")
   sql("create table maintbl(year int,month int,name string,salary double) 
stored by 'carbondata' 
tblproperties('sort_scope'='Global_sort','table_blocksize'='23','sort_columns'='month,year,name')")
diff --git 
a/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala
 
b/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala
index b62a7e2..f90d279 100644
--- 
a/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala
+++ 
b/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala
@@ -654,7 +654,6 @@ class CarbonScanRDD[T: ClassTag](
 CarbonInputFormat.setColumnProjection(conf, columnProjection)
 CarbonInputFormatUtil.setDataMapJobIfConfigured(conf)
 // when validate segments is disabled in thread local update it to 
CarbonTableInputFormat
-val carbonSessionInfo = ThreadLocalSessionInfo.getCarbo

[carbondata] branch master updated: [CARBONDATA-3427] Beautify DAG by showing less text

2019-06-25 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
 new ce2dde8  [CARBONDATA-3427] Beautify DAG by showing less text
ce2dde8 is described below

commit ce2dde84a09fb640058ad74b5257550fd370bb3a
Author: manhua 
AuthorDate: Wed Jun 12 09:47:17 2019 +0800

[CARBONDATA-3427] Beautify DAG by showing less text

beautify DAG by showing less text

This closes #3278
---
 .../scala/org/apache/spark/sql/CarbonDatasourceHadoopRelation.scala| 3 +--
 .../apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala | 2 ++
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git 
a/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonDatasourceHadoopRelation.scala
 
b/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonDatasourceHadoopRelation.scala
index 09763fd..cfb6e6e 100644
--- 
a/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonDatasourceHadoopRelation.scala
+++ 
b/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonDatasourceHadoopRelation.scala
@@ -195,8 +195,7 @@ case class CarbonDatasourceHadoopRelation(
   override def unhandledFilters(filters: Array[Filter]): Array[Filter] = new 
Array[Filter](0)
 
   override def toString: String = {
-"CarbonDatasourceHadoopRelation [ " + "Database name :" + 
identifier.getDatabaseName +
-", " + "Table name :" + identifier.getTableName + ", Schema :" + 
tableSchema + " ]"
+"CarbonDatasourceHadoopRelation"
   }
 
   override def sizeInBytes: Long = carbonRelation.sizeInBytes
diff --git 
a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
 
b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
index 0f706af..5d238de 100644
--- 
a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
+++ 
b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
@@ -55,6 +55,7 @@ import org.apache.carbondata.spark.rdd.CarbonScanRDD
  */
 private[sql] class CarbonLateDecodeStrategy extends SparkStrategy {
   val PUSHED_FILTERS = "PushedFilters"
+  val READ_SCHEMA = "ReadSchema"
 
   /*
   Spark 2.3.1 plan there can be case of multiple projections like below
@@ -274,6 +275,7 @@ private[sql] class CarbonLateDecodeStrategy extends 
SparkStrategy {
   if (pushedFilters.nonEmpty) {
 pairs += (PUSHED_FILTERS -> pushedFilters.mkString("[", ", ", "]"))
   }
+  pairs += (READ_SCHEMA -> 
projectSet.++(filterSet).toSeq.toStructType.catalogString)
   pairs.toMap
 }
 



[carbondata] branch master updated: [CARBONDATA-3444]Fix MV query failure when projection has cast expression with alias

2019-06-23 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
 new 0d32c6b  [CARBONDATA-3444]Fix MV query failure when projection has 
cast expression with alias
0d32c6b is described below

commit 0d32c6b8303f15e433fa3494e106f7ec6fa03b33
Author: akashrn5 
AuthorDate: Wed Jun 19 13:38:20 2019 +0530

[CARBONDATA-3444]Fix MV query failure when projection has cast expression 
with alias

Problem:
MV datamap creation fails when a projection column is a cast expression with
multiple arithmetic functions on one of the main table columns and has an
alias; it throws a "field does not exist" error.
Also, when the CREATE DATAMAP DDL gives the datamap provider name in capital
letters, queries were not hitting the MV table.

Solution:
While building the fieldRelationMap, handling of the above case was missed;
a case is added to handle this scenario.
When loading the datamapCatalogs, convert the provider name to lower case.
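
For illustration only, a sketch of the kind of projection that previously
failed (column names borrowed from the new test table, the expression itself
is hypothetical), together with an upper-case provider name:

    sql(
      """create datamap dm_cast using 'MV' as
        | select c_code, cast(floor(d_dollar_value * q_quantity) as bigint) as total_value
        | from maintable group by c_code, d_dollar_value, q_quantity""".stripMargin)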

This closes #3298
---
 .../carbondata/core/datamap/DataMapStoreManager.java|  4 ++--
 .../scala/org/apache/carbondata/mv/datamap/MVUtil.scala | 17 +
 .../apache/carbondata/mv/rewrite/MVCreateTestCase.scala | 16 
 .../carbondata/mv/plans/modular/ModularRelation.scala   |  1 +
 4 files changed, 36 insertions(+), 2 deletions(-)

diff --git 
a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java
 
b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java
index 729c419..a6a2031 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java
@@ -240,7 +240,7 @@ public final class DataMapStoreManager {
 if (dataMapCatalog == null) {
   dataMapCatalog = dataMapProvider.createDataMapCatalog();
   if (dataMapCatalog != null) {
-dataMapCatalogs.put(name, dataMapCatalog);
+dataMapCatalogs.put(name.toLowerCase(), dataMapCatalog);
 dataMapCatalog.registerSchema(dataMapSchema);
   }
 } else {
@@ -291,7 +291,7 @@ public final class DataMapStoreManager {
 if (null == dataMapCatalog) {
   throw new RuntimeException("Internal Error.");
 }
-dataMapCatalogs.put(schema.getProviderName(), dataMapCatalog);
+dataMapCatalogs.put(schema.getProviderName().toLowerCase(), 
dataMapCatalog);
   }
   try {
 dataMapCatalog.registerSchema(schema);
diff --git 
a/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVUtil.scala 
b/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVUtil.scala
index 048e22d..4e633a6 100644
--- 
a/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVUtil.scala
+++ 
b/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVUtil.scala
@@ -113,6 +113,23 @@ class MVUtil {
 }
   case a@Alias(_, name) =>
 checkIfComplexDataTypeExists(a)
+val arrayBuffer: ArrayBuffer[ColumnTableRelation] = new 
ArrayBuffer[ColumnTableRelation]()
+a.collect {
+  case attr: AttributeReference =>
+val carbonTable = getCarbonTable(logicalRelation, attr)
+if (null != carbonTable) {
+  val relation = getColumnRelation(attr.name,
+
carbonTable.getAbsoluteTableIdentifier.getCarbonTableIdentifier.getTableId,
+
carbonTable.getAbsoluteTableIdentifier.getCarbonTableIdentifier.getTableName,
+
carbonTable.getAbsoluteTableIdentifier.getCarbonTableIdentifier.getDatabaseName,
+carbonTable)
+  if (null != relation) {
+arrayBuffer += relation
+  }
+}
+}
+fieldToDataMapFieldMap +=
+getFieldToDataMapFields(a.name, a.dataType, None, "arithmetic", 
arrayBuffer, "")
 }
 fieldToDataMapFieldMap
   }
diff --git 
a/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCreateTestCase.scala
 
b/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCreateTestCase.scala
index 535ddef..1d259c8 100644
--- 
a/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCreateTestCase.scala
+++ 
b/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCreateTestCase.scala
@@ -1153,6 +1153,22 @@ class MVCreateTestCase extends QueryTest with 
BeforeAndAfterAll {
 sql("drop table IF EXISTS maintable")
   }
 
+  test("test cast expression with mv") {
+sql("drop table IF EXISTS maintable")
+sql("create table maintable (m_month bigint, c_code string, " +
+"c_country smallint, d_dollar_value double, q_quantity double, u_unit 
smallint, 

[carbondata] branch master updated: [CARBONDATA-3444]Fix MV query failure when column name and table name is same in case of join scenario

2019-06-21 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
 new 2b0e79c  [CARBONDATA-3444]Fix MV query failure when column name and 
table name is same in case of join scenario
2b0e79c is described below

commit 2b0e79c66357fce671c5f421fd5c400e28f69fde
Author: akashrn5 
AuthorDate: Wed Jun 19 12:52:30 2019 +0530

[CARBONDATA-3444]Fix MV query failure when column name and table name is 
same in case of join scenario

Problem:
When columns with the same name exist in different tables, the projected column
after SQL generation looks like gen_subsumer_0.product, and logical plan
generation from the rewritten query fails because the column names are
ambiguous.

Solution:
Update the output list when duplicate columns are present in the query by
forming a qualified name for the attribute reference. When a qualifier is
defined for the column, the qualified name is <qualifier>_<column_name>; if no
qualifier is defined, it is <exprId>_<column_name>. This update is applied to
all the nodes (group by, select), so ambiguity in columns is handled.
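
As a toy illustration (hypothetical tables, not the test case), both sides of
the join below project a column named product, which is valid SQL but becomes
ambiguous once the rewritten query refers to gen_subsumer_0.product:

    sql(
      """select a.product, b.product, sum(a.sales)
        | from sales_a a join sales_b b on a.product_id = b.product_id
        | group by a.product, b.product""".stripMargin)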

This closes #3297
---
 .../apache/carbondata/mv/datamap/MVHelper.scala|  4 +-
 .../org/apache/carbondata/mv/datamap/MVUtil.scala  | 41 
 .../carbondata/mv/rewrite/DefaultMatchMaker.scala  | 20 +++---
 .../apache/carbondata/mv/rewrite/Navigator.scala   | 16 +---
 .../carbondata/mv/rewrite/MVCreateTestCase.scala   | 45 ++
 5 files changed, 105 insertions(+), 21 deletions(-)

diff --git 
a/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala
 
b/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala
index 4d43088..c0831ae 100644
--- 
a/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala
+++ 
b/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVHelper.scala
@@ -583,7 +583,9 @@ object MVHelper {
 val relation =
   
s.dataMapTableRelation.get.asInstanceOf[MVPlanWrapper].plan.asInstanceOf[Select]
 val outputList = getUpdatedOutputList(relation.outputList, 
s.dataMapTableRelation)
-val mappings = s.outputList zip outputList
+// when the output list contains multiple projection of same column, 
but relation
+// contains distinct columns, mapping may go wrong with columns, so 
select distinct
+val mappings = s.outputList.distinct zip outputList
 val oList = for ((o1, o2) <- mappings) yield {
   if (o1.name != o2.name) Alias(o2, o1.name)(exprId = o1.exprId) else 
o2
 }
diff --git 
a/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVUtil.scala 
b/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVUtil.scala
index 8cb2f1f..4dff5b8 100644
--- 
a/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVUtil.scala
+++ 
b/datamap/mv/core/src/main/scala/org/apache/carbondata/mv/datamap/MVUtil.scala
@@ -310,4 +310,45 @@ object MVUtil {
 " are not allowed for this datamap")
 }
   }
+
+  def updateDuplicateColumns(outputList: Seq[NamedExpression]): 
Seq[NamedExpression] = {
+val duplicateNameCols = outputList.groupBy(_.name).filter(_._2.length > 
1).flatMap(_._2)
+  .toList
+val updatedOutList = outputList.map { col =>
+  val duplicateColumn = duplicateNameCols
+.find(a => a.semanticEquals(col))
+  val qualifiedName = col.qualifier.getOrElse(s"${ col.exprId.id }") + "_" 
+ col.name
+  if (duplicateColumn.isDefined) {
+val attributesOfDuplicateCol = duplicateColumn.get.collect {
+  case a: AttributeReference => a
+}
+val attributeOfCol = col.collect { case a: AttributeReference => a }
+// here need to check the whether the duplicate columns is of same 
tables,
+// since query with duplicate columns is valid, we need to make sure, 
not to change their
+// names with above defined qualifier name, for example in case of 
some expression like
+// cast((FLOOR((cast(col_name) as double))).., upper layer even exprid 
will be same,
+// we need to find the attribute ref(col_name) at lower level and 
check where expid is same
+// or of same tables, so doin the semantic equals
+val isStrictDuplicate = attributesOfDuplicateCol.forall(expr =>
+  attributeOfCol.exists(a => a.semanticEquals(expr)))
+if (!isStrictDuplicate) {
+  Alias(col, qualifiedName)(exprId = col.exprId)
+} else if (col.qualifier.isDefined) {
+  Alias(col, qualifiedName)(exprId = col.exprId)
+  // this check is added in scenario where the column is direct 
Attribute reference and
+  // since d

[carbondata] branch master updated: [CARBONDATA-3410] Add UDF, Hex/Base64 SQL functions for binary

2019-06-12 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
 new c497142  [CARBONDATA-3410] Add UDF, Hex/Base64 SQL functions for binary
c497142 is described below

commit c4971422f283288491cf6e8eea65b35d3a6af091
Author: xubo245 
AuthorDate: Fri May 31 20:33:25 2019 +0800

[CARBONDATA-3410] Add UDF, Hex/Base64 SQL functions for binary

Add UDF, Hex/Base64 SQL functions for binary

This closes #3253
---
 .../testsuite/binary/TestBinaryDataType.scala  |  32 +
 .../SparkCarbonDataSourceBinaryTest.scala  | 140 +
 2 files changed, 117 insertions(+), 55 deletions(-)

diff --git 
a/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/binary/TestBinaryDataType.scala
 
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/binary/TestBinaryDataType.scala
index 15e3ee9..1b73aba 100644
--- 
a/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/binary/TestBinaryDataType.scala
+++ 
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/binary/TestBinaryDataType.scala
@@ -65,6 +65,17 @@ class TestBinaryDataType extends QueryTest with 
BeforeAndAfterAll {
 }
 assert(flag)
 
+sqlContext.udf.register("decodeHex", (str: String) => 
Hex.decodeHex(str.toCharArray))
+sqlContext.udf.register("decodeBase64", (str: String) => 
Base64.decodeBase64(str.getBytes()))
+
+val udfHexResult = sql("SELECT decodeHex(binaryField) FROM 
binaryTable")
+val unhexResult = sql("SELECT unhex(binaryField) FROM binaryTable")
+checkAnswer(udfHexResult, unhexResult)
+
+val udfBase64Result = sql("SELECT decodeBase64(binaryField) FROM 
binaryTable")
+val unbase64Result = sql("SELECT unbase64(binaryField) FROM 
binaryTable")
+checkAnswer(udfBase64Result, unbase64Result)
+
 checkAnswer(sql("SELECT COUNT(*) FROM binaryTable"), Seq(Row(3)))
 try {
 val df = sql("SELECT * FROM binaryTable").collect()
@@ -614,6 +625,27 @@ class TestBinaryDataType extends QueryTest with 
BeforeAndAfterAll {
| 
OPTIONS('header'='false','DELIMITER'='|','bad_records_action'='fail')
  """.stripMargin)
 
+val hexHiveResult = sql("SELECT hex(binaryField) FROM hivetable")
+val hexCarbonResult = sql("SELECT hex(binaryField) FROM carbontable")
+checkAnswer(hexHiveResult, hexCarbonResult)
+hexCarbonResult.collect().foreach { each =>
+val result = new 
String(Hex.decodeHex((each.getAs[Array[Char]](0)).toString.toCharArray))
+assert("\u0001history\u0002".equals(result)
+|| "\u0001biology\u0002".equals(result)
+|| "\u0001education\u0002".equals(result))
+}
+
+val base64HiveResult = sql("SELECT base64(binaryField) FROM hivetable")
+val base64CarbonResult = sql("SELECT base64(binaryField) FROM 
carbontable")
+checkAnswer(base64HiveResult, base64CarbonResult)
+base64CarbonResult.collect().foreach { each =>
+val result = new 
String(Base64.decodeBase64((each.getAs[Array[Char]](0)).toString))
+assert("\u0001history\u0002".equals(result)
+|| "\u0001biology\u0002".equals(result)
+|| "\u0001education\u0002".equals(result))
+}
+
+
 val hiveResult = sql("SELECT * FROM hivetable")
 val carbonResult = sql("SELECT * FROM carbontable")
 checkAnswer(hiveResult, carbonResult)
diff --git 
a/integration/spark-datasource/src/test/scala/org/apache/spark/sql/carbondata/datasource/SparkCarbonDataSourceBinaryTest.scala
 
b/integration/spark-datasource/src/test/scala/org/apache/spark/sql/carbondata/datasource/SparkCarbonDataSourceBinaryTest.scala
index bdfc9dd..d234576 100644
--- 
a/integration/spark-datasource/src/test/scala/org/apache/spark/sql/carbondata/datasource/SparkCarbonDataSourceBinaryTest.scala
+++ 
b/integration/spark-datasource/src/test/scala/org/apache/spark/sql/carbondata/datasource/SparkCarbonDataSourceBinaryTest.scala
@@ -17,16 +17,14 @@
 package org.apache.spark.sql.carbondata.datasource
 
 import java.io.File
-
 import org.apache.carbondata.core.constants.CarbonCommonConstants
 import org.apache.carbondata.core.util.CarbonProperties
 import org.apache.carbondata.sdk.util.BinaryUtil
+import org.apache.commons.codec.binary.{Base64, Hex}
 import org.apache.commons.io.FileUtils
-
 import or

[carbondata] branch master updated: [CARBONDATA-3421] Fix create table without column with properties failed, but throw incorrect exception

2019-06-12 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
 new d2bc0a9  [CARBONDATA-3421] Fix create table without column with 
properties failed, but throw incorrect exception
d2bc0a9 is described below

commit d2bc0a9cf78b770f6507981e833799dcbfbb51d7
Author: jack86596 
AuthorDate: Mon Jun 10 09:53:49 2019 +0800

[CARBONDATA-3421] Fix create table without column with properties failed, 
but throw incorrect exception

Problem:
Creating a table without columns but with table properties failed, but it threw
an incorrect exception: "Invalid table properties". The exception should say
that the table is being created without columns.

Solution:
In CarbonSparkSqlParserUtil.createCarbonTable we already run validations such
as checking the tblproperties and whether columns are provided for an external
table. Add one more validation to check whether columns are provided for a
normal table and, if not, throw MalformedCarbonCommandException.

This closes #3268
---
 .../cluster/sdv/generated/SDKwriterTestCase.scala  |  2 +-
 .../testsuite/createTable/TestCreateTableIfNotExists.scala |  6 ++
 .../org/apache/carbondata/spark/util/CommonUtil.scala  | 14 --
 .../apache/spark/sql/parser/CarbonSparkSqlParserUtil.scala |  6 --
 4 files changed, 19 insertions(+), 9 deletions(-)

diff --git 
a/integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/SDKwriterTestCase.scala
 
b/integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/SDKwriterTestCase.scala
index 619bfb3..499c478 100644
--- 
a/integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/SDKwriterTestCase.scala
+++ 
b/integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/SDKwriterTestCase.scala
@@ -333,7 +333,7 @@ class SDKwriterTestCase extends QueryTest with 
BeforeAndAfterEach {
|'carbondata' LOCATION
|'$writerPath' TBLPROPERTIES('sort_scope'='batch_sort') 
""".stripMargin)
 }
-assert(ex.message.contains("table properties are not supported for 
external table"))
+assert(ex.message.contains("Table properties are not supported for 
external table"))
   }
 
   test("Read sdk writer output file and test without carbondata and 
carbonindex files should fail")
diff --git 
a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateTableIfNotExists.scala
 
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateTableIfNotExists.scala
index b3fa0eb..35238dc 100644
--- 
a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateTableIfNotExists.scala
+++ 
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateTableIfNotExists.scala
@@ -86,6 +86,12 @@ class TestCreateTableIfNotExists extends QueryTest with 
BeforeAndAfterAll {
 }
   }
 
+  test("test create table without column specified") {
+val exception = intercept[MalformedCarbonCommandException] {
+  sql("create table TableWithoutColumn stored by 'carbondata' 
tblproperties('sort_columns'='')")
+}
+assert(exception.getMessage.contains("Creating table without column(s) is 
not supported"))
+  }
 
   override def afterAll {
 sql("use default")
diff --git 
a/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala
 
b/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala
index da42363..1c89a0c 100644
--- 
a/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala
+++ 
b/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala
@@ -95,12 +95,14 @@ object CommonUtil {
 
   def validateTblProperties(tableProperties: Map[String, String], fields: 
Seq[Field]): Boolean = {
 var isValid: Boolean = true
-tableProperties.foreach {
-  case (key, value) =>
-if (!validateFields(key, fields)) {
-  isValid = false
-  throw new MalformedCarbonCommandException(s"Invalid table properties 
${ key }")
-}
+if (fields.nonEmpty) {
+  tableProperties.foreach {
+case (key, value) =>
+  if (!validateFields(key, fields)) {
+isValid = false
+throw new MalformedCarbonCommandException(s"Invalid table 
properties $key")
+  }
+  }
 }
 isValid
   }
diff --git 
a/integration/spark2/src/main/scala/org/apache/spark/sql/parser/CarbonSparkSqlParserUtil.scala
 
b/i

[carbondata] branch master updated: [CARBONDATA-3336] Support configurable decode for loading binary data, support base64 and Hex decode.

2019-05-30 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
 new 3dda02d  [CARBONDATA-3336] Support configurable decode for loading 
binary data, support base64 and Hex decode.
3dda02d is described below

commit 3dda02d44c4dca12c99e16df4f29dd3e8f2e6dc1
Author: xubo245 
AuthorDate: Tue Apr 23 15:45:25 2019 +0800

[CARBONDATA-3336] Support configurable decode for loading binary data, 
support base64 and Hex decode.

Support a configurable decoder for loading binary data, with base64 and hex
decoding.
1. Support a configurable decoder for loading
2. Test with datamaps: mv, preaggregate, timeseries, bloomfilter, lucene
3. Test datamaps together with the configurable decoder

By default no decoder is applied when loading binary data; this PR adds base64
and hex decoders.
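
As a rough sketch of what the two decoders do with an incoming string field,
using the same commons-codec classes the new decoder implementations and tests
rely on (this is not the loader code itself):

    import org.apache.commons.codec.binary.{Base64, Hex}

    object BinaryDecodeSketch {
      def main(args: Array[String]): Unit = {
        // hex-encoded input: "48656c6c6f" decodes to the bytes of "Hello"
        val fromHex: Array[Byte] = Hex.decodeHex("48656c6c6f".toCharArray)
        // base64-encoded input: "SGVsbG8=" also decodes to the bytes of "Hello"
        val fromBase64: Array[Byte] = Base64.decodeBase64("SGVsbG8=".getBytes("UTF-8"))
        println(new String(fromHex, "UTF-8"))     // Hello
        println(new String(fromBase64, "UTF-8"))  // Hello
      }
    }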

This closes #3188
---
 .../core/constants/CarbonLoadOptionConstants.java  |  13 ++
 .../carbondata/mv/rewrite/MVCreateTestCase.scala   |  59 +
 .../src/test/resources/binaryDataBase64.csv|   3 +
 .../{binarydata.csv => binaryDataHex.csv}  |   0
 .../testsuite/binary/TestBinaryDataType.scala  | 247 ++--
 .../preaggregate/TestPreAggStreaming.scala |  11 +
 .../testsuite/dataload/TestLoadDataFrame.scala |  42 
 .../testsuite/datamap/TestDataMapCommand.scala | 257 +++--
 .../spark/sql/catalyst/CarbonDDLSqlParser.scala|   1 +
 .../datasources/CarbonSparkDataSourceUtil.scala|   4 +
 .../SparkCarbonDataSourceBinaryTest.scala  |  37 ++-
 .../datasource/SparkCarbonDataSourceTest.scala |  69 +-
 .../apache/spark/sql/CarbonDataFrameWriter.scala   |   1 +
 .../processing/loading/DataLoadProcessBuilder.java |   2 +
 .../converter/impl/BinaryFieldConverterImpl.java   |  26 +--
 .../converter/impl/FieldEncoderFactory.java|  54 -
 .../loading/converter/impl/RowConverterImpl.java   |   9 +-
 .../converter/impl/binary/Base64BinaryDecoder.java |  42 
 .../converter/impl/binary/BinaryDecoder.java   |  29 +++
 .../impl/binary/DefaultBinaryDecoder.java  |  32 +++
 .../converter/impl/binary/HexBinaryDecoder.java|  34 +++
 .../processing/loading/model/CarbonLoadModel.java  |  15 ++
 .../loading/model/CarbonLoadModelBuilder.java  |  17 ++
 .../processing/util/CarbonLoaderUtil.java  |   9 +
 .../carbondata/sdk/file/CarbonWriterBuilder.java   |  10 +-
 .../org/apache/carbondata/sdk/file/ImageTest.java  | 108 -
 26 files changed, 1068 insertions(+), 63 deletions(-)

diff --git 
a/core/src/main/java/org/apache/carbondata/core/constants/CarbonLoadOptionConstants.java
 
b/core/src/main/java/org/apache/carbondata/core/constants/CarbonLoadOptionConstants.java
index 225a8aa..3bcb06f 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/constants/CarbonLoadOptionConstants.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/constants/CarbonLoadOptionConstants.java
@@ -172,4 +172,17 @@ public final class CarbonLoadOptionConstants {
 
   public static final String CARBON_LOAD_SORT_MEMORY_SPILL_PERCENTAGE_DEFAULT 
= "0";
 
+
+  /**
+   * carbon binary decoder when writing string data to binary, like decode 
base64, Hex
+   */
+  @CarbonProperty
+  public static final String CARBON_OPTIONS_BINARY_DECODER = 
"carbon.binary.decoder";
+
+  public static final String CARBON_OPTIONS_BINARY_DECODER_DEFAULT = "";
+
+  public static final String CARBON_OPTIONS_BINARY_DECODER_BASE64 = "base64";
+
+  public static final String CARBON_OPTIONS_BINARY_DECODER_HEX = "hex";
+
 }
diff --git 
a/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCreateTestCase.scala
 
b/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCreateTestCase.scala
index 62e320e..5e12ad3 100644
--- 
a/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCreateTestCase.scala
+++ 
b/datamap/mv/core/src/test/scala/org/apache/carbondata/mv/rewrite/MVCreateTestCase.scala
@@ -970,6 +970,65 @@ class MVCreateTestCase extends QueryTest with 
BeforeAndAfterAll {
 }
   }
 
+  test("test binary on mv") {
+val querySQL = "select x19,x20,sum(x18) from all_table group by x19, x20"
+val querySQL2 = "select x19,x20,sum(x18) from all_table where 
x20=cast('binary2' as binary ) group by x19, x20"
+
+sql("drop datamap if exists all_table_mv")
+sql("drop table if exists all_table")
+
+sql(
+  """
+| create table all_table(x1 bigint,x2 bigint,
+| x3 string,x4 bigint,x5 bigint,x6 int,x7 string,x8 int, x9 int,x10 
bigint,
+| x11 bigint, x12 bigint,x13 bigint,x14 bigint,x15 bigint,x16 bigint,
+| x17 bigint,x18 bigint,x19 bigint,x20 binary) stored by 
'carbondata'""

[carbondata] branch master updated: [CARBONDATA-3394]Clean files command optimization

2019-05-29 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
 new 6817e77  [CARBONDATA-3394]Clean files command optimization
6817e77 is described below

commit 6817e77ad667dbb483c76812f0478044fc444c49
Author: akashrn5 
AuthorDate: Mon May 27 12:24:33 2019 +0530

[CARBONDATA-3394]Clean files command optimization

Problem
Clean files takes a lot of time to finish even when there are no segments to
delete.
Tested with 5000 segments: clean files takes 15 minutes to finish.

Root cause and Solution
Many table status read operations were happening during clean files, and many
listing operations were happening even though they are not required.

Read and list operations are reduced to cut the overall time for clean files.
After the changes, for the same store, it takes 35 seconds on the same 3 node
cluster.
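
In spirit, the optimisation guards the expensive per-segment directory listing
behind a cheap check on the segment metadata; a simplified sketch (types and
names are illustrative, not the CarbonUpdateUtil code):

    object CleanFilesSketch {
      final case class SegmentMeta(loadName: String, updateDeltaStartTimestamp: String)

      def cleanSegment(segment: SegmentMeta, listSegmentFiles: String => Seq[String]): Unit = {
        // if no update/delete was ever done on this segment, skip the costly listing entirely
        if (segment.updateDeltaStartTimestamp.isEmpty) return
        // the expensive directory listing happens only when it is actually needed
        val allSegmentFiles = listSegmentFiles(segment.loadName)
        println(s"checking ${allSegmentFiles.size} files of segment ${segment.loadName}")
        // ... scan allSegmentFiles for stale delta/index files and delete the expired ones
      }
    }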

This closes #3227
---
 .../carbondata/core/mutate/CarbonUpdateUtil.java   | 160 +++--
 .../core/statusmanager/SegmentStatusManager.java   |  37 +++--
 .../statusmanager/SegmentUpdateStatusManager.java  |  21 +--
 .../org/apache/carbondata/api/CarbonStore.scala|   3 +-
 4 files changed, 105 insertions(+), 116 deletions(-)

diff --git 
a/core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java 
b/core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
index beaf1a0..736def6 100644
--- a/core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
+++ b/core/src/main/java/org/apache/carbondata/core/mutate/CarbonUpdateUtil.java
@@ -466,94 +466,96 @@ public class CarbonUpdateUtil {
   if (segment.getSegmentStatus() == SegmentStatus.SUCCESS
   || segment.getSegmentStatus() == 
SegmentStatus.LOAD_PARTIAL_SUCCESS) {
 
-// take the list of files from this segment.
-String segmentPath = CarbonTablePath.getSegmentPath(
-table.getAbsoluteTableIdentifier().getTablePath(), 
segment.getLoadName());
-CarbonFile segDir =
-FileFactory.getCarbonFile(segmentPath, 
FileFactory.getFileType(segmentPath));
-CarbonFile[] allSegmentFiles = segDir.listFiles();
-
-// scan through the segment and find the carbondatafiles and index 
files.
-SegmentUpdateStatusManager updateStatusManager = new 
SegmentUpdateStatusManager(table);
-
-boolean updateSegmentFile = false;
-// deleting of the aborted file scenario.
-if (deleteStaleCarbonDataFiles(segment, allSegmentFiles, 
updateStatusManager)) {
-  updateSegmentFile = true;
-}
-
-// get Invalid update  delta files.
-CarbonFile[] invalidUpdateDeltaFiles = updateStatusManager
-.getUpdateDeltaFilesList(segment.getLoadName(), false,
-CarbonCommonConstants.UPDATE_DELTA_FILE_EXT, true, 
allSegmentFiles,
-isInvalidFile);
-
-// now for each invalid delta file need to check the query execution 
time out
-// and then delete.
-for (CarbonFile invalidFile : invalidUpdateDeltaFiles) {
-  compareTimestampsAndDelete(invalidFile, forceDelete, false);
-}
-// do the same for the index files.
-CarbonFile[] invalidIndexFiles = updateStatusManager
-.getUpdateDeltaFilesList(segment.getLoadName(), false,
-CarbonCommonConstants.UPDATE_INDEX_FILE_EXT, true, 
allSegmentFiles,
-isInvalidFile);
-
-// now for each invalid index file need to check the query execution 
time out
-// and then delete.
-
-for (CarbonFile invalidFile : invalidIndexFiles) {
-  if (compareTimestampsAndDelete(invalidFile, forceDelete, false)) {
+// when there is no update operations done on table, then no need to 
go ahead. So
+// just check the update delta start timestamp and proceed if not empty
+if (!segment.getUpdateDeltaStartTimestamp().isEmpty()) {
+  // take the list of files from this segment.
+  String segmentPath = CarbonTablePath.getSegmentPath(
+  table.getAbsoluteTableIdentifier().getTablePath(), 
segment.getLoadName());
+  CarbonFile segDir =
+  FileFactory.getCarbonFile(segmentPath, 
FileFactory.getFileType(segmentPath));
+  CarbonFile[] allSegmentFiles = segDir.listFiles();
+
+  // scan through the segment and find the carbondatafiles and index 
files.
+  SegmentUpdateStatusManager updateStatusManager = new 
SegmentUpdateStatusManager(table);
+
+  boolean updateSegmentFile = false;
+  // deleting of the aborted file scenario.
+  if (deleteStaleCarbonDataFiles(segment, allSegmentFiles, 
updateStatusManager)) {
 updateSegmentFile = true

[carbondata] branch master updated: [CARBONDATA-3343] Compaction for Range Sort

2019-05-07 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
 new affb40f  [CARBONDATA-3343] Compaction for Range Sort
affb40f is described below

commit affb40f277f28ba362690f5d196b72392b267b3b
Author: manishnalla1994 
AuthorDate: Mon Apr 22 18:52:45 2019 +0530

[CARBONDATA-3343] Compaction for Range Sort

Problem: Compaction for range-sorted tables needs to be supported correctly;
earlier it grouped the ranges/partitions based on taskId, which was not
correct.

Solution: Combine all the data and create new ranges using Spark's
RangePartitioner, then give each range to one task and apply a range filter
query to produce the compacted segment.
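
A minimal illustration of the idea (not the compaction code): build value-based
ranges over the sort column with Spark's RangePartitioner, so that each
compaction task handles one contiguous value range instead of one old taskId.

    import org.apache.spark.RangePartitioner
    import org.apache.spark.sql.SparkSession

    object RangePartitionSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().master("local[2]").appName("range-sketch").getOrCreate()
        val keyed = spark.sparkContext
          .parallelize(Seq(40, 7, 99, 23, 61, 3, 85, 12))
          .map(v => (v, v))                          // key by the range/sort column value
        val ranged = keyed.partitionBy(new RangePartitioner(3, keyed))
        ranged.mapPartitionsWithIndex { (idx, it) =>
          Iterator(s"range $idx -> ${it.map(_._1).toList.sorted}")
        }.collect().foreach(println)
        spark.stop()
      }
    }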

This closes #3182
---
 .../core/constants/CarbonCommonConstants.java  |   1 +
 .../core/metadata/schema/table/CarbonTable.java|  24 +-
 .../core/scan/expression/Expression.java   |  13 +
 .../scan/filter/FilterExpressionProcessor.java |   5 +-
 .../carbondata/core/scan/filter/FilterUtil.java|  52 +-
 .../resolver/ConditionalFilterResolverImpl.java|   2 +-
 .../resolver/RowLevelRangeFilterResolverImpl.java  |  40 +-
 .../core/scan/model/QueryModelBuilder.java |  18 +-
 .../core/scan/result/BlockletScannedResult.java|  62 +-
 .../scan/result/impl/FilterQueryScannedResult.java |  20 +-
 .../result/impl/NonFilterQueryScannedResult.java   |  59 +-
 .../dataload/TestRangeColumnDataLoad.scala | 669 -
 .../spark/load/DataLoadProcessBuilderOnSpark.scala |  43 +-
 .../carbondata/spark/rdd/CarbonMergerRDD.scala | 202 ++-
 .../carbondata/spark/rdd/CarbonScanRDD.scala   |   7 +-
 .../org/apache/spark/CarbonInputMetrics.scala  |   0
 .../apache/spark/DataSkewRangePartitioner.scala|  26 +-
 .../spark/sql/catalyst/CarbonDDLSqlParser.scala|  12 +-
 .../spark/sql/CarbonDatasourceHadoopRelation.scala |   1 -
 .../merger/CarbonCompactionExecutor.java   |  20 +-
 .../processing/merger/CarbonCompactionUtil.java| 140 +
 .../merger/RowResultMergerProcessor.java   |   6 +-
 22 files changed, 1274 insertions(+), 148 deletions(-)

diff --git 
a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 
b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
index 608b5fb..ba8e20a 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
@@ -1759,6 +1759,7 @@ public final class CarbonCommonConstants {
   public static final String ARRAY = "array";
   public static final String STRUCT = "struct";
   public static final String MAP = "map";
+  public static final String DECIMAL = "decimal";
   public static final String FROM = "from";
 
   /**
diff --git 
a/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java
 
b/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java
index 54ea772..c66d1fc 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java
@@ -1081,22 +1081,26 @@ public class CarbonTable implements Serializable {
 return dataSize + indexSize;
   }
 
-  public void processFilterExpression(Expression filterExpression,
-  boolean[] isFilterDimensions, boolean[] isFilterMeasures) {
-QueryModel.FilterProcessVO processVO =
-new QueryModel.FilterProcessVO(getDimensionByTableName(getTableName()),
-getMeasureByTableName(getTableName()), 
getImplicitDimensionByTableName(getTableName()));
-QueryModel.processFilterExpression(processVO, filterExpression, 
isFilterDimensions,
-isFilterMeasures, this);
-
+  public void processFilterExpression(Expression filterExpression, boolean[] 
isFilterDimensions,
+  boolean[] isFilterMeasures) {
+processFilterExpressionWithoutRange(filterExpression, isFilterDimensions, 
isFilterMeasures);
 if (null != filterExpression) {
   // Optimize Filter Expression and fit RANGE filters is conditions apply.
-  FilterOptimizer rangeFilterOptimizer =
-  new RangeFilterOptmizer(filterExpression);
+  FilterOptimizer rangeFilterOptimizer = new 
RangeFilterOptmizer(filterExpression);
   rangeFilterOptimizer.optimizeFilter();
 }
   }
 
+  public void processFilterExpressionWithoutRange(Expression filterExpression,
+  boolean[] isFilterDimensions, boolean[] isFilterMeasures) {
+QueryModel.FilterProcessVO processVO =
+new QueryModel.FilterProcessVO(getDimensionByTableName(getTableName()),
+getMeasureByTableName(getTableName

[carbondata] branch master updated: [CARBONDATA-3345]A growing streaming ROW_V1 carbondata file would be ignored some InputSplits

2019-05-07 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
 new 0ab2412  [CARBONDATA-3345]A growing streaming ROW_V1 carbondata file 
would be ingored some InputSplits
0ab2412 is described below

commit 0ab2412b2f403392a7a17ce20a2327a35f4b8dd0
Author: junyan-zg <275620...@qq.com>
AuthorDate: Wed Apr 24 22:46:51 2019 +0800

[CARBONDATA-3345]A growing streaming ROW_V1 carbondata file would be 
ignored some InputSplits

Looking at the carbondata segments, when a streaming file grows to more than
150 MB (possibly 128 MB), Presto splits the query over several smaller pieces
of the file, including pieces in ROW_V1 format.
This bug caused some of the ROW_V1 splits to be ignored, resulting in
inaccurate query results.
So for the carbondata ROW_V1 input splits, the grouping map key (Java) now also
concatenates 'carbonInput.getStart()' in order to keep the required InputSplits.
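
A toy sketch of why the grouping key matters (hypothetical split tuples, not
the reader code): two ROW_V1 splits of the same growing file differ only in
their start offset, so a key built from segmentId and path alone collapses them
into one group, while adding the start offset keeps them apart.

    object SplitGroupingSketch {
      def main(args: Array[String]): Unit = {
        // (segmentId, path, start offset)
        val splits = Seq(("0", "/t/part-0", 0L), ("0", "/t/part-0", 134217728L), ("1", "/t/part-1", 0L))

        val oldKey = splits.groupBy { case (seg, path, _) => seg + path }             // old key
        val newKey = splits.groupBy { case (seg, path, start) => seg + path + start } // fixed key

        println(oldKey.size)  // 2: both ROW_V1 splits of part-0 fall into one group
        println(newKey.size)  // 3: the start offset keeps them apart
      }
    }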

This closes #3186
---
 .../org/apache/carbondata/presto/impl/CarbonTableReader.java | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git 
a/integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java
 
b/integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java
index 57d8d5e..7ffe053 100755
--- 
a/integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java
+++ 
b/integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java
@@ -46,6 +46,7 @@ import 
org.apache.carbondata.core.metadata.schema.table.CarbonTable;
 import org.apache.carbondata.core.metadata.schema.table.TableInfo;
 import org.apache.carbondata.core.reader.ThriftReader;
 import org.apache.carbondata.core.scan.expression.Expression;
+import org.apache.carbondata.core.statusmanager.FileFormat;
 import org.apache.carbondata.core.statusmanager.LoadMetadataDetails;
 import org.apache.carbondata.core.statusmanager.SegmentStatusManager;
 import org.apache.carbondata.core.util.CarbonProperties;
@@ -291,7 +292,13 @@ public class CarbonTableReader {
 // Use block distribution
 List> inputSplits = new ArrayList(
 result.stream().map(x -> (CarbonLocalInputSplit) 
x).collect(Collectors.groupingBy(
-carbonInput -> 
carbonInput.getSegmentId().concat(carbonInput.getPath(.values());
+carbonInput -> {
+  if (FileFormat.ROW_V1.equals(carbonInput.getFileFormat())) {
+return 
carbonInput.getSegmentId().concat(carbonInput.getPath())
+  .concat(carbonInput.getStart() + "");
+  }
+  return 
carbonInput.getSegmentId().concat(carbonInput.getPath());
+})).values());
 if (inputSplits != null) {
   for (int j = 0; j < inputSplits.size(); j++) {
 multiBlockSplitList.add(new 
CarbonLocalMultiBlockSplit(inputSplits.get(j),



[carbondata] branch master updated: [CARBONDATA-3359]Fix data mismatch issue for decimal column after delete operation

2019-05-02 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
 new eb7a833  [CARBONDATA-3359]Fix data mismatch issue for decimal column 
after delete operation
eb7a833 is described below

commit eb7a8335013957c9a615a48c7304b7968a2f7e24
Author: akashrn5 
AuthorDate: Thu Apr 25 15:16:35 2019 +0530

[CARBONDATA-3359]Fix data mismatch issue for decimal column after delete 
operation

Problem:
After a delete operation is performed, the decimal column data is wrong. This
is because, while filling the vector for a decimal column, we were not
considering the deleted rows (if any) and were filling all the row data for the
decimal column.

Solution
For decimal, get the vector from ColumnarVectorWrapperDirectFactory and then
put the data; the wrapper takes care of the deleted rows.
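
A self-contained sketch of the difference (plain Scala collections, not the
Carbon vector classes): a naive fill copies every page row, while a
delete-aware fill, which is what the wrapper obtained from
ColumnarVectorWrapperDirectFactory effectively provides, skips the deleted row
ids.

    import java.math.BigDecimal

    object DeleteAwareFillSketch {
      def main(args: Array[String]): Unit = {
        val page    = Array("1.10", "2.20", "3.30", "4.40").map(new BigDecimal(_))
        val deleted = Set(1)                         // row 1 of this page was deleted

        val naive = page.toVector                    // deleted value leaks into the result
        val deleteAware = page.zipWithIndex
          .collect { case (v, i) if !deleted.contains(i) => v }
          .toVector                                  // surviving values stay consecutive

        println(naive)        // Vector(1.10, 2.20, 3.30, 4.40)
        println(deleteAware)  // Vector(1.10, 3.30, 4.40)
      }
    }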

This closes #3189
---
 .../metadata/datatype/DecimalConverterFactory.java | 55 +-
 .../src/test/resources/decimalData.csv |  4 ++
 .../testsuite/iud/DeleteCarbonTableTestCase.scala  | 17 +++
 3 files changed, 54 insertions(+), 22 deletions(-)

diff --git 
a/core/src/main/java/org/apache/carbondata/core/metadata/datatype/DecimalConverterFactory.java
 
b/core/src/main/java/org/apache/carbondata/core/metadata/datatype/DecimalConverterFactory.java
index 9793c38..2e155f4 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/metadata/datatype/DecimalConverterFactory.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/metadata/datatype/DecimalConverterFactory.java
@@ -23,6 +23,7 @@ import java.util.BitSet;
 
 import org.apache.carbondata.core.scan.result.vector.CarbonColumnVector;
 import org.apache.carbondata.core.scan.result.vector.ColumnVectorInfo;
+import 
org.apache.carbondata.core.scan.result.vector.impl.directread.ColumnarVectorWrapperDirectFactory;
 import org.apache.carbondata.core.util.ByteUtil;
 import org.apache.carbondata.core.util.DataTypeUtil;
 
@@ -102,13 +103,13 @@ public final class DecimalConverterFactory {
   return BigDecimal.valueOf((Long) valueToBeConverted, scale);
 }
 
-@Override public void fillVector(Object valuesToBeConverted, int size, 
ColumnVectorInfo info,
-BitSet nullBitset, DataType pageType) {
+@Override public void fillVector(Object valuesToBeConverted, int size,
+ColumnVectorInfo vectorInfo, BitSet nullBitSet, DataType pageType) {
   // TODO we need to find way to directly set to vector with out 
conversion. This way is very
   // inefficient.
-  CarbonColumnVector vector = info.vector;
-  int precision = info.measure.getMeasure().getPrecision();
-  int newMeasureScale = info.measure.getMeasure().getScale();
+  CarbonColumnVector vector = getCarbonColumnVector(vectorInfo, 
nullBitSet);
+  int precision = vectorInfo.measure.getMeasure().getPrecision();
+  int newMeasureScale = vectorInfo.measure.getMeasure().getScale();
   if (!(valuesToBeConverted instanceof byte[])) {
 throw new UnsupportedOperationException("This object type " + 
valuesToBeConverted.getClass()
 + " is not supported in this method");
@@ -116,7 +117,7 @@ public final class DecimalConverterFactory {
   byte[] data = (byte[]) valuesToBeConverted;
   if (pageType == DataTypes.BYTE) {
 for (int i = 0; i < size; i++) {
-  if (nullBitset.get(i)) {
+  if (nullBitSet.get(i)) {
 vector.putNull(i);
   } else {
 BigDecimal value = BigDecimal.valueOf(data[i], scale);
@@ -128,7 +129,7 @@ public final class DecimalConverterFactory {
 }
   } else if (pageType == DataTypes.SHORT) {
 for (int i = 0; i < size; i++) {
-  if (nullBitset.get(i)) {
+  if (nullBitSet.get(i)) {
 vector.putNull(i);
   } else {
 BigDecimal value = BigDecimal
@@ -142,7 +143,7 @@ public final class DecimalConverterFactory {
 }
   } else if (pageType == DataTypes.SHORT_INT) {
 for (int i = 0; i < size; i++) {
-  if (nullBitset.get(i)) {
+  if (nullBitSet.get(i)) {
 vector.putNull(i);
   } else {
 BigDecimal value = BigDecimal
@@ -156,7 +157,7 @@ public final class DecimalConverterFactory {
 }
   } else if (pageType == DataTypes.INT) {
 for (int i = 0; i < size; i++) {
-  if (nullBitset.get(i)) {
+  if (nullBitSet.get(i)) {
 vector.putNull(i);
   } else {
 BigDecimal value = BigDecimal
@@ -170,7 +171,7 @@ public final class DecimalConverterFactory {
 }
   } else if (pageType == DataTypes.LONG) {
 for (int i = 0; i < size; i++) {
-  if (nullBitset.get(i)) {
+  if (nullBitSet.get(i)) {
 v

[carbondata] branch master updated: [CARBONDATA-3341] fixed invalid NULL result in filter query

2019-04-15 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
 new a6ab97c  [CARBONDATA-3341] fixed invalid NULL result in filter query
a6ab97c is described below

commit a6ab97ca40427af5225f12a063a0e44221a503e1
Author: kunal642 
AuthorDate: Thu Apr 4 11:53:05 2019 +0530

[CARBONDATA-3341] fixed invalid NULL result in filter query

Problem: When vector filter push-down is true and the table contains a null
value, the getNullBitSet method returns an empty byte[] to represent null, but
there is no check for this value of the bitset.

Solution: If the null bit set length is 0, assign it to chunkData instead of
reading the chunk data for that row.

This closes #3172
---
 .../core/datastore/chunk/store/ColumnPageWrapper.java  |  7 ++-
 .../spark/testsuite/sortcolumns/TestSortColumns.scala  | 14 ++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git 
a/core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java
 
b/core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java
index a1c4aec..f4d3fe4 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/ColumnPageWrapper.java
@@ -261,7 +261,12 @@ public class ColumnPageWrapper implements 
DimensionColumnPage {
   // if the compare value is null and the data is also null we can 
directly return 0
   return 0;
 } else {
-  byte[] chunkData = this.getChunkDataInBytes(rowId);
+  byte[] chunkData;
+  if (nullBitSet != null && nullBitSet.length == 0) {
+chunkData = nullBitSet;
+  } else {
+chunkData = this.getChunkDataInBytes(rowId);
+  }
   return ByteUtil.UnsafeComparer.INSTANCE.compareTo(chunkData, 
compareValue);
 }
   }
diff --git 
a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/sortcolumns/TestSortColumns.scala
 
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/sortcolumns/TestSortColumns.scala
index df97d0f..bbd58c0 100644
--- 
a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/sortcolumns/TestSortColumns.scala
+++ 
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/sortcolumns/TestSortColumns.scala
@@ -385,6 +385,17 @@ class TestSortColumns extends QueryTest with 
BeforeAndAfterAll {
 "sort_columns is unsupported for double datatype column: empno"))
   }
 
+  test("test if equal to 0 filter on sort column gives correct result") {
+
CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_PUSH_ROW_FILTERS_FOR_VECTOR,
+  "true")
+sql("create table test1(a bigint) stored by 'carbondata' 
TBLPROPERTIES('sort_columns'='a')")
+sql("insert into test1 select 'k'")
+sql("insert into test1 select '1'")
+assert(sql("select * from test1 where a = 1 or a = 0").count() == 1)
+
CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_PUSH_ROW_FILTERS_FOR_VECTOR,
+  CarbonCommonConstants.CARBON_PUSH_ROW_FILTERS_FOR_VECTOR_DEFAULT)
+  }
+
   override def afterAll = {
 dropTestTables
 CarbonProperties.getInstance().addProperty(
@@ -392,9 +403,12 @@ class TestSortColumns extends QueryTest with 
BeforeAndAfterAll {
 CarbonProperties.getInstance()
   .addProperty(CarbonCommonConstants.LOAD_SORT_SCOPE,
 CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT)
+
CarbonProperties.getInstance().addProperty(CarbonCommonConstants.CARBON_PUSH_ROW_FILTERS_FOR_VECTOR,
+  CarbonCommonConstants.CARBON_PUSH_ROW_FILTERS_FOR_VECTOR_DEFAULT)
   }
 
   def dropTestTables = {
+sql("drop table if exists test1")
 sql("drop table if exists sortint")
 sql("drop table if exists sortint1")
 sql("drop table if exists sortlong")



[carbondata] branch master updated: [CARBONDATA-3302] [Spark-Integration] code cleaning related to CarbonCreateTable command

2019-03-20 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
 new 66982f3  [CARBONDATA-3302] [Spark-Integration] code cleaning related 
to CarbonCreateTable command
66982f3 is described below

commit 66982f342865e7bd8c256630cf4b6d38ec62890a
Author: s71955 
AuthorDate: Sun Feb 24 21:45:16 2019 +0530

[CARBONDATA-3302] [Spark-Integration] code cleaning related to 
CarbonCreateTable command

What changes were proposed in this pull request?
Removed the extra (duplicated) check that validates whether the streaming
property is not null; moreover, the condition can be optimized further.
Currently the condition first validates whether the path belongs to the S3 file
system and only then checks whether the streaming property is not null. That
null check can be moved to the front, since the overall condition only has to
be evaluated for a streaming table, i.e. when the streaming property is not
null.

This closes #3134
---
 .../spark/sql/execution/command/table/CarbonCreateTableCommand.scala   | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git 
a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonCreateTableCommand.scala
 
b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonCreateTableCommand.scala
index 12eb420..1e17ffe 100644
--- 
a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonCreateTableCommand.scala
+++ 
b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/table/CarbonCreateTableCommand.scala
@@ -78,8 +78,7 @@ case class CarbonCreateTableCommand(
 path
   }
   val streaming = 
tableInfo.getFactTable.getTableProperties.get("streaming")
-  if (path.startsWith("s3") && streaming != null && streaming != null &&
-  streaming.equalsIgnoreCase("true")) {
+  if (streaming != null && streaming.equalsIgnoreCase("true") && 
path.startsWith("s3")) {
 throw new UnsupportedOperationException("streaming is not supported 
with s3 store")
   }
   tableInfo.setTablePath(tablePath)



[carbondata] branch master updated: [CARBONDATA-3297] Fix that the IndexOutOfBoundsException when creating table and dropping table are at the same time

2019-03-12 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
 new 6840a18  [CARBONDATA-3297] Fix that the IndexOutOfBoundsException when 
creating table and dropping table are at the same time
6840a18 is described below

commit 6840a183689ad6acd86e1850dedff8665bf126ae
Author: qiuchenjian <807169...@qq.com>
AuthorDate: Wed Feb 20 17:16:34 2019 +0800

[CARBONDATA-3297] Fix that the IndexOutOfBoundsException when creating 
table and dropping table are at the same time

[Problem]
An IndexOutOfBoundsException is thrown when a create table and a drop table run
at the same time.

[Solution]
The type of carbonTables in MetaData.class is ArrayBuffer, and ArrayBuffer is
not thread-safe, so this exception is thrown when create table and drop table
happen at the same time.

Use a read-write lock to guarantee thread safety.
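
A compact sketch of the pattern applied here (simplified, with try/finally
added for safety): reads of the shared ArrayBuffer take the read lock and
mutations take the write lock.

    import java.util.concurrent.locks.ReentrantReadWriteLock
    import scala.collection.mutable.ArrayBuffer

    class GuardedTables {
      private val tables = ArrayBuffer[String]()
      private val lock   = new ReentrantReadWriteLock()

      def add(name: String): Unit = {
        lock.writeLock().lock()
        try tables += name finally lock.writeLock().unlock()
      }

      def find(name: String): Option[String] = {
        lock.readLock().lock()
        try tables.find(_.equalsIgnoreCase(name)) finally lock.readLock().unlock()
      }
    }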

This closes #3130
---
 .../spark/sql/hive/CarbonFileMetastore.scala   | 37 +++---
 1 file changed, 33 insertions(+), 4 deletions(-)

diff --git 
a/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala
 
b/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala
index c1be154..ea3bba8 100644
--- 
a/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala
+++ 
b/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala
@@ -19,6 +19,7 @@ package org.apache.spark.sql.hive
 
 import java.io.IOException
 import java.net.URI
+import java.util.concurrent.locks.{Lock, ReentrantReadWriteLock}
 
 import scala.collection.mutable.ArrayBuffer
 
@@ -43,7 +44,8 @@ import 
org.apache.carbondata.core.fileoperations.FileWriteOperation
 import org.apache.carbondata.core.metadata.{AbsoluteTableIdentifier, 
CarbonMetadata, CarbonTableIdentifier}
 import 
org.apache.carbondata.core.metadata.converter.ThriftWrapperSchemaConverterImpl
 import org.apache.carbondata.core.metadata.schema
-import org.apache.carbondata.core.metadata.schema.{table, SchemaReader}
+import org.apache.carbondata.core.metadata.schema.SchemaReader
+import org.apache.carbondata.core.metadata.schema.table
 import org.apache.carbondata.core.metadata.schema.table.CarbonTable
 import org.apache.carbondata.core.util.{CarbonProperties, CarbonUtil}
 import org.apache.carbondata.core.util.path.CarbonTablePath
@@ -53,9 +55,16 @@ import org.apache.carbondata.format.{SchemaEvolutionEntry, 
TableInfo}
 import org.apache.carbondata.spark.util.CarbonSparkUtil
 
 case class MetaData(var carbonTables: ArrayBuffer[CarbonTable]) {
+  // use to lock the carbonTables
+  val lock : ReentrantReadWriteLock = new ReentrantReadWriteLock
+  val readLock: Lock = lock.readLock()
+  val writeLock: Lock = lock.writeLock()
+
   // clear the metadata
   def clear(): Unit = {
+writeLock.lock()
 carbonTables.clear()
+writeLock.unlock()
   }
 }
 
@@ -192,9 +201,12 @@ class CarbonFileMetastore extends CarbonMetaStore {
* @return
*/
   def getTableFromMetadataCache(database: String, tableName: String): 
Option[CarbonTable] = {
-metadata.carbonTables
+metadata.readLock.lock()
+val ret = metadata.carbonTables
   .find(table => table.getDatabaseName.equalsIgnoreCase(database) &&
 table.getTableName.equalsIgnoreCase(tableName))
+metadata.readLock.unlock()
+ret
   }
 
   def tableExists(
@@ -270,11 +282,14 @@ class CarbonFileMetastore extends CarbonMetaStore {
   }
 }
 
+
 wrapperTableInfo.map { tableInfo =>
   CarbonMetadata.getInstance().removeTable(tableUniqueName)
   CarbonMetadata.getInstance().loadTableMetadata(tableInfo)
   val carbonTable = 
CarbonMetadata.getInstance().getCarbonTable(tableUniqueName)
+  metadata.writeLock.lock()
   metadata.carbonTables += carbonTable
+  metadata.writeLock.unlock()
   carbonTable
 }
   }
@@ -413,8 +428,11 @@ class CarbonFileMetastore extends CarbonMetaStore {
 CarbonMetadata.getInstance.removeTable(tableInfo.getTableUniqueName)
 removeTableFromMetadata(identifier.getDatabaseName, 
identifier.getTableName)
 CarbonMetadata.getInstance().loadTableMetadata(tableInfo)
+metadata.writeLock.lock()
 metadata.carbonTables +=
   
CarbonMetadata.getInstance().getCarbonTable(identifier.getTableUniqueName)
+metadata.writeLock.unlock()
+metadata.carbonTables
   }
 
   /**
@@ -427,7 +445,9 @@ class CarbonFileMetastore extends CarbonMetaStore {
 val carbonTableToBeRemoved: Option[CarbonTable] = 
getTableFromMetadataCache(dbName, tableName)
 carbonTableToBeRemoved match {
   case Some(carbonTable) =>
+metadata.writeLock.lock()
 metadata.carbonTables -= carbonTable
+metadata.writeLock.un

[carbondata] branch master updated: [CARBONDATA-3307] Fix Performance Issue in No Sort

2019-03-12 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
 new f5e4793  [CARBONDATA-3307] Fix Performance Issue in No Sort
f5e4793 is described below

commit f5e4793bda2324f8417afc4fc7aaeb09acdea2a0
Author: shivamasn 
AuthorDate: Wed Mar 6 19:03:01 2019 +0530

[CARBONDATA-3307] Fix Performance Issue in No Sort

When creating a table without sort_columns and loading data into it, more
carbondata files are generated than expected. The number of carbondata files
generated depends on the number of threads launched: each thread initialises
its own writer and writes its own data.

Now we pass the same writer instance to all the threads, so all the threads
write the data to the same file.

This closes #3140
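
The idea in the diff below is that one writer instance is shared by every
loading thread, so all rows land in a single output instead of one file per
thread. A minimal Java sketch of that sharing pattern follows, under the
assumption that concurrent writes must be serialized; SharedWriter is a
hypothetical stand-in for the fact handler, not a CarbonData class.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: a single shared "writer" used by several producer threads.
final class SharedWriter {
  private final StringBuilder out = new StringBuilder();
  synchronized void write(String row) {     // serialize concurrent writes
    out.append(row).append('\n');
  }
  synchronized String finishAndGet() {
    return out.toString();
  }
}

final class SingleWriterDemo {
  public static void main(String[] args) throws Exception {
    SharedWriter writer = new SharedWriter();               // one instance for all threads
    ExecutorService pool = Executors.newFixedThreadPool(3);
    List<Future<?>> futures = new ArrayList<>();
    for (int t = 0; t < 3; t++) {
      final int id = t;
      futures.add(pool.submit(() -> {
        for (int i = 0; i < 5; i++) {
          writer.write("thread-" + id + " row-" + i);       // every thread shares the writer
        }
      }));
    }
    for (Future<?> f : futures) {
      f.get();                                              // wait for all producers
    }
    pool.shutdown();
    System.out.print(writer.finishAndGet());
  }
}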
---
 .../CarbonRowDataWriterProcessorStepImpl.java  | 61 ++
 1 file changed, 29 insertions(+), 32 deletions(-)

diff --git 
a/processing/src/main/java/org/apache/carbondata/processing/loading/steps/CarbonRowDataWriterProcessorStepImpl.java
 
b/processing/src/main/java/org/apache/carbondata/processing/loading/steps/CarbonRowDataWriterProcessorStepImpl.java
index f976abe..184248c 100644
--- 
a/processing/src/main/java/org/apache/carbondata/processing/loading/steps/CarbonRowDataWriterProcessorStepImpl.java
+++ 
b/processing/src/main/java/org/apache/carbondata/processing/loading/steps/CarbonRowDataWriterProcessorStepImpl.java
@@ -18,9 +18,7 @@ package org.apache.carbondata.processing.loading.steps;
 
 import java.io.IOException;
 import java.util.Iterator;
-import java.util.List;
 import java.util.Map;
-import java.util.concurrent.CopyOnWriteArrayList;
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.Executors;
 import java.util.concurrent.Future;
@@ -83,16 +81,17 @@ public class CarbonRowDataWriterProcessorStepImpl extends 
AbstractDataLoadProces
 
   private Map localDictionaryGeneratorMap;
 
-  private List carbonFactHandlers;
+  private CarbonFactHandler dataHandler;
 
   private ExecutorService executorService = null;
 
+  private static final Object lock = new Object();
+
   public CarbonRowDataWriterProcessorStepImpl(CarbonDataLoadConfiguration 
configuration,
   AbstractDataLoadProcessorStep child) {
 super(configuration, child);
 this.localDictionaryGeneratorMap =
 
CarbonUtil.getLocalDictionaryModel(configuration.getTableSpec().getCarbonTable());
-this.carbonFactHandlers = new CopyOnWriteArrayList<>();
   }
 
   @Override public void initialize() throws IOException {
@@ -129,20 +128,31 @@ public class CarbonRowDataWriterProcessorStepImpl extends 
AbstractDataLoadProces
   
.recordDictionaryValue2MdkAdd2FileTime(CarbonTablePath.DEPRECATED_PARTITION_ID,
   System.currentTimeMillis());
 
+  //Creating a Instance of CarbonFacthandler that will be passed to all 
the threads
+  String[] storeLocation = getStoreLocation();
+  DataMapWriterListener listener = getDataMapWriterListener(0);
+  CarbonFactDataHandlerModel model = CarbonFactDataHandlerModel
+  .createCarbonFactDataHandlerModel(configuration, storeLocation, 0, 
0, listener);
+  model.setColumnLocalDictGenMap(localDictionaryGeneratorMap);
+  dataHandler = CarbonFactHandlerFactory.createCarbonFactHandler(model);
+  dataHandler.initialise();
+
   if (iterators.length == 1) {
-doExecute(iterators[0], 0);
+doExecute(iterators[0], 0, dataHandler);
   } else {
 executorService = Executors.newFixedThreadPool(iterators.length,
 new CarbonThreadFactory("NoSortDataWriterPool:" + 
configuration.getTableIdentifier()
 .getCarbonTableIdentifier().getTableName()));
 Future[] futures = new Future[iterators.length];
 for (int i = 0; i < iterators.length; i++) {
-  futures[i] = executorService.submit(new 
DataWriterRunnable(iterators[i], i));
+  futures[i] = executorService.submit(new 
DataWriterRunnable(iterators[i], i, dataHandler));
 }
 for (Future future : futures) {
   future.get();
 }
   }
+  finish(dataHandler, 0);
+  dataHandler = null;
 } catch (CarbonDataWriterException e) {
   LOGGER.error("Failed for table: " + tableName + " in 
DataWriterProcessorStepImpl", e);
   throw new CarbonDataLoadingException(
@@ -157,31 +167,15 @@ public class CarbonRowDataWriterProcessorStepImpl extends 
AbstractDataLoadProces
 return null;
   }
 
-  private void doExecute(Iterator iterator, int iteratorIndex) 
throws IOException {
-String[] storeLocation = getStoreLocation();
-DataMapWriterListener listener = getDataMapWriterListener(0);
-CarbonFactDataHandlerModel model = 
CarbonFactDataHandle

[carbondata] branch master updated: [CARBONDATA-3280] Fix the issue of SDK assert can't work

2019-01-30 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
 new ba139b6  [CARBONDATA-3280] Fix the issue of SDK assert can't work
ba139b6 is described below

commit ba139b642d266e0767fedd4fb53c16d198b26d35
Author: xubo245 
AuthorDate: Tue Jan 29 11:36:48 2019 +0800

[CARBONDATA-3280] Fix the issue of SDK assert can't work

After PR-3097 was merged, the batch rule changed, but tests such as the
following did not work:

org.apache.carbondata.sdk.file.CarbonReaderTest#testReadNextBatchRow

org.apache.carbondata.sdk.file.CarbonReaderTest#testReadNextBatchRowWithVectorReader
So this PR fixes the test errors and adds some asserts.

This closes #3112
---
 .../carbondata/core/util/CarbonProperties.java |  2 +-
 .../carbondata/sdk/file/CarbonReaderTest.java  | 90 +++---
 2 files changed, 63 insertions(+), 29 deletions(-)

diff --git 
a/core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java 
b/core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java
index 49388b7..b337e40 100644
--- a/core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java
+++ b/core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java
@@ -1572,7 +1572,7 @@ public final class CarbonProperties {
   try {
 batchSize = Integer.parseInt(batchSizeString);
 if (batchSize < DETAIL_QUERY_BATCH_SIZE_MIN || batchSize > 
DETAIL_QUERY_BATCH_SIZE_MAX) {
-  LOGGER.info("Invalid carbon.detail.batch.size.Using default value "
+  LOGGER.warn("Invalid carbon.detail.batch.size.Using default value "
   + DETAIL_QUERY_BATCH_SIZE_DEFAULT);
   carbonProperties.setProperty(DETAIL_QUERY_BATCH_SIZE,
   Integer.toString(DETAIL_QUERY_BATCH_SIZE_DEFAULT));
diff --git 
a/store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java 
b/store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java
index 28944da..871d51b 100644
--- 
a/store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java
+++ 
b/store/sdk/src/test/java/org/apache/carbondata/sdk/file/CarbonReaderTest.java
@@ -104,7 +104,7 @@ public class CarbonReaderTest extends TestCase {
 FileUtils.deleteDirectory(new File(path));
   }
 
-  @Test public void testReadWithZeroBatchSize() throws IOException, 
InterruptedException {
+  @Test public void testReadWithZeroBatchSize() throws Exception {
 String path = "./testWriteFiles";
 FileUtils.deleteDirectory(new File(path));
 
DataMapStoreManager.getInstance().clearDataMaps(AbsoluteTableIdentifier.from(path));
@@ -127,6 +127,30 @@ public class CarbonReaderTest extends TestCase {
 FileUtils.deleteDirectory(new File(path));
   }
 
+
+  @Test
+  public void testReadBatchWithZeroBatchSize() throws Exception {
+String path = "./testWriteFiles";
+FileUtils.deleteDirectory(new File(path));
+
DataMapStoreManager.getInstance().clearDataMaps(AbsoluteTableIdentifier.from(path));
+Field[] fields = new Field[2];
+fields[0] = new Field("name", DataTypes.STRING);
+fields[1] = new Field("age", DataTypes.INT);
+
+TestUtil.writeFilesAndVerify(10, new Schema(fields), path);
+CarbonReader reader;
+reader = 
CarbonReader.builder(path).withRowRecordReader().withBatch(0).build();
+
+int i = 0;
+while (reader.hasNext()) {
+  Object[] row = reader.readNextBatchRow();
+  Assert.assertEquals(row.length, 10);
+  i++;
+}
+Assert.assertEquals(i, 1);
+FileUtils.deleteDirectory(new File(path));
+  }
+
   @Test
   public void testReadWithFilterOfNonTransactionalSimple() throws IOException, 
InterruptedException {
 String path = "./testWriteFiles";
@@ -532,6 +556,7 @@ public class CarbonReaderTest extends TestCase {
   .withCsvInput(schema).writtenBy("CarbonReaderTest").build();
 } catch (InvalidLoadOptionException e) {
   e.printStackTrace();
+  Assert.fail(e.getMessage());
 }
 carbonWriter.write(new String[] { "MNO", "100" });
 carbonWriter.close();
@@ -546,22 +571,25 @@ public class CarbonReaderTest extends TestCase {
.withCsvInput(schema1).writtenBy("CarbonReaderTest").build();
 } catch (InvalidLoadOptionException e) {
   e.printStackTrace();
+  Assert.fail(e.getMessage());
 }
 carbonWriter1.write(new String[] { "PQR", "200" });
 carbonWriter1.close();
 
 try {
-   CarbonReader reader =
-   CarbonReader.builder(path1, "_temp").
-   projection(new String[] { "c1", "c3" })
-   .build();
-} catch (Exception e){
-   System.out.println("Success"

[carbondata] branch master updated: [HOTFIX] Upgraded jars to work S3 with presto

2019-01-29 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
 new 3f63f91  [HOTFIX] Upgraded jars to work S3 with presto
3f63f91 is described below

commit 3f63f91915d5da9d94a5c912b5415f230be64c07
Author: ravipesala 
AuthorDate: Sun Jan 27 15:12:29 2019 +0530

[HOTFIX] Upgraded jars to work S3 with presto

There is a duplicate aws-java-sdk jar, and the low-version jars prevent
connecting to S3 in Presto. Those jars are upgraded in this PR and the
documentation is updated.

This closes #3110
---
 .../statusmanager/SegmentUpdateStatusManager.java  |  3 ++-
 docs/presto-guide.md   | 18 ---
 integration/presto/pom.xml | 27 --
 3 files changed, 16 insertions(+), 32 deletions(-)

diff --git 
a/core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
 
b/core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
index c5f5f74..a02e903 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/statusmanager/SegmentUpdateStatusManager.java
@@ -52,6 +52,7 @@ import org.apache.carbondata.core.util.CarbonUtil;
 import org.apache.carbondata.core.util.path.CarbonTablePath;
 
 import com.google.gson.Gson;
+import org.apache.commons.lang3.StringUtils;
 import org.apache.log4j.Logger;
 
 /**
@@ -655,7 +656,7 @@ public class SegmentUpdateStatusManager {
 // get the updated status file identifier from the table status.
 String tableUpdateStatusIdentifier = getUpdatedStatusIdentifier();
 
-if (null == tableUpdateStatusIdentifier) {
+if (StringUtils.isEmpty(tableUpdateStatusIdentifier)) {
   return new SegmentUpdateDetails[0];
 }
 
diff --git a/docs/presto-guide.md b/docs/presto-guide.md
index 054f29f..7389bc6 100644
--- a/docs/presto-guide.md
+++ b/docs/presto-guide.md
@@ -254,23 +254,15 @@ Now you can use the Presto CLI on the coordinator to 
query data sources in the c
```
 Required properties
 
-fs.s3a.access.key={value}
-fs.s3a.secret.key={value}
+hive.s3.aws-access-key={value}
+hive.s3.aws-secret-key={value}
 
 Optional properties
 
-fs.s3a.endpoint={value}
+hive.s3.endpoint={value}
```
- * In case you want to query carbonstore on s3 using S3 api put following 
additional properties inside $PRESTO_HOME$/etc/catalog/carbondata.properties 
-```
-  fs.s3.awsAccessKeyId={value}
-  fs.s3.awsSecretAccessKey={value}
-```
-  * In case You want to query carbonstore on s3 using S3N api put following 
additional properties inside $PRESTO_HOME$/etc/catalog/carbondata.properties 
-```
-fs.s3n.awsAccessKeyId={value}
-fs.s3n.awsSecretAccessKey={value}
- ```
+   
+   Please refer https://prestodb.io/docs/current/connector/hive.html 
for more details on S3 integration.
 
 ### Generate CarbonData file
 
diff --git a/integration/presto/pom.xml b/integration/presto/pom.xml
index d69515d..8a9c06d 100644
--- a/integration/presto/pom.xml
+++ b/integration/presto/pom.xml
@@ -32,6 +32,7 @@
 
   
 0.210
+4.4.9
 ${basedir}/../../dev
 true
   
@@ -376,7 +377,7 @@
 
   com.facebook.presto.hadoop
   hadoop-apache2
-  2.7.3-1
+  2.7.4-3
   
 
   org.antlr
@@ -522,23 +523,8 @@
   jackson-core
 
 
-  com.fasterxml.jackson.core
-  jackson-annotations
-
-
-  com.fasterxml.jackson.core
-  jackson-databind
-
-  
-
-
-  com.amazonaws
-  aws-java-sdk
-  1.7.4
-  
-
-  com.fasterxml.jackson.core
-  jackson-core
+  com.amazonaws
+  aws-java-sdk
 
 
   com.fasterxml.jackson.core
@@ -560,6 +546,11 @@
   httpcore
   ${httpcore.version}
 
+
+  org.apache.httpcomponents
+  httpclient
+  4.5.5
+
   
 
   



[carbondata] branch master updated: [CARBONDATA-3235] Fixed Alter Table Rename

2019-01-28 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
 new 75d9eae  [CARBONDATA-3235] Fixed Alter Table Rename
75d9eae is described below

commit 75d9eae88dd9d2fba9814889d37a24f0b7cd9405
Author: namanrastogi 
AuthorDate: Wed Jan 23 17:57:35 2019 +0530

[CARBONDATA-3235] Fixed Alter Table Rename

Fixed negative scenario: Alter Table Rename Table Fail

Problem: When the table rename succeeds in Hive but fails in the CarbonData
store, an exception is thrown, but the rename in Hive is not undone.

Solution: A flag tracks whether the Hive rename has already executed; if the
code breaks after the Hive rename is done, go back and undo the Hive rename.

This closes #3098
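
The fix below follows the usual rollback-flag pattern: record which external
step has succeeded, and compensate for it if a later step fails. A minimal Java
sketch of the pattern under hypothetical Catalog/Store interfaces (not the
actual command classes):

// Illustrative rollback-flag pattern; names are hypothetical, not the
// CarbonAlterTableRenameCommand code.
final class RenameWithRollback {
  interface Catalog {                       // hypothetical external catalog, e.g. Hive
    void rename(String from, String to) throws Exception;
  }
  interface Store {                         // hypothetical table store, e.g. the carbon metastore
    void updateSchema(String table) throws Exception;
  }

  static void rename(Catalog catalog, Store store, String oldName, String newName)
      throws Exception {
    boolean catalogRenameDone = false;
    try {
      catalog.rename(oldName, newName);
      catalogRenameDone = true;             // from here on, a failure must undo the rename
      store.updateSchema(newName);
    } catch (Exception e) {
      if (catalogRenameDone) {
        catalog.rename(newName, oldName);   // compensate: put the catalog back as it was
      }
      throw e;
    }
  }
}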
---
 .../schema/CarbonAlterTableRenameCommand.scala | 34 +++---
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git 
a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableRenameCommand.scala
 
b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableRenameCommand.scala
index 01698c9..33f3cd9 100644
--- 
a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableRenameCommand.scala
+++ 
b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableRenameCommand.scala
@@ -43,10 +43,12 @@ private[sql] case class CarbonAlterTableRenameCommand(
 
   override def processMetadata(sparkSession: SparkSession): Seq[Nothing] = {
 val LOGGER = 
LogServiceFactory.getLogService(this.getClass.getCanonicalName)
-val oldTableIdentifier = alterTableRenameModel.oldTableIdentifier
-val newTableIdentifier = alterTableRenameModel.newTableIdentifier
-val oldDatabaseName = oldTableIdentifier.database
+val oldTableName = 
alterTableRenameModel.oldTableIdentifier.table.toLowerCase
+val newTableName = 
alterTableRenameModel.newTableIdentifier.table.toLowerCase
+val oldDatabaseName = alterTableRenameModel.oldTableIdentifier.database
   .getOrElse(sparkSession.catalog.currentDatabase)
+val oldTableIdentifier = TableIdentifier(oldTableName, 
Some(oldDatabaseName))
+val newTableIdentifier = TableIdentifier(newTableName, 
Some(oldDatabaseName))
 setAuditTable(oldDatabaseName, oldTableIdentifier.table)
 setAuditInfo(Map("newName" -> 
alterTableRenameModel.newTableIdentifier.table))
 val newDatabaseName = newTableIdentifier.database
@@ -59,8 +61,6 @@ private[sql] case class CarbonAlterTableRenameCommand(
   throw new MalformedCarbonCommandException(s"Table with name 
$newTableIdentifier " +
 s"already exists")
 }
-val oldTableName = oldTableIdentifier.table.toLowerCase
-val newTableName = newTableIdentifier.table.toLowerCase
 LOGGER.info(s"Rename table request has been received for 
$oldDatabaseName.$oldTableName")
 val metastore = CarbonEnv.getInstance(sparkSession).carbonMetaStore
 val relation: CarbonRelation =
@@ -108,8 +108,8 @@ private[sql] case class CarbonAlterTableRenameCommand(
 dataMapSchemaList.addAll(indexSchemas)
   }
   // invalid data map for the old table, see CARBON-1690
-  val oldTableIdentifier = carbonTable.getAbsoluteTableIdentifier
-  DataMapStoreManager.getInstance().clearDataMaps(oldTableIdentifier)
+  val oldAbsoluteTableIdentifier = carbonTable.getAbsoluteTableIdentifier
+  
DataMapStoreManager.getInstance().clearDataMaps(oldAbsoluteTableIdentifier)
   // get the latest carbon table and check for column existence
   val operationContext = new OperationContext
   // TODO: Pass new Table Path in pre-event.
@@ -125,7 +125,7 @@ private[sql] case class CarbonAlterTableRenameCommand(
   schemaEvolutionEntry.setTableName(newTableName)
   timeStamp = System.currentTimeMillis()
   schemaEvolutionEntry.setTime_stamp(timeStamp)
-  val newTableIdentifier = new CarbonTableIdentifier(oldDatabaseName,
+  val newCarbonTableIdentifier = new CarbonTableIdentifier(oldDatabaseName,
 newTableName, carbonTable.getCarbonTableIdentifier.getTableId)
   val oldIdentifier = TableIdentifier(oldTableName, Some(oldDatabaseName))
   val newIdentifier = TableIdentifier(newTableName, Some(oldDatabaseName))
@@ -133,17 +133,17 @@ private[sql] case class CarbonAlterTableRenameCommand(
   var partitions: Seq[CatalogTablePartition] = Seq.empty
   if (carbonTable.isHivePartitionTable) {
 partitions =
-  sparkSession.sessionState.catalog.listPartitions(oldIdentifier)
+  sparkSession.sessionState.catalog.listPartitions(oldTableIdentifier)
   }
-  sparkSession.cata

[carbondata] branch master updated: [CARBONDATA-3264] Added SORT_SCOPE in ALTER TABLE SET

2019-01-25 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
 new 8e39ee1  [CARBONDATA-3264] Added SORT_SCOPE in ALTER TABLE SET
8e39ee1 is described below

commit 8e39ee113236b7c48b8a0a46777cafc771701d9f
Author: namanrastogi 
AuthorDate: Tue Jan 22 11:42:40 2019 +0530

[CARBONDATA-3264] Added SORT_SCOPE in ALTER TABLE SET

Added SORT_SCOPE in ALTER TABLE SET Command.
This command changes the SORT_SCOPE of a table after the table has been created.

Usage:

ALTER TABLE  SET TBLPROPERTIES('sort_scope'='no_sort')
Restrictions:

Cannot change SORT_SCOPE from NO_SORT to anything else when SORT_COLUMNS is 
empty.

This closes #3094
---
 docs/ddl-of-carbondata.md  | 58 +++---
 .../org/apache/spark/util/AlterTableUtil.scala | 33 +--
 .../restructure/AlterTableValidationTestCase.scala | 69 ++
 3 files changed, 134 insertions(+), 26 deletions(-)

diff --git a/docs/ddl-of-carbondata.md b/docs/ddl-of-carbondata.md
index 4f9e47b..0d0e5bd 100644
--- a/docs/ddl-of-carbondata.md
+++ b/docs/ddl-of-carbondata.md
@@ -51,7 +51,7 @@ CarbonData DDL statements are documented here,which includes:
 * [RENAME COLUMN](#change-column-nametype)
 * [CHANGE COLUMN NAME/TYPE](#change-column-nametype)
 * [MERGE INDEXES](#merge-index)
-* [SET/UNSET Local Dictionary 
Properties](#set-and-unset-for-local-dictionary-properties)
+* [SET/UNSET](#set-and-unset)
   * [DROP TABLE](#drop-table)
   * [REFRESH TABLE](#refresh-table)
   * [COMMENTS](#table-and-column-comment)
@@ -634,7 +634,7 @@ CarbonData DDL statements are documented here,which 
includes:
 
   The following section introduce the commands to modify the physical or 
logical state of the existing table(s).
 
-   - # RENAME TABLE
+   -  RENAME TABLE

  This command is used to rename the existing table.
  ```
@@ -648,7 +648,7 @@ CarbonData DDL statements are documented here,which 
includes:
  ALTER TABLE test_db.carbon RENAME TO test_db.carbonTable
  ```
 
-   - # ADD COLUMNS
+   -  ADD COLUMNS

  This command is used to add a new column to the existing table.
  ```
@@ -676,7 +676,7 @@ Users can specify which columns to include and exclude for 
local dictionary gene
  ALTER TABLE carbon ADD COLUMNS (a1 STRING, b1 STRING) 
TBLPROPERTIES('LOCAL_DICTIONARY_INCLUDE'='a1','LOCAL_DICTIONARY_EXCLUDE'='b1')
  ```
 
-   - # DROP COLUMNS
+   -  DROP COLUMNS

  This command is used to delete the existing column(s) in a table.
 
@@ -696,7 +696,7 @@ Users can specify which columns to include and exclude for 
local dictionary gene
 
  **NOTE:** Drop Complex child column is not supported.
 
-   - # CHANGE COLUMN NAME/TYPE
+   -  CHANGE COLUMN NAME/TYPE

  This command is used to change column name and the data type from INT to 
BIGINT or decimal precision from lower to higher.
  Change of decimal data type from lower precision to higher precision will 
only be supported for cases where there is no data loss.
@@ -729,7 +729,8 @@ Users can specify which columns to include and exclude for 
local dictionary gene
  ```
 
  **NOTE:** Once the column is renamed, user has to take care about 
replacing the fileheader with the new name or changing the column header in csv 
file.
-- # MERGE INDEX
+   
+   -  MERGE INDEX
 
  This command is used to merge all the CarbonData index files 
(.carbonindex) inside a segment to a single CarbonData index merge file 
(.carbonindexmerge). This enhances the first query performance.
 
@@ -747,23 +748,36 @@ Users can specify which columns to include and exclude 
for local dictionary gene
 
  * Merge index is not supported on streaming table.
 
-- # SET and UNSET for Local Dictionary Properties
-
-   When set command is used, all the newly set properties will override the 
corresponding old properties if exists.
-  
-   Example to SET Local Dictionary Properties:
-   ```
-   ALTER TABLE tablename SET 
TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE'='false','LOCAL_DICTIONARY_THRESHOLD'='1000','LOCAL_DICTIONARY_INCLUDE'='column1','LOCAL_DICTIONARY_EXCLUDE'='column2')
-   ```
-   When Local Dictionary properties are unset, corresponding default values 
will be used for these properties.
+   -  SET and UNSET

-   Example to UNSET Local Dictionary Properties:
-   ```
-   ALTER TABLE tablename UNSET 
TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE','LOCAL_DICTIONARY_THRESHOLD','LOCAL_DICTIONARY_INCLUDE','LOCAL_DICTIONARY_EXCLUDE')
-   ```
-   
-   **NOTE:** For old tables, by default, local dictionary is disabled. If user 
wants local dictionary for these tables, user can enable/disable local 
dictionary for new data at their discretion

[carbondata] branch master updated: [CARBONDATA-3257] Fix for NO_SORT load and describe formatted being in NO_SORT flow even with Sort Columns given

2019-01-23 Thread kumarvishal09
This is an automated email from the ASF dual-hosted git repository.

kumarvishal09 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git


The following commit(s) were added to refs/heads/master by this push:
 new 7916aa6  [CARBONDATA-3257] Fix for NO_SORT load and describe formatted 
being in NO_SORT flow even with Sort Columns given
7916aa6 is described below

commit 7916aa67f9cdbc171300f45137aed6e38e76d749
Author: manishnalla1994 
AuthorDate: Mon Jan 21 17:23:37 2019 +0530

[CARBONDATA-3257] Fix for NO_SORT load and describe formatted being in 
NO_SORT flow even with Sort Columns given

Problem: Data load follows the no-sort flow after a version upgrade even if
sort columns are given. Also, DESCRIBE FORMATTED displays the wrong sort scope
after refresh.

Solution: Added a condition to check for the presence of Sort Columns.

This closes #3083
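
A minimal, illustrative sketch of the intended decision (not the CarbonData
implementation): an explicitly configured sort scope wins, and otherwise the
effective scope falls back based on whether SORT_COLUMNS is present. The helper
and its default values are assumptions for illustration only.

// Illustrative sketch only; property handling here is hypothetical.
final class SortScopeResolver {
  static String resolve(String sortColumnsProperty, String sortScopeProperty) {
    boolean hasSortColumns =
        sortColumnsProperty != null && !sortColumnsProperty.trim().isEmpty();
    if (sortScopeProperty != null && !sortScopeProperty.trim().isEmpty()) {
      return sortScopeProperty.toUpperCase();      // explicit setting always wins
    }
    // No explicit sort scope: fall back based on whether sort columns exist.
    return hasSortColumns ? "LOCAL_SORT" : "NO_SORT";
  }

  public static void main(String[] args) {
    System.out.println(resolve("empno,empname", null)); // LOCAL_SORT
    System.out.println(resolve("", null));              // NO_SORT
  }
}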
---
 .../core/constants/CarbonCommonConstants.java  |  1 +
 .../sdv/generated/SetParameterTestCase.scala   |  8 +++---
 .../command/carbonTableSchemaCommon.scala  | 12 -
 .../command/management/CarbonLoadDataCommand.scala | 31 +++---
 .../table/CarbonDescribeFormattedCommand.scala | 18 ++---
 5 files changed, 42 insertions(+), 28 deletions(-)

diff --git 
a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 
b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
index b7d9761..86bf5f1 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
@@ -426,6 +426,7 @@ public final class CarbonCommonConstants {
*/
   public static final String DICTIONARY_PATH = "dictionary_path";
   public static final String SORT_COLUMNS = "sort_columns";
+  public static final String SORT_SCOPE = "sort_scope";
   public static final String RANGE_COLUMN = "range_column";
   public static final String PARTITION_TYPE = "partition_type";
   public static final String NUM_PARTITIONS = "num_partitions";
diff --git 
a/integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/SetParameterTestCase.scala
 
b/integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/SetParameterTestCase.scala
index 8c336d8..54d9e3f 100644
--- 
a/integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/SetParameterTestCase.scala
+++ 
b/integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/SetParameterTestCase.scala
@@ -209,11 +209,11 @@ class SetParameterTestCase extends QueryTest with 
BeforeAndAfterAll {
 sql("SET carbon.options.sort.scope=local_sort")
 sql(
   "create table carbon_table(empno int, empname String, designation 
String, doj Timestamp," +
-  "workgroupcategory int) STORED BY 'org.apache.carbondata.format'")
-checkExistence(sql("DESC FORMATTED carbon_table"), true, "LOCAL_SORT")
-val sortscope=sql("DESC FORMATTED 
carbon_table").collect().filter(_.getString(1).trim.equals("LOCAL_SORT"))
+  "workgroupcategory int) STORED BY 'org.apache.carbondata.format' 
TBLPROPERTIES('SORT_COLUMNS'='empno,empname')")
+checkExistence(sql("DESC FORMATTED carbon_table"), true, "local_sort")
+val sortscope=sql("DESC FORMATTED 
carbon_table").collect().filter(_.getString(1).trim.equals("local_sort"))
 assertResult(1)(sortscope.length)
-assertResult("LOCAL_SORT")(sortscope(0).getString(1).trim)
+assertResult("local_sort")(sortscope(0).getString(1).trim)
   }
 
   test("TC_011-test SET property to Enable Unsafe Sort") {
diff --git 
a/integration/spark-common/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchemaCommon.scala
 
b/integration/spark-common/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchemaCommon.scala
index 2ce9d89..b6b4e8d 100644
--- 
a/integration/spark-common/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchemaCommon.scala
+++ 
b/integration/spark-common/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchemaCommon.scala
@@ -854,18 +854,6 @@ class TableNewProcessor(cm: TableModel) {
   tableSchema.getTableId,
   cm.databaseNameOp.getOrElse("default"))
 tablePropertiesMap.put("bad_record_path", badRecordsPath)
-if (tablePropertiesMap.get("sort_columns") != null) {
-  val sortCol = tablePropertiesMap.get("sort_columns")
-  if ((!sortCol.trim.isEmpty) && tablePropertiesMap.get("sort_scope") == 
null) {
-// If 

carbondata git commit: [CARBONDATA-3237] Fix presto carbon issues in dictionary include scenario

2019-01-09 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master 1b45c41fe -> 8e6def9fa


[CARBONDATA-3237] Fix presto carbon issues in dictionary include scenario

Problem 1: A decimal column with dictionary include cannot be read in Presto.
Cause: int is typecast to decimal for dictionary columns in the decimal stream
reader.
Solution: Keep the original data type as well as the new data type in the
decimal stream reader.

Problem 2: Optimize Presto query time for a dictionary-include string column.
Currently, for each query, presto carbon creates a dictionary block for string
columns.
Cause: This happens for every query, and if the cardinality is high, it takes
more time to build.
Solution: The dictionary block is not required; we can resolve values using a
normal dictionary lookup.

This closes #3055
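
A minimal Java sketch of the "plain dictionary lookup" idea for problem 2:
resolve each surrogate key against the dictionary while filling the decoded
column, instead of pre-building a per-query dictionary block. The arrays and
names are hypothetical, not the Presto or CarbonData reader APIs.

// Illustrative only: decode a dictionary-encoded string column by direct lookup.
final class DictionaryLookupDemo {
  public static void main(String[] args) {
    String[] dictionary = {null, "red", "green", "blue"}; // index 0 used for null here
    int[] surrogateKeys = {1, 3, 2, 1, 0, 3};             // encoded column page

    String[] decoded = new String[surrogateKeys.length];
    for (int row = 0; row < surrogateKeys.length; row++) {
      decoded[row] = dictionary[surrogateKeys[row]];      // O(1) lookup per row
    }

    for (String value : decoded) {
      System.out.println(value);
    }
  }
}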


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/8e6def9f
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/8e6def9f
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/8e6def9f

Branch: refs/heads/master
Commit: 8e6def9facc6c51de58ee655961ac4710c252bc0
Parents: 1b45c41
Author: ajantha-bhat 
Authored: Mon Jan 7 14:50:11 2019 +0530
Committer: kumarvishal09 
Committed: Wed Jan 9 18:21:10 2019 +0530

--
 .../carbondata/presto/CarbonVectorBatch.java| 12 ++---
 .../readers/DecimalSliceStreamReader.java   |  9 ++--
 .../presto/readers/SliceStreamReader.java   | 53 
 .../CarbonDictionaryDecodeReadSupport.scala | 22 +---
 .../presto/util/CarbonDataStoreCreator.scala|  1 +
 5 files changed, 47 insertions(+), 50 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/8e6def9f/integration/presto/src/main/java/org/apache/carbondata/presto/CarbonVectorBatch.java
--
diff --git 
a/integration/presto/src/main/java/org/apache/carbondata/presto/CarbonVectorBatch.java
 
b/integration/presto/src/main/java/org/apache/carbondata/presto/CarbonVectorBatch.java
index fb8300a..140e46b 100644
--- 
a/integration/presto/src/main/java/org/apache/carbondata/presto/CarbonVectorBatch.java
+++ 
b/integration/presto/src/main/java/org/apache/carbondata/presto/CarbonVectorBatch.java
@@ -37,8 +37,6 @@ import org.apache.carbondata.presto.readers.ShortStreamReader;
 import org.apache.carbondata.presto.readers.SliceStreamReader;
 import org.apache.carbondata.presto.readers.TimestampStreamReader;
 
-import com.facebook.presto.spi.block.Block;
-
 public class CarbonVectorBatch {
 
   private static final int DEFAULT_BATCH_SIZE = 4 * 1024;
@@ -63,8 +61,7 @@ public class CarbonVectorBatch {
 DataType[] dataTypes = readSupport.getDataTypes();
 
 for (int i = 0; i < schema.length; ++i) {
-  columns[i] = createDirectStreamReader(maxRows, dataTypes[i], schema[i], 
dictionaries[i],
-  readSupport.getDictionaryBlock(i));
+  columns[i] = createDirectStreamReader(maxRows, dataTypes[i], schema[i], 
dictionaries[i]);
 }
   }
 
@@ -79,7 +76,7 @@ public class CarbonVectorBatch {
   }
 
   private CarbonColumnVectorImpl createDirectStreamReader(int batchSize, 
DataType dataType,
-  StructField field, Dictionary dictionary, Block dictionaryBlock) {
+  StructField field, Dictionary dictionary) {
 if (dataType == DataTypes.BOOLEAN) {
   return new BooleanStreamReader(batchSize, field.getDataType(), 
dictionary);
 } else if (dataType == DataTypes.SHORT) {
@@ -93,9 +90,10 @@ public class CarbonVectorBatch {
 } else if (dataType == DataTypes.DOUBLE) {
   return new DoubleStreamReader(batchSize, field.getDataType(), 
dictionary);
 } else if (dataType == DataTypes.STRING) {
-  return new SliceStreamReader(batchSize, field.getDataType(), 
dictionaryBlock);
+  return new SliceStreamReader(batchSize, field.getDataType(), dictionary);
 } else if (DataTypes.isDecimal(dataType)) {
-  return new DecimalSliceStreamReader(batchSize, (DecimalType) 
field.getDataType(), dictionary);
+  return new DecimalSliceStreamReader(batchSize, field.getDataType(), 
(DecimalType) dataType,
+  dictionary);
 } else {
   return new ObjectStreamReader(batchSize, field.getDataType());
 }

http://git-wip-us.apache.org/repos/asf/carbondata/blob/8e6def9f/integration/presto/src/main/java/org/apache/carbondata/presto/readers/DecimalSliceStreamReader.java
--
diff --git 
a/integration/presto/src/main/java/org/apache/carbondata/presto/readers/DecimalSliceStreamReader.java
 
b/integration/presto/src/main/java/org/apache/carbondata/presto/readers/DecimalSliceStreamReader.java
index 2976ca7..ddc855a 100644
--- 
a/integration/presto/src/main/java/org/apache/carbondata/presto/readers/DecimalSliceStreamReader.java

carbondata git commit: [CARBONDATA-3200] No-Sort compaction

2019-01-09 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master 3a5572ee4 -> 1b45c41fe


[CARBONDATA-3200] No-Sort compaction

When data is loaded with SORT_SCOPE as NO_SORT and compaction is then done on
it, the data still remains unsorted. This does not affect queries much, but the
major purpose of compaction is to better pack the data and improve query
performance.

Now, the expected behaviour of compaction is to sort the data, so that query
performance becomes better after compaction. The columns to sort on are
provided by SORT_COLUMNS.

The new compaction works as follows (a minimal merge sketch follows below):

1. Sort the unsorted and restructured data and store it in temporary files.
2. Pick a row from those temporary files and from the already sorted carbondata
files, according to a comparator on sort_columns.
3. Write the data to a new segment (similar to the old compaction flow).
4. Repeat steps 2 and 3 until no more rows are left.

This closes #3029
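
A minimal k-way merge sketch for steps 2-4 above: keep the head row of every
sorted input in a priority queue ordered by the sort-columns comparator and
repeatedly emit the smallest. Rows are plain strings here for brevity; the real
flow merges temp-file rows and sorted carbondata files through chunk holders.

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative k-way merge over already-sorted inputs.
final class KWayMergeDemo {
  private static final class Head {
    final String row;
    final Iterator<String> rest;
    Head(String row, Iterator<String> rest) { this.row = row; this.rest = rest; }
  }

  static List<String> merge(List<List<String>> sortedInputs) {
    PriorityQueue<Head> heap = new PriorityQueue<>((a, b) -> a.row.compareTo(b.row));
    for (List<String> input : sortedInputs) {
      Iterator<String> it = input.iterator();
      if (it.hasNext()) {
        heap.add(new Head(it.next(), it));     // seed the heap with each input's first row
      }
    }
    List<String> out = new ArrayList<>();
    while (!heap.isEmpty()) {
      Head smallest = heap.poll();
      out.add(smallest.row);                   // "write to the new segment"
      if (smallest.rest.hasNext()) {
        heap.add(new Head(smallest.rest.next(), smallest.rest));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(merge(List.of(List.of("a", "d"), List.of("b", "c"), List.of("e"))));
  }
}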


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/1b45c41f
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/1b45c41f
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/1b45c41f

Branch: refs/heads/master
Commit: 1b45c41fe294a7a33ef748d13747c29cd3142670
Parents: 3a5572e
Author: namanrastogi 
Authored: Wed Jan 2 16:26:09 2019 +0530
Committer: kumarvishal09 
Committed: Wed Jan 9 18:05:35 2019 +0530

--
 .../core/datastore/block/TableBlockInfo.java|  11 +
 .../blockletindex/BlockletDataMap.java  |   1 +
 .../core/metadata/blocklet/BlockletInfo.java|  19 ++
 .../core/metadata/blocklet/DataFileFooter.java  |  13 +
 .../executor/impl/AbstractQueryExecutor.java|   3 +
 .../util/AbstractDataFileFooterConverter.java   |   7 +
 .../core/util/DataFileFooterConverterV3.java|   5 +
 format/src/main/thrift/carbondata_index.thrift  |   1 +
 .../compaction/TestHybridCompaction.scala   | 262 +++
 .../carbondata/spark/rdd/CarbonMergerRDD.scala  |  45 ++--
 .../carbondata/spark/rdd/StreamHandoffRDD.scala |   4 +-
 .../merger/AbstractResultProcessor.java |   6 +-
 .../merger/CarbonCompactionExecutor.java|  35 ++-
 .../processing/merger/CarbonCompactionUtil.java |  88 +--
 .../merger/CompactionResultSortProcessor.java   |  23 +-
 .../merger/RowResultMergerProcessor.java|  11 +-
 .../sortdata/InMemorySortTempChunkHolder.java   | 147 +++
 .../SingleThreadFinalSortFilesMerger.java   |  52 ++--
 .../sort/sortdata/SortTempFileChunkHolder.java  |  18 +-
 .../store/CarbonFactDataHandlerModel.java   |   1 +
 .../store/writer/AbstractFactDataWriter.java|   3 +
 .../writer/v3/CarbonFactDataWriterImplV3.java   |   5 +-
 22 files changed, 682 insertions(+), 78 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/1b45c41f/core/src/main/java/org/apache/carbondata/core/datastore/block/TableBlockInfo.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/datastore/block/TableBlockInfo.java
 
b/core/src/main/java/org/apache/carbondata/core/datastore/block/TableBlockInfo.java
index c38124d..8ef2198 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/datastore/block/TableBlockInfo.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/datastore/block/TableBlockInfo.java
@@ -29,6 +29,7 @@ import org.apache.carbondata.core.datamap.Segment;
 import org.apache.carbondata.core.datastore.impl.FileFactory;
 import org.apache.carbondata.core.indexstore.BlockletDetailInfo;
 import org.apache.carbondata.core.metadata.ColumnarFormatVersion;
+import org.apache.carbondata.core.metadata.blocklet.DataFileFooter;
 import org.apache.carbondata.core.util.ByteUtil;
 import org.apache.carbondata.core.util.path.CarbonTablePath;
 import org.apache.carbondata.core.util.path.CarbonTablePath.DataFileUtil;
@@ -101,6 +102,8 @@ public class TableBlockInfo implements Distributable, 
Serializable {
 
   private String dataMapWriterPath;
 
+  private transient DataFileFooter dataFileFooter;
+
   /**
* comparator to sort by block size in descending order.
* Since each line is not exactly the same, the size of a InputSplit may 
differs,
@@ -462,6 +465,14 @@ public class TableBlockInfo implements Distributable, 
Serializable {
 this.dataMapWriterPath = dataMapWriterPath;
   }
 
+  public DataFileFooter getDataFileFooter() {
+return dataFileFooter;
+  }
+
+  public void setDataFileFooter(DataFileFooter dataFileFooter) {
+this.dataFileFooter = dataFileFooter;
+  }
+
   @Override
   public String toString() {
 final StringBuilder sb = new StringBuilder("TableBlockInfo{");

http://git-wip-us.apache.org/repos/asf/carbondata/blob/1b45c41f/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex

carbondata git commit: [CARBONDATA-3236] Fix for JVM Crash for insert into new table from old table

2019-01-09 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master dd2fff269 -> 3a5572ee4


[CARBONDATA-3236] Fix for JVM Crash for insert into new table from old table

Problem: Insert into a new table from an old table fails with a JVM crash for
the file format (using carbondata). This happened because both the query and
the load flow were assigned the same taskId, and once the query finished it
freed the unsafe memory while the insert was still in progress.

Solution: As the file format flow is a direct flow and uses on-heap (safe)
memory, there is no need to free the unsafe memory in the query.

This closes #3056


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/3a5572ee
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/3a5572ee
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/3a5572ee

Branch: refs/heads/master
Commit: 3a5572ee4d0b472e0a37aebf7c6d38e779c8eacb
Parents: dd2fff2
Author: manishnalla1994 
Authored: Tue Jan 8 16:12:55 2019 +0530
Committer: kumarvishal09 
Committed: Wed Jan 9 17:08:51 2019 +0530

--
 .../execution/datasources/SparkCarbonFileFormat.scala  | 13 +++--
 .../tasklisteners/CarbonTaskCompletionListener.scala   |  2 +-
 2 files changed, 4 insertions(+), 11 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/3a5572ee/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/SparkCarbonFileFormat.scala
--
diff --git 
a/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/SparkCarbonFileFormat.scala
 
b/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/SparkCarbonFileFormat.scala
index 8cb2ca4..f725de3 100644
--- 
a/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/SparkCarbonFileFormat.scala
+++ 
b/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/SparkCarbonFileFormat.scala
@@ -410,15 +410,8 @@ class SparkCarbonFileFormat extends FileFormat
 val model = format.createQueryModel(split, hadoopAttemptContext)
 model.setConverter(new SparkDataTypeConverterImpl)
 model.setPreFetchData(false)
-var isAdded = false
-Option(TaskContext.get()).foreach { context =>
-  val onCompleteCallbacksField = 
context.getClass.getDeclaredField("onCompleteCallbacks")
-  onCompleteCallbacksField.setAccessible(true)
-  val listeners = onCompleteCallbacksField.get(context)
-.asInstanceOf[ArrayBuffer[TaskCompletionListener]]
-  isAdded = listeners.exists(p => 
p.isInstanceOf[CarbonLoadTaskCompletionListener])
-  model.setFreeUnsafeMemory(!isAdded)
-}
+// As file format uses on heap, no need to free unsafe memory
+model.setFreeUnsafeMemory(false)
 val carbonReader = if (readVector) {
   model.setDirectVectorFill(true)
   val vectorizedReader = new VectorizedCarbonRecordReader(model,
@@ -439,7 +432,7 @@ class SparkCarbonFileFormat extends FileFormat
 Option(TaskContext.get()).foreach{context =>
   context.addTaskCompletionListener(
   CarbonQueryTaskCompletionListenerImpl(
-iter.asInstanceOf[RecordReaderIterator[InternalRow]], !isAdded))
+iter.asInstanceOf[RecordReaderIterator[InternalRow]]))
 }
 
 if (carbonReader.isInstanceOf[VectorizedCarbonRecordReader] && 
readVector) {

http://git-wip-us.apache.org/repos/asf/carbondata/blob/3a5572ee/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/tasklisteners/CarbonTaskCompletionListener.scala
--
diff --git 
a/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/tasklisteners/CarbonTaskCompletionListener.scala
 
b/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/tasklisteners/CarbonTaskCompletionListener.scala
index eb3e42a..5547228 100644
--- 
a/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/tasklisteners/CarbonTaskCompletionListener.scala
+++ 
b/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/tasklisteners/CarbonTaskCompletionListener.scala
@@ -40,7 +40,7 @@ trait CarbonQueryTaskCompletionListener extends 
TaskCompletionListener
 trait CarbonLoadTaskCompletionListener extends TaskCompletionListener
 
 case class CarbonQueryTaskCompletionListenerImpl(iter: 
RecordReaderIterator[InternalRow],
-  

carbondata git commit: [CARBONDATA-3235] Fix Rename-Fail & Datamap-creation-Fail

2019-01-09 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master 3a41ee5df -> dd2fff269


[CARBONDATA-3235] Fix Rename-Fail & Datamap-creation-Fail

1. Alter Table Rename Table Fail

Problem: When the table rename succeeds in Hive but fails in the CarbonData
store, an exception is thrown, but the rename in Hive is not undone.

Solution: A flag tracks whether the Hive rename has already executed; if the
code breaks after the Hive rename is done, go back and undo the Hive rename.

2. Create-Preaggregate-Datamap Fail

Problem: When the (preaggregate) datamap schema is written but the table update
fails, CarbonDropDataMapCommand.processMetadata() calls
dropDataMapFromSystemFolder(). This is supposed to delete the folder on disk,
but it does not, because the datamap is not yet updated in the table, and it
throws NoSuchDataMapException.

Solution: Call CarbonDropTableCommand.run() instead of
CarbonDropTableCommand.processMetadata(), as CarbonDropTableCommand.processData()
deletes the actual folders from disk.

This closes #2996


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/dd2fff26
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/dd2fff26
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/dd2fff26

Branch: refs/heads/master
Commit: dd2fff269a6b416cbe0af8bd1a9e7108a02fd600
Parents: 3a41ee5
Author: namanrastogi 
Authored: Thu Dec 13 16:09:58 2018 +0530
Committer: kumarvishal09 
Committed: Wed Jan 9 14:16:20 2019 +0530

--
 .../command/datamap/CarbonDropDataMapCommand.scala|  2 +-
 .../command/schema/CarbonAlterTableRenameCommand.scala| 10 +-
 2 files changed, 10 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/dd2fff26/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonDropDataMapCommand.scala
--
diff --git 
a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonDropDataMapCommand.scala
 
b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonDropDataMapCommand.scala
index 54096ca..0bafe04 100644
--- 
a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonDropDataMapCommand.scala
+++ 
b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonDropDataMapCommand.scala
@@ -103,7 +103,7 @@ case class CarbonDropDataMapCommand(
 Some(childCarbonTable.get.getDatabaseName),
 childCarbonTable.get.getTableName,
 dropChildTable = true)
-  commandToRun.processMetadata(sparkSession)
+  commandToRun.run(sparkSession)
 }
 dropDataMapFromSystemFolder(sparkSession)
 return Seq.empty

http://git-wip-us.apache.org/repos/asf/carbondata/blob/dd2fff26/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableRenameCommand.scala
--
diff --git 
a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableRenameCommand.scala
 
b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableRenameCommand.scala
index dbf665a..01698c9 100644
--- 
a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableRenameCommand.scala
+++ 
b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableRenameCommand.scala
@@ -87,6 +87,7 @@ private[sql] case class CarbonAlterTableRenameCommand(
 
 var timeStamp = 0L
 var carbonTable: CarbonTable = null
+var hiveRenameSuccess = false
 // lock file path to release locks after operation
 var carbonTableLockFilePath: String = null
 try {
@@ -139,6 +140,7 @@ private[sql] case class CarbonAlterTableRenameCommand(
   oldIdentifier,
   newIdentifier,
 oldTableIdentifier.getTablePath)
+  hiveRenameSuccess = true
 
   metastore.updateTableSchemaForAlter(
 newTableIdentifier,
@@ -165,6 +167,12 @@ private[sql] case class CarbonAlterTableRenameCommand(
   case e: ConcurrentOperationException =>
 throw e
   case e: Exception =>
+if (hiveRenameSuccess) {
+  
sparkSession.sessionState.catalog.asInstanceOf[CarbonSessionCatalog].alterTableRename(
+newTableIdentifier,
+oldTableIdentifier,
+carbonTable.getAbsoluteTableIdentifier.getTableName)
+}
 if (carbonTable != null) {
   AlterTableUtil.revertRenameTableChanges(
 newTableName,
@@ -173,7 +181,7 @@ private[sql] case class CarbonAlterTa

carbondata git commit: [CARBONDATA-3201] Added load level SORT_SCOPE Added SORT_SCOPE in Load Options & in SET Command

2019-01-09 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master 4e27b86df -> 77d2b4e8d


[CARBONDATA-3201] Added load level SORT_SCOPE
Added SORT_SCOPE in Load Options & in SET Command

1. Added load level SORT_SCOPE
2. Added Sort_Scope for PreAgg
3. Added sort_scope msg for LoadDataCommand
4. Added property CARBON.TABLE.LOAD.SORT.SCOPE.. to set table 
level sort_scope property
5. Removed test case verifying LOAD_OPTIONS with SORT_SCOPE

Load level SORT_SCOPE
LOAD DATA INPATH 'path/to/data.csv'
INTO TABLE my_table
OPTIONS (
   'sort_scope'='no_sort'
)
Priority of SORT_SCOPE (a minimal resolution sketch follows below):
1. Load level (if provided)
2. Table level (if provided)
3. Default

This closes #3014
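
A minimal resolution sketch for the SORT_SCOPE priority above. The option and
property key is assumed to be 'sort_scope' for illustration; the helper itself
is hypothetical, not the CarbonLoadDataCommand logic.

import java.util.Map;

// Illustrative only: load option -> table property -> default.
final class SortScopePriorityDemo {
  static String resolveSortScope(Map<String, String> loadOptions,
                                 Map<String, String> tableProperties,
                                 String defaultScope) {
    String loadLevel = loadOptions.get("sort_scope");        // 1. load level, if provided
    if (loadLevel != null) {
      return loadLevel;
    }
    String tableLevel = tableProperties.get("sort_scope");   // 2. table level, if provided
    if (tableLevel != null) {
      return tableLevel;
    }
    return defaultScope;                                     // 3. default
  }

  public static void main(String[] args) {
    System.out.println(resolveSortScope(
        Map.of("sort_scope", "no_sort"),       // load option wins
        Map.of("sort_scope", "local_sort"),
        "no_sort"));
  }
}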


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/77d2b4e8
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/77d2b4e8
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/77d2b4e8

Branch: refs/heads/master
Commit: 77d2b4e8d132f768b83438845f6fb9660a74fe1f
Parents: 4e27b86
Author: namanrastogi 
Authored: Fri Dec 21 13:03:30 2018 +0530
Committer: kumarvishal09 
Committed: Wed Jan 9 14:06:17 2019 +0530

--
 .../constants/CarbonLoadOptionConstants.java|  6 
 .../carbondata/core/util/SessionParams.java |  8 -
 .../TestCreateTableWithSortScope.scala  | 19 ---
 .../streaming/StreamSinkFactory.scala   |  2 +-
 .../spark/sql/catalyst/CarbonDDLSqlParser.scala |  3 +-
 .../CarbonAlterTableCompactionCommand.scala |  4 +--
 .../management/CarbonLoadDataCommand.scala  | 35 +---
 .../preaaggregate/PreAggregateListeners.scala   |  7 ++--
 .../preaaggregate/PreAggregateTableHelper.scala |  3 +-
 .../preaaggregate/PreAggregateUtil.scala|  2 ++
 .../execution/command/CarbonHiveCommands.scala  | 19 ---
 .../commands/SetCommandTestCase.scala   | 28 
 .../processing/loading/events/LoadEvents.java   | 13 +++-
 13 files changed, 111 insertions(+), 38 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/77d2b4e8/core/src/main/java/org/apache/carbondata/core/constants/CarbonLoadOptionConstants.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/constants/CarbonLoadOptionConstants.java
 
b/core/src/main/java/org/apache/carbondata/core/constants/CarbonLoadOptionConstants.java
index 5cf6163..eef2bef 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/constants/CarbonLoadOptionConstants.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/constants/CarbonLoadOptionConstants.java
@@ -81,6 +81,12 @@ public final class CarbonLoadOptionConstants {
   "carbon.options.sort.scope";
 
   /**
+   * option to specify table level sort_scope
+   */
+  @CarbonProperty(dynamicConfigurable = true)
+  public static final String CARBON_TABLE_LOAD_SORT_SCOPE = 
"carbon.table.load.sort.scope.";
+
+  /**
* option to specify the batch sort size inmb
*/
   @CarbonProperty(dynamicConfigurable = true)

http://git-wip-us.apache.org/repos/asf/carbondata/blob/77d2b4e8/core/src/main/java/org/apache/carbondata/core/util/SessionParams.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/util/SessionParams.java 
b/core/src/main/java/org/apache/carbondata/core/util/SessionParams.java
index f49747f..d9aa214 100644
--- a/core/src/main/java/org/apache/carbondata/core/util/SessionParams.java
+++ b/core/src/main/java/org/apache/carbondata/core/util/SessionParams.java
@@ -161,7 +161,7 @@ public class SessionParams implements Serializable, 
Cloneable {
 isValid = CarbonUtil.isValidSortOption(value);
 if (!isValid) {
   throw new InvalidConfigurationException("The sort scope " + key
-  + " can have only either BATCH_SORT or LOCAL_SORT or NO_SORT.");
+  + " can have only either NO_SORT, BATCH_SORT, LOCAL_SORT or 
GLOBAL_SORT.");
 }
 break;
   case CARBON_OPTIONS_BATCH_SORT_SIZE_INMB:
@@ -229,6 +229,12 @@ public class SessionParams implements Serializable, 
Cloneable {
   if (!isValid) {
 throw new InvalidConfigurationException("Invalid value " + value + 
" for key " + key);
   }
+} else if 
(key.startsWith(CarbonLoadOptionConstants.CARBON_TABLE_LOAD_SORT_SCOPE)) {
+  isValid = CarbonUtil.isValidSortOption(value);
+  if (!isValid) {
+throw new InvalidConfigurationException("The sort scope " + key
++ " can have only either NO_SORT, BATCH_SORT, LOCAL_SORT or 
GLOBAL_SORT.");
+  }
 } else {
   throw new InvalidConfiguratio

carbondata git commit: [CARBONDATA-3189] Fix PreAggregate Datamap Issue

2019-01-07 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master 72da33495 -> aad9aabf9


[CARBONDATA-3189] Fix PreAggregate Datamap Issue

Problem -
Load and select queries were failing on a table with a preaggregate datamap.

Cause -
Previously, if query on datamap was not enabled in the thread, there was no
check afterwards.

Solution -
First check whether the thread param for Direct Query On Datamap is enabled;
if it is not, check the session params and then the global configuration.

This closes #3010
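
A minimal Java sketch of the layered lookup described in the solution: check
the thread-level parameter first, then the session parameters, then the global
configuration. The maps and the property key are illustrative assumptions, not
the CarbonData SessionParams API.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative layered lookup (thread-local -> session -> global) for a flag.
final class LayeredFlagDemo {
  private static final ThreadLocal<Map<String, String>> THREAD_PARAMS =
      ThreadLocal.withInitial(ConcurrentHashMap::new);
  private static final Map<String, String> SESSION_PARAMS = new ConcurrentHashMap<>();
  private static final Map<String, String> GLOBAL_PARAMS = new ConcurrentHashMap<>();

  static boolean isEnabled(String key, boolean defaultValue) {
    String value = THREAD_PARAMS.get().get(key);     // 1. thread-level override
    if (value == null) {
      value = SESSION_PARAMS.get(key);               // 2. session-level setting
    }
    if (value == null) {
      value = GLOBAL_PARAMS.get(key);                // 3. global configuration
    }
    return value == null ? defaultValue : Boolean.parseBoolean(value);
  }

  public static void main(String[] args) {
    GLOBAL_PARAMS.put("support.direct.query.on.datamap", "false");
    SESSION_PARAMS.put("support.direct.query.on.datamap", "true");
    System.out.println(isEnabled("support.direct.query.on.datamap", false)); // true (session wins)
  }
}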


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/aad9aabf
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/aad9aabf
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/aad9aabf

Branch: refs/heads/master
Commit: aad9aabf960dce5227ef8e59a56c25c0972d221c
Parents: 72da334
Author: Shubh18s 
Authored: Thu Dec 20 16:47:32 2018 +0530
Committer: kumarvishal09 
Committed: Mon Jan 7 14:13:37 2019 +0530

--
 .../core/constants/CarbonCommonConstants.java   |  6 ---
 docs/configuration-parameters.md|  3 +-
 .../preaggregate/TestPreAggCreateCommand.scala  | 42 
 .../apache/spark/sql/test/util/QueryTest.scala  |  2 +-
 .../preaaggregate/PreAggregateUtil.scala|  8 ++--
 .../sql/optimizer/CarbonLateDecodeRule.scala| 23 ++-
 6 files changed, 21 insertions(+), 63 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/aad9aabf/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 
b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
index 8d0a4d9..c1ef940 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
@@ -1450,12 +1450,6 @@ public final class CarbonCommonConstants {
   public static final String SUPPORT_DIRECT_QUERY_ON_DATAMAP_DEFAULTVALUE = 
"false";
 
   @CarbonProperty
-  public static final String VALIDATE_DIRECT_QUERY_ON_DATAMAP =
-  "carbon.query.validate.direct.query.on.datamap";
-
-  public static final String VALIDATE_DIRECT_QUERY_ON_DATAMAP_DEFAULTVALUE = 
"true";
-
-  @CarbonProperty
   public static final String CARBON_SHOW_DATAMAPS = 
"carbon.query.show.datamaps";
 
   public static final String CARBON_SHOW_DATAMAPS_DEFAULT = "true";

http://git-wip-us.apache.org/repos/asf/carbondata/blob/aad9aabf/docs/configuration-parameters.md
--
diff --git a/docs/configuration-parameters.md b/docs/configuration-parameters.md
index db21c6a..105b768 100644
--- a/docs/configuration-parameters.md
+++ b/docs/configuration-parameters.md
@@ -135,7 +135,6 @@ This section provides the details of all the configurations 
required for the Car
 | carbon.custom.block.distribution | false | CarbonData has its own scheduling 
algorithm to suggest to Spark on how many tasks needs to be launched and how 
much work each task need to do in a Spark cluster for any query on CarbonData. 
When this configuration is true, CarbonData would distribute the available 
blocks to be scanned among the available number of cores. For Example:If there 
are 10 blocks to be scanned and only 3 tasks can be run(only 3 executor cores 
available in the cluster), CarbonData would combine blocks as 4,3,3 and give it 
to 3 tasks to run. **NOTE:** When this configuration is false, as per the 
***carbon.task.distribution*** configuration, each block/blocklet would be 
given to each task. |
 | enable.query.statistics | false | CarbonData has extensive logging which 
would be useful for debugging issues related to performance or hard to locate 
issues. This configuration when made ***true*** would log additional query 
statistics information to more accurately locate the issues being 
debugged.**NOTE:** Enabling this would log more debug information to log files, 
there by increasing the log files size significantly in short span of time. It 
is advised to configure the log files size, retention of log files parameters 
in log4j properties appropriately. Also extensive logging is an increased IO 
operation and hence over all query performance might get reduced. Therefore it 
is recommended to enable this configuration only for the duration of debugging. 
|
 | enable.unsafe.in.query.processing | false | CarbonData supports unsafe 
operations of Java to avoid GC overhead for certain operations. This 
configuration enables to use unsafe functions in CarbonData while scanning the  
data during query. |
-|

carbondata git commit: [CARBONDATA-3217] Optimize implicit filter expression performance by removing extra serialization

2019-01-04 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master 9fa045d40 -> bc1e94472


[CARBONDATA-3217] Optimize implicit filter expression performance by removing 
extra serialization

Fixed a performance issue for the implicit filter column:
1. Removed serialization of all the implicit filter values in each task.
Instead, only the values for the blocks going to a particular task are
serialized (a minimal pruning sketch follows below).
2. Removed the double deserialization of implicit filter values in the executor
for each task; deserializing once is sufficient.
This closes #3039
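
A minimal Java sketch of point 1: group the implicit filter values by block and
ship each task only the subset for the blocks it will scan, so the full value
set is never serialized per task. Names and types are hypothetical, not the
CarbonInputFormat classes.

import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: prune implicit filter values down to one task's blocks.
final class ImplicitFilterPruningDemo {
  static Map<String, Set<Integer>> forTask(Map<String, Set<Integer>> valuesByBlock,
                                           List<String> taskBlocks) {
    Map<String, Set<Integer>> subset = new HashMap<>();
    for (String block : taskBlocks) {
      Set<Integer> values = valuesByBlock.get(block);
      if (values != null) {
        subset.put(block, new HashSet<>(values)); // only this task's blocks get serialized
      }
    }
    return subset;
  }

  public static void main(String[] args) {
    Map<String, Set<Integer>> all = Map.of(
        "block1", Set.of(0, 2),
        "block2", Set.of(1),
        "block3", Set.of(0, 1, 3));
    // The task scanning block1 and block3 never sees block2's values.
    System.out.println(forTask(all, List.of("block1", "block3")));
  }
}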


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/bc1e9447
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/bc1e9447
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/bc1e9447

Branch: refs/heads/master
Commit: bc1e94472d845cca548c59c5198ffdcd5c78b571
Parents: 9fa045d
Author: manishgupta88 
Authored: Thu Dec 27 15:18:07 2018 +0530
Committer: kumarvishal09 
Committed: Fri Jan 4 16:37:54 2019 +0530

--
 .../indexstore/blockletindex/BlockDataMap.java  |   3 +-
 .../conditional/ImplicitExpression.java | 109 +
 .../core/scan/filter/ColumnFilterInfo.java  |  43 ++-
 .../carbondata/core/scan/filter/FilterUtil.java |  73 ++--
 .../ImplicitIncludeFilterExecutorImpl.java  |  23 +++-
 .../core/scan/filter/intf/ExpressionType.java   |   3 +-
 .../visitor/ImplicitColumnVisitor.java  |  24 ++--
 .../carbondata/hadoop/CarbonInputSplit.java |  28 +
 .../hadoop/api/CarbonInputFormat.java   |  43 ++-
 .../TestImplicitFilterExpression.scala  | 117 +++
 .../carbondata/spark/rdd/CarbonScanRDD.scala|  31 -
 .../spark/sql/optimizer/CarbonFilters.scala |  15 ++-
 12 files changed, 443 insertions(+), 69 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/bc1e9447/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockDataMap.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockDataMap.java
 
b/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockDataMap.java
index 6b04cf7..e29dfef 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockDataMap.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockDataMap.java
@@ -32,6 +32,7 @@ import 
org.apache.carbondata.core.datamap.dev.cgdatamap.CoarseGrainDataMap;
 import org.apache.carbondata.core.datastore.block.SegmentProperties;
 import 
org.apache.carbondata.core.datastore.block.SegmentPropertiesAndSchemaHolder;
 import org.apache.carbondata.core.datastore.block.TableBlockInfo;
+import org.apache.carbondata.core.datastore.impl.FileFactory;
 import org.apache.carbondata.core.indexstore.AbstractMemoryDMStore;
 import org.apache.carbondata.core.indexstore.BlockMetaInfo;
 import org.apache.carbondata.core.indexstore.Blocklet;
@@ -485,7 +486,7 @@ public class BlockDataMap extends CoarseGrainDataMap
 String fileName = filePath + CarbonCommonConstants.FILE_SEPARATOR + new 
String(
 dataMapRow.getByteArray(FILE_PATH_INDEX), 
CarbonCommonConstants.DEFAULT_CHARSET_CLASS)
 + CarbonTablePath.getCarbonDataExtension();
-return fileName;
+return FileFactory.getUpdatedFilePath(fileName);
   }
 
   private void addTaskSummaryRowToUnsafeMemoryStore(CarbonRowSchema[] 
taskSummarySchema,

http://git-wip-us.apache.org/repos/asf/carbondata/blob/bc1e9447/core/src/main/java/org/apache/carbondata/core/scan/expression/conditional/ImplicitExpression.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/scan/expression/conditional/ImplicitExpression.java
 
b/core/src/main/java/org/apache/carbondata/core/scan/expression/conditional/ImplicitExpression.java
new file mode 100644
index 000..eab564e
--- /dev/null
+++ 
b/core/src/main/java/org/apache/carbondata/core/scan/expression/conditional/ImplicitExpression.java
@@ -0,0 +1,109 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, ei

carbondata git commit: [CARBONDATA-3212] Fixed NegativeArraySizeException while querying in specific scenario

2019-01-02 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master f8697b106 -> deb08c329


[CARBONDATA-3212] Fixed NegativeArraySizeException while querying in specific 
scenario

Problem: In Local Dictionary, the page size was not getting updated for complex
children columns. So during fallback, a new page was being created with fewer records,
giving NegativeArraySizeException while querying the data.

Solution: Updated the page size in the Local Dictionary page.

This closes #3031


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/deb08c32
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/deb08c32
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/deb08c32

Branch: refs/heads/master
Commit: deb08c329287dc7bcdf96af6a6611f7c4b0fc83a
Parents: f8697b1
Author: shivamasn 
Authored: Wed Jan 2 16:19:22 2019 +0530
Committer: kumarvishal09 
Committed: Thu Jan 3 11:06:59 2019 +0530

--
 .../carbondata/core/datastore/page/LocalDictColumnPage.java   | 3 +++
 1 file changed, 3 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/deb08c32/core/src/main/java/org/apache/carbondata/core/datastore/page/LocalDictColumnPage.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/datastore/page/LocalDictColumnPage.java
 
b/core/src/main/java/org/apache/carbondata/core/datastore/page/LocalDictColumnPage.java
index 5cf2130..0e34d72 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/datastore/page/LocalDictColumnPage.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/datastore/page/LocalDictColumnPage.java
@@ -140,6 +140,9 @@ public class LocalDictColumnPage extends ColumnPage {
 } else {
   actualDataColumnPage.putBytes(rowId, bytes);
 }
+if (pageSize <= rowId) {
+  pageSize = rowId + 1;
+}
   }
 
   @Override public void disableLocalDictEncoding() {



carbondata git commit: [CARBONDATA-3218] Fix schema refresh and wrong query result issues in presto.

2019-01-02 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master 7477527e9 -> f8697b106


[CARBONDATA-3218] Fix schema refresh and wrong query result issues in presto.

Problem:
A schema updated in spark is not reflected in presto, which results in wrong query
results in presto.

Solution:
Refresh the schema in presto whenever the schema changes in spark. Also override the
putNulls method in all presto readers so they work for null data scenarios.

This closes #3041
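
The refresh idea can be pictured as below, assuming a hypothetical cache entry that
remembers the schema file's last-modified time; the cached table is rebuilt only when
the schema file on disk is newer. The real change lives in CarbonTableReader and
CarbonTableCacheModel.

import java.io.File;

// Sketch: invalidate a cached table schema when the schema file was modified
// after the cache entry was built (hypothetical types, not the presto reader).
public class SchemaCacheSketch {
  static class CachedTable {
    long loadedModifiedTime;
    Object table; // stands in for the real CarbonTable
  }

  static boolean needsRefresh(CachedTable cached, String schemaFilePath) {
    long currentModifiedTime = new File(schemaFilePath).lastModified();
    return cached == null || currentModifiedTime > cached.loadedModifiedTime;
  }
}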


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/f8697b10
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/f8697b10
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/f8697b10

Branch: refs/heads/master
Commit: f8697b1065cd76e3b96be571fd78761a44a58e7e
Parents: 7477527
Author: ravipesala 
Authored: Mon Dec 31 17:20:24 2018 +0530
Committer: kumarvishal09 
Committed: Wed Jan 2 18:59:05 2019 +0530

--
 .../presto/CarbondataPageSourceProvider.java|   7 +-
 .../presto/CarbondataSplitManager.java  |  65 +-
 .../presto/impl/CarbonTableCacheModel.java  |  29 -
 .../presto/impl/CarbonTableReader.java  | 119 ---
 .../presto/readers/BooleanStreamReader.java |   6 +
 .../readers/DecimalSliceStreamReader.java   |  12 ++
 .../presto/readers/DoubleStreamReader.java  |  12 ++
 .../presto/readers/IntegerStreamReader.java |  12 ++
 .../presto/readers/LongStreamReader.java|  12 ++
 .../presto/readers/ObjectStreamReader.java  |   6 +
 .../presto/readers/ShortStreamReader.java   |  12 ++
 .../presto/readers/SliceStreamReader.java   |  24 +++-
 .../presto/readers/TimestampStreamReader.java   |  12 ++
 13 files changed, 215 insertions(+), 113 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/f8697b10/integration/presto/src/main/java/org/apache/carbondata/presto/CarbondataPageSourceProvider.java
--
diff --git 
a/integration/presto/src/main/java/org/apache/carbondata/presto/CarbondataPageSourceProvider.java
 
b/integration/presto/src/main/java/org/apache/carbondata/presto/CarbondataPageSourceProvider.java
index d7b7266..c81e0c3 100644
--- 
a/integration/presto/src/main/java/org/apache/carbondata/presto/CarbondataPageSourceProvider.java
+++ 
b/integration/presto/src/main/java/org/apache/carbondata/presto/CarbondataPageSourceProvider.java
@@ -230,10 +230,11 @@ public class CarbondataPageSourceProvider extends 
HivePageSourceProvider {
 .getCarbonCache(new SchemaTableName(carbonSplit.getDatabase(), 
carbonSplit.getTable()),
 carbonSplit.getSchema().getProperty("tablePath"), configuration);
 checkNotNull(tableCacheModel, "tableCacheModel should not be null");
-checkNotNull(tableCacheModel.carbonTable, "tableCacheModel.carbonTable 
should not be null");
-checkNotNull(tableCacheModel.carbonTable.getTableInfo(),
+checkNotNull(tableCacheModel.getCarbonTable(),
+"tableCacheModel.carbonTable should not be null");
+checkNotNull(tableCacheModel.getCarbonTable().getTableInfo(),
 "tableCacheModel.carbonTable.tableInfo should not be null");
-return tableCacheModel.carbonTable;
+return tableCacheModel.getCarbonTable();
   }
 
 }

http://git-wip-us.apache.org/repos/asf/carbondata/blob/f8697b10/integration/presto/src/main/java/org/apache/carbondata/presto/CarbondataSplitManager.java
--
diff --git 
a/integration/presto/src/main/java/org/apache/carbondata/presto/CarbondataSplitManager.java
 
b/integration/presto/src/main/java/org/apache/carbondata/presto/CarbondataSplitManager.java
index ded00fc..6efef93 100755
--- 
a/integration/presto/src/main/java/org/apache/carbondata/presto/CarbondataSplitManager.java
+++ 
b/integration/presto/src/main/java/org/apache/carbondata/presto/CarbondataSplitManager.java
@@ -119,45 +119,40 @@ public class CarbondataSplitManager extends 
HiveSplitManager {
 configuration = carbonTableReader.updateS3Properties(configuration);
 CarbonTableCacheModel cache =
 carbonTableReader.getCarbonCache(schemaTableName, location, 
configuration);
-if (null != cache) {
-  Expression filters = PrestoFilterUtil.parseFilterExpression(predicate);
-  try {
-
-List splits =
-carbonTableReader.getInputSplits2(cache, filters, predicate, 
configuration);
-
-ImmutableList.Builder cSplits = 
ImmutableList.builder();
-long index = 0;
-for (CarbonLocalMultiBlockSplit split : splits) {
-  index++;
-  Properties properties = new Properties();
-  for (Map.Entry entry : 
table.getSt

carbondata git commit: [CARBONDATA-3195]Added validation for Inverted Index columns and added a test case in case of varchar

2018-12-28 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master d85d54324 -> f5c1b7bbd


[CARBONDATA-3195]Added validation for Inverted Index columns and added a test 
case in case of varchar

This PR adds a validation for inverted index: when the inverted index columns are not
present in the sort columns, an exception should be thrown.
Also added a test case for when varchar columns are passed as inverted index.

This closes #3020
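
A simple sketch of the validation described above, using plain string column lists;
the real check is done while parsing the table properties in CarbonDDLSqlParser, and
the exception type there differs.

import java.util.List;

// Sketch: every INVERTED_INDEX column must also appear in SORT_COLUMNS,
// otherwise an error is raised instead of silently ignoring the property.
public class InvertedIndexValidationSketch {
  static void validate(List<String> invertedIndexColumns, List<String> sortColumns) {
    for (String column : invertedIndexColumns) {
      if (!sortColumns.contains(column)) {
        throw new IllegalArgumentException(
            "INVERTED_INDEX column: " + column + " should be present in SORT_COLUMNS");
      }
    }
  }
}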


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/f5c1b7bb
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/f5c1b7bb
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/f5c1b7bb

Branch: refs/heads/master
Commit: f5c1b7bbd2485e1186e3a7c718d3f539599905a5
Parents: d85d543
Author: shardul-cr7 
Authored: Mon Dec 24 12:51:16 2018 +0530
Committer: kumarvishal09 
Committed: Fri Dec 28 17:01:56 2018 +0530

--
 docs/ddl-of-carbondata.md  |  4 +++-
 .../dataload/TestNoInvertedIndexLoadAndQuery.scala |  8 
 .../longstring/VarcharDataTypesBasicTestCase.scala | 13 +
 .../apache/spark/sql/catalyst/CarbonDDLSqlParser.scala | 10 ++
 4 files changed, 30 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/f5c1b7bb/docs/ddl-of-carbondata.md
--
diff --git a/docs/ddl-of-carbondata.md b/docs/ddl-of-carbondata.md
index 3d3db1e..d1a4794 100644
--- a/docs/ddl-of-carbondata.md
+++ b/docs/ddl-of-carbondata.md
@@ -126,9 +126,11 @@ CarbonData DDL statements are documented here,which 
includes:
 
  By default inverted index is disabled as store size will be reduced, it 
can be enabled by using a table property. It might help to improve compression 
ratio and query speed, especially for low cardinality columns which are in 
reward position.
  Suggested use cases : For high cardinality columns, you can disable the 
inverted index for improving the data loading performance.
+ 
+ **NOTE**: Columns specified in INVERTED_INDEX should also be present in 
SORT_COLUMNS.
 
  ```
- TBLPROPERTIES ('NO_INVERTED_INDEX'='column1', 'INVERTED_INDEX'='column2, 
column3')
+ TBLPROPERTIES 
('SORT_COLUMNS'='column2,column3','NO_INVERTED_INDEX'='column1', 
'INVERTED_INDEX'='column2, column3')
  ```
 
- # Sort Columns Configuration

http://git-wip-us.apache.org/repos/asf/carbondata/blob/f5c1b7bb/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestNoInvertedIndexLoadAndQuery.scala
--
diff --git 
a/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestNoInvertedIndexLoadAndQuery.scala
 
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestNoInvertedIndexLoadAndQuery.scala
index 13f8adb..f483827 100644
--- 
a/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestNoInvertedIndexLoadAndQuery.scala
+++ 
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/dataload/TestNoInvertedIndexLoadAndQuery.scala
@@ -305,7 +305,7 @@ class TestNoInvertedIndexLoadAndQuery extends QueryTest 
with BeforeAndAfterAll {
CREATE TABLE IF NOT EXISTS index1
(id Int, name String, city String)
STORED BY 'org.apache.carbondata.format'
-   
TBLPROPERTIES('DICTIONARY_INCLUDE'='id','INVERTED_INDEX'='city,name')
+   
TBLPROPERTIES('DICTIONARY_INCLUDE'='id','INVERTED_INDEX'='city,name', 
'SORT_COLUMNS'='city,name')
   """)
 sql(
   s"""
@@ -333,14 +333,14 @@ class TestNoInvertedIndexLoadAndQuery extends QueryTest 
with BeforeAndAfterAll {
CREATE TABLE IF NOT EXISTS index1
(id Int, name String, city String)
STORED BY 'org.apache.carbondata.format'
-   TBLPROPERTIES('INVERTED_INDEX'='city,name,id')
+   
TBLPROPERTIES('INVERTED_INDEX'='city,name,id','SORT_COLUMNS'='city,name,id')
   """)
 val carbonTable = CarbonMetadata.getInstance().getCarbonTable("default", 
"index1")
 assert(carbonTable.getColumnByName("index1", 
"city").getColumnSchema.getEncodingList
   .contains(Encoding.INVERTED_INDEX))
 assert(carbonTable.getColumnByName("index1", 
"name").getColumnSchema.getEncodingList
   .contains(Encoding.INVERTED_INDEX))
-assert(!carbonTable.getColumnByName("index1", 
"id").getColumnSchema.getEncodingList
+assert(carbonTable.get

carbondata git commit: [CARBONDATA-3192] Fix for compaction compatibilty issue

2018-12-24 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master 10bc5c2ec -> f4c1c672b


[CARBONDATA-3192] Fix for compaction compatibilty issue

Problem: A table created, loaded and altered (column added) in version 1.5.1, and then
refreshed, altered (the added column dropped), loaded and compacted with varchar
columns in the new version, gives an error.

Solution: Corrected the varchar dimension index calculation by accounting for the
columns which have been deleted (invisible columns), hence giving the correct ordinals
after deletion.

This closes #3016
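
The ordinal correction can be sketched like this: while walking all dimensions
(including dropped ones), count the invisible columns seen so far and subtract that
count when computing a varchar column's index among the no-dictionary columns. Names
here are illustrative, not the exact CarbonFactDataHandlerModel fields.

import java.util.ArrayList;
import java.util.List;

// Sketch: dropped (invisible) dimensions still occupy ordinals, so the varchar
// index inside the no-dictionary group must skip them.
public class VarcharOrdinalSketch {
  static class Dim {
    int ordinal; boolean invisible; boolean varchar;
    Dim(int o, boolean inv, boolean v) { ordinal = o; invisible = inv; varchar = v; }
  }

  static List<Integer> varcharIndexesInNoDict(List<Dim> allDimensions, int dictDimCount) {
    List<Integer> varcharIdx = new ArrayList<>();
    int invisibleCount = 0;
    for (Dim dim : allDimensions) {
      if (dim.invisible) { invisibleCount++; continue; }
      if (dim.varchar) {
        varcharIdx.add(dim.ordinal - dictDimCount - invisibleCount);
      }
    }
    return varcharIdx;
  }
}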


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/f4c1c672
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/f4c1c672
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/f4c1c672

Branch: refs/heads/master
Commit: f4c1c672be19201c2c98fe84f6143f1323a60bbf
Parents: 10bc5c2
Author: manishnalla1994 
Authored: Fri Dec 21 19:11:46 2018 +0530
Committer: kumarvishal09 
Committed: Mon Dec 24 13:29:28 2018 +0530

--
 .../processing/store/CarbonFactDataHandlerModel.java| 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/f4c1c672/processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java
--
diff --git 
a/processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java
 
b/processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java
index e759c02..c60da45 100644
--- 
a/processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java
+++ 
b/processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerModel.java
@@ -314,18 +314,23 @@ public class CarbonFactDataHandlerModel {
 
 // for dynamic page size in write step if varchar columns exist
 List varcharDimIdxInNoDict = new ArrayList<>();
-List allDimensions = carbonTable.getDimensions();
+List allDimensions = carbonTable.getAllDimensions();
 int dictDimCount = allDimensions.size() - 
segmentProperties.getNumberOfNoDictionaryDimension()
 - segmentProperties.getComplexDimensions().size();
 CarbonColumn[] noDicAndComplexColumns =
 new CarbonColumn[segmentProperties.getNumberOfNoDictionaryDimension() 
+ segmentProperties
 .getComplexDimensions().size()];
 int noDicAndComp = 0;
+int invisibleCount = 0;
 for (CarbonDimension dim : allDimensions) {
+  if (dim.isInvisible()) {
+invisibleCount++;
+continue;
+  }
   if (!dim.isComplex() && !dim.hasEncoding(Encoding.DICTIONARY) &&
   dim.getDataType() == DataTypes.VARCHAR) {
 // ordinal is set in CarbonTable.fillDimensionsAndMeasuresForTables()
-varcharDimIdxInNoDict.add(dim.getOrdinal() - dictDimCount);
+varcharDimIdxInNoDict.add(dim.getOrdinal() - dictDimCount - 
invisibleCount);
   }
   if (!dim.hasEncoding(Encoding.DICTIONARY)) {
 noDicAndComplexColumns[noDicAndComp++] =



carbondata git commit: [CARBONDATA-3186]Avoid creating empty carbondata file when all the records are bad record with action redirect.

2018-12-23 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master bd752e9d5 -> 10bc5c2ec


[CARBONDATA-3186]Avoid creating empty carbondata file when all the records are 
bad record with action redirect.

problem: In the no_sort flow the writer is opened early, as there is no blocking sort
step. So, when all the records go as bad records with the redirect action in the
converter step, the writer closes an empty .carbondata file. When this empty
carbondata file is queried, we get multiple issues including NPE.

solution: When the file size is 0 bytes, do the following:
a) If there is one data file and one index file, delete the carbondata file and avoid
index file creation.
b) If there are multiple data files and one index file (with a few data files full of
bad records), delete those carbondata files and remove them from blockIndexInfoList,
so the index file will not carry the info of the empty carbon files.
c) In case direct write to the store path is enabled, delete the data file from there
and avoid writing an index file with that carbondata info.

[HOTFIX] Presto NPE when a non-transactional table is cached for s3a/HDFS.
cause: for a non-transactional table, the schema file must not be read.

solution: use the inferred schema, instead of checking the schema file.

This closes #3003
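
A rough sketch of the cleanup rule, using plain java.io.File and a hypothetical
blockIndexInfoList of file names; the actual change is in AbstractFactDataWriter and
also covers the direct-write-to-store-path case.

import java.io.File;
import java.util.List;

// Sketch: drop 0-byte carbondata files and forget them in the index metadata,
// so no index entry points at an empty data file.
public class EmptyFileCleanupSketch {
  static void dropEmptyDataFiles(List<String> dataFilePaths, List<String> blockIndexInfoList) {
    for (String path : dataFilePaths) {
      File dataFile = new File(path);
      if (dataFile.exists() && dataFile.length() == 0) {
        if (!dataFile.delete()) {
          throw new IllegalStateException("Could not delete empty file: " + path);
        }
        blockIndexInfoList.remove(dataFile.getName());
      }
    }
  }
}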


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/10bc5c2e
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/10bc5c2e
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/10bc5c2e

Branch: refs/heads/master
Commit: 10bc5c2ec69711c12bc379e9f0997d3363543364
Parents: bd752e9
Author: ajantha-bhat 
Authored: Wed Dec 19 18:27:53 2018 +0530
Committer: kumarvishal09 
Committed: Mon Dec 24 13:22:41 2018 +0530

--
 .../presto/impl/CarbonTableReader.java  |  4 +-
 .../TestNonTransactionalCarbonTable.scala   | 29 -
 .../store/writer/AbstractFactDataWriter.java| 45 
 3 files changed, 65 insertions(+), 13 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/10bc5c2e/integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java
--
diff --git 
a/integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java
 
b/integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java
index 9677839..363f3f5 100755
--- 
a/integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java
+++ 
b/integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java
@@ -288,8 +288,8 @@ public class CarbonTableReader {
 }
 if (isKeyExists) {
   CarbonTableCacheModel carbonTableCacheModel = 
carbonCache.get().get(schemaTableName);
-  if (carbonTableCacheModel != null
-  && carbonTableCacheModel.carbonTable.getTableInfo() != null) {
+  if (carbonTableCacheModel != null && 
carbonTableCacheModel.carbonTable.getTableInfo() != null
+  && carbonTableCacheModel.carbonTable.isTransactionalTable()) {
 Long latestTime = FileFactory.getCarbonFile(CarbonTablePath
 
.getSchemaFilePath(carbonCache.get().get(schemaTableName).carbonTable.getTablePath()))
 .getLastModifiedTime();

http://git-wip-us.apache.org/repos/asf/carbondata/blob/10bc5c2e/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala
--
diff --git 
a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala
 
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala
index a166789..1c211e3 100644
--- 
a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala
+++ 
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala
@@ -34,7 +34,7 @@ import org.apache.avro.generic.{GenericDatumReader, 
GenericDatumWriter, GenericR
 import org.apache.avro.io.{DecoderFactory, Encoder}
 import org.apache.commons.io.FileUtils
 import org.apache.spark.sql.test.util.QueryTest
-import org.apache.spark.sql.{CarbonEnv, Row}
+import org.apache.spark.sql.{AnalysisException, CarbonEnv, Row}
 import org.junit.Assert
 import org.scalatest.BeforeAndAfterAll
 
@@ -119,6 +119,13 @@ class TestNonTransactionalCarbonTable extends QueryTest 
with BeforeAndAfterAll {
 buildTestData(rows, options, List("name"))
   }
 
+  def buildTestDataWithOptionsAndEmptySortColumn(rows: Int,
+  op

carbondata git commit: [CARBONDATA-3179] Map Data Load Failure and Struct Projection Pushdown Issue

2018-12-20 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master 34923db0e -> 96b2ea364


[CARBONDATA-3179] Map Data Load Failure and Struct Projection Pushdown Issue

Problem1: Data load failing for insert into select from the same table containing a
Map datatype.
Solution: Map type was not handled for this scenario. Handled it now.

Problem2: Projection pushdown not supported for a table containing a Struct of Map.
Solution: Pass only the parent column for projection pushdown if the table contains
MapType.

This closes #2993
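
Problem2's fix can be sketched as: when the projected column path goes through a map
type, push down only the top-level parent column instead of the child path. The
string-based model below is illustrative, not the CarbonDatasourceHadoopRelation code.

// Sketch: a projection like "structcol.mapfield.value" is collapsed to the
// parent "structcol" when any segment of the path is a map type, since
// child-level pushdown is not supported for maps here.
public class MapProjectionSketch {
  static String pushdownColumn(String projectedPath, boolean pathContainsMapType) {
    if (pathContainsMapType) {
      int firstDot = projectedPath.indexOf('.');
      return firstDot < 0 ? projectedPath : projectedPath.substring(0, firstDot);
    }
    return projectedPath;
  }
}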


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/96b2ea36
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/96b2ea36
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/96b2ea36

Branch: refs/heads/master
Commit: 96b2ea3646a2a768133880bb2e4c1318d366b482
Parents: 34923db
Author: manishnalla1994 
Authored: Fri Dec 14 17:20:15 2018 +0530
Committer: kumarvishal09 
Committed: Thu Dec 20 22:33:58 2018 +0530

--
 .../TestCreateDDLForComplexMapType.scala| 71 +++-
 .../spark/rdd/CarbonGlobalDictionaryRDD.scala   |  6 +-
 .../spark/rdd/NewCarbonDataLoadRDD.scala| 12 ++--
 .../carbondata/spark/util/CarbonScalaUtil.scala | 25 ---
 .../spark/rdd/CarbonDataRDDFactory.scala|  5 +-
 .../sql/CarbonDatasourceHadoopRelation.scala| 37 ++
 .../streaming/parser/FieldConverter.scala   | 44 ++--
 .../streaming/parser/RowStreamParserImp.scala   | 16 +++--
 8 files changed, 150 insertions(+), 66 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/96b2ea36/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateDDLForComplexMapType.scala
--
diff --git 
a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateDDLForComplexMapType.scala
 
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateDDLForComplexMapType.scala
index 09f23e5..9006b61 100644
--- 
a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateDDLForComplexMapType.scala
+++ 
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateDDLForComplexMapType.scala
@@ -27,7 +27,6 @@ import org.apache.spark.sql.test.util.QueryTest
 import org.scalatest.BeforeAndAfterAll
 
 import org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk
-
 import scala.collection.JavaConversions._
 
 class TestCreateDDLForComplexMapType extends QueryTest with BeforeAndAfterAll {
@@ -471,4 +470,74 @@ class TestCreateDDLForComplexMapType extends QueryTest 
with BeforeAndAfterAll {
 "sort_columns is unsupported for map datatype column: mapfield"))
   }
 
+  test("Data Load Fail Issue") {
+sql("DROP TABLE IF EXISTS carbon")
+sql(
+  s"""
+ | CREATE TABLE carbon(
+ | mapField map
+ | )
+ | STORED BY 'carbondata'
+ | """
+.stripMargin)
+sql(
+  s"""
+ | LOAD DATA LOCAL INPATH '$path'
+ | INTO TABLE carbon OPTIONS(
+ | 'header' = 'false')
+   """.stripMargin)
+sql("INSERT INTO carbon SELECT * FROM carbon")
+checkAnswer(sql("select * from carbon"), Seq(
+  Row(Map(1 -> "Nalla", 2 -> "Singh", 4 -> "Kumar")),
+  Row(Map(1 -> "Nalla", 2 -> "Singh", 4 -> "Kumar")),
+  Row(Map(10 -> "Nallaa", 20 -> "Sissngh", 100 -> "Gusspta", 40 -> 
"Kumar")),
+  Row(Map(10 -> "Nallaa", 20 -> "Sissngh", 100 -> "Gusspta", 40 -> 
"Kumar"))
+  ))
+  }
+
+  test("Struct inside map") {
+sql("DROP TABLE IF EXISTS carbon")
+sql(
+  s"""
+ | CREATE TABLE carbon(
+ | mapField map>
+ | )
+ | STORED BY 'carbondata'
+ | """
+.stripMargin)
+sql("INSERT INTO carbon values('1\002man\003nan\0012\002kands\003dsnknd')")
+sql("INSERT INTO carbon SELECT * FROM carbon")
+checkAnswer(sql("SELECT * FROM carbon limit 1"),
+  Seq(Row(Map(1 -> Row("man", "nan"), (2 -> Row("kands", "dsnknd"))
+  }
+
+  test("Struct inside map pushdown") {
+sql("DROP TABLE IF EXISTS carbon")
+sql(
+  s"""
+  

carbondata git commit: [CARBONDATA-3187] Supported Global Dictionary For Map

2018-12-20 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master 96ce00758 -> 5f0549a81


[CARBONDATA-3187] Supported Global Dictionary For Map

Problem: Global Dictionary was not working for the Map datatype and was giving null
values.

Solution: Added the case for a Global Dictionary to be created when the datatype is a
complex Map.

This closes #3006


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/5f0549a8
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/5f0549a8
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/5f0549a8

Branch: refs/heads/master
Commit: 5f0549a81e2f232e927ed824db4a6791a633c95f
Parents: 96ce007
Author: manishnalla1994 
Authored: Thu Dec 20 11:23:46 2018 +0530
Committer: kumarvishal09 
Committed: Thu Dec 20 16:53:08 2018 +0530

--
 .../createTable/TestCreateDDLForComplexMapType.scala  | 10 +-
 .../carbondata/spark/util/GlobalDictionaryUtil.scala  |  2 +-
 2 files changed, 6 insertions(+), 6 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/5f0549a8/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateDDLForComplexMapType.scala
--
diff --git 
a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateDDLForComplexMapType.scala
 
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateDDLForComplexMapType.scala
index b8f7549..09f23e5 100644
--- 
a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateDDLForComplexMapType.scala
+++ 
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateDDLForComplexMapType.scala
@@ -226,22 +226,22 @@ class TestCreateDDLForComplexMapType extends QueryTest 
with BeforeAndAfterAll {
   Row(Map(1 -> "Nalla", 2 -> "", 3 -> "Gupta", 4 -> "Kumar"
   }
 
-  // Support this for Map type
+  // Global Dictionary for Map type
   test("Test Load data in map with dictionary include") {
 sql("DROP TABLE IF EXISTS carbon")
 sql(
   s"""
  | CREATE TABLE carbon(
- | mapField map
+ | mapField map
  | )
  | STORED BY 'carbondata'
  | TBLPROPERTIES('DICTIONARY_INCLUDE'='mapField')
  | """
 .stripMargin)
-sql("insert into carbon values('1\002Nalla\0012\002Singh\0013\002Gupta')")
+sql("insert into carbon 
values('vi\002Nalla\001sh\002Singh\001al\002Gupta')")
 sql("select * from carbon").show(false)
-//checkAnswer(sql("select * from carbon"), Seq(
-//Row(Map(1 -> "Nalla", 2 -> "Singh", 3 -> "Gupta", 4 -> "Kumar"
+checkAnswer(sql("select * from carbon"), Seq(
+  Row(Map("vi" -> "Nalla", "sh" -> "Singh", "al" -> "Gupta"
   }
 
   test("Test Load data in map with partition columns") {

http://git-wip-us.apache.org/repos/asf/carbondata/blob/5f0549a8/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala
--
diff --git 
a/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala
 
b/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala
index 704382f..922eadb 100644
--- 
a/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala
+++ 
b/integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/GlobalDictionaryUtil.scala
@@ -182,7 +182,7 @@ object GlobalDictionaryUtil {
   case None =>
 None
   case Some(dim) =>
-if (DataTypes.isArrayType(dim.getDataType)) {
+if (DataTypes.isArrayType(dim.getDataType) || 
DataTypes.isMapType(dim.getDataType)) {
   val arrDim = ArrayParser(dim, format)
   generateParserForChildrenDimension(dim, format, 
mapColumnValuesWithId, arrDim)
   Some(arrDim)



carbondata git commit: [CARBONDATA-3005]Support Gzip as column compressor

2018-12-11 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master c7d2acb89 -> fd0885b03


[CARBONDATA-3005]Support Gzip as column compressor

This PR adds a new compressor "Gzip" and enhances the compression capabilities offered
by CarbonData.
Users can now use gzip as the compressor for loading data.
Gzip can be set at the system properties level or for a particular table.

This closes #2847
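
A minimal, self-contained sketch of what a gzip byte-array compressor looks like,
using only java.util.zip; the actual GzipCompressor added by this PR implements
CarbonData's Compressor interface and provides more overloads.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Sketch: round-trip compression of a byte[] with gzip streams.
public class GzipSketch {
  static byte[] compress(byte[] input) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPOutputStream gzip = new GZIPOutputStream(bos)) {
      gzip.write(input);
    }
    return bos.toByteArray();
  }

  static byte[] decompress(byte[] compressed) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
      byte[] buffer = new byte[4096];
      int read;
      while ((read = gzip.read(buffer)) != -1) {
        bos.write(buffer, 0, read);
      }
    }
    return bos.toByteArray();
  }
}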


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/fd0885b0
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/fd0885b0
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/fd0885b0

Branch: refs/heads/master
Commit: fd0885b03c5e24c7f78851a9fdc80a0cea0e5980
Parents: c7d2acb
Author: shardul-cr7 
Authored: Tue Oct 23 17:27:47 2018 +0530
Committer: kumarvishal09 
Committed: Tue Dec 11 14:55:41 2018 +0530

--
 .../compression/AbstractCompressor.java |   1 +
 .../compression/CompressorFactory.java  |   3 +-
 .../datastore/compression/GzipCompressor.java   | 134 +++
 .../datastore/compression/ZstdCompressor.java   |   5 -
 .../dataload/TestLoadDataWithCompression.scala  |  94 ++---
 .../TestLoadWithSortTempCompressed.scala|  20 +++
 6 files changed, 236 insertions(+), 21 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/fd0885b0/core/src/main/java/org/apache/carbondata/core/datastore/compression/AbstractCompressor.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/datastore/compression/AbstractCompressor.java
 
b/core/src/main/java/org/apache/carbondata/core/datastore/compression/AbstractCompressor.java
index 0724bdc..c554dc6 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/datastore/compression/AbstractCompressor.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/datastore/compression/AbstractCompressor.java
@@ -123,4 +123,5 @@ public abstract class AbstractCompressor implements 
Compressor {
 return false;
   }
 
+  @Override public boolean supportUnsafe() { return false; }
 }

http://git-wip-us.apache.org/repos/asf/carbondata/blob/fd0885b0/core/src/main/java/org/apache/carbondata/core/datastore/compression/CompressorFactory.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/datastore/compression/CompressorFactory.java
 
b/core/src/main/java/org/apache/carbondata/core/datastore/compression/CompressorFactory.java
index f7d4e06..b7779ba 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/datastore/compression/CompressorFactory.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/datastore/compression/CompressorFactory.java
@@ -36,7 +36,8 @@ public class CompressorFactory {
 
   public enum NativeSupportedCompressor {
 SNAPPY("snappy", SnappyCompressor.class),
-ZSTD("zstd", ZstdCompressor.class);
+ZSTD("zstd", ZstdCompressor.class),
+GZIP("gzip", GzipCompressor.class);
 
 private String name;
 private Class compressorClass;

http://git-wip-us.apache.org/repos/asf/carbondata/blob/fd0885b0/core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java
 
b/core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java
new file mode 100644
index 000..b386913
--- /dev/null
+++ 
b/core/src/main/java/org/apache/carbondata/core/datastore/compression/GzipCompressor.java
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.core.datastore.compression;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+
+import org.ap

carbondata git commit: [CARBONDATA-3145] Avoid duplicate decoding for complex column pages while querying

2018-12-10 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master 4c9f08217 -> 0c94559e2


[CARBONDATA-3145] Avoid duplicate decoding for complex column pages while 
querying

Problem:
The column page is decoded again for every row fetched from a complex primitive
column.

Solution:
Decode a page only once and then reuse it.

This closes #2975
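
The fix can be pictured as a small per-column cache of decoded pages keyed by page
number, so a raw chunk is decoded lazily at most once. Types here are hypothetical;
the real code keeps DimensionColumnPage[][] in BlockletScannedResult.

// Sketch: decode each page lazily and remember the result, instead of decoding
// the raw column chunk again for every row that is read from it.
public class DecodedPageCacheSketch<P> {
  public interface PageDecoder<T> { T decode(int pageNumber); }

  private final P[] decodedPages;
  private final PageDecoder<P> decoder;

  @SuppressWarnings("unchecked")
  public DecodedPageCacheSketch(int pageCount, PageDecoder<P> decoder) {
    this.decodedPages = (P[]) new Object[pageCount];
    this.decoder = decoder;
  }

  public P getPage(int pageNumber) {
    if (decodedPages[pageNumber] == null) {
      decodedPages[pageNumber] = decoder.decode(pageNumber);
    }
    return decodedPages[pageNumber];
  }
}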


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/0c94559e
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/0c94559e
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/0c94559e

Branch: refs/heads/master
Commit: 0c94559e2feaf3d5a001665c3da2bfc3bf941043
Parents: 4c9f082
Author: dhatchayani 
Authored: Wed Dec 5 12:40:56 2018 +0530
Committer: kumarvishal09 
Committed: Mon Dec 10 19:31:12 2018 +0530

--
 .../core/scan/complextypes/ArrayQueryType.java  | 11 ++--
 .../scan/complextypes/ComplexQueryType.java | 14 +++-
 .../scan/complextypes/PrimitiveQueryType.java   | 11 ++--
 .../core/scan/complextypes/StructQueryType.java | 14 ++--
 .../core/scan/filter/GenericQueryType.java  |  4 +-
 .../executer/RowLevelFilterExecuterImpl.java|  7 +-
 .../core/scan/result/BlockletScannedResult.java | 68 +---
 7 files changed, 86 insertions(+), 43 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/0c94559e/core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java
 
b/core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java
index a5f4234..8538edb 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/scan/complextypes/ArrayQueryType.java
@@ -22,6 +22,7 @@ import java.io.IOException;
 import java.nio.ByteBuffer;
 import java.util.Map;
 
+import org.apache.carbondata.core.datastore.chunk.DimensionColumnPage;
 import org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk;
 import org.apache.carbondata.core.metadata.schema.table.column.CarbonDimension;
 import org.apache.carbondata.core.scan.filter.GenericQueryType;
@@ -62,17 +63,17 @@ public class ArrayQueryType extends ComplexQueryType 
implements GenericQueryType
   }
 
   public void 
parseBlocksAndReturnComplexColumnByteArray(DimensionRawColumnChunk[] 
rawColumnChunks,
-  int rowNumber, int pageNumber, DataOutputStream dataOutputStream) throws 
IOException {
-byte[] input = copyBlockDataChunk(rawColumnChunks, rowNumber, pageNumber);
+  DimensionColumnPage[][] dimensionColumnPages, int rowNumber, int 
pageNumber,
+  DataOutputStream dataOutputStream) throws IOException {
+byte[] input = copyBlockDataChunk(rawColumnChunks, dimensionColumnPages, 
rowNumber, pageNumber);
 ByteBuffer byteArray = ByteBuffer.wrap(input);
 int dataLength = byteArray.getInt();
 dataOutputStream.writeInt(dataLength);
 if (dataLength > 0) {
   int dataOffset = byteArray.getInt();
   for (int i = 0; i < dataLength; i++) {
-children
-.parseBlocksAndReturnComplexColumnByteArray(rawColumnChunks, 
dataOffset++, pageNumber,
-dataOutputStream);
+children.parseBlocksAndReturnComplexColumnByteArray(rawColumnChunks, 
dimensionColumnPages,
+dataOffset++, pageNumber, dataOutputStream);
   }
 }
   }

http://git-wip-us.apache.org/repos/asf/carbondata/blob/0c94559e/core/src/main/java/org/apache/carbondata/core/scan/complextypes/ComplexQueryType.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/scan/complextypes/ComplexQueryType.java
 
b/core/src/main/java/org/apache/carbondata/core/scan/complextypes/ComplexQueryType.java
index 98f0715..704af89 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/scan/complextypes/ComplexQueryType.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/scan/complextypes/ComplexQueryType.java
@@ -19,6 +19,7 @@ package org.apache.carbondata.core.scan.complextypes;
 
 import java.io.IOException;
 
+import org.apache.carbondata.core.datastore.chunk.DimensionColumnPage;
 import org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk;
 import org.apache.carbondata.core.scan.processor.RawBlockletColumnChunks;
 
@@ -40,9 +41,10 @@ public class ComplexQueryType {
* This method is also used by child.
*/
   protected byte[] copyBlockDataChunk(DimensionRawColumnChunk[] 
rawColumnChunks,
-  int rowNumber, int pageNumber) {
+  DimensionColumnPage[][] dimensionColumnPages, int rowNumber, int 
pageNumber) {
 by

carbondata git commit: [CARBONDATA-3143] Fixed local dictionary in presto

2018-12-10 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master d9f1a8115 -> 4c9f08217


[CARBONDATA-3143] Fixed local dictionary in presto

Problem:
Currently, local dictionary columns are not working for presto, as they are not
handled in the integration layer.

Solution:
Add local dictionary support to the presto integration layer.

This closes #2972
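
The core of the SliceStreamReader change, building one flat byte array plus an offsets
array from the local dictionary values, can be sketched in isolation as below (null
entries keep a zero-length slot); the real code then wraps the result in presto's
VariableWidthBlock.

// Sketch: flatten a list of dictionary byte[] values into a single byte array
// with an offsets array, the layout a variable-width dictionary block expects.
public class DictionaryFlattenSketch {
  static class Flattened {
    byte[] values;
    int[] offsets; // offsets.length == entryCount + 1
  }

  static Flattened flatten(byte[][] dictionaryValues) {
    int[] offsets = new int[dictionaryValues.length + 1];
    int size = 0;
    for (int i = 0; i < dictionaryValues.length; i++) {
      offsets[i] = size;
      if (dictionaryValues[i] != null) {
        size += dictionaryValues[i].length;
      }
    }
    offsets[dictionaryValues.length] = size;
    byte[] values = new byte[size];
    for (int i = 0; i < dictionaryValues.length; i++) {
      if (dictionaryValues[i] != null) {
        System.arraycopy(dictionaryValues[i], 0, values, offsets[i], dictionaryValues[i].length);
      }
    }
    Flattened result = new Flattened();
    result.values = values;
    result.offsets = offsets;
    return result;
  }
}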


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/4c9f0821
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/4c9f0821
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/4c9f0821

Branch: refs/heads/master
Commit: 4c9f08217c7b9fa7ad33e148dbf33280e0f2b33f
Parents: d9f1a81
Author: ravipesala 
Authored: Mon Dec 3 18:27:33 2018 +0530
Committer: kumarvishal09 
Committed: Mon Dec 10 19:18:32 2018 +0530

--
 .../presto/CarbonColumnVectorWrapper.java   |   2 +-
 .../presto/readers/SliceStreamReader.java   |  35 +++
 .../PrestoAllDataTypeLocalDictTest.scala| 291 +++
 .../integrationtest/PrestoAllDataTypeTest.scala |   2 +-
 .../carbondata/presto/server/PrestoServer.scala |   4 +-
 .../presto/util/CarbonDataStoreCreator.scala|  18 +-
 6 files changed, 342 insertions(+), 10 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/4c9f0821/integration/presto/src/main/java/org/apache/carbondata/presto/CarbonColumnVectorWrapper.java
--
diff --git 
a/integration/presto/src/main/java/org/apache/carbondata/presto/CarbonColumnVectorWrapper.java
 
b/integration/presto/src/main/java/org/apache/carbondata/presto/CarbonColumnVectorWrapper.java
index a80751f..f001488 100644
--- 
a/integration/presto/src/main/java/org/apache/carbondata/presto/CarbonColumnVectorWrapper.java
+++ 
b/integration/presto/src/main/java/org/apache/carbondata/presto/CarbonColumnVectorWrapper.java
@@ -244,7 +244,7 @@ public class CarbonColumnVectorWrapper implements 
CarbonColumnVector {
   }
 
   @Override public CarbonColumnVector getDictionaryVector() {
-return this.columnVector;
+return this.columnVector.getDictionaryVector();
   }
 
   @Override public void putFloats(int rowId, int count, float[] src, int 
srcIndex) {

http://git-wip-us.apache.org/repos/asf/carbondata/blob/4c9f0821/integration/presto/src/main/java/org/apache/carbondata/presto/readers/SliceStreamReader.java
--
diff --git 
a/integration/presto/src/main/java/org/apache/carbondata/presto/readers/SliceStreamReader.java
 
b/integration/presto/src/main/java/org/apache/carbondata/presto/readers/SliceStreamReader.java
index ab270fc..04e5bb3 100644
--- 
a/integration/presto/src/main/java/org/apache/carbondata/presto/readers/SliceStreamReader.java
+++ 
b/integration/presto/src/main/java/org/apache/carbondata/presto/readers/SliceStreamReader.java
@@ -17,14 +17,19 @@
 
 package org.apache.carbondata.presto.readers;
 
+import java.util.Optional;
+
 import org.apache.carbondata.core.metadata.datatype.DataType;
+import org.apache.carbondata.core.scan.result.vector.CarbonDictionary;
 import 
org.apache.carbondata.core.scan.result.vector.impl.CarbonColumnVectorImpl;
 
 import com.facebook.presto.spi.block.Block;
 import com.facebook.presto.spi.block.BlockBuilder;
 import com.facebook.presto.spi.block.DictionaryBlock;
+import com.facebook.presto.spi.block.VariableWidthBlock;
 import com.facebook.presto.spi.type.Type;
 import com.facebook.presto.spi.type.VarcharType;
+import io.airlift.slice.Slices;
 
 import static io.airlift.slice.Slices.wrappedBuffer;
 
@@ -63,6 +68,36 @@ public class SliceStreamReader extends 
CarbonColumnVectorImpl implements PrestoV
 }
   }
 
+  @Override public void setDictionary(CarbonDictionary dictionary) {
+super.setDictionary(dictionary);
+if (dictionary == null) {
+  dictionaryBlock = null;
+  return;
+}
+boolean[] nulls = new boolean[dictionary.getDictionarySize()];
+nulls[0] = true;
+nulls[1] = true;
+int[] dictOffsets = new int[dictionary.getDictionarySize() + 1];
+int size = 0;
+for (int i = 0; i < dictionary.getDictionarySize(); i++) {
+  if (dictionary.getDictionaryValue(i) != null) {
+dictOffsets[i] = size;
+size += dictionary.getDictionaryValue(i).length;
+  }
+}
+byte[] singleArrayDictValues = new byte[size];
+for (int i = 0; i < dictionary.getDictionarySize(); i++) {
+  if (dictionary.getDictionaryValue(i) != null) {
+System.arraycopy(dictionary.getDictionaryValue(i), 0, 
singleArrayDictValues, dictOffsets[i],
+dictionary.getDictionaryValue(i).length);
+  }
+}
+dictOffsets[dictOffsets.length - 1] = size;
+dictionaryBlock = new VariableWidthBlock(dictionary.getDicti

carbondata git commit: [CARBONDATA-3138] Fix random count mismatch with multi-thread block pruning

2018-11-29 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master 1bbae2657 -> 0bcd8677a


[CARBONDATA-3138] Fix random count mismatch with multi-thread block pruning

problem: Random count mismatch in queries in the multi-thread block-pruning scenario.

cause: The existing prune method was not meant for multi-threading, as synchronization
was missing. Only in the implicit filter scenario, while preparing the block ID list,
synchronization was missing; hence pruning was giving a wrong result.

solution: Synchronize the implicit filter preparation, as prune is now called from
multiple threads.

This closes #2962
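
The race described above can be reproduced and fixed in miniature: when several
pruning threads append block IDs to one shared list, the accumulation must be
synchronized (or use a thread-safe collection). A self-contained sketch, not the
actual ColumnFilterInfo code:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch: several pruning threads add block IDs to one shared list; wrapping
// the list with Collections.synchronizedList avoids lost or corrupted entries.
public class MultiThreadPruningSketch {
  public static void main(String[] args) throws InterruptedException {
    List<String> matchedBlockIds = Collections.synchronizedList(new ArrayList<>());
    ExecutorService pool = Executors.newFixedThreadPool(4);
    for (int thread = 0; thread < 4; thread++) {
      final int id = thread;
      pool.submit(() -> {
        for (int block = 0; block < 1000; block++) {
          matchedBlockIds.add("segment-" + id + "/block-" + block);
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.MINUTES);
    System.out.println("Pruned blocks collected: " + matchedBlockIds.size()); // 4000
  }
}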


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/0bcd8677
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/0bcd8677
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/0bcd8677

Branch: refs/heads/master
Commit: 0bcd8677a88eab90942ebadf57a31fac1de7f75a
Parents: 1bbae26
Author: ajantha-bhat 
Authored: Wed Nov 28 19:18:16 2018 +0530
Committer: kumarvishal09 
Committed: Thu Nov 29 17:51:12 2018 +0530

--
 .../carbondata/core/datamap/TableDataMap.java| 13 +++--
 .../core/scan/filter/ColumnFilterInfo.java   | 19 +--
 2 files changed, 24 insertions(+), 8 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/0bcd8677/core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java 
b/core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java
index e1b2c13..06d2cab 100644
--- a/core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java
+++ b/core/src/main/java/org/apache/carbondata/core/datamap/TableDataMap.java
@@ -145,6 +145,7 @@ public final class TableDataMap extends 
OperationEventListener {
 // for filter queries
 int totalFiles = 0;
 int datamapsCount = 0;
+int filesCountPerDatamap;
 boolean isBlockDataMapType = true;
 for (Segment segment : segments) {
   for (DataMap dataMap : dataMaps.get(segment)) {
@@ -152,7 +153,9 @@ public final class TableDataMap extends 
OperationEventListener {
   isBlockDataMapType = false;
   break;
 }
-totalFiles += ((BlockDataMap) dataMap).getTotalBlocks();
+filesCountPerDatamap = ((BlockDataMap) dataMap).getTotalBlocks();
+// old legacy store can give 0, so consider one datamap as 1 record.
+totalFiles += (filesCountPerDatamap == 0) ? 1 : filesCountPerDatamap;
 datamapsCount++;
   }
   if (!isBlockDataMapType) {
@@ -206,10 +209,14 @@ public final class TableDataMap extends 
OperationEventListener {
   List blocklets, final Map> 
dataMaps,
   int totalFiles) {
 int numOfThreadsForPruning = getNumOfThreadsForPruning();
+LOG.info(
+"Number of threads selected for multi-thread block pruning is " + 
numOfThreadsForPruning
++ ". total files: " + totalFiles + ". total segments: " + 
segments.size());
 int filesPerEachThread = totalFiles / numOfThreadsForPruning;
 int prev;
 int filesCount = 0;
 int processedFileCount = 0;
+int filesCountPerDatamap;
 List> segmentList = new 
ArrayList<>(numOfThreadsForPruning);
 List segmentDataMapGroupList = new ArrayList<>();
 for (Segment segment : segments) {
@@ -217,7 +224,9 @@ public final class TableDataMap extends 
OperationEventListener {
   prev = 0;
   for (int i = 0; i < eachSegmentDataMapList.size(); i++) {
 DataMap dataMap = eachSegmentDataMapList.get(i);
-filesCount += ((BlockDataMap) dataMap).getTotalBlocks();
+filesCountPerDatamap = ((BlockDataMap) dataMap).getTotalBlocks();
+// old legacy store can give 0, so consider one datamap as 1 record.
+filesCount += (filesCountPerDatamap == 0) ? 1 : filesCountPerDatamap;
 if (filesCount >= filesPerEachThread) {
   if (segmentList.size() != numOfThreadsForPruning - 1) {
 // not the last segmentList

http://git-wip-us.apache.org/repos/asf/carbondata/blob/0bcd8677/core/src/main/java/org/apache/carbondata/core/scan/filter/ColumnFilterInfo.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/scan/filter/ColumnFilterInfo.java
 
b/core/src/main/java/org/apache/carbondata/core/scan/filter/ColumnFilterInfo.java
index 75ec35e..8677a2d 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/scan/filter/ColumnFilterInfo.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/scan/filter/ColumnFilterInfo.java
@@ -107,19 +107,26 @@ public class

carbondata git commit: [DOCUMENT] Added filter push handling parameter in documents.

2018-11-28 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master eeeaf50f1 -> c5bfe4acf


[DOCUMENT] Added filter push handling parameter in documents.

Added filter push handling parameter in documents

This closes #2957
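
For reference, a property like the one documented in the diff below
(carbon.push.rowfilters.for.vector) is typically set through CarbonProperties before
running queries. A small hedged example; the property name comes from this commit's
documentation change, the rest is generic usage:

import org.apache.carbondata.core.util.CarbonProperties;

// Sketch: enable complete row-level filter handling in carbon for vector reads,
// as described by the documentation change in this commit.
public class PushRowFiltersExample {
  public static void main(String[] args) {
    CarbonProperties.getInstance()
        .addProperty("carbon.push.rowfilters.for.vector", "true");
  }
}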


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/c5bfe4ac
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/c5bfe4ac
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/c5bfe4ac

Branch: refs/heads/master
Commit: c5bfe4acfffe33679a95d22f67f0859da583adb1
Parents: eeeaf50
Author: ravipesala 
Authored: Tue Nov 27 15:16:57 2018 +0530
Committer: kumarvishal09 
Committed: Wed Nov 28 15:51:13 2018 +0530

--
 docs/configuration-parameters.md | 1 +
 1 file changed, 1 insertion(+)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/c5bfe4ac/docs/configuration-parameters.md
--
diff --git a/docs/configuration-parameters.md b/docs/configuration-parameters.md
index c82d5d7..a41a3d5 100644
--- a/docs/configuration-parameters.md
+++ b/docs/configuration-parameters.md
@@ -138,6 +138,7 @@ This section provides the details of all the configurations 
required for the Car
 | carbon.query.validate.direct.query.on.datamap | true | CarbonData supports 
creating pre-aggregate table datamaps as an independent tables. For some 
debugging purposes, it might be required to directly query from such datamap 
tables. This configuration allows to query on such datamaps. |
 | carbon.max.driver.threads.for.block.pruning | 4 | Number of threads used for 
driver pruning when the carbon files are more than 100k Maximum memory. This 
configuration can used to set number of threads between 1 to 4. |
 | carbon.heap.memory.pooling.threshold.bytes | 1048576 | CarbonData supports 
unsafe operations of Java to avoid GC overhead for certain operations. Using 
unsafe, memory can be allocated on Java Heap or off heap. This configuration 
controls the allocation mechanism on Java HEAP. If the heap memory allocations 
of the given size is greater or equal than this value,it should go through the 
pooling mechanism. But if set this size to -1, it should not go through the 
pooling mechanism. Default value is 1048576(1MB, the same as Spark). Value to 
be specified in bytes. |
+| carbon.push.rowfilters.for.vector | false | When enabled complete row 
filters will be handled by carbon in case of vector. If it is disabled then 
only page level pruning will be done by carbon and row level filtering will be 
done by spark for vector. And also there are scan optimizations in carbon to 
avoid multiple data copies when this parameter is set to false. There is no 
change in flow for non-vector based queries. |
 
 ## Data Mutation Configuration
 | Parameter | Default Value | Description |



carbondata git commit: [CARBONDATA-2896] Added TestCases for Adaptive encoding

2018-11-22 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master 50ecb83a2 -> 0b83a8183


[CARBONDATA-2896] Added TestCases for Adaptive encoding

Test cases added for Adaptive encoding for primitive types.

This closes #2849


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/0b83a818
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/0b83a818
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/0b83a818

Branch: refs/heads/master
Commit: 0b83a8183f70973960ab7ea25b68f27fb3e7247f
Parents: 50ecb83
Author: dhatchayani 
Authored: Mon Oct 22 12:28:13 2018 +0530
Committer: kumarvishal09 
Committed: Thu Nov 22 16:35:31 2018 +0530

--
 .../test/resources/dataWithNegativeValues.csv   |   7 +
 .../TestAdaptiveEncodingForPrimitiveTypes.scala | 430 +++
 2 files changed, 437 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/0b83a818/integration/spark-common-test/src/test/resources/dataWithNegativeValues.csv
--
diff --git 
a/integration/spark-common-test/src/test/resources/dataWithNegativeValues.csv 
b/integration/spark-common-test/src/test/resources/dataWithNegativeValues.csv
new file mode 100644
index 000..9e369ca
--- /dev/null
+++ 
b/integration/spark-common-test/src/test/resources/dataWithNegativeValues.csv
@@ -0,0 +1,7 @@
+-3,aaa,-300
+0,ddd,0
+-2,bbb,-200
+7,ggg,700
+1,eee,100
+-1,ccc,-100
+null,null,null
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/carbondata/blob/0b83a818/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/primitiveTypes/TestAdaptiveEncodingForPrimitiveTypes.scala
--
diff --git 
a/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/primitiveTypes/TestAdaptiveEncodingForPrimitiveTypes.scala
 
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/primitiveTypes/TestAdaptiveEncodingForPrimitiveTypes.scala
new file mode 100644
index 000..944de37
--- /dev/null
+++ 
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/primitiveTypes/TestAdaptiveEncodingForPrimitiveTypes.scala
@@ -0,0 +1,430 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.integration.spark.testsuite.primitiveTypes
+
+import java.io.File
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+
+class TestAdaptiveEncodingForPrimitiveTypes extends QueryTest with 
BeforeAndAfterAll {
+
+  val rootPath = new File(this.getClass.getResource("/").getPath
+  + "../../../..").getCanonicalPath
+
+  private val vectorReader = CarbonProperties.getInstance()
+.getProperty(CarbonCommonConstants.ENABLE_VECTOR_READER)
+
+  private val unsafeColumnPage = CarbonProperties.getInstance()
+.getProperty(CarbonCommonConstants.ENABLE_UNSAFE_COLUMN_PAGE)
+
+  private val unsafeQueryExecution = CarbonProperties.getInstance()
+.getProperty(CarbonCommonConstants.ENABLE_UNSAFE_IN_QUERY_EXECUTION)
+
+  private val unsafeSort = CarbonProperties.getInstance()
+.getProperty(CarbonCommonConstants.ENABLE_UNSAFE_SORT)
+
+  private val compactionThreshold = CarbonProperties.getInstance()
+.getProperty(CarbonCommonConstants.COMPACTION_SEGMENT_LEVEL_THRESHOLD)
+
+  CarbonProperties.getInstance()
+.addProperty(CarbonCommonConstants.COMPACTION_SEGMENT_LEVEL_THRESHOLD, 
"2,2")
+
+  override def beforeAll: Unit = {
+dropTables
+sql(
+  "CREATE TABLE uniqdata_Compare (CUST_ID int,CUST_NAME 
String,ACTIVE_EMUI_VERSION string, DOB " +
+  "ti

carbondata git commit: [CARBONDATA-3114]Remove Null Values for a Dictionary_Include Timestamp column for Range Filters

2018-11-22 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master 697eee3de -> 50ecb83a2


[CARBONDATA-3114]Remove Null Values for a Dictionary_Include Timestamp column 
for Range Filters

Problem:
Null values are not removed in case of range filters, if the column is a dictionary
and no_inverted_index timestamp column.
Solution:
Remove null values in case of range filters for such dictionary and no_inverted_index
timestamp columns.

This closes #2937
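
As a rough illustration of the idea (simplified, hypothetical types and names
only; not CarbonData's FilterUtil or filter executor API), a direct dictionary
reserves a surrogate key for null, and a range filter should clear rows
carrying that key from its result:

import java.util.BitSet;

public class RangeNullPruneSketch {
  // assumed reserved surrogate for null values in a direct dictionary column
  static final int NULL_SURROGATE_KEY = 1;

  // keep only rows whose surrogate lies in [lower, upper] and is not the null key
  static BitSet applyRange(int[] surrogates, int lower, int upper) {
    BitSet hits = new BitSet(surrogates.length);
    for (int row = 0; row < surrogates.length; row++) {
      int key = surrogates[row];
      if (key != NULL_SURROGATE_KEY && key >= lower && key <= upper) {
        hits.set(row);
      }
    }
    return hits;
  }

  public static void main(String[] args) {
    int[] surrogates = {5, 1, 7, 9, 1};                 // rows 1 and 4 are null
    System.out.println(applyRange(surrogates, 4, 8));   // prints {0, 2}
  }
}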


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/50ecb83a
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/50ecb83a
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/50ecb83a

Branch: refs/heads/master
Commit: 50ecb83a264ab6512ebade0580e3288295452966
Parents: 697eee3
Author: Indhumathi27 
Authored: Wed Nov 21 15:21:49 2018 +0530
Committer: kumarvishal09 
Committed: Thu Nov 22 16:32:15 2018 +0530

--
 .../carbondata/core/scan/filter/FilterUtil.java | 23 +++
 .../executer/RangeValueFilterExecuterImpl.java  | 21 ++
 .../RowLevelRangeGrtThanFiterExecuterImpl.java  |  8 +-
 ...elRangeGrtrThanEquaToFilterExecuterImpl.java |  8 +-
 ...velRangeLessThanEqualFilterExecuterImpl.java | 20 -
 ...RowLevelRangeLessThanFilterExecuterImpl.java | 20 -
 .../src/test/resources/data_timestamp.csv   | 10 +++
 ...estampDataTypeDirectDictionaryTestCase.scala | 30 
 8 files changed, 89 insertions(+), 51 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/50ecb83a/core/src/main/java/org/apache/carbondata/core/scan/filter/FilterUtil.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/scan/filter/FilterUtil.java 
b/core/src/main/java/org/apache/carbondata/core/scan/filter/FilterUtil.java
index 06672f5..286f68f 100644
--- a/core/src/main/java/org/apache/carbondata/core/scan/filter/FilterUtil.java
+++ b/core/src/main/java/org/apache/carbondata/core/scan/filter/FilterUtil.java
@@ -52,6 +52,8 @@ import 
org.apache.carbondata.core.datastore.chunk.DimensionColumnPage;
 import org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk;
 import org.apache.carbondata.core.keygenerator.KeyGenException;
 import org.apache.carbondata.core.keygenerator.KeyGenerator;
+import 
org.apache.carbondata.core.keygenerator.directdictionary.DirectDictionaryGenerator;
+import 
org.apache.carbondata.core.keygenerator.directdictionary.DirectDictionaryKeyGeneratorFactory;
 import org.apache.carbondata.core.keygenerator.factory.KeyGeneratorFactory;
 import 
org.apache.carbondata.core.keygenerator.mdkey.MultiDimKeyVarLengthGenerator;
 import org.apache.carbondata.core.metadata.AbsoluteTableIdentifier;
@@ -2247,4 +2249,25 @@ public final class FilterUtil {
 }
   }
 
+  /**
+   * This method is used to get default null values for a direct dictionary 
column
+   * @param currentBlockDimension
+   * @param segmentProperties
+   * @return
+   */
+  public static byte[] getDefaultNullValue(CarbonDimension 
currentBlockDimension,
+  SegmentProperties segmentProperties) {
+byte[] defaultValue = null;
+DirectDictionaryGenerator directDictionaryGenerator = 
DirectDictionaryKeyGeneratorFactory
+.getDirectDictionaryGenerator(currentBlockDimension.getDataType());
+int key = directDictionaryGenerator.generateDirectSurrogateKey(null);
+if (currentBlockDimension.isSortColumn()) {
+  defaultValue = FilterUtil
+  .getMaskKey(key, currentBlockDimension, 
segmentProperties.getSortColumnsGenerator());
+} else {
+  defaultValue = ByteUtil.toXorBytes(key);
+}
+return defaultValue;
+  }
+
 }

http://git-wip-us.apache.org/repos/asf/carbondata/blob/50ecb83a/core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RangeValueFilterExecuterImpl.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RangeValueFilterExecuterImpl.java
 
b/core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RangeValueFilterExecuterImpl.java
index e84e82d..bcae001 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RangeValueFilterExecuterImpl.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RangeValueFilterExecuterImpl.java
@@ -24,8 +24,6 @@ import 
org.apache.carbondata.core.constants.CarbonCommonConstants;
 import org.apache.carbondata.core.datastore.block.SegmentProperties;
 import org.apache.carbondata.core.datastore.chunk.DimensionColumnPage;
 import org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk;
-imp

carbondata git commit: [CARBONDATA-3115] Fix CodeGen error in preaggregate table and codegen display issue in oldstores

2018-11-22 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master 0fa0a96c4 -> 697eee3de


[CARBONDATA-3115] Fix CodeGen error in preaggregate table and codegen display 
issue in oldstores

Problem:
1. While querying a preaggregate table, a codegen error is displayed.
2. For old stores, the generated code is displayed while executing queries.

This closes #2939


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/697eee3d
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/697eee3d
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/697eee3d

Branch: refs/heads/master
Commit: 697eee3de7eb1147fd75452d10acfe087a0566ba
Parents: 0fa0a96
Author: Indhumathi27 
Authored: Wed Nov 21 17:23:25 2018 +0530
Committer: kumarvishal09 
Committed: Thu Nov 22 15:31:00 2018 +0530

--
 .../preaggregate/TestPreAggCreateCommand.scala  | 23 
 .../spark/sql/CarbonDictionaryDecoder.scala | 12 +-
 2 files changed, 29 insertions(+), 6 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/697eee3d/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/preaggregate/TestPreAggCreateCommand.scala
--
diff --git 
a/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/preaggregate/TestPreAggCreateCommand.scala
 
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/preaggregate/TestPreAggCreateCommand.scala
index 9fbdff7..7851bd1 100644
--- 
a/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/preaggregate/TestPreAggCreateCommand.scala
+++ 
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/preaggregate/TestPreAggCreateCommand.scala
@@ -438,6 +438,29 @@ class TestPreAggCreateCommand extends QueryTest with 
BeforeAndAfterAll {
 }
   }
 
+  test("test codegen issue with preaggregate") {
+sql("DROP TABLE IF EXISTS PreAggMain")
+sql("CREATE TABLE PreAggMain (id Int, date date, country string, phonetype 
string, " +
+"serialname String,salary int ) STORED BY 
'org.apache.carbondata.format' " +
+"tblproperties('dictionary_include'='country')")
+sql("create datamap PreAggSum on table PreAggMain using 'preaggregate' as 
" +
+"select country,sum(salary) as sum from PreAggMain group by country")
+sql("create datamap PreAggAvg on table PreAggMain using 'preaggregate' as 
" +
+"select country,avg(salary) as avg from PreAggMain group by country")
+sql("create datamap PreAggCount on table PreAggMain using 'preaggregate' 
as " +
+"select country,count(salary) as count from PreAggMain group by 
country")
+sql("create datamap PreAggMin on table PreAggMain using 'preaggregate' as 
" +
+"select country,min(salary) as min from PreAggMain group by country")
+sql("create datamap PreAggMax on table PreAggMain using 'preaggregate' as 
" +
+"select country,max(salary) as max from PreAggMain group by country")
+sql(s"LOAD DATA INPATH 
'$integrationPath/spark-common-test/src/test/resources/source.csv' " +
+s"into table PreAggMain")
+checkExistence(sql("select t1.country,sum(id) from PreAggMain t1 join 
(select " +
+   "country as newcountry,sum(salary) as sum from 
PreAggMain group by country)" +
+   "t2 on t1.country=t2.newcountry group by country"), 
true, "france")
+sql("DROP TABLE IF EXISTS PreAggMain")
+  }
+
   // TODO: Need to Fix
   ignore("test creation of multiple preaggregate of same name concurrently") {
 sql("DROP TABLE IF EXISTS tbl_concurr")

http://git-wip-us.apache.org/repos/asf/carbondata/blob/697eee3d/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonDictionaryDecoder.scala
--
diff --git 
a/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonDictionaryDecoder.scala
 
b/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonDictionaryDecoder.scala
index 95ab29d..3b20c2f 100644
--- 
a/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonDictionaryDecoder.scala
+++ 
b/integration/spark2/src/main/scala/org/apache/spark/sql/CarbonDictionaryDecoder.scala
@@ -248,34 +248,34 @@ case class CarbonDictionaryDecoder(
|org.apache.spark.sql.DictTuple $value = 
$decodeDecimal($dictRef, ${ev.value});
   

carbondata git commit: [CARBONDATA-3096] Wrong records size on the input metrics

2018-11-21 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master 2f69e4fb7 -> b8d602598


[CARBONDATA-3096] Wrong records size on the input metrics

The scanned record result size is taken from the default batch size. It should
be taken from the number of records actually scanned.

This closes #2927
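
A minimal sketch of the point above, with hypothetical stand-in types (not the
Spark/Carbon batch classes): the count reported to input metrics should come
from the rows actually filled into the batch, not from the configured batch
capacity.

public class InputMetricsSketch {
  static class Batch {
    final int capacity;   // default batch size, e.g. 4096
    int actualSize;       // rows really filled after scanning/filtering
    Batch(int capacity, int actualSize) { this.capacity = capacity; this.actualSize = actualSize; }
  }

  static long recordsRead = 0;

  static void onBatchReturned(Batch batch) {
    // wrong: recordsRead += batch.capacity;
    recordsRead += batch.actualSize;   // right: count only the scanned records
  }

  public static void main(String[] args) {
    onBatchReturned(new Batch(4096, 1200));
    System.out.println(recordsRead);   // 1200, not 4096
  }
}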


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/b8d60259
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/b8d60259
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/b8d60259

Branch: refs/heads/master
Commit: b8d6025982cf27a172674de19db69b60f1448958
Parents: 2f69e4f
Author: dhatchayani 
Authored: Tue Nov 13 18:28:48 2018 +0530
Committer: kumarvishal09 
Committed: Wed Nov 21 19:45:21 2018 +0530

--
 .../spark/vectorreader/VectorizedCarbonRecordReader.java | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/b8d60259/integration/spark-datasource/src/main/scala/org/apache/carbondata/spark/vectorreader/VectorizedCarbonRecordReader.java
--
diff --git 
a/integration/spark-datasource/src/main/scala/org/apache/carbondata/spark/vectorreader/VectorizedCarbonRecordReader.java
 
b/integration/spark-datasource/src/main/scala/org/apache/carbondata/spark/vectorreader/VectorizedCarbonRecordReader.java
index 1f28b8c..c9a4ba4 100644
--- 
a/integration/spark-datasource/src/main/scala/org/apache/carbondata/spark/vectorreader/VectorizedCarbonRecordReader.java
+++ 
b/integration/spark-datasource/src/main/scala/org/apache/carbondata/spark/vectorreader/VectorizedCarbonRecordReader.java
@@ -163,8 +163,8 @@ public class VectorizedCarbonRecordReader extends 
AbstractRecordReader {
 
   @Override
   public void close() throws IOException {
-logStatistics(rowCount, queryModel.getStatisticsRecorder());
 if (vectorProxy != null) {
+  logStatistics(rowCount, queryModel.getStatisticsRecorder());
   vectorProxy.close();
   vectorProxy = null;
 }
@@ -200,7 +200,7 @@ public class VectorizedCarbonRecordReader extends 
AbstractRecordReader {
   @Override
   public Object getCurrentValue() throws IOException, InterruptedException {
 if (returnColumnarBatch) {
-  int value = vectorProxy.numRows();
+  int value = carbonColumnarBatch.getActualSize();
   rowCount += value;
   if (inputMetricsStats != null) {
 inputMetricsStats.incrementRecordRead((long) value);



carbondata git commit: [CARBONDATA-3070] Fix partition load issue when custom location is added.

2018-11-02 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master 74a2ddee9 -> d62277696


[CARBONDATA-3070] Fix partition load issue when custom location is added.

Problem:
Loading data through the carbonfile format fails when a custom partition
location is added.

Reason:
Carbon uses its own filename for each carbondata file instead of the filename
proposed by spark, and it also creates an extra index file. For a custom
partition location, spark tracks and moves only the file names it proposed,
but carbon creates and maintains different files, which leads to a
FileNotFoundException.

Solution:
Use a custom commit protocol to manage the commit and folder location for
custom partition locations.

This closes #2873
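
A hedged usage sketch of pointing a session at the custom commit protocol
named in this commit. The conf key and protocol class name are taken from the
diff below; the app name, master and the idea of setting it manually at
runtime are illustrative assumptions only.

import org.apache.spark.sql.SparkSession;

public class CommitProtocolSketch {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("carbon-commit-protocol-sketch")   // hypothetical app
        .master("local[1]")
        .getOrCreate();
    // Let the custom protocol track carbon's own data/index file names during commit.
    spark.conf().set(
        "spark.sql.sources.commitProtocolClass",
        "org.apache.spark.sql.carbondata.execution.datasources.CarbonSQLHadoopMapReduceCommitProtocol");
    spark.stop();
  }
}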


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/d6227769
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/d6227769
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/d6227769

Branch: refs/heads/master
Commit: d62277696cd19257a50cc956e3e7ff8fad5e651f
Parents: 74a2dde
Author: ravipesala 
Authored: Mon Oct 29 13:15:00 2018 +0530
Committer: kumarvishal09 
Committed: Fri Nov 2 18:29:46 2018 +0530

--
 .../datasources/SparkCarbonFileFormat.scala | 87 +++-
 .../org/apache/spark/sql/CarbonVectorProxy.java |  3 +
 .../datasource/SparkCarbonDataSourceTest.scala  | 34 
 3 files changed, 120 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/d6227769/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/SparkCarbonFileFormat.scala
--
diff --git 
a/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/SparkCarbonFileFormat.scala
 
b/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/SparkCarbonFileFormat.scala
index cd2035c..8c2f200 100644
--- 
a/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/SparkCarbonFileFormat.scala
+++ 
b/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/SparkCarbonFileFormat.scala
@@ -17,6 +17,8 @@
 
 package org.apache.spark.sql.carbondata.execution.datasources
 
+import java.net.URI
+
 import scala.collection.JavaConverters._
 import scala.collection.mutable.ArrayBuffer
 
@@ -27,6 +29,7 @@ import org.apache.hadoop.mapreduce._
 import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
 import org.apache.spark.TaskContext
 import org.apache.spark.internal.Logging
+import org.apache.spark.internal.io.FileCommitProtocol
 import org.apache.spark.memory.MemoryMode
 import org.apache.spark.sql._
 import 
org.apache.spark.sql.carbondata.execution.datasources.readsupport.SparkUnsafeRowReadSuport
@@ -112,6 +115,13 @@ class SparkCarbonFileFormat extends FileFormat
   }
 
   /**
+   * Add our own protocol to control the commit.
+   */
+  SparkSession.getActiveSession.get.sessionState.conf.setConfString(
+"spark.sql.sources.commitProtocolClass",
+
"org.apache.spark.sql.carbondata.execution.datasources.CarbonSQLHadoopMapReduceCommitProtocol")
+
+  /**
* Prepares a write job and returns an [[OutputWriterFactory]].  Client side 
job preparation is
* done here.
*/
@@ -125,6 +135,7 @@ class SparkCarbonFileFormat extends FileFormat
 val model = CarbonSparkDataSourceUtil.prepareLoadModel(options, dataSchema)
 model.setLoadWithoutConverterStep(true)
 CarbonTableOutputFormat.setLoadModel(conf, model)
+conf.set(CarbonSQLHadoopMapReduceCommitProtocol.COMMIT_PROTOCOL, "true")
 
 new OutputWriterFactory {
   override def newInstance(
@@ -310,7 +321,6 @@ class SparkCarbonFileFormat extends FileFormat
 vectorizedReader.toBoolean && 
schema.forall(_.dataType.isInstanceOf[AtomicType])
   }
 
-
   /**
* Returns whether this format support returning columnar batch or not.
*/
@@ -369,7 +379,7 @@ class SparkCarbonFileFormat extends FileFormat
 
   if (file.filePath.endsWith(CarbonTablePath.CARBON_DATA_EXT)) {
 val split = new CarbonInputSplit("null",
-  new Path(file.filePath),
+  new Path(new URI(file.filePath)),
   file.start,
   file.length,
   file.locations,
@@ -380,10 +390,12 @@ class SparkCarbonFileFormat extends FileFormat
 split.setDetailInfo(info)
 info.setBlockSize(file.length)
 // Read the footer offset and set.
-val reader = 
FileFactory.getFileHolder(FileFactory.getFileType(file.filePath),
+val reader = 
FileFactory.getFileHolder(FileFactory.getFileType(split.getPath.toString),
   bro

carbondata git commit: [HOTFIX-compatibility] Handle Lazy loading with inverted index for ColumnarVectorWrapperDirectWithInvertedIndex

2018-10-31 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master bcf3e0fd5 -> 94a4f8314


[HOTFIX-compatibility] Handle Lazy loading with inverted index for 
ColumnarVectorWrapperDirectWithInvertedIndex

Problem:
Create a store with 1.4 code using an inverted index and read it with vector
filling (latest master code). The below exception is thrown from
AbstractCarbonColumnarVector:
UnsupportedOperationException("Not allowed from here " + getClass().getName());

Cause:
During lazy loading with an inverted index, getBlockDataType() was not
implemented for ColumnarVectorWrapperDirectWithInvertedIndex.

So the implementation is added.

This closes #2870
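
A simplified, hypothetical sketch of the shape of the fix (illustrative
interface and names, not CarbonData's actual CarbonColumnVector API): the
wrapper stops throwing and instead delegates getBlockDataType() to the wrapped
vector.

public class InvertedIndexWrapperSketch {
  interface Vector {
    String getBlockDataType();
  }

  static class BaseVector implements Vector {
    public String getBlockDataType() { return "INT"; }
  }

  static class InvertedIndexWrapper implements Vector {
    private final Vector delegate;
    InvertedIndexWrapper(Vector delegate) { this.delegate = delegate; }

    // before the fix: throw new UnsupportedOperationException("Not allowed from here ...");
    public String getBlockDataType() {
      return delegate.getBlockDataType();   // after the fix: ask the wrapped vector
    }
  }

  public static void main(String[] args) {
    System.out.println(new InvertedIndexWrapper(new BaseVector()).getBlockDataType());
  }
}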


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/94a4f831
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/94a4f831
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/94a4f831

Branch: refs/heads/master
Commit: 94a4f8314068ffd4c0743907752b58879578749b
Parents: bcf3e0f
Author: ajantha-bhat 
Authored: Mon Oct 29 12:33:55 2018 +0530
Committer: kumarvishal09 
Committed: Wed Oct 31 17:54:03 2018 +0530

--
 .../encoding/adaptive/AdaptiveDeltaFloatingCodec.java | 10 ++
 .../page/encoding/adaptive/AdaptiveFloatingCodec.java | 10 ++
 .../ColumnarVectorWrapperDirectWithInvertedIndex.java |  6 ++
 3 files changed, 26 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/94a4f831/core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveDeltaFloatingCodec.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveDeltaFloatingCodec.java
 
b/core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveDeltaFloatingCodec.java
index d73318d..f91ede5 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveDeltaFloatingCodec.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveDeltaFloatingCodec.java
@@ -272,6 +272,11 @@ public class AdaptiveDeltaFloatingCodec extends 
AdaptiveCodec {
 int shortInt = ByteUtil.valueOf3Bytes(shortIntPage, i * 3);
 vector.putFloat(i, (max - shortInt) / floatFactor);
   }
+} else if (pageDataType == DataTypes.INT) {
+  int[] intData = columnPage.getIntPage();
+  for (int i = 0; i < pageSize; i++) {
+vector.putFloat(i, (max - intData[i]) / floatFactor);
+  }
 } else {
   throw new RuntimeException("internal error: " + this.toString());
 }
@@ -298,6 +303,11 @@ public class AdaptiveDeltaFloatingCodec extends 
AdaptiveCodec {
   for (int i = 0; i < pageSize; i++) {
 vector.putDouble(i, (max - intData[i]) / factor);
   }
+} else if (pageDataType == DataTypes.LONG) {
+  long[] longData = columnPage.getLongPage();
+  for (int i = 0; i < pageSize; i++) {
+vector.putDouble(i, (max - longData[i]) / factor);
+  }
 } else {
   throw new RuntimeException("Unsupported datatype : " + pageDataType);
 }

http://git-wip-us.apache.org/repos/asf/carbondata/blob/94a4f831/core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveFloatingCodec.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveFloatingCodec.java
 
b/core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveFloatingCodec.java
index b300ee1..49696eb 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveFloatingCodec.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/adaptive/AdaptiveFloatingCodec.java
@@ -274,6 +274,11 @@ public class AdaptiveFloatingCodec extends AdaptiveCodec {
 int shortInt = ByteUtil.valueOf3Bytes(shortIntPage, i * 3);
 vector.putFloat(i, (shortInt / floatFactor));
   }
+} else if (pageDataType == DataTypes.INT) {
+  int[] intData = columnPage.getIntPage();
+  for (int i = 0; i < pageSize; i++) {
+vector.putFloat(i, (intData[i] / floatFactor));
+  }
 } else {
   throw new RuntimeException("internal error: " + this.toString());
 }
@@ -300,6 +305,11 @@ public class AdaptiveFloatingCodec extends AdaptiveCodec {
   for (int i = 0; i < pageSize; i++) {
 vector.putDouble(i, (intData[i] / factor));
   }
+} 

[1/2] carbondata git commit: [CARBONDATA-3015] Support Lazy load in carbon vector

2018-10-26 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master 019f5cd06 -> 170c2f56d


http://git-wip-us.apache.org/repos/asf/carbondata/blob/170c2f56/integration/spark-datasource/src/main/spark2.3plus/org/apache/spark/sql/CarbonVectorProxy.java
--
diff --git 
a/integration/spark-datasource/src/main/spark2.3plus/org/apache/spark/sql/CarbonVectorProxy.java
 
b/integration/spark-datasource/src/main/spark2.3plus/org/apache/spark/sql/CarbonVectorProxy.java
index bd74b05..c8c4e2c 100644
--- 
a/integration/spark-datasource/src/main/spark2.3plus/org/apache/spark/sql/CarbonVectorProxy.java
+++ 
b/integration/spark-datasource/src/main/spark2.3plus/org/apache/spark/sql/CarbonVectorProxy.java
@@ -19,12 +19,16 @@ package org.apache.spark.sql;
 import java.math.BigInteger;
 
 import org.apache.carbondata.core.scan.result.vector.CarbonDictionary;
+import org.apache.carbondata.core.scan.scanner.LazyPageLoader;
 
 import org.apache.spark.memory.MemoryMode;
 import org.apache.spark.sql.catalyst.InternalRow;
 import org.apache.spark.sql.execution.vectorized.WritableColumnVector;
 import org.apache.spark.sql.types.*;
+import org.apache.spark.sql.vectorized.ColumnVector;
+import org.apache.spark.sql.vectorized.ColumnarArray;
 import org.apache.spark.sql.vectorized.ColumnarBatch;
+import org.apache.spark.sql.vectorized.ColumnarMap;
 import org.apache.spark.unsafe.types.CalendarInterval;
 import org.apache.spark.unsafe.types.UTF8String;
 
@@ -52,23 +56,23 @@ public class CarbonVectorProxy {
 public CarbonVectorProxy(MemoryMode memMode, int rowNum, StructField[] 
structFileds) {
 WritableColumnVector[] columnVectors =
 ColumnVectorFactory.getColumnVector(memMode, new 
StructType(structFileds), rowNum);
-columnarBatch = new ColumnarBatch(columnVectors);
-columnarBatch.setNumRows(rowNum);
-columnVectorProxies = new ColumnVectorProxy[columnarBatch.numCols()];
+columnVectorProxies = new ColumnVectorProxy[columnVectors.length];
 for (int i = 0; i < columnVectorProxies.length; i++) {
-columnVectorProxies[i] = new ColumnVectorProxy(columnarBatch, i);
+columnVectorProxies[i] = new ColumnVectorProxy(columnVectors[i]);
 }
+columnarBatch = new ColumnarBatch(columnVectorProxies);
+columnarBatch.setNumRows(rowNum);
 }
 
 public CarbonVectorProxy(MemoryMode memMode, StructType outputSchema, int 
rowNum) {
 WritableColumnVector[] columnVectors = ColumnVectorFactory
 .getColumnVector(memMode, outputSchema, rowNum);
-columnarBatch = new ColumnarBatch(columnVectors);
-columnarBatch.setNumRows(rowNum);
-columnVectorProxies = new ColumnVectorProxy[columnarBatch.numCols()];
+columnVectorProxies = new ColumnVectorProxy[columnVectors.length];
 for (int i = 0; i < columnVectorProxies.length; i++) {
-columnVectorProxies[i] = new ColumnVectorProxy(columnarBatch, i);
+columnVectorProxies[i] = new ColumnVectorProxy(columnVectors[i]);
 }
+columnarBatch = new ColumnarBatch(columnVectorProxies);
+columnarBatch.setNumRows(rowNum);
 }
 
 /**
@@ -86,7 +90,7 @@ public class CarbonVectorProxy {
  * @return
  */
 public WritableColumnVector column(int ordinal) {
-return (WritableColumnVector) columnarBatch.column(ordinal);
+return ((ColumnVectorProxy) columnarBatch.column(ordinal)).getVector();
 }
 
 public ColumnVectorProxy getColumnVector(int ordinal) {
@@ -97,12 +101,12 @@ public class CarbonVectorProxy {
  */
 public void reset() {
 for (int i = 0; i < columnarBatch.numCols(); i++) {
-((WritableColumnVector)columnarBatch.column(i)).reset();
+((ColumnVectorProxy) columnarBatch.column(i)).reset();
 }
 }
 
 public void resetDictionaryIds(int ordinal) {
-
((WritableColumnVector)columnarBatch.column(ordinal)).getDictionaryIds().reset();
+(((ColumnVectorProxy) 
columnarBatch.column(ordinal)).getVector()).getDictionaryIds().reset();
 }
 
 /**
@@ -140,65 +144,70 @@ public class CarbonVectorProxy {
 return columnarBatch.column(ordinal).dataType();
 }
 
-public static class ColumnVectorProxy {
+public static class ColumnVectorProxy extends ColumnVector {
 
 private WritableColumnVector vector;
 
-public ColumnVectorProxy(ColumnarBatch columnarBatch, int ordinal) {
-vector = (WritableColumnVector) columnarBatch.column(ordinal);
+private LazyPageLoader pageLoad;
+
+private boolean isLoaded;
+
+public ColumnVectorProxy(ColumnVector columnVector) {
+super(columnVector.dataType());
+vector = (WritableColumnVector) columnVector;
 }
 
-public void putRowToColumnBatch(int rowId, Object value, int offset) {
-DataType t = dataType(offset);
+   

[2/2] carbondata git commit: [CARBONDATA-3015] Support Lazy load in carbon vector

2018-10-26 Thread kumarvishal09
[CARBONDATA-3015] Support Lazy load in carbon vector

Even though we prune pages using min/max, there is a high chance of false
positives for filters on high-cardinality columns. To avoid that, this change
uses a lazy loading design: data is not read/decompressed and the vector is
not filled immediately when spark/presto asks for data. First, only the
required filter columns are read and given back to the execution engine; the
engine filters on those column vectors, and only if some rows survive does it
read the projection columns and fill their vectors on demand. This is the
approach presto uses, and the same is now integrated with spark 2.3. Older
spark versions cannot take advantage of it because their ColumnVector
interfaces are non-extendable. For this purpose, new classes
LazyBlockletLoader and LazyPageLoader are added and the carbon vector
interfaces are changed.
This closes #2823
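
A minimal sketch of the lazy-load idea described above, using hypothetical
names (not CarbonData's LazyPageLoader API): a projection column keeps only a
loader and decodes its page the first time a value is actually requested.

import java.util.function.Supplier;

public class LazyPageSketch {
  static class LazyColumn {
    private final Supplier<int[]> pageLoader;  // reads + decodes the page on demand
    private int[] data;                        // filled lazily, at most once

    LazyColumn(Supplier<int[]> pageLoader) { this.pageLoader = pageLoader; }

    int get(int rowId) {
      if (data == null) {
        data = pageLoader.get();               // decode only when a row survives filtering
      }
      return data[rowId];
    }
  }

  public static void main(String[] args) {
    LazyColumn projection = new LazyColumn(() -> {
      System.out.println("decoding projection page");   // printed at most once
      return new int[] {10, 20, 30};
    });
    // If the filter column had produced no matches, the projection page would
    // never be decoded; one surviving row triggers exactly one decode:
    System.out.println(projection.get(1));
  }
}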


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/170c2f56
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/170c2f56
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/170c2f56

Branch: refs/heads/master
Commit: 170c2f56dc1f9b55444aa727d0e587a207f7b8c7
Parents: 019f5cd
Author: ravipesala 
Authored: Tue Oct 16 18:39:16 2018 +0530
Committer: kumarvishal09 
Committed: Sat Oct 27 05:28:54 2018 +0530

--
 .../core/constants/CarbonCommonConstants.java   |   2 +-
 .../safe/AbstractNonDictionaryVectorFiller.java |   2 +-
 .../datastore/page/SafeFixLengthColumnPage.java |   4 +-
 .../encoding/compress/DirectCompressCodec.java  |   5 +
 .../core/scan/result/BlockletScannedResult.java |  33 ++-
 .../scan/result/vector/CarbonColumnVector.java  |   3 +
 .../vector/impl/CarbonColumnVectorImpl.java |   5 +-
 .../AbstractCarbonColumnarVector.java   |  46 ++--
 .../core/scan/scanner/LazyBlockletLoader.java   | 158 
 .../core/scan/scanner/LazyPageLoader.java   |  80 ++
 .../scanner/impl/BlockletFilterScanner.java |  77 ++
 .../scan/scanner/impl/BlockletFullScanner.java  |   5 +-
 .../presto/CarbonColumnVectorWrapper.java   |   4 +
 .../lucene/LuceneFineGrainDataMapSuite.scala|   2 +-
 ...imestampNoDictionaryColumnCastTestCase.scala |   2 +-
 .../vectorreader/ColumnarVectorWrapper.java |  80 +++---
 .../ColumnarVectorWrapperDirect.java|  57 +++--
 .../datasources/SparkCarbonFileFormat.scala |   2 +-
 .../org/apache/spark/sql/CarbonVectorProxy.java |  88 +++
 .../org/apache/spark/sql/CarbonVectorProxy.java | 249 +--
 .../stream/CarbonStreamRecordReader.java|   2 +-
 .../partition/TestAlterPartitionTable.scala |   4 +-
 22 files changed, 630 insertions(+), 280 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/170c2f56/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 
b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
index 72da3bd..7df1b7e 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
@@ -1735,7 +1735,7 @@ public final class CarbonCommonConstants {
   public static final String CARBON_PUSH_ROW_FILTERS_FOR_VECTOR =
   "carbon.push.rowfilters.for.vector";
 
-  public static final String CARBON_PUSH_ROW_FILTERS_FOR_VECTOR_DEFAULT = 
"true";
+  public static final String CARBON_PUSH_ROW_FILTERS_FOR_VECTOR_DEFAULT = 
"false";
 
   
//
   // Unused constants and parameters start here

http://git-wip-us.apache.org/repos/asf/carbondata/blob/170c2f56/core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java
 
b/core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java
index 2e68648..9626da7 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java
@@ -52,7 +52,7 @@ class NonDictionaryVectorFillerFactory {
   public static Abs

carbondata git commit: [CARBONDATA-3014] Added support for inverted index and delete delta for direct scan queries

2018-10-26 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master b62b0fd9c -> 71d617955


[CARBONDATA-3014] Added support for inverted index and delete delta for direct 
scan queries

Added new classes to support inverted index and delete delta handling directly
on the column vector:
ColumnarVectorWrapperDirectWithInvertedIndex
ColumnarVectorWrapperDirectWithDeleteDelta
ColumnarVectorWrapperDirectWithDeleteDeltaAndInvertedIndex

This closes #2822
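
A hypothetical sketch of the inverted-index part of these wrappers
(illustrative names only, not the actual ColumnarVectorWrapperDirect*
classes): values decoded in stored/sorted order are written back to their
original row positions, while a delete-delta bitset would mark rows to drop
before the vector is handed to the engine.

public class InvertedIndexFillSketch {
  public static void main(String[] args) {
    int[] decodedInStoredOrder = {1, 2, 3};   // page values in sorted order
    int[] invertedIndex = {2, 0, 1};          // stored position -> original row id

    int[] vector = new int[decodedInStoredOrder.length];
    for (int i = 0; i < decodedInStoredOrder.length; i++) {
      vector[invertedIndex[i]] = decodedInStoredOrder[i];  // restore original row order
    }
    System.out.println(java.util.Arrays.toString(vector)); // [2, 3, 1]
  }
}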


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/71d61795
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/71d61795
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/71d61795

Branch: refs/heads/master
Commit: 71d6179557703718ff0aac099efcc89ee41ed941
Parents: b62b0fd
Author: ravipesala 
Authored: Tue Oct 16 16:37:18 2018 +0530
Committer: kumarvishal09 
Committed: Fri Oct 26 18:52:10 2018 +0530

--
 ...mpressedDimensionChunkFileBasedReaderV3.java |  12 +-
 .../safe/AbstractNonDictionaryVectorFiller.java |   6 +-
 .../SafeFixedLengthDimensionDataChunkStore.java |  11 +
 ...feVariableLengthDimensionDataChunkStore.java |  10 +
 .../adaptive/AdaptiveDeltaFloatingCodec.java|   3 +
 .../adaptive/AdaptiveDeltaIntegralCodec.java|  35 ++-
 .../adaptive/AdaptiveFloatingCodec.java |   3 +
 .../adaptive/AdaptiveIntegralCodec.java |  17 +-
 .../encoding/compress/DirectCompressCodec.java  |  16 +-
 .../datatype/DecimalConverterFactory.java   |  42 +++-
 .../scan/collector/ResultCollectorFactory.java  |  11 +-
 .../executer/RestructureEvaluatorImpl.java  |   2 +-
 ...elRangeGrtrThanEquaToFilterExecuterImpl.java |  14 +-
 .../scan/result/vector/ColumnVectorInfo.java|   1 +
 .../AbstractCarbonColumnarVector.java   | 133 
 .../ColumnarVectorWrapperDirectFactory.java |  59 +
 ...umnarVectorWrapperDirectWithDeleteDelta.java | 216 +++
 ...erDirectWithDeleteDeltaAndInvertedIndex.java | 179 +++
 ...narVectorWrapperDirectWithInvertedIndex.java | 144 +
 .../impl/directread/ConvertableVector.java  |  30 +++
 .../scanner/impl/BlockletFilterScanner.java |   8 +-
 .../detailquery/CastColumnTestCase.scala|   2 +-
 .../datasources/SparkCarbonFileFormat.scala |   1 +
 23 files changed, 910 insertions(+), 45 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/71d61795/core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/CompressedDimensionChunkFileBasedReaderV3.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/CompressedDimensionChunkFileBasedReaderV3.java
 
b/core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/CompressedDimensionChunkFileBasedReaderV3.java
index a9f9338..602e694 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/CompressedDimensionChunkFileBasedReaderV3.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/CompressedDimensionChunkFileBasedReaderV3.java
@@ -276,13 +276,19 @@ public class CompressedDimensionChunkFileBasedReaderV3 
extends AbstractChunkRead
 offset += pageMetadata.data_page_length;
 invertedIndexes = CarbonUtil
 .getUnCompressColumnIndex(pageMetadata.rowid_page_length, 
pageData, offset);
-// get the reverse index
-invertedIndexesReverse = 
CarbonUtil.getInvertedReverseIndex(invertedIndexes);
+if (vectorInfo == null) {
+  // get the reverse index
+  invertedIndexesReverse = 
CarbonUtil.getInvertedReverseIndex(invertedIndexes);
+} else {
+  vectorInfo.invertedIndex = invertedIndexes;
+}
   }
   BitSet nullBitSet = QueryUtil.getNullBitSet(pageMetadata.presence, 
this.compressor);
   ColumnPage decodedPage = decodeDimensionByMeta(pageMetadata, pageData, 
dataOffset,
   null != rawColumnPage.getLocalDictionary(), vectorInfo, nullBitSet);
-  decodedPage.setNullBits(nullBitSet);
+  if (decodedPage != null) {
+decodedPage.setNullBits(nullBitSet);
+  }
   return new ColumnPageWrapper(decodedPage, 
rawColumnPage.getLocalDictionary(), invertedIndexes,
   invertedIndexesReverse, isEncodedWithAdaptiveMeta(pageMetadata), 
isExplicitSorted);
 } else {

http://git-wip-us.apache.org/repos/asf/carbondata/blob/71d61795/core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/safe/AbstractNonDictionaryVectorFiller.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/datastore/chunk/store/impl/s

carbondata git commit: [CARBONDATA-3013] Added support for pruning pages for vector direct fill.

2018-10-26 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master 3d3b6ff16 -> e6d15da74


[CARBONDATA-3013] Added support for pruning pages for vector direct fill.

First, apply page-level pruning using the min/max of each page to get the
valid pages of the blocklet. Only the valid pages are then decompressed and
filled directly into the vector, as in the full scan query scenario. To prune
pages before decompressing the data, a new method is added to the
FilterExecuter interface:

BitSet prunePages(RawBlockletColumnChunks rawBlockletColumnChunks) throws
FilterUnsupportedException, IOException;

This method reads the necessary column chunk metadata and prunes the pages
according to the min/max metadata. Based on the pruned pages,
BlockletScannedResult decompresses the column page data and fills it into the
vector as described for full scan in the above-mentioned PR.
This closes #2820
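
A hedged sketch of the page-pruning idea: compare the filter range against
each page's min/max and return a bitset of pages that can possibly match.
PageMeta and the method shape here are simplified stand-ins, not the real
prunePages signature's types.

import java.util.BitSet;

public class PagePruneSketch {
  static class PageMeta {
    final int min, max;
    PageMeta(int min, int max) { this.min = min; this.max = max; }
  }

  // pages whose [min, max] overlaps [lower, upper] survive; the rest are skipped
  static BitSet prunePages(PageMeta[] pages, int lower, int upper) {
    BitSet valid = new BitSet(pages.length);
    for (int p = 0; p < pages.length; p++) {
      if (pages[p].max >= lower && pages[p].min <= upper) {
        valid.set(p);    // only these pages will be decompressed and filled
      }
    }
    return valid;
  }

  public static void main(String[] args) {
    PageMeta[] pages = { new PageMeta(0, 9), new PageMeta(10, 19), new PageMeta(20, 29) };
    System.out.println(prunePages(pages, 12, 25));   // prints {1, 2}
  }
}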


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/e6d15da7
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/e6d15da7
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/e6d15da7

Branch: refs/heads/master
Commit: e6d15da74a3a4d9c0af9d6886b811ac5bb2d89e9
Parents: 3d3b6ff
Author: ravipesala 
Authored: Tue Oct 16 14:53:14 2018 +0530
Committer: kumarvishal09 
Committed: Fri Oct 26 16:24:38 2018 +0530

--
 .../filter/executer/AndFilterExecuterImpl.java  |  15 ++
 .../executer/ExcludeFilterExecuterImpl.java |  10 ++
 .../filter/executer/FalseFilterExecutor.java|   8 +
 .../scan/filter/executer/FilterExecuter.java|   6 +
 .../ImplicitIncludeFilterExecutorImpl.java  |   9 +
 .../executer/IncludeFilterExecuterImpl.java |  87 --
 .../filter/executer/OrFilterExecuterImpl.java   |   9 +
 .../executer/RangeValueFilterExecuterImpl.java  |  38 +
 .../executer/RestructureEvaluatorImpl.java  |  10 ++
 .../executer/RowLevelFilterExecuterImpl.java|  10 ++
 .../RowLevelRangeGrtThanFiterExecuterImpl.java  |  85 --
 ...elRangeGrtrThanEquaToFilterExecuterImpl.java |  88 --
 ...velRangeLessThanEqualFilterExecuterImpl.java |  87 --
 ...RowLevelRangeLessThanFilterExecuterImpl.java |  86 --
 .../filter/executer/TrueFilterExecutor.java |   9 +
 .../scanner/impl/BlockletFilterScanner.java | 166 ++-
 16 files changed, 656 insertions(+), 67 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/e6d15da7/core/src/main/java/org/apache/carbondata/core/scan/filter/executer/AndFilterExecuterImpl.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/scan/filter/executer/AndFilterExecuterImpl.java
 
b/core/src/main/java/org/apache/carbondata/core/scan/filter/executer/AndFilterExecuterImpl.java
index d743151..f0feb0e 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/scan/filter/executer/AndFilterExecuterImpl.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/scan/filter/executer/AndFilterExecuterImpl.java
@@ -50,6 +50,21 @@ public class AndFilterExecuterImpl implements 
FilterExecuter, ImplicitColumnFilt
 return leftFilters;
   }
 
+  @Override
+  public BitSet prunePages(RawBlockletColumnChunks rawBlockletColumnChunks)
+  throws FilterUnsupportedException, IOException {
+BitSet leftFilters = leftExecuter.prunePages(rawBlockletColumnChunks);
+if (leftFilters.isEmpty()) {
+  return leftFilters;
+}
+BitSet rightFilter = rightExecuter.prunePages(rawBlockletColumnChunks);
+if (rightFilter.isEmpty()) {
+  return rightFilter;
+}
+leftFilters.and(rightFilter);
+return leftFilters;
+  }
+
   @Override public boolean applyFilter(RowIntf value, int dimOrdinalMax)
   throws FilterUnsupportedException, IOException {
 return leftExecuter.applyFilter(value, dimOrdinalMax) &&

http://git-wip-us.apache.org/repos/asf/carbondata/blob/e6d15da7/core/src/main/java/org/apache/carbondata/core/scan/filter/executer/ExcludeFilterExecuterImpl.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/scan/filter/executer/ExcludeFilterExecuterImpl.java
 
b/core/src/main/java/org/apache/carbondata/core/scan/filter/executer/ExcludeFilterExecuterImpl.java
index 15a43c5..fc9fbae 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/scan/filter/executer/ExcludeFilterExecuterImpl.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/scan/filter/executer/ExcludeFilterExecuterImpl.java
@@ -25,6 +25,7 @@ import 
org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk;
 import org.apache.carbondata.core.datastore.chunk.impl.MeasureRawColumnChunk;
 import org.apache.carbondata.core.datastore.page.ColumnPa

[2/3] carbondata git commit: [CARBONDATA-3012] Added support for full scan queries for vector direct fill.

2018-10-25 Thread kumarvishal09
http://git-wip-us.apache.org/repos/asf/carbondata/blob/3d3b6ff1/core/src/main/java/org/apache/carbondata/core/datastore/page/VarLengthColumnPageBase.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/datastore/page/VarLengthColumnPageBase.java
 
b/core/src/main/java/org/apache/carbondata/core/datastore/page/VarLengthColumnPageBase.java
index 39b8282..a760b64 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/datastore/page/VarLengthColumnPageBase.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/datastore/page/VarLengthColumnPageBase.java
@@ -124,8 +124,9 @@ public abstract class VarLengthColumnPageBase extends 
ColumnPage {
   /**
* Create a new column page for decimal page
*/
-  static ColumnPage newDecimalColumnPage(TableSpec.ColumnSpec columnSpec, 
byte[] lvEncodedBytes,
-  String compressorName) throws MemoryException {
+  static ColumnPage newDecimalColumnPage(ColumnPageEncoderMeta meta,
+  byte[] lvEncodedBytes) throws MemoryException {
+TableSpec.ColumnSpec columnSpec = meta.getColumnSpec();
 DecimalConverterFactory.DecimalConverter decimalConverter =
 
DecimalConverterFactory.INSTANCE.getDecimalConverter(columnSpec.getPrecision(),
 columnSpec.getScale());
@@ -133,10 +134,10 @@ public abstract class VarLengthColumnPageBase extends 
ColumnPage {
 if (size < 0) {
   return getLVBytesColumnPage(columnSpec, lvEncodedBytes,
   DataTypes.createDecimalType(columnSpec.getPrecision(), 
columnSpec.getScale()),
-  CarbonCommonConstants.INT_SIZE_IN_BYTE, compressorName);
+  CarbonCommonConstants.INT_SIZE_IN_BYTE, meta.getCompressorName());
 } else {
   // Here the size is always fixed.
-  return getDecimalColumnPage(columnSpec, lvEncodedBytes, size, 
compressorName);
+  return getDecimalColumnPage(meta, lvEncodedBytes, size);
 }
   }
 
@@ -158,8 +159,10 @@ public abstract class VarLengthColumnPageBase extends 
ColumnPage {
 lvLength, compressorName);
   }
 
-  private static ColumnPage getDecimalColumnPage(TableSpec.ColumnSpec 
columnSpec,
-  byte[] lvEncodedBytes, int size, String compressorName) throws 
MemoryException {
+  private static ColumnPage getDecimalColumnPage(ColumnPageEncoderMeta meta,
+  byte[] lvEncodedBytes, int size) throws MemoryException {
+TableSpec.ColumnSpec columnSpec = meta.getColumnSpec();
+String compressorName = meta.getCompressorName();
 TableSpec.ColumnSpec spec = TableSpec.ColumnSpec
 .newInstance(columnSpec.getFieldName(), DataTypes.INT, 
ColumnType.MEASURE);
 ColumnPage rowOffset = ColumnPage.newPage(
@@ -176,7 +179,7 @@ public abstract class VarLengthColumnPageBase extends 
ColumnPage {
 rowOffset.putInt(counter, offset);
 
 VarLengthColumnPageBase page;
-if (unsafe) {
+if (isUnsafeEnabled(meta)) {
   page = new UnsafeDecimalColumnPage(
   new ColumnPageEncoderMeta(columnSpec, 
columnSpec.getSchemaDataType(), compressorName),
   rowId);

http://git-wip-us.apache.org/repos/asf/carbondata/blob/3d3b6ff1/core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/ColumnPageDecoder.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/ColumnPageDecoder.java
 
b/core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/ColumnPageDecoder.java
index 4e491c5..d82a873 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/ColumnPageDecoder.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/ColumnPageDecoder.java
@@ -18,9 +18,11 @@
 package org.apache.carbondata.core.datastore.page.encoding;
 
 import java.io.IOException;
+import java.util.BitSet;
 
 import org.apache.carbondata.core.datastore.page.ColumnPage;
 import org.apache.carbondata.core.memory.MemoryException;
+import org.apache.carbondata.core.scan.result.vector.ColumnVectorInfo;
 
 public interface ColumnPageDecoder {
 
@@ -29,6 +31,12 @@ public interface ColumnPageDecoder {
*/
   ColumnPage decode(byte[] input, int offset, int length) throws 
MemoryException, IOException;
 
+  /**
+   *  Apply decoding algorithm on input byte array and fill the vector here.
+   */
+  void decodeAndFillVector(byte[] input, int offset, int length, 
ColumnVectorInfo vectorInfo,
+  BitSet nullBits, boolean isLVEncoded) throws MemoryException, 
IOException;
+
   ColumnPage decode(byte[] input, int offset, int length, boolean isLVEncoded)
   throws MemoryException, IOException;
 }

http://git-wip-us.apache.org/repos/asf/carbondata/blob/3d3b6ff1/core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/ColumnPageEncoderMeta.java
--
diff --git 

[1/3] carbondata git commit: [CARBONDATA-3012] Added support for full scan queries for vector direct fill.

2018-10-25 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master e0baa9b9f -> 3d3b6ff16


http://git-wip-us.apache.org/repos/asf/carbondata/blob/3d3b6ff1/core/src/main/java/org/apache/carbondata/core/scan/result/vector/CarbonColumnarBatch.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/scan/result/vector/CarbonColumnarBatch.java
 
b/core/src/main/java/org/apache/carbondata/core/scan/result/vector/CarbonColumnarBatch.java
index 803715c..471f9b2 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/scan/result/vector/CarbonColumnarBatch.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/scan/result/vector/CarbonColumnarBatch.java
@@ -56,7 +56,9 @@ public class CarbonColumnarBatch {
 actualSize = 0;
 rowCounter = 0;
 rowsFiltered = 0;
-Arrays.fill(filteredRows, false);
+if (filteredRows != null) {
+  Arrays.fill(filteredRows, false);
+}
 for (int i = 0; i < columnVectors.length; i++) {
   columnVectors[i].reset();
 }

http://git-wip-us.apache.org/repos/asf/carbondata/blob/3d3b6ff1/core/src/main/java/org/apache/carbondata/core/scan/result/vector/CarbonDictionary.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/scan/result/vector/CarbonDictionary.java
 
b/core/src/main/java/org/apache/carbondata/core/scan/result/vector/CarbonDictionary.java
index 50d2ac5..2147c43 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/scan/result/vector/CarbonDictionary.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/scan/result/vector/CarbonDictionary.java
@@ -27,4 +27,6 @@ public interface CarbonDictionary  {
   void setDictionaryUsed();
 
   byte[] getDictionaryValue(int index);
+
+  byte[][] getAllDictionaryValues();
 }

http://git-wip-us.apache.org/repos/asf/carbondata/blob/3d3b6ff1/core/src/main/java/org/apache/carbondata/core/scan/result/vector/ColumnVectorInfo.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/scan/result/vector/ColumnVectorInfo.java
 
b/core/src/main/java/org/apache/carbondata/core/scan/result/vector/ColumnVectorInfo.java
index 59117dd..d127728 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/scan/result/vector/ColumnVectorInfo.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/scan/result/vector/ColumnVectorInfo.java
@@ -16,7 +16,10 @@
  */
 package org.apache.carbondata.core.scan.result.vector;
 
+import java.util.BitSet;
+
 import 
org.apache.carbondata.core.keygenerator.directdictionary.DirectDictionaryGenerator;
+import org.apache.carbondata.core.metadata.datatype.DecimalConverterFactory;
 import org.apache.carbondata.core.scan.filter.GenericQueryType;
 import org.apache.carbondata.core.scan.model.ProjectionDimension;
 import org.apache.carbondata.core.scan.model.ProjectionMeasure;
@@ -32,6 +35,8 @@ public class ColumnVectorInfo implements 
Comparable {
   public DirectDictionaryGenerator directDictionaryGenerator;
   public MeasureDataVectorProcessor.MeasureVectorFiller measureVectorFiller;
   public GenericQueryType genericQueryType;
+  public BitSet deletedRows;
+  public DecimalConverterFactory.DecimalConverter decimalConverter;
 
   @Override public int compareTo(ColumnVectorInfo o) {
 return ordinal - o.ordinal;

http://git-wip-us.apache.org/repos/asf/carbondata/blob/3d3b6ff1/core/src/main/java/org/apache/carbondata/core/scan/result/vector/impl/CarbonColumnVectorImpl.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/scan/result/vector/impl/CarbonColumnVectorImpl.java
 
b/core/src/main/java/org/apache/carbondata/core/scan/result/vector/impl/CarbonColumnVectorImpl.java
index f8f663f..5dfd6ca 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/scan/result/vector/impl/CarbonColumnVectorImpl.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/scan/result/vector/impl/CarbonColumnVectorImpl.java
@@ -146,7 +146,7 @@ public class CarbonColumnVectorImpl implements 
CarbonColumnVector {
 }
   }
 
-  @Override public void putBytes(int rowId, byte[] value) {
+  @Override public void putByteArray(int rowId, byte[] value) {
 bytes[rowId] = value;
   }
 
@@ -160,7 +160,7 @@ public class CarbonColumnVectorImpl implements 
CarbonColumnVector {
 }
   }
 
-  @Override public void putBytes(int rowId, int offset, int length, byte[] 
value) {
+  @Override public void putByteArray(int rowId, int offset, int length, byte[] 
value) {
 bytes[rowId] = new byte[length];
 System.arraycopy(value, offset, bytes[rowId], 0, length);
   }
@@ -227,6 +227,31 @@ public class CarbonColumnVectorImpl implements 
CarbonColumnVector {
 }
   }
 
+  public Object getDataArray() {
+if (dataType == DataTypes.BOOLEAN || dataType == DataTypes.BYTE) {
+  return  byteArr;

[3/3] carbondata git commit: [CARBONDATA-3012] Added support for full scan queries for vector direct fill.

2018-10-25 Thread kumarvishal09
[CARBONDATA-3012] Added support for full scan queries for vector direct fill.

After decompressing a page in our V3 reader we can immediately fill the data
into a vector without any condition checks inside the loops. The complete
column page data is set into the column vector in a single batch and given
back to Spark/Presto. For this purpose, a new method is added to
ColumnPageDecoder:

void decodeAndFillVector(byte[] input, int offset, int length, ColumnVectorInfo vectorInfo,
    BitSet nullBits, boolean isLVEncoded)

This method takes the vector and fills it in a single loop without any checks
inside the loop.

A new method is also added to DimensionDataChunkStore:

void fillVector(int[] invertedIndex, int[] invertedIndexReverse, byte[] data,
    ColumnVectorInfo vectorInfo);

This method likewise fills the vector in a single loop without any checks
inside the loop.

This closes #2818
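
A minimal sketch of the direct-fill idea, with simplified stand-in types (not
CarbonColumnVector or ColumnVectorInfo): once a page is decoded, the whole
page is written into the vector in one tight loop rather than routing each row
through per-row checks.

public class DirectFillSketch {
  static class IntVector {
    final int[] values;
    IntVector(int size) { values = new int[size]; }
    void putInt(int rowId, int value) { values[rowId] = value; }
  }

  // decodeAndFillVector-style filling: one pass over the decoded page, no branches
  static void fillVector(int[] decodedPage, IntVector vector) {
    for (int rowId = 0; rowId < decodedPage.length; rowId++) {
      vector.putInt(rowId, decodedPage[rowId]);
    }
  }

  public static void main(String[] args) {
    int[] page = {7, 8, 9};
    IntVector vector = new IntVector(page.length);
    fillVector(page, vector);
    System.out.println(java.util.Arrays.toString(vector.values));
  }
}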


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/3d3b6ff1
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/3d3b6ff1
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/3d3b6ff1

Branch: refs/heads/master
Commit: 3d3b6ff1615e08131f6bcaea23dec0116a18081d
Parents: e0baa9b
Author: ravipesala 
Authored: Tue Oct 16 11:30:43 2018 +0530
Committer: kumarvishal09 
Committed: Thu Oct 25 22:24:24 2018 +0530

--
 .../chunk/impl/DimensionRawColumnChunk.java |  17 ++
 .../impl/FixedLengthDimensionColumnPage.java|  29 +-
 .../chunk/impl/MeasureRawColumnChunk.java   |  17 ++
 .../impl/VariableLengthDimensionColumnPage.java |  29 +-
 .../reader/DimensionColumnChunkReader.java  |   7 +
 .../chunk/reader/MeasureColumnChunkReader.java  |   7 +
 .../reader/dimension/AbstractChunkReader.java   |  11 +
 ...essedDimChunkFileBasedPageLevelReaderV3.java |   2 +-
 ...mpressedDimensionChunkFileBasedReaderV3.java |  78 +++--
 .../measure/AbstractMeasureChunkReader.java |  12 +
 ...CompressedMeasureChunkFileBasedReaderV3.java |  45 ++-
 ...essedMsrChunkFileBasedPageLevelReaderV3.java |   6 +-
 .../chunk/store/DimensionChunkStoreFactory.java |  16 +-
 .../chunk/store/DimensionDataChunkStore.java|   7 +
 .../impl/LocalDictDimensionDataChunkStore.java  |  25 ++
 .../safe/AbstractNonDictionaryVectorFiller.java | 282 ++
 .../SafeFixedLengthDimensionDataChunkStore.java |  51 +++-
 ...feVariableLengthDimensionDataChunkStore.java |  17 +-
 .../UnsafeAbstractDimensionDataChunkStore.java  |   6 +
 .../datastore/columnar/BlockIndexerStorage.java |   5 +-
 .../BlockIndexerStorageForNoDictionary.java |   3 +-
 .../columnar/BlockIndexerStorageForShort.java   |   3 +-
 .../core/datastore/columnar/UnBlockIndexer.java |   3 +
 .../core/datastore/impl/FileReaderImpl.java |   1 +
 .../core/datastore/page/ColumnPage.java | 130 
 .../page/ColumnPageValueConverter.java  |   3 +
 .../datastore/page/SafeDecimalColumnPage.java   |  25 ++
 .../datastore/page/VarLengthColumnPageBase.java |  17 +-
 .../page/encoding/ColumnPageDecoder.java|   8 +
 .../page/encoding/ColumnPageEncoderMeta.java|  11 +
 .../page/encoding/EncodingFactory.java  |  44 ++-
 .../adaptive/AdaptiveDeltaFloatingCodec.java|  82 +
 .../adaptive/AdaptiveDeltaIntegralCodec.java| 194 +++-
 .../adaptive/AdaptiveFloatingCodec.java |  84 +-
 .../adaptive/AdaptiveIntegralCodec.java | 157 ++
 .../encoding/compress/DirectCompressCodec.java  | 170 ++-
 .../datastore/page/encoding/rle/RLECodec.java   |   9 +
 .../DateDirectDictionaryGenerator.java  |   2 +-
 .../datatype/DecimalConverterFactory.java   |  91 +-
 .../carbondata/core/mutate/DeleteDeltaVo.java   |   4 +
 .../DictionaryBasedVectorResultCollector.java   | 112 +--
 .../executor/impl/AbstractQueryExecutor.java|  13 +
 .../scan/executor/infos/BlockExecutionInfo.java |  13 +
 .../core/scan/executor/util/QueryUtil.java  |   2 +-
 .../carbondata/core/scan/model/QueryModel.java  |   6 +-
 .../core/scan/result/BlockletScannedResult.java |  76 -
 .../scan/result/vector/CarbonColumnVector.java  |  18 +-
 .../scan/result/vector/CarbonColumnarBatch.java |   4 +-
 .../scan/result/vector/CarbonDictionary.java|   2 +
 .../scan/result/vector/ColumnVectorInfo.java|   5 +
 .../vector/impl/CarbonColumnVectorImpl.java |  67 -
 .../vector/impl/CarbonDictionaryImpl.java   |   3 +
 .../scan/scanner/impl/BlockletFullScanner.java  |   4 +-
 .../core/stats/QueryStatisticsModel.java|  13 +
 .../apache/carbondata/core/util/ByteUtil.java   |   8 +
 .../executer/IncludeFilterExecuterImplTest.java |   6 +-
 .../carbondata/core/util/CarbonUtilTest.java|   2 +-
 .../presto/CarbonColumnVectorWrapper.java   |  65 +++-
 .../presto/readers/SliceStreamReader.java   |   4 +-
 .../filterexpr

carbondata git commit: [CARBONDATA-3011] Add carbon property to configure vector based row pruning push down

2018-10-25 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master 9578786b2 -> de6e98b08


[CARBONDATA-3011] Add carbon property to configure vector based row pruning 
push down

Added the below configuration in carbon to enable or disable row filter push
down for vector reads.

carbon.push.rowfilters.for.vector
When enabled, complete row filtering is handled by carbon for vector reads.
When disabled, carbon does only page-level pruning and row-level filtering is
done by spark for vector reads.
There is no change in the flow for non-vector based queries.

Default value is true.

This closes #2818
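
A small usage sketch for the property described above. The property constant
and the CarbonProperties API appear in this repository; the wrapping main()
and the choice of value are illustrative only.

import org.apache.carbondata.core.constants.CarbonCommonConstants;
import org.apache.carbondata.core.util.CarbonProperties;

public class RowFilterPushDownConfig {
  public static void main(String[] args) {
    // "false": carbon prunes only at page level and leaves row-level
    // filtering to spark's vectorized execution.
    CarbonProperties.getInstance()
        .addProperty(CarbonCommonConstants.CARBON_PUSH_ROW_FILTERS_FOR_VECTOR, "false");
  }
}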


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/de6e98b0
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/de6e98b0
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/de6e98b0

Branch: refs/heads/master
Commit: de6e98b085723811b0894e659c3c4ce9770f7ca2
Parents: 9578786
Author: ravipesala 
Authored: Tue Oct 16 10:32:18 2018 +0530
Committer: kumarvishal09 
Committed: Thu Oct 25 17:28:29 2018 +0530

--
 .../core/constants/CarbonCommonConstants.java   | 12 +++
 .../carbondata/core/scan/model/QueryModel.java  | 13 
 .../carbondata/core/util/CarbonProperties.java  |  8 ++
 .../carbondata/spark/rdd/CarbonScanRDD.scala| 17 +++-
 .../strategy/CarbonLateDecodeStrategy.scala | 82 +---
 5 files changed, 120 insertions(+), 12 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/de6e98b0/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 
b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
index fa5227b..72da3bd 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
@@ -1725,6 +1725,18 @@ public final class CarbonCommonConstants {
*/
   public static final String CARBON_WRITTEN_BY_APPNAME = 
"carbon.writtenby.app.name";
 
+  /**
+   * When enabled complete row filters will be handled by carbon in case of 
vector.
+   * If it is disabled then only page level pruning will be done by carbon and 
row level filtering
+   * will be done by spark for vector.
+   * There is no change in flow for non-vector based queries.
+   */
+  @CarbonProperty
+  public static final String CARBON_PUSH_ROW_FILTERS_FOR_VECTOR =
+  "carbon.push.rowfilters.for.vector";
+
+  public static final String CARBON_PUSH_ROW_FILTERS_FOR_VECTOR_DEFAULT = 
"true";
+
   
//
   // Unused constants and parameters start here
   
//

http://git-wip-us.apache.org/repos/asf/carbondata/blob/de6e98b0/core/src/main/java/org/apache/carbondata/core/scan/model/QueryModel.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/scan/model/QueryModel.java 
b/core/src/main/java/org/apache/carbondata/core/scan/model/QueryModel.java
index d90c35e..0951da0 100644
--- a/core/src/main/java/org/apache/carbondata/core/scan/model/QueryModel.java
+++ b/core/src/main/java/org/apache/carbondata/core/scan/model/QueryModel.java
@@ -124,6 +124,11 @@ public class QueryModel {
 
   private boolean preFetchData = true;
 
+  /**
+   * It fills the vector directly from decoded column page with out any 
staging and conversions
+   */
+  private boolean isDirectVectorFill;
+
   private QueryModel(CarbonTable carbonTable) {
 tableBlockInfos = new ArrayList();
 invalidSegmentIds = new ArrayList<>();
@@ -406,6 +411,14 @@ public class QueryModel {
 this.preFetchData = preFetchData;
   }
 
+  public boolean isDirectVectorFill() {
+return isDirectVectorFill;
+  }
+
+  public void setDirectVectorFill(boolean directVectorFill) {
+isDirectVectorFill = directVectorFill;
+  }
+
   @Override
   public String toString() {
 return String.format("scan on table %s.%s, %d projection columns with 
filter (%s)",

http://git-wip-us.apache.org/repos/asf/carbondata/blob/de6e98b0/core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java 
b/core/src/main/java/org/apache/carbondata/core/util/CarbonProperties.java
index e6d48e5..49d89e7 100644
--- a/core/src/main/java/org/apache

carbondata git commit: [CARBONDATA-2594] Do not add InvertedIndex in Encoding list for non-sort dimension column #2768

2018-10-04 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master 8fbd4a5f5 -> 18fbdfc40


[CARBONDATA-2594] Do not add InvertedIndex in Encoding list for non-sort 
dimension column #2768

Do not add InvertedIndex to the Encoding list for non-sort dimension columns.

This closes #2768


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/18fbdfc4
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/18fbdfc4
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/18fbdfc4

Branch: refs/heads/master
Commit: 18fbdfc409dc14812c9f384c437a793e9293b32b
Parents: 8fbd4a5
Author: Jacky Li 
Authored: Wed Sep 26 21:31:35 2018 +0800
Committer: kumarvishal09 
Committed: Thu Oct 4 16:57:57 2018 +0530

--
 .../carbondata/core/metadata/schema/table/TableSchemaBuilder.java  | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/18fbdfc4/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/TableSchemaBuilder.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/TableSchemaBuilder.java
 
b/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/TableSchemaBuilder.java
index f1be5ca..b5ce725 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/TableSchemaBuilder.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/TableSchemaBuilder.java
@@ -224,7 +224,7 @@ public class TableSchemaBuilder {
 }
   }
 }
-if (newColumn.isDimensionColumn()) {
+if (newColumn.isDimensionColumn() && newColumn.isSortColumn()) {
   newColumn.setUseInvertedIndex(true);
 }
 if (field.getDataType().isComplexType()) {



carbondata git commit: [HOTFIX] Fixed S3 metrics issue.

2018-10-03 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master 2081bc87a -> 7d1fcb309


[HOTFIX] Fixed S3 metrics issue.

Problem: When data is read from S3, the reported bytes read are larger than the
total size of the carbon data.
Reason: carbondata uses dataInputStream.skip, but the S3 interface does not handle
it properly: it reads in a loop and reads more data than required.
Solution: Use FSDataInputStream.seek instead of skip to fix this issue.
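
For illustration, a minimal sketch (not the ThriftReader code itself) of positioning
a Hadoop FSDataInputStream with seek; on S3, skip may be satisfied by actually
reading the skipped bytes, which inflates the read metrics:

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

def openAtOffset(filePath: String, offset: Long) = {
  val fs = FileSystem.get(new Configuration())
  val in = fs.open(new Path(filePath)) // FSDataInputStream
  in.seek(offset)                      // repositions without counting skipped bytes as reads
  in
}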

This closes #2789


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/7d1fcb30
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/7d1fcb30
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/7d1fcb30

Branch: refs/heads/master
Commit: 7d1fcb3092a1e9da6c49f17c63c6217892e9e531
Parents: 2081bc8
Author: ravipesala 
Authored: Fri Sep 28 18:29:08 2018 +0530
Committer: kumarvishal09 
Committed: Wed Oct 3 16:08:49 2018 +0530

--
 .../datastore/filesystem/AbstractDFSCarbonFile.java |  7 +--
 .../apache/carbondata/core/reader/ThriftReader.java | 16 ++--
 2 files changed, 11 insertions(+), 12 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/7d1fcb30/core/src/main/java/org/apache/carbondata/core/datastore/filesystem/AbstractDFSCarbonFile.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/datastore/filesystem/AbstractDFSCarbonFile.java
 
b/core/src/main/java/org/apache/carbondata/core/datastore/filesystem/AbstractDFSCarbonFile.java
index b1e476b..c764430 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/datastore/filesystem/AbstractDFSCarbonFile.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/datastore/filesystem/AbstractDFSCarbonFile.java
@@ -327,8 +327,11 @@ public abstract class AbstractDFSCarbonFile implements 
CarbonFile {
   CompressionCodec codec = new 
CompressionCodecFactory(hadoopConf).getCodecByName(codecName);
   inputStream = codec.createInputStream(inputStream);
 }
-
-return new DataInputStream(new BufferedInputStream(inputStream));
+if (bufferSize <= 0 && inputStream instanceof FSDataInputStream) {
+  return (DataInputStream) inputStream;
+} else {
+  return new DataInputStream(new BufferedInputStream(inputStream));
+}
   }
 
   /**

http://git-wip-us.apache.org/repos/asf/carbondata/blob/7d1fcb30/core/src/main/java/org/apache/carbondata/core/reader/ThriftReader.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/reader/ThriftReader.java 
b/core/src/main/java/org/apache/carbondata/core/reader/ThriftReader.java
index 48d8345..f5ecda6 100644
--- a/core/src/main/java/org/apache/carbondata/core/reader/ThriftReader.java
+++ b/core/src/main/java/org/apache/carbondata/core/reader/ThriftReader.java
@@ -25,6 +25,7 @@ import org.apache.carbondata.core.datastore.impl.FileFactory;
 import org.apache.carbondata.core.util.CarbonUtil;
 
 import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
 import org.apache.thrift.TBase;
 import org.apache.thrift.TException;
 import org.apache.thrift.protocol.TCompactProtocol;
@@ -36,10 +37,6 @@ import org.apache.thrift.transport.TIOStreamTransport;
  */
 public class ThriftReader {
   /**
-   * buffer size
-   */
-  private static final int bufferSize = 2048;
-  /**
* File containing the objects.
*/
   private String fileName;
@@ -101,7 +98,7 @@ public class ThriftReader {
   public void open() throws IOException {
 Configuration conf = configuration != null ? configuration : 
FileFactory.getConfiguration();
 FileFactory.FileType fileType = FileFactory.getFileType(fileName);
-dataInputStream = FileFactory.getDataInputStream(fileName, fileType, 
bufferSize, conf);
+dataInputStream = FileFactory.getDataInputStream(fileName, fileType, conf);
 binaryIn = new TCompactProtocol(new TIOStreamTransport(dataInputStream));
   }
 
@@ -109,7 +106,9 @@ public class ThriftReader {
* This method will set the position of stream from where data has to be read
*/
   public void setReadOffset(long bytesToSkip) throws IOException {
-if (dataInputStream.skip(bytesToSkip) != bytesToSkip) {
+if (dataInputStream instanceof FSDataInputStream) {
+  ((FSDataInputStream)dataInputStream).seek(bytesToSkip);
+} else if (dataInputStream.skip(bytesToSkip) != bytesToSkip) {
   throw new IOException("It doesn't set the offset properly");
 }
   }
@@ -118,10 +117,7 @@ public class ThriftReader {
* Checks if another objects is available by attempting to read another byte 
from the stream.
*/
   public boolean hasNext() throws IOException {
-  

carbondata git commit: [CARBONDATA-2978] Fixed JVM crash issue when insert into carbon table from other carbon table

2018-09-28 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master c01636163 -> 9ae91cc5a


[CARBONDATA-2978] Fixed JVM crash issue when insert into carbon table from 
other carbon table

Problem:
When data is inserted from one carbon table into another carbon table and unsafe
load and unsafe query are enabled, a JVM crash happens.
Reason: When the insert happens from one carbon table into another, both sides run
in the same task and thread, so they get the same task id, and the unsafe memory
manager tries to release all memory acquired by the task even though the load is
still running on that task.

Solution:
Check the registered listeners and ignore the cache clearing in that case.
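
As a rough sketch only (the actual fix is in the new task completion listener
classes of this PR), the idea is to decide at task completion whether the task's
unsafe memory may be freed; canFreeUnsafeMemory and freeMemoryForTask below are
hypothetical stand-ins, not existing CarbonData APIs:

import org.apache.spark.TaskContext
import org.apache.spark.util.TaskCompletionListener

def registerCleanup(canFreeUnsafeMemory: () => Boolean,
    freeMemoryForTask: Long => Unit): Unit = {
  TaskContext.get().addTaskCompletionListener(new TaskCompletionListener {
    override def onTaskCompletion(context: TaskContext): Unit = {
      // Free the task's unsafe memory only when no other carbon operation
      // (e.g. a load running inside the same task) still needs it.
      if (canFreeUnsafeMemory()) {
        freeMemoryForTask(context.taskAttemptId())
      }
    }
  })
}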

This closes #2773


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/9ae91cc5
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/9ae91cc5
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/9ae91cc5

Branch: refs/heads/master
Commit: 9ae91cc5a9d683ef54550cfe7e65c4d63d5e5a24
Parents: c016361
Author: ravipesala 
Authored: Wed Sep 26 23:04:59 2018 +0530
Committer: kumarvishal09 
Committed: Fri Sep 28 19:51:06 2018 +0530

--
 .../hadoop/api/CarbonTableOutputFormat.java | 35 +
 .../InsertIntoNonCarbonTableTestCase.scala  | 79 +++-
 .../carbondata/spark/rdd/CarbonScanRDD.scala| 76 ---
 .../rdd/InsertTaskCompletionListener.scala  |  4 +-
 .../spark/rdd/QueryTaskCompletionListener.scala |  4 +-
 .../datasources/SparkCarbonFileFormat.scala | 23 +-
 .../CarbonTaskCompletionListener.scala  | 72 ++
 7 files changed, 246 insertions(+), 47 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/9ae91cc5/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java
--
diff --git 
a/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java
 
b/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java
index 28817e9..762983b 100644
--- 
a/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java
+++ 
b/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java
@@ -424,6 +424,8 @@ public class CarbonTableOutputFormat extends 
FileOutputFormathttp://git-wip-us.apache.org/repos/asf/carbondata/blob/9ae91cc5/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/insertQuery/InsertIntoNonCarbonTableTestCase.scala
--
diff --git 
a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/insertQuery/InsertIntoNonCarbonTableTestCase.scala
 
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/insertQuery/InsertIntoNonCarbonTableTestCase.scala
index a745672..a3fb11c 100644
--- 
a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/insertQuery/InsertIntoNonCarbonTableTestCase.scala
+++ 
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/insertQuery/InsertIntoNonCarbonTableTestCase.scala
@@ -18,10 +18,13 @@
  */
 package org.apache.carbondata.spark.testsuite.insertQuery
 
-import org.apache.spark.sql.Row
+import org.apache.spark.sql.{Row, SaveMode}
 import org.apache.spark.sql.test.util.QueryTest
 import org.scalatest.BeforeAndAfterAll
 
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+
 
 class InsertIntoNonCarbonTableTestCase extends QueryTest with 
BeforeAndAfterAll {
   override def beforeAll {
@@ -64,6 +67,8 @@ class InsertIntoNonCarbonTableTestCase extends QueryTest with 
BeforeAndAfterAll
   
"Latest_webTypeDataVerNumber,Latest_operatorsVersion,Latest_phonePADPartitionedVersions,"
 +
   "Latest_operatorId,gamePointDescription,gamePointId,contractNumber', " +
   "'bad_records_logger_enable'='false','bad_records_action'='FORCE')")
+
CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_UNSAFE_IN_QUERY_EXECUTION,
 "true")
+
CarbonProperties.getInstance().addProperty(CarbonCommonConstants.ENABLE_UNSAFE_COLUMN_PAGE,
 "true")
   }
 
   test("insert into hive") {
@@ -102,7 +107,79 @@ class InsertIntoNonCarbonTableTestCase extends QueryTest 
with BeforeAndAfterAll
 sql("drop table thive_cond")
   }
 
+  test("jvm crash when insert data from datasource table to session table") {
+val spark = sqlContext.sparkSession
+import spark.implicits._
+
+import scala.util.Random
+val r = new Random()
+val df = spark.sparkContext.parallelize(1 to 10)
+  .map(x => (r.nextInt(10), "n

carbondata git commit: [CARBONDATA-2970]lock object creation fix for viewFS

2018-09-27 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master 5d17ff40b -> 1b4109d5b


[CARBONDATA-2970]lock object creation fix for viewFS

Problem
When the default FS is set to ViewFS, drop table and load fail with exceptions
saying failed to get locks such as meta.lock and tablestatus.lock.
This is because when getting the lock type object we were not checking for ViewFS,
so the path was treated as a local file system and acquiring the lock failed.

Solution
Also check for ViewFS when trying to get the lock object.
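
A minimal sketch of the idea in plain Scala (not the CarbonLockFactory code; the
lock type names are illustrative): the lock implementation is chosen from the path
prefix, and viewfs:// paths should resolve the same way as hdfs:// paths:

def lockTypeFor(absoluteLockPath: String): String =
  if (absoluteLockPath.startsWith("s3a://") ||
      absoluteLockPath.startsWith("s3n://") ||
      absoluteLockPath.startsWith("s3://")) {
    "S3LOCK"
  } else if (absoluteLockPath.startsWith("hdfs://") ||
             absoluteLockPath.startsWith("viewfs://")) {
    "HDFSLOCK"   // viewfs:// now takes the HDFS lock path as well
  } else {
    "LOCALLOCK"
  }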

This closes #2762


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/1b4109d5
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/1b4109d5
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/1b4109d5

Branch: refs/heads/master
Commit: 1b4109d5b2badc0c10d5522502bd799c6325263c
Parents: 5d17ff4
Author: akashrn5 
Authored: Tue Sep 25 18:59:04 2018 +0530
Committer: kumarvishal09 
Committed: Thu Sep 27 16:46:11 2018 +0530

--
 .../java/org/apache/carbondata/core/locks/CarbonLockFactory.java  | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/1b4109d5/core/src/main/java/org/apache/carbondata/core/locks/CarbonLockFactory.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/locks/CarbonLockFactory.java 
b/core/src/main/java/org/apache/carbondata/core/locks/CarbonLockFactory.java
index 91677a6..79bad6c 100644
--- a/core/src/main/java/org/apache/carbondata/core/locks/CarbonLockFactory.java
+++ b/core/src/main/java/org/apache/carbondata/core/locks/CarbonLockFactory.java
@@ -71,7 +71,8 @@ public class CarbonLockFactory {
   lockTypeConfigured = CarbonCommonConstants.CARBON_LOCK_TYPE_S3;
   return new S3FileLock(absoluteLockPath,
   lockFile);
-} else if 
(absoluteLockPath.startsWith(CarbonCommonConstants.HDFSURL_PREFIX)) {
+} else if 
(absoluteLockPath.startsWith(CarbonCommonConstants.HDFSURL_PREFIX) || 
absoluteLockPath
+.startsWith(CarbonCommonConstants.VIEWFSURL_PREFIX)) {
   lockTypeConfigured = CarbonCommonConstants.CARBON_LOCK_TYPE_HDFS;
   return new HdfsFileLock(absoluteLockPath, lockFile);
 } else {



carbondata git commit: [CARBONDATA-2968] Single pass load fails 2nd time in Spark submit execution due to port binding error

2018-09-26 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master e07df44a1 -> 13ecc9e7a


[CARBONDATA-2968] Single pass load fails 2nd time in Spark submit execution due 
to port binding error

Problem: In a secure cluster setup, single pass load fails in spark-submit
after beeline has been used.
Solution: It was happening because the port was not getting updated, so the server
never looked for the next free port; the port variable was not changing. Modified
that part and added a log to display the port number.
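
A minimal sketch of the intended retry loop, where bind stands in for the actual
transport bind call (a hypothetical helper, not the dictionary server code):

def bindToFreePort(basePort: Int, bind: Int => Unit, attempts: Int = 10): Int = {
  var port = basePort
  for (i <- 0 until attempts) {
    try {
      bind(port)
      return port                 // bound successfully on this port
    } catch {
      case e: Exception =>
        // log the failing port so the attempted port numbers are visible
        println(s"Failed to bind to port: $port")
        if (i == attempts - 1) {
          throw new RuntimeException("Could not bind to any port", e)
        }
        port += 1                 // try the next port on the next attempt
    }
  }
  port
}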

This closes #2760


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/13ecc9e7
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/13ecc9e7
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/13ecc9e7

Branch: refs/heads/master
Commit: 13ecc9e7a0a42ebf2f8417814c20474f3ce489f1
Parents: e07df44
Author: shardul-cr7 
Authored: Tue Sep 25 19:55:19 2018 +0530
Committer: kumarvishal09 
Committed: Wed Sep 26 14:16:21 2018 +0530

--
 .../core/dictionary/server/NonSecureDictionaryServer.java  | 3 ++-
 .../spark/dictionary/server/SecureDictionaryServer.java| 6 --
 2 files changed, 6 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/13ecc9e7/core/src/main/java/org/apache/carbondata/core/dictionary/server/NonSecureDictionaryServer.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/dictionary/server/NonSecureDictionaryServer.java
 
b/core/src/main/java/org/apache/carbondata/core/dictionary/server/NonSecureDictionaryServer.java
index 95f3d69..dc2d211 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/dictionary/server/NonSecureDictionaryServer.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/dictionary/server/NonSecureDictionaryServer.java
@@ -109,6 +109,7 @@ public class NonSecureDictionaryServer extends 
AbstractDictionaryServer
 });
 bootstrap.childOption(ChannelOption.SO_KEEPALIVE, true);
 String hostToBind = findLocalIpAddress(LOGGER);
+//iteratively listening to newports
 InetSocketAddress address = hostToBind == null ?
 new InetSocketAddress(newPort) :
 new InetSocketAddress(hostToBind, newPort);
@@ -119,7 +120,7 @@ public class NonSecureDictionaryServer extends 
AbstractDictionaryServer
 this.host = hostToBind;
 break;
   } catch (Exception e) {
-LOGGER.error(e, "Dictionary Server Failed to bind to port:");
+LOGGER.error(e, "Dictionary Server Failed to bind to port:" + newPort);
 if (i == 9) {
   throw new RuntimeException("Dictionary Server Could not bind to any 
port");
 }

http://git-wip-us.apache.org/repos/asf/carbondata/blob/13ecc9e7/integration/spark-common/src/main/java/org/apache/carbondata/spark/dictionary/server/SecureDictionaryServer.java
--
diff --git 
a/integration/spark-common/src/main/java/org/apache/carbondata/spark/dictionary/server/SecureDictionaryServer.java
 
b/integration/spark-common/src/main/java/org/apache/carbondata/spark/dictionary/server/SecureDictionaryServer.java
index f4948c4..995e520 100644
--- 
a/integration/spark-common/src/main/java/org/apache/carbondata/spark/dictionary/server/SecureDictionaryServer.java
+++ 
b/integration/spark-common/src/main/java/org/apache/carbondata/spark/dictionary/server/SecureDictionaryServer.java
@@ -143,14 +143,16 @@ public class SecureDictionaryServer extends 
AbstractDictionaryServer implements
 TransportServerBootstrap bootstrap =
 new SaslServerBootstrap(transportConf, securityManager);
 String host = findLocalIpAddress(LOGGER);
-context.createServer(host, port, 
Lists.newArrayList(bootstrap));
+//iteratively listening to newports
+context
+.createServer(host, newPort, 
Lists.newArrayList(bootstrap));
 LOGGER.audit("Dictionary Server started, Time spent " + 
(System.currentTimeMillis() - start)
 + " Listening on port " + newPort);
 this.port = newPort;
 this.host = host;
 break;
   } catch (Exception e) {
-LOGGER.error(e, "Dictionary Server Failed to bind to port:");
+LOGGER.error(e, "Dictionary Server Failed to bind to port: " + 
newPort);
 if (i == 9) {
   throw new RuntimeException("Dictionary Server Could not bind to any 
port");
 }



carbondata git commit: [CARBONDATA-2962]Even after carbon file is copied to targetfolder(local/hdfs), carbon files is not deleted from temp directory

2018-09-26 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master 2ab2254be -> 49f67153a


[CARBONDATA-2962]Even after carbon file is copied to targetfolder(local/hdfs), 
carbon files is not deleted from temp directory

Problem:
Even after the carbon file is copied to the target folder (local/HDFS), the carbon
files are not deleted from the temp directory.
Solution:
After copying the carbon data and index files from the temp directory, delete those
files.
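
A minimal sketch of the intended ordering using plain java.nio (the real code goes
through CarbonUtil.copyCarbonDataFileToCarbonStorePath and FileFactory.deleteFile):
delete the temp file only after the copy to the store path has completed.

import java.nio.file.{Files, Paths, StandardCopyOption}

def copyThenDeleteTemp(tempFile: String, targetDir: String): Unit = {
  val source = Paths.get(tempFile)
  val target = Paths.get(targetDir).resolve(source.getFileName)
  Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING) // copy to target first
  Files.delete(source)                                            // then remove the temp file
}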

This closes #2752


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/49f67153
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/49f67153
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/49f67153

Branch: refs/heads/master
Commit: 49f67153a21e5a0cb5705adeb0f056eef4d3ed25
Parents: 2ab2254
Author: Indhumathi27 
Authored: Mon Sep 24 12:28:47 2018 +0530
Committer: kumarvishal09 
Committed: Wed Sep 26 12:35:24 2018 +0530

--
 .../store/writer/AbstractFactDataWriter.java| 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/49f67153/processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java
--
diff --git 
a/processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java
 
b/processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java
index ad0e8e0..4afb3ef 100644
--- 
a/processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java
+++ 
b/processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java
@@ -270,12 +270,18 @@ public abstract class AbstractFactDataWriter implements 
CarbonFactDataWriter {
 notifyDataMapBlockEnd();
 CarbonUtil.closeStreams(this.fileOutputStream, this.fileChannel);
 if (!enableDirectlyWriteData2Hdfs) {
-  if (copyInCurrentThread) {
-CarbonUtil.copyCarbonDataFileToCarbonStorePath(carbonDataFileTempPath,
-model.getCarbonDataDirectoryPath(), fileSizeInBytes);
-  } else {
-executorServiceSubmitList.add(executorService.submit(
-new CompleteHdfsBackendThread(carbonDataFileTempPath)));
+  try {
+if (copyInCurrentThread) {
+  
CarbonUtil.copyCarbonDataFileToCarbonStorePath(carbonDataFileTempPath,
+  model.getCarbonDataDirectoryPath(), fileSizeInBytes);
+  FileFactory
+  .deleteFile(carbonDataFileTempPath, 
FileFactory.getFileType(carbonDataFileTempPath));
+} else {
+  executorServiceSubmitList
+  .add(executorService.submit(new 
CompleteHdfsBackendThread(carbonDataFileTempPath)));
+}
+  } catch (IOException e) {
+LOGGER.error("Failed to delete carbondata file from temp location" + 
e.getMessage());
   }
 }
   }
@@ -405,6 +411,7 @@ public abstract class AbstractFactDataWriter implements 
CarbonFactDataWriter {
   CarbonUtil
   .copyCarbonDataFileToCarbonStorePath(indexFileName, 
model.getCarbonDataDirectoryPath(),
   fileSizeInBytes);
+  FileFactory.deleteFile(indexFileName, 
FileFactory.getFileType(indexFileName));
 }
   }
 
@@ -470,6 +477,7 @@ public abstract class AbstractFactDataWriter implements 
CarbonFactDataWriter {
 public Void call() throws Exception {
   CarbonUtil.copyCarbonDataFileToCarbonStorePath(fileName, 
model.getCarbonDataDirectoryPath(),
   fileSizeInBytes);
+  FileFactory.deleteFile(fileName, FileFactory.getFileType(fileName));
   return null;
 }
   }



carbondata git commit: [CARBONDATA-2960] SDK Reader fix with projection columns

2018-09-25 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master e3eb03054 -> 786db2171


[CARBONDATA-2960] SDK Reader fix with projection columns

The SDK Reader was not working when all projection columns were given.
Also added an exception for complex child column projections.
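
For context, a hedged usage sketch of the SDK reader with an explicit projection;
the table path, table name and column names are placeholders, and exact builder
signatures vary by release:

import org.apache.carbondata.sdk.file.CarbonReader

val reader = CarbonReader.builder("/tmp/carbon_sdk_output", "_temp")
  .projection(Array("name", "age"))   // drop this call to project all columns
  .build[Array[AnyRef]]()
while (reader.hasNext) {
  val row = reader.readNextRow        // one row with the projected columns
  println(row.mkString(","))
}
reader.close()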


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/786db217
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/786db217
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/786db217

Branch: refs/heads/master
Commit: 786db217120e1d341e9a7d00ce9576dccd1d96af
Parents: e3eb030
Author: Manish Nalla 
Authored: Fri Sep 21 19:24:01 2018 +0530
Committer: kumarvishal09 
Committed: Tue Sep 25 12:38:52 2018 +0530

--
 .../hadoop/api/CarbonInputFormat.java   | 13 -
 ...tNonTransactionalCarbonTableForMapType.scala | 53 
 .../sdk/file/CarbonReaderBuilder.java   |  8 +++
 3 files changed, 73 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/786db217/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java
--
diff --git 
a/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java 
b/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java
index 8183335..db93cbd 100644
--- 
a/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java
+++ 
b/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java
@@ -775,9 +775,20 @@ m filterExpression
   public String[] projectAllColumns(CarbonTable carbonTable) {
 List colList = 
carbonTable.getTableInfo().getFactTable().getListOfColumns();
 List projectColumn = new ArrayList<>();
+// childCount will recursively count the number of children for any parent
+// complex type and add just the parent column name while skipping the 
child columns.
+int childDimCount = 0;
 for (ColumnSchema cols : colList) {
   if (cols.getSchemaOrdinal() != -1) {
-projectColumn.add(cols.getColumnName());
+if (childDimCount == 0) {
+  projectColumn.add(cols.getColumnName());
+}
+if (childDimCount > 0) {
+  childDimCount--;
+}
+if (cols.getDataType().isComplexType()) {
+  childDimCount += cols.getNumberOfChild();
+}
   }
 }
 String[] projectionColumns = new String[projectColumn.size()];

http://git-wip-us.apache.org/repos/asf/carbondata/blob/786db217/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTableForMapType.scala
--
diff --git 
a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTableForMapType.scala
 
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTableForMapType.scala
index a6bc224..b060ec1 100644
--- 
a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTableForMapType.scala
+++ 
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTableForMapType.scala
@@ -20,15 +20,20 @@ package org.apache.carbondata.spark.testsuite.createTable
 import java.io.File
 
 import org.apache.commons.io.FileUtils
+import org.apache.hadoop.conf.Configuration
 import org.apache.spark.sql.Row
 import org.apache.spark.sql.test.util.QueryTest
 import org.scalatest.BeforeAndAfterAll
 
+import org.apache.carbondata.sdk.file.CarbonReader
+
 /**
  * test cases for SDK complex map data type support
  */
 class TestNonTransactionalCarbonTableForMapType extends QueryTest with 
BeforeAndAfterAll {
 
+  private val conf: Configuration = new Configuration(false)
+
   private val nonTransactionalCarbonTable = new TestNonTransactionalCarbonTable
   private val writerPath = nonTransactionalCarbonTable.writerPath
 
@@ -401,6 +406,54 @@ class TestNonTransactionalCarbonTableForMapType extends 
QueryTest with BeforeAnd
 dropSchema
   }
 
+  test("SDK Reader Without Projection Columns"){
+deleteDirectory(writerPath)
+val mySchema =
+  """
+|{
+|  "name": "address",
+|  "type": "record",
+|  "fields": [
+|{
+|  "name": "name",
+|  "type": "string"
+|},
+|{
+|  "name": "age",
+|  "type": "

carbondata git commit: [CARBONDATA-2954]Fix error when create external table command fired if path already exists

2018-09-24 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master 25d949cfa -> 759cb31f6


[CARBONDATA-2954]Fix error when create external table command fired if path 
already exists

Problem: Creating an external table and providing a valid location that contains
some empty directories along with .carbondata files was giving an
"operation not allowed: invalid datapath provided" error.

Solution: It happened because, when the location contained an empty directory, the
getFilePathExternalFilePath method in CarbonUtil.java returned null due to that
empty directory. Made a slight modification so the search continues with the
remaining entries.
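
As an illustration of the changed behaviour, a small standalone Scala sketch (not
the CarbonUtil code) that keeps searching sibling entries instead of giving up on
the first empty sub-directory:

import java.io.File

def findFirstDataFile(dir: File): Option[File] = {
  val entries = Option(dir.listFiles()).getOrElse(Array.empty[File]).toSeq
  entries.find(f => f.isFile && f.getName.endsWith(".carbondata"))
    .orElse {
      // keep checking the remaining directories even if some of them are empty
      entries.filter(_.isDirectory)
        .foldLeft(Option.empty[File])((found, d) => found.orElse(findFirstDataFile(d)))
    }
}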

This closes #2739


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/759cb31f
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/759cb31f
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/759cb31f

Branch: refs/heads/master
Commit: 759cb31f64c22b1dd67b1b90e2edb89380f36094
Parents: 25d949c
Author: shardul-cr7 
Authored: Thu Sep 20 19:42:54 2018 +0530
Committer: kumarvishal09 
Committed: Mon Sep 24 15:13:55 2018 +0530

--
 .../core/metadata/schema/table/CarbonTable.java |  8 +++-
 .../org/apache/carbondata/core/util/CarbonUtil.java |  9 -
 .../createTable/TestNonTransactionalCarbonTable.scala   | 10 ++
 .../apache/carbondata/sdk/file/CarbonReaderTest.java| 12 
 4 files changed, 37 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/759cb31f/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java
 
b/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java
index c606063..3d04cca 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java
@@ -261,7 +261,13 @@ public class CarbonTable implements Serializable {
 CarbonFile[] carbonFiles = tablePath.listFiles();
 for (CarbonFile carbonFile : carbonFiles) {
   if (carbonFile.isDirectory()) {
-return getFirstIndexFile(carbonFile);
+// if the list has directories that doesn't contain index files,
+// continue checking other files/directories in the list.
+if (getFirstIndexFile(carbonFile) == null) {
+  continue;
+} else {
+  return getFirstIndexFile(carbonFile);
+}
   } else if 
(carbonFile.getName().endsWith(CarbonTablePath.INDEX_FILE_EXT)) {
 return carbonFile;
   }

http://git-wip-us.apache.org/repos/asf/carbondata/blob/759cb31f/core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java
--
diff --git a/core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java 
b/core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java
index 5a85b14..03054bf 100644
--- a/core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java
+++ b/core/src/main/java/org/apache/carbondata/core/util/CarbonUtil.java
@@ -2230,9 +2230,16 @@ public final class CarbonUtil {
   if (dataFile.getName().endsWith(CarbonCommonConstants.FACT_FILE_EXT)) {
 return dataFile.getAbsolutePath();
   } else if (dataFile.isDirectory()) {
-return getFilePathExternalFilePath(dataFile.getAbsolutePath(), 
configuration);
+// if the list has directories that doesn't contain data files,
+// continue checking other files/directories in the list.
+if (getFilePathExternalFilePath(dataFile.getAbsolutePath(), 
configuration) == null) {
+  continue;
+} else {
+  return getFilePathExternalFilePath(dataFile.getAbsolutePath(), 
configuration);
+}
   }
 }
+//returning null only if the path doesn't have data files.
 return null;
   }
 

http://git-wip-us.apache.org/repos/asf/carbondata/blob/759cb31f/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala
--
diff --git 
a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala
 
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala
index b80a2f2..f6d12ab 100644
--- 
a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestNonTransactionalCarbonTable.scala
+++ 
b/inte

carbondata git commit: [HOTFIX] Fix partition filter slow issue #2740

2018-09-24 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master ed8564421 -> 25d949cfa


[HOTFIX] Fix partition filter slow issue #2740

Problem: In FileSourceScanExec all the partition files are listed from
CatalogFileIndex, which causes an extra job to list files for every query.
Solution: We don't want any file listing there in the carbon session partition
flow, so make the CatalogFileIndex act as a dummy.

This closes #2740


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/25d949cf
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/25d949cf
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/25d949cf

Branch: refs/heads/master
Commit: 25d949cfa82c9a29fe0e54ddbe54e890cc865b7f
Parents: ed85644
Author: ravipesala 
Authored: Thu Sep 20 21:21:47 2018 +0530
Committer: kumarvishal09 
Committed: Mon Sep 24 12:54:19 2018 +0530

--
 .../execution/datasources/CarbonFileIndex.scala  | 14 ++
 .../strategy/CarbonLateDecodeStrategy.scala  | 15 ++-
 2 files changed, 24 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/25d949cf/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/CarbonFileIndex.scala
--
diff --git 
a/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/CarbonFileIndex.scala
 
b/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/CarbonFileIndex.scala
index 3a650ec..c57528f 100644
--- 
a/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/CarbonFileIndex.scala
+++ 
b/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/CarbonFileIndex.scala
@@ -51,6 +51,10 @@ class CarbonFileIndex(
 fileIndex: FileIndex)
   extends FileIndex with AbstractCarbonFileIndex {
 
+  // When this flag is set it just returns empty files during pruning. It is 
needed for carbon
+  // session partition flow as we handle directly through datamap pruining.
+  private var actAsDummy = false
+
   override def rootPaths: Seq[Path] = fileIndex.rootPaths
 
   override def inputFiles: Array[String] = fileIndex.inputFiles
@@ -70,6 +74,9 @@ class CarbonFileIndex(
*/
   override def listFiles(partitionFilters: Seq[Expression],
   dataFilters: Seq[Expression]): Seq[PartitionDirectory] = {
+if (actAsDummy) {
+  return Seq.empty
+}
 val method = fileIndex.getClass.getMethods.find(_.getName == 
"listFiles").get
 val directories =
   method.invoke(
@@ -143,11 +150,18 @@ class CarbonFileIndex(
   }
 
   override def listFiles(filters: Seq[Expression]): Seq[PartitionDirectory] = {
+if (actAsDummy) {
+  return Seq.empty
+}
 val method = fileIndex.getClass.getMethods.find(_.getName == 
"listFiles").get
 val directories =
   method.invoke(fileIndex, filters).asInstanceOf[Seq[PartitionDirectory]]
 prune(filters, directories)
   }
+
+  def setDummy(actDummy: Boolean): Unit = {
+actAsDummy = actDummy
+  }
 }
 
 /**

http://git-wip-us.apache.org/repos/asf/carbondata/blob/25d949cf/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
--
diff --git 
a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
 
b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
index 8f128fe..f0184cd 100644
--- 
a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
+++ 
b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
@@ -34,7 +34,7 @@ import org.apache.spark.sql.optimizer.{CarbonDecoderRelation, 
CarbonFilters}
 import org.apache.spark.sql.sources.{BaseRelation, Filter}
 import org.apache.spark.sql.types._
 import org.apache.spark.sql.CarbonExpressions.{MatchCast => Cast}
-import 
org.apache.spark.sql.carbondata.execution.datasources.CarbonSparkDataSourceUtil
+import org.apache.spark.sql.carbondata.execution.datasources.{CarbonFileIndex, 
CarbonSparkDataSourceUtil}
 import org.apache.spark.util.CarbonReflectionUtils
 
 import 
org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException
@@ -704,11 +704,16 @@ private[sql] class CarbonLateDecodeStrategy extends 
SparkStrategy {
 val sparkSession = relation.relation.sqlContext.sparkSession
 relation.catalogTable match {
   case Some(catalogTable) =>
-HadoopFsRelation(
+val fi

carbondata git commit: [CARBONDATA-2958] Compaction with CarbonProperty 'carbon.enable.page.level.reader.in.compaction' enabled fails as Compressor is null

2018-09-24 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master 8320918e5 -> ed8564421


[CARBONDATA-2958] Compaction with CarbonProperty 
'carbon.enable.page.level.reader.in.compaction' enabled fails as Compressor is 
null

Problem:
When the CarbonProperty 'carbon.enable.page.level.reader.in.compaction' is enabled,
compaction fails with a NullPointerException because the compressor is null.
Solution:
Set the compressor from the page metadata.

This closes #2745


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/ed856442
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/ed856442
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/ed856442

Branch: refs/heads/master
Commit: ed856442166a96d1b414336945fb1dbc1d514c4a
Parents: 8320918
Author: Indhumathi27 
Authored: Fri Sep 21 15:24:39 2018 +0530
Committer: kumarvishal09 
Committed: Mon Sep 24 12:24:28 2018 +0530

--
 ...essedDimChunkFileBasedPageLevelReaderV3.java |  7 +++
 ...andardPartitionTableCompactionTestCase.scala | 22 
 2 files changed, 29 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/ed856442/core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/CompressedDimChunkFileBasedPageLevelReaderV3.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/CompressedDimChunkFileBasedPageLevelReaderV3.java
 
b/core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/CompressedDimChunkFileBasedPageLevelReaderV3.java
index e69984b..6efaf8a 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/CompressedDimChunkFileBasedPageLevelReaderV3.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/datastore/chunk/reader/dimension/v3/CompressedDimChunkFileBasedPageLevelReaderV3.java
@@ -23,8 +23,10 @@ import java.nio.ByteBuffer;
 import org.apache.carbondata.core.datastore.FileReader;
 import org.apache.carbondata.core.datastore.chunk.DimensionColumnPage;
 import org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk;
+import org.apache.carbondata.core.datastore.compression.CompressorFactory;
 import org.apache.carbondata.core.memory.MemoryException;
 import org.apache.carbondata.core.metadata.blocklet.BlockletInfo;
+import org.apache.carbondata.core.util.CarbonMetadataUtil;
 import org.apache.carbondata.core.util.CarbonUtil;
 import org.apache.carbondata.format.DataChunk2;
 import org.apache.carbondata.format.DataChunk3;
@@ -146,6 +148,11 @@ public class CompressedDimChunkFileBasedPageLevelReaderV3
 DataChunk3 dataChunk3 = dimensionRawColumnChunk.getDataChunkV3();
 
 pageMetadata = dataChunk3.getData_chunk_list().get(pageNumber);
+
+if (compressor == null) {
+  this.compressor = CompressorFactory.getInstance().getCompressor(
+  
CarbonMetadataUtil.getCompressorNameFromChunkMeta(pageMetadata.getChunk_meta()));
+}
 // calculating the start point of data
 // as buffer can contain multiple column data, start point will be 
datachunkoffset +
 // data chunk length + page offset

http://git-wip-us.apache.org/repos/asf/carbondata/blob/ed856442/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableCompactionTestCase.scala
--
diff --git 
a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableCompactionTestCase.scala
 
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableCompactionTestCase.scala
index 33e761f..23c2aa0 100644
--- 
a/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableCompactionTestCase.scala
+++ 
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/standardpartition/StandardPartitionTableCompactionTestCase.scala
@@ -16,6 +16,7 @@
  */
 package org.apache.carbondata.spark.testsuite.standardpartition
 
+import org.apache.spark.sql.Row
 import org.apache.spark.sql.test.util.QueryTest
 import org.scalatest.BeforeAndAfterAll
 
@@ -183,6 +184,27 @@ class StandardPartitionTableCompactionTestCase extends 
QueryTest with BeforeAndA
 sql(s"""alter table compactionupdatepartition compact 'major'""").collect
   }
 
+  test("test compaction when 'carbon.enable.page.level.reader.in.compaction' 
is set to true") {
+sql("DROP TABLE IF EXISTS originTable")
+  

carbondata git commit: [CARBONDATA-2950]alter add column of hive table fails from carbon for spark versions above 2.1

2018-09-21 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master f962e41b7 -> 8320918e5


[CARBONDATA-2950]alter add column of hive table fails from carbon for spark 
versions above 2.1

Problem:
Spark does not support add columns in spark-2.1, but it is supported in 2.2 and
above. When add column is fired for a hive table in a carbon session on Spark
versions above 2.1, it throws an error saying the alter operation is unsupported
on hive tables.

Solution:
When alter add columns is fired on a hive table for spark-2.2 and above, it should
not throw any exception and should pass.

This closes #2735


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/8320918e
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/8320918e
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/8320918e

Branch: refs/heads/master
Commit: 8320918e55b393fedc946e4543843a72712d9199
Parents: f962e41
Author: akashrn5 
Authored: Wed Sep 19 19:51:39 2018 +0530
Committer: kumarvishal09 
Committed: Fri Sep 21 21:55:06 2018 +0530

--
 .../sdv/generated/AlterTableTestCase.scala  | 18 -
 .../lucene/LuceneFineGrainDataMapSuite.scala| 27 
 .../org/apache/carbondata/spark/util/Util.java  |  2 +-
 .../spark/util/CarbonReflectionUtils.scala  | 15 +++
 .../sql/execution/strategy/DDLStrategy.scala| 21 +--
 5 files changed, 52 insertions(+), 31 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/8320918e/integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/AlterTableTestCase.scala
--
diff --git 
a/integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/AlterTableTestCase.scala
 
b/integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/AlterTableTestCase.scala
index 4e53ea3..90fa602 100644
--- 
a/integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/AlterTableTestCase.scala
+++ 
b/integration/spark-common-cluster-test/src/test/scala/org/apache/carbondata/cluster/sdv/generated/AlterTableTestCase.scala
@@ -18,12 +18,14 @@
 
 package org.apache.carbondata.cluster.sdv.generated
 
+import org.apache.spark.SPARK_VERSION
 import org.apache.spark.sql.Row
 import org.apache.spark.sql.common.util._
-import org.apache.spark.sql.test.TestQueryExecutor
+import org.apache.spark.util.SparkUtil
 import org.scalatest.BeforeAndAfterAll
 
 import org.apache.carbondata.common.constants.LoggerAction
+import 
org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException
 import org.apache.carbondata.core.constants.CarbonCommonConstants
 import org.apache.carbondata.core.util.CarbonProperties
 
@@ -1000,6 +1002,20 @@ class AlterTableTestCase extends QueryTest with 
BeforeAndAfterAll {
  sql(s"""drop table  if exists uniqdata59""").collect
   }
 
+  test("Alter table add column for hive table for spark version above 2.1") {
+sql("drop table if exists alter_hive")
+sql("create table alter_hive(name string)")
+if(SPARK_VERSION.startsWith("2.1")) {
+  val exception = intercept[MalformedCarbonCommandException] {
+sql("alter table alter_hive add columns(add string)")
+  }
+  assert(exception.getMessage.contains("Unsupported alter operation on 
hive table"))
+} else if (SparkUtil.isSparkVersionXandAbove("2.2")) {
+  sql("alter table alter_hive add columns(add string)")
+  sql("insert into alter_hive select 'abc','banglore'")
+}
+  }
+
   val prop = CarbonProperties.getInstance()
   val p1 = prop.getProperty("carbon.horizontal.compaction.enable", 
CarbonCommonConstants.defaultIsHorizontalCompactionEnabled)
   val p2 = prop.getProperty("carbon.horizontal.update.compaction.threshold", 
CarbonCommonConstants.DEFAULT_UPDATE_DELTAFILE_COUNT_THRESHOLD_IUD_COMPACTION)

http://git-wip-us.apache.org/repos/asf/carbondata/blob/8320918e/integration/spark-common-test/src/test/scala/org/apache/carbondata/datamap/lucene/LuceneFineGrainDataMapSuite.scala
--
diff --git 
a/integration/spark-common-test/src/test/scala/org/apache/carbondata/datamap/lucene/LuceneFineGrainDataMapSuite.scala
 
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/datamap/lucene/LuceneFineGrainDataMapSuite.scala
index 0c6134b..2e3019a 100644
--- 
a/integration/spark-common-test/src/test/scala/org/apache/carbondata/datamap/lucene/LuceneFineGrainDataMapSuite.scala
+++ 
b/integration/spark-common-test/src/test/scala/

carbondata git commit: [CARBONDATA-2953]fixed dataload failure with sort columns and query wrong result from other session

2018-09-21 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master edfcdca0a -> f962e41b7


[CARBONDATA-2953]fixed dataload failure with sort columns and query wrong 
result from other session

Problem:
When data load is done with sort columns, it fails with an exception.
When two sessions are running in parallel, follow the below steps in session1:
drop table
create table
load data into the table
Follow the below step in session2:
query the table (select * from table limit 1); the query returns a null result
instead of the proper result.

Solution:
During sorting, the index increment for no-dictionary measure data was not
happening correctly, hence it was wrongly cast to a byte array and failed.
If the table is dropped from the first session, created again, and then queried
from another session, the metastore needs to be updated for the newly created
table; but since the database in the identifier was None, we were fetching the old
table from the default database. We need to get it from the current database
instead.

This closes #2743


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/f962e41b
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/f962e41b
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/f962e41b

Branch: refs/heads/master
Commit: f962e41b7f2c2dd29ae71ad5e1f7797e3aaec084
Parents: edfcdca
Author: akashrn5 
Authored: Thu Sep 20 15:39:01 2018 +0530
Committer: kumarvishal09 
Committed: Fri Sep 21 18:46:25 2018 +0530

--
 .../execution/command/datamap/CarbonDataMapShowCommand.scala| 2 +-
 .../scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala   | 5 -
 .../processing/loading/partition/impl/RawRowComparator.java | 2 +-
 .../sort/sortdata/IntermediateSortTempRowComparator.java| 2 +-
 .../carbondata/processing/sort/sortdata/NewRowComparator.java   | 2 +-
 5 files changed, 8 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/f962e41b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonDataMapShowCommand.scala
--
diff --git 
a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonDataMapShowCommand.scala
 
b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonDataMapShowCommand.scala
index b583a30..ae33aa8 100644
--- 
a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonDataMapShowCommand.scala
+++ 
b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/datamap/CarbonDataMapShowCommand.scala
@@ -57,8 +57,8 @@ case class CarbonDataMapShowCommand(tableIdentifier: 
Option[TableIdentifier])
 val dataMapSchemaList: util.List[DataMapSchema] = new 
util.ArrayList[DataMapSchema]()
 tableIdentifier match {
   case Some(table) =>
-Checker.validateTableExists(table.database, table.table, sparkSession)
 val carbonTable = CarbonEnv.getCarbonTable(table)(sparkSession)
+Checker.validateTableExists(table.database, table.table, sparkSession)
 if (carbonTable.hasDataMapSchema) {
   
dataMapSchemaList.addAll(carbonTable.getTableInfo.getDataMapSchemaList)
 }

http://git-wip-us.apache.org/repos/asf/carbondata/blob/f962e41b/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala
--
diff --git 
a/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala
 
b/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala
index 1840c5d..982bbee 100644
--- 
a/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala
+++ 
b/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonFileMetastore.scala
@@ -580,7 +580,10 @@ class CarbonFileMetastore extends CarbonMetaStore {
 
tableModifiedTimeStore.get(CarbonCommonConstants.DATABASE_DEFAULT_NAME))) {
 metadata.carbonTables = metadata.carbonTables.filterNot(
   table => table.getTableName.equalsIgnoreCase(tableIdentifier.table) 
&&
-
table.getDatabaseName.equalsIgnoreCase(tableIdentifier.database.getOrElse("default")))
+   table.getDatabaseName
+ .equalsIgnoreCase(tableIdentifier.database
+   
.getOrElse(SparkSession.getActiveSession.get.sessionState.catalog
+ .getCurrentDatabase)))
 updateSchemasUpdatedTime(lastModifiedTime)
 isRefreshed = true
   }

http://git-wip-us.apache.org/repos/asf/carbondata/blob/f962e41b/processing/src/main/java/org/apache/carbondata/processing/loading/partition

carbondata git commit: [HOTFIX] Correct metrics and avoid twice read when prefetch is disabled

2018-09-21 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master 817230da1 -> b04269b2b


[HOTFIX] Correct metrics and avoid twice read when prefetch is disabled

When prefetch is disabled, full scan queries read the data twice. This PR removes
the extra read.

This closes #2737


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/b04269b2
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/b04269b2
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/b04269b2

Branch: refs/heads/master
Commit: b04269b2b8d05ce21e2fb4f8ebeab668e902aba7
Parents: 817230d
Author: ravipesala 
Authored: Thu Sep 20 14:44:09 2018 +0530
Committer: kumarvishal09 
Committed: Fri Sep 21 16:46:15 2018 +0530

--
 .../carbondata/core/scan/scanner/impl/BlockletFullScanner.java| 3 ---
 1 file changed, 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/b04269b2/core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFullScanner.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFullScanner.java
 
b/core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFullScanner.java
index f61a8b1..4ec8cb6 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFullScanner.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFullScanner.java
@@ -84,9 +84,6 @@ public class BlockletFullScanner implements BlockletScanner {
 String blockletId = blockExecutionInfo.getBlockIdString() + 
CarbonCommonConstants.FILE_SEPARATOR
 + rawBlockletColumnChunks.getDataBlock().blockletIndex();
 scannedResult.setBlockletId(blockletId);
-if (!blockExecutionInfo.isPrefetchBlocklet()) {
-  readBlocklet(rawBlockletColumnChunks);
-}
 DimensionRawColumnChunk[] dimensionRawColumnChunks =
 rawBlockletColumnChunks.getDimensionRawColumnChunks();
 DimensionColumnPage[][] dimensionColumnDataChunks =



carbondata git commit: [HOTFIX] Old stores cannot read with new table infered through sdk.

2018-09-12 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master daa91c88e -> 4c692d185


[HOTFIX] Old stores cannot read with new table infered through sdk.

Problem: In old stores the column schema may be written in a different case; then
the fileformat cannot read the data, because the SDK-inferred schema always uses
lower case.
Solution: Do a case-insensitive check while comparing.
It also disables prefetch, as it is redundant for fileformat reads and the input
metrics are not captured properly when a separate thread is used.

This closes #2704


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/4c692d18
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/4c692d18
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/4c692d18

Branch: refs/heads/master
Commit: 4c692d185c4247e645d94c2d79787744c413817b
Parents: daa91c8
Author: ravipesala 
Authored: Mon Sep 10 21:11:18 2018 +0530
Committer: kumarvishal09 
Committed: Wed Sep 12 19:16:39 2018 +0530

--
 .../apache/carbondata/core/metadata/CarbonMetadata.java   |  5 +++--
 .../metadata/schema/table/AggregationDataMapSchema.java   |  4 ++--
 .../core/metadata/schema/table/column/ColumnSchema.java   |  2 +-
 .../core/scan/executor/impl/AbstractQueryExecutor.java|  6 +-
 .../core/scan/executor/util/RestructureUtil.java  |  7 ---
 .../scan/expression/logical/BinaryLogicalExpression.java  |  2 +-
 .../apache/carbondata/core/scan/filter/FilterUtil.java|  2 +-
 .../org/apache/carbondata/core/scan/model/QueryModel.java | 10 ++
 .../apache/carbondata/core/util/BlockletDataMapUtil.java  |  2 +-
 .../java/org/apache/carbondata/core/util/CarbonUtil.java  |  2 +-
 .../execution/datasources/SparkCarbonFileFormat.scala |  1 +
 11 files changed, 30 insertions(+), 13 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/4c692d18/core/src/main/java/org/apache/carbondata/core/metadata/CarbonMetadata.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/metadata/CarbonMetadata.java 
b/core/src/main/java/org/apache/carbondata/core/metadata/CarbonMetadata.java
index 3f8c12d..850f477 100644
--- a/core/src/main/java/org/apache/carbondata/core/metadata/CarbonMetadata.java
+++ b/core/src/main/java/org/apache/carbondata/core/metadata/CarbonMetadata.java
@@ -143,7 +143,7 @@ public final class CarbonMetadata {
 List listOfCarbonDims =
 carbonTable.getDimensionByTableName(carbonTable.getTableName());
 for (CarbonDimension dimension : listOfCarbonDims) {
-  if (dimension.getColumnId().equals(columnIdentifier)) {
+  if (dimension.getColumnId().equalsIgnoreCase(columnIdentifier)) {
 return dimension;
   }
   if (dimension.getNumberOfChild() > 0) {
@@ -168,7 +168,8 @@ public final class CarbonMetadata {
   private CarbonDimension getCarbonChildDimsBasedOnColIdentifier(String 
columnIdentifier,
   CarbonDimension dimension) {
 for (int i = 0; i < dimension.getNumberOfChild(); i++) {
-  if 
(dimension.getListOfChildDimensions().get(i).getColumnId().equals(columnIdentifier))
 {
+  if (dimension.getListOfChildDimensions().get(i).getColumnId()
+  .equalsIgnoreCase(columnIdentifier)) {
 return dimension.getListOfChildDimensions().get(i);
   } else if 
(dimension.getListOfChildDimensions().get(i).getNumberOfChild() > 0) {
 CarbonDimension childDim = 
getCarbonChildDimsBasedOnColIdentifier(columnIdentifier,

http://git-wip-us.apache.org/repos/asf/carbondata/blob/4c692d18/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/AggregationDataMapSchema.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/AggregationDataMapSchema.java
 
b/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/AggregationDataMapSchema.java
index 2bb6d18..c8bb5ad 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/AggregationDataMapSchema.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/metadata/schema/table/AggregationDataMapSchema.java
@@ -152,7 +152,7 @@ public class AggregationDataMapSchema extends DataMapSchema 
{
   List parentColumnTableRelations =
   columnSchema.getParentColumnTableRelations();
   if (null != parentColumnTableRelations && 
parentColumnTableRelations.size() == 1
-  && 
parentColumnTableRelations.get(0).getColumnName().equals(columName) &&
+  && 
parentColumnTableRelations.get(0).getColumnName().equalsIgnoreCase(columName) &&
   columnSchema.getColumnName().endsWith(columName)) {
 return columnSchema;
   }
@@ -198,7 +19

carbondata git commit: [CARBONDATA-2915] update document links

2018-09-10 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master a9cc43411 -> 73a5885a4


[CARBONDATA-2915] update document links

update document links

This closes #2707


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/73a5885a
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/73a5885a
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/73a5885a

Branch: refs/heads/master
Commit: 73a5885a4a4ab85aab45602bd2c6ab93f40f98dc
Parents: a9cc434
Author: Raghunandan S 
Authored: Tue Sep 11 12:59:53 2018 +0800
Committer: kumarvishal09 
Committed: Tue Sep 11 10:47:32 2018 +0530

--
 README.md | 9 -
 docs/dml-of-carbondata.md | 6 +++---
 docs/language-manual.md   | 2 +-
 3 files changed, 8 insertions(+), 9 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/73a5885a/README.md
--
diff --git a/README.md b/README.md
index 960d4e9..ba2cbf7 100644
--- a/README.md
+++ b/README.md
@@ -48,8 +48,7 @@ CarbonData is built using Apache Maven, to [build 
CarbonData](https://github.com
 * [Quick 
Start](https://github.com/apache/carbondata/blob/master/docs/quick-start-guide.md)
 * [CarbonData File 
Structure](https://github.com/apache/carbondata/blob/master/docs/file-structure-of-carbondata.md)
 * [Data 
Types](https://github.com/apache/carbondata/blob/master/docs/supported-data-types-in-carbondata.md)
-* [Data Management on 
CarbonData](https://github.com/apache/carbondata/blob/master/docs/data-management-on-carbondata.md)
-* [Cluster Installation and 
Deployment](https://github.com/apache/carbondata/blob/master/docs/installation-guide.md)
+* [Data Management on 
CarbonData](https://github.com/apache/carbondata/blob/master/docs/language-manual.md)
 * [Configuring 
Carbondata](https://github.com/apache/carbondata/blob/master/docs/configuration-parameters.md)
 * [Streaming 
Ingestion](https://github.com/apache/carbondata/blob/master/docs/streaming-guide.md)
 * [SDK 
Guide](https://github.com/apache/carbondata/blob/master/docs/sdk-guide.md)
@@ -60,9 +59,9 @@ CarbonData is built using Apache Maven, to [build 
CarbonData](https://github.com
 * [CarbonData Lucene 
DataMap](https://github.com/apache/carbondata/blob/master/docs/datamap/lucene-datamap-guide.md)
 * [CarbonData Pre-aggregate 
DataMap](https://github.com/apache/carbondata/blob/master/docs/datamap/preaggregate-datamap-guide.md)
 * [CarbonData Timeseries 
DataMap](https://github.com/apache/carbondata/blob/master/docs/datamap/timeseries-datamap-guide.md)
+* [Performance 
Tuning](https://github.com/apache/carbondata/blob/master/docs/performance-tuning.md)
 * [FAQ](https://github.com/apache/carbondata/blob/master/docs/faq.md)
-* [Trouble 
Shooting](https://github.com/apache/carbondata/blob/master/docs/troubleshooting.md)
-* [Useful 
Tips](https://github.com/apache/carbondata/blob/master/docs/useful-tips-on-carbondata.md)
+* [Use 
Cases](https://github.com/apache/carbondata/blob/master/docs/usecases.md)
 
 ## Other Technical Material
 * [Apache CarbonData meetup 
material](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=66850609)
@@ -70,7 +69,7 @@ CarbonData is built using Apache Maven, to [build 
CarbonData](https://github.com
 
 ## Fork and Contribute
 This is an active open source project for everyone, and we are always open to 
people who want to use this system or contribute to it. 
-This guide document introduce [how to contribute to 
CarbonData](https://github.com/apache/carbondata/blob/master/docs/How-to-contribute-to-Apache-CarbonData.md).
+This guide document introduce [how to contribute to 
CarbonData](https://github.com/apache/carbondata/blob/master/docs/how-to-contribute-to-apache-carbondata.md).
 
 ## Contact us
 To get involved in CarbonData:

http://git-wip-us.apache.org/repos/asf/carbondata/blob/73a5885a/docs/dml-of-carbondata.md
--
diff --git a/docs/dml-of-carbondata.md b/docs/dml-of-carbondata.md
index 42da655..98bb132 100644
--- a/docs/dml-of-carbondata.md
+++ b/docs/dml-of-carbondata.md
@@ -46,7 +46,7 @@ CarbonData DML statements are documented here,which includes:
 | --- | 
 |
 | [DELIMITER](#delimiter) | Character used to 
separate the data in the input csv file|
 | [QUOTECHAR](#quotechar) | Character used to 
quote the data in the input csv file   |
-| [COMMENTCHAR](#commentchar) | Character used to 
comment the rows in the input csv file.Those rows will be skipped from 
processing |
+| [COMMENTCHAR](#commentc

carbondata git commit: [CARBONDATA-2889]Add decoder based fallback mechanism in local dictionary to reduce memory footprint

2018-09-10 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master 9ebab5748 -> 2ccdbb78c


[CARBONDATA-2889]Add decoder based fallback mechanism in local dictionary to 
reduce memory footprint

Problem
Currently, when fallback is initiated for a column page that uses a local dictionary,
we keep both the encoded data and the actual data in memory, form the new column page
without dictionary encoding, and only then free the encoded column page. Because of
this, the off-heap memory footprint increases.

Solution
We can reduce the off-heap memory footprint by using a decoder-based fallback mechanism.
This means there is no need to keep the actual data along with the encoded data in the
encoded column page. We keep only the encoded data; to form the new column page,
we uncompress the encoded column page to get the dictionary keys, resolve the actual
data through the local dictionary generator, put it into the newly created column page,
compress it again and hand it to the consumer for writing the blocklet.
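
A minimal sketch of this decoder-based idea (the real classes are the new
DecoderBasedFallbackEncoder / LocalDictColumnPage listed in the diffstat above; the
types and method names below are hypothetical stand-ins, and variable-length value
handling is omitted):

  import java.io.ByteArrayOutputStream;

  final class DecoderBasedFallbackSketch {
    // Hypothetical stand-ins for the real compressor / local dictionary types.
    interface Compressor {
      byte[] compress(byte[] data);
      byte[] uncompress(byte[] data);
    }
    interface LocalDictionary {
      byte[] valueForKey(int key);   // dictionary key -> actual column value
    }

    // Rebuild the plain (non-dictionary) page from the already-encoded page, so the
    // actual data never has to sit in memory next to the encoded data.
    static byte[] fallback(byte[] compressedKeys, Compressor compressor, LocalDictionary dict) {
      byte[] keys = compressor.uncompress(compressedKeys);      // 1. decode the stored keys
      ByteArrayOutputStream plainPage = new ByteArrayOutputStream();
      for (byte key : keys) {
        byte[] actual = dict.valueForKey(key & 0xFF);           // 2. key -> actual value
        plainPage.write(actual, 0, actual.length);
      }
      return compressor.compress(plainPage.toByteArray());      // 3. re-compress plain page
    }
  }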

This closes #2662


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/2ccdbb78
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/2ccdbb78
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/2ccdbb78

Branch: refs/heads/master
Commit: 2ccdbb78c461be8c68770f7732c233c319a65ad1
Parents: 9ebab57
Author: akashrn5 
Authored: Mon Aug 20 10:29:26 2018 +0530
Committer: kumarvishal09 
Committed: Mon Sep 10 14:24:55 2018 +0530

--
 .../core/constants/CarbonCommonConstants.java   |  10 ++
 .../blocklet/BlockletEncodedColumnPage.java |  42 +-
 .../datastore/blocklet/EncodedBlocklet.java |  34 +++--
 .../reader/dimension/AbstractChunkReader.java   |  15 ---
 .../AbstractChunkReaderV2V3Format.java  |  12 --
 ...mpressedDimensionChunkFileBasedReaderV1.java |   2 +-
 ...mpressedDimensionChunkFileBasedReaderV2.java |   8 +-
 ...essedDimChunkFileBasedPageLevelReaderV3.java |   4 +-
 ...mpressedDimensionChunkFileBasedReaderV3.java |  10 +-
 .../page/ActualDataBasedFallbackEncoder.java|  67 ++
 .../core/datastore/page/ColumnPage.java |   9 +-
 .../page/DecoderBasedFallbackEncoder.java   | 132 +++
 .../page/FallbackColumnPageEncoder.java |  86 
 .../datastore/page/LocalDictColumnPage.java |  19 ++-
 .../apache/carbondata/core/util/CarbonUtil.java |  73 ++
 .../VectorizedCarbonRecordReader.java   |   3 +-
 .../store/writer/v3/BlockletDataHolder.java |  11 +-
 .../writer/v3/CarbonFactDataWriterImplV3.java   |   2 +-
 18 files changed, 389 insertions(+), 150 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/2ccdbb78/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
 
b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
index 3bdb2f7..7a34c98 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
@@ -881,6 +881,16 @@ public final class CarbonCommonConstants {
   public static final String LOCAL_DICTIONARY_SYSTEM_ENABLE = 
"carbon.local.dictionary.enable";
 
   /**
+   * System property to enable or disable decoder based local dictionary 
fallback
+   */
+  public static final String LOCAL_DICTIONARY_DECODER_BASED_FALLBACK =
+  "carbon.local.dictionary.decoder.fallback";
+
+  /**
+   * System property to enable or disable decoder based local dictionary 
fallback default value
+   */
+  public static final String LOCAL_DICTIONARY_DECODER_BASED_FALLBACK_DEFAULT = 
"true";
+  /**
* Threshold value for local dictionary
*/
   public static final String LOCAL_DICTIONARY_THRESHOLD = 
"local_dictionary_threshold";

http://git-wip-us.apache.org/repos/asf/carbondata/blob/2ccdbb78/core/src/main/java/org/apache/carbondata/core/datastore/blocklet/BlockletEncodedColumnPage.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/datastore/blocklet/BlockletEncodedColumnPage.java
 
b/core/src/main/java/org/apache/carbondata/core/datastore/blocklet/BlockletEncodedColumnPage.java
index 8abc0e4..135b1e2 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/datastore/blocklet/BlockletEncodedColumnPage.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/datastore/blocklet/BlockletEncodedColumnPage.java
@@ -26,10 +26,12 @@ import java.util.concurrent.Future;
 
 import org.apache.ca

carbondata git commit: [CARBONDATA-2895] Fix Query result count is more than actual csv rows with Batch-sort in save to disk (sort temp files) scenario

2018-09-05 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master 94d2089b2 -> 50248f51b


[CARBONDATA-2895] Fix Query result count is more than actual csv rows with 
Batch-sort in save to disk (sort temp files) scenario

problem: Query result mismatch with Batch-sort in the save-to-disk (sort temp files)
scenario.

scenario:
a) Configure batch sort but give a batch size larger than
UnsafeMemoryManager.INSTANCE.getUsableMemory().
b) Load data that is greater than the batch size. Observe that the
UnsafeMemoryManager save-to-disk happened because it cannot process one
batch in memory.
c) So the load happens in 2 batches.
d) When querying the results, the result row count is more than the expected
row count.

root cause:

For each batch, createSortDataRows() will be called.
Files saved to disk during sorting of the previous batch were also considered for
this batch.

solution:
Files saved to disk during sorting of the previous batch should not be
considered for this batch.
Hence use the batch ID as the rangeId field of the sort temp files,
so getFilesToMergeSort() will select only this batch's files.
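
A minimal sketch of that idea (SortTempFile below is a hypothetical placeholder for the
real sort temp file handle; the actual change in
UnsafeBatchParallelReadMergeSorterImpl is shown in the diff further down):

  import java.util.List;
  import java.util.concurrent.atomic.AtomicInteger;
  import java.util.stream.Collectors;

  final class BatchScopedSortFiles {
    private final AtomicInteger batchId = new AtomicInteger(0);

    // Called once per batch; the returned id is stamped on that batch's sort temp files.
    int startNextBatch() {
      return batchId.incrementAndGet();
    }

    // Merge only the files that belong to the current batch; files left over from
    // earlier batches are ignored instead of being merged again.
    static List<SortTempFile> filesToMergeSort(List<SortTempFile> all, int currentBatchId) {
      return all.stream()
          .filter(f -> f.rangeId == currentBatchId)
          .collect(Collectors.toList());
    }

    static final class SortTempFile {
      final int rangeId;
      SortTempFile(int rangeId) {
        this.rangeId = rangeId;
      }
    }
  }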

This closes #2664


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/50248f51
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/50248f51
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/50248f51

Branch: refs/heads/master
Commit: 50248f51bcaf44f37429d2420c6ecf5c815c3770
Parents: 94d2089
Author: ajantha-bhat 
Authored: Mon Aug 27 20:55:03 2018 +0530
Committer: kumarvishal09 
Committed: Wed Sep 5 20:30:59 2018 +0530

--
 .../impl/UnsafeBatchParallelReadMergeSorterImpl.java | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/50248f51/processing/src/main/java/org/apache/carbondata/processing/loading/sort/impl/UnsafeBatchParallelReadMergeSorterImpl.java
--
diff --git 
a/processing/src/main/java/org/apache/carbondata/processing/loading/sort/impl/UnsafeBatchParallelReadMergeSorterImpl.java
 
b/processing/src/main/java/org/apache/carbondata/processing/loading/sort/impl/UnsafeBatchParallelReadMergeSorterImpl.java
index 5cb099e..1b1d383 100644
--- 
a/processing/src/main/java/org/apache/carbondata/processing/loading/sort/impl/UnsafeBatchParallelReadMergeSorterImpl.java
+++ 
b/processing/src/main/java/org/apache/carbondata/processing/loading/sort/impl/UnsafeBatchParallelReadMergeSorterImpl.java
@@ -62,12 +62,17 @@ public class UnsafeBatchParallelReadMergeSorterImpl extends 
AbstractMergeSorter
 
   private AtomicLong rowCounter;
 
+  /* will be incremented for each batch. This ID is used in sort temp files 
name,
+   to identify files of that batch */
+  private AtomicInteger batchId;
+
   public UnsafeBatchParallelReadMergeSorterImpl(AtomicLong rowCounter) {
 this.rowCounter = rowCounter;
   }
 
   @Override public void initialize(SortParameters sortParameters) {
 this.sortParameters = sortParameters;
+batchId = new AtomicInteger(0);
 
   }
 
@@ -172,7 +177,7 @@ public class UnsafeBatchParallelReadMergeSorterImpl extends 
AbstractMergeSorter
 
   }
 
-  private static class SortBatchHolder
+  private class SortBatchHolder
   extends CarbonIterator {
 
 private SortParameters sortParameters;
@@ -193,7 +198,7 @@ public class UnsafeBatchParallelReadMergeSorterImpl extends 
AbstractMergeSorter
 
 private final Object lock = new Object();
 
-public SortBatchHolder(SortParameters sortParameters, int numberOfThreads,
+SortBatchHolder(SortParameters sortParameters, int numberOfThreads,
 ThreadStatusObserver threadStatusObserver) {
   this.sortParameters = sortParameters.getCopy();
   this.iteratorCount = new AtomicInteger(numberOfThreads);
@@ -203,6 +208,12 @@ public class UnsafeBatchParallelReadMergeSorterImpl 
extends AbstractMergeSorter
 }
 
 private void createSortDataRows() {
+  // For each batch, createSortDataRows() will be called.
+  // Files saved to disk during sorting of previous batch,should not be 
considered
+  // for this batch.
+  // Hence use batchID as rangeID field of sorttempfiles.
+  // so getFilesToMergeSort() will select only this batch files.
+  this.sortParameters.setRangeId(batchId.incrementAndGet());
   int inMemoryChunkSizeInMB = 
CarbonProperties.getInstance().getSortMemoryChunkSizeInMB();
   setTempLocation(sortParameters);
   this.finalMerger = new 
UnsafeSingleThreadFinalSortFilesMerger(sortParameters,



carbondata git commit: [HOTFIX] improve sdk multi-thread performance

2018-09-05 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master af2c469bb -> 94d2089b2


[HOTFIX] improve sdk multi-thread performance

problem: currently the SDK writer creates multiple iterators in the multi-thread
scenario, but filling each iterator does not happen concurrently because the write
is synchronized at method level.

solution: In the SDK multi-thread write scenario, don't synchronize at method level;
synchronize at iterator level.
As each iterator has its own queue, the filling can be done concurrently.
Also, for Avro, sdkUserCore can be used in the input processor step.
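
A minimal sketch of the locking change (the class names below are illustrative only,
not the real writer classes):

  import java.util.List;
  import java.util.Queue;
  import java.util.concurrent.ConcurrentLinkedQueue;

  final class MultiIteratorWriterSketch {
    private final List<IteratorWrapper> iterators;
    private int next = 0;

    MultiIteratorWriterSketch(List<IteratorWrapper> iterators) {
      this.iterators = iterators;
    }

    // Only the round-robin pick is serialised; filling an iterator's queue is no
    // longer behind a single method-level lock, so threads writing to different
    // iterators proceed concurrently.
    void write(Object[] row) {
      IteratorWrapper target;
      synchronized (this) {
        target = iterators.get(next);
        next = (next + 1) % iterators.size();
      }
      target.offer(row);
    }

    static final class IteratorWrapper {
      private final Queue<Object[]> queue = new ConcurrentLinkedQueue<>();
      void offer(Object[] row) {
        queue.offer(row);       // each iterator owns its queue
      }
    }
  }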

This closes #2672


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/94d2089b
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/94d2089b
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/94d2089b

Branch: refs/heads/master
Commit: 94d2089b246d2e4dc0ea2a673a89553e5eff1e35
Parents: af2c469
Author: ajantha-bhat 
Authored: Wed Aug 29 23:11:09 2018 +0530
Committer: kumarvishal09 
Committed: Wed Sep 5 20:25:39 2018 +0530

--
 .../hadoop/api/CarbonTableOutputFormat.java |  27 +++--
 .../loading/DataLoadProcessBuilder.java |   4 +-
 .../loading/model/CarbonLoadModel.java  |  14 +--
 .../loading/steps/InputProcessorStepImpl.java   |   7 +-
 .../InputProcessorStepWithNoConverterImpl.java  |  31 +
 .../steps/JsonInputProcessorStepImpl.java   |   9 +-
 .../util/CarbonDataProcessorUtil.java   |   6 +-
 .../sdk/file/CarbonWriterBuilder.java   |   6 +-
 .../sdk/file/ConcurrentAvroSdkWriterTest.java   | 116 +++
 9 files changed, 162 insertions(+), 58 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/94d2089b/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java
--
diff --git 
a/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java
 
b/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java
index 5cc275b..99d8532 100644
--- 
a/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java
+++ 
b/hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableOutputFormat.java
@@ -23,6 +23,7 @@ import java.util.concurrent.ExecutionException;
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.Executors;
 import java.util.concurrent.Future;
+import java.util.concurrent.atomic.AtomicLong;
 
 import org.apache.carbondata.core.constants.CarbonCommonConstants;
 import org.apache.carbondata.core.constants.CarbonLoadOptionConstants;
@@ -235,8 +236,8 @@ public class CarbonTableOutputFormat extends 
FileOutputFormat 0) ? sdkUserCore : 1;
+short sdkWriterCores = loadModel.getSdkWriterCores();
+int itrSize = (sdkWriterCores > 0) ? sdkWriterCores : 1;
 final CarbonOutputIteratorWrapper[] iterators = new 
CarbonOutputIteratorWrapper[itrSize];
 for (int i = 0; i < itrSize; i++) {
   iterators[i] = new CarbonOutputIteratorWrapper();
@@ -273,7 +274,7 @@ public class CarbonTableOutputFormat extends 
FileOutputFormat 0) {
+if (sdkWriterCores > 0) {
   // CarbonMultiRecordWriter handles the load balancing of the write rows 
in round robin.
   return new CarbonMultiRecordWriter(iterators, dataLoadExecutor, 
loadModel, future,
   executorService);
@@ -460,27 +461,31 @@ public class CarbonTableOutputFormat extends 
FileOutputFormat

http://git-wip-us.apache.org/repos/asf/carbondata/blob/94d2089b/processing/src/main/java/org/apache/carbondata/processing/loading/DataLoadProcessBuilder.java
--
diff --git 
a/processing/src/main/java/org/apache/carbondata/processing/loading/DataLoadProcessBuilder.java
 
b/processing/src/main/java/org/apache/carbondata/processing/loading/DataLoadProcessBuilder.java
index 666c598..a628d41 100644
--- 
a/processing/src/main/java/org/apache/carbondata/processing/loading/DataLoadProcessBuilder.java
+++ 
b/processing/src/main/java/org/apache/carbondata/processing/loading/DataLoadProcessBuilder.java
@@ -313,8 +313,8 @@ public final class DataLoadProcessBuilder {
 }
 TableSpec tableSpec = new TableSpec(carbonTable);
 configuration.setTableSpec(tableSpec);
-if (loadModel.getSdkUserCores() > 0) {
-  configuration.setWritingCoresCount(loadModel.getSdkUserCores());
+if (loadModel.getSdkWriterCores() > 0) {
+  configuration.setWritingCoresCount(loadModel.getSdkWriterCores());
 }
 return configuration;
   }

http://git-wip-us.apache.org/repos/asf/carbondata/blob/94d2089b/processing/src/main/java/org/apache/carbondata/processing/loading/model/CarbonLoadModel.java
-

carbondata git commit: [CARBONDATA-2898] Fix double boundary condition and clear datamaps issue

2018-08-30 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master e8ddbbb02 -> de0f54516


[CARBONDATA-2898] Fix double boundary condition and clear datamaps issue

1. DataMaps are not cleared properly because a temp table is created for each request.
Now the datamap is looked up using the table path, both to clear it and to get it.

In double value boundary cases, loading fails because carbon does not handle infinity
properly. Now a check for infinite values is added.

Added validation so that sort columns cannot be used while inferring the schema.
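
The double value boundary point above is easy to reproduce in isolation: arithmetic at
the edge of the double range silently overflows to Infinity and has to be detected
explicitly.

  public final class DoubleBoundaryDemo {
    public static void main(String[] args) {
      // Arithmetic at the double boundary overflows to Infinity, which the
      // stats collection previously did not expect.
      double delta = Double.MAX_VALUE - (-Double.MAX_VALUE);
      System.out.println(delta);                     // Infinity
      System.out.println(Double.isInfinite(delta));  // true -> needs an explicit check
    }
  }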

This closes #2666


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/de0f5451
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/de0f5451
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/de0f5451

Branch: refs/heads/master
Commit: de0f54516431eb1454a588d48641e0f540127279
Parents: e8ddbbb
Author: ravipesala 
Authored: Tue Aug 28 17:17:38 2018 +0530
Committer: kumarvishal09 
Committed: Thu Aug 30 11:53:06 2018 +0530

--
 .../core/datamap/DataMapStoreManager.java   | 41 --
 .../core/datastore/impl/FileFactory.java| 20 +
 .../statistics/PrimitivePageStatsCollector.java | 14 +++-
 .../datasources/SparkCarbonFileFormat.scala | 17 ++--
 .../datasource/SparkCarbonDataSourceTest.scala  | 86 +++-
 .../loading/model/CarbonLoadModelBuilder.java   |  4 +-
 6 files changed, 157 insertions(+), 25 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/de0f5451/core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java
 
b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java
index 6e4fb4d..22db211 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/datamap/DataMapStoreManager.java
@@ -315,6 +315,13 @@ public final class DataMapStoreManager {
 String tableUniqueName =
 
table.getAbsoluteTableIdentifier().getCarbonTableIdentifier().getTableUniqueName();
 List tableIndices = allDataMaps.get(tableUniqueName);
+if (tableIndices == null) {
+  String keyUsingTablePath = getKeyUsingTablePath(table.getTablePath());
+  if (keyUsingTablePath != null) {
+tableUniqueName = keyUsingTablePath;
+tableIndices = allDataMaps.get(tableUniqueName);
+  }
+}
 TableDataMap dataMap = null;
 if (tableIndices != null) {
   dataMap = getTableDataMap(dataMapSchema.getDataMapName(), tableIndices);
@@ -341,6 +348,18 @@ public final class DataMapStoreManager {
 return dataMap;
   }
 
+  private String getKeyUsingTablePath(String tablePath) {
+if (tablePath != null) {
+  // Try get using table path
+  for (Map.Entry entry : tablePathMap.entrySet()) {
+if (new Path(entry.getValue()).equals(new Path(tablePath))) {
+  return entry.getKey();
+}
+  }
+}
+return null;
+  }
+
   /**
* Return a new datamap instance and registered in the store manager.
* The datamap is created using datamap name, datamap factory class and 
table identifier.
@@ -379,6 +398,13 @@ public final class DataMapStoreManager {
 getTableSegmentRefresher(table);
 List tableIndices = allDataMaps.get(tableUniqueName);
 if (tableIndices == null) {
+  String keyUsingTablePath = getKeyUsingTablePath(table.getTablePath());
+  if (keyUsingTablePath != null) {
+tableUniqueName = keyUsingTablePath;
+tableIndices = allDataMaps.get(tableUniqueName);
+  }
+}
+if (tableIndices == null) {
   tableIndices = new ArrayList<>();
 }
 
@@ -434,14 +460,11 @@ public final class DataMapStoreManager {
 CarbonTable carbonTable = getCarbonTable(identifier);
 String tableUniqueName = 
identifier.getCarbonTableIdentifier().getTableUniqueName();
 List tableIndices = allDataMaps.get(tableUniqueName);
-if (tableIndices == null && identifier.getTablePath() != null) {
-  // Try get using table path
-  for (Map.Entry entry : tablePathMap.entrySet()) {
-if (new Path(entry.getValue()).equals(new 
Path(identifier.getTablePath()))) {
-  tableIndices = allDataMaps.get(entry.getKey());
-  tableUniqueName = entry.getKey();
-  break;
-}
+if (tableIndices == null) {
+  String keyUsingTablePath = 
getKeyUsingTablePath(identifier.getTablePath());
+  if (keyUsingTablePath != null) {
+tableUniqueName = keyUsingTablePath;
+tableIndices = allDataMaps.get(tableUniqueName);
   }
 }
 if (null != carbonTable && tableInd

carbondata git commit: [CARBONDATA-2887] Fix complex filters on spark carbon file format

2018-08-29 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master d801548aa -> 2f537b724


[CARBONDATA-2887] Fix complex filters on spark carbon file format

Problem:
Filters on complex types are not working with the carbon fileformat because it tries to
push down the not-null filter of a complex type to carbon,
but carbon does not handle any type of filter on complex types.
Solution:
Removed push-down of all complex-type filters from the carbon fileformat.
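
A minimal sketch of the pruning idea (the Filter interface below is a hypothetical
stand-in for Spark's data source filters); filters that are not pushed down are still
evaluated by Spark on its side, so results stay correct:

  import java.util.List;
  import java.util.Set;
  import java.util.stream.Collectors;

  final class ComplexFilterPruner {
    // Hypothetical stand-in: a filter knows which columns it references.
    interface Filter {
      List<String> references();
    }

    // Keep only filters that touch no complex (array/struct) column; the rest are
    // not pushed down to carbon.
    static List<Filter> pushableFilters(List<Filter> candidates, Set<String> complexColumns) {
      return candidates.stream()
          .filter(f -> f.references().stream().noneMatch(complexColumns::contains))
          .collect(Collectors.toList());
    }
  }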

This closes #2659


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/2f537b72
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/2f537b72
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/2f537b72

Branch: refs/heads/master
Commit: 2f537b724f6f03ab40c95f7ecc8ebd38f6500099
Parents: d801548
Author: ravipesala 
Authored: Fri Aug 24 20:43:07 2018 +0530
Committer: kumarvishal09 
Committed: Wed Aug 29 13:27:08 2018 +0530

--
 .../spark/sql/test/TestQueryExecutor.scala  |   1 +
 .../execution/datasources/CarbonFileIndex.scala |  15 +-
 .../CarbonFileIndexReplaceRule.scala|   2 +-
 .../datasources/CarbonSparkDataSourceUtil.scala |  34 ++-
 .../datasources/SparkCarbonFileFormat.scala |  33 ++-
 .../src/test/resources/Array.csv|  21 ++
 .../spark-datasource/src/test/resources/j2.csv  |   1 +
 .../src/test/resources/structofarray.csv|  21 ++
 .../datasource/SparkCarbonDataSourceTest.scala  | 267 +--
 ...tCreateTableUsingSparkCarbonFileFormat.scala |   9 +-
 .../sql/carbondata/datasource/TestUtil.scala|  16 +-
 .../InputProcessorStepWithNoConverterImpl.java  |  21 +-
 12 files changed, 355 insertions(+), 86 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/2f537b72/integration/spark-common/src/main/scala/org/apache/spark/sql/test/TestQueryExecutor.scala
--
diff --git 
a/integration/spark-common/src/main/scala/org/apache/spark/sql/test/TestQueryExecutor.scala
 
b/integration/spark-common/src/main/scala/org/apache/spark/sql/test/TestQueryExecutor.scala
index d3a20c3..f69a142 100644
--- 
a/integration/spark-common/src/main/scala/org/apache/spark/sql/test/TestQueryExecutor.scala
+++ 
b/integration/spark-common/src/main/scala/org/apache/spark/sql/test/TestQueryExecutor.scala
@@ -153,6 +153,7 @@ object TestQueryExecutor {
 TestQueryExecutor.projectPath + "/core/target",
 TestQueryExecutor.projectPath + "/hadoop/target",
 TestQueryExecutor.projectPath + "/processing/target",
+TestQueryExecutor.projectPath + "/integration/spark-datasource/target",
 TestQueryExecutor.projectPath + "/integration/spark-common/target",
 TestQueryExecutor.projectPath + "/integration/spark2/target",
 TestQueryExecutor.projectPath + "/integration/spark-common/target/jars",

http://git-wip-us.apache.org/repos/asf/carbondata/blob/2f537b72/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/CarbonFileIndex.scala
--
diff --git 
a/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/CarbonFileIndex.scala
 
b/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/CarbonFileIndex.scala
index 8471181..c330fcb 100644
--- 
a/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/CarbonFileIndex.scala
+++ 
b/integration/spark-datasource/src/main/scala/org/apache/spark/sql/carbondata/execution/datasources/CarbonFileIndex.scala
@@ -21,14 +21,13 @@ import java.util
 
 import scala.collection.JavaConverters._
 
-import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.fs.Path
 import org.apache.hadoop.mapred.JobConf
 import org.apache.hadoop.mapreduce.Job
 import org.apache.spark.deploy.SparkHadoopUtil
 import org.apache.spark.sql.SparkSession
 import org.apache.spark.sql.catalyst.expressions.Expression
-import org.apache.spark.sql.execution.datasources.{InMemoryFileIndex, _}
+import org.apache.spark.sql.execution.datasources._
 import org.apache.spark.sql.types.StructType
 
 import org.apache.carbondata.core.datastore.filesystem.{CarbonFile, 
HDFSCarbonFile}
@@ -79,9 +78,9 @@ class CarbonFileIndex(
   }
 
   private def prune(dataFilters: Seq[Expression],
-  directories: Seq[PartitionDirectory]) = {
+  directories: Seq[PartitionDirectory]): Seq[PartitionDirectory] = {
 val tablePath = parameters.get("path")
-if (tablePath.nonEmpty) {
+if (tablePath.nonEmpty && dataFilters.nonEmpty) {
   val hadoopConf = sparkSession.sessionState.newHad

carbondata git commit: [CARBONDATA-2885] Broadcast Issue and Small file distribution Issue

2018-08-27 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master f81543e95 -> 1fb1f19f2


[CARBONDATA-2885] Broadcast Issue and Small file distribution Issue

Issue :-

In an external table, CarbonRelation's sizeInBytes is wrong (always 0). Because of
this, join queries are identified for broadcast even when the table's actual size is
above the broadcast threshold (10 MB by default). This makes some joins fail: tables
that should use sort-merge join go for broadcast join because of the wrong calculation.

If merge-small-file task distribution is enabled, join queries fail (TPCH):
carbon opens many carbon files but they are not getting closed.

Root Cause :-
1. The current relation size calculation is based on the tablestatus file, but since
an external table does not have a tablestatus file, zero was always returned.
2. If merge-small-file task distribution is enabled, carbon opens many carbon
files but they are not getting closed.

Solution :-

If the table is an external table, then calculate the size from the table path.
Close the carbon files once the scan is finished.
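
A rough sketch of the size calculation for external tables; plain java.nio is used
here purely for illustration (the actual change is in CarbonRelation.scala and goes
through carbon's own file APIs):

  import java.io.IOException;
  import java.nio.file.Files;
  import java.nio.file.Path;
  import java.nio.file.Paths;
  import java.util.stream.Stream;

  final class ExternalTableSizeSketch {
    // Sum the sizes of the .carbondata files under the table path, since an external
    // table has no tablestatus file to read the size from.
    static long sizeInBytes(String tablePath) throws IOException {
      try (Stream<Path> files = Files.walk(Paths.get(tablePath))) {
        return files.filter(Files::isRegularFile)
            .filter(p -> p.toString().endsWith(".carbondata"))
            .mapToLong(p -> p.toFile().length())
            .sum();
      }
    }
  }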

This closes #2658


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/1fb1f19f
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/1fb1f19f
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/1fb1f19f

Branch: refs/heads/master
Commit: 1fb1f19f207eb157711ae0c7a79fd39b883e4621
Parents: f81543e
Author: BJangir 
Authored: Fri Aug 24 14:47:49 2018 +0530
Committer: kumarvishal09 
Committed: Mon Aug 27 12:57:22 2018 +0530

--
 .../AbstractDetailQueryResultIterator.java  |  5 ++
 .../apache/spark/sql/hive/CarbonRelation.scala  | 65 +++-
 2 files changed, 42 insertions(+), 28 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/1fb1f19f/core/src/main/java/org/apache/carbondata/core/scan/result/iterator/AbstractDetailQueryResultIterator.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/scan/result/iterator/AbstractDetailQueryResultIterator.java
 
b/core/src/main/java/org/apache/carbondata/core/scan/result/iterator/AbstractDetailQueryResultIterator.java
index 01aa939..26925d3 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/scan/result/iterator/AbstractDetailQueryResultIterator.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/scan/result/iterator/AbstractDetailQueryResultIterator.java
@@ -254,6 +254,11 @@ public abstract class AbstractDetailQueryResultIterator 
extends CarbonIterato
 
   private DataBlockIterator getDataBlockIterator() {
 if (blockExecutionInfos.size() > 0) {
+  try {
+fileReader.finish();
+  } catch (IOException e) {
+throw new RuntimeException(e);
+  }
   BlockExecutionInfo executionInfo = blockExecutionInfos.get(0);
   blockExecutionInfos.remove(executionInfo);
   return new DataBlockIterator(executionInfo, fileReader, batchSize, 
queryStatisticsModel,

http://git-wip-us.apache.org/repos/asf/carbondata/blob/1fb1f19f/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonRelation.scala
--
diff --git 
a/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonRelation.scala
 
b/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonRelation.scala
index f700441..80257b8 100644
--- 
a/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonRelation.scala
+++ 
b/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonRelation.scala
@@ -156,39 +156,48 @@ case class CarbonRelation(
   private var sizeInBytesLocalValue = 0L
 
   def sizeInBytes: Long = {
-val tableStatusNewLastUpdatedTime = 
SegmentStatusManager.getTableStatusLastModifiedTime(
-  carbonTable.getAbsoluteTableIdentifier)
-if (tableStatusLastUpdateTime != tableStatusNewLastUpdatedTime) {
-  if (new SegmentStatusManager(carbonTable.getAbsoluteTableIdentifier)
-.getValidAndInvalidSegments.getValidSegments.isEmpty) {
-sizeInBytesLocalValue = 0L
-  } else {
-val tablePath = carbonTable.getTablePath
-val fileType = FileFactory.getFileType(tablePath)
-if (FileFactory.isFileExist(tablePath, fileType)) {
-  // get the valid segments
-  val segments = new 
SegmentStatusManager(carbonTable.getAbsoluteTableIdentifier)
-.getValidAndInvalidSegments.getValidSegments.asScala
-  var size = 0L
-  // for each segment calculate the size
-  segments.foreach {validSeg =>
-// for older store
-if (null != validSeg.getLoadMetadataDetails.getDataSize &&
-null != validSeg.getLoadMetadataDetails.getIndexSize) {
- 

[3/4] carbondata git commit: [CARBONDATA-2872] Added Spark FileFormat interface implementation in Carbon

2018-08-24 Thread kumarvishal09
http://git-wip-us.apache.org/repos/asf/carbondata/blob/347b8e1d/integration/spark-datasource/pom.xml
--
diff --git a/integration/spark-datasource/pom.xml 
b/integration/spark-datasource/pom.xml
new file mode 100644
index 000..38cf629
--- /dev/null
+++ b/integration/spark-datasource/pom.xml
@@ -0,0 +1,196 @@
+
+
+http://maven.apache.org/POM/4.0.0; 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance; 
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd;>
+
+  4.0.0
+
+  
+org.apache.carbondata
+carbondata-parent
+1.5.0-SNAPSHOT
+../../pom.xml
+  
+
+  carbondata-spark-datasource
+  Apache CarbonData :: Spark Datasource
+
+  
+${basedir}/../../dev
+true
+  
+
+  
+
+  org.apache.carbondata
+  carbondata-hadoop
+  ${project.version}
+
+
+  org.apache.carbondata
+  carbondata-store-sdk
+  ${project.version}
+
+
+  org.apache.spark
+  spark-hive-thriftserver_${scala.binary.version}
+
+
+  org.apache.spark
+  spark-repl_${scala.binary.version}
+
+
+  junit
+  junit
+  test
+
+
+  org.scalatest
+  scalatest_${scala.binary.version}
+  test
+
+
+  org.apache.hadoop
+  hadoop-aws
+  ${hadoop.version}
+  
+
+  com.fasterxml.jackson.core
+  jackson-core
+
+
+  com.fasterxml.jackson.core
+  jackson-annotations
+
+
+  com.fasterxml.jackson.core
+  jackson-databind
+
+  
+
+  
+
+  
+src/test/scala
+
+  
+src/resources
+  
+  
+.
+
+  CARBON_SPARK_INTERFACELogResource.properties
+
+  
+
+
+  
+org.scala-tools
+maven-scala-plugin
+2.15.2
+
+  
+compile
+
+  compile
+
+compile
+  
+  
+testCompile
+
+  testCompile
+
+test
+  
+  
+process-resources
+
+  compile
+
+  
+
+  
+  
+maven-compiler-plugin
+
+  1.7
+  1.7
+
+  
+  
+org.apache.maven.plugins
+maven-surefire-plugin
+2.18
+
+
+  
${project.build.directory}/surefire-reports
+  -Xmx3g -XX:MaxPermSize=512m 
-XX:ReservedCodeCacheSize=512m
+  
+true
+
${carbon.hive.based.metastore}
+  
+  false
+
+  
+  
+org.scalatest
+scalatest-maven-plugin
+1.0
+
+
+  
${project.build.directory}/surefire-reports
+  .
+  CarbonTestSuite.txt
+   ${argLine} -ea -Xmx3g -XX:MaxPermSize=512m 
-XX:ReservedCodeCacheSize=512m
+  
+  
+  
+  
+  
+true
+
${carbon.hive.based.metastore}
+  
+
+
+  
+test
+
+  test
+
+  
+
+  
+
+  
+  
+
+  build-all
+  
+2.2.1
+2.11
+2.11.8
+  
+
+
+  sdvtest
+  
+true
+  
+
+  
+

http://git-wip-us.apache.org/repos/asf/carbondata/blob/347b8e1d/integration/spark-datasource/src/main/scala/org/apache/carbondata/converter/SparkDataTypeConverterImpl.java
--
diff --git 
a/integration/spark-datasource/src/main/scala/org/apache/carbondata/converter/SparkDataTypeConverterImpl.java
 
b/integration/spark-datasource/src/main/scala/org/apache/carbondata/converter/SparkDataTypeConverterImpl.java
new file mode 100644
index 000..7e38691
--- /dev/null
+++ 
b/integration/spark-datasource/src/main/scala/org/apache/carbondata/converter/SparkDataTypeConverterImpl.java
@@ -0,0 +1,175 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.converter;
+

[1/4] carbondata git commit: [CARBONDATA-2872] Added Spark FileFormat interface implementation in Carbon

2018-08-24 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master 137245057 -> 347b8e1db


http://git-wip-us.apache.org/repos/asf/carbondata/blob/347b8e1d/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
--
diff --git 
a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
 
b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
index 91197fd..d8e8251 100644
--- 
a/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
+++ 
b/integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala
@@ -34,6 +34,7 @@ import org.apache.spark.sql.optimizer.{CarbonDecoderRelation, 
CarbonFilters}
 import org.apache.spark.sql.sources.{BaseRelation, Filter}
 import org.apache.spark.sql.types._
 import org.apache.spark.sql.CarbonExpressions.{MatchCast => Cast}
+import 
org.apache.spark.sql.carbondata.execution.datasources.CarbonSparkDataSourceUtil
 
 import 
org.apache.carbondata.common.exceptions.sql.MalformedCarbonCommandException
 import org.apache.carbondata.core.constants.CarbonCommonConstants
@@ -445,7 +446,7 @@ private[sql] class CarbonLateDecodeStrategy extends 
SparkStrategy {
 attrRef match {
   case Some(attr: AttributeReference) =>
 Some(AttributeReference(attr.name,
-  CarbonScalaUtil.convertCarbonToSparkDataType(n.getDataType),
+  
CarbonSparkDataSourceUtil.convertCarbonToSparkDataType(n.getDataType),
   attr.nullable,
   attr.metadata)(attr.exprId, attr.qualifier))
   case _ => None

http://git-wip-us.apache.org/repos/asf/carbondata/blob/347b8e1d/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonRelation.scala
--
diff --git 
a/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonRelation.scala
 
b/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonRelation.scala
index 80d850b..f700441 100644
--- 
a/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonRelation.scala
+++ 
b/integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonRelation.scala
@@ -20,14 +20,12 @@ import java.util.LinkedHashSet
 
 import scala.Array.canBuildFrom
 import scala.collection.JavaConverters._
-import scala.util.parsing.combinator.RegexParsers
 
 import org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation
 import org.apache.spark.sql.catalyst.expressions.AttributeReference
 import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, LogicalPlan}
 import org.apache.spark.sql.types._
-import org.apache.spark.sql.util.CarbonException
-import org.apache.spark.util.{CarbonMetastoreTypes, SparkTypeConverter}
+import org.apache.spark.sql.util.{CarbonMetastoreTypes, SparkTypeConverter}
 
 import org.apache.carbondata.core.datastore.impl.FileFactory
 import org.apache.carbondata.core.metadata.datatype.DataTypes

http://git-wip-us.apache.org/repos/asf/carbondata/blob/347b8e1d/integration/spark2/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala
--
diff --git 
a/integration/spark2/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala
 
b/integration/spark2/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala
index c052cd7..1ee22b6 100644
--- 
a/integration/spark2/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala
+++ 
b/integration/spark2/src/main/scala/org/apache/spark/sql/optimizer/CarbonFilters.scala
@@ -29,6 +29,7 @@ import org.apache.spark.sql.types._
 import org.apache.spark.sql.CarbonContainsWith
 import org.apache.spark.sql.CarbonEndsWith
 import org.apache.spark.sql.CarbonExpressions.{MatchCast => Cast}
+import 
org.apache.spark.sql.carbondata.execution.datasources.CarbonSparkDataSourceUtil
 import org.apache.spark.sql.catalyst.TableIdentifier
 import org.apache.spark.sql.hive.CarbonSessionCatalog
 
@@ -46,7 +47,6 @@ import org.apache.carbondata.core.util.CarbonProperties
 import org.apache.carbondata.core.util.ThreadLocalSessionInfo
 import org.apache.carbondata.datamap.{TextMatch, TextMatchLimit}
 import org.apache.carbondata.spark.CarbonAliasDecoderRelation
-import org.apache.carbondata.spark.util.CarbonScalaUtil
 
 
 /**
@@ -128,13 +128,15 @@ object CarbonFilters {
   Some(new SparkUnknownExpression(expr.transform {
 case AttributeReference(name, dataType, _, _) =>
   CarbonBoundReference(new CarbonColumnExpression(name.toString,
-CarbonScalaUtil.convertSparkToCarbonDataType(dataType)), 
dataType, expr.nullable)
+
CarbonSparkDataSourceUtil.convertSparkToCarbonDataType(dataType)),
+dataType, 

[4/4] carbondata git commit: [CARBONDATA-2872] Added Spark FileFormat interface implementation in Carbon

2018-08-24 Thread kumarvishal09
[CARBONDATA-2872] Added Spark FileFormat interface implementation in Carbon

Added new package carbondata-spark-datasource under 
/integration/spark-datasource
It contains the implementation of Spark's FileFormat, so users can use carbon as a
format in Spark.
For example

create table test_table(c1 string, c2 int) using carbon
or
dataframe.write.format("carbon").saveAsTable("test_table")
A few classes were moved to this datasource package as part of the refactoring, and
the spark2 and spark-common packages now depend on the spark-datasource package.

This closes #2647


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/347b8e1d
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/347b8e1d
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/347b8e1d

Branch: refs/heads/master
Commit: 347b8e1dbaef22fb1773e4771be8db8bad644a57
Parents: 1372450
Author: ravipesala 
Authored: Wed Aug 22 11:32:21 2018 +0530
Committer: kumarvishal09 
Committed: Fri Aug 24 18:31:05 2018 +0530

--
 .../core/datamap/DataMapStoreManager.java   |  20 +
 .../carbondata/core/datamap/DataMapUtil.java|   3 +
 .../core/metadata/AbsoluteTableIdentifier.java  |   4 +
 .../LatestFilesReadCommittedScope.java  | 115 +++---
 .../executor/impl/AbstractQueryExecutor.java|   2 +-
 .../apache/carbondata/core/util/CarbonUtil.java |  32 +-
 .../hadoop/api/CarbonFileInputFormat.java   |   5 +-
 .../hadoop/api/CarbonInputFormat.java   |  26 ++
 .../hadoop/api/CarbonTableInputFormat.java  |   5 +-
 ...FileInputFormatWithExternalCarbonTable.scala |   4 +-
 ...tCreateTableUsingSparkCarbonFileFormat.scala | 356 -
 .../DBLocationCarbonTableTestCase.scala |  18 +-
 .../iud/UpdateCarbonTableTestCase.scala |  16 +-
 integration/spark-common/pom.xml|   5 +
 .../spark/util/SparkDataTypeConverterImpl.java  | 219 --
 .../org/apache/carbondata/spark/util/Util.java  |  73 
 .../carbondata/spark/rdd/CarbonMergerRDD.scala  |   3 +-
 .../spark/rdd/CarbonScanPartitionRDD.scala  |   2 +-
 .../carbondata/spark/rdd/CarbonScanRDD.scala|   3 +-
 .../carbondata/spark/rdd/StreamHandoffRDD.scala |   3 +-
 .../carbondata/spark/util/CarbonScalaUtil.scala |  58 +--
 .../spark/util/CarbonMetastoreTypes.scala   | 104 -
 .../apache/spark/util/SparkTypeConverter.scala  | 137 ---
 integration/spark-datasource/pom.xml| 196 +
 .../converter/SparkDataTypeConverterImpl.java   | 175 
 .../vectorreader/CarbonDictionaryWrapper.java   |  44 ++
 .../vectorreader/ColumnarVectorWrapper.java | 272 +
 .../VectorizedCarbonRecordReader.java   | 333 
 .../execution/datasources/CarbonFileIndex.scala | 149 +++
 .../CarbonFileIndexReplaceRule.scala|  85 
 .../datasources/CarbonSparkDataSourceUtil.scala | 251 
 .../datasources/SparkCarbonFileFormat.scala | 398 +++
 .../readsupport/SparkUnsafeRowReadSuport.scala  |  44 ++
 .../spark/sql/util/CarbonMetastoreTypes.scala   | 104 +
 .../spark/sql/util/SparkTypeConverter.scala | 138 +++
 apache.spark.sql.sources.DataSourceRegister |  17 +
 .../datasource/SparkCarbonDataSourceTest.scala  | 302 ++
 ...tCreateTableUsingSparkCarbonFileFormat.scala | 326 +++
 .../sql/carbondata/datasource/TestUtil.scala| 134 +++
 .../vectorreader/CarbonDictionaryWrapper.java   |  44 --
 .../vectorreader/ColumnarVectorWrapper.java | 272 -
 .../VectorizedCarbonRecordReader.java   | 317 ---
 .../datamap/IndexDataMapRebuildRDD.scala|   2 +-
 .../carbondata/stream/StreamJobManager.scala|   4 +-
 .../spark/sql/CarbonDictionaryDecoder.scala |   2 +-
 .../spark/sql/SparkUnknownExpression.scala  |   5 +-
 .../management/CarbonLoadDataCommand.scala  |   3 +-
 .../stream/CarbonCreateStreamCommand.scala  |   4 +-
 .../datasources/SparkCarbonFileFormat.scala | 291 --
 .../datasources/SparkCarbonTableFormat.scala|   2 +-
 .../strategy/CarbonLateDecodeStrategy.scala |   3 +-
 .../apache/spark/sql/hive/CarbonRelation.scala  |   4 +-
 .../spark/sql/optimizer/CarbonFilters.scala |  27 +-
 .../sql/hive/CarbonInMemorySessionState.scala   |   8 +-
 apache.spark.sql.sources.DataSourceRegister |   3 +-
 .../register/TestRegisterCarbonTable.scala  |  22 +-
 pom.xml |   1 +
 .../sdk/file/CarbonWriterBuilder.java   |  10 +-
 58 files changed, 3272 insertions(+), 1933 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/347b8e1d/core/src/main/java/org/apache/carbondata/core/datamap/Data

[2/4] carbondata git commit: [CARBONDATA-2872] Added Spark FileFormat interface implementation in Carbon

2018-08-24 Thread kumarvishal09
http://git-wip-us.apache.org/repos/asf/carbondata/blob/347b8e1d/integration/spark-datasource/src/main/scala/org/apache/spark/sql/util/SparkTypeConverter.scala
--
diff --git 
a/integration/spark-datasource/src/main/scala/org/apache/spark/sql/util/SparkTypeConverter.scala
 
b/integration/spark-datasource/src/main/scala/org/apache/spark/sql/util/SparkTypeConverter.scala
new file mode 100644
index 000..facb4f1
--- /dev/null
+++ 
b/integration/spark-datasource/src/main/scala/org/apache/spark/sql/util/SparkTypeConverter.scala
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.util
+
+import java.util.Objects
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.sql.types
+import org.apache.spark.sql.types._
+
+import org.apache.carbondata.core.metadata.datatype.{DataTypes => 
CarbonDataTypes}
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable
+import org.apache.carbondata.core.metadata.schema.table.column.{CarbonColumn, 
CarbonDimension, ColumnSchema}
+
+private[spark] object SparkTypeConverter {
+
+  def createSparkSchema(table: CarbonTable, columns: Seq[String]): StructType 
= {
+Objects.requireNonNull(table)
+Objects.requireNonNull(columns)
+if (columns.isEmpty) {
+  throw new IllegalArgumentException("column list is empty")
+}
+val fields = new java.util.ArrayList[StructField](columns.size)
+val allColumns = table.getTableInfo.getFactTable.getListOfColumns.asScala
+
+// find the column and add it to fields array
+columns.foreach { column =>
+  val col = 
allColumns.find(_.getColumnName.equalsIgnoreCase(column)).getOrElse(
+throw new IllegalArgumentException(column + " does not exist")
+  )
+  fields.add(StructField(col.getColumnName, 
convertCarbonToSparkDataType(col, table)))
+}
+StructType(fields)
+  }
+
+  /**
+   * Converts from carbon datatype to corresponding spark datatype.
+   */
+  def convertCarbonToSparkDataType(
+  columnSchema: ColumnSchema,
+  table: CarbonTable): types.DataType = {
+if (CarbonDataTypes.isDecimal(columnSchema.getDataType)) {
+  val scale = columnSchema.getScale
+  val precision = columnSchema.getPrecision
+  if (scale == 0 && precision == 0) {
+DecimalType(18, 2)
+  } else {
+DecimalType(precision, scale)
+  }
+} else if (CarbonDataTypes.isArrayType(columnSchema.getDataType)) {
+  CarbonMetastoreTypes
+.toDataType(s"array<${ getArrayChildren(table, 
columnSchema.getColumnName) }>")
+} else if (CarbonDataTypes.isStructType(columnSchema.getDataType)) {
+  CarbonMetastoreTypes
+.toDataType(s"struct<${ getStructChildren(table, 
columnSchema.getColumnName) }>")
+} else {
+  columnSchema.getDataType match {
+case CarbonDataTypes.STRING => StringType
+case CarbonDataTypes.SHORT => ShortType
+case CarbonDataTypes.INT => IntegerType
+case CarbonDataTypes.LONG => LongType
+case CarbonDataTypes.DOUBLE => DoubleType
+case CarbonDataTypes.BOOLEAN => BooleanType
+case CarbonDataTypes.TIMESTAMP => TimestampType
+case CarbonDataTypes.DATE => DateType
+  }
+}
+  }
+
+  def getArrayChildren(table: CarbonTable, dimName: String): String = {
+table.getChildren(dimName).asScala.map(childDim => {
+  childDim.getDataType.getName.toLowerCase match {
+case "array" => s"array<${ getArrayChildren(table, 
childDim.getColName) }>"
+case "struct" => s"struct<${ getStructChildren(table, 
childDim.getColName) }>"
+case dType => addDecimalScaleAndPrecision(childDim, dType)
+  }
+}).mkString(",")
+  }
+
+  def getStructChildren(table: CarbonTable, dimName: String): String = {
+table.getChildren(dimName).asScala.map(childDim => {
+  childDim.getDataType.getName.toLowerCase match {
+case "array" => s"${
+  childDim.getColName.substring(dimName.length + 1)
+}:array<${ getArrayChildren(table, childDim.getColName) }>"
+case "struct" => s"${
+  childDim.getColName.substring(dimName.length + 1)
+

carbondata git commit: [HOTFIX]Fixed int overflow and comparison gone wrong during blocklet min/max

2018-08-09 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master 7158d5203 -> 8affab843


[HOTFIX]Fixed int overflow and comparison gone wrong during blocklet min/max

Problem: While calculating the min/max for a blocklet, values from all the pages have
to be compared. During that comparison
the difference was typecast to int and could overflow, so a negative result could
become positive and a positive one negative.
That is why the min/max of long columns came out wrong for larger values.
Solution: Don't typecast the difference directly; first check whether it is negative
or positive and return only the sign.
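
The overflow is easy to reproduce in isolation; the fix keeps only the sign of the
difference, which is also what Long.compare does:

  public final class MinMaxCompareDemo {
    public static void main(String[] args) {
      long first = 3_000_000_000L;   // bigger than Integer.MAX_VALUE
      long second = 0L;

      // Old logic: the long difference is narrowed to int and wraps around, so a
      // positive difference can come out negative (and the other way round).
      int broken = (int) (first - second);
      System.out.println(broken);                 // -1294967296, i.e. "first < second"

      // Fixed logic: reduce the difference to its sign before narrowing.
      int fixed = Long.compare(first, second);
      System.out.println(fixed);                  // 1, i.e. "first > second"
    }
  }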


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/8affab84
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/8affab84
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/8affab84

Branch: refs/heads/master
Commit: 8affab8433bb4dea70fbb4ea9d3abc7eaf9fd7b2
Parents: 7158d52
Author: ravipesala 
Authored: Tue Aug 7 21:19:36 2018 +0530
Committer: kumarvishal09 
Committed: Thu Aug 9 15:01:12 2018 +0530

--
 .../core/util/CarbonMetadataUtil.java   | 16 +++-
 .../core/util/CarbonMetadataUtilTest.java   | 39 +---
 2 files changed, 39 insertions(+), 16 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/8affab84/core/src/main/java/org/apache/carbondata/core/util/CarbonMetadataUtil.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/util/CarbonMetadataUtil.java 
b/core/src/main/java/org/apache/carbondata/core/util/CarbonMetadataUtil.java
index 8fc648b..70443d8 100644
--- a/core/src/main/java/org/apache/carbondata/core/util/CarbonMetadataUtil.java
+++ b/core/src/main/java/org/apache/carbondata/core/util/CarbonMetadataUtil.java
@@ -368,7 +368,13 @@ public class CarbonMetadataUtil {
   secondBuffer.put(second);
   firstBuffer.flip();
   secondBuffer.flip();
-  return (int) (firstBuffer.getDouble() - secondBuffer.getDouble());
+  double compare = firstBuffer.getDouble() - secondBuffer.getDouble();
+  if (compare > 0) {
+compare = 1;
+  } else if (compare < 0) {
+compare = -1;
+  }
+  return (int) compare;
 } else if (dataType == DataTypes.LONG || dataType == DataTypes.INT
 || dataType == DataTypes.SHORT) {
   firstBuffer = ByteBuffer.allocate(8);
@@ -377,7 +383,13 @@ public class CarbonMetadataUtil {
   secondBuffer.put(second);
   firstBuffer.flip();
   secondBuffer.flip();
-  return (int) (firstBuffer.getLong() - secondBuffer.getLong());
+  long compare = firstBuffer.getLong() - secondBuffer.getLong();
+  if (compare > 0) {
+compare = 1;
+  } else if (compare < 0) {
+compare = -1;
+  }
+  return (int) compare;
 } else if (DataTypes.isDecimal(dataType)) {
   return 
DataTypeUtil.byteToBigDecimal(first).compareTo(DataTypeUtil.byteToBigDecimal(second));
 } else {

http://git-wip-us.apache.org/repos/asf/carbondata/blob/8affab84/core/src/test/java/org/apache/carbondata/core/util/CarbonMetadataUtilTest.java
--
diff --git 
a/core/src/test/java/org/apache/carbondata/core/util/CarbonMetadataUtilTest.java
 
b/core/src/test/java/org/apache/carbondata/core/util/CarbonMetadataUtilTest.java
index 2909dc4..14cd57a 100644
--- 
a/core/src/test/java/org/apache/carbondata/core/util/CarbonMetadataUtilTest.java
+++ 
b/core/src/test/java/org/apache/carbondata/core/util/CarbonMetadataUtilTest.java
@@ -17,40 +17,28 @@
 
 package org.apache.carbondata.core.util;
 
+import java.lang.reflect.Method;
 import java.nio.ByteBuffer;
 import java.util.ArrayList;
 import java.util.List;
 
-import org.apache.carbondata.core.datastore.block.SegmentProperties;
-import org.apache.carbondata.core.datastore.page.EncodedTablePage;
-import org.apache.carbondata.core.datastore.page.encoding.EncodedColumnPage;
-import org.apache.carbondata.core.datastore.page.key.TablePageKey;
-import 
org.apache.carbondata.core.datastore.page.statistics.PrimitivePageStatsCollector;
 import org.apache.carbondata.core.metadata.ValueEncoderMeta;
+import org.apache.carbondata.core.metadata.datatype.DataTypes;
 import org.apache.carbondata.core.metadata.index.BlockIndexInfo;
 import org.apache.carbondata.format.BlockIndex;
-import org.apache.carbondata.format.BlockletIndex;
 import org.apache.carbondata.format.BlockletInfo;
-import org.apache.carbondata.format.BlockletInfo3;
-import org.apache.carbondata.format.BlockletMinMaxIndex;
 import org.apache.carbondata.format.ColumnSchema;
 import org.apache.carbondata.format.DataChunk;
-import org.apache.carbondata.format.DataChunk2;
 import org.apache.carbondata.format.DataType;
 import org.apache.carbondata.

carbondata git commit: [CARBONDATA-2817]Thread Leak in Update and in No sort flow

2018-08-08 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master 8f7b594a3 -> 7158d5203


[CARBONDATA-2817]Thread Leak in Update and in No sort flow

Issue :- After the update command finishes, loading threads are not getting stopped.

Root Cause :-

In the update flow, DataLoadExecutor's close method is not called, so the executor
services are not closed.
Exceptions are not handled properly in AbstractFactDataWriter's closeExecutorService(),
which causes a thread leak if the job is killed from the Spark UI.
Solution :-

Add a task completion listener and call DataLoadExecutor's close method from it.
Handle exceptions in closeExecutorService() so that all writer-step threads can
be closed.

This closes #2606


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/7158d520
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/7158d520
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/7158d520

Branch: refs/heads/master
Commit: 7158d5203d84feaef23a5bb17a90b67c79ba52d0
Parents: 8f7b594
Author: BJangir 
Authored: Thu Aug 2 21:51:07 2018 +0530
Committer: kumarvishal09 
Committed: Wed Aug 8 17:42:04 2018 +0530

--
 .../core/util/BlockletDataMapUtil.java  |  4 +-
 .../carbondata/spark/rdd/UpdateDataLoad.scala   |  9 +++-
 .../CarbonRowDataWriterProcessorStepImpl.java   | 52 +---
 .../steps/DataWriterBatchProcessorStepImpl.java | 25 --
 .../store/writer/AbstractFactDataWriter.java| 16 --
 .../writer/v3/CarbonFactDataWriterImplV3.java   | 19 +--
 6 files changed, 103 insertions(+), 22 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/7158d520/core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java 
b/core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java
index 68ce1fb..404b426 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/util/BlockletDataMapUtil.java
@@ -115,7 +115,7 @@ public class BlockletDataMapUtil {
 CarbonTable.updateTableByTableInfo(carbonTable, 
carbonTable.getTableInfo());
   }
   String blockPath = 
footer.getBlockInfo().getTableBlockInfo().getFilePath();
-  if (null != fileNameToMetaInfoMapping && null == 
blockMetaInfoMap.get(blockPath)) {
+  if (null == blockMetaInfoMap.get(blockPath)) {
 BlockMetaInfo blockMetaInfo = 
createBlockMetaInfo(fileNameToMetaInfoMapping, blockPath);
 // if blockMetaInfo is null that means the file has been deleted from 
the file system.
 // This can happen in case IUD scenarios where after deleting or 
updating the data the
@@ -123,8 +123,6 @@ public class BlockletDataMapUtil {
 if (null != blockMetaInfo) {
   blockMetaInfoMap.put(blockPath, blockMetaInfo);
 }
-  } else {
-blockMetaInfoMap.put(blockPath, new BlockMetaInfo(new String[] {},0));
   }
 }
 return blockMetaInfoMap;

http://git-wip-us.apache.org/repos/asf/carbondata/blob/7158d520/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/UpdateDataLoad.scala
--
diff --git 
a/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/UpdateDataLoad.scala
 
b/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/UpdateDataLoad.scala
index 2e7c307..f4fdbc1 100644
--- 
a/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/UpdateDataLoad.scala
+++ 
b/integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/UpdateDataLoad.scala
@@ -25,8 +25,10 @@ import org.apache.spark.sql.Row
 import org.apache.carbondata.common.CarbonIterator
 import org.apache.carbondata.common.logging.LogServiceFactory
 import org.apache.carbondata.core.statusmanager.{LoadMetadataDetails, 
SegmentStatus}
+import org.apache.carbondata.core.util.ThreadLocalTaskInfo
 import org.apache.carbondata.processing.loading.{DataLoadExecutor, 
TableProcessingOperations}
 import org.apache.carbondata.processing.loading.model.CarbonLoadModel
+import org.apache.carbondata.spark.util.CommonUtil
 
 /**
  * Data load in case of update command .
@@ -54,7 +56,12 @@ object UpdateDataLoad {
   loader.initialize()
 
   loadMetadataDetails.setSegmentStatus(SegmentStatus.SUCCESS)
-  new DataLoadExecutor().execute(carbonLoadModel,
+  val executor = new DataLoadExecutor
+  TaskContext.get().addTaskCompletionListener { context =>
+executor.close()
+
CommonUtil.clearUnsafeMemory(ThreadLocalTaskInfo.getCarbonTask

carbondata git commit: [CARBONDATA-2775] Adaptive encoding fails for Unsafe OnHeap if target datatype is SHORT_INT

2018-07-29 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master 8d3e8b82c -> 4d95dfcff


[CARBONDATA-2775] Adaptive encoding fails for Unsafe OnHeap if the target 
datatype is SHORT_INT

problem:
[CARBONDATA-2775] Adaptive encoding fails for Unsafe OnHeap if the target data 
type is SHORT_INT

solution: If ENABLE_OFFHEAP_SORT = false is set in the carbon properties, 
UnsafeFixLengthColumnPage.java uses a different compression logic instead of raw 
compression. In that case, the conversion for the SHORT_INT data type needs to be handled.
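
As a rough illustration (hypothetical helper, assuming the 3-byte width of SHORT_INT; 
the real fix simply adds the missing branch in UnsafeFixLengthColumnPage, see the diff 
below):

// Hypothetical sketch: for a fixed-length page the row count is the page length in
// bytes divided by the per-value width, so SHORT_INT (stored as a 3-byte packed
// integer) needs its own branch next to SHORT (2 bytes) and INT (4 bytes).
final class ShortIntRowCountSketch {
  static int rowCount(long totalLengthInBytes, int bytesPerValue) {
    return (int) (totalLengthInBytes / bytesPerValue);
  }

  public static void main(String[] args) {
    // 9 bytes of SHORT_INT data are 3 rows; without a dedicated branch the row
    // count for this data type is computed incorrectly.
    System.out.println(rowCount(9, 3));   // prints 3
  }
}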

This closes #2546


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/4d95dfcf
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/4d95dfcf
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/4d95dfcf

Branch: refs/heads/master
Commit: 4d95dfcff2895ce0aed8ba6f75ce9946ae5172af
Parents: 8d3e8b8
Author: ajantha-bhat 
Authored: Tue Jul 24 12:33:47 2018 +0530
Committer: kumarvishal09 
Committed: Sun Jul 29 11:52:30 2018 +0530

--
 .../page/UnsafeFixLengthColumnPage.java |  2 +
 ...UnsafeHeapColumnPageForComplexDataType.scala | 61 
 2 files changed, 63 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/4d95dfcf/core/src/main/java/org/apache/carbondata/core/datastore/page/UnsafeFixLengthColumnPage.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/datastore/page/UnsafeFixLengthColumnPage.java
 
b/core/src/main/java/org/apache/carbondata/core/datastore/page/UnsafeFixLengthColumnPage.java
index bcb74c0..f75deb6 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/datastore/page/UnsafeFixLengthColumnPage.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/datastore/page/UnsafeFixLengthColumnPage.java
@@ -495,6 +495,8 @@ public class UnsafeFixLengthColumnPage extends ColumnPage {
   return totalLength / ByteUtil.SIZEOF_BYTE;
 } else if (dataType == DataTypes.SHORT) {
   return totalLength / ByteUtil.SIZEOF_SHORT;
+} else if (dataType == DataTypes.SHORT_INT) {
+  return totalLength / ByteUtil.SIZEOF_SHORT_INT;
 } else if (dataType == DataTypes.INT) {
   return totalLength / ByteUtil.SIZEOF_INT;
 } else if (dataType == DataTypes.LONG) {

http://git-wip-us.apache.org/repos/asf/carbondata/blob/4d95dfcf/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestAdaptiveEncodingUnsafeHeapColumnPageForComplexDataType.scala
--
diff --git 
a/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestAdaptiveEncodingUnsafeHeapColumnPageForComplexDataType.scala
 
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestAdaptiveEncodingUnsafeHeapColumnPageForComplexDataType.scala
new file mode 100644
index 000..acf75c1
--- /dev/null
+++ 
b/integration/spark-common-test/src/test/scala/org/apache/carbondata/integration/spark/testsuite/complexType/TestAdaptiveEncodingUnsafeHeapColumnPageForComplexDataType.scala
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.integration.spark.testsuite.complexType
+
+import java.io.File
+
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+
+/**
+ * Test class of Adaptive Encoding UnSafe Column Page with Complex Data type
+ *
+ */
+
+class TestAdaptiveEncodingUnsafeHeapColumnPageForComplexDataType
+  extends QueryTest with BeforeAndAfterAll with TestAdaptiveComplexType {
+
+  override def beforeAll(): Unit = {
+
+new File(CarbonProperties.getInstance().getSystemFolderLocation).delete()
+sql("DROP TABLE IF EXISTS adaptive")
+CarbonProperties.

carbondata git commit: [CARBONDATA-2753][Compatibility] Row count of page is calculated wrong for old store (V2 store)

2018-07-29 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master c79fc90d5 -> 8d3e8b82c


[CARBONDATA-2753][Compatibility] Row count of page is calculated wrong for old 
store (V2 store)

The per-page row count is calculated wrongly for the old (V2) store: the last page's 
row count and the per-page fill count always use the V3 page-size constant, even when 
the blocklet was written in the V2 format. The fix derives both values from the 
page-size constant of the store version being read.
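
A worked sketch of the corrected calculation (hypothetical helper; the essential point 
is that rowsPerPage must be the constant matching the store version, as in the diff below):

// Hypothetical sketch: both the number of completely filled pages and the row count
// of the last (partial) page are derived from the page size of the store version
// being read, instead of always using the V3 page size.
final class BlockletPageRowCountSketch {
  static int[] pageRowCounts(int totalRows, int rowsPerPage) {
    int fullPages = totalRows / rowsPerPage;
    int lastPageRows = totalRows % rowsPerPage;
    int[] counts = new int[fullPages + (lastPageRows > 0 ? 1 : 0)];
    for (int i = 0; i < fullPages; i++) {
      counts[i] = rowsPerPage;                     // completely filled pages
    }
    if (lastPageRows > 0) {
      counts[counts.length - 1] = lastPageRows;    // remaining rows in the last page
    }
    return counts;
  }

  public static void main(String[] args) {
    // e.g. 70000 rows with a 32000-row page size (the V3 default) give pages of
    // 32000, 32000 and 6000 rows; a V2 store must use its own page size instead.
    System.out.println(java.util.Arrays.toString(pageRowCounts(70000, 32000)));
  }
}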


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/8d3e8b82
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/8d3e8b82
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/8d3e8b82

Branch: refs/heads/master
Commit: 8d3e8b82cbb0d75c66219119c281ed910ac185e6
Parents: c79fc90
Author: dhatchayani 
Authored: Wed Jul 25 14:41:58 2018 +0530
Committer: kumarvishal09 
Committed: Sun Jul 29 11:47:25 2018 +0530

--
 .../blockletindex/BlockletDataRefNode.java| 18 +-
 .../scan/scanner/impl/BlockletFullScanner.java|  9 +
 2 files changed, 14 insertions(+), 13 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/8d3e8b82/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataRefNode.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataRefNode.java
 
b/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataRefNode.java
index a11ae8d..5681528 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataRefNode.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/indexstore/blockletindex/BlockletDataRefNode.java
@@ -61,18 +61,26 @@ public class BlockletDataRefNode implements DataRefNode {
   int numberOfPagesCompletelyFilled = detailInfo.getRowCount();
   // no. of rows to a page is 12 in V2 and 32000 in V3, same is 
handled to get the number
   // of pages filled
-  if (blockInfo.getVersion() == ColumnarFormatVersion.V2) {
+  int lastPageRowCount;
+  int fullyFilledRowsCount;
+  if (blockInfo.getVersion() == ColumnarFormatVersion.V2
+  || blockInfo.getVersion() == ColumnarFormatVersion.V1) {
 numberOfPagesCompletelyFilled /=
 
CarbonVersionConstants.NUMBER_OF_ROWS_PER_BLOCKLET_COLUMN_PAGE_DEFAULT_V2;
+lastPageRowCount = detailInfo.getRowCount()
+% 
CarbonVersionConstants.NUMBER_OF_ROWS_PER_BLOCKLET_COLUMN_PAGE_DEFAULT_V2;
+fullyFilledRowsCount =
+
CarbonVersionConstants.NUMBER_OF_ROWS_PER_BLOCKLET_COLUMN_PAGE_DEFAULT_V2;
   } else {
 numberOfPagesCompletelyFilled /=
 
CarbonV3DataFormatConstants.NUMBER_OF_ROWS_PER_BLOCKLET_COLUMN_PAGE_DEFAULT;
+lastPageRowCount = detailInfo.getRowCount()
+% 
CarbonV3DataFormatConstants.NUMBER_OF_ROWS_PER_BLOCKLET_COLUMN_PAGE_DEFAULT;
+fullyFilledRowsCount =
+
CarbonV3DataFormatConstants.NUMBER_OF_ROWS_PER_BLOCKLET_COLUMN_PAGE_DEFAULT;
   }
-  int lastPageRowCount = detailInfo.getRowCount()
-  % 
CarbonV3DataFormatConstants.NUMBER_OF_ROWS_PER_BLOCKLET_COLUMN_PAGE_DEFAULT;
   for (int i = 0; i < numberOfPagesCompletelyFilled; i++) {
-pageRowCount[i] =
-
CarbonV3DataFormatConstants.NUMBER_OF_ROWS_PER_BLOCKLET_COLUMN_PAGE_DEFAULT;
+pageRowCount[i] = fullyFilledRowsCount;
   }
   if (lastPageRowCount > 0) {
 pageRowCount[pageRowCount.length - 1] = lastPageRowCount;

http://git-wip-us.apache.org/repos/asf/carbondata/blob/8d3e8b82/core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFullScanner.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFullScanner.java
 
b/core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFullScanner.java
index c3d4df8..f61a8b1 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFullScanner.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/scan/scanner/impl/BlockletFullScanner.java
@@ -19,7 +19,6 @@ package org.apache.carbondata.core.scan.scanner.impl;
 import java.io.IOException;
 
 import org.apache.carbondata.core.constants.CarbonCommonConstants;
-import org.apache.carbondata.core.constants.CarbonV3DataFormatConstants;
 import org.apache.carbondata.core.datastore.DataRefNode;
 import org.apache.carbondata.core.datastore.chunk.DimensionColumnPage;
 import org.apache.carbondata.core.datastore.chunk.impl.DimensionRawColumnChunk;
@@ -123,13 +122,7 @@ public class BlockletFullScanner implements 
BlockletScanner {
 if (numberOfRows == null) {
   numberOfRows = new 
int[rawBlockletColumnChunks.getDataBlock().numberOfPages()];
   for (int i = 0; i < numberOfRows.length; i++) {
-nu

carbondata git commit: [CARBONDATA-2772] Size based dictionary fallback is failing even though the threshold is not reached.

2018-07-26 Thread kumarvishal09
Repository: carbondata
Updated Branches:
  refs/heads/master f8fa29e64 -> 005db3fa3


[CARBONDATA-2772] Size based dictionary fallback is failing even though the threshold 
is not reached.

Issue :- Size-based fallback happens even though the threshold is not reached.
Root cause :- The current-size calculation is wrong: it is incremented for every 
incoming value instead of only for generated dictionary values.

Solution :- The current size should be calculated only for generated dictionary 
values.
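
For illustration, a minimal Java sketch of the corrected accounting (simplified: String 
keys instead of byte[], a plain RuntimeException instead of 
DictionaryThresholdReachedException; the real change is in MapBasedDictionaryStore, 
shown in the diff below):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified sketch: the running size grows only when a new dictionary
// value is generated, so repeated values no longer push the store towards an
// unnecessary size-based fallback.
final class LocalDictionarySizeSketch {
  private final Map<String, Integer> dictionary = new ConcurrentHashMap<>();
  private final int dictionaryThreshold;
  private long currentSize;
  private int lastAssignedValue;

  LocalDictionarySizeSketch(int dictionaryThreshold) {
    this.dictionaryThreshold = dictionaryThreshold;
  }

  synchronized int getOrGenerateKey(String data) {
    Integer value = dictionary.get(data);
    if (value == null) {
      value = ++lastAssignedValue;
      currentSize += data.length();   // counted once per generated dictionary value
      if (value > dictionaryThreshold || currentSize >= Integer.MAX_VALUE) {
        throw new RuntimeException("dictionary threshold reached or size crossed the 2GB limit");
      }
      dictionary.put(data, value);
    }
    return value;
  }
}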

This closes #2542


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/005db3fa
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/005db3fa
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/005db3fa

Branch: refs/heads/master
Commit: 005db3fa359808d7988b94307a25a49010a42ca6
Parents: f8fa29e
Author: BJangir 
Authored: Mon Jul 23 22:14:12 2018 +0530
Committer: kumarvishal09 
Committed: Thu Jul 26 14:09:51 2018 +0530

--
 .../MapBasedDictionaryStore.java| 20 ++--
 .../ColumnLocalDictionaryGenerator.java |  8 
 2 files changed, 14 insertions(+), 14 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/carbondata/blob/005db3fa/core/src/main/java/org/apache/carbondata/core/localdictionary/dictionaryholder/MapBasedDictionaryStore.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/localdictionary/dictionaryholder/MapBasedDictionaryStore.java
 
b/core/src/main/java/org/apache/carbondata/core/localdictionary/dictionaryholder/MapBasedDictionaryStore.java
index 05ca002..7b8617a 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/localdictionary/dictionaryholder/MapBasedDictionaryStore.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/localdictionary/dictionaryholder/MapBasedDictionaryStore.java
@@ -55,6 +55,11 @@ public class MapBasedDictionaryStore implements 
DictionaryStore {
*/
   private boolean isThresholdReached;
 
+  /**
+   * current datasize
+   */
+  private long currentSize;
+
   public MapBasedDictionaryStore(int dictionaryThreshold) {
 this.dictionaryThreshold = dictionaryThreshold;
 this.dictionary = new ConcurrentHashMap<>();
@@ -86,11 +91,9 @@ public class MapBasedDictionaryStore implements 
DictionaryStore {
 if (null == value) {
   // increment the value
   value = ++lastAssignValue;
+  currentSize += data.length;
   // if new value is greater than threshold
-  if (value > dictionaryThreshold) {
-// clear the dictionary
-dictionary.clear();
-referenceDictionaryArray = null;
+  if (value > dictionaryThreshold || currentSize >= Integer.MAX_VALUE) 
{
 // set the threshold boolean to true
 isThresholdReached = true;
 // throw exception
@@ -108,8 +111,13 @@ public class MapBasedDictionaryStore implements 
DictionaryStore {
 
   private void checkIfThresholdReached() throws 
DictionaryThresholdReachedException {
 if (isThresholdReached) {
-  throw new DictionaryThresholdReachedException(
-  "Unable to generate dictionary value. Dictionary threshold reached");
+  if (currentSize >= Integer.MAX_VALUE) {
+throw new DictionaryThresholdReachedException(
+"Unable to generate dictionary. Dictionary Size crossed 2GB 
limit");
+  } else {
+throw new DictionaryThresholdReachedException(
+"Unable to generate dictionary value. Dictionary threshold 
reached");
+  }
 }
   }
 

http://git-wip-us.apache.org/repos/asf/carbondata/blob/005db3fa/core/src/main/java/org/apache/carbondata/core/localdictionary/generator/ColumnLocalDictionaryGenerator.java
--
diff --git 
a/core/src/main/java/org/apache/carbondata/core/localdictionary/generator/ColumnLocalDictionaryGenerator.java
 
b/core/src/main/java/org/apache/carbondata/core/localdictionary/generator/ColumnLocalDictionaryGenerator.java
index b0c7275..c55a289 100644
--- 
a/core/src/main/java/org/apache/carbondata/core/localdictionary/generator/ColumnLocalDictionaryGenerator.java
+++ 
b/core/src/main/java/org/apache/carbondata/core/localdictionary/generator/ColumnLocalDictionaryGenerator.java
@@ -33,8 +33,6 @@ public class ColumnLocalDictionaryGenerator implements 
LocalDictionaryGenerator
*/
   private DictionaryStore dictionaryHolder;
 
-  private long currentSize;
-
   public ColumnLocalDictionaryGenerator(int threshold, int lvLength) {
 // adding 1 to threshold for null value
 int newThreshold = threshold + 1;
@@ -54,7 +52,6 @@ public class ColumnLocalDictionaryGenerator implements 
LocalDictionaryGenerator
