[GitHub] carbondata pull request #2366: [CARBONDATA-2532][Integration] Carbon to supp...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2366#discussion_r196655625 --- Diff: store/search/src/main/scala/org/apache/spark/rpc/Master.scala --- @@ -81,7 +81,7 @@ class Master(sparkConf: SparkConf) { do { try { LOG.info(s"starting registry-service on $hostAddress:$port") - val config = RpcEnvConfig( + val config = RpcUtil.getRpcEnvConfig( --- End diff -- After analyzing #2372, these changes are not required, so they were reverted. ---
[GitHub] carbondata pull request #2366: [CARBONDATA-2532][Integration] Carbon to supp...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2366#discussion_r196655227 --- Diff: integration/spark-common/src/main/scala/org/apache/spark/util/CarbonReflectionUtils.scala --- @@ -247,6 +252,32 @@ object CarbonReflectionUtils { isFormatted } + + def getRowDataSourceScanExecObj(relation: LogicalRelation, --- End diff -- fixed ---
[GitHub] carbondata pull request #2366: [CARBONDATA-2532][Integration] Carbon to supp...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2366#discussion_r196655176 --- Diff: integration/spark-common/src/main/scala/org/apache/spark/util/CarbonReflectionUtils.scala --- @@ -247,6 +252,32 @@ object CarbonReflectionUtils { isFormatted } + --- End diff -- Fixed ---
[GitHub] carbondata pull request #2366: [CARBONDATA-2532][Integration] Carbon to supp...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2366#discussion_r196655245 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/DDLStrategy.scala --- @@ -38,9 +39,17 @@ import org.apache.carbondata.common.logging.{LogService, LogServiceFactory} import org.apache.carbondata.core.features.TableOperation import org.apache.carbondata.core.util.CarbonProperties -/** - * Carbon strategies for ddl commands - */ + /** Carbon strategies for ddl commands --- End diff -- fixed ---
[GitHub] carbondata pull request #2366: [CARBONDATA-2532][Integration] Carbon to supp...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2366#discussion_r196655128 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/hive/CarbonPreAggregateRules.scala --- @@ -1787,20 +1839,23 @@ case class CarbonPreAggregateDataLoadingRules(sparkSession: SparkSession) // named expression list otherwise update the list and add it to set if (!validExpressionsMap.contains(AggExpToColumnMappingModel(sumExp))) { namedExpressionList += -Alias(expressions.head, name + "_ sum")(NamedExpression.newExprId, +CarbonCompilerUtil.createAliasRef(expressions.head, + name + "_ sum", + NamedExpression.newExprId, alias.qualifier, Some(alias.metadata), - alias.isGenerated) + Some(alias)) validExpressionsMap += AggExpToColumnMappingModel(sumExp) } // check with same expression already count is present then do not add to // named expression list otherwise update the list and add it to set if (!validExpressionsMap.contains(AggExpToColumnMappingModel(countExp))) { namedExpressionList += -Alias(expressions.last, name + "_ count")(NamedExpression.newExprId, - alias.qualifier, - Some(alias.metadata), - alias.isGenerated) + CarbonCompilerUtil.createAliasRef(expressions.last, name + "_ count", --- End diff -- Fixed. Changed the name from CarbonCompilerUtil to CarbonToSparkAdapter. ---
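The adapter class itself is not quoted in the thread, so here is a toy, self-contained sketch of the version-adapter idea behind a `createAliasRef`-style factory: one entry point hides a constructor difference between Spark versions (for example, `Alias` dropping its `isGenerated` flag in 2.3). The types below are stand-ins, not Spark's real classes.

```java
// Toy sketch of the version-adapter pattern: stand-in types, hypothetical shapes.
public class AdapterSketch {
    // "2.2" shape keeps the isGenerated flag; the "2.3" shape drops it.
    static class AliasRef22 {
        final String name;
        final boolean isGenerated;
        AliasRef22(String name, boolean isGenerated) {
            this.name = name;
            this.isGenerated = isGenerated;
        }
    }

    static class AliasRef23 {
        final String name;
        AliasRef23(String name) {
            this.name = name;
        }
    }

    // Single factory: callers never branch on the Spark version themselves.
    static Object createAliasRef(String name, String sparkVersion) {
        return sparkVersion.startsWith("2.3")
            ? new AliasRef23(name)
            : new AliasRef22(name, false);
    }

    public static void main(String[] args) {
        // prints AliasRef23
        System.out.println(createAliasRef("salary_sum", "2.3.0").getClass().getSimpleName());
    }
}
```

The point of the rename discussed above is exactly this: all version-dependent construction funnels through one adapter object instead of being scattered across the rules.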
[GitHub] carbondata pull request #2366: [CARBONDATA-2532][Integration] Carbon to supp...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2366#discussion_r196655020 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/sql/commands/StoredAsCarbondataSuite.scala --- @@ -87,7 +87,7 @@ class StoredAsCarbondataSuite extends QueryTest with BeforeAndAfterEach { sql("CREATE TABLE carbon_table(key INT, value STRING) STORED AS ") } catch { case e: Exception => -assert(e.getMessage.contains("no viable alternative at input")) +assert(true) --- End diff -- Fixed. Added an OR condition with the expected message as per Spark 2.3.0. ---
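The described fix (an OR condition over the two parser error messages, instead of the meaningless `assert(true)`) can be sketched as follows. The Spark 2.3 message text used here is an assumption for illustration, not the verified string.

```java
// Sketch of the described fix: accept the parser error from either Spark
// version. The 2.3 message text is an ASSUMPTION, not the verified string.
public class ParserMessageCheck {
    static boolean messageMatches(String msg) {
        return msg.contains("no viable alternative at input")  // Spark 2.1/2.2 parser
            || msg.contains("mismatched input");               // Spark 2.3 parser (assumed)
    }

    public static void main(String[] args) {
        // prints true
        System.out.println(messageMatches("no viable alternative at input 'STORED AS'"));
    }
}
```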
[GitHub] carbondata pull request #2366: [CARBONDATA-2532][Integration] Carbon to supp...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2366#discussion_r196654906 --- Diff: integration/spark-common/src/main/scala/org/apache/spark/util/CarbonReflectionUtils.scala --- @@ -140,6 +142,13 @@ object CarbonReflectionUtils { relation, expectedOutputAttributes, catalogTable)._1.asInstanceOf[LogicalRelation] +} else if (SPARK_VERSION.startsWith("2.3")) { --- End diff -- Fixed. Added a utility method for Spark version comparison in SparkUtil.scala. ---
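SparkUtil.scala itself is not quoted in this thread; a minimal sketch of such a version-comparison helper, with hypothetical method names, might look like the following. It collapses chained checks like `SPARK_VERSION.startsWith("2.2") || SPARK_VERSION.startsWith("2.3")` into one numeric comparison.

```java
// Hypothetical sketch of the Spark-version helper described for
// SparkUtil.scala; method names and exact semantics are assumptions.
public class SparkVersionUtil {
    // True when sparkVersion's "major.minor" is >= xVersion's.
    public static boolean isSparkVersionXAndAbove(String xVersion, String sparkVersion) {
        int[] x = majorMinor(xVersion);
        int[] s = majorMinor(sparkVersion);
        return s[0] > x[0] || (s[0] == x[0] && s[1] >= x[1]);
    }

    private static int[] majorMinor(String version) {
        String[] parts = version.split("\\.");
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        return new int[] { Integer.parseInt(parts[0]), minor };
    }

    public static void main(String[] args) {
        // prints true
        System.out.println(isSparkVersionXAndAbove("2.2", "2.3.0"));
    }
}
```

Comparing major/minor numerically rather than with `startsWith` also means a future "2.10" would compare correctly.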
[GitHub] carbondata pull request #2366: [CARBONDATA-2532][Integration] Carbon to supp...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2366#discussion_r196654926 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/strategy/CarbonLateDecodeStrategy.scala --- @@ -355,18 +362,19 @@ private[sql] class CarbonLateDecodeStrategy extends SparkStrategy { } private def getDataSourceScan(relation: LogicalRelation, - output: Seq[Attribute], - partitions: Seq[PartitionSpec], - scanBuilder: (Seq[Attribute], Seq[Expression], Seq[Filter], -ArrayBuffer[AttributeReference], Seq[PartitionSpec]) => RDD[InternalRow], - candidatePredicates: Seq[Expression], - pushedFilters: Seq[Filter], - metadata: Map[String, String], - needDecoder: ArrayBuffer[AttributeReference], - updateRequestedColumns: Seq[Attribute]): DataSourceScanExec = { +output: Seq[Attribute], --- End diff -- fixed ---
[GitHub] carbondata pull request #2366: [CARBONDATA-2532][Integration] Carbon to supp...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2366#discussion_r196654954 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/bigdecimal/TestBigDecimal.scala --- @@ -149,8 +149,9 @@ class TestBigDecimal extends QueryTest with BeforeAndAfterAll { } test("test sum*10 aggregation on big decimal column with high precision") { -checkAnswer(sql("select sum(salary)*10 from carbonBigDecimal_2"), - sql("select sum(salary)*10 from hiveBigDecimal")) +val carbonSeq = sql("select sum(salary)*10 from carbonBigDecimal_2").collect --- End diff -- fixed ---
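The collect-based comparison in the diff above can be sketched as follows. The idea is that `BigDecimal.compareTo` ignores trailing scale (1.20 equals 1.2), which is what a high-precision sum comparison needs, whereas strict row equality does not. Names here are illustrative, not CarbonData's actual test helpers.

```java
import java.math.BigDecimal;
import java.util.List;

// Sketch of comparing two collected query results value-by-value so that
// decimals differing only in scale still compare equal. Illustrative names.
public class DecimalCompare {
    static boolean resultsMatch(List<BigDecimal> carbonRows, List<BigDecimal> hiveRows) {
        if (carbonRows.size() != hiveRows.size()) {
            return false;
        }
        for (int i = 0; i < carbonRows.size(); i++) {
            // compareTo == 0 treats 12345.6789000 and 12345.6789 as equal;
            // equals() would not.
            if (carbonRows.get(i).compareTo(hiveRows.get(i)) != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // prints true
        System.out.println(resultsMatch(
            List.of(new BigDecimal("12345.6789000")),
            List.of(new BigDecimal("12345.6789"))));
    }
}
```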
[GitHub] carbondata pull request #2366: [CARBONDATA-2532][Integration] Carbon to supp...
Github user sandeep-katta commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2366#discussion_r196654884 --- Diff: integration/spark-common/src/main/scala/org/apache/spark/util/CarbonReflectionUtils.scala --- @@ -65,7 +66,7 @@ object CarbonReflectionUtils { className, tableIdentifier, tableAlias)._1.asInstanceOf[UnresolvedRelation] -} else if (SPARK_VERSION.startsWith("2.2")) { +} else if (SPARK_VERSION.startsWith("2.2") || SPARK_VERSION.startsWith("2.3")) { --- End diff -- Fixed. Added a utility method for Spark version comparison in SparkUtil.scala. ---
[GitHub] carbondata pull request #2379: [CARBONDATA-2420][32K] Support string longer ...
Github user kumarvishal09 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2379#discussion_r196654276 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/EncodingFactory.java --- @@ -71,7 +72,7 @@ public ColumnPageDecoder createDecoder(List encodings, List
[GitHub] carbondata issue #2380: [CARBONDATA-2509][CARBONDATA-2510][CARBONDATA-2511][...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2380 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5231/ ---
[GitHub] carbondata issue #2379: [CARBONDATA-2420][32K] Support string longer than 32...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2379 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5342/ ---
[GitHub] carbondata pull request #2366: [CARBONDATA-2532][Integration] Carbon to supp...
Github user sujith71955 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2366#discussion_r196650804 --- Diff: integration/spark-common/pom.xml --- @@ -65,6 +65,11 @@ scalatest_${scala.binary.version} provided + + org.apache.zookeeper --- End diff -- Not an intentional change, I guess :) ---
[GitHub] carbondata issue #2382: [CARBONDATA-2513][32K] Support write long string fro...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2382 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6397/ ---
[GitHub] carbondata issue #2379: [CARBONDATA-2420][32K] Support string longer than 32...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2379 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6396/ ---
[GitHub] carbondata issue #2384: [CARBONDATA-2608] SDK Support JSON data loading dire...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2384 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5229/ ---
[GitHub] carbondata issue #2379: [CARBONDATA-2420][32K] Support string longer than 32...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2379 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5230/ ---
[GitHub] carbondata issue #2384: [CARBONDATA-2608] SDK Support JSON data loading dire...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2384 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5341/ ---
[GitHub] carbondata issue #2384: [CARBONDATA-2608] SDK Support JSON data loading dire...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2384 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6395/ ---
[GitHub] carbondata issue #2374: [CARBONDATA-2613] Support csv based carbon table
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2374 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5340/ ---
[GitHub] carbondata issue #2379: [CARBONDATA-2420][32K] Support string longer than 32...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/2379 Rebased with the latest master branch. The second commit is to fix the review comments. ---
[GitHub] carbondata issue #2374: [CARBONDATA-2613] Support csv based carbon table
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2374 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5228/ ---
[jira] [Assigned] (CARBONDATA-2608) SDK Support JSON data loading directly without AVRO conversion
[ https://issues.apache.org/jira/browse/CARBONDATA-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat reassigned CARBONDATA-2608: Assignee: Ajantha Bhat > SDK Support JSON data loading directly without AVRO conversion > -- > > Key: CARBONDATA-2608 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2608 > Project: CarbonData > Issue Type: Sub-task >Reporter: sounak chakraborty >Assignee: Ajantha Bhat >Priority: Major > Time Spent: 50m > Remaining Estimate: 0h > > Support JSON data loading directly into Carbon table. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata pull request #2379: [CARBONDATA-2420][32K] Support string longer ...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2379#discussion_r196637400 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/statistics/LVLongStringStatsCollector.java --- @@ -0,0 +1,50 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.carbondata.core.datastore.page.statistics; + +/** + * This class is for the columns with varchar data type, + * a string type which can hold more than 32000 characters + */ +public class LVLongStringStatsCollector extends LVStringStatsCollector { + + public static LVLongStringStatsCollector newInstance() { +return new LVLongStringStatsCollector(); + } + + private LVLongStringStatsCollector() { + + } + + @Override + protected byte[] getActualValue(byte[] value) { +byte[] actualValue; +assert (value.length >= 4); +if (value.length == 4) { + assert (value[0] == 0 && value[1] == 0); + actualValue = new byte[0]; +} else { + // todo: what does this mean? + // int length = (value[0] << 8) + (value[1] & 0xff); --- End diff -- yeah, I find a more readable way to fix it. ---
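The "more readable way" is not quoted in the thread, but one common option is to read the length prefix with `ByteBuffer` instead of manual bit shifts like `(value[0] << 8) + (value[1] & 0xff)`. The layout assumed below, a 4-byte big-endian length followed by the value bytes, is inferred from the surrounding assertions (`value.length >= 4`, empty value when length is exactly 4), not confirmed by the source.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Sketch of LV (length-value) decoding with ByteBuffer instead of manual
// shifts. Layout ASSUMED: [4-byte big-endian length][value bytes].
public class LvDecode {
    static byte[] actualValue(byte[] lv) {
        assert lv.length >= 4;
        int length = ByteBuffer.wrap(lv, 0, 4).getInt();  // reads big-endian
        assert length == lv.length - 4 : "corrupt LV value";
        return Arrays.copyOfRange(lv, 4, 4 + length);
    }

    static byte[] encode(byte[] value) {
        return ByteBuffer.allocate(4 + value.length)
            .putInt(value.length)
            .put(value)
            .array();
    }

    public static void main(String[] args) {
        // prints abc
        System.out.println(new String(actualValue(encode("abc".getBytes()))));
    }
}
```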
[GitHub] carbondata issue #2374: [CARBONDATA-2613] Support csv based carbon table
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2374 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6394/ ---
[GitHub] carbondata pull request #2379: [CARBONDATA-2420][32K] Support string longer ...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2379#discussion_r196636059 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/VarLengthColumnPageBase.java --- @@ -289,6 +289,12 @@ public void putDouble(int rowId, double value) { @Override public void putBytes(int rowId, byte[] bytes) { +// rowId * 4 represents the length of L in LV +if (bytes.length > (Integer.MAX_VALUE - totalLength - rowId * 4)) { --- End diff -- I came across a new idea: during parsing/converting, we can calculate #numberOfRowsPerPage * #currentCharacterLength; if it is larger than 2GB, the data load will fail. Notice that #numberOfRowsPerPage is specified by the user through configuration. If this is OK, I'll implement it in a future PR, not this one. ---
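The proposed guard is simple arithmetic; a minimal sketch (names are assumptions, matching the #numberOfRowsPerPage * #currentCharacterLength idea above) could be:

```java
// Sketch of the proposed load-time guard: estimate rowsPerPage times the
// largest value length (plus the 4-byte L of each LV entry) and fail the
// load early if a column page could exceed 2GB. Names are assumptions.
public class PageOverflowGuard {
    static boolean pageMightOverflow(int rowsPerPage, long maxBytesPerValue) {
        long lengthPrefix = 4L; // the L in each LV entry, as in the diff above
        return (long) rowsPerPage * (maxBytesPerValue + lengthPrefix) > Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        // the classic 32000-char limit always fits in one page: prints false
        System.out.println(pageMightOverflow(32000, 32000));
        // 32000 rows of 1MB strings would blow past 2GB: prints true
        System.out.println(pageMightOverflow(32000, 1 << 20));
    }
}
```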
[GitHub] carbondata pull request #2379: [CARBONDATA-2420][32K] Support string longer ...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2379#discussion_r196635615 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -1601,6 +1602,8 @@ // As Short data type is used for storing the length of a column during data processing hence // the maximum characters that can be supported should be less than Short max value public static final int MAX_CHARS_PER_COLUMN_DEFAULT = 32000; + // todo: use infinity first, will switch later + public static final int MAX_CHARS_PER_COLUMN_INFINITY = -1; --- End diff -- As I mentioned in another PR, better not to introduce this limit. -1 means that the parser can parse infinitely many characters. ---
[GitHub] carbondata pull request #2379: [CARBONDATA-2420][32K] Support string longer ...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2379#discussion_r196635418 --- Diff: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala --- @@ -279,7 +279,7 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser { fields.zipWithIndex.foreach { case (field, index) => field.schemaOrdinal = index } -val (dims, msrs, noDictionaryDims, sortKeyDims) = extractDimAndMsrFields( +val (dims, msrs, noDictionaryDims, sortKeyDims, varcharColumns) = extractDimAndMsrFields( --- End diff -- Just like the other results of `extractDimAndMsrFields`, we validate and get the sort_column, dictionaries, and the varcharColumns (longStringColumns). For the varcharColumns, we change their data type from string to varchar later. ---
[GitHub] carbondata pull request #2379: [CARBONDATA-2420][32K] Support string longer ...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2379#discussion_r196634986 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/blocklet/BlockletInfo.java --- @@ -268,7 +268,7 @@ private DataChunk deserializeDataChunk(byte[] bytes) throws IOException { @Override public void readFields(DataInput input) throws IOException { dimensionOffset = input.readLong(); measureOffsets = input.readLong(); -short dimensionChunkOffsetsSize = input.readShort(); +int dimensionChunkOffsetsSize = input.readInt(); --- End diff -- OK~ ---
[GitHub] carbondata pull request #2379: [CARBONDATA-2420][32K] Support string longer ...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2379#discussion_r196634976 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/blocklet/BlockletInfo.java --- @@ -205,7 +205,7 @@ public void setNumberOfPages(int numberOfPages) { output.writeLong(dimensionOffset); output.writeLong(measureOffsets); int dsize = dimensionChunkOffsets != null ? dimensionChunkOffsets.size() : 0; -output.writeShort(dsize); +output.writeInt(dsize); --- End diff -- OK~ ---
[GitHub] carbondata pull request #2379: [CARBONDATA-2420][32K] Support string longer ...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2379#discussion_r196634573 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/compress/DirectCompressCodec.java --- @@ -64,7 +64,7 @@ public ColumnPageDecoder createDecoder(ColumnPageEncoderMeta meta) { return new DirectDecompressor(meta); } - private static class DirectCompressor extends ColumnPageEncoder { --- End diff -- Yeah, it is required because in the method `getEncodingList`, we want to use the member `datatype` from the outer class. If it is a static inner class, we cannot access that member. ---
[GitHub] carbondata pull request #2379: [CARBONDATA-2420][32K] Support string longer ...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2379#discussion_r196634217 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/EncodingFactory.java --- @@ -71,7 +72,7 @@ public ColumnPageDecoder createDecoder(List encodings, List
[GitHub] carbondata pull request #2379: [CARBONDATA-2420][32K] Support string longer ...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2379#discussion_r196633906 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/encoding/DefaultEncodingFactory.java --- @@ -103,6 +103,7 @@ private ColumnPageEncoder createEncoderForDimensionLegacy(TableSpec.DimensionSpe return new HighCardDictDimensionIndexCodec( dimensionSpec.isInSortColumns(), --- End diff -- Emm, better not to do this in this PR. All the parameters for *IndexCodec look alike; changing all of them would introduce unrelated changes. ---
[GitHub] carbondata issue #2384: [CARBONDATA-2608] SDK Support JSON data loading dire...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2384 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5339/ ---
[GitHub] carbondata pull request #2372: [CARBONDATA-2609] Change RPC implementation t...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2372#discussion_r196633489 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonRecordReader.java --- @@ -80,7 +80,7 @@ public void initialize(InputSplit inputSplit, TaskAttemptContext context) } // It should use the exists tableBlockInfos if tableBlockInfos of queryModel is not empty // otherwise the prune is no use before this method -if (!queryModel.isFG()) { +if (queryModel.getTableBlockInfos().isEmpty()) { --- End diff -- If we use queryModel.getTableBlockInfos().isEmpty(), then when the FG prune result is empty in search mode, it will fall back to the original TableBlockInfos and execute again, which means FG pruning has no effect in this scenario. So we cannot change it to queryModel.getTableBlockInfos().isEmpty(). ---
[GitHub] carbondata pull request #2379: [CARBONDATA-2420][32K] Support string longer ...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2379#discussion_r196633177 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/filter/executer/RawColumnChunkUtil.java --- @@ -0,0 +1,65 @@ +/* --- End diff -- OK ---
[GitHub] carbondata issue #2383: [CARBONDATA-2615][32K] Support page size less than 3...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/2383 @kumarvishal09 If the string is too long, the user has to adjust the page size manually. We cannot do it dynamically for now. ---
[GitHub] carbondata pull request #2383: [CARBONDATA-2615][32K] Support page size less...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2383#discussion_r196631555 --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerColumnar.java --- @@ -371,8 +371,13 @@ private void setWritingConfiguration() throws CarbonDataWriterException { this.pageSize = Integer.parseInt(CarbonProperties.getInstance() .getProperty(CarbonCommonConstants.BLOCKLET_SIZE, CarbonCommonConstants.BLOCKLET_SIZE_DEFAULT_VAL)); +// support less than 32000 rows in one page, because we support super long string, +// if it is long enough, a clomun page with 32000 rows will exceed 2GB if (version == ColumnarFormatVersion.V3) { - this.pageSize = CarbonV3DataFormatConstants.NUMBER_OF_ROWS_PER_BLOCKLET_COLUMN_PAGE_DEFAULT; + this.pageSize = --- End diff -- In V3, it is 32000 by default. Here we use the min(32000, user_specified) ---
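The V3 page-size selection described above amounts to taking the minimum of the 32000-row default and the user-configured value, rather than always forcing the default:

```java
// Sketch of the described V3 page-size selection: cap the user-configured
// value at the 32000-row V3 default instead of overriding it outright.
public class PageSize {
    static final int ROWS_PER_PAGE_DEFAULT_V3 = 32000;

    static int effectivePageSize(int userConfigured) {
        return Math.min(ROWS_PER_PAGE_DEFAULT_V3, userConfigured);
    }

    public static void main(String[] args) {
        // user shrinks the page for very long strings: prints 8000
        System.out.println(effectivePageSize(8000));
        // but can never exceed the V3 default: prints 32000
        System.out.println(effectivePageSize(50000));
    }
}
```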
[GitHub] carbondata pull request #2374: [CARBONDATA-2613] Support csv based carbon ta...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2374#discussion_r196627999 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/parser/CarbonSpark2SqlParser.scala --- @@ -403,6 +403,17 @@ class CarbonSpark2SqlParser extends CarbonDDLSqlParser { partition = partitionSpec) } + /** + * The syntax of + * ALTER TABLE [dbName.]tableName ADD SEGMENT LOCATION 'path/to/data' + */ + protected lazy val addSegment: Parser[LogicalPlan] = +ALTER ~> TABLE ~> (ident <~ ".").? ~ ident ~ +ADD ~ SEGMENT ~ LOCATION ~ stringLit <~ opt(";") ^^ { + case dbName ~ tableName ~ add ~ segment ~ location ~ filePath => --- End diff -- OK ---
[GitHub] carbondata issue #2384: [CARBONDATA-2608] SDK Support JSON data loading dire...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2384 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5338/ ---
[GitHub] carbondata pull request #2374: [CARBONDATA-2613] Support csv based carbon ta...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2374#discussion_r196627213 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAddSegmentCommand.scala --- @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.command.management + +import java.util.UUID + +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession} +import org.apache.spark.sql.catalyst.analysis.NoSuchTableException +import org.apache.spark.sql.execution.command.AtomicRunnableCommand +import org.apache.spark.sql.hive.CarbonRelation +import org.apache.spark.util.FileUtils + +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.datamap.status.DataMapStatusManager +import org.apache.carbondata.core.metadata.schema.table.CarbonTable +import org.apache.carbondata.core.mutate.CarbonUpdateUtil +import org.apache.carbondata.core.statusmanager.{FileFormat, LoadMetadataDetails, SegmentStatus, SegmentStatusManager} +import org.apache.carbondata.core.util.CarbonUtil +import org.apache.carbondata.core.util.path.CarbonTablePath +import org.apache.carbondata.events.{OperationContext, OperationListenerBus} +import org.apache.carbondata.processing.loading.events.LoadEvents.LoadMetadataEvent +import org.apache.carbondata.processing.loading.model.{CarbonDataLoadSchema, CarbonLoadModel} +import org.apache.carbondata.processing.util.CarbonLoaderUtil + +/** + * support `alter table tableName add segment location 'path'` command. 
+ * It will create a segment and map the path of datafile to segment's storage + */ +case class CarbonAddSegmentCommand( +dbNameOp: Option[String], +tableName: String, +filePathFromUser: String, +var operationContext: OperationContext = new OperationContext) extends AtomicRunnableCommand { + private val LOGGER = LogServiceFactory.getLogService(this.getClass.getName) + var carbonTable: CarbonTable = _ + + override def processMetadata(sparkSession: SparkSession): Seq[Row] = { +val dbName = CarbonEnv.getDatabaseName(dbNameOp)(sparkSession) +carbonTable = { + val relation = CarbonEnv.getInstance(sparkSession).carbonMetastore +.lookupRelation(Option(dbName), tableName)(sparkSession).asInstanceOf[CarbonRelation] + if (relation == null) { +LOGGER.error(s"Add segment failed due to table $dbName.$tableName not found") +throw new NoSuchTableException(dbName, tableName) + } + relation.carbonTable +} + +if (carbonTable.isHivePartitionTable) { + LOGGER.error("Ignore hive partition table for now") +} + +operationContext.setProperty("isOverwrite", false) +if (CarbonUtil.hasAggregationDataMap(carbonTable)) { + val loadMetadataEvent = new LoadMetadataEvent(carbonTable, false) + OperationListenerBus.getInstance().fireEvent(loadMetadataEvent, operationContext) +} +Seq.empty + } + + // will just mapping external files to segment metadata + override def processData(sparkSession: SparkSession): Seq[Row] = { --- End diff -- In my opinion, creating the segment and updating the tablestatus both belong to `processData`. And in other command such as LoadData, these operation are in `processData` too. ---
[GitHub] carbondata pull request #2374: [CARBONDATA-2613] Support csv based carbon ta...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2374#discussion_r196626592 --- Diff: integration/spark-common/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchemaCommon.scala --- @@ -700,6 +700,13 @@ class TableNewProcessor(cm: TableModel) { cm.tableName)) tableInfo.setLastUpdatedTime(System.currentTimeMillis()) tableInfo.setFactTable(tableSchema) +val format = cm.tableProperties.get(CarbonCommonConstants.FORMAT) --- End diff -- OK ---
[GitHub] carbondata pull request #2374: [CARBONDATA-2613] Support csv based carbon ta...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2374#discussion_r196626156 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala --- @@ -426,6 +439,22 @@ class CarbonScanRDD[T: ClassTag]( CarbonTimeStatisticsFactory.createExecutorRecorder(model.getQueryId)) streamReader.setQueryModel(model) streamReader +case FileFormat.EXTERNAL => + assert(storageFormat.equals("csv"), --- End diff -- OK~ ---
[GitHub] carbondata pull request #2374: [CARBONDATA-2613] Support csv based carbon ta...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2374#discussion_r196625677 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java --- @@ -515,12 +574,73 @@ private CarbonInputSplit convertToCarbonInputSplit(ExtendedBlocklet blocklet) th return split; } + private List convertToInputSplit4ExternalFormat(JobContext jobContext, + ExtendedBlocklet extendedBlocklet) throws IOException { +List splits = new ArrayList(); +String factFilePath = extendedBlocklet.getFilePath(); +Path path = new Path(factFilePath); +FileSystem fs = FileFactory.getFileSystem(path); +FileStatus fileStatus = fs.getFileStatus(path); +long length = fileStatus.getLen(); +if (length != 0) { + BlockLocation[] blkLocations = fs.getFileBlockLocations(path, 0, length); + long blkSize = fileStatus.getBlockSize(); + long minSplitSize = Math.max(getFormatMinSplitSize(), getMinSplitSize(jobContext)); + long maxSplitSize = getMaxSplitSize(jobContext); + long splitSize = computeSplitSize(blkSize, minSplitSize, maxSplitSize); + long bytesRemaining = fileStatus.getLen(); + while (((double) bytesRemaining) / splitSize > 1.1) { +int blkIndex = getBlockIndex(blkLocations, length - bytesRemaining); +splits.add(new CarbonInputSplit(extendedBlocklet.getSegmentId(), path, +length - bytesRemaining, +splitSize, blkLocations[blkIndex].getHosts(), +blkLocations[blkIndex].getCachedHosts(), FileFormat.EXTERNAL)); +bytesRemaining -= splitSize; + } + if (bytesRemaining != 0) { +int blkIndex = getBlockIndex(blkLocations, length - bytesRemaining); +splits.add(new CarbonInputSplit(extendedBlocklet.getSegmentId(), path, +length - bytesRemaining, +bytesRemaining, blkLocations[blkIndex].getHosts(), +blkLocations[blkIndex].getCachedHosts(), FileFormat.EXTERNAL)); + } +} else { + splits.add(new CarbonInputSplit(extendedBlocklet.getSegmentId(), path, 0, length, + new String[0], FileFormat.EXTERNAL)); +} +return splits; + } + @Override public 
RecordReader createRecordReader(InputSplit inputSplit, TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException { Configuration configuration = taskAttemptContext.getConfiguration(); QueryModel queryModel = createQueryModel(inputSplit, taskAttemptContext); CarbonReadSupport readSupport = getReadSupportClass(configuration); -return new CarbonRecordReader(queryModel, readSupport); +if (inputSplit instanceof CarbonMultiBlockSplit +&& ((CarbonMultiBlockSplit) inputSplit).getFileFormat() == FileFormat.EXTERNAL) { + return createRecordReaderForExternalFormat(queryModel, readSupport, + configuration.get(CarbonCommonConstants.CARBON_EXTERNAL_FORMAT_CONF_KEY)); +} else if (inputSplit instanceof CarbonInputSplit +&& ((CarbonInputSplit) inputSplit).getFileFormat() == FileFormat.EXTERNAL) { + return createRecordReaderForExternalFormat(queryModel, readSupport, + configuration.get(CarbonCommonConstants.CARBON_EXTERNAL_FORMAT_CONF_KEY)); +} else { + return new CarbonRecordReader(queryModel, readSupport); +} + } + + @Since("1.4.1") --- End diff -- OK ---
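The split loop in the `convertToInputSplit4ExternalFormat` diff above follows the Hadoop FileInputFormat convention: the split size is `max(minSize, min(maxSize, blockSize))`, and the 1.1 slack factor merges a small trailing remainder into the last split instead of emitting a tiny extra one. A self-contained sketch of just that arithmetic:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the split-size arithmetic used in the diff above, following
// Hadoop FileInputFormat's convention. Returns (offset, length) pairs.
public class SplitSketch {
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    static List<long[]> splitOffsets(long fileLen, long splitSize) {
        final double SLOP = 1.1; // allow the last split to run up to 10% over
        List<long[]> splits = new ArrayList<>();
        long remaining = fileLen;
        while ((double) remaining / splitSize > SLOP) {
            splits.add(new long[] { fileLen - remaining, splitSize });
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits.add(new long[] { fileLen - remaining, remaining });
        }
        return splits;
    }

    public static void main(String[] args) {
        // 250 bytes at split size 100 -> 100 + 100 + 50: prints 3
        System.out.println(splitOffsets(250, 100).size());
        // 105 bytes is within the 1.1 slop of one split: prints 1
        System.out.println(splitOffsets(105, 100).size());
    }
}
```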
[GitHub] carbondata pull request #2374: [CARBONDATA-2613] Support csv based carbon ta...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2374#discussion_r196625604

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonFileInputFormat.java ---

@@ -174,9 +174,15 @@ public CarbonTable getOrCreateCarbonTable(Configuration configuration) throws IO
     List<InputSplit> result = new LinkedList<>();
     // for each segment fetch blocks matching filter in Driver BTree
-    List<CarbonInputSplit> dataBlocksOfSegment =
-        getDataBlocksOfSegment(job, carbonTable, filterResolver, matchedPartitions,
-            validSegments, partitionInfo, oldPartitionIdList);
+    List<CarbonInputSplit> dataBlocksOfSegment;
+    if (carbonTable.getTableInfo().getFormat().equals("")
--- End diff --

The default value of format is 'carbondata', so there is no need to handle empty. Will remove it

---
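The point being made above is that a property with a guaranteed default never needs an empty-string branch at each call site. A hedged sketch of that normalization pattern follows; `FormatProperty` and `normalize` are hypothetical names for illustration, not CarbonData APIs.

```java
public class FormatProperty {
    static final String DEFAULT_FORMAT = "carbondata";

    // Normalize once at read time: fall back to the default for null/empty,
    // and lowercase so later comparisons ("csv", "carbondata") are simple equals.
    static String normalize(String format) {
        return (format == null || format.isEmpty())
            ? DEFAULT_FORMAT
            : format.toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(normalize(""));    // carbondata
        System.out.println(normalize("CSV")); // csv
    }
}
```

With the default applied when the table property is first read, the `getFormat().equals("")` check in the hunk above becomes dead code, which is why the author agrees to remove it.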
[GitHub] carbondata pull request #2374: [CARBONDATA-2613] Support csv based carbon ta...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2374#discussion_r196625258 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CsvRecordReader.java --- @@ -0,0 +1,510 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.hadoop; + +import java.io.IOException; +import java.io.InputStreamReader; +import java.io.Reader; +import java.io.UnsupportedEncodingException; +import java.math.BigDecimal; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.common.annotations.InterfaceAudience; +import org.apache.carbondata.common.annotations.InterfaceStability; +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.constants.CarbonV3DataFormatConstants; +import org.apache.carbondata.core.datastore.block.SegmentProperties; +import org.apache.carbondata.core.metadata.schema.table.CarbonTable; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonDimension; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonMeasure; +import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema; +import org.apache.carbondata.core.scan.expression.exception.FilterUnsupportedException; +import org.apache.carbondata.core.scan.filter.FilterUtil; +import org.apache.carbondata.core.scan.filter.GenericQueryType; +import org.apache.carbondata.core.scan.filter.executer.FilterExecuter; +import org.apache.carbondata.core.scan.filter.intf.RowImpl; +import org.apache.carbondata.core.scan.filter.intf.RowIntf; +import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf; +import org.apache.carbondata.core.scan.model.QueryModel; +import org.apache.carbondata.core.statusmanager.FileFormatProperties; +import org.apache.carbondata.core.util.CarbonUtil; +import org.apache.carbondata.core.util.DataTypeUtil; +import org.apache.carbondata.hadoop.api.CarbonTableInputFormat; +import org.apache.carbondata.hadoop.readsupport.CarbonReadSupport; 
+import org.apache.carbondata.processing.loading.csvinput.CSVInputFormat;
+
+import com.univocity.parsers.csv.CsvParser;
+import com.univocity.parsers.csv.CsvParserSettings;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.TaskAttemptContext;
+import org.apache.hadoop.mapreduce.lib.input.FileSplit;
+
+/**
+ * scan csv file and filter on it
+ */
+@InterfaceStability.Evolving
+@InterfaceAudience.Internal
+public class CsvRecordReader<T> extends AbstractRecordReader<T> {
--- End diff --

The procedure is alike, but the implementation is quite different. The most important parts are converting the origin data to an internal row and converting the origin data to an output row. In StreamRecordReader the origin source is the ROW_V1 format, while in CsvRecordReader the origin source is the CSV format. Besides, StreamRecordReader has more details, such as 'syncMark' and 'rawRow', which we do not need for CSV. Maybe we can extract the common code into utils or create a new abstraction for ReadSupport or RecordReader.

---
[GitHub] carbondata pull request #2374: [CARBONDATA-2613] Support csv based carbon ta...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2374#discussion_r196624366 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CsvRecordReader.java --- @@ -0,0 +1,510 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.hadoop; + +import java.io.IOException; +import java.io.InputStreamReader; +import java.io.Reader; +import java.io.UnsupportedEncodingException; +import java.math.BigDecimal; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.common.annotations.InterfaceAudience; +import org.apache.carbondata.common.annotations.InterfaceStability; +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.constants.CarbonV3DataFormatConstants; +import org.apache.carbondata.core.datastore.block.SegmentProperties; +import org.apache.carbondata.core.metadata.schema.table.CarbonTable; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonDimension; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonMeasure; +import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema; +import org.apache.carbondata.core.scan.expression.exception.FilterUnsupportedException; +import org.apache.carbondata.core.scan.filter.FilterUtil; +import org.apache.carbondata.core.scan.filter.GenericQueryType; +import org.apache.carbondata.core.scan.filter.executer.FilterExecuter; +import org.apache.carbondata.core.scan.filter.intf.RowImpl; +import org.apache.carbondata.core.scan.filter.intf.RowIntf; +import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf; +import org.apache.carbondata.core.scan.model.QueryModel; +import org.apache.carbondata.core.statusmanager.FileFormatProperties; +import org.apache.carbondata.core.util.CarbonUtil; +import org.apache.carbondata.core.util.DataTypeUtil; +import org.apache.carbondata.hadoop.api.CarbonTableInputFormat; +import org.apache.carbondata.hadoop.readsupport.CarbonReadSupport; 
+import org.apache.carbondata.processing.loading.csvinput.CSVInputFormat; + +import com.univocity.parsers.csv.CsvParser; +import com.univocity.parsers.csv.CsvParserSettings; +import org.apache.commons.lang3.StringUtils; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FSDataInputStream; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.TaskAttemptContext; +import org.apache.hadoop.mapreduce.lib.input.FileSplit; + +/** + * scan csv file and filter on it + */ +@InterfaceStability.Evolving +@InterfaceAudience.Internal +public class CsvRecordReader extends AbstractRecordReader { + private static final LogService LOGGER = LogServiceFactory.getLogService( + CsvRecordReader.class.getName()); + private static final int MAX_BATCH_SIZE = + CarbonV3DataFormatConstants.NUMBER_OF_ROWS_PER_BLOCKLET_COLUMN_PAGE_DEFAULT; + // vector reader + private boolean isVectorReader; + private T columnarBatch; + + // metadata + private CarbonTable carbonTable; + private CarbonColumn[] carbonColumns; + // input + private QueryModel queryModel; + private CarbonReadSupport readSupport; + private FileSplit fileSplit; + private Configuration hadoopConf; + // the index is schema ordinal, the value is the csv ordinal + private int[] schema2csvIdx; + + // filter + private FilterExecuter filter; + // the index is the dimension ordinal, the value is the schema ordinal + private int[]
[GitHub] carbondata issue #2328: [CARBONDATA-2504][STREAM] Support StreamSQL for stre...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2328 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5337/ ---
[GitHub] carbondata issue #2377: [CARBONDATA-2611] Added Test Cases for Local Diction...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2377 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5336/ ---
[GitHub] carbondata issue #2374: [CARBONDATA-2613] Support csv based carbon table
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2374 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5335/ ---
[GitHub] carbondata issue #2377: [CARBONDATA-2611] Added Test Cases for Local Diction...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2377 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5334/ ---
[GitHub] carbondata issue #2377: [CARBONDATA-2611] Added Test Cases for Local Diction...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2377 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5333/ ---
[GitHub] carbondata issue #2265: Added Performance Optimization for Presto by using M...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2265 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5332/ ---
[GitHub] carbondata issue #2328: [CARBONDATA-2504][STREAM] Support StreamSQL for stre...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2328 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5331/ ---
[GitHub] carbondata issue #2384: [CARBONDATA-2608] SDK Support JSON data loading dire...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2384 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5227/ ---
[GitHub] carbondata issue #2377: [CARBONDATA-2611] Added Test Cases for Local Diction...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2377 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5226/ ---
[GitHub] carbondata issue #2384: [CARBONDATA-2608] SDK Support JSON data loading dire...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2384 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6393/ ---
[GitHub] carbondata issue #2375: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2375 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5330/ ---
[GitHub] carbondata issue #2377: [CARBONDATA-2611] Added Test Cases for Local Diction...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2377 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6392/ ---
[GitHub] carbondata issue #2328: [CARBONDATA-2504][STREAM] Support StreamSQL for stre...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2328 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5224/ ---
[GitHub] carbondata issue #2375: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2375 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5329/ ---
[GitHub] carbondata issue #2328: [CARBONDATA-2504][STREAM] Support StreamSQL for stre...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2328 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6390/ ---
[GitHub] carbondata issue #2377: [CARBONDATA-2611] Added Test Cases for Local Diction...
Github user praveenmeenakshi56 commented on the issue: https://github.com/apache/carbondata/pull/2377 retest this please ---
[GitHub] carbondata issue #2377: [CARBONDATA-2611] Added Test Cases for Local Diction...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2377 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5223/ ---
[GitHub] carbondata pull request #2384: [CARBONDATA-2608] SDK Support JSON data loadi...
GitHub user ajantha-bhat opened a pull request: https://github.com/apache/carbondata/pull/2384

[CARBONDATA-2608] SDK Support JSON data loading directly (without AVRO conversion)

What changes were proposed in this pull request? Currently the SDK supports JSON data loading only via AVRO, so converting JSON to an Avro record and then Avro to a Carbon object is a two-step process. Hence there is a need for a new CarbonWriter that works with JSON without AVRO. This PR implements that. Highlights: works with just the JSON data and a Carbon schema; supports reading multiple JSON files in a folder; supports single-row JSON write.

How was this patch tested? Manual testing, and UTs are added in another PR.

Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [ ] Any interfaces changed? NA
- [ ] Any backward compatibility impacted? NA
- [ ] Document update required? Yes, will be handled in separate PR
- [ ] Testing done? Yes, updated the UT.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ajantha-bhat/carbondata issue_fix

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2384.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2384

commit 0c99d11c68d681f15c051d8c8e3ded5ced8b1708
Author: ajantha-bhat
Date: 2018-06-15T10:21:16Z

JsonCarbonWrtier

---
[GitHub] carbondata issue #2377: [CARBONDATA-2611] Added Test Cases for Local Diction...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2377 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6389/ ---
[GitHub] carbondata pull request #2374: [CARBONDATA-2613] Support csv based carbon ta...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2374#discussion_r196512608

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/parser/CarbonSpark2SqlParser.scala ---

@@ -403,6 +403,17 @@ class CarbonSpark2SqlParser extends CarbonDDLSqlParser {
       partition = partitionSpec)
   }

+  /**
+   * The syntax of
+   * ALTER TABLE [dbName.]tableName ADD SEGMENT LOCATION 'path/to/data'
+   */
+  protected lazy val addSegment: Parser[LogicalPlan] =
+    ALTER ~> TABLE ~> (ident <~ ".").? ~ ident ~
+    ADD ~ SEGMENT ~ LOCATION ~ stringLit <~ opt(";") ^^ {
+      case dbName ~ tableName ~ add ~ segment ~ location ~ filePath =>
--- End diff --

I think it should be `case dbName ~ tableName ~ filePath =>`

---
[GitHub] carbondata pull request #2374: [CARBONDATA-2613] Support csv based carbon ta...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2374#discussion_r196512126 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonAddSegmentCommand.scala --- @@ -0,0 +1,135 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.spark.sql.execution.command.management + +import java.util.UUID + +import org.apache.spark.sql.{CarbonEnv, Row, SparkSession} +import org.apache.spark.sql.catalyst.analysis.NoSuchTableException +import org.apache.spark.sql.execution.command.AtomicRunnableCommand +import org.apache.spark.sql.hive.CarbonRelation +import org.apache.spark.util.FileUtils + +import org.apache.carbondata.common.logging.LogServiceFactory +import org.apache.carbondata.core.datamap.status.DataMapStatusManager +import org.apache.carbondata.core.metadata.schema.table.CarbonTable +import org.apache.carbondata.core.mutate.CarbonUpdateUtil +import org.apache.carbondata.core.statusmanager.{FileFormat, LoadMetadataDetails, SegmentStatus, SegmentStatusManager} +import org.apache.carbondata.core.util.CarbonUtil +import org.apache.carbondata.core.util.path.CarbonTablePath +import org.apache.carbondata.events.{OperationContext, OperationListenerBus} +import org.apache.carbondata.processing.loading.events.LoadEvents.LoadMetadataEvent +import org.apache.carbondata.processing.loading.model.{CarbonDataLoadSchema, CarbonLoadModel} +import org.apache.carbondata.processing.util.CarbonLoaderUtil + +/** + * support `alter table tableName add segment location 'path'` command. 
+ * It will create a segment and map the path of datafile to segment's storage + */ +case class CarbonAddSegmentCommand( +dbNameOp: Option[String], +tableName: String, +filePathFromUser: String, +var operationContext: OperationContext = new OperationContext) extends AtomicRunnableCommand { + private val LOGGER = LogServiceFactory.getLogService(this.getClass.getName) + var carbonTable: CarbonTable = _ + + override def processMetadata(sparkSession: SparkSession): Seq[Row] = { +val dbName = CarbonEnv.getDatabaseName(dbNameOp)(sparkSession) +carbonTable = { + val relation = CarbonEnv.getInstance(sparkSession).carbonMetastore +.lookupRelation(Option(dbName), tableName)(sparkSession).asInstanceOf[CarbonRelation] + if (relation == null) { +LOGGER.error(s"Add segment failed due to table $dbName.$tableName not found") +throw new NoSuchTableException(dbName, tableName) + } + relation.carbonTable +} + +if (carbonTable.isHivePartitionTable) { + LOGGER.error("Ignore hive partition table for now") +} + +operationContext.setProperty("isOverwrite", false) +if (CarbonUtil.hasAggregationDataMap(carbonTable)) { + val loadMetadataEvent = new LoadMetadataEvent(carbonTable, false) + OperationListenerBus.getInstance().fireEvent(loadMetadataEvent, operationContext) +} +Seq.empty + } + + // will just mapping external files to segment metadata + override def processData(sparkSession: SparkSession): Seq[Row] = { --- End diff -- All these operations are metadata only, so I think this class should extend `MetadataProcessOpeation` instead ---
[GitHub] carbondata pull request #2374: [CARBONDATA-2613] Support csv based carbon ta...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2374#discussion_r196511544 --- Diff: integration/spark-common/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchemaCommon.scala --- @@ -700,6 +700,13 @@ class TableNewProcessor(cm: TableModel) { cm.tableName)) tableInfo.setLastUpdatedTime(System.currentTimeMillis()) tableInfo.setFactTable(tableSchema) +val format = cm.tableProperties.get(CarbonCommonConstants.FORMAT) --- End diff -- `format` table property should also be checked, now only csv is supported ---
[GitHub] carbondata pull request #2374: [CARBONDATA-2613] Support csv based carbon ta...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2374#discussion_r196510839 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala --- @@ -426,6 +439,22 @@ class CarbonScanRDD[T: ClassTag]( CarbonTimeStatisticsFactory.createExecutorRecorder(model.getQueryId)) streamReader.setQueryModel(model) streamReader +case FileFormat.EXTERNAL => + assert(storageFormat.equals("csv"), --- End diff -- should use if check instead of assert ---
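The review comment above touches a general principle: assertions can be disabled or compiled out (in Java, `assert` is a no-op unless the JVM runs with `-ea`; Scala's `assert` can likewise be elided with `-Xdisable-assertions`), so validating external input should throw unconditionally. A small illustrative sketch, shown in Java rather than the Scala of the hunk; `FormatCheck` and `requireCsv` are hypothetical names, not CarbonData APIs.

```java
public class FormatCheck {
    // Explicit precondition: unlike `assert`, this check always runs and
    // fails with a descriptive exception instead of being silently skipped.
    static void requireCsv(String storageFormat) {
        if (!"csv".equals(storageFormat)) {
            throw new UnsupportedOperationException(
                "unsupported external format: " + storageFormat
                + ", only csv is supported");
        }
    }

    public static void main(String[] args) {
        requireCsv("csv"); // passes silently
        try {
            requireCsv("parquet");
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Using `"csv".equals(storageFormat)` (constant first) also makes the check null-safe, which a bare `assert storageFormat.equals("csv")` is not.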
[jira] [Updated] (CARBONDATA-2608) SDK Support JSON data loading directly without AVRO conversion
[ https://issues.apache.org/jira/browse/CARBONDATA-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat updated CARBONDATA-2608: - Summary: SDK Support JSON data loading directly without AVRO conversion (was: Support JSON data loading directly into Carbon table.) > SDK Support JSON data loading directly without AVRO conversion > -- > > Key: CARBONDATA-2608 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2608 > Project: CarbonData > Issue Type: Sub-task >Reporter: sounak chakraborty >Priority: Major > > Support JSON data loading directly into Carbon table. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata pull request #2374: [CARBONDATA-2613] Support csv based carbon ta...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2374#discussion_r196510278

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonInputFormat.java ---

@@ -515,12 +574,73 @@ private CarbonInputSplit convertToCarbonInputSplit(ExtendedBlocklet blocklet) th
     return split;
   }

+  private List<InputSplit> convertToInputSplit4ExternalFormat(JobContext jobContext,
+      ExtendedBlocklet extendedBlocklet) throws IOException {
+    List<InputSplit> splits = new ArrayList<>();
+    String factFilePath = extendedBlocklet.getFilePath();
+    Path path = new Path(factFilePath);
+    FileSystem fs = FileFactory.getFileSystem(path);
+    FileStatus fileStatus = fs.getFileStatus(path);
+    long length = fileStatus.getLen();
+    if (length != 0) {
+      BlockLocation[] blkLocations = fs.getFileBlockLocations(path, 0, length);
+      long blkSize = fileStatus.getBlockSize();
+      long minSplitSize = Math.max(getFormatMinSplitSize(), getMinSplitSize(jobContext));
+      long maxSplitSize = getMaxSplitSize(jobContext);
+      long splitSize = computeSplitSize(blkSize, minSplitSize, maxSplitSize);
+      long bytesRemaining = fileStatus.getLen();
+      while (((double) bytesRemaining) / splitSize > 1.1) {
+        int blkIndex = getBlockIndex(blkLocations, length - bytesRemaining);
+        splits.add(new CarbonInputSplit(extendedBlocklet.getSegmentId(), path,
+            length - bytesRemaining, splitSize, blkLocations[blkIndex].getHosts(),
+            blkLocations[blkIndex].getCachedHosts(), FileFormat.EXTERNAL));
+        bytesRemaining -= splitSize;
+      }
+      if (bytesRemaining != 0) {
+        int blkIndex = getBlockIndex(blkLocations, length - bytesRemaining);
+        splits.add(new CarbonInputSplit(extendedBlocklet.getSegmentId(), path,
+            length - bytesRemaining, bytesRemaining, blkLocations[blkIndex].getHosts(),
+            blkLocations[blkIndex].getCachedHosts(), FileFormat.EXTERNAL));
+      }
+    } else {
+      splits.add(new CarbonInputSplit(extendedBlocklet.getSegmentId(), path, 0, length,
+          new String[0], FileFormat.EXTERNAL));
+    }
+    return splits;
+  }
+
   @Override
   public RecordReader<Void, T> createRecordReader(InputSplit inputSplit,
       TaskAttemptContext taskAttemptContext) throws IOException, InterruptedException {
     Configuration configuration = taskAttemptContext.getConfiguration();
     QueryModel queryModel = createQueryModel(inputSplit, taskAttemptContext);
     CarbonReadSupport<T> readSupport = getReadSupportClass(configuration);
-    return new CarbonRecordReader<>(queryModel, readSupport);
+    if (inputSplit instanceof CarbonMultiBlockSplit
+        && ((CarbonMultiBlockSplit) inputSplit).getFileFormat() == FileFormat.EXTERNAL) {
+      return createRecordReaderForExternalFormat(queryModel, readSupport,
+          configuration.get(CarbonCommonConstants.CARBON_EXTERNAL_FORMAT_CONF_KEY));
+    } else if (inputSplit instanceof CarbonInputSplit
+        && ((CarbonInputSplit) inputSplit).getFileFormat() == FileFormat.EXTERNAL) {
+      return createRecordReaderForExternalFormat(queryModel, readSupport,
+          configuration.get(CarbonCommonConstants.CARBON_EXTERNAL_FORMAT_CONF_KEY));
+    } else {
+      return new CarbonRecordReader<>(queryModel, readSupport);
+    }
+  }
+
+  @Since("1.4.1")
--- End diff --

I think for private method, this annotation is not required

---
[GitHub] carbondata pull request #2374: [CARBONDATA-2613] Support csv based carbon ta...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2374#discussion_r196509935

--- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonFileInputFormat.java ---

@@ -174,9 +174,15 @@ public CarbonTable getOrCreateCarbonTable(Configuration configuration) throws IO
     List<InputSplit> result = new LinkedList<>();
     // for each segment fetch blocks matching filter in Driver BTree
-    List<CarbonInputSplit> dataBlocksOfSegment =
-        getDataBlocksOfSegment(job, carbonTable, filterResolver, matchedPartitions,
-            validSegments, partitionInfo, oldPartitionIdList);
+    List<CarbonInputSplit> dataBlocksOfSegment;
+    if (carbonTable.getTableInfo().getFormat().equals("")
--- End diff --

why support empty string?

---
[GitHub] carbondata pull request #2374: [CARBONDATA-2613] Support csv based carbon ta...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2374#discussion_r196509716 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CsvRecordReader.java --- @@ -0,0 +1,510 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.hadoop; + +import java.io.IOException; +import java.io.InputStreamReader; +import java.io.Reader; +import java.io.UnsupportedEncodingException; +import java.math.BigDecimal; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.common.annotations.InterfaceAudience; +import org.apache.carbondata.common.annotations.InterfaceStability; +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.constants.CarbonV3DataFormatConstants; +import org.apache.carbondata.core.datastore.block.SegmentProperties; +import org.apache.carbondata.core.metadata.schema.table.CarbonTable; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonDimension; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonMeasure; +import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema; +import org.apache.carbondata.core.scan.expression.exception.FilterUnsupportedException; +import org.apache.carbondata.core.scan.filter.FilterUtil; +import org.apache.carbondata.core.scan.filter.GenericQueryType; +import org.apache.carbondata.core.scan.filter.executer.FilterExecuter; +import org.apache.carbondata.core.scan.filter.intf.RowImpl; +import org.apache.carbondata.core.scan.filter.intf.RowIntf; +import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf; +import org.apache.carbondata.core.scan.model.QueryModel; +import org.apache.carbondata.core.statusmanager.FileFormatProperties; +import org.apache.carbondata.core.util.CarbonUtil; +import org.apache.carbondata.core.util.DataTypeUtil; +import org.apache.carbondata.hadoop.api.CarbonTableInputFormat; +import org.apache.carbondata.hadoop.readsupport.CarbonReadSupport; 
+import org.apache.carbondata.processing.loading.csvinput.CSVInputFormat; + +import com.univocity.parsers.csv.CsvParser; +import com.univocity.parsers.csv.CsvParserSettings; +import org.apache.commons.lang3.StringUtils; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FSDataInputStream; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.TaskAttemptContext; +import org.apache.hadoop.mapreduce.lib.input.FileSplit; + +/** + * scan csv file and filter on it + */ +@InterfaceStability.Evolving +@InterfaceAudience.Internal +public class CsvRecordReader extends AbstractRecordReader { --- End diff -- This class is much like StreamRecordReader, and it implements filter execution on internal row, can you extract common code to a parent class? ---
[GitHub] carbondata pull request #2374: [CARBONDATA-2613] Support csv based carbon ta...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2374#discussion_r196508522 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CsvRecordReader.java --- @@ -0,0 +1,510 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.hadoop; + +import java.io.IOException; +import java.io.InputStreamReader; +import java.io.Reader; +import java.io.UnsupportedEncodingException; +import java.math.BigDecimal; +import java.util.HashMap; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.common.annotations.InterfaceAudience; +import org.apache.carbondata.common.annotations.InterfaceStability; +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.constants.CarbonCommonConstants; +import org.apache.carbondata.core.constants.CarbonV3DataFormatConstants; +import org.apache.carbondata.core.datastore.block.SegmentProperties; +import org.apache.carbondata.core.metadata.schema.table.CarbonTable; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonDimension; +import org.apache.carbondata.core.metadata.schema.table.column.CarbonMeasure; +import org.apache.carbondata.core.metadata.schema.table.column.ColumnSchema; +import org.apache.carbondata.core.scan.expression.exception.FilterUnsupportedException; +import org.apache.carbondata.core.scan.filter.FilterUtil; +import org.apache.carbondata.core.scan.filter.GenericQueryType; +import org.apache.carbondata.core.scan.filter.executer.FilterExecuter; +import org.apache.carbondata.core.scan.filter.intf.RowImpl; +import org.apache.carbondata.core.scan.filter.intf.RowIntf; +import org.apache.carbondata.core.scan.filter.resolver.FilterResolverIntf; +import org.apache.carbondata.core.scan.model.QueryModel; +import org.apache.carbondata.core.statusmanager.FileFormatProperties; +import org.apache.carbondata.core.util.CarbonUtil; +import org.apache.carbondata.core.util.DataTypeUtil; +import org.apache.carbondata.hadoop.api.CarbonTableInputFormat; +import org.apache.carbondata.hadoop.readsupport.CarbonReadSupport; 
+import org.apache.carbondata.processing.loading.csvinput.CSVInputFormat; + +import com.univocity.parsers.csv.CsvParser; +import com.univocity.parsers.csv.CsvParserSettings; +import org.apache.commons.lang3.StringUtils; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FSDataInputStream; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.TaskAttemptContext; +import org.apache.hadoop.mapreduce.lib.input.FileSplit; + +/** + * scan csv file and filter on it + */ +@InterfaceStability.Evolving +@InterfaceAudience.Internal +public class CsvRecordReader extends AbstractRecordReader { + private static final LogService LOGGER = LogServiceFactory.getLogService( + CsvRecordReader.class.getName()); + private static final int MAX_BATCH_SIZE = + CarbonV3DataFormatConstants.NUMBER_OF_ROWS_PER_BLOCKLET_COLUMN_PAGE_DEFAULT; + // vector reader + private boolean isVectorReader; + private T columnarBatch; + + // metadata + private CarbonTable carbonTable; + private CarbonColumn[] carbonColumns; + // input + private QueryModel queryModel; + private CarbonReadSupport readSupport; + private FileSplit fileSplit; + private Configuration hadoopConf; + // the index is schema ordinal, the value is the csv ordinal + private int[] schema2csvIdx; + + // filter + private FilterExecuter filter; + // the index is the dimension ordinal, the value is the schema ordinal + private int[]
[GitHub] carbondata issue #2374: [CARBONDATA-2613] Support csv based carbon table
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2374 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5222/ ---
[GitHub] carbondata issue #2375: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2375 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5328/ ---
[GitHub] carbondata issue #2374: [CARBONDATA-2613] Support csv based carbon table
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2374 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6387/ ---
[GitHub] carbondata issue #2375: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2375 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5327/ ---
[GitHub] carbondata issue #2265: Added Performance Optimization for Presto by using M...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2265 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5219/ ---
[GitHub] carbondata issue #2265: Added Performance Optimization for Presto by using M...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2265 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6384/ ---
[GitHub] carbondata issue #2383: [CARBONDATA-2615][32K] Support page size less than 3...
Github user kumarvishal09 commented on the issue: https://github.com/apache/carbondata/pull/2383 @xuchuanyin then the number of rows will depend on the number of characters in the long string columns, right? ---
[GitHub] carbondata pull request #2383: [CARBONDATA-2615][32K] Support page size less...
Github user kumarvishal09 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2383#discussion_r196487039 --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactDataHandlerColumnar.java --- @@ -371,8 +371,13 @@ private void setWritingConfiguration() throws CarbonDataWriterException { this.pageSize = Integer.parseInt(CarbonProperties.getInstance() .getProperty(CarbonCommonConstants.BLOCKLET_SIZE, CarbonCommonConstants.BLOCKLET_SIZE_DEFAULT_VAL)); +// support fewer than 32000 rows in one page, because we support super long strings; +// if they are long enough, a column page with 32000 rows will exceed 2GB if (version == ColumnarFormatVersion.V3) { - this.pageSize = CarbonV3DataFormatConstants.NUMBER_OF_ROWS_PER_BLOCKLET_COLUMN_PAGE_DEFAULT; + this.pageSize = --- End diff -- What is the default value for the page size? ---
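To the question above: the V3 default is NUMBER_OF_ROWS_PER_BLOCKLET_COLUMN_PAGE_DEFAULT, i.e. 32000 rows. A minimal sketch of the pattern the diff introduces — a user-configurable page size clamped to that ceiling — with illustrative names rather than the real CarbonProperties API:

```python
# Hypothetical helper illustrating the configurable-page-size pattern;
# not the actual CarbonData code.
DEFAULT_ROWS_PER_PAGE = 32000  # NUMBER_OF_ROWS_PER_BLOCKLET_COLUMN_PAGE_DEFAULT

def resolve_page_size(configured):
    """Parse a user-configured page size, fall back to the default,
    and never exceed the format's 32000-row ceiling."""
    try:
        size = int(configured) if configured is not None else DEFAULT_ROWS_PER_PAGE
    except ValueError:
        size = DEFAULT_ROWS_PER_PAGE
    return min(max(size, 1), DEFAULT_ROWS_PER_PAGE)

print(resolve_page_size("8000"))   # 8000
print(resolve_page_size(None))     # 32000
print(resolve_page_size("99999"))  # 32000 (clamped to the ceiling)
```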
[GitHub] carbondata issue #2383: [CARBONDATA-2615][32K] Support page size less than 3...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/2383 @kumarvishal09 I asked someone who has the long-string requirement and got the response that the strings are about 100K characters long. Since we don't want to change the internal implementation of the column page, decreasing the number of rows in a page may be the only way to solve the problem. ---
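A back-of-envelope check of the numbers above (assumptions: the 2 GB figure is the 2^31-byte limit, and "100K" is read as roughly 100 KB per value):

```python
rows_per_page = 32000     # V3 default rows per column page
value_bytes = 100 * 1024  # ~100K-character strings, assumed ~100 KB each

page_bytes = rows_per_page * value_bytes
print(page_bytes)          # 3276800000, i.e. ~3.3 GB
print(page_bytes > 2**31)  # True: a full 32000-row page overflows 2 GB

# Largest row count that keeps one page under 2 GB at this value size,
# which is why a smaller configurable page size is needed
print(2**31 // value_bytes)  # 20971
```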
[GitHub] carbondata issue #2377: [CARBONDATA-2611] Added Test Cases for Local Diction...
Github user praveenmeenakshi56 commented on the issue: https://github.com/apache/carbondata/pull/2377 retest this please ---
[GitHub] carbondata issue #2328: [CARBONDATA-2504][STREAM] Support StreamSQL for stre...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2328 Build Failed with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5218/ ---
[GitHub] carbondata issue #2374: [CARBONDATA-2613] Support csv based carbon table
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/2374 @jackylk All the comments have been resolved except https://github.com/apache/carbondata/pull/2374#discussion_r195684966 ---
[GitHub] carbondata issue #2377: [CARBONDATA-2611] Added Test Cases for Local Diction...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2377 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6386/ ---
[jira] [Resolved] (CARBONDATA-2585) Support Adding Local Dictionary configuration in Create table statement
[ https://issues.apache.org/jira/browse/CARBONDATA-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal resolved CARBONDATA-2585. -- Resolution: Fixed > Support Adding Local Dictionary configuration in Create table statement > --- > > Key: CARBONDATA-2585 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2585 > Project: CarbonData > Issue Type: Sub-task > Reporter: kumar vishal > Assignee: Akash R Nilugal > Priority: Major > Time Spent: 5h 50m > Remaining Estimate: 0h > > Allow the user to pass local dictionary configuration in the Create table statement. > *LOCAL_DICTIONARY_ENABLE*: enable or disable local dictionary generation for a table (local dictionary generation defaults to true) > *CARBON_LOCALDICT_THRESHOLD*: configures the threshold value for local dictionary generation (default 1000) > *LOCAL_DICTIONARY_INCLUDE*: list of columns for which the user wants to generate a local dictionary (by default all no-dictionary string columns are considered) > *LOCAL_DICTIONARY_EXCLUDE*: list of columns for which the user does not want to generate a local dictionary (by default no string no-dictionary columns are excluded unless configured) > CREATE TABLE carbontable( > column1 string, > column2 string, > column3 LONG ) > STORED BY 'carbondata' > TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE'='true', 'LOCAL_DICTIONARY_THRESHOLD'='1000', > 'LOCAL_DICTIONARY_INCLUDE'='column1', 'LOCAL_DICTIONARY_EXCLUDE'='column2') -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (CARBONDATA-2586) Support Showing local dictionary configuration in desc formatted command
[ https://issues.apache.org/jira/browse/CARBONDATA-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kumar vishal resolved CARBONDATA-2586. -- Resolution: Fixed > Support Showing local dictionary configuration in desc formatted command > > > Key: CARBONDATA-2586 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2586 > Project: CarbonData > Issue Type: Sub-task > Reporter: kumar vishal > Assignee: Akash R Nilugal > Priority: Major > > Support showing the local dictionary parameters in the Desc formatted command: > # *LOCAL_DICTIONARY_ENABLE* > # *CARBON_LOCALDICT_THRESHOLD* > # *LOCAL_DICTIONARY_INCLUDE* > # *LOCAL_DICTIONARY_EXCLUDE* -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2586) Support Showing local dictionary configuration in desc formatted command
[ https://issues.apache.org/jira/browse/CARBONDATA-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akash R Nilugal updated CARBONDATA-2586: Description: Support Showing local dictionary parameter in Desc formatted command # *LOCAL_DICTIONARY_ENABLE* # {color:#00}*CARBON_LOCALDICT_THRESHOLD*{color} # **{color:#00}*LOCAL_DICTIONARY_INCLUDE***{color} # {color:#00}*LOCAL_DICTIONARY_EXCLUDE***{color} was: Support Showing local dictionary parameter in Desc formatted command # {color:#00}*CARBON_LOCALDICT_THRESHOLD*{color} # {color:#00}*ENABLE_LOCAL_DICT*{color} > Support Showing local dictionary configuration in desc formatted command > > > Key: CARBONDATA-2586 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2586 > Project: CarbonData > Issue Type: Sub-task >Reporter: kumar vishal >Assignee: Akash R Nilugal >Priority: Major > > Support Showing local dictionary parameter in Desc formatted command > # *LOCAL_DICTIONARY_ENABLE* > # {color:#00}*CARBON_LOCALDICT_THRESHOLD*{color} > # **{color:#00}*LOCAL_DICTIONARY_INCLUDE***{color} > # > {color:#00}*LOCAL_DICTIONARY_EXCLUDE***{color} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2586) Support Showing local dictionary configuration in desc formatted command
[ https://issues.apache.org/jira/browse/CARBONDATA-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akash R Nilugal updated CARBONDATA-2586: Description: Support Showing local dictionary parameter in Desc formatted command # *LOCAL_DICTIONARY_ENABLE* # {color:#00}*CARBON_LOCALDICT_THRESHOLD*{color} # {color:#00}*LOCAL_DICTIONARY_INCLUDE*{color} # *LOCAL_DICTIONARY_EXCLUDE* was: Support Showing local dictionary parameter in Desc formatted command # *LOCAL_DICTIONARY_ENABLE* # {color:#00}*CARBON_LOCALDICT_THRESHOLD*{color} # **{color:#00}*LOCAL_DICTIONARY_INCLUDE***{color} # *LOCAL_DICTIONARY_EXCLUDE*** > Support Showing local dictionary configuration in desc formatted command > > > Key: CARBONDATA-2586 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2586 > Project: CarbonData > Issue Type: Sub-task >Reporter: kumar vishal >Assignee: Akash R Nilugal >Priority: Major > > Support Showing local dictionary parameter in Desc formatted command > # *LOCAL_DICTIONARY_ENABLE* > # {color:#00}*CARBON_LOCALDICT_THRESHOLD*{color} > # {color:#00}*LOCAL_DICTIONARY_INCLUDE*{color} > # *LOCAL_DICTIONARY_EXCLUDE* -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2586) Support Showing local dictionary configuration in desc formatted command
[ https://issues.apache.org/jira/browse/CARBONDATA-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akash R Nilugal updated CARBONDATA-2586: Description: Support Showing local dictionary parameter in Desc formatted command # *LOCAL_DICTIONARY_ENABLE* # {color:#00}*CARBON_LOCALDICT_THRESHOLD*{color} # **{color:#00}*LOCAL_DICTIONARY_INCLUDE***{color} # *LOCAL_DICTIONARY_EXCLUDE*** was: Support Showing local dictionary parameter in Desc formatted command # *LOCAL_DICTIONARY_ENABLE* # {color:#00}*CARBON_LOCALDICT_THRESHOLD*{color} # **{color:#00}*LOCAL_DICTIONARY_INCLUDE***{color} # {color:#00}*LOCAL_DICTIONARY_EXCLUDE***{color} > Support Showing local dictionary configuration in desc formatted command > > > Key: CARBONDATA-2586 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2586 > Project: CarbonData > Issue Type: Sub-task >Reporter: kumar vishal >Assignee: Akash R Nilugal >Priority: Major > > Support Showing local dictionary parameter in Desc formatted command > # *LOCAL_DICTIONARY_ENABLE* > # {color:#00}*CARBON_LOCALDICT_THRESHOLD*{color} > # **{color:#00}*LOCAL_DICTIONARY_INCLUDE***{color} > # *LOCAL_DICTIONARY_EXCLUDE*** -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2383: [CARBONDATA-2615][32K] Support page size less than 3...
Github user kumarvishal09 commented on the issue: https://github.com/apache/carbondata/pull/2383 @xuchuanyin I think it is better to restrict each column value to 67104 bytes, as the user may not know how many characters will be present, so it is hard for the user to configure the blocklet size. ---
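One plausible derivation of the 67104-byte figure (an assumption — the comment does not say how it was chosen): it is the per-value byte budget that keeps a full 32000-row column page under the 2 GB (2^31-byte) limit, rounded down to a multiple of 32:

```python
page_limit = 2**31  # 2 GB page ceiling
rows = 32000        # rows per column page

budget = page_limit // rows  # 67108 bytes per value, unaligned
print(budget)                # 67108
print(budget // 32 * 32)     # 67104, i.e. the suggested cap
```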
[jira] [Updated] (CARBONDATA-2585) Support Adding Local Dictionary configuration in Create table statement
[ https://issues.apache.org/jira/browse/CARBONDATA-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akash R Nilugal updated CARBONDATA-2585: Description: Allow the user to pass local dictionary configuration in the Create table statement. *LOCAL_DICTIONARY_ENABLE*: enable or disable local dictionary generation for a table (local dictionary generation defaults to true) *CARBON_LOCALDICT_THRESHOLD*: configures the threshold value for local dictionary generation (default 1000) *LOCAL_DICTIONARY_INCLUDE*: list of columns for which the user wants to generate a local dictionary (by default all no-dictionary string columns are considered) *LOCAL_DICTIONARY_EXCLUDE*: list of columns for which the user does not want to generate a local dictionary (by default no string no-dictionary columns are excluded unless configured) CREATE TABLE carbontable( column1 string, column2 string, column3 LONG ) STORED BY 'carbondata' TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE'='true', 'LOCAL_DICTIONARY_THRESHOLD'='1000', 'LOCAL_DICTIONARY_INCLUDE'='column1', 'LOCAL_DICTIONARY_EXCLUDE'='column2') was: Allow the user to pass local dictionary configuration in the Create table statement.
*LOCAL_DICTIONARY_ENABLE* *CARBON_LOCALDICT_THRESHOLD* *LOCAL_DICTIONARY_INCLUDE* *LOCAL_DICTIONARY_EXCLUDE* CREATE TABLE carbontable( column1 string, column2 string, column3 LONG ) STORED BY 'carbondata' TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE'='true', 'LOCAL_DICTIONARY_THRESHOLD'='1000', 'LOCAL_DICTIONARY_INCLUDE'='column1', 'LOCAL_DICTIONARY_EXCLUDE'='column2') > Support Adding Local Dictionary configuration in Create table statement > --- > > Key: CARBONDATA-2585 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2585 > Project: CarbonData > Issue Type: Sub-task > Reporter: kumar vishal > Assignee: Akash R Nilugal > Priority: Major > Time Spent: 5h 50m > Remaining Estimate: 0h > > Allow the user to pass local dictionary configuration in the Create table statement. > *LOCAL_DICTIONARY_ENABLE*: enable or disable local dictionary generation for a table (local dictionary generation defaults to true) > *CARBON_LOCALDICT_THRESHOLD*: configures the threshold value for local dictionary generation (default 1000) > *LOCAL_DICTIONARY_INCLUDE*: list of columns for which the user wants to generate a local dictionary (by default all no-dictionary string columns are considered) > *LOCAL_DICTIONARY_EXCLUDE*: list of columns for which the user does not want to generate a local dictionary (by default no string no-dictionary columns are excluded unless configured) > CREATE TABLE carbontable( > column1 string, > column2 string, > column3 LONG ) > STORED BY 'carbondata' > TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE'='true', 'LOCAL_DICTIONARY_THRESHOLD'='1000', > 'LOCAL_DICTIONARY_INCLUDE'='column1', 'LOCAL_DICTIONARY_EXCLUDE'='column2') -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-2585) Support Adding Local Dictionary configuration in Create table statement
[ https://issues.apache.org/jira/browse/CARBONDATA-2585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akash R Nilugal updated CARBONDATA-2585: Description: Allow the user to pass local dictionary configuration in the Create table statement. *LOCAL_DICTIONARY_ENABLE* *CARBON_LOCALDICT_THRESHOLD* *LOCAL_DICTIONARY_INCLUDE* *LOCAL_DICTIONARY_EXCLUDE* CREATE TABLE carbontable( column1 string, column2 string, column3 LONG ) STORED BY 'carbondata' TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE'='true', 'LOCAL_DICTIONARY_THRESHOLD'='1000', 'LOCAL_DICTIONARY_INCLUDE'='column1', 'LOCAL_DICTIONARY_EXCLUDE'='column2') was: Allow the user to pass local dictionary configuration in the Create table statement. *ENABLE_LOCAL_DICT* *CARBON_LOCALDICT_THRESHOLD* CREATE TABLE carbontable( column1 string, column2 string, column3 LONG ) STORED BY 'carbondata' TBLPROPERTIES('ENABLE_LOCAL_DICT'='true', 'CARBON_LOCALDICT_THRESHOLD'='1000')
> Support Adding Local Dictionary configuration in Create table statement > --- > > Key: CARBONDATA-2585 > URL: https://issues.apache.org/jira/browse/CARBONDATA-2585 > Project: CarbonData > Issue Type: Sub-task > Reporter: kumar vishal > Assignee: Akash R Nilugal > Priority: Major > Time Spent: 5h 50m > Remaining Estimate: 0h > > Allow the user to pass local dictionary configuration in the Create table statement. > *LOCAL_DICTIONARY_ENABLE* > *CARBON_LOCALDICT_THRESHOLD* > *LOCAL_DICTIONARY_INCLUDE* > *LOCAL_DICTIONARY_EXCLUDE* > CREATE TABLE carbontable( > column1 string, > column2 string, > column3 LONG ) > STORED BY 'carbondata' > TBLPROPERTIES('LOCAL_DICTIONARY_ENABLE'='true', 'LOCAL_DICTIONARY_THRESHOLD'='1000', > 'LOCAL_DICTIONARY_INCLUDE'='column1', 'LOCAL_DICTIONARY_EXCLUDE'='column2') -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #2375: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...
Github user kumarvishal09 commented on the issue: https://github.com/apache/carbondata/pull/2375 LGTM ---
[GitHub] carbondata issue #2375: [CARBONDATA-2585][CARBONDATA-2586][Local Dictionary]...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2375 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5326/ ---
[GitHub] carbondata issue #2328: [CARBONDATA-2504][STREAM] Support StreamSQL for stre...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2328 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6383/ ---