[GitHub] incubator-carbondata issue #626: [WIP]Fixed loading issues in TPC-DS data fo...

2017-03-06 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/626
  
Build Success with Spark 1.6.2, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1019/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Resolved] (CARBONDATA-691) After Compaction records count are mismatched.

2017-03-06 Thread Ravindra Pesala (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ravindra Pesala resolved CARBONDATA-691.

   Resolution: Fixed
Fix Version/s: 1.0.1-incubating

> After Compaction records count are mismatched.
> --
>
> Key: CARBONDATA-691
> URL: https://issues.apache.org/jira/browse/CARBONDATA-691
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-load, data-query, docs
>Affects Versions: 1.0.0-incubating
>Reporter: Babulal
>Assignee: sounak chakraborty
> Fix For: 1.0.1-incubating
>
> Attachments: createLoadcmd.txt, driverlog.txt
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Spark versions: Spark 1.6.2 and Spark 2.1.
> After compaction the data shown is wrong.
> Create a table and load it 4 times (compaction threshold is 4,3).
> Load the same data 4 times; each load has 105 records, as attached in the file.
> +--------------------+------------+--------------------------+--------------------------+
> | SegmentSequenceId  |   Status   |     Load Start Time      |      Load End Time       |
> +--------------------+------------+--------------------------+--------------------------+
> | 3                  | Compacted  | 2017-02-01 14:07:51.922  | 2017-02-01 14:07:52.591  |
> | 2                  | Compacted  | 2017-02-01 14:07:33.481  | 2017-02-01 14:07:34.443  |
> | 1                  | Compacted  | 2017-02-01 14:07:23.495  | 2017-02-01 14:07:24.167  |
> | 0.1                | Success    | 2017-02-01 14:07:52.815  | 2017-02-01 14:07:57.201  |
> | 0                  | Compacted  | 2017-02-01 14:07:07.541  | 2017-02-01 14:07:11.983  |
> +--------------------+------------+--------------------------+--------------------------+
> 5 rows selected (0.021 seconds)
> 0: jdbc:hive2://8.99.61.4:23040> select count(*) from Comp_VMALL_DICTIONARY_INCLUDE_7;
> +-----------+--+
> | count(1)  |
> +-----------+--+
> | 1680      |
> +-----------+--+
> 1 row selected (4.468 seconds)
> 0: jdbc:hive2://8.99.61.4:23040> select count(imei) from Comp_VMALL_DICTIONARY_INCLUDE_7;
> +--------------+--+
> | count(imei)  |
> +--------------+--+
> | 1680         |
> +--------------+--+
> Expected: the total record count should be 420.
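[Editor's note] The expected figure follows from the segment table in the report: four loads of 105 rows are merged into segment 0.1, and a count should scan only non-compacted segments. A small illustrative model of that arithmetic (hypothetical sketch, not CarbonData internals):

```scala
// Hypothetical model of the segment list from the report (not CarbonData internals).
case class Segment(id: String, status: String, rows: Int)

val segments = Seq(
  Segment("3", "Compacted", 105),
  Segment("2", "Compacted", 105),
  Segment("1", "Compacted", 105),
  Segment("0.1", "Success", 420), // merged segment produced by compaction (4 x 105)
  Segment("0", "Compacted", 105)
)

// A correct count scans only segments not marked Compacted, so the merged
// segment is counted once and its input segments are skipped.
val expected = segments.filter(_.status != "Compacted").map(_.rows).sum
println(s"expected count = $expected")
```

The model yields 420, matching the reporter's expectation; the mechanism behind the observed 1680 is not visible from the report alone.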



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[GitHub] incubator-carbondata pull request #604: [CARBONDATA-691] After Compaction re...

2017-03-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/604




[GitHub] incubator-carbondata issue #604: [CARBONDATA-691] After Compaction records c...

2017-03-06 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/incubator-carbondata/pull/604
  
LGTM




[GitHub] incubator-carbondata pull request #626: [WIP]Fixed loading issues in TPC-DS ...

2017-03-06 Thread ravipesala
GitHub user ravipesala opened a pull request:

https://github.com/apache/incubator-carbondata/pull/626

[WIP]Fixed loading issues in TPC-DS data for V3 format



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ravipesala/incubator-carbondata dictionary-server-issue

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/626.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #626


commit 187d5e8f61f9a401e6e21a64db8cc68326c50287
Author: ravipesala 
Date:   2017-03-07T06:37:48Z

Fixed loading issues in TPC-DS data






[jira] [Assigned] (CARBONDATA-750) Improve exception information description while user input wrong creation table script

2017-03-06 Thread anubhav tarar (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

anubhav tarar reassigned CARBONDATA-750:


Assignee: anubhav tarar

> Improve exception information description while user input wrong creation 
> table script
> --
>
> Key: CARBONDATA-750
> URL: https://issues.apache.org/jira/browse/CARBONDATA-750
> Project: CarbonData
>  Issue Type: Improvement
>  Components: sql
>Reporter: Liang Chen
>Assignee: anubhav tarar
>Priority: Minor
>
> 1. Using a wrong create table script:
> scala> carbon.sql("CREATE TABLE carbontable1 (id,int,age string,year,int) STORED BY 'carbondata'")
> java.lang.RuntimeException: [1.1] failure: identifier matching regex (?i)ALTER expected
> CREATE TABLE carbontable1 (id,int,age string,year,int) STORED BY 'carbondata'
> ^
>   at scala.sys.package$.error(package.scala:27)
>   at org.apache.spark.sql.parser.CarbonSpark2SqlParser.parse(CarbonSpark2SqlParser.scala:45)
>   at org.apache.spark.sql.parser.CarbonSparkSqlParser.parsePlan(CarbonSparkSqlParser.scala:51)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
> 2. The exception message description needs improving, e.g.: unexpected "," found
> CREATE TABLE carbontable1 (id,int,age string,year,int) STORED BY 
>  ^
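[Editor's note] The suggested improvement can be illustrated with a small validation sketch (a hypothetical helper, not the actual CarbonSpark2SqlParser logic): split the column list and flag any definition that is not a `<name> <type>` pair, which is exactly what `id,int` produces.

```scala
// Hypothetical diagnostic sketch: report a targeted message when a column
// definition is written as "name,type" instead of "name type".
def checkColumnDefs(defs: String): Option[String] = {
  val cols = defs.split(",").map(_.trim)
  // A valid definition contains a space between name and type; the first
  // fragment without one indicates a stray comma.
  cols.collectFirst {
    case c if !c.contains(" ") =>
      s"""unexpected "," found near '$c' (expected "<name> <type>" pairs)"""
  }
}

println(checkColumnDefs("id,int,age string,year,int")) // reports the stray comma
println(checkColumnDefs("id int, age string"))         // valid: no message
```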





[GitHub] incubator-carbondata issue #618: [CARBONDATA-734] Support the syntax of 'STO...

2017-03-06 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/618
  
Build Success with Spark 1.6.2, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1018/





[GitHub] incubator-carbondata issue #618: [CARBONDATA-734] Support the syntax of 'STO...

2017-03-06 Thread watermen
Github user watermen commented on the issue:

https://github.com/apache/incubator-carbondata/pull/618
  
@ravipesala Please review the testcase.




[GitHub] incubator-carbondata pull request #625: [CARBONDATA-743] Remove redundant Ca...

2017-03-06 Thread lionelcao
Github user lionelcao commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/625#discussion_r104575573
  
--- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/CarbonFilters.scala ---
@@ -1,397 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.carbondata.spark
-
-import scala.collection.mutable.ArrayBuffer
-
-import org.apache.spark.sql.catalyst.expressions._
-import org.apache.spark.sql.optimizer.AttributeReferenceWrapper
-import org.apache.spark.sql.sources
-import org.apache.spark.sql.types.StructType
-
-import org.apache.carbondata.core.metadata.datatype.DataType
-import org.apache.carbondata.core.metadata.schema.table.CarbonTable
-import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn
-import org.apache.carbondata.core.scan.expression.{ColumnExpression => CarbonColumnExpression, Expression => CarbonExpression, LiteralExpression => CarbonLiteralExpression}
-import org.apache.carbondata.core.scan.expression.conditional._
-import org.apache.carbondata.core.scan.expression.logical.{AndExpression, FalseExpression, OrExpression}
-import org.apache.carbondata.spark.util.CarbonScalaUtil
-
-/**
- * All filter conversions are done here.
- */
-object CarbonFilters {
-
-  /**
-   * Converts data sources filters to carbon filter predicates.
-   */
-  def createCarbonFilter(schema: StructType,
-  predicate: sources.Filter): Option[CarbonExpression] = {
-val dataTypeOf = schema.map(f => f.name -> f.dataType).toMap
-
-def createFilter(predicate: sources.Filter): Option[CarbonExpression] 
= {
-  predicate match {
-
-case sources.EqualTo(name, value) =>
-  Some(new EqualToExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-case sources.Not(sources.EqualTo(name, value)) =>
-  Some(new NotEqualsExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-
-case sources.EqualNullSafe(name, value) =>
-  Some(new EqualToExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-case sources.Not(sources.EqualNullSafe(name, value)) =>
-  Some(new NotEqualsExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-
-case sources.GreaterThan(name, value) =>
-  Some(new GreaterThanExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-case sources.LessThan(name, value) =>
-  Some(new LessThanExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-case sources.GreaterThanOrEqual(name, value) =>
-  Some(new GreaterThanEqualToExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-case sources.LessThanOrEqual(name, value) =>
-  Some(new LessThanEqualToExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-
-case sources.In(name, values) =>
-  Some(new InExpression(getCarbonExpression(name),
-new ListExpression(
-  convertToJavaList(values.map(f => 
getCarbonLiteralExpression(name, f)).toList
-case sources.Not(sources.In(name, values)) =>
-  Some(new NotInExpression(getCarbonExpression(name),
-new ListExpression(
-  convertToJavaList(values.map(f => 
getCarbonLiteralExpression(name, f)).toList
-
-case sources.IsNull(name) =>
-  Some(new EqualToExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, null), true))
-case sources.IsNotNull(name) =>
-  Some(new NotEqualsExpression(getCarbonExpression(name),
-
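[Editor's note] The quoted hunk is truncated by the email, but the removed file's core pattern maps Spark `sources.Filter` nodes to Carbon expressions, returning `None` for anything unsupported so Spark evaluates that predicate after the scan. A simplified, self-contained sketch with stand-in types (not the real CarbonData or Spark classes):

```scala
// Stand-in types for the Spark sources.Filter -> Carbon expression mapping
// shown in the removed CarbonFilters.scala (illustrative only, not the real API).
sealed trait Filter
case class EqualTo(name: String, value: Any) extends Filter
case class Not(child: Filter) extends Filter
case class In(name: String, values: Seq[Any]) extends Filter

sealed trait CarbonExpr
case class EqExpr(col: String, v: Any) extends CarbonExpr
case class NeqExpr(col: String, v: Any) extends CarbonExpr
case class InExpr(col: String, vs: Seq[Any]) extends CarbonExpr

// Unsupported filters return None so the engine can fall back to
// post-scan evaluation instead of failing the pushdown.
def toCarbon(f: Filter): Option[CarbonExpr] = f match {
  case EqualTo(n, v)      => Some(EqExpr(n, v))
  case Not(EqualTo(n, v)) => Some(NeqExpr(n, v))
  case In(n, vs)          => Some(InExpr(n, vs))
  case _                  => None
}

println(toCarbon(Not(EqualTo("city", "j99"))))
```

Returning `Option` rather than throwing is the design choice that lets partially supported predicates degrade gracefully.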

[GitHub] incubator-carbondata pull request #625: [CARBONDATA-743] Remove redundant Ca...

2017-03-06 Thread lionelcao
Github user lionelcao commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/625#discussion_r104575582
  
--- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/CarbonFilters.scala ---
@@ -1,397 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.carbondata.spark
-
-import scala.collection.mutable.ArrayBuffer
-
-import org.apache.spark.sql.catalyst.expressions._
-import org.apache.spark.sql.optimizer.AttributeReferenceWrapper
-import org.apache.spark.sql.sources
-import org.apache.spark.sql.types.StructType
-
-import org.apache.carbondata.core.metadata.datatype.DataType
-import org.apache.carbondata.core.metadata.schema.table.CarbonTable
-import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn
-import org.apache.carbondata.core.scan.expression.{ColumnExpression => CarbonColumnExpression, Expression => CarbonExpression, LiteralExpression => CarbonLiteralExpression}
-import org.apache.carbondata.core.scan.expression.conditional._
-import org.apache.carbondata.core.scan.expression.logical.{AndExpression, FalseExpression, OrExpression}
-import org.apache.carbondata.spark.util.CarbonScalaUtil
-
-/**
- * All filter conversions are done here.
- */
-object CarbonFilters {
-
-  /**
-   * Converts data sources filters to carbon filter predicates.
-   */
-  def createCarbonFilter(schema: StructType,
-  predicate: sources.Filter): Option[CarbonExpression] = {
-val dataTypeOf = schema.map(f => f.name -> f.dataType).toMap
-
-def createFilter(predicate: sources.Filter): Option[CarbonExpression] 
= {
-  predicate match {
-
-case sources.EqualTo(name, value) =>
-  Some(new EqualToExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-case sources.Not(sources.EqualTo(name, value)) =>
-  Some(new NotEqualsExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-
-case sources.EqualNullSafe(name, value) =>
-  Some(new EqualToExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-case sources.Not(sources.EqualNullSafe(name, value)) =>
-  Some(new NotEqualsExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-
-case sources.GreaterThan(name, value) =>
-  Some(new GreaterThanExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-case sources.LessThan(name, value) =>
-  Some(new LessThanExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-case sources.GreaterThanOrEqual(name, value) =>
-  Some(new GreaterThanEqualToExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-case sources.LessThanOrEqual(name, value) =>
-  Some(new LessThanEqualToExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-
-case sources.In(name, values) =>
-  Some(new InExpression(getCarbonExpression(name),
-new ListExpression(
-  convertToJavaList(values.map(f => 
getCarbonLiteralExpression(name, f)).toList
-case sources.Not(sources.In(name, values)) =>
-  Some(new NotInExpression(getCarbonExpression(name),
-new ListExpression(
-  convertToJavaList(values.map(f => 
getCarbonLiteralExpression(name, f)).toList
-
-case sources.IsNull(name) =>
-  Some(new EqualToExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, null), true))
-case sources.IsNotNull(name) =>
-  Some(new NotEqualsExpression(getCarbonExpression(name),
-

[GitHub] incubator-carbondata pull request #625: [CARBONDATA-743] Remove redundant Ca...

2017-03-06 Thread lionelcao
Github user lionelcao commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/625#discussion_r104578046
  
--- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/CarbonFilters.scala ---
@@ -1,397 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.carbondata.spark
-
-import scala.collection.mutable.ArrayBuffer
-
-import org.apache.spark.sql.catalyst.expressions._
-import org.apache.spark.sql.optimizer.AttributeReferenceWrapper
-import org.apache.spark.sql.sources
-import org.apache.spark.sql.types.StructType
-
-import org.apache.carbondata.core.metadata.datatype.DataType
-import org.apache.carbondata.core.metadata.schema.table.CarbonTable
-import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn
-import org.apache.carbondata.core.scan.expression.{ColumnExpression => CarbonColumnExpression, Expression => CarbonExpression, LiteralExpression => CarbonLiteralExpression}
-import org.apache.carbondata.core.scan.expression.conditional._
-import org.apache.carbondata.core.scan.expression.logical.{AndExpression, FalseExpression, OrExpression}
-import org.apache.carbondata.spark.util.CarbonScalaUtil
-
-/**
- * All filter conversions are done here.
- */
-object CarbonFilters {
-
-  /**
-   * Converts data sources filters to carbon filter predicates.
-   */
-  def createCarbonFilter(schema: StructType,
-  predicate: sources.Filter): Option[CarbonExpression] = {
-val dataTypeOf = schema.map(f => f.name -> f.dataType).toMap
-
-def createFilter(predicate: sources.Filter): Option[CarbonExpression] 
= {
-  predicate match {
-
-case sources.EqualTo(name, value) =>
-  Some(new EqualToExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-case sources.Not(sources.EqualTo(name, value)) =>
-  Some(new NotEqualsExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-
-case sources.EqualNullSafe(name, value) =>
-  Some(new EqualToExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-case sources.Not(sources.EqualNullSafe(name, value)) =>
-  Some(new NotEqualsExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-
-case sources.GreaterThan(name, value) =>
-  Some(new GreaterThanExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-case sources.LessThan(name, value) =>
-  Some(new LessThanExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-case sources.GreaterThanOrEqual(name, value) =>
-  Some(new GreaterThanEqualToExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-case sources.LessThanOrEqual(name, value) =>
-  Some(new LessThanEqualToExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-
-case sources.In(name, values) =>
-  Some(new InExpression(getCarbonExpression(name),
-new ListExpression(
-  convertToJavaList(values.map(f => 
getCarbonLiteralExpression(name, f)).toList
-case sources.Not(sources.In(name, values)) =>
-  Some(new NotInExpression(getCarbonExpression(name),
-new ListExpression(
-  convertToJavaList(values.map(f => 
getCarbonLiteralExpression(name, f)).toList
-
-case sources.IsNull(name) =>
-  Some(new EqualToExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, null), true))
-case sources.IsNotNull(name) =>
-  Some(new NotEqualsExpression(getCarbonExpression(name),
-

[GitHub] incubator-carbondata pull request #614: [CARBONDATA-714]Documented how to ha...

2017-03-06 Thread chenliang613
Github user chenliang613 commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/614#discussion_r104550978
  
--- Diff: docs/faq.md ---
@@ -18,30 +18,57 @@
 -->
 
 # FAQs
-* **Auto Compaction not Working**
 
-The Property carbon.enable.auto.load.merge in carbon.properties need to be set to true.
+* [What are Bad Records?](#what-are-bad-records)
+* [Where are Bad Records Stored in CarbonData?](#where-are-bad-records-stored-in-carbondata)
+* [How to handle Bad Records?](#how-to-handle-bad-records)
+* [How to resolve store location can’t be found?](#how-to-resolve-store-location-can-not-be-found)
+* [What is Carbon Lock Type?](#what-is-carbon-lock-type)
+* [How to resolve Abstract Method Error?](#how-to-resolve-abstract-method-error)

-* **Getting Abstract method error**
+## What are Bad Records?
+Records that fail to get loaded into the CarbonData due to data type incompatibility or are empty or have incompatible format are classified as Bad Records.

-You need to specify the spark version while using Maven to build project.
+## Where are Bad Records Stored in CarbonData?
+The bad records are stored at the location set in carbon.badRecords.location in carbon.properties file.
+By default **carbon.badRecords.location** specifies the following location ``/opt/Carbon/Spark/badrecords``.

-* **Getting NotImplementedException for subquery using IN and EXISTS**
+## How to handle Bad Records?
+While loading data we can specify the approach to handle Bad Records. In order to analyse the cause of the Bad Records the parameter ``BAD_RECORDS_LOGGER_ENABLE`` must be set to value ``TRUE``. There are three approaches to handle Bad Records which can be specified by the parameter ``BAD_RECORDS_ACTION``.

-Subquery with in and exists not supported in CarbonData.
-
-* **Getting Exceptions on creating  a view**
-
-View not supported in CarbonData.
-
-* **How to verify if ColumnGroups have been created as desired.**
+- To pad the incorrect values of the csv rows with NULL value and load the data in CarbonData, set the following in the query :
+```
+'BAD_RECORDS_ACTION'='FORCE'
+```
--- End diff --

Please add "How to ignore the bad records" ?
Please find the detail discussion at here : 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/data-lost-when-loading-data-from-csv-file-to-carbon-table-td7554.html
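[Editor's note] The option names in the quoted FAQ text are passed at load time. A sketch of composing them into the `OPTIONS(...)` clause of a `LOAD DATA` statement (only the query text is built here; the table name and CSV path are placeholders):

```scala
// Option names taken from the quoted FAQ text; values for the
// "pad bad values with NULL and load anyway" behaviour it describes.
val badRecordOptions = Seq(
  "BAD_RECORDS_LOGGER_ENABLE" -> "TRUE",  // required to analyse the cause of bad records
  "BAD_RECORDS_ACTION"        -> "FORCE"  // replace incorrect values with NULL and load
)

val optionsClause = badRecordOptions
  .map { case (k, v) => s"'$k'='$v'" }
  .mkString("OPTIONS(", ", ", ")")

// Placeholder path and table name, for illustration only.
println(s"LOAD DATA INPATH 'data.csv' INTO TABLE t $optionsClause")
```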





[GitHub] incubator-carbondata pull request #614: [CARBONDATA-714]Documented how to ha...

2017-03-06 Thread chenliang613
Github user chenliang613 commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/614#discussion_r104550787
  
--- Diff: docs/faq.md ---
@@ -18,30 +18,57 @@
 -->
 
 # FAQs
-* **Auto Compaction not Working**
 
-The Property carbon.enable.auto.load.merge in carbon.properties need to be set to true.
+* [What are Bad Records?](#what-are-bad-records)
+* [Where are Bad Records Stored in CarbonData?](#where-are-bad-records-stored-in-carbondata)
+* [How to handle Bad Records?](#how-to-handle-bad-records)
+* [How to resolve store location can’t be found?](#how-to-resolve-store-location-can-not-be-found)
+* [What is Carbon Lock Type?](#what-is-carbon-lock-type)
+* [How to resolve Abstract Method Error?](#how-to-resolve-abstract-method-error)

-* **Getting Abstract method error**
+## What are Bad Records?
+Records that fail to get loaded into the CarbonData due to data type incompatibility or are empty or have incompatible format are classified as Bad Records.

-You need to specify the spark version while using Maven to build project.
+## Where are Bad Records Stored in CarbonData?
+The bad records are stored at the location set in carbon.badRecords.location in carbon.properties file.
+By default **carbon.badRecords.location** specifies the following location ``/opt/Carbon/Spark/badrecords``.
 
-* **Getting NotImplementedException for subquery using IN and EXISTS**
+## How to handle Bad Records?
--- End diff --

This is "how to enable bad record logging".




[GitHub] incubator-carbondata pull request #624: [CARBONDATA-747][WIP] Add simple per...

2017-03-06 Thread jackylk
Github user jackylk commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/624#discussion_r104566012
  
--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/CompareTest.scala ---
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.examples
+
+import org.apache.spark.sql.{DataFrame, Row, SaveMode, SparkSession}
+
+import org.apache.carbondata.core.util.CarbonProperties
+
+// scalastyle:off println
+object CompareTest {
+
+  val parquetTableName = "comparetest_parquet"
+  val carbonTableName = "comparetest_carbon"
+
+  private def generateDataFrame(spark: SparkSession): DataFrame = {
+import spark.implicits._
+spark.sparkContext.parallelize(1 to 10 * 1000 * 1000, 4)
+.map(x => ("i" + x, "p" + x % 10, "j" + x % 100, x, x + 1, (x + 7) 
% 21, (x + 5) / 43, x
+* 5))
+.toDF("id", "country", "city", "c4", "c5", "c6", "c7", "c8")
--- End diff --

ok, I found decimal is not supported for dataframe.write, I will  raise a 
JIRA




[GitHub] incubator-carbondata pull request #624: [CARBONDATA-747][WIP] Add simple per...

2017-03-06 Thread jackylk
Github user jackylk commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/624#discussion_r104565728
  
--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/CompareTest.scala ---
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.examples
+
+import org.apache.spark.sql.{DataFrame, Row, SaveMode, SparkSession}
+
+import org.apache.carbondata.core.util.CarbonProperties
+
+// scalastyle:off println
+object CompareTest {
+
+  val parquetTableName = "comparetest_parquet"
+  val carbonTableName = "comparetest_carbon"
+
+  private def generateDataFrame(spark: SparkSession): DataFrame = {
+import spark.implicits._
+spark.sparkContext.parallelize(1 to 10 * 1000 * 1000, 4)
+.map(x => ("i" + x, "p" + x % 10, "j" + x % 100, x, x + 1, (x + 7) 
% 21, (x + 5) / 43, x
--- End diff --

ok




[GitHub] incubator-carbondata pull request #624: [CARBONDATA-747][WIP] Add simple per...

2017-03-06 Thread jackylk
Github user jackylk commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/624#discussion_r104564975
  
--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/CompareTest.scala ---
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.examples
+
+import org.apache.spark.sql.{DataFrame, Row, SaveMode, SparkSession}
+
+import org.apache.carbondata.core.util.CarbonProperties
+
+// scalastyle:off println
+object CompareTest {
+
+  val parquetTableName = "comparetest_parquet"
+  val carbonTableName = "comparetest_carbon"
+
+  private def generateDataFrame(spark: SparkSession): DataFrame = {
+import spark.implicits._
+spark.sparkContext.parallelize(1 to 10 * 1000 * 1000, 4)
+.map(x => ("i" + x, "p" + x % 10, "j" + x % 100, x, x + 1, (x + 7) 
% 21, (x + 5) / 43, x
+* 5))
+.toDF("id", "country", "city", "c4", "c5", "c6", "c7", "c8")
+  }
+
+  private def loadParquetTable(spark: SparkSession, input: DataFrame): 
Long = timeit {
+input.write.mode(SaveMode.Overwrite).parquet(parquetTableName)
+  }
+
+  private def loadCarbonTable(spark: SparkSession, input: DataFrame): Long 
= {
+spark.sql(s"drop table if exists $carbonTableName")
+timeit {
+  input.write
+  .format("carbondata")
+  .option("tableName", carbonTableName)
+  .option("tempCSV", "false")
+  .option("single_pass", "true")
+  .option("dictionary_exclude", "id") // id is high cardinality column
+  .mode(SaveMode.Overwrite)
+  .save()
+}
+  }
+
+  private def prepareTable(spark: SparkSession): Unit = {
+val df = generateDataFrame(spark).cache()
+println(s"loading dataframe into table, schema: ${df.schema}")
+val loadParquetTime = loadParquetTable(spark, df)
+val loadCarbonTime = loadCarbonTable(spark, df)
+println(s"load completed, time: $loadParquetTime, $loadCarbonTime")
+spark.read.parquet(parquetTableName).registerTempTable(parquetTableName)
+  }
+
+  private def runQuery(spark: SparkSession): Unit = {
+val test = Array(
+  "select count(*) from $table",
+  "select sum(c4) from $table",
+  "select sum(c4), sum(c5) from $table",
+  "select sum(c4), sum(c5), sum(c6) from $table",
+  "select sum(c4), sum(c5), sum(c6), sum(c7) from $table",
+  "select sum(c4), sum(c5), sum(c6), sum(c7), avg(c8) from $table",
+  "select * from $table where id = 'i999' ",
+  "select * from $table where country = 'p9' ",
+  "select * from $table where city = 'j99' ",
+  "select * from $table where c4 < 1000 "
--- End diff --

added


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
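The `loadParquetTable`/`loadCarbonTable` methods in the diff above rely on a `timeit` helper that lies outside this hunk. A minimal sketch of what such a helper could look like (the name and the millisecond unit are assumptions, not taken from the PR):

```scala
object TimeitSketch {
  // hypothetical timeit: run the block once, return elapsed milliseconds
  def timeit(block: => Unit): Long = {
    val start = System.nanoTime()
    block
    (System.nanoTime() - start) / 1000000L
  }

  def main(args: Array[String]): Unit = {
    val elapsed = timeit { Thread.sleep(50) }
    println(s"slept for ~$elapsed ms")
  }
}
```

Note the by-name parameter (`block: => Unit`), which is what lets the caller write `timeit { ... }` as in the diff.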


[GitHub] incubator-carbondata pull request #624: [CARBONDATA-747][WIP] Add simple per...

2017-03-06 Thread jackylk
Github user jackylk commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/624#discussion_r104564727
  
--- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/CompareTest.scala ---
@@ -0,0 +1,347 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.examples
+
+import org.apache.spark.sql.{DataFrame, Row, SaveMode, SparkSession}
+import org.apache.spark.sql.types._
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+
+/**
+ * A query test case
+ * @param sqlText SQL statement
+ * @param queryType type of query: scan, filter, aggregate, topN
+ * @param desc description of the goal of this test case
+ */
+case class Query(sqlText: String, queryType: String, desc: String)
+
+// scalastyle:off println
+object CompareTest {
+
+  def parquetTableName: String = "comparetest_parquet"
+  def carbonTableName(version: String): String = s"comparetest_carbonV$version"
+
+  // Table schema:
+  // +-+---+-+-++
+  // | Column name | Data type | Cardinality | Column type | Dictionary |
+  // +-+---+-+-++
+  // | id  | string| 10,000,000  | dimension   | no |
+  // +-+---+-+-++
+  // | country | string| 1103| dimension   | yes|
+  // +-+---+-+-++
+  // | city| string| 13  | dimension   | yes|
+  // +-+---+-+-++
+  // | c4  | short | NA  | measure | no |
+  // +-+---+-+-++
+  // | c5  | int   | NA  | measure | no |
+  // +-+---+-+-++
+  // | c6  | big int   | NA  | measure | no |
+  // +-+---+-+-++
+  // | c7  | double| NA  | measure | no |
+  // +-+---+-+-++
+  // | c8  | double| NA  | measure | no |
+  // +-+---+-+-++
+  private def generateDataFrame(spark: SparkSession): DataFrame = {
+val rdd = spark.sparkContext
+.parallelize(1 to 10 * 1000 * 1000, 4)
+.map { x =>
+  (x.toString, "p" + x % 1103, "j" + x % 13, (x % 31).toShort, x, x.toLong * 1000,
+  x.toDouble / 13, x.toDouble / 71)
+}.map { x =>
+  Row(x._1, x._2, x._3, x._4, x._5, x._6, x._7, x._8)
+}
+val schema = StructType(
+  Seq(
+StructField("id", StringType, nullable = false),
+StructField("country", StringType, nullable = false),
+StructField("city", StringType, nullable = false),
+StructField("c4", ShortType, nullable = true),
+StructField("c5", IntegerType, nullable = true),
+StructField("c6", LongType, nullable = true),
+StructField("c7", DoubleType, nullable = true),
+StructField("c8", DoubleType, nullable = true)
+  )
+)
+spark.createDataFrame(rdd, schema)
+  }
+
+  // performance test queries
+  val queries: Array[Query] = Array(
+Query(
+  "select count(*) from $table",
+  "warm up",
+  "warm up query"
+),
+// ==============================================================
+// ==                        FULL SCAN                         ==
+// ==============================================================
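The `sqlText` entries above embed a literal `$table` placeholder (the strings are not interpolated). Presumably a later step, not visible in this hunk, substitutes the real table name before execution; a sketch of that substitution, under that assumption:

```scala
// mirrors the Query case class from the diff above
case class Query(sqlText: String, queryType: String, desc: String)

object QueryBinding {
  // replace the literal "$table" placeholder with a concrete table name
  def bind(query: Query, tableName: String): String =
    query.sqlText.replace("$table", tableName)

  def main(args: Array[String]): Unit = {
    val q = Query("select count(*) from $table", "warm up", "warm up query")
    // prints: select count(*) from comparetest_parquet
    println(bind(q, "comparetest_parquet"))
  }
}
```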

[jira] [Created] (CARBONDATA-750) Improve exception information description while user input wrong creation table script

2017-03-06 Thread Liang Chen (JIRA)
Liang Chen created CARBONDATA-750:
-

 Summary: Improve exception information description while user 
input wrong creation table script
 Key: CARBONDATA-750
 URL: https://issues.apache.org/jira/browse/CARBONDATA-750
 Project: CarbonData
  Issue Type: Improvement
  Components: sql
Reporter: Liang Chen
Priority: Minor


1. Use wrong creation table script:
scala> carbon.sql("CREATE TABLE carbontable1 (id,int,age string,year,int) 
STORED BY 'carbondata'")
java.lang.RuntimeException: [1.1] failure: identifier matching regex (?i)ALTER 
expected

CREATE TABLE carbontable1 (id,int,age string,year,int) STORED BY 'carbondata'
^
  at scala.sys.package$.error(package.scala:27)
  at 
org.apache.spark.sql.parser.CarbonSpark2SqlParser.parse(CarbonSpark2SqlParser.scala:45)
  at 
org.apache.spark.sql.parser.CarbonSparkSqlParser.parsePlan(CarbonSparkSqlParser.scala:51)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)

2. Need to improve the exception information description, like: unexpected "," 
found
CREATE TABLE carbontable1 (id,int,age string,year,int) STORED BY 
 ^





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
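One way to produce the friendlier message proposed in point 2 would be a pre-check of the column list before the full parse. A self-contained sketch of the idea (the check and its message format are illustrative, not CarbonData code):

```scala
object ColumnDefCheck {
  // each column definition must look like "name type"; a comma inside a
  // definition (e.g. "id,int") indicates the mistake shown in the report
  def check(columnList: String): Option[String] =
    columnList.split(",").map(_.trim).zipWithIndex.collectFirst {
      case (defn, i) if defn.split("\\s+").length != 2 =>
        s"""unexpected "," found near column definition ${i + 1}: '$defn'"""
    }

  def main(args: Array[String]): Unit = {
    println(check("id,int,age string,year,int")) // malformed, as in the report
    println(check("id int, age string, year int")) // well-formed: prints None
  }
}
```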


[jira] [Created] (CARBONDATA-749) Unexpected error log message while dropping carbon table

2017-03-06 Thread Liang Chen (JIRA)
Liang Chen created CARBONDATA-749:
-

 Summary: Unexpected error log message while dropping carbon table
 Key: CARBONDATA-749
 URL: https://issues.apache.org/jira/browse/CARBONDATA-749
 Project: CarbonData
  Issue Type: Bug
  Components: sql
Affects Versions: 1.0.0-incubating
Reporter: Liang Chen
Priority: Minor


1.Create a table with the below script:
carbon.sql("CREATE TABLE carbontable1 (id int, age string, year string) STORED 
BY 'carbondata'")
2.Drop table "carbontable1" with the below script:
carbon.sql("drop table carbontable1")

Unexpected error log message as below:
AUDIT 07-03 07:50:11,944 - [AppledeMacBook-Pro.local][apple][Thread-1]Deleting 
table [carbontable1] under database [default]
AUDIT 07-03 07:50:12,086 - [AppledeMacBook-Pro.local][apple][Thread-1]Creating 
Table with Database name [default] and Table name [carbontable1]
AUDIT 07-03 07:50:12,095 - [AppledeMacBook-Pro.local][apple][Thread-1]Table 
creation with Database name [default] and Table name [carbontable1] failed. 
Table [carbontable1] already exists under database [default]
WARN  07-03 07:50:12,095 - 
org.spark_project.guava.util.concurrent.UncheckedExecutionException: 
java.lang.RuntimeException: Table [carbontable1] already exists under database 
[default]
org.spark_project.guava.util.concurrent.UncheckedExecutionException: 
java.lang.RuntimeException: Table [carbontable1] already exists under database 
[default]
at 
org.spark_project.guava.cache.LocalCache$Segment.get(LocalCache.java:2263)
at org.spark_project.guava.cache.LocalCache.get(LocalCache.java:4000)
at 
org.spark_project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
at 
org.spark_project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
at 
org.spark_project.guava.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4880)
at 
org.spark_project.guava.cache.LocalCache$LocalLoadingCache.apply(LocalCache.java:4898)
at 
org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:110)
at 
org.apache.spark.sql.hive.HiveSessionCatalog.lookupRelation(HiveSessionCatalog.scala:69)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:578)
at org.apache.spark.sql.SparkSession.table(SparkSession.scala:574)
at 
org.apache.spark.sql.execution.command.DropTableCommand.run(ddl.scala:203)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:87)
at 
org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:87)
at org.apache.spark.sql.Dataset.&lt;init&gt;(Dataset.scala:185)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
at 
org.apache.spark.sql.hive.CarbonHiveMetadataUtil$.invalidateAndDropTable(CarbonHiveMetadataUtil.scala:44)
at 
org.apache.spark.sql.hive.CarbonMetastore.dropTable(CarbonMetastore.scala:435)
at 
org.apache.spark.sql.execution.command.CarbonDropTableCommand.run(carbonTableSchema.scala:665)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
at 

[GitHub] incubator-carbondata pull request #614: [CARBONDATA-714]Documented how to ha...

2017-03-06 Thread chenliang613
Github user chenliang613 commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/614#discussion_r104548399
  
--- Diff: docs/faq.md ---
@@ -18,30 +18,57 @@
 -->
 
 # FAQs
-* **Auto Compaction not Working**
 
-The Property carbon.enable.auto.load.merge in carbon.properties need 
to be set to true.
+* [What are Bad Records?](#what-are-bad-records)
+* [Where are Bad Records Stored in CarbonData?](#where-are-bad-records-stored-in-carbondata)
+* [How to handle Bad Records?](#how-to-handle-bad-records)
+* [How to resolve store location can’t be found?](#how-to-resolve-store-location-can-not-be-found)
+* [What is Carbon Lock Type?](#what-is-carbon-lock-type)
+* [How to resolve Abstract Method Error?](#how-to-resolve-abstract-method-error)
 
-* **Getting Abstract method error**
+## What are Bad Records?
+Records that fail to get loaded into the CarbonData due to data type 
incompatibility or are empty or have incompatible format are classified as Bad 
Records.
 
-You need to specify the spark version while using Maven to build 
project.
+## Where are Bad Records Stored in CarbonData?
+The bad records are stored at the location set in 
carbon.badRecords.location in carbon.properties file.
+By default **carbon.badRecords.location** specifies the following location 
``/opt/Carbon/Spark/badrecords``.
 
-* **Getting NotImplementedException for subquery using IN and EXISTS**
+## How to handle Bad Records?
+While loading data we can specify the approach to handle Bad Records. In 
order to analyse the cause of the Bad Records the parameter 
``BAD_RECORDS_LOGGER_ENABLE`` must be set to value ``TRUE``. There are three 
approaches to handle Bad Records which can be specified  by the parameter 
``BAD_RECORDS_ACTION``.
 
-Subquery with in and exists not supported in CarbonData.
-
-* **Getting Exceptions on creating  a view**
-
-View not supported in CarbonData.
-
-* **How to verify if ColumnGroups have been created as desired.**
+- To pad the incorrect values of the csv rows with NULL value and load the 
data in CarbonData, set the following in the query :
+```
+'BAD_RECORDS_ACTION'='FORCE'
+```
+
+- To write the Bad Records without padding incorrect values with NULL in 
the raw csv (set in the parameter **carbon.badRecords.location**), set the 
following in the query :
+```
+'BAD_RECORDS_ACTION'='REDIRECT'
+```
+
+- To ignore the Bad Records from getting stored in the raw csv, we need to 
set the following in the query :
+```
+'BAD_RECORDS_ACTION'='INDIRECT'
+```
+
+## How to resolve store location can not be found?
--- End diff --

Seems the title should be: "How to specify store location while creating 
CarbonSession"


---
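For reference, the `BAD_RECORDS_LOGGER_ENABLE` and `BAD_RECORDS_ACTION` parameters discussed in the diff above are passed as load options. A hedged sketch of where they fit in a LOAD DATA statement (the table name and CSV path are placeholders; in a real session the string would be passed to `carbon.sql(...)` — here we only construct and inspect it):

```scala
object BadRecordsLoadSketch {
  // build a LOAD DATA statement carrying the bad-records options
  def loadSql(csvPath: String, table: String, action: String): String =
    s"""LOAD DATA INPATH '$csvPath' INTO TABLE $table
       |OPTIONS('BAD_RECORDS_LOGGER_ENABLE'='TRUE',
       |'BAD_RECORDS_ACTION'='$action')""".stripMargin

  def main(args: Array[String]): Unit = {
    println(loadSql("hdfs://localhost:9000/data/sample.csv", "my_table", "REDIRECT"))
  }
}
```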


[GitHub] incubator-carbondata pull request #614: [CARBONDATA-714]Documented how to ha...

2017-03-06 Thread chenliang613
Github user chenliang613 commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/614#discussion_r104547671
  
--- Diff: docs/faq.md ---
@@ -18,30 +18,57 @@
 -->
 
 # FAQs
-* **Auto Compaction not Working**
 
-The Property carbon.enable.auto.load.merge in carbon.properties need 
to be set to true.
+* [What are Bad Records?](#what-are-bad-records)
+* [Where are Bad Records Stored in CarbonData?](#where-are-bad-records-stored-in-carbondata)
+* [How to handle Bad Records?](#how-to-handle-bad-records)
+* [How to resolve store location can’t be found?](#how-to-resolve-store-location-can-not-be-found)
+* [What is Carbon Lock Type?](#what-is-carbon-lock-type)
+* [How to resolve Abstract Method Error?](#how-to-resolve-abstract-method-error)
 
-* **Getting Abstract method error**
+## What are Bad Records?
+Records that fail to get loaded into the CarbonData due to data type 
incompatibility or are empty or have incompatible format are classified as Bad 
Records.
 
-You need to specify the spark version while using Maven to build 
project.
+## Where are Bad Records Stored in CarbonData?
+The bad records are stored at the location set in 
carbon.badRecords.location in carbon.properties file.
+By default **carbon.badRecords.location** specifies the following location 
``/opt/Carbon/Spark/badrecords``.
 
-* **Getting NotImplementedException for subquery using IN and EXISTS**
+## How to handle Bad Records?
+While loading data we can specify the approach to handle Bad Records. In 
order to analyse the cause of the Bad Records the parameter 
``BAD_RECORDS_LOGGER_ENABLE`` must be set to value ``TRUE``. There are three 
approaches to handle Bad Records which can be specified  by the parameter 
``BAD_RECORDS_ACTION``.
 
-Subquery with in and exists not supported in CarbonData.
-
-* **Getting Exceptions on creating  a view**
-
-View not supported in CarbonData.
-
-* **How to verify if ColumnGroups have been created as desired.**
+- To pad the incorrect values of the csv rows with NULL value and load the 
data in CarbonData, set the following in the query :
+```
+'BAD_RECORDS_ACTION'='FORCE'
+```
+
+- To write the Bad Records without padding incorrect values with NULL in 
the raw csv (set in the parameter **carbon.badRecords.location**), set the 
following in the query :
+```
+'BAD_RECORDS_ACTION'='REDIRECT'
+```
+
+- To ignore the Bad Records from getting stored in the raw csv, we need to 
set the following in the query :
+```
+'BAD_RECORDS_ACTION'='INDIRECT'
+```
+
+## How to resolve store location can not be found?
+The store location specified while creating carbon session is used by the 
CarbonData to store the meta data like the schema, dictionary files, dictionary 
meta data and sort indexes.
+
+Try creating ``carbonsession`` with ``storepath`` specified in the following manner:
+```
+val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession()
+```
+Example:
+```
+val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs://localhost:9000/carbon/store")
+```
+
+## What is Carbon Lock Type?
--- End diff --

For users, in which scenarios does this lock parameter need to be set?


---


[GitHub] incubator-carbondata pull request #622: [CARBONDATA-744] The property "spark...

2017-03-06 Thread chenliang613
Github user chenliang613 commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/622#discussion_r104545841
  
--- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonScanRDD.scala ---
@@ -116,8 +116,8 @@ class CarbonScanRDD(
   i += 1
   result.add(partition)
 }
-  } else if (sparkContext.getConf.contains("spark.carbon.custom.distribution") &&
-  sparkContext.getConf.getBoolean("spark.carbon.custom.distribution", false)) {
+  } else if (java.lang.Boolean
+  .parseBoolean(CarbonProperties.getInstance().getProperty("carbon.custom.distribution"))) {
--- End diff --

The PR's title mentions that the property is 
"spark.carbon.custom.distribution", but here you change the property name to 
"carbon.custom.distribution", why ?


---
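A side note on the replacement code above: `CarbonProperties.getProperty` can return `null` when the property is unset, and `java.lang.Boolean.parseBoolean` tolerates that by returning `false` (it is also case-insensitive), which is presumably why the explicit default argument of `getBoolean` could be dropped. A quick self-contained check:

```scala
object ParseBooleanDemo {
  def main(args: Array[String]): Unit = {
    // parseBoolean is null-safe: an unset property yields false, not an NPE
    println(java.lang.Boolean.parseBoolean(null))   // prints false
    println(java.lang.Boolean.parseBoolean("true")) // prints true
    println(java.lang.Boolean.parseBoolean("TRUE")) // prints true: case-insensitive
  }
}
```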


[GitHub] incubator-carbondata issue #624: [CARBONDATA-747][WIP] Add simple performanc...

2017-03-06 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/624
  
Build Success with Spark 1.6.2, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1017/



---


[GitHub] incubator-carbondata issue #624: [CARBONDATA-747][WIP] Add simple performanc...

2017-03-06 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/624
  
Build Success with Spark 1.6.2, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1016/



---


[GitHub] incubator-carbondata pull request #625: [CARBONDATA-743] Remove redundant Ca...

2017-03-06 Thread chenliang613
Github user chenliang613 commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/625#discussion_r104424923
  
--- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/CarbonFilters.scala ---
@@ -1,397 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.carbondata.spark
-
-import scala.collection.mutable.ArrayBuffer
-
-import org.apache.spark.sql.catalyst.expressions._
-import org.apache.spark.sql.optimizer.AttributeReferenceWrapper
-import org.apache.spark.sql.sources
-import org.apache.spark.sql.types.StructType
-
-import org.apache.carbondata.core.metadata.datatype.DataType
-import org.apache.carbondata.core.metadata.schema.table.CarbonTable
-import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn
-import org.apache.carbondata.core.scan.expression.{ColumnExpression => CarbonColumnExpression, Expression => CarbonExpression, LiteralExpression => CarbonLiteralExpression}
-import org.apache.carbondata.core.scan.expression.conditional._
-import org.apache.carbondata.core.scan.expression.logical.{AndExpression, FalseExpression, OrExpression}
-import org.apache.carbondata.spark.util.CarbonScalaUtil
-
-/**
- * All filter conversions are done here.
- */
-object CarbonFilters {
-
-  /**
-   * Converts data sources filters to carbon filter predicates.
-   */
-  def createCarbonFilter(schema: StructType,
-  predicate: sources.Filter): Option[CarbonExpression] = {
-val dataTypeOf = schema.map(f => f.name -> f.dataType).toMap
-
-def createFilter(predicate: sources.Filter): Option[CarbonExpression] 
= {
-  predicate match {
-
-case sources.EqualTo(name, value) =>
-  Some(new EqualToExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-case sources.Not(sources.EqualTo(name, value)) =>
-  Some(new NotEqualsExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-
-case sources.EqualNullSafe(name, value) =>
-  Some(new EqualToExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-case sources.Not(sources.EqualNullSafe(name, value)) =>
-  Some(new NotEqualsExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-
-case sources.GreaterThan(name, value) =>
-  Some(new GreaterThanExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-case sources.LessThan(name, value) =>
-  Some(new LessThanExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-case sources.GreaterThanOrEqual(name, value) =>
-  Some(new GreaterThanEqualToExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-case sources.LessThanOrEqual(name, value) =>
-  Some(new LessThanEqualToExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-
-case sources.In(name, values) =>
-  Some(new InExpression(getCarbonExpression(name),
-new ListExpression(
-  convertToJavaList(values.map(f => 
getCarbonLiteralExpression(name, f)).toList
-case sources.Not(sources.In(name, values)) =>
-  Some(new NotInExpression(getCarbonExpression(name),
-new ListExpression(
-  convertToJavaList(values.map(f => 
getCarbonLiteralExpression(name, f)).toList
-
-case sources.IsNull(name) =>
-  Some(new EqualToExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, null), true))
-case sources.IsNotNull(name) =>
-  Some(new NotEqualsExpression(getCarbonExpression(name),
-
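The removed `CarbonFilters.createFilter` above follows a common conversion pattern: each data-source filter maps to a Carbon expression, and unsupported predicates return `None` so that Spark evaluates them itself. A self-contained sketch of that shape using a toy filter ADT (not the actual Spark or Carbon classes):

```scala
// toy stand-ins for Spark's sources.Filter hierarchy
sealed trait Filter
case class EqualTo(name: String, value: Any) extends Filter
case class Not(child: Filter) extends Filter
case class And(left: Filter, right: Filter) extends Filter

object FilterSketch {
  // map recursively; None means "unsupported: let Spark evaluate it"
  def convert(f: Filter): Option[String] = f match {
    case EqualTo(n, v)      => Some(s"$n = $v")
    case Not(EqualTo(n, v)) => Some(s"$n != $v")
    case And(l, r) =>
      for (lc <- convert(l); rc <- convert(r)) yield s"($lc AND $rc)"
    case _ => None
  }

  def main(args: Array[String]): Unit = {
    println(convert(And(EqualTo("city", "j9"), Not(EqualTo("id", "i1")))))
    println(convert(Not(And(EqualTo("a", 1), EqualTo("b", 2))))) // prints None
  }
}
```

As in the removed code, `Not` is only handled for the specific children that have a direct negated counterpart; a general `Not` falls through to `None`.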

[GitHub] incubator-carbondata pull request #625: [CARBONDATA-743] Remove redundant Ca...

2017-03-06 Thread chenliang613
Github user chenliang613 commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/625#discussion_r104424040
  
--- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/CarbonFilters.scala ---
@@ -1,397 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.carbondata.spark
-
-import scala.collection.mutable.ArrayBuffer
-
-import org.apache.spark.sql.catalyst.expressions._
-import org.apache.spark.sql.optimizer.AttributeReferenceWrapper
-import org.apache.spark.sql.sources
-import org.apache.spark.sql.types.StructType
-
-import org.apache.carbondata.core.metadata.datatype.DataType
-import org.apache.carbondata.core.metadata.schema.table.CarbonTable
-import org.apache.carbondata.core.metadata.schema.table.column.CarbonColumn
-import org.apache.carbondata.core.scan.expression.{ColumnExpression => CarbonColumnExpression, Expression => CarbonExpression, LiteralExpression => CarbonLiteralExpression}
-import org.apache.carbondata.core.scan.expression.conditional._
-import org.apache.carbondata.core.scan.expression.logical.{AndExpression, FalseExpression, OrExpression}
-import org.apache.carbondata.spark.util.CarbonScalaUtil
-
-/**
- * All filter conversions are done here.
- */
-object CarbonFilters {
-
-  /**
-   * Converts data sources filters to carbon filter predicates.
-   */
-  def createCarbonFilter(schema: StructType,
-  predicate: sources.Filter): Option[CarbonExpression] = {
-val dataTypeOf = schema.map(f => f.name -> f.dataType).toMap
-
-def createFilter(predicate: sources.Filter): Option[CarbonExpression] 
= {
-  predicate match {
-
-case sources.EqualTo(name, value) =>
-  Some(new EqualToExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-case sources.Not(sources.EqualTo(name, value)) =>
-  Some(new NotEqualsExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-
-case sources.EqualNullSafe(name, value) =>
-  Some(new EqualToExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-case sources.Not(sources.EqualNullSafe(name, value)) =>
-  Some(new NotEqualsExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-
-case sources.GreaterThan(name, value) =>
-  Some(new GreaterThanExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-case sources.LessThan(name, value) =>
-  Some(new LessThanExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-case sources.GreaterThanOrEqual(name, value) =>
-  Some(new GreaterThanEqualToExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-case sources.LessThanOrEqual(name, value) =>
-  Some(new LessThanEqualToExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, value)))
-
-case sources.In(name, values) =>
-  Some(new InExpression(getCarbonExpression(name),
-new ListExpression(
-  convertToJavaList(values.map(f => 
getCarbonLiteralExpression(name, f)).toList
-case sources.Not(sources.In(name, values)) =>
-  Some(new NotInExpression(getCarbonExpression(name),
-new ListExpression(
-  convertToJavaList(values.map(f => 
getCarbonLiteralExpression(name, f)).toList
-
-case sources.IsNull(name) =>
-  Some(new EqualToExpression(getCarbonExpression(name),
-getCarbonLiteralExpression(name, null), true))
-case sources.IsNotNull(name) =>
-  Some(new NotEqualsExpression(getCarbonExpression(name),
-

[jira] [Closed] (CARBONDATA-657) We are not able to create table with shared dictionary columns in spark 2.1

2017-03-06 Thread Payal (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Payal closed CARBONDATA-657.

Resolution: Invalid

> We are not able to create table with shared dictionary columns in spark 2.1
> ---
>
> Key: CARBONDATA-657
> URL: https://issues.apache.org/jira/browse/CARBONDATA-657
> Project: CarbonData
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 1.0.0-incubating
> Environment: Spark-2.1
>Reporter: Payal
>Assignee: anubhav tarar
>Priority: Minor
>
> Creating a table with shared dictionary columns is not working with 
> spark-2.1, but it is working fine with spark 1.6 
> spark 1.6 logs 
> 0: jdbc:hive2://localhost:1> CREATE TABLE uniq_shared_dictionary (CUST_ID 
> int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ 
> timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 
> decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, 
> Double_COLUMN2 double,INTEGER_COLUMN1 int) STORED BY 
> 'org.apache.carbondata.format' 
> TBLPROPERTIES('DICTIONARY_INCLUDE'='CUST_ID,Double_COLUMN2,DECIMAL_COLUMN2','columnproperties.CUST_ID.shared_column'='shared.CUST_ID','columnproperties.decimal_column2.shared_column'='shared.decimal_column2');
> +-+--+
> | Result  |
> +-+--+
> +-+--+
> in spark 2.1 logs ---
> 0: jdbc:hive2://hadoop-master:1> CREATE TABLE uniq_shared_dictionary 
> (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ 
> timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 
> decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, 
> Double_COLUMN2 double,INTEGER_COLUMN1 int) STORED BY 
> 'org.apache.carbondata.format' 
> TBLPROPERTIES('DICTIONARY_INCLUDE'='CUST_ID,Double_COLUMN2,DECIMAL_COLUMN2','columnproperties.CUST_ID.shared_column'='shared.CUST_ID','columnproperties.decimal_column2.shared_column'='shared.decimal_column2');
> Error: org.apache.carbondata.spark.exception.MalformedCarbonCommandException: 
> Invalid table properties columnproperties.cust_id.shared_column 
> (state=,code=0)
> LOGS
> ERROR 18-01 13:31:18,147 - Error executing query, currentState RUNNING, 
> org.apache.carbondata.spark.exception.MalformedCarbonCommandException: 
> Invalid table properties columnproperties.cust_id.shared_column
> at 
> org.apache.carbondata.spark.util.CommonUtil$$anonfun$validateTblProperties$1.apply(CommonUtil.scala:141)
> at 
> org.apache.carbondata.spark.util.CommonUtil$$anonfun$validateTblProperties$1.apply(CommonUtil.scala:137)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at 
> org.apache.carbondata.spark.util.CommonUtil$.validateTblProperties(CommonUtil.scala:137)
> at 
> org.apache.spark.sql.parser.CarbonSqlAstBuilder.visitCreateTable(CarbonSparkSqlParser.scala:135)
> at 
> org.apache.spark.sql.parser.CarbonSqlAstBuilder.visitCreateTable(CarbonSparkSqlParser.scala:60)
> at 
> org.apache.spark.sql.catalyst.parser.SqlBaseParser$CreateTableContext.accept(SqlBaseParser.java:503)
> at 
> org.antlr.v4.runtime.tree.AbstractParseTreeVisitor.visit(AbstractParseTreeVisitor.java:42)
> at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:66)
> at 
> org.apache.spark.sql.catalyst.parser.AstBuilder$$anonfun$visitSingleStatement$1.apply(AstBuilder.scala:66)
> at 
> org.apache.spark.sql.catalyst.parser.ParserUtils$.withOrigin(ParserUtils.scala:93)
> at 
> org.apache.spark.sql.catalyst.parser.AstBuilder.visitSingleStatement(AstBuilder.scala:65)
> at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:54)
> at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser$$anonfun$parsePlan$1.apply(ParseDriver.scala:53)
> at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:82)
> at 
> org.apache.spark.sql.parser.CarbonSparkSqlParser.parse(CarbonSparkSqlParser.scala:45)
> However, if we give the column name in lower case in Spark 2.1, it works fine:
> spark 2.1
> CREATE TABLE uniq_shared_dictionary (cust_id int,CUST_NAME 
> String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, 
> BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), 
> decimal_column2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 
> double,INTEGER_COLUMN1 int) STORED BY 'org.apache.carbondata.format' 
> 

[GitHub] incubator-carbondata issue #616: [CARBONDATA-708] Fixed Between Filter Issue...

2017-03-06 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/616
  
Build Success with Spark 1.6.2, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1015/



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-carbondata pull request #620: [CARBONDATA-742]Added batch sort to ...

2017-03-06 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/620#discussion_r104385171
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/sort/impl/UnsafeBatchParallelReadMergeSorterImpl.java
 ---
@@ -0,0 +1,270 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.processing.newflow.sort.impl;
+
+import java.util.Iterator;
+import java.util.List;
+import java.util.concurrent.BlockingQueue;
+import java.util.concurrent.Callable;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.atomic.AtomicLong;
+
+import org.apache.carbondata.common.CarbonIterator;
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.util.CarbonProperties;
+import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory;
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+import org.apache.carbondata.processing.newflow.row.CarbonRowBatch;
+import org.apache.carbondata.processing.newflow.row.CarbonSortBatch;
+import org.apache.carbondata.processing.newflow.sort.Sorter;
+import 
org.apache.carbondata.processing.newflow.sort.unsafe.UnsafeCarbonRowPage;
+import 
org.apache.carbondata.processing.newflow.sort.unsafe.UnsafeSortDataRows;
+import 
org.apache.carbondata.processing.newflow.sort.unsafe.merger.UnsafeIntermediateMerger;
+import 
org.apache.carbondata.processing.newflow.sort.unsafe.merger.UnsafeSingleThreadFinalSortFilesMerger;
+import 
org.apache.carbondata.processing.sortandgroupby.exception.CarbonSortKeyAndGroupByException;
+import 
org.apache.carbondata.processing.sortandgroupby.sortdata.SortParameters;
+import 
org.apache.carbondata.processing.store.writer.exception.CarbonDataWriterException;
+
+/**
+ * It reads data in parallel from an array of iterators and does a merge sort.
+ * It sorts the data in batches and sends each sorted batch to the next step.
+ */
+public class UnsafeBatchParallelReadMergeSorterImpl implements Sorter {
--- End diff --

Yes, we do sort in memory: the data is sorted chunk by chunk (default chunk size 64 
MB) and the sorted chunks are kept in memory; once the batch memory limit is reached, a merge 
sort is started and the result is handed to the data writer. This approach is faster than sorting 
the whole batch of records at once.
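
To make the chunk-then-merge idea described above concrete, here is a minimal sketch in plain 
Java (hypothetical class and method names, toy chunk size and data — not the actual 
UnsafeBatchParallelReadMergeSorterImpl code): each chunk is sorted independently, then a 
priority queue performs the k-way merge across the sorted chunks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative sketch only: sort fixed-size chunks independently, then
// k-way merge the sorted chunks with a priority queue. Chunk size and
// int data are toy stand-ins for CarbonData's 64 MB row pages.
public class ChunkMergeSortSketch {

    public static List<int[]> sortInChunks(int[] data, int chunkSize) {
        List<int[]> chunks = new ArrayList<>();
        for (int start = 0; start < data.length; start += chunkSize) {
            int end = Math.min(start + chunkSize, data.length);
            int[] chunk = Arrays.copyOfRange(data, start, end);
            Arrays.sort(chunk);          // each chunk is sorted on its own
            chunks.add(chunk);
        }
        return chunks;
    }

    public static int[] mergeChunks(List<int[]> chunks, int total) {
        // heap entry = {value, chunkIndex, positionInChunk}
        PriorityQueue<int[]> heap =
            new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[0]));
        for (int i = 0; i < chunks.size(); i++) {
            if (chunks.get(i).length > 0) {
                heap.add(new int[]{chunks.get(i)[0], i, 0});
            }
        }
        int[] out = new int[total];
        int n = 0;
        while (!heap.isEmpty()) {
            int[] e = heap.poll();
            out[n++] = e[0];
            int[] chunk = chunks.get(e[1]);
            // advance within the chunk the smallest element came from
            if (e[2] + 1 < chunk.length) {
                heap.add(new int[]{chunk[e[2] + 1], e[1], e[2] + 1});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] data = {9, 3, 7, 1, 8, 2, 6, 4, 5, 0};
        int[] merged = mergeChunks(sortInChunks(data, 4), data.length);
        System.out.println(Arrays.toString(merged));
        // prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```

The merge is O(n log k) for k chunks, which is why sorting small chunks eagerly and 
merging late can beat one big sort over the full batch.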




[GitHub] incubator-carbondata pull request #624: [CARBONDATA-747][WIP] Add simple per...

2017-03-06 Thread jarray888
Github user jarray888 commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/624#discussion_r104374666
  
--- Diff: 
examples/spark2/src/main/scala/org/apache/carbondata/examples/CompareTest.scala 
---
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.examples
+
+import org.apache.spark.sql.{DataFrame, Row, SaveMode, SparkSession}
+
+import org.apache.carbondata.core.util.CarbonProperties
+
+// scalastyle:off println
+object CompareTest {
+
+  val parquetTableName = "comparetest_parquet"
+  val carbonTableName = "comparetest_carbon"
+
+  private def generateDataFrame(spark: SparkSession): DataFrame = {
+import spark.implicits._
+spark.sparkContext.parallelize(1 to 10 * 1000 * 1000, 4)
+.map(x => ("i" + x, "p" + x % 10, "j" + x % 100, x, x + 1, (x + 7) 
% 21, (x + 5) / 43, x
+* 5))
+.toDF("id", "country", "city", "c4", "c5", "c6", "c7", "c8")
--- End diff --

can you add a column using decimal data type?




[GitHub] incubator-carbondata pull request #624: [CARBONDATA-747][WIP] Add simple per...

2017-03-06 Thread jarray888
Github user jarray888 commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/624#discussion_r104370904
  
--- Diff: 
examples/spark2/src/main/scala/org/apache/carbondata/examples/CompareTest.scala 
---
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.examples
+
+import org.apache.spark.sql.{DataFrame, Row, SaveMode, SparkSession}
+
+import org.apache.carbondata.core.util.CarbonProperties
+
+// scalastyle:off println
+object CompareTest {
+
+  val parquetTableName = "comparetest_parquet"
+  val carbonTableName = "comparetest_carbon"
+
+  private def generateDataFrame(spark: SparkSession): DataFrame = {
+import spark.implicits._
+spark.sparkContext.parallelize(1 to 10 * 1000 * 1000, 4)
+.map(x => ("i" + x, "p" + x % 10, "j" + x % 100, x, x + 1, (x + 7) 
% 21, (x + 5) / 43, x
--- End diff --

To simulate real-life data, please make the data unsorted, like
`map(x => ("i" + random number, "p" + x % 13, "j" + x % 97, ...)`
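
The suggestion above — derive row keys from a seeded random source rather than the loop 
index so the generated data arrives unsorted — can be sketched in plain Java (the example 
file itself is Scala/Spark; class name, column moduli, and row shape here are illustrative 
assumptions, not the PR's code):

```java
import java.util.Random;

// Illustration only: rows keyed by a seeded Random instead of the loop
// index, so the "id" column is unsorted while the low-cardinality
// columns still cycle through a small set of values.
public class UnsortedDataSketch {

    public static String[][] generate(int rows, long seed) {
        Random rnd = new Random(seed);       // seeded => reproducible runs
        String[][] out = new String[rows][3];
        for (int i = 0; i < rows; i++) {
            int r = rnd.nextInt(1_000_000);  // unsorted key material
            out[i][0] = "i" + r;             // high-cardinality id
            out[i][1] = "p" + r % 13;        // low-cardinality column
            out[i][2] = "j" + r % 97;        // medium-cardinality column
        }
        return out;
    }

    public static void main(String[] args) {
        for (String[] row : generate(5, 42L)) {
            System.out.println(String.join(",", row));
        }
    }
}
```

Seeding keeps benchmark runs comparable while still exercising the unsorted-input path 
that sorted, index-derived data would skip.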




[GitHub] incubator-carbondata pull request #624: [CARBONDATA-747][WIP] Add simple per...

2017-03-06 Thread jarray888
Github user jarray888 commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/624#discussion_r104369883
  
--- Diff: 
examples/spark2/src/main/scala/org/apache/carbondata/examples/CompareTest.scala 
---
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.examples
+
+import org.apache.spark.sql.{DataFrame, Row, SaveMode, SparkSession}
+
+import org.apache.carbondata.core.util.CarbonProperties
+
+// scalastyle:off println
+object CompareTest {
+
+  val parquetTableName = "comparetest_parquet"
+  val carbonTableName = "comparetest_carbon"
+
+  private def generateDataFrame(spark: SparkSession): DataFrame = {
+import spark.implicits._
+spark.sparkContext.parallelize(1 to 10 * 1000 * 1000, 4)
+.map(x => ("i" + x, "p" + x % 10, "j" + x % 100, x, x + 1, (x + 7) 
% 21, (x + 5) / 43, x
+* 5))
+.toDF("id", "country", "city", "c4", "c5", "c6", "c7", "c8")
+  }
+
+  private def loadParquetTable(spark: SparkSession, input: DataFrame): 
Long = timeit {
+input.write.mode(SaveMode.Overwrite).parquet(parquetTableName)
--- End diff --

Suggest using the last character of the id column to partition the parquet table, so the 
comparison is fair.




[GitHub] incubator-carbondata pull request #624: [CARBONDATA-747][WIP] Add simple per...

2017-03-06 Thread jarray888
Github user jarray888 commented on a diff in the pull request:


https://github.com/apache/incubator-carbondata/pull/624#discussion_r104369679
  
--- Diff: 
examples/spark2/src/main/scala/org/apache/carbondata/examples/CompareTest.scala 
---
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.examples
+
+import org.apache.spark.sql.{DataFrame, Row, SaveMode, SparkSession}
+
+import org.apache.carbondata.core.util.CarbonProperties
+
+// scalastyle:off println
+object CompareTest {
+
+  val parquetTableName = "comparetest_parquet"
+  val carbonTableName = "comparetest_carbon"
+
+  private def generateDataFrame(spark: SparkSession): DataFrame = {
+import spark.implicits._
+spark.sparkContext.parallelize(1 to 10 * 1000 * 1000, 4)
+.map(x => ("i" + x, "p" + x % 10, "j" + x % 100, x, x + 1, (x + 7) 
% 21, (x + 5) / 43, x
+* 5))
+.toDF("id", "country", "city", "c4", "c5", "c6", "c7", "c8")
+  }
+
+  private def loadParquetTable(spark: SparkSession, input: DataFrame): 
Long = timeit {
+input.write.mode(SaveMode.Overwrite).parquet(parquetTableName)
+  }
+
+  private def loadCarbonTable(spark: SparkSession, input: DataFrame): Long 
= {
+spark.sql(s"drop table if exists $carbonTableName")
+timeit {
+  input.write
+  .format("carbondata")
+  .option("tableName", carbonTableName)
+  .option("tempCSV", "false")
+  .option("single_pass", "true")
+  .option("dictionary_exclude", "id") // id is high cardinality 
column
+  .mode(SaveMode.Overwrite)
+  .save()
+}
+  }
+
+  private def prepareTable(spark: SparkSession): Unit = {
+val df = generateDataFrame(spark).cache()
+println(s"loading dataframe into table, schema: ${df.schema}")
+val loadParquetTime = loadParquetTable(spark, df)
+val loadCarbonTime = loadCarbonTable(spark, df)
+println(s"load completed, time: $loadParquetTime, $loadCarbonTime")
+
spark.read.parquet(parquetTableName).registerTempTable(parquetTableName)
+  }
+
+  private def runQuery(spark: SparkSession): Unit = {
+val test = Array(
+  "select count(*) from $table",
+  "select sum(c4) from $table",
+  "select sum(c4), sum(c5) from $table",
+  "select sum(c4), sum(c5), sum(c6) from $table",
+  "select sum(c4), sum(c5), sum(c6), sum(c7) from $table",
+  "select sum(c4), sum(c5), sum(c6), sum(c7), avg(c8) from $table",
+  "select * from $table where id = 'i999' ",
+  "select * from $table where country = 'p9' ",
+  "select * from $table where city = 'j99' ",
+  "select * from $table where c4 < 1000 "
--- End diff --

please add more test cases, for example:
 "select sum(c4) from $table where id like 'i1%' "
 "select sum(c4) from $table where id like '%10' "
 "select sum(c4) from $table where id like '%xyz%' "





[GitHub] incubator-carbondata issue #625: [CARBONDATA-743] Remove redundant CarbonFil...

2017-03-06 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/incubator-carbondata/pull/625
  
Build Success with Spark 1.6.2, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder/1014/


