[GitHub] carbondata issue #1439: [CARBONDATA-1628] Re-factory LoadTableCommand to reu...

2017-10-28 Thread jackylk
Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/1439
  
Can you tag the PR title with [Streaming], as discussed on the mailing list?


---


[GitHub] carbondata pull request #1437: [CARBONDATA-1618] Fix issue of not support ta...

2017-10-28 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1437#discussion_r147570132
  
--- Diff: 
integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateTableWithTableComment.scala
 ---
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.spark.testsuite.createTable
+
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+/**
+ * test functionality for create table with table comment
+ */
+class TestCreateTableWithTableComment extends QueryTest with BeforeAndAfterAll {
+
+  override def beforeAll {
+    sql("use default")
+    sql("drop table if exists withTableComment")
+    sql("drop table if exists withoutTableComment")
+  }
+
+  test("test create table with table comment") {
+    sql(
+      s"""
+         | create table withTableComment(
+         | id int,
+         | name string
+         | )
+         | comment "This table has table comment"
+         | STORED BY 'carbondata'
+       """.stripMargin
+    )
+
+    val result = sql("describe formatted withTableComment")
+
+    checkExistence(result, true, "Comment:")
+    checkExistence(result, true, "This table has table comment")
+  }
+
+  test("test create table without table comment") {
+    sql(
+      s"""
+         | create table withoutTableComment(
+         | id int,
+         | name string
+         | )
+         | STORED BY 'carbondata'
+       """.stripMargin
+    )
+
+    val result = sql("describe formatted withoutTableComment")
+
+    checkExistence(result, true, "Comment:")
--- End diff --

Can you also assert that the string includes "This table has table comment"?
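For reference, a minimal sketch of the suggested assertion (hedged: whether the `withoutTableComment` case should assert presence or absence of the text is for the author to confirm; `checkExistence(result, false, ...)` is assumed to verify absence):

```scala
// For the table created with a comment, the text should be present:
checkExistence(result, true, "This table has table comment")
// For the table created without one, the natural counterpart (assumed semantics):
checkExistence(result, false, "This table has table comment")
```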


---


[GitHub] carbondata issue #1412: [CARBONDATA-1510] UDF test case added

2017-10-28 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1412
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/1361/



---


[GitHub] carbondata issue #1412: [CARBONDATA-1510] UDF test case added

2017-10-28 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1412
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/724/



---


[GitHub] carbondata issue #1429: [WIP] Add StructType and ArrayType class

2017-10-28 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1429
  
SDV Build Fail , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/1360/



---


[jira] [Resolved] (CARBONDATA-1653) Rename aggType to measureType

2017-10-28 Thread Liang Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Chen resolved CARBONDATA-1653.

Resolution: Fixed

> Rename aggType to measureType
> -
>
> Key: CARBONDATA-1653
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1653
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Jacky Li
>Assignee: Jacky Li
>Priority: Minor
> Fix For: 1.3.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> There are many occurrences of 'aggType' in the code, but its meaning is not clear



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (CARBONDATA-1653) Rename aggType to measureType

2017-10-28 Thread Liang Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Chen updated CARBONDATA-1653:
---
Priority: Minor  (was: Major)

> Rename aggType to measureType
> -
>
> Key: CARBONDATA-1653
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1653
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Jacky Li
>Assignee: Jacky Li
>Priority: Minor
> Fix For: 1.3.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> There are many occurrences of 'aggType' in the code, but its meaning is not clear



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] carbondata pull request #1444: [CARBONDATA-1653] Rename aggType to measureDa...

2017-10-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/1444


---


[GitHub] carbondata issue #1444: [CARBONDATA-1653] Rename aggType to measureDataType

2017-10-28 Thread chenliang613
Github user chenliang613 commented on the issue:

https://github.com/apache/carbondata/pull/1444
  
LGTM


---


[GitHub] carbondata issue #1414: [CARBONDATA-1574] No_Inverted is applied for all new...

2017-10-28 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1414
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/1358/



---


[jira] [Updated] (CARBONDATA-1655) getSplits function is very slow !!!

2017-10-28 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai updated CARBONDATA-1655:
--
Description: 
I have a table with 4 billion records, and I find that the getSplits function is too slow!
getSplits spent 20s!!!
{code}
"main" #1 prio=5 os_prio=0 tid=0x7fcc94013000 nid=0x5ed5 runnable [0x7fcc992b6000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getSizeInBytes(UnsafeDataMapRow.java:155)
    at org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getPosition(UnsafeDataMapRow.java:170)
    at org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getLengthInBytes(UnsafeDataMapRow.java:61)
    at org.apache.carbondata.core.indexstore.row.DataMapRow.getSizeInBytes(DataMapRow.java:80)
    at org.apache.carbondata.core.indexstore.row.DataMapRow.getTotalSizeInBytes(DataMapRow.java:70)
    at org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getSizeInBytes(UnsafeDataMapRow.java:161)
    at org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getPosition(UnsafeDataMapRow.java:170)
    at org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getRow(UnsafeDataMapRow.java:89)
    at org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getSizeInBytes(UnsafeDataMapRow.java:161)
    at org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getPosition(UnsafeDataMapRow.java:170)
    at org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getByteArray(UnsafeDataMapRow.java:43)
    at org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMap.createBlocklet(BlockletDataMap.java:310)
    at org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMap.prune(BlockletDataMap.java:268)
    at org.apache.carbondata.core.datamap.TableDataMap.prune(TableDataMap.java:66)
    at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getDataBlocksOfSegment(CarbonTableInputFormat.java:524)
    at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:453)
    at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:324)
    at org.apache.carbondata.spark.rdd.CarbonScanRDD.getPartitions(CarbonScanRDD.scala:84)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:260)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:258)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:258)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:260)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:258)
    at scala.Option.getOrElse(Option.scala:121)
{code}

spark-sql> select dt from  dm_test.table_carbondata limit 1;
NULL
Time taken: 20.94 seconds, Fetched 1 row(s)

If the query doesn't contain a sort column, prune should return quickly


  was:
I have a table with 4 billion records, and I find that the getSplits function is too slow!
getSplits spent 20s!!!
{code}
(same stack trace as above; truncated in the archive)
{code}

[jira] [Updated] (CARBONDATA-1655) getSplits function is very slow !!!

2017-10-28 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai updated CARBONDATA-1655:
--
Description: 
I have a table with 4 billion records, and I find that the getSplits function is too slow!
getSplits spent 20s!!!
{code}
(same stack trace as above)
{code}

spark-sql> select dt from  dm_test.table_carbondata limit 1;
NULL
Time taken: 20.94 seconds, Fetched 1 row(s)

If the query doesn't contain a sort column, prune should return quickly!!! 


  was:
I have a table with 4 billion records, and I find that the getSplits function is too slow!
getSplits spent 20s!!!
{code}
(same stack trace as above; truncated in the archive)
{code}

[GitHub] carbondata issue #1442: [CARBONDATA-1652] Add examples for Carbon usage when...

2017-10-28 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1442
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/1356/



---


[jira] [Updated] (CARBONDATA-1655) getSplits function is very slow !!!

2017-10-28 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai updated CARBONDATA-1655:
--
Description: 
I have a table with 4 billion records, and I find that the getSplits function is too slow!
getSplits spent 20s!!!
{code}
(same stack trace as above)
{code}

spark-sql> select dt from  dm_test.table_carbondata limit 1;
NULL
Time taken: 20.94 seconds, Fetched 1 row(s)


  was:
I have a table with 4 billion records, and I find that the getSplits function is too slow!
getSplits spent 20s!!!
{code}
(same stack trace as above; truncated in the archive)
{code}

[jira] [Created] (CARBONDATA-1655) getSplits function is very slow !!!

2017-10-28 Thread cen yuhai (JIRA)
cen yuhai created CARBONDATA-1655:
-

 Summary: getSplits function is very slow !!!
 Key: CARBONDATA-1655
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1655
 Project: CarbonData
  Issue Type: Bug
  Components: data-query
Reporter: cen yuhai


I have a table with 4 billion records, and I find that the getSplits function is too slow!
getSplits spent 20s!!!
{code}
(same stack trace as above)
{code}

spark-sql> select dt from  dm_test.dm_trd_order_wide_carbondata limit 1;
NULL
Time taken: 20.94 seconds, Fetched 1 row(s)




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
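To make the reporter's expectation concrete, here is a minimal sketch (hypothetical names, not CarbonData's actual prune implementation) of a fast path that, following the reporter's reasoning, skips blocklet-level min/max evaluation when the filter references no sort column:

{code}
// Hedged sketch only: if no filter column is a sort column, assume min/max
// pruning cannot eliminate blocklets, so return them all instead of walking
// every unsafe DataMap row (the hot loop in the stack traces above).
def prune[B](blocklets: Seq[B],
             filterColumns: Set[String],
             sortColumns: Set[String])
            (evaluateMinMax: B => Boolean): Seq[B] = {
  if (filterColumns.intersect(sortColumns).isEmpty) {
    blocklets                         // fast path: skip per-blocklet evaluation
  } else {
    blocklets.filter(evaluateMinMax)  // existing per-blocklet min/max check
  }
}
{code}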


[GitHub] carbondata issue #1291: [CARBONDATA-1343] Hive can't query data when the car...

2017-10-28 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1291
  
SDV Build Fail , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/1355/



---


[GitHub] carbondata issue #1386: [CARBONDATA-1513] bad-record for complex data type s...

2017-10-28 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1386
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/723/



---


[GitHub] carbondata issue #1429: [WIP] Add StructType and ArrayType class

2017-10-28 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1429
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/722/



---


[GitHub] carbondata pull request #1442: [CARBONDATA-1652] Add examples for Carbon usa...

2017-10-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/carbondata/pull/1442


---


[GitHub] carbondata issue #1424: [CARBONDATA-1602] Remove unused declaration in spark...

2017-10-28 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1424
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/720/



---


[GitHub] carbondata issue #1291: [CARBONDATA-1343] Hive can't query data when the car...

2017-10-28 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1291
  
SDV Build Fail , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/1352/



---


[GitHub] carbondata issue #1291: [CARBONDATA-1343] Hive can't query data when the car...

2017-10-28 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1291
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/1351/



---


[GitHub] carbondata issue #1442: [CARBONDATA-1652] Add examples for Carbon usage when...

2017-10-28 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1442
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/719/



---


[GitHub] carbondata issue #1386: [CARBONDATA-1513] bad-record for complex data type s...

2017-10-28 Thread rahulforallp
Github user rahulforallp commented on the issue:

https://github.com/apache/carbondata/pull/1386
  
retest this please


---


[GitHub] carbondata issue #1414: [CARBONDATA-1574] No_Inverted is applied for all new...

2017-10-28 Thread rahulforallp
Github user rahulforallp commented on the issue:

https://github.com/apache/carbondata/pull/1414
  
retest this please


---


[GitHub] carbondata issue #1424: [CARBONDATA-1602] Remove unused declaration in spark...

2017-10-28 Thread jackylk
Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/1424
  
retest this please


---


[GitHub] carbondata issue #1424: [CARBONDATA-1602] Remove unused declaration in spark...

2017-10-28 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1424
  
SDV Build Fail , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/1350/



---


[GitHub] carbondata issue #1291: [CARBONDATA-1343] Hive can't query data when the car...

2017-10-28 Thread cenyuhai
Github user cenyuhai commented on the issue:

https://github.com/apache/carbondata/pull/1291
  
@chenliang613 


---


[GitHub] carbondata issue #1291: [CARBONDATA-1343] Hive can't query data when the car...

2017-10-28 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1291
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/718/



---


[GitHub] carbondata issue #1436: [CARBONDATA-1617] Merging carbonindex files within s...

2017-10-28 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1436
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/717/



---


[GitHub] carbondata issue #1436: [CARBONDATA-1617] Merging carbonindex files within s...

2017-10-28 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1436
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/1349/



---


[GitHub] carbondata issue #1424: [CARBONDATA-1602] Remove unused declaration in spark...

2017-10-28 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1424
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/715/



---


[GitHub] carbondata issue #1291: [CARBONDATA-1343] Hive can't query data when the car...

2017-10-28 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1291
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/716/



---


[GitHub] carbondata issue #1436: [WIP][CARBONDATA-1617] Merging carbonindex files wit...

2017-10-28 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1436
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/1348/



---


[GitHub] carbondata issue #1436: [WIP][CARBONDATA-1617] Merging carbonindex files wit...

2017-10-28 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1436
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/714/



---


[jira] [Updated] (CARBONDATA-1654) NullPointerException when insert overwrite table

2017-10-28 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai updated CARBONDATA-1654:
--
Summary: NullPointerException when insert overwrite table  (was: 
NullPointerException when insert overwrite talbe )

> NullPointerException when insert overwrite table
> 
>
> Key: CARBONDATA-1654
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1654
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-load
>Affects Versions: 1.2.0
> Environment: spark 2.1.1 carbondata 1.2.0
>Reporter: cen yuhai
>Priority: Critical
>
> carbon.sql("insert overwrite table carbondata_table select * from hive_table where dt = '2017-10-10' ").collect
> CarbonData wants to find directory Segment_1, but there is Segment_2
> {code}
> [Stage 0:>  (0 + 504) / 504]17/10/28 19:11:28 WARN [org.glassfish.jersey.internal.Errors(191) -- SparkUI-174]: The following warnings have been detected: WARNING: The (sub)resource method stageData in org.apache.spark.status.api.v1.OneStageResource contains empty path annotation.
> 17/10/28 19:25:20 ERROR [org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile(141) -- main]: main Exception occurred:File does not exist: hdfs://bipcluster/user/master/carbon/store/dm_test/dm_trd_order_wide_carbondata/Fact/Part0/Segment_1
> 17/10/28 19:25:22 ERROR [org.apache.spark.sql.execution.command.LoadTable(143) -- main]: main
> java.lang.NullPointerException
>     at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.isDirectory(AbstractDFSCarbonFile.java:88)
>     at org.apache.carbondata.core.util.CarbonUtil.deleteRecursive(CarbonUtil.java:364)
>     at org.apache.carbondata.core.util.CarbonUtil.access$100(CarbonUtil.java:93)
>     at org.apache.carbondata.core.util.CarbonUtil$2.run(CarbonUtil.java:326)
>     at org.apache.carbondata.core.util.CarbonUtil$2.run(CarbonUtil.java:322)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
>     at org.apache.carbondata.core.util.CarbonUtil.deleteFoldersAndFiles(CarbonUtil.java:322)
>     at org.apache.carbondata.spark.load.CarbonLoaderUtil.recordLoadMetadata(CarbonLoaderUtil.java:331)
>     at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.updateStatus$1(CarbonDataRDDFactory.scala:595)
>     at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:1107)
>     at org.apache.spark.sql.execution.command.LoadTable.processData(carbonTableSchema.scala:1046)
>     at org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:754)
>     at org.apache.spark.sql.execution.command.LoadTableByInsert.processData(carbonTableSchema.scala:651)
>     at org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:637)
>     at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
>     at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
>     at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
>     at org.apache.spark.sql.Dataset.<init>(Dataset.scala:180)
>     at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:65)
>     at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:619)
>     at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:36)
>     at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:41)
>     at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:43)
>     at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:45)
>     at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:47)
>     at $line23.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:49)
>     at $line23.$read$$iw$$iw$$iw$$iw.<init>(<console>:51)
>     at $line23.$read$$iw$$iw$$iw.<init>(<console>:53)
>     at $line23.$read$$iw$$iw.<init>(<console>:55)
>     at $line23.$read$$iw.<init>(<console>:57)
>     at $line23.$read.<init>(<console>:59)
>     at $line23.$read$.<init>(<console>:63)
>     at $line23.$read$.<clinit>()
>     at $line23.$eval$.$print$lzycompute(<console>:7)
>     at $line23.$eval$.$print(<console>:6)
>     at $line23.$eval.$print()
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:497)
> at 
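For illustration, a hedged sketch (hypothetical helper, not the actual CarbonData fix) of the kind of guard that would avoid this NullPointerException: treat an already-missing segment directory as "nothing to delete" instead of dereferencing a null file status:

{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hedged sketch: AbstractDFSCarbonFile.isDirectory NPEs when the path no longer
// exists (Segment_1 was already replaced by Segment_2 during overwrite), so an
// existence check makes the cleanup a no-op for missing segment directories.
def deleteIfExists(pathStr: String, conf: Configuration): Boolean = {
  val path = new Path(pathStr)
  val fs: FileSystem = path.getFileSystem(conf)
  !fs.exists(path) || fs.delete(path, true) // recursive delete only when present
}
{code}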

[GitHub] carbondata pull request #1424: [CARBONDATA-1602] Remove unused declaration i...

2017-10-28 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1424#discussion_r147552684
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/carbondata/spark/rdd/CarbonIUDMergerRDD.scala
 ---
@@ -59,18 +59,8 @@ class CarbonIUDMergerRDD[K, V](
     val jobConf: JobConf = new JobConf(new Configuration)
     val job: Job = new Job(jobConf)
     val format = CarbonInputFormatUtil.createCarbonInputFormat(absoluteTableIdentifier, job)
-    var defaultParallelism = sparkContext.defaultParallelism
-    val result = new util.ArrayList[Partition](defaultParallelism)
-
-    // mapping of the node and block list.
-    var nodeMapping: util.Map[String, util.List[Distributable]] = new
-        util.HashMap[String, util.List[Distributable]]
-
-    var noOfBlocks = 0
-
-    val taskInfoList = new util.ArrayList[Distributable]
-
-    var blocksOfLastSegment: List[TableBlockInfo] = null
--- End diff --

All these variables are not used
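As an aside, a hedged sketch of a compiler setting (assuming the project's Scala version supports the flag) that surfaces such dead locals at compile time:

```scala
// Hypothetical build.sbt fragment: -Ywarn-unused makes scalac warn on unused
// locals/privates, so removals like the ones in this diff are caught automatically.
scalacOptions ++= Seq("-Ywarn-unused")
```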


---


[jira] [Updated] (CARBONDATA-1654) NullPointerException when insert overwrite talbe

2017-10-28 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai updated CARBONDATA-1654:
--
Description: 
CarbonData wants to find directory Segment_1, but there is Segment_2
{code}
[Stage 0:>  (0 + 504) / 504]17/10/28 19:11:28 WARN [org.glassfish.jersey.internal.Errors(191) -- SparkUI-174]: The following warnings have been detected: WARNING: The (sub)resource method stageData in org.apache.spark.status.api.v1.OneStageResource contains empty path annotation.

17/10/28 19:25:20 ERROR [org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile(141) -- main]: main Exception occurred:File does not exist: hdfs://bipcluster/user/master/carbon/store/dm_test/dm_trd_order_wide_carbondata/Fact/Part0/Segment_1
17/10/28 19:25:22 ERROR [org.apache.spark.sql.execution.command.LoadTable(143) -- main]: main
java.lang.NullPointerException
    at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.isDirectory(AbstractDFSCarbonFile.java:88)
    at org.apache.carbondata.core.util.CarbonUtil.deleteRecursive(CarbonUtil.java:364)
    at org.apache.carbondata.core.util.CarbonUtil.access$100(CarbonUtil.java:93)
    at org.apache.carbondata.core.util.CarbonUtil$2.run(CarbonUtil.java:326)
    at org.apache.carbondata.core.util.CarbonUtil$2.run(CarbonUtil.java:322)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.carbondata.core.util.CarbonUtil.deleteFoldersAndFiles(CarbonUtil.java:322)
    at org.apache.carbondata.spark.load.CarbonLoaderUtil.recordLoadMetadata(CarbonLoaderUtil.java:331)
    at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.updateStatus$1(CarbonDataRDDFactory.scala:595)
    at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:1107)
    at org.apache.spark.sql.execution.command.LoadTable.processData(carbonTableSchema.scala:1046)
    at org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:754)
    at org.apache.spark.sql.execution.command.LoadTableByInsert.processData(carbonTableSchema.scala:651)
    at org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:637)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:180)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:65)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:619)
    at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:36)
    at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:41)
    at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:43)
    at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:45)
    at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:47)
    at $line23.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:49)
    at $line23.$read$$iw$$iw$$iw$$iw.<init>(<console>:51)
    at $line23.$read$$iw$$iw$$iw.<init>(<console>:53)
    at $line23.$read$$iw$$iw.<init>(<console>:55)
    at $line23.$read$$iw.<init>(<console>:57)
    at $line23.$read.<init>(<console>:59)
    at $line23.$read$.<init>(<console>:63)
    at $line23.$read$.<clinit>()
    at $line23.$eval$.$print$lzycompute(<console>:7)
    at $line23.$eval$.$print(<console>:6)
    at $line23.$eval.$print()
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
    at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
    at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
    at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
    at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
    at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
    at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
    at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
at 

[jira] [Created] (CARBONDATA-1654) NullPointerException when insert overwrite talbe

2017-10-28 Thread cen yuhai (JIRA)
cen yuhai created CARBONDATA-1654:
-

 Summary: NullPointerException when insert overwrite talbe 
 Key: CARBONDATA-1654
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1654
 Project: CarbonData
  Issue Type: Bug
  Components: data-load
Affects Versions: 1.2.0
 Environment: spark 2.1.1 carbondata 1.2.0
Reporter: cen yuhai
Priority: Critical


{code}
(same stack trace as in the previous message; truncated in the archive)
{code}

[GitHub] carbondata issue #1424: [CARBONDATA-1602] Remove unused declaration in spark...

2017-10-28 Thread jackylk
Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/1424
  
retest this please


---


[GitHub] carbondata issue #1444: [CARBONDATA-1653] Rename aggType to measureDataType

2017-10-28 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1444
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/1347/



---


[GitHub] carbondata issue #1436: [WIP][CARBONDATA-1617] Merging carbonindex files wit...

2017-10-28 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1436
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/712/



---


[GitHub] carbondata issue #1444: [CARBONDATA-1653] Rename aggType to measureDataType

2017-10-28 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1444
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/711/



---


[GitHub] carbondata issue #1442: [CARBONDATA-1652] Add examples for Carbon usage when...

2017-10-28 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1442
  
SDV Build Success , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/1346/



---


[GitHub] carbondata issue #1443: [CARBONDATA-1524][CARBONDATA-1525] Added support for...

2017-10-28 Thread jackylk
Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/1443
  
Please tag the PR title with [AggTable]


---


[GitHub] carbondata pull request #1444: [CARBONDATA-1653] Rename aggType to measureDa...

2017-10-28 Thread jackylk
GitHub user jackylk opened a pull request:

https://github.com/apache/carbondata/pull/1444

[CARBONDATA-1653] Rename aggType to measureDataType

There are many `aggType` occurrences in the code; they should be renamed to `measureDataType` for better readability.

 - [X] Make sure the PR title is formatted like:
   `[CARBONDATA-<Jira number>] Description of pull request`

 - [X] Make sure to add PR description

 - [X] Any interfaces changed?
   None

 - [X] Any backward compatibility impacted?
   None

 - [X] Document update required?
   None

 - [X] Testing done
   Yes. No new test case is added.

 - [X] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. 
 
---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jackylk/incubator-carbondata aggtype

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/carbondata/pull/1444.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1444


commit 0a0781a2d21317bbdf92f20cfdb6606043eeda92
Author: Jacky Li 
Date:   2017-10-28T10:56:38Z

rename aggType




---


[GitHub] carbondata issue #1442: [CARBONDATA-1652] Add examples for Carbon usage when...

2017-10-28 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1442
  
Build Success with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/710/



---


[GitHub] carbondata pull request #1439: [CARBONDATA-1628] Re-factory LoadTableCommand...

2017-10-28 Thread jackylk
Github user jackylk commented on a diff in the pull request:

https://github.com/apache/carbondata/pull/1439#discussion_r147551479
  
--- Diff: 
integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/DataLoadingUtil.scala
 ---
@@ -0,0 +1,300 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.spark.util
+
+import scala.collection.immutable
+import scala.collection.JavaConverters._
+import scala.collection.mutable
+
+import org.apache.commons.lang3.StringUtils
+
+import org.apache.carbondata.common.constants.LoggerAction
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.constants.{CarbonCommonConstants, CarbonLoadOptionConstants}
+import org.apache.carbondata.core.metadata.schema.table.CarbonTable
+import org.apache.carbondata.core.util.{CarbonProperties, CarbonUtil}
+import org.apache.carbondata.processing.loading.constants.DataLoadProcessorConstants
+import org.apache.carbondata.processing.loading.model.{CarbonDataLoadSchema, CarbonLoadModel}
+import org.apache.carbondata.processing.util.TableOptionConstant
+import org.apache.carbondata.spark.exception.MalformedCarbonCommandException
+import org.apache.carbondata.spark.load.ValidateUtil
+
+/**
+ * the util object of data loading
+ */
+object DataLoadingUtil {
--- End diff --

Can you move this class, together with org.apache.carbondata.spark.exception.MalformedCarbonCommandException and org.apache.carbondata.spark.load.ValidateUtil, to the carbon-processing module? You can still keep it as Scala code. Since it does not depend on Spark, I think it is better to move it to the processing module.
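For illustration, a minimal sketch of what the suggested move amounts to (the target package name is an assumption based on the module layout, not the actual refactoring):

```scala
// Hypothetical post-move location: carbon-processing has no Spark dependency,
// which is why a pure-Scala util with no Spark imports can live there.
package org.apache.carbondata.processing.util

object DataLoadingUtil {
  // the existing load-option parsing/validation helpers would move here unchanged
}
```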


---


[GitHub] carbondata issue #1439: [CARBONDATA-1628] Re-factory LoadTableCommand to reu...

2017-10-28 Thread jackylk
Github user jackylk commented on the issue:

https://github.com/apache/carbondata/pull/1439
  
LGTM


---


[jira] [Created] (CARBONDATA-1652) Add example for spark integration

2017-10-28 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-1652:


 Summary: Add example for spark integration
 Key: CARBONDATA-1652
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1652
 Project: CarbonData
  Issue Type: Improvement
Reporter: Jacky Li


It is good to have more examples for user reference.
This PR adds back examples from the spark-example module in the earlier Spark 1 integration.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (CARBONDATA-1652) Add example for spark integration

2017-10-28 Thread Jacky Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacky Li reassigned CARBONDATA-1652:


 Assignee: Jacky Li
Fix Version/s: 1.3.0

> Add example for spark integration
> -
>
> Key: CARBONDATA-1652
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1652
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Jacky Li
>Assignee: Jacky Li
> Fix For: 1.3.0
>
>
> It is good to have more examples for user reference.
> This PR adds back examples from the spark-example module in the earlier Spark 1 integration.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[GitHub] carbondata issue #1443: [CARBONDATA-1524][CARBONDATA-1525] Added support for...

2017-10-28 Thread ravipesala
Github user ravipesala commented on the issue:

https://github.com/apache/carbondata/pull/1443
  
SDV Build Fail , Please check CI 
http://144.76.159.231:8080/job/ApacheSDVTests/1345/



---


[GitHub] carbondata issue #1443: [CARBONDATA-1524][CARBONDATA-1525] Added support for...

2017-10-28 Thread CarbonDataQA
Github user CarbonDataQA commented on the issue:

https://github.com/apache/carbondata/pull/1443
  
Build Failed  with Spark 2.1.0, Please check CI 
http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/709/



---


[GitHub] carbondata issue #1443: [CARBONDATA-1524][CARBONDATA-1525] Added support for...

2017-10-28 Thread kunal642
Github user kunal642 commented on the issue:

https://github.com/apache/carbondata/pull/1443
  
retest this please


---