[jira] [Commented] (CARBONDATA-1657) Partition column is empty when insert from a hive table

2017-11-01 Thread cen yuhai (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16233951#comment-16233951
 ] 

cen yuhai commented on CARBONDATA-1657:
---

dt's datatype is string

> Partition column is empty when insert from a hive table
> ---
>
> Key: CARBONDATA-1657
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1657
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-load
>Affects Versions: 1.2.0
> Environment: carbondata 1.2.0, spark 2.1.1
>Reporter: cen yuhai
>Priority: Critical
>
> I created a carbon table whose schema is like a hive table's (dt is the 
> partition column).
> And then:
> {code}
> insert overwrite table dm_test.dm_trd_wide_carbondata select * from 
> hive_table where dt='2017-10-10';
> insert overwrite table dm_test.dm_trd_wide_parquet select * from hive_table 
> where dt='2017-10-10';
> {code}
> {code}
> spark-sql> select dt from dm_test.dm_trd_wide_parquet limit 10;
> 2017-10-10
> 2017-10-10
> 2017-10-10
> 2017-10-10
> 2017-10-10
> 2017-10-10
> 2017-10-10
> 2017-10-10
> 2017-10-10
> 2017-10-10
> Time taken: 1.259 seconds, Fetched 10 row(s)
> spark-sql> select dt from dm_test.dm_trd_wide_carbondata limit 10;
> NULL
> NULL
> NULL
> NULL
> NULL
> NULL
> NULL
> NULL
> NULL
> NULL
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (CARBONDATA-1657) Partition column is empty when insert from a hive table

2017-11-01 Thread cen yuhai (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16233951#comment-16233951
 ] 

cen yuhai edited comment on CARBONDATA-1657 at 11/1/17 11:36 AM:
-

the datatype of dt is string


was (Author: cenyuhai):
dt's datatype is string

> Partition column is empty when insert from a hive table
> ---
>
> Key: CARBONDATA-1657
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1657
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-load
>Affects Versions: 1.2.0
> Environment: carbondata 1.2.0, spark 2.1.1
>Reporter: cen yuhai
>Priority: Critical
>
> I created a carbon table whose schema is like a hive table's (dt is the 
> partition column).
> And then:
> {code}
> insert overwrite table dm_test.dm_trd_wide_carbondata select * from 
> hive_table where dt='2017-10-10';
> insert overwrite table dm_test.dm_trd_wide_parquet select * from hive_table 
> where dt='2017-10-10';
> {code}
> {code}
> spark-sql> select dt from dm_test.dm_trd_wide_parquet limit 10;
> 2017-10-10
> 2017-10-10
> 2017-10-10
> 2017-10-10
> 2017-10-10
> 2017-10-10
> 2017-10-10
> 2017-10-10
> 2017-10-10
> 2017-10-10
> Time taken: 1.259 seconds, Fetched 10 row(s)
> spark-sql> select dt from dm_test.dm_trd_wide_carbondata limit 10;
> NULL
> NULL
> NULL
> NULL
> NULL
> NULL
> NULL
> NULL
> NULL
> NULL
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (CARBONDATA-1654) NullPointerException when insert overwrite table

2017-10-30 Thread cen yuhai (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16226131#comment-16226131
 ] 

cen yuhai commented on CARBONDATA-1654:
---

I updated the description.

> NullPointerException when insert overwrite table
> 
>
> Key: CARBONDATA-1654
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1654
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-load
>Affects Versions: 1.2.0
> Environment: spark 2.1.1 carbondata 1.2.0
>Reporter: cen yuhai
>Priority: Critical
>
> carbon.sql("insert overwrite table carbondata_table select * from hive_table 
> where dt = '2017-10-10' ").collect
> carbondata wants to find directory Segment_1, but there is Segment_2
> {code}
> [Stage 0:>  (0 + 504) / 
> 504]17/10/28 19:11:28 WARN [org.glassfish.jersey.internal.Errors(191) -- 
> SparkUI-174]: The following warnings have been detected: WARNING: The 
> (sub)resource method stageData in 
> org.apache.spark.status.api.v1.OneStageResource contains empty path 
> annotation.
> 17/10/28 19:25:20 ERROR 
> [org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile(141) 
> -- main]: main Exception occurred:File does not exist: 
> hdfs://bipcluster/user/master/carbon/store/dm_test/carbondata_table/Fact/Part0/Segment_1
> 17/10/28 19:25:22 ERROR 
> [org.apache.spark.sql.execution.command.LoadTable(143) -- main]: main 
> java.lang.NullPointerException
> at 
> org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.isDirectory(AbstractDFSCarbonFile.java:88)
> at 
> org.apache.carbondata.core.util.CarbonUtil.deleteRecursive(CarbonUtil.java:364)
> at 
> org.apache.carbondata.core.util.CarbonUtil.access$100(CarbonUtil.java:93)
> at 
> org.apache.carbondata.core.util.CarbonUtil$2.run(CarbonUtil.java:326)
> at 
> org.apache.carbondata.core.util.CarbonUtil$2.run(CarbonUtil.java:322)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
> at 
> org.apache.carbondata.core.util.CarbonUtil.deleteFoldersAndFiles(CarbonUtil.java:322)
> at 
> org.apache.carbondata.spark.load.CarbonLoaderUtil.recordLoadMetadata(CarbonLoaderUtil.java:331)
> at 
> org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.updateStatus$1(CarbonDataRDDFactory.scala:595)
> at 
> org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:1107)
> at 
> org.apache.spark.sql.execution.command.LoadTable.processData(carbonTableSchema.scala:1046)
> at 
> org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:754)
> at 
> org.apache.spark.sql.execution.command.LoadTableByInsert.processData(carbonTableSchema.scala:651)
> at 
> org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:637)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
> at org.apache.spark.sql.Dataset.<init>(Dataset.scala:180)
> at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:65)
> at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:619)
> at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:36)
> at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:41)
> at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:43)
> at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:45)
> at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:47)
> at $line23.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:49)
> at $line23.$read$$iw$$iw$$iw$$iw.<init>(<console>:51)
> at $line23.$read$$iw$$iw$$iw.<init>(<console>:53)
> at $line23.$read$$iw$$iw.<init>(<console>:55)
> at $line23.$read$$iw.<init>(<console>:57)
> at $line23.$read.<init>(<console>:59)
> at $line23.$read$.<init>(<console>:63)
> at $line23.$read$.<clinit>(<console>)
> at $line23.$eval$.$print$lzycompute(<console>:7)
> at $line23.$eval$.$print(<console>:6)
> at $line23.$eval.$print(<console>)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at 
> 

[jira] [Updated] (CARBONDATA-1654) NullPointerException when insert overwrite table

2017-10-30 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai updated CARBONDATA-1654:
--
Description: 
carbon.sql("insert overwrite table carbondata_table select * from hive_table 
where dt = '2017-10-10' ").collect
carbondata wants to find directory Segment_1, but there is Segment_2
{code}
[Stage 0:>  (0 + 504) / 
504]17/10/28 19:11:28 WARN [org.glassfish.jersey.internal.Errors(191) -- 
SparkUI-174]: The following warnings have been detected: WARNING: The 
(sub)resource method stageData in 
org.apache.spark.status.api.v1.OneStageResource contains empty path annotation.

17/10/28 19:25:20 ERROR 
[org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile(141) -- 
main]: main Exception occurred:File does not exist: 
hdfs://bipcluster/user/master/carbon/store/dm_test/carbondata_table/Fact/Part0/Segment_1
17/10/28 19:25:22 ERROR [org.apache.spark.sql.execution.command.LoadTable(143) 
-- main]: main 
java.lang.NullPointerException
at 
org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.isDirectory(AbstractDFSCarbonFile.java:88)
at 
org.apache.carbondata.core.util.CarbonUtil.deleteRecursive(CarbonUtil.java:364)
at 
org.apache.carbondata.core.util.CarbonUtil.access$100(CarbonUtil.java:93)
at org.apache.carbondata.core.util.CarbonUtil$2.run(CarbonUtil.java:326)
at org.apache.carbondata.core.util.CarbonUtil$2.run(CarbonUtil.java:322)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at 
org.apache.carbondata.core.util.CarbonUtil.deleteFoldersAndFiles(CarbonUtil.java:322)
at 
org.apache.carbondata.spark.load.CarbonLoaderUtil.recordLoadMetadata(CarbonLoaderUtil.java:331)
at 
org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.updateStatus$1(CarbonDataRDDFactory.scala:595)
at 
org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:1107)
at 
org.apache.spark.sql.execution.command.LoadTable.processData(carbonTableSchema.scala:1046)
at 
org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:754)
at 
org.apache.spark.sql.execution.command.LoadTableByInsert.processData(carbonTableSchema.scala:651)
at 
org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:637)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:180)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:65)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:619)
at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:36)
at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:41)
at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:43)
at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:45)
at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:47)
at $line23.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:49)
at $line23.$read$$iw$$iw$$iw$$iw.<init>(<console>:51)
at $line23.$read$$iw$$iw$$iw.<init>(<console>:53)
at $line23.$read$$iw$$iw.<init>(<console>:55)
at $line23.$read$$iw.<init>(<console>:57)
at $line23.$read.<init>(<console>:59)
at $line23.$read$.<init>(<console>:63)
at $line23.$read$.<clinit>(<console>)
at $line23.$eval$.$print$lzycompute(<console>:7)
at $line23.$eval$.$print(<console>:6)
at $line23.$eval.$print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
at 
scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
at 
scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
at 
scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
at 
scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
at 
scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
at 
scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
at 

[jira] [Created] (CARBONDATA-1657) Partition column is empty when insert from a hive table

2017-10-30 Thread cen yuhai (JIRA)
cen yuhai created CARBONDATA-1657:
-

 Summary: Partition column is empty when insert from a hive table
 Key: CARBONDATA-1657
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1657
 Project: CarbonData
  Issue Type: Bug
  Components: data-load
Affects Versions: 1.2.0
 Environment: carbondata 1.2.0, spark 2.1.1
Reporter: cen yuhai
Priority: Critical


I created a carbon table whose schema is like a hive table's (dt is the 
partition column).
And then:
{code}
insert overwrite table dm_test.dm_trd_wide_carbondata select * from hive_table 
where dt='2017-10-10';
insert overwrite table dm_test.dm_trd_wide_parquet select * from hive_table 
where dt='2017-10-10';
{code}


{code}
spark-sql> select dt from dm_test.dm_trd_wide_parquet limit 10;
2017-10-10
2017-10-10
2017-10-10
2017-10-10
2017-10-10
2017-10-10
2017-10-10
2017-10-10
2017-10-10
2017-10-10
Time taken: 1.259 seconds, Fetched 10 row(s)
spark-sql> select dt from dm_test.dm_trd_wide_carbondata limit 10;
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL
{code}
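
One way to narrow this down, sketched below under stated assumptions: `select *` on a partitioned hive table returns the partition column (dt) last, so a column-order or type mismatch in the carbon table's schema could leave dt NULL. The session name `carbon` follows the usage elsewhere in this thread, and `col1, col2` are placeholders for the real wide-table columns, not names from the report.

{code}
// Compare the two schemas; hive puts the partition column (dt) last in
// `select *`, so any column-order mismatch in the carbon table shows up here.
carbon.table("hive_table").printSchema()
carbon.table("dm_test.dm_trd_wide_carbondata").printSchema()

// If the orders differ, an explicit column list avoids the misalignment.
// col1, col2 are placeholders for the actual columns.
carbon.sql(
  """insert overwrite table dm_test.dm_trd_wide_carbondata
    |select col1, col2, dt from hive_table where dt = '2017-10-10'""".stripMargin)
{code}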



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (CARBONDATA-1654) NullPointerException when insert overwrite table

2017-10-30 Thread cen yuhai (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16224453#comment-16224453
 ] 

cen yuhai commented on CARBONDATA-1654:
---

No, I can't. Why should I update the schema?
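
A quick diagnostic for the segment mismatch quoted below, assuming shell access to the same cluster: list the Segment_N folders under the store path from the error log to see which segments actually exist before the cleanup in recordLoadMetadata runs.

{code}
// Hedged diagnostic sketch: list segment directories under the table's
// Fact/Part0 path (path taken from the error log in this issue).
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val part0 = new Path(
  "hdfs://bipcluster/user/master/carbon/store/dm_test/carbondata_table/Fact/Part0")
val fs = FileSystem.get(part0.toUri, new Configuration())
fs.listStatus(part0).foreach(status => println(status.getPath.getName))
{code}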

> NullPointerException when insert overwrite table
> 
>
> Key: CARBONDATA-1654
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1654
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-load
>Affects Versions: 1.2.0
> Environment: spark 2.1.1 carbondata 1.2.0
>Reporter: cen yuhai
>Priority: Critical
>
> carbon.sql("insert overwrite table carbondata_table select * from hive_table 
> where dt = '2017-10-10' ").collect
> carbondata wants to find directory Segment_1, but there is Segment_2
> {code}
> [Stage 0:>  (0 + 504) / 
> 504]17/10/28 19:11:28 WARN [org.glassfish.jersey.internal.Errors(191) -- 
> SparkUI-174]: The following warnings have been detected: WARNING: The 
> (sub)resource method stageData in 
> org.apache.spark.status.api.v1.OneStageResource contains empty path 
> annotation.
> 17/10/28 19:25:20 ERROR 
> [org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile(141) 
> -- main]: main Exception occurred:File does not exist: 
> hdfs://bipcluster/user/master/carbon/store/dm_test/carbondata_table/Fact/Part0/Segment_1
> 17/10/28 19:25:22 ERROR 
> [org.apache.spark.sql.execution.command.LoadTable(143) -- main]: main 
> java.lang.NullPointerException
> at 
> org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.isDirectory(AbstractDFSCarbonFile.java:88)
> at 
> org.apache.carbondata.core.util.CarbonUtil.deleteRecursive(CarbonUtil.java:364)
> at 
> org.apache.carbondata.core.util.CarbonUtil.access$100(CarbonUtil.java:93)
> at 
> org.apache.carbondata.core.util.CarbonUtil$2.run(CarbonUtil.java:326)
> at 
> org.apache.carbondata.core.util.CarbonUtil$2.run(CarbonUtil.java:322)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
> at 
> org.apache.carbondata.core.util.CarbonUtil.deleteFoldersAndFiles(CarbonUtil.java:322)
> at 
> org.apache.carbondata.spark.load.CarbonLoaderUtil.recordLoadMetadata(CarbonLoaderUtil.java:331)
> at 
> org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.updateStatus$1(CarbonDataRDDFactory.scala:595)
> at 
> org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:1107)
> at 
> org.apache.spark.sql.execution.command.LoadTable.processData(carbonTableSchema.scala:1046)
> at 
> org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:754)
> at 
> org.apache.spark.sql.execution.command.LoadTableByInsert.processData(carbonTableSchema.scala:651)
> at 
> org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:637)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
> at org.apache.spark.sql.Dataset.<init>(Dataset.scala:180)
> at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:65)
> at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:619)
> at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:36)
> at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:41)
> at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:43)
> at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:45)
> at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:47)
> at $line23.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:49)
> at $line23.$read$$iw$$iw$$iw$$iw.<init>(<console>:51)
> at $line23.$read$$iw$$iw$$iw.<init>(<console>:53)
> at $line23.$read$$iw$$iw.<init>(<console>:55)
> at $line23.$read$$iw.<init>(<console>:57)
> at $line23.$read.<init>(<console>:59)
> at $line23.$read$.<init>(<console>:63)
> at $line23.$read$.<clinit>(<console>)
> at $line23.$eval$.$print$lzycompute(<console>:7)
> at $line23.$eval$.$print(<console>:6)
> at $line23.$eval.$print(<console>)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at 
> 

[jira] [Updated] (CARBONDATA-1655) getSplits function is very slow !!!

2017-10-28 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai updated CARBONDATA-1655:
--
Description: 
I have a table with 4 billion records, and I find that the getSplits function 
is too slow!
getSplits spent 20s!!!
{code}
"main" #1 prio=5 os_prio=0 tid=0x7fcc94013000 nid=0x5ed5 runnable 
[0x7fcc992b6000]
   java.lang.Thread.State: RUNNABLE
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getSizeInBytes(UnsafeDataMapRow.java:155)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getPosition(UnsafeDataMapRow.java:170)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getLengthInBytes(UnsafeDataMapRow.java:61)
at 
org.apache.carbondata.core.indexstore.row.DataMapRow.getSizeInBytes(DataMapRow.java:80)
at 
org.apache.carbondata.core.indexstore.row.DataMapRow.getTotalSizeInBytes(DataMapRow.java:70)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getSizeInBytes(UnsafeDataMapRow.java:161)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getPosition(UnsafeDataMapRow.java:170)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getRow(UnsafeDataMapRow.java:89)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getSizeInBytes(UnsafeDataMapRow.java:161)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getPosition(UnsafeDataMapRow.java:170)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getByteArray(UnsafeDataMapRow.java:43)
at 
org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMap.createBlocklet(BlockletDataMap.java:310)
at 
org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMap.prune(BlockletDataMap.java:268)
at 
org.apache.carbondata.core.datamap.TableDataMap.prune(TableDataMap.java:66)
at 
org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getDataBlocksOfSegment(CarbonTableInputFormat.java:524)
at 
org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:453)
at 
org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:324)
at 
org.apache.carbondata.spark.rdd.CarbonScanRDD.getPartitions(CarbonScanRDD.scala:84)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:260)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:258)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:258)
at 
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:260)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:258)
at scala.Option.getOrElse(Option.scala:121)
{code}

spark-sql> select dt from  dm_test.table_carbondata limit 1;
NULL
Time taken: 20.94 seconds, Fetched 1 row(s)

If the query doesn't contain a sort column, prune should return quickly.
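
A minimal sketch of that early exit, using simplified stand-in types rather than CarbonData's real BlockletDataMap API: when there is no filter on a sort column, nothing needs to be evaluated, so all blocklets can be returned immediately instead of walking every unsafe min/max row.

{code}
// Simplified sketch of the suggested early exit; Blocklet and RangeFilter are
// illustrative stand-ins, not CarbonData's actual types.
case class Blocklet(id: Int, min: Long, max: Long)
case class RangeFilter(lo: Long, hi: Long)

def prune(blocklets: Seq[Blocklet], filter: Option[RangeFilter]): Seq[Blocklet] =
  filter match {
    case None    => blocklets  // no sort-column filter: return without min/max checks
    case Some(f) => blocklets.filter(b => b.max >= f.lo && b.min <= f.hi)
  }
{code}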


  was:
I have a table with 4 billion records, and I find that the getSplits function 
is too slow!
getSplits spent 20s!!!
{code}
"main" #1 prio=5 os_prio=0 tid=0x7fcc94013000 nid=0x5ed5 runnable 
[0x7fcc992b6000]
   java.lang.Thread.State: RUNNABLE
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getSizeInBytes(UnsafeDataMapRow.java:155)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getPosition(UnsafeDataMapRow.java:170)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getLengthInBytes(UnsafeDataMapRow.java:61)
at 
org.apache.carbondata.core.indexstore.row.DataMapRow.getSizeInBytes(DataMapRow.java:80)
at 
org.apache.carbondata.core.indexstore.row.DataMapRow.getTotalSizeInBytes(DataMapRow.java:70)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getSizeInBytes(UnsafeDataMapRow.java:161)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getPosition(UnsafeDataMapRow.java:170)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getRow(UnsafeDataMapRow.java:89)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getSizeInBytes(UnsafeDataMapRow.java:161)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getPosition(UnsafeDataMapRow.java:170)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getByteArray(UnsafeDataMapRow.java:43)
at 
org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMap.createBlocklet(BlockletDataMap.java:310)
at 
org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMap.prune(BlockletDataMap.java:268)
at 
org.apache.carbondata.core.datamap.TableDataMap.prune(TableDataMap.java:66)
at 

[jira] [Updated] (CARBONDATA-1655) getSplits function is very slow !!!

2017-10-28 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai updated CARBONDATA-1655:
--
Description: 
I have a table with 4 billion records, and I find that the getSplits function 
is too slow!
getSplits spent 20s!!!
{code}
"main" #1 prio=5 os_prio=0 tid=0x7fcc94013000 nid=0x5ed5 runnable 
[0x7fcc992b6000]
   java.lang.Thread.State: RUNNABLE
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getSizeInBytes(UnsafeDataMapRow.java:155)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getPosition(UnsafeDataMapRow.java:170)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getLengthInBytes(UnsafeDataMapRow.java:61)
at 
org.apache.carbondata.core.indexstore.row.DataMapRow.getSizeInBytes(DataMapRow.java:80)
at 
org.apache.carbondata.core.indexstore.row.DataMapRow.getTotalSizeInBytes(DataMapRow.java:70)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getSizeInBytes(UnsafeDataMapRow.java:161)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getPosition(UnsafeDataMapRow.java:170)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getRow(UnsafeDataMapRow.java:89)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getSizeInBytes(UnsafeDataMapRow.java:161)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getPosition(UnsafeDataMapRow.java:170)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getByteArray(UnsafeDataMapRow.java:43)
at 
org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMap.createBlocklet(BlockletDataMap.java:310)
at 
org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMap.prune(BlockletDataMap.java:268)
at 
org.apache.carbondata.core.datamap.TableDataMap.prune(TableDataMap.java:66)
at 
org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getDataBlocksOfSegment(CarbonTableInputFormat.java:524)
at 
org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:453)
at 
org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:324)
at 
org.apache.carbondata.spark.rdd.CarbonScanRDD.getPartitions(CarbonScanRDD.scala:84)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:260)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:258)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:258)
at 
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:260)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:258)
at scala.Option.getOrElse(Option.scala:121)
{code}

spark-sql> select dt from  dm_test.table_carbondata limit 1;
NULL
Time taken: 20.94 seconds, Fetched 1 row(s)

If the query doesn't contain a sort column, prune should return quickly!!!


  was:
I have a table with 4 billion records, and I find that the getSplits function 
is too slow!
getSplits spent 20s!!!
{code}
"main" #1 prio=5 os_prio=0 tid=0x7fcc94013000 nid=0x5ed5 runnable 
[0x7fcc992b6000]
   java.lang.Thread.State: RUNNABLE
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getSizeInBytes(UnsafeDataMapRow.java:155)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getPosition(UnsafeDataMapRow.java:170)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getLengthInBytes(UnsafeDataMapRow.java:61)
at 
org.apache.carbondata.core.indexstore.row.DataMapRow.getSizeInBytes(DataMapRow.java:80)
at 
org.apache.carbondata.core.indexstore.row.DataMapRow.getTotalSizeInBytes(DataMapRow.java:70)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getSizeInBytes(UnsafeDataMapRow.java:161)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getPosition(UnsafeDataMapRow.java:170)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getRow(UnsafeDataMapRow.java:89)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getSizeInBytes(UnsafeDataMapRow.java:161)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getPosition(UnsafeDataMapRow.java:170)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getByteArray(UnsafeDataMapRow.java:43)
at 
org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMap.createBlocklet(BlockletDataMap.java:310)
at 
org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMap.prune(BlockletDataMap.java:268)
at 
org.apache.carbondata.core.datamap.TableDataMap.prune(TableDataMap.java:66)
at 

[jira] [Updated] (CARBONDATA-1655) getSplits function is very slow !!!

2017-10-28 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai updated CARBONDATA-1655:
--
Description: 
I have a table with 4 billion records, and I find that the getSplits function 
is too slow!
getSplits spent 20s!!!
{code}
"main" #1 prio=5 os_prio=0 tid=0x7fcc94013000 nid=0x5ed5 runnable 
[0x7fcc992b6000]
   java.lang.Thread.State: RUNNABLE
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getSizeInBytes(UnsafeDataMapRow.java:155)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getPosition(UnsafeDataMapRow.java:170)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getLengthInBytes(UnsafeDataMapRow.java:61)
at 
org.apache.carbondata.core.indexstore.row.DataMapRow.getSizeInBytes(DataMapRow.java:80)
at 
org.apache.carbondata.core.indexstore.row.DataMapRow.getTotalSizeInBytes(DataMapRow.java:70)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getSizeInBytes(UnsafeDataMapRow.java:161)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getPosition(UnsafeDataMapRow.java:170)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getRow(UnsafeDataMapRow.java:89)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getSizeInBytes(UnsafeDataMapRow.java:161)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getPosition(UnsafeDataMapRow.java:170)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getByteArray(UnsafeDataMapRow.java:43)
at 
org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMap.createBlocklet(BlockletDataMap.java:310)
at 
org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMap.prune(BlockletDataMap.java:268)
at 
org.apache.carbondata.core.datamap.TableDataMap.prune(TableDataMap.java:66)
at 
org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getDataBlocksOfSegment(CarbonTableInputFormat.java:524)
at 
org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:453)
at 
org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:324)
at 
org.apache.carbondata.spark.rdd.CarbonScanRDD.getPartitions(CarbonScanRDD.scala:84)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:260)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:258)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:258)
at 
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:260)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:258)
at scala.Option.getOrElse(Option.scala:121)
{code}

spark-sql> select dt from  dm_test.table_carbondata limit 1;
NULL
Time taken: 20.94 seconds, Fetched 1 row(s)


  was:
I have a table with 4 billion records, and I find that the getSplits function 
is too slow!
getSplits spent 20s!!!
{code}
"main" #1 prio=5 os_prio=0 tid=0x7fcc94013000 nid=0x5ed5 runnable 
[0x7fcc992b6000]
   java.lang.Thread.State: RUNNABLE
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getSizeInBytes(UnsafeDataMapRow.java:155)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getPosition(UnsafeDataMapRow.java:170)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getLengthInBytes(UnsafeDataMapRow.java:61)
at 
org.apache.carbondata.core.indexstore.row.DataMapRow.getSizeInBytes(DataMapRow.java:80)
at 
org.apache.carbondata.core.indexstore.row.DataMapRow.getTotalSizeInBytes(DataMapRow.java:70)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getSizeInBytes(UnsafeDataMapRow.java:161)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getPosition(UnsafeDataMapRow.java:170)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getRow(UnsafeDataMapRow.java:89)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getSizeInBytes(UnsafeDataMapRow.java:161)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getPosition(UnsafeDataMapRow.java:170)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getByteArray(UnsafeDataMapRow.java:43)
at 
org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMap.createBlocklet(BlockletDataMap.java:310)
at 
org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMap.prune(BlockletDataMap.java:268)
at 
org.apache.carbondata.core.datamap.TableDataMap.prune(TableDataMap.java:66)
at 

[jira] [Created] (CARBONDATA-1655) getSplits function is very slow !!!

2017-10-28 Thread cen yuhai (JIRA)
cen yuhai created CARBONDATA-1655:
-

 Summary: getSplits function is very slow !!!
 Key: CARBONDATA-1655
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1655
 Project: CarbonData
  Issue Type: Bug
  Components: data-query
Reporter: cen yuhai


I have a table with 4 billion records, and I find that the getSplits function 
is too slow!
getSplits spent 20s!!!
{code}
"main" #1 prio=5 os_prio=0 tid=0x7fcc94013000 nid=0x5ed5 runnable 
[0x7fcc992b6000]
   java.lang.Thread.State: RUNNABLE
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getSizeInBytes(UnsafeDataMapRow.java:155)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getPosition(UnsafeDataMapRow.java:170)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getLengthInBytes(UnsafeDataMapRow.java:61)
at 
org.apache.carbondata.core.indexstore.row.DataMapRow.getSizeInBytes(DataMapRow.java:80)
at 
org.apache.carbondata.core.indexstore.row.DataMapRow.getTotalSizeInBytes(DataMapRow.java:70)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getSizeInBytes(UnsafeDataMapRow.java:161)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getPosition(UnsafeDataMapRow.java:170)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getRow(UnsafeDataMapRow.java:89)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getSizeInBytes(UnsafeDataMapRow.java:161)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getPosition(UnsafeDataMapRow.java:170)
at 
org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getByteArray(UnsafeDataMapRow.java:43)
at 
org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMap.createBlocklet(BlockletDataMap.java:310)
at 
org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMap.prune(BlockletDataMap.java:268)
at 
org.apache.carbondata.core.datamap.TableDataMap.prune(TableDataMap.java:66)
at 
org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getDataBlocksOfSegment(CarbonTableInputFormat.java:524)
at 
org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:453)
at 
org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:324)
at 
org.apache.carbondata.spark.rdd.CarbonScanRDD.getPartitions(CarbonScanRDD.scala:84)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:260)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:258)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:258)
at 
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:260)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:258)
at scala.Option.getOrElse(Option.scala:121)
{code}

spark-sql> select dt from  dm_test.dm_trd_order_wide_carbondata limit 1;
NULL
Time taken: 20.94 seconds, Fetched 1 row(s)
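
A hedged way to time the split computation in isolation (assuming the `carbon` session used elsewhere in this archive): computing `rdd.partitions` walks exactly the path shown in the stack above, RDD.partitions -> CarbonScanRDD.getPartitions -> CarbonTableInputFormat.getSplits, without reading any rows.

{code}
// Timing sketch: rdd.partitions triggers CarbonScanRDD.getPartitions, which
// calls CarbonTableInputFormat.getSplits (see the thread dump above).
val df = carbon.sql("select dt from dm_test.dm_trd_order_wide_carbondata limit 1")
val t0 = System.nanoTime()
val numParts = df.rdd.partitions.length
println(s"split computation took ${(System.nanoTime() - t0) / 1e9}s for $numParts partitions")
{code}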




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (CARBONDATA-1654) NullPointerException when insert overwrite table

2017-10-28 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai updated CARBONDATA-1654:
--
Summary: NullPointerException when insert overwrite table  (was: 
NullPointerException when insert overwrite talbe )

> NullPointerException when insert overwrite table
> 
>
> Key: CARBONDATA-1654
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1654
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-load
>Affects Versions: 1.2.0
> Environment: spark 2.1.1 carbondata 1.2.0
>Reporter: cen yuhai
>Priority: Critical
>
> carbon.sql("insert overwrite table carbondata_table select * from hive_table 
> where dt = '2017-10-10' ").collect
> carbondata wants to find directory Segment_1, but there is Segment_2
> {code}
> [Stage 0:>  (0 + 504) / 
> 504]17/10/28 19:11:28 WARN [org.glassfish.jersey.internal.Errors(191) -- 
> SparkUI-174]: The following warnings have been detected: WARNING: The 
> (sub)resource method stageData in 
> org.apache.spark.status.api.v1.OneStageResource contains empty path 
> annotation.
> 17/10/28 19:25:20 ERROR 
> [org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile(141) 
> -- main]: main Exception occurred:File does not exist: 
> hdfs://bipcluster/user/master/carbon/store/dm_test/dm_trd_order_wide_carbondata/Fact/Part0/Segment_1
> 17/10/28 19:25:22 ERROR 
> [org.apache.spark.sql.execution.command.LoadTable(143) -- main]: main 
> java.lang.NullPointerException
> at 
> org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.isDirectory(AbstractDFSCarbonFile.java:88)
> at 
> org.apache.carbondata.core.util.CarbonUtil.deleteRecursive(CarbonUtil.java:364)
> at 
> org.apache.carbondata.core.util.CarbonUtil.access$100(CarbonUtil.java:93)
> at 
> org.apache.carbondata.core.util.CarbonUtil$2.run(CarbonUtil.java:326)
> at 
> org.apache.carbondata.core.util.CarbonUtil$2.run(CarbonUtil.java:322)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
> at 
> org.apache.carbondata.core.util.CarbonUtil.deleteFoldersAndFiles(CarbonUtil.java:322)
> at 
> org.apache.carbondata.spark.load.CarbonLoaderUtil.recordLoadMetadata(CarbonLoaderUtil.java:331)
> at 
> org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.updateStatus$1(CarbonDataRDDFactory.scala:595)
> at 
> org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:1107)
> at 
> org.apache.spark.sql.execution.command.LoadTable.processData(carbonTableSchema.scala:1046)
> at 
> org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:754)
> at 
> org.apache.spark.sql.execution.command.LoadTableByInsert.processData(carbonTableSchema.scala:651)
> at 
> org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:637)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
> at org.apache.spark.sql.Dataset.<init>(Dataset.scala:180)
> at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:65)
> at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:619)
> at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:36)
> at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:41)
> at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:43)
> at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:45)
> at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:47)
> at $line23.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:49)
> at $line23.$read$$iw$$iw$$iw$$iw.<init>(<console>:51)
> at $line23.$read$$iw$$iw$$iw.<init>(<console>:53)
> at $line23.$read$$iw$$iw.<init>(<console>:55)
> at $line23.$read$$iw.<init>(<console>:57)
> at $line23.$read.<init>(<console>:59)
> at $line23.$read$.<init>(<console>:63)
> at $line23.$read$.<clinit>(<console>)
> at $line23.$eval$.$print$lzycompute(<console>:7)
> at $line23.$eval$.$print(<console>:6)
> at $line23.$eval.$print(<console>)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at 

[jira] [Updated] (CARBONDATA-1654) NullPointerException when insert overwrite talbe

2017-10-28 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai updated CARBONDATA-1654:
--
Description: 
carbondata wants to find directory Segment_1, but there is Segment_2
{code}
[Stage 0:>  (0 + 504) / 
504]17/10/28 19:11:28 WARN [org.glassfish.jersey.internal.Errors(191) -- 
SparkUI-174]: The following warnings have been detected: WARNING: The 
(sub)resource method stageData in 
org.apache.spark.status.api.v1.OneStageResource contains empty path annotation.

17/10/28 19:25:20 ERROR 
[org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile(141) -- 
main]: main Exception occurred:File does not exist: 
hdfs://bipcluster/user/master/carbon/store/dm_test/dm_trd_order_wide_carbondata/Fact/Part0/Segment_1
17/10/28 19:25:22 ERROR [org.apache.spark.sql.execution.command.LoadTable(143) 
-- main]: main 
java.lang.NullPointerException
at 
org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.isDirectory(AbstractDFSCarbonFile.java:88)
at 
org.apache.carbondata.core.util.CarbonUtil.deleteRecursive(CarbonUtil.java:364)
at 
org.apache.carbondata.core.util.CarbonUtil.access$100(CarbonUtil.java:93)
at org.apache.carbondata.core.util.CarbonUtil$2.run(CarbonUtil.java:326)
at org.apache.carbondata.core.util.CarbonUtil$2.run(CarbonUtil.java:322)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at 
org.apache.carbondata.core.util.CarbonUtil.deleteFoldersAndFiles(CarbonUtil.java:322)
at 
org.apache.carbondata.spark.load.CarbonLoaderUtil.recordLoadMetadata(CarbonLoaderUtil.java:331)
at 
org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.updateStatus$1(CarbonDataRDDFactory.scala:595)
at 
org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:1107)
at 
org.apache.spark.sql.execution.command.LoadTable.processData(carbonTableSchema.scala:1046)
at 
org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:754)
at 
org.apache.spark.sql.execution.command.LoadTableByInsert.processData(carbonTableSchema.scala:651)
at 
org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:637)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:180)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:65)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:619)
at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:36)
at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:41)
at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:43)
at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:45)
at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:47)
at $line23.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:49)
at $line23.$read$$iw$$iw$$iw$$iw.<init>(<console>:51)
at $line23.$read$$iw$$iw$$iw.<init>(<console>:53)
at $line23.$read$$iw$$iw.<init>(<console>:55)
at $line23.$read$$iw.<init>(<console>:57)
at $line23.$read.<init>(<console>:59)
at $line23.$read$.<init>(<console>:63)
at $line23.$read$.<clinit>(<console>)
at $line23.$eval$.$print$lzycompute(<console>:7)
at $line23.$eval$.$print(<console>:6)
at $line23.$eval.$print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
at 
scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
at 
scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
at 
scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
at 
scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
at 
scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
at 
scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
at 

[jira] [Created] (CARBONDATA-1654) NullPointerException when insert overwrite talbe

2017-10-28 Thread cen yuhai (JIRA)
cen yuhai created CARBONDATA-1654:
-

 Summary: NullPointerException when insert overwrite talbe 
 Key: CARBONDATA-1654
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1654
 Project: CarbonData
  Issue Type: Bug
  Components: data-load
Affects Versions: 1.2.0
 Environment: spark 2.1.1 carbondata 1.2.0
Reporter: cen yuhai
Priority: Critical


{code}
[Stage 0:>  (0 + 504) / 
504]17/10/28 19:11:28 WARN [org.glassfish.jersey.internal.Errors(191) -- 
SparkUI-174]: The following warnings have been detected: WARNING: The 
(sub)resource method stageData in 
org.apache.spark.status.api.v1.OneStageResource contains empty path annotation.

17/10/28 19:25:20 ERROR 
[org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile(141) -- 
main]: main Exception occurred:File does not exist: 
hdfs://bipcluster/user/master/carbon/store/dm_test/dm_trd_order_wide_carbondata/Fact/Part0/Segment_1
17/10/28 19:25:22 ERROR [org.apache.spark.sql.execution.command.LoadTable(143) 
-- main]: main 
java.lang.NullPointerException
at 
org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.isDirectory(AbstractDFSCarbonFile.java:88)
at 
org.apache.carbondata.core.util.CarbonUtil.deleteRecursive(CarbonUtil.java:364)
at 
org.apache.carbondata.core.util.CarbonUtil.access$100(CarbonUtil.java:93)
at org.apache.carbondata.core.util.CarbonUtil$2.run(CarbonUtil.java:326)
at org.apache.carbondata.core.util.CarbonUtil$2.run(CarbonUtil.java:322)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at 
org.apache.carbondata.core.util.CarbonUtil.deleteFoldersAndFiles(CarbonUtil.java:322)
at 
org.apache.carbondata.spark.load.CarbonLoaderUtil.recordLoadMetadata(CarbonLoaderUtil.java:331)
at 
org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.updateStatus$1(CarbonDataRDDFactory.scala:595)
at 
org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:1107)
at 
org.apache.spark.sql.execution.command.LoadTable.processData(carbonTableSchema.scala:1046)
at 
org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:754)
at 
org.apache.spark.sql.execution.command.LoadTableByInsert.processData(carbonTableSchema.scala:651)
at 
org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:637)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:180)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:65)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:619)
at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:36)
at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:41)
at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:43)
at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:45)
at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:47)
at $line23.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:49)
at $line23.$read$$iw$$iw$$iw$$iw.<init>(<console>:51)
at $line23.$read$$iw$$iw$$iw.<init>(<console>:53)
at $line23.$read$$iw$$iw.<init>(<console>:55)
at $line23.$read$$iw.<init>(<console>:57)
at $line23.$read.<init>(<console>:59)
at $line23.$read$.<init>(<console>:63)
at $line23.$read$.<clinit>(<console>)
at $line23.$eval$.$print$lzycompute(<console>:7)
at $line23.$eval$.$print(<console>:6)
at $line23.$eval.$print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
at 
scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
at 
scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
at 
scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
at 
scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
at 
scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
at 

[jira] [Updated] (CARBONDATA-727) Hive integration

2017-10-14 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai updated CARBONDATA-727:
-
Attachment: the future of hive integration.png

> Hive integration
> 
>
> Key: CARBONDATA-727
> URL: https://issues.apache.org/jira/browse/CARBONDATA-727
> Project: CarbonData
>  Issue Type: New Feature
>  Components: hive-integration
>Affects Versions: NONE
>Reporter: cen yuhai
>Assignee: cen yuhai
> Attachments: the future of hive integration.png
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Now hive is widely used in data warehouses. I think we should support hive.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (CARBONDATA-1377) Implement hive partition

2017-10-14 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai updated CARBONDATA-1377:
--
Attachment: the future of hive integration.png

> Implement hive partition
> 
>
> Key: CARBONDATA-1377
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1377
> Project: CarbonData
>  Issue Type: Sub-task
>  Components: hive-integration
>Reporter: cen yuhai
>Assignee: cen yuhai
> Attachments: the future of hive integration.png
>
>
> The current partition implementation is database-like. If we want to use carbon to 
> replace parquet massively, we must make the usage of carbon the same as 
> parquet/orc.
> Hive users should be able to switch to CarbonData for all the new partitions 
> being created. Hive supports specifying the format at partition level. 
> Example:
> {code:sql}
> create table rtestpartition (col1 string, col2 int) partitioned by (col3 int) 
> stored as parquet;
> insert into rtestpartition partition(col3=10) select "pqt", 1;
> insert into rtestpartition partition(col3=20) select "pqt", 1;
> insert into rtestpartition partition(col3=10) select "pqt", 1;
> insert into rtestpartition partition(col3=20) select "pqt", 1;
> {code}
> {noformat}
> hive creates folder like
> /db1/table1/col3=10/0001_file.pqt
> /db1/table1/col3=10/0002_file.pqt
> /db1/table1/col3=20/0001_file.pqt
> /db1/table1/col3=20/0002_file.pqt
> {noformat}
> Hive users can now change new partitions to CarbonData; however, old 
> partitions will still be in parquet and require migration scripts to move to the 
> CarbonData format (a migration sketch follows this issue).
> {code:sql}
> alter table rtestpartition set fileformat carbondata;
> insert into rtestpartition partition(col3=30) select "cdata", 1;
> insert into rtestpartition partition(col3=40) select "cdata", 1;
> {code}
> {noformat}
> hive creates folder like
> /db1/table1/col3=10/0001_file.pqt
> /db1/table1/col3=10/0002_file.pqt
> /db1/table1/col3=20/0001_file.pqt
> /db1/table1/col3=20/0002_file.pqt
> /db1/table1/col3=30/
> /db1/table1/col3=40/
> {noformat}
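
A hedged sketch of what such a migration script could look like for a single legacy partition, issued over a Hive JDBC connection as in the examples elsewhere in this archive; the port, the backup table name `rtestpartition_old`, and the column list `col1, col2` are hypothetical.

{code}
// Migration sketch for one legacy parquet partition: switch only that
// partition's format, then rewrite it from a (hypothetical) backup table.
import java.sql.{DriverManager, Statement}

val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "")
val stmt: Statement = conn.createStatement()
stmt.execute("ALTER TABLE rtestpartition PARTITION (col3=10) SET FILEFORMAT carbondata")
stmt.execute("INSERT OVERWRITE TABLE rtestpartition PARTITION (col3=10) " +
  "SELECT col1, col2 FROM rtestpartition_old WHERE col3 = 10")
{code}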



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (CARBONDATA-1377) Implement hive partition

2017-10-10 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai reassigned CARBONDATA-1377:
-

Assignee: cen yuhai

> Implement hive partition
> 
>
> Key: CARBONDATA-1377
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1377
> Project: CarbonData
>  Issue Type: Sub-task
>  Components: hive-integration
>Reporter: cen yuhai
>Assignee: cen yuhai
>
> The current partition implementation is database-like. If we want to use carbon to 
> replace parquet massively, we must make the usage of carbon the same as 
> parquet/orc.
> Hive users should be able to switch to CarbonData for all the new partitions 
> being created. Hive supports specifying the format at partition level. 
> Example:
> {code:sql}
> create table rtestpartition (col1 string, col2 int) partitioned by (col3 int) 
> stored as parquet;
> insert into rtestpartition partition(col3=10) select "pqt", 1;
> insert into rtestpartition partition(col3=20) select "pqt", 1;
> insert into rtestpartition partition(col3=10) select "pqt", 1;
> insert into rtestpartition partition(col3=20) select "pqt", 1;
> {code}
> {noformat}
> hive creates folder like
> /db1/table1/col3=10/0001_file.pqt
> /db1/table1/col3=10/0002_file.pqt
> /db1/table1/col3=20/0001_file.pqt
> /db1/table1/col3=20/0002_file.pqt
> {noformat}
> Hive users can now change new partitions to CarbonData; however, old 
> partitions will still be in parquet and require migration scripts to move to the 
> CarbonData format.
> {code:sql}
> alter table rtestpartition set fileformat carbondata;
> insert into rtestpartition partition(col3=30) select "cdata", 1;
> insert into rtestpartition partition(col3=40) select "cdata", 1;
> {code}
> {noformat}
> hive creates folder like
> /db1/table1/col3=10/0001_file.pqt
> /db1/table1/col3=10/0002_file.pqt
> /db1/table1/col3=20/0001_file.pqt
> /db1/table1/col3=20/0002_file.pqt
> /db1/table1/col3=30/
> /db1/table1/col3=40/
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (CARBONDATA-1362) ArrayIndexOutOfBoundsException when decoding decimal type

2017-10-08 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai closed CARBONDATA-1362.
-
Resolution: Not A Problem

> ArrayIndexOutOfBoundsException when decoding decimal type
> 
>
> Key: CARBONDATA-1362
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1362
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Reporter: cen yuhai
>
> {code}
> java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: 0
>   at 
> org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator.close(AbstractDataBlockIterator.java:231)
>   at 
> org.apache.carbondata.core.scan.result.iterator.AbstractDetailQueryResultIterator.close(AbstractDetailQueryResultIterator.java:306)
>   at 
> org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.finish(AbstractQueryExecutor.java:544)
>   at 
> org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.close(VectorizedCarbonRecordReader.java:132)
>   at 
> org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1$$anonfun$7.apply(CarbonScanRDD.scala:215)
>   at 
> org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1$$anonfun$7.apply(CarbonScanRDD.scala:213)
>   at 
> org.apache.spark.TaskContext$$anon$1.onTaskCompletion(TaskContext.scala:123)
>   at 
> org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:97)
>   at 
> org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:95)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>   at 
> org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:95)
>   at org.apache.spark.scheduler.Task.run(Task.scala:117)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: 0
>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:188)
>   at 
> org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator.close(AbstractDataBlockIterator.java:226)
>   ... 16 more
> Caused by: java.lang.RuntimeException: 
> java.lang.ArrayIndexOutOfBoundsException: 0
>   at 
> org.apache.carbondata.core.datastore.chunk.impl.MeasureRawColumnChunk.convertToMeasureColDataChunks(MeasureRawColumnChunk.java:62)
>   at 
> org.apache.carbondata.core.scan.scanner.AbstractBlockletScanner.scanBlocklet(AbstractBlockletScanner.java:100)
>   at 
> org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator$1.call(AbstractDataBlockIterator.java:191)
>   at 
> org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator$1.call(AbstractDataBlockIterator.java:178)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   ... 3 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
>   at 
> org.apache.carbondata.core.util.DataTypeUtil.byteToBigDecimal(DataTypeUtil.java:210)
>   at 
> org.apache.carbondata.core.metadata.ColumnPageCodecMeta.deserialize(ColumnPageCodecMeta.java:217)
>   at 
> org.apache.carbondata.core.datastore.chunk.reader.measure.v3.CompressedMeasureChunkFileBasedReaderV3.decodeMeasure(CompressedMeasureChunkFileBasedReaderV3.java:236)
>   at 
> org.apache.carbondata.core.datastore.chunk.reader.measure.v3.CompressedMeasureChunkFileBasedReaderV3.convertToMeasureChunk(CompressedMeasureChunkFileBasedReaderV3.java:219)
>   at 
> org.apache.carbondata.core.datastore.chunk.impl.MeasureRawColumnChunk.convertToMeasureColDataChunks(MeasureRawColumnChunk.java:59)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (CARBONDATA-1378) Support create carbon table in Hive

2017-09-17 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai reassigned CARBONDATA-1378:
-

Assignee: cen yuhai

> Support create carbon table in Hive
> ---
>
> Key: CARBONDATA-1378
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1378
> Project: CarbonData
>  Issue Type: Sub-task
>  Components: hive-integration
>Reporter: cen yuhai
>Assignee: cen yuhai
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> Support create carbon table in Hive
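
The target usage might look like the sketch below, reusing the SerDe and 
input/output format classes that already ship with the hive integration. 
Whether plain Hive accepts this DDL is exactly what this ticket has to deliver, 
so treat it as a sketch of the desired end state, not working syntax:

{code}
import java.sql.DriverManager

val connection = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "")
val statement = connection.createStatement

// Hive creates the table directly against the carbon classes, replacing the
// CREATE + ALTER TABLE SET FILEFORMAT workaround used today.
statement.execute(
  "CREATE TABLE hive_carbon_native (id INT, name STRING) " +
  "ROW FORMAT SERDE 'org.apache.carbondata.hive.CarbonHiveSerDe' " +
  "STORED AS INPUTFORMAT 'org.apache.carbondata.hive.MapredCarbonInputFormat' " +
  "OUTPUTFORMAT 'org.apache.carbondata.hive.MapredCarbonOutputFormat'")
{code}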



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (CARBONDATA-1477) wrong values shown when fetching date type values in hive

2017-09-13 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai updated CARBONDATA-1477:
--
Description: 
{code} 
// rootPath, store, warehouse, driverName, logger and HiveEmbeddedServer are
// defined elsewhere in the example project this snippet was taken from.
import java.sql.{DriverManager, ResultSet, Statement}

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._

val carbonSession = SparkSession
  .builder()
  .master("local")
  .appName("HiveExample")
  .config("carbonSession.sql.warehouse.dir", warehouse)
  .enableHiveSupport()
  .getOrCreateCarbonSession(store)

carbonSession.sql("DROP TABLE IF EXISTS HIVE_CARBON_EXAMPLE")

carbonSession.sql(
  """CREATE TABLE HIVE_CARBON_EXAMPLE (ID int, NAME string, SALARY double, JOININGDATE date)
    |STORED BY 'CARBONDATA'""".stripMargin)

carbonSession.sql(
  s"""LOAD DATA LOCAL INPATH '$rootPath/integration/hive/src/main/resources/data.csv'
     |INTO TABLE HIVE_CARBON_EXAMPLE""".stripMargin)
carbonSession.sql("SELECT * FROM HIVE_CARBON_EXAMPLE").show()

carbonSession.stop()

try {
  Class.forName(driverName)
} catch {
  case classNotFoundException: ClassNotFoundException =>
    classNotFoundException.printStackTrace()
}

HiveEmbeddedServer.start()
val port = HiveEmbeddedServer.getFreePort
// connect to the port the embedded server actually got instead of a hard-coded 8000
val connection = DriverManager.getConnection(s"jdbc:hive2://localhost:$port/default", "", "")
val statement: Statement = connection.createStatement

logger.info(s"HIVE CLI IS STARTED ON PORT $port ==")

statement.execute(
  "CREATE TABLE IF NOT EXISTS HIVE_CARBON_EXAMPLE " +
  "(ID int, NAME string, SALARY double, JOININGDATE date)")
statement.execute(
  "ALTER TABLE HIVE_CARBON_EXAMPLE SET FILEFORMAT " +
  "INPUTFORMAT \"org.apache.carbondata.hive.MapredCarbonInputFormat\" " +
  "OUTPUTFORMAT \"org.apache.carbondata.hive.MapredCarbonOutputFormat\" " +
  "SERDE \"org.apache.carbondata.hive.CarbonHiveSerDe\"")
statement.execute(
  "ALTER TABLE HIVE_CARBON_EXAMPLE SET LOCATION " +
  s"'file:///$store/default/hive_carbon_example'")

val sql = "SELECT * FROM HIVE_CARBON_EXAMPLE"
val resultSet: ResultSet = statement.executeQuery(sql)

var rowsFetched = 0
while (resultSet.next) {
  println("*" + resultSet.getString("JOININGDATE"))
  rowsFetched += 1 // count the rows so the summary below is accurate
}
println(s"**Total Number Of Rows Fetched ** $rowsFetched")

logger.info("Fetching the Individual Columns ")

HiveEmbeddedServer.stop()
{code} 
actual result:
*null
*1970-01-01

values in my csv are:
ID,NAME,SALARY,JOININGDATE
1,'liang',20,2016-03-14
2,'anubhav',2,2019/03/17
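
A hedged illustration of the likely mechanism (plain SimpleDateFormat 
behaviour, not the actual CarbonData loader code): a value that does not match 
the expected yyyy-MM-dd pattern fails to parse and surfaces as NULL, while a 
stored surrogate of zero days renders as the epoch date 1970-01-01.

{code}
import java.text.{ParseException, SimpleDateFormat}

val fmt = new SimpleDateFormat("yyyy-MM-dd")
fmt.setLenient(false)

def parse(s: String): Option[java.util.Date] =
  try Some(fmt.parse(s)) catch { case _: ParseException => None }

println(parse("2016-03-14")) // Some(...) -- matches the pattern
println(parse("2019/03/17")) // None -- slashes do not match yyyy-MM-dd, shows up as NULL
{code}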




[jira] [Created] (CARBONDATA-1378) Support create carbon table in Hive

2017-08-13 Thread cen yuhai (JIRA)
cen yuhai created CARBONDATA-1378:
-

 Summary: Support create carbon table in Hive
 Key: CARBONDATA-1378
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1378
 Project: CarbonData
  Issue Type: Sub-task
Reporter: cen yuhai


Support create carbon table in Hive



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1377) Implement hive partition

2017-08-13 Thread cen yuhai (JIRA)
cen yuhai created CARBONDATA-1377:
-

 Summary: Implement hive partition
 Key: CARBONDATA-1377
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1377
 Project: CarbonData
  Issue Type: Sub-task
Reporter: cen yuhai


The current partition implementation is database-like. If we want carbon to 
replace parquet at scale, carbon must be usable in the same way as parquet/orc; 
the sketch below shows the intended usage.
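
A sketch of that usage, assuming a session named `carbon` as in the other 
snippets in this thread (illustrative syntax for the feature being requested, 
not something carbon supported at the time of this report):

{code}
// Hypothetical: a carbon table partitioned exactly like a parquet/orc table.
carbon.sql("CREATE TABLE sales_carbon (order_id BIGINT, amount DOUBLE) PARTITIONED BY (dt STRING) STORED BY 'carbondata'")
carbon.sql("INSERT OVERWRITE TABLE sales_carbon PARTITION (dt = '2017-08-13') SELECT order_id, amount FROM sales_src WHERE dt = '2017-08-13'")
{code}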



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1375) clean hive pom

2017-08-13 Thread cen yuhai (JIRA)
cen yuhai created CARBONDATA-1375:
-

 Summary: clean hive pom
 Key: CARBONDATA-1375
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1375
 Project: CarbonData
  Issue Type: Bug
  Components: hive-integration
Reporter: cen yuhai


the hive pom contains some unnecessary dependencies



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (CARBONDATA-1374) Can't insert carbon if the source table contains array datatype

2017-08-13 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai updated CARBONDATA-1374:
--
Summary: Can't insert carbon if the source table contains array datatype  
(was: Can't insert carbon if the source table contains array data)

> Can't insert carbon if the source table contains array datatype
> ---
>
> Key: CARBONDATA-1374
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1374
> Project: CarbonData
>  Issue Type: Bug
>Reporter: cen yuhai
>Assignee: cen yuhai
>
> {code}
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of 
> range: -1
>         at 
> java.lang.AbstractStringBuilder.substring(AbstractStringBuilder.java:872)
>         at java.lang.StringBuilder.substring(StringBuilder.java:72)
>         at 
> scala.collection.mutable.StringBuilder.substring(StringBuilder.scala:166)
>         at 
> org.apache.carbondata.spark.util.CarbonScalaUtil$.getString(CarbonScalaUtil.scala:126)
>         at 
> org.apache.carbondata.spark.rdd.CarbonBlockDistinctValuesCombineRDD$$anonfun$internalCompute$1.apply$mcVI$sp(CarbonGlobalDictionaryRDD.scala:295)
>         at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
>         at 
> org.apache.carbondata.spark.rdd.CarbonBlockDistinctValuesCombineRDD.internalCompute(CarbonGlobalDictionaryRDD.scala:294)
>         at 
> org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:50)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
>         at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:97)
>         at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
>         at org.apache.spark.scheduler.Task.run(Task.scala:104)
>         at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
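
A hedged guess at the failure mode behind CarbonScalaUtil.getString (a sketch, 
not the actual source): array elements are joined with a delimiter and the 
trailing delimiter is trimmed with substring, which underflows to index -1 when 
the array is empty.

{code}
import scala.util.Try

def joinArray(xs: Seq[String], delim: Char = '$'): String = {
  val sb = new StringBuilder
  xs.foreach(x => sb.append(x).append(delim))
  // For an empty array sb.length == 0, so this is substring(0, -1):
  // java.lang.StringIndexOutOfBoundsException: String index out of range: -1
  sb.substring(0, sb.length - 1)
}

println(joinArray(Seq("a", "b"))) // a$b
println(Try(joinArray(Seq.empty))) // Failure(StringIndexOutOfBoundsException: ... -1)
{code}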



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (CARBONDATA-1374) Can't insert carbon if the source table contains array data

2017-08-13 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai reassigned CARBONDATA-1374:
-

Assignee: cen yuhai

> Can't insert carbon if the source table contains array data
> ---
>
> Key: CARBONDATA-1374
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1374
> Project: CarbonData
>  Issue Type: Bug
>Reporter: cen yuhai
>Assignee: cen yuhai
>
> {code}
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of 
> range: -1
>         at 
> java.lang.AbstractStringBuilder.substring(AbstractStringBuilder.java:872)
>         at java.lang.StringBuilder.substring(StringBuilder.java:72)
>         at 
> scala.collection.mutable.StringBuilder.substring(StringBuilder.scala:166)
>         at 
> org.apache.carbondata.spark.util.CarbonScalaUtil$.getString(CarbonScalaUtil.scala:126)
>         at 
> org.apache.carbondata.spark.rdd.CarbonBlockDistinctValuesCombineRDD$$anonfun$internalCompute$1.apply$mcVI$sp(CarbonGlobalDictionaryRDD.scala:295)
>         at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
>         at 
> org.apache.carbondata.spark.rdd.CarbonBlockDistinctValuesCombineRDD.internalCompute(CarbonGlobalDictionaryRDD.scala:294)
>         at 
> org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:50)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
>         at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:97)
>         at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
>         at org.apache.spark.scheduler.Task.run(Task.scala:104)
>         at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1374) Can't insert carbon if the source table contains array data

2017-08-13 Thread cen yuhai (JIRA)
cen yuhai created CARBONDATA-1374:
-

 Summary: Can't insert carbon if the source table contains array 
data
 Key: CARBONDATA-1374
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1374
 Project: CarbonData
  Issue Type: Bug
Reporter: cen yuhai


{code}
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of 
range: -1
        at 
java.lang.AbstractStringBuilder.substring(AbstractStringBuilder.java:872)
        at java.lang.StringBuilder.substring(StringBuilder.java:72)
        at 
scala.collection.mutable.StringBuilder.substring(StringBuilder.scala:166)
        at 
org.apache.carbondata.spark.util.CarbonScalaUtil$.getString(CarbonScalaUtil.scala:126)
        at 
org.apache.carbondata.spark.rdd.CarbonBlockDistinctValuesCombineRDD$$anonfun$internalCompute$1.apply$mcVI$sp(CarbonGlobalDictionaryRDD.scala:295)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
        at 
org.apache.carbondata.spark.rdd.CarbonBlockDistinctValuesCombineRDD.internalCompute(CarbonGlobalDictionaryRDD.scala:294)
        at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:50)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:97)
        at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
        at org.apache.spark.scheduler.Task.run(Task.scala:104)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (CARBONDATA-1362) ArrayIndexOutOfBoundsException when decoding decimal type

2017-08-05 Thread cen yuhai (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115420#comment-16115420
 ] 

cen yuhai commented on CARBONDATA-1362:
---

I used the old code to create a carbon table and load data; after updating my 
code to master, querying the data throws this exception. If I recreate the 
table, it works fine.
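
For context, a hedged sketch of the byte layout this decode path implies 
(first byte = scale, remaining bytes = unscaled value; the real code is 
DataTypeUtil.byteToBigDecimal, this is only an illustration). A page written by 
the old writer that the new reader no longer understands can hand it an empty 
buffer, and reading the scale at buffer(0) then fails with 
ArrayIndexOutOfBoundsException: 0.

{code}
import java.math.{BigDecimal => JBigDecimal, BigInteger}

def bytesToBigDecimal(buffer: Array[Byte]): JBigDecimal = {
  val scale = buffer(0) & 0xFF // throws ArrayIndexOutOfBoundsException: 0 on an empty buffer
  val unscaled = new BigInteger(buffer.drop(1))
  new JBigDecimal(unscaled, scale)
}

// Round trip for 12.34 (scale 2, unscaled 1234):
val encoded = Array[Byte](2) ++ BigInteger.valueOf(1234).toByteArray
println(bytesToBigDecimal(encoded)) // 12.34
{code}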

> ArrayIndexOutOfBoundsException when decoding decimal type
> 
>
> Key: CARBONDATA-1362
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1362
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Reporter: cen yuhai
>
> {code}
> java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: 0
>   at 
> org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator.close(AbstractDataBlockIterator.java:231)
>   at 
> org.apache.carbondata.core.scan.result.iterator.AbstractDetailQueryResultIterator.close(AbstractDetailQueryResultIterator.java:306)
>   at 
> org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.finish(AbstractQueryExecutor.java:544)
>   at 
> org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.close(VectorizedCarbonRecordReader.java:132)
>   at 
> org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1$$anonfun$7.apply(CarbonScanRDD.scala:215)
>   at 
> org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1$$anonfun$7.apply(CarbonScanRDD.scala:213)
>   at 
> org.apache.spark.TaskContext$$anon$1.onTaskCompletion(TaskContext.scala:123)
>   at 
> org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:97)
>   at 
> org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:95)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>   at 
> org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:95)
>   at org.apache.spark.scheduler.Task.run(Task.scala:117)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: 0
>   at java.util.concurrent.FutureTask.report(FutureTask.java:122)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:188)
>   at 
> org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator.close(AbstractDataBlockIterator.java:226)
>   ... 16 more
> Caused by: java.lang.RuntimeException: 
> java.lang.ArrayIndexOutOfBoundsException: 0
>   at 
> org.apache.carbondata.core.datastore.chunk.impl.MeasureRawColumnChunk.convertToMeasureColDataChunks(MeasureRawColumnChunk.java:62)
>   at 
> org.apache.carbondata.core.scan.scanner.AbstractBlockletScanner.scanBlocklet(AbstractBlockletScanner.java:100)
>   at 
> org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator$1.call(AbstractDataBlockIterator.java:191)
>   at 
> org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator$1.call(AbstractDataBlockIterator.java:178)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   ... 3 more
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
>   at 
> org.apache.carbondata.core.util.DataTypeUtil.byteToBigDecimal(DataTypeUtil.java:210)
>   at 
> org.apache.carbondata.core.metadata.ColumnPageCodecMeta.deserialize(ColumnPageCodecMeta.java:217)
>   at 
> org.apache.carbondata.core.datastore.chunk.reader.measure.v3.CompressedMeasureChunkFileBasedReaderV3.decodeMeasure(CompressedMeasureChunkFileBasedReaderV3.java:236)
>   at 
> org.apache.carbondata.core.datastore.chunk.reader.measure.v3.CompressedMeasureChunkFileBasedReaderV3.convertToMeasureChunk(CompressedMeasureChunkFileBasedReaderV3.java:219)
>   at 
> org.apache.carbondata.core.datastore.chunk.impl.MeasureRawColumnChunk.convertToMeasureColDataChunks(MeasureRawColumnChunk.java:59)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1362) ArrayIndexOutOfBoundsException when decoding decimal type

2017-08-05 Thread cen yuhai (JIRA)
cen yuhai created CARBONDATA-1362:
-

 Summary: ArrayIndexOutOfBoundsException when decoding decimal type
 Key: CARBONDATA-1362
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1362
 Project: CarbonData
  Issue Type: Bug
  Components: core
Reporter: cen yuhai



{code}
java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: 0
at 
org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator.close(AbstractDataBlockIterator.java:231)
at 
org.apache.carbondata.core.scan.result.iterator.AbstractDetailQueryResultIterator.close(AbstractDetailQueryResultIterator.java:306)
at 
org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.finish(AbstractQueryExecutor.java:544)
at 
org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.close(VectorizedCarbonRecordReader.java:132)
at 
org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1$$anonfun$7.apply(CarbonScanRDD.scala:215)
at 
org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1$$anonfun$7.apply(CarbonScanRDD.scala:213)
at 
org.apache.spark.TaskContext$$anon$1.onTaskCompletion(TaskContext.scala:123)
at 
org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:97)
at 
org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:95)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at 
org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:95)
at org.apache.spark.scheduler.Task.run(Task.scala:117)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
java.lang.ArrayIndexOutOfBoundsException: 0
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:188)
at 
org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator.close(AbstractDataBlockIterator.java:226)
... 16 more
Caused by: java.lang.RuntimeException: 
java.lang.ArrayIndexOutOfBoundsException: 0
at 
org.apache.carbondata.core.datastore.chunk.impl.MeasureRawColumnChunk.convertToMeasureColDataChunks(MeasureRawColumnChunk.java:62)
at 
org.apache.carbondata.core.scan.scanner.AbstractBlockletScanner.scanBlocklet(AbstractBlockletScanner.java:100)
at 
org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator$1.call(AbstractDataBlockIterator.java:191)
at 
org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator$1.call(AbstractDataBlockIterator.java:178)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
... 3 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
at 
org.apache.carbondata.core.util.DataTypeUtil.byteToBigDecimal(DataTypeUtil.java:210)
at 
org.apache.carbondata.core.metadata.ColumnPageCodecMeta.deserialize(ColumnPageCodecMeta.java:217)
at 
org.apache.carbondata.core.datastore.chunk.reader.measure.v3.CompressedMeasureChunkFileBasedReaderV3.decodeMeasure(CompressedMeasureChunkFileBasedReaderV3.java:236)
at 
org.apache.carbondata.core.datastore.chunk.reader.measure.v3.CompressedMeasureChunkFileBasedReaderV3.convertToMeasureChunk(CompressedMeasureChunkFileBasedReaderV3.java:219)
at 
org.apache.carbondata.core.datastore.chunk.impl.MeasureRawColumnChunk.convertToMeasureColDataChunks(MeasureRawColumnChunk.java:59)
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Closed] (CARBONDATA-1153) Can not add column

2017-08-05 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai closed CARBONDATA-1153.
-
Resolution: Not A Problem

> Can not add column
> --
>
> Key: CARBONDATA-1153
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1153
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 1.2.0
>Reporter: cen yuhai
>
> Sometimes it throws the exception below. Why can't I add a column? No one is 
> altering the table... 
> {code}
> scala> carbon.sql("alter table temp.yuhai_carbon add columns(test1 string)")
> 17/06/11 22:09:13 AUDIT 
> [org.apache.spark.sql.execution.command.AlterTableAddColumns(207) -- main]: 
> [sh-hadoop-datanode-250-104.elenet.me][master][Thread-1]Alter table add 
> columns request has been received for temp.yuhai_carbon
> 17/06/11 22:10:22 ERROR [org.apache.spark.scheduler.TaskSetManager(70) -- 
> task-result-getter-3]: Task 0 in stage 0.0 failed 4 times; aborting job
> 17/06/11 22:10:22 ERROR 
> [org.apache.spark.sql.execution.command.AlterTableAddColumns(141) -- main]: 
> main Alter table add columns failed :Job aborted due to stage failure: Task 0 
> in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 3, sh-hadoop-datanode-368.elenet.me, executor 7): 
> java.lang.RuntimeException: Dictionary file test1 is locked for updation. 
> Please try after some time
> at scala.sys.package$.error(package.scala:27)
> at 
> org.apache.carbondata.spark.util.GlobalDictionaryUtil$.loadDefaultDictionaryValueForNewColumn(GlobalDictionaryUtil.scala:857)
> at 
> org.apache.carbondata.spark.rdd.AlterTableAddColumnRDD$$anon$1.(AlterTableAddColumnRDD.scala:83)
> at 
> org.apache.carbondata.spark.rdd.AlterTableAddColumnRDD.compute(AlterTableAddColumnRDD.scala:68)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88)
> at org.apache.spark.scheduler.Task.run(Task.scala:104)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}
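
The message comes from the dictionary file lock not being acquired within the 
retry budget. A hedged sketch of that pattern (names are illustrative; the real 
lock implementation lives in carbondata-core):

{code}
// Try to grab the lock a few times before giving up, mirroring the error above.
def withLockRetries[T](attempts: Int, waitMs: Long)(tryAcquire: () => Boolean)(body: => T): T = {
  var tries = 0
  while (!tryAcquire()) {
    tries += 1
    if (tries >= attempts)
      sys.error("Dictionary file is locked for updation. Please try after some time")
    Thread.sleep(waitMs)
  }
  body
}

// Usage sketch: withLockRetries(3, 5000)(() => tryLockDictionaryFile()) { writeDictionary() }
{code}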



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (CARBONDATA-1343) Hive can't query data when the carbon table info is stored in hive metastore

2017-07-30 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai updated CARBONDATA-1343:
--
Description: 
{code}
set spark.carbon.hive.schema.store=true in spark-defaults.conf
spark-shell --jars 
carbonlib/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.2.jar,carbonlib/carbondata-hive-1.2.0-SNAPSHOT.jar

import org.apache.spark.sql.SparkSession 
import org.apache.spark.sql.CarbonSession._ 
val rootPath = "hdfs://mycluster/user/master/carbon" 
val storeLocation = s"$rootPath/store" 
val warehouse = s"$rootPath/warehouse" 
val metastoredb = s"$rootPath/metastore_db" 

val carbon 
=SparkSession.builder().enableHiveSupport().getOrCreateCarbonSession(storeLocation,
 metastoredb) 
carbon.sql("create table temp.hive_carbon(id short, name string, scale decimal, 
country string, salary double) STORED BY 'carbondata'") 
carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/master/sample.csv INTO 
TABLE temp.hive_carbon") 

start hive cli

set hive.mapred.supports.subdirectories=true;
set mapreduce.input.fileinputformat.input.dir.recursive=true;

select * from temp.hive_carbon;
{code}

{code}
17/07/30 19:33:07 ERROR [CliDriver(1097) -- 
53ea0b98-bcf0-4b86-a167-58ce570df284 main]: Failed with exception 
java.io.IOException:java.io.IOException: File does not exist: 
hdfs://bipcluster/user/master/carbon/store/temp/hive_carbon/Metadata/schema
java.io.IOException: java.io.IOException: File does not exist: 
hdfs://bipcluster/user/master/carbon/store/temp/yuhai_carbon/Metadata/schema
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:521)
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2187)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:252)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: File does not exist: 
hdfs://bipcluster/user/master/carbon/store/temp/hive_carbon/Metadata/schema
at 
org.apache.carbondata.hadoop.util.SchemaReader.readCarbonTableFromStore(SchemaReader.java:70)
at 
org.apache.carbondata.hadoop.CarbonInputFormat.populateCarbonTable(CarbonInputFormat.java:147)
at 
org.apache.carbondata.hadoop.CarbonInputFormat.getCarbonTable(CarbonInputFormat.java:124)
at 
org.apache.carbondata.hadoop.CarbonInputFormat.getAbsoluteTableIdentifier(CarbonInputFormat.java:221)
at 
org.apache.carbondata.hadoop.CarbonInputFormat.getSplits(CarbonInputFormat.java:234)
at 
org.apache.carbondata.hive.MapredCarbonInputFormat.getSplits(MapredCarbonInputFormat.java:51)
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:372)
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:304)
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459)
... 15 more
{code}
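
A hedged sketch of the direction a fix could take (not the actual SchemaReader 
code): only read <store>/<db>/<table>/Metadata/schema when that file exists, 
and otherwise build the table info from the hive metastore it was configured to 
live in.

{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

def schemaFileExists(storePath: String, db: String, table: String): Boolean = {
  val schema = new Path(s"$storePath/$db/$table/Metadata/schema")
  schema.getFileSystem(new Configuration()).exists(schema)
}

// Sketch: if (!schemaFileExists(storeLocation, "temp", "hive_carbon")) read the
// schema from the hive metastore instead of failing with "File does not exist".
{code}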


[jira] [Updated] (CARBONDATA-1343) Hive can't query data when the carbon table info is stored in hive metastore

2017-07-30 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai updated CARBONDATA-1343:
--
Description: 
{code}
set spark.carbon.hive.schema.store=true in spark-defaults.conf
spark-shell --jars 
carbonlib/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.2.jar,carbonlib/carbondata-hive-1.2.0-SNAPSHOT.jar
import org.apache.spark.sql.SparkSession 
import org.apache.spark.sql.CarbonSession._ 
val rootPath = "hdfs://mycluster/user/master/carbon" 
val storeLocation = s"$rootPath/store" 
val warehouse = s"$rootPath/warehouse" 
val metastoredb = s"$rootPath/metastore_db" 

val carbon 
=SparkSession.builder().enableHiveSupport().getOrCreateCarbonSession(storeLocation,
 metastoredb) 
carbon.sql("create table temp.hive_carbon(id short, name string, scale decimal, 
country string, salary double) STORED BY 'carbondata'") 
carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/master/sample.csv INTO 
TABLE temp.hive_carbon") 

start hive cli

set hive.mapred.supports.subdirectories=true;
set mapreduce.input.fileinputformat.input.dir.recursive=true;

select * from temp.hive_carbon;
{code}

{code}
17/07/30 19:33:07 ERROR [CliDriver(1097) -- 
53ea0b98-bcf0-4b86-a167-58ce570df284 main]: Failed with exception 
java.io.IOException:java.io.IOException: File does not exist: 
hdfs://bipcluster/user/master/carbon/store/temp/hive_carbon/Metadata/schema
java.io.IOException: java.io.IOException: File does not exist: 
hdfs://bipcluster/user/master/carbon/store/temp/yuhai_carbon/Metadata/schema
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:521)
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2187)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:252)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: File does not exist: 
hdfs://bipcluster/user/master/carbon/store/temp/hive_carbon/Metadata/schema
at 
org.apache.carbondata.hadoop.util.SchemaReader.readCarbonTableFromStore(SchemaReader.java:70)
at 
org.apache.carbondata.hadoop.CarbonInputFormat.populateCarbonTable(CarbonInputFormat.java:147)
at 
org.apache.carbondata.hadoop.CarbonInputFormat.getCarbonTable(CarbonInputFormat.java:124)
at 
org.apache.carbondata.hadoop.CarbonInputFormat.getAbsoluteTableIdentifier(CarbonInputFormat.java:221)
at 
org.apache.carbondata.hadoop.CarbonInputFormat.getSplits(CarbonInputFormat.java:234)
at 
org.apache.carbondata.hive.MapredCarbonInputFormat.getSplits(MapredCarbonInputFormat.java:51)
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:372)
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:304)
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459)
... 15 more
{code}


[jira] [Commented] (CARBONDATA-1343) Hive can't query data when the carbon table info is stored in hive metastore

2017-07-30 Thread cen yuhai (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106484#comment-16106484
 ] 

cen yuhai commented on CARBONDATA-1343:
---

I am working on it

> Hive can't query data when the carbon table info is stored in hive metastore
> ---
>
> Key: CARBONDATA-1343
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1343
> Project: CarbonData
>  Issue Type: Bug
>Reporter: cen yuhai
>Assignee: cen yuhai
>
> {code}
> set spark.carbon.hive.schema.store=true in spark-defaults.conf
> spark-shell --jars 
> carbonlib/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.2.jar,carbonlib/carbondata-hive-1.2.0-SNAPSHOT.jar
> import org.apache.spark.sql.SparkSession 
> import org.apache.spark.sql.CarbonSession._ 
> val rootPath = "hdfs://mycluster/user/master/carbon" 
> val storeLocation = s"$rootPath/store" 
> val warehouse = s"$rootPath/warehouse" 
> val metastoredb = s"$rootPath/metastore_db" 
> val carbon 
> =SparkSession.builder().enableHiveSupport().getOrCreateCarbonSession(storeLocation,
>  metastoredb) 
> carbon.sql("create table temp.hive_carbon(id short, name string, scale 
> decimal, country string, salary double) STORED BY 'carbondata'") 
> carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/master/sample.csv 
> INTO TABLE temp.hive_carbon") 
> start hive cli
> set hive.mapred.supports.subdirectories=true;
> set mapreduce.input.fileinputformat.input.dir.recursive=true;
> select * from temp.hive_carbon;
> {code}
> {code}
> 17/07/30 19:33:07 ERROR [CliDriver(1097) -- 
> 53ea0b98-bcf0-4b86-a167-58ce570df284 main]: Failed with exception 
> java.io.IOException:java.io.IOException: File does not exist: 
> hdfs://bipcluster/user/master/carbon/store/temp/hive_carbon/Metadata/schema
> java.io.IOException: java.io.IOException: File does not exist: 
> hdfs://bipcluster/user/master/carbon/store/temp/yuhai_carbon/Metadata/schema
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:521)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428)
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2187)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:252)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: java.io.IOException: File does not exist: 
> hdfs://bipcluster/user/master/carbon/store/temp/hive_carbon/Metadata/schema
> at 
> org.apache.carbondata.hadoop.util.SchemaReader.readCarbonTableFromStore(SchemaReader.java:70)
> at 
> org.apache.carbondata.hadoop.CarbonInputFormat.populateCarbonTable(CarbonInputFormat.java:147)
> at 
> org.apache.carbondata.hadoop.CarbonInputFormat.getCarbonTable(CarbonInputFormat.java:124)
> at 
> org.apache.carbondata.hadoop.CarbonInputFormat.getAbsoluteTableIdentifier(CarbonInputFormat.java:221)
> at 
> org.apache.carbondata.hadoop.CarbonInputFormat.getSplits(CarbonInputFormat.java:234)
> at 
> org.apache.carbondata.hive.MapredCarbonInputFormat.getSplits(MapredCarbonInputFormat.java:51)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:372)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:304)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459)
> ... 15 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1343) Hive can't query data when the carbon table info is stored in hive metastore

2017-07-30 Thread cen yuhai (JIRA)
cen yuhai created CARBONDATA-1343:
-

 Summary: Hive can't query data when the carbon table info is stored 
in hive metastore
 Key: CARBONDATA-1343
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1343
 Project: CarbonData
  Issue Type: Bug
Reporter: cen yuhai


{code}
set spark.carbon.hive.schema.store=true in spark-defaults.conf
spark-shell --jars 
carbonlib/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.2.jar,carbonlib/carbondata-hive-1.2.0-SNAPSHOT.jar
import org.apache.spark.sql.SparkSession 
import org.apache.spark.sql.CarbonSession._ 
val rootPath = "hdfs://mycluster/user/master/carbon" 
val storeLocation = s"$rootPath/store" 
val warehouse = s"$rootPath/warehouse" 
val metastoredb = s"$rootPath/metastore_db" 

val carbon 
=SparkSession.builder().enableHiveSupport().getOrCreateCarbonSession(storeLocation,
 metastoredb) 
carbon.sql("create table temp.hive_carbon(id short, name string, scale decimal, 
country string, salary double) STORED BY 'carbondata'") 
carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/master/sample.csv INTO 
TABLE temp.hive_carbon") 

start hive cli
set hive.mapred.supports.subdirectories=true;
set mapreduce.input.fileinputformat.input.dir.recursive=true;

select * from temp.hive_carbon;
{code}

{code}
17/07/30 19:33:07 ERROR [CliDriver(1097) -- 
53ea0b98-bcf0-4b86-a167-58ce570df284 main]: Failed with exception 
java.io.IOException:java.io.IOException: File does not exist: 
hdfs://bipcluster/user/master/carbon/store/temp/hive_carbon/Metadata/schema
java.io.IOException: java.io.IOException: File does not exist: 
hdfs://bipcluster/user/master/carbon/store/temp/yuhai_carbon/Metadata/schema
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:521)
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2187)
at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:252)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: File does not exist: 
hdfs://bipcluster/user/master/carbon/store/temp/hive_carbon/Metadata/schema
at 
org.apache.carbondata.hadoop.util.SchemaReader.readCarbonTableFromStore(SchemaReader.java:70)
at 
org.apache.carbondata.hadoop.CarbonInputFormat.populateCarbonTable(CarbonInputFormat.java:147)
at 
org.apache.carbondata.hadoop.CarbonInputFormat.getCarbonTable(CarbonInputFormat.java:124)
at 
org.apache.carbondata.hadoop.CarbonInputFormat.getAbsoluteTableIdentifier(CarbonInputFormat.java:221)
at 
org.apache.carbondata.hadoop.CarbonInputFormat.getSplits(CarbonInputFormat.java:234)
at 
org.apache.carbondata.hive.MapredCarbonInputFormat.getSplits(MapredCarbonInputFormat.java:51)
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:372)
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:304)
at 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459)
... 15 more
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (CARBONDATA-1343) Hive can't query data when the carbon table info is stored in hive metastore

2017-07-30 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai reassigned CARBONDATA-1343:
-

Assignee: cen yuhai

> Hive can't query data when the carbon table info is stored in hive metastore
> ---
>
> Key: CARBONDATA-1343
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1343
> Project: CarbonData
>  Issue Type: Bug
>Reporter: cen yuhai
>Assignee: cen yuhai
>
> {code}
> set spark.carbon.hive.schema.store=true in spark-defaults.conf
> spark-shell --jars 
> carbonlib/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.2.jar,carbonlib/carbondata-hive-1.2.0-SNAPSHOT.jar
> import org.apache.spark.sql.SparkSession 
> import org.apache.spark.sql.CarbonSession._ 
> val rootPath = "hdfs://mycluster/user/master/carbon" 
> val storeLocation = s"$rootPath/store" 
> val warehouse = s"$rootPath/warehouse" 
> val metastoredb = s"$rootPath/metastore_db" 
> val carbon 
> =SparkSession.builder().enableHiveSupport().getOrCreateCarbonSession(storeLocation,
>  metastoredb) 
> carbon.sql("create table temp.hive_carbon(id short, name string, scale 
> decimal, country string, salary double) STORED BY 'carbondata'") 
> carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/master/sample.csv 
> INTO TABLE temp.hive_carbon") 
> start hive cli
> set hive.mapred.supports.subdirectories=true;
> set mapreduce.input.fileinputformat.input.dir.recursive=true;
> select * from temp.hive_carbon;
> {code}
> {code}
> 17/07/30 19:33:07 ERROR [CliDriver(1097) -- 
> 53ea0b98-bcf0-4b86-a167-58ce570df284 main]: Failed with exception 
> java.io.IOException:java.io.IOException: File does not exist: 
> hdfs://bipcluster/user/master/carbon/store/temp/hive_carbon/Metadata/schema
> java.io.IOException: java.io.IOException: File does not exist: 
> hdfs://bipcluster/user/master/carbon/store/temp/yuhai_carbon/Metadata/schema
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:521)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428)
> at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
> at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2187)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:252)
> at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
> at 
> org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
> at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
> at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
> at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> Caused by: java.io.IOException: File does not exist: 
> hdfs://bipcluster/user/master/carbon/store/temp/hive_carbon/Metadata/schema
> at 
> org.apache.carbondata.hadoop.util.SchemaReader.readCarbonTableFromStore(SchemaReader.java:70)
> at 
> org.apache.carbondata.hadoop.CarbonInputFormat.populateCarbonTable(CarbonInputFormat.java:147)
> at 
> org.apache.carbondata.hadoop.CarbonInputFormat.getCarbonTable(CarbonInputFormat.java:124)
> at 
> org.apache.carbondata.hadoop.CarbonInputFormat.getAbsoluteTableIdentifier(CarbonInputFormat.java:221)
> at 
> org.apache.carbondata.hadoop.CarbonInputFormat.getSplits(CarbonInputFormat.java:234)
> at 
> org.apache.carbondata.hive.MapredCarbonInputFormat.getSplits(MapredCarbonInputFormat.java:51)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:372)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:304)
> at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459)
> ... 15 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (CARBONDATA-1338) Spark can not query data when 'spark.carbon.hive.schema.store' is true

2017-07-30 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai updated CARBONDATA-1338:
--
Summary: Spark can not query data when 'spark.carbon.hive.schema.store' is 
true  (was: Can not query data when 'spark.carbon.hive.schema.store' is true)

> Spark can not query data when 'spark.carbon.hive.schema.store' is true
> --
>
> Key: CARBONDATA-1338
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1338
> Project: CarbonData
>  Issue Type: Bug
>Reporter: cen yuhai
>Assignee: cen yuhai
> Fix For: 1.2.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> My steps are as below: 
> {code} 
> set spark.carbon.hive.schema.store=true in spark-defaults.conf
> spark-shell --jars 
> carbonlib/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.2.jar,carbonlib/carbondata-hive-1.2.0-SNAPSHOT.jar
> import org.apache.spark.sql.SparkSession 
> import org.apache.spark.sql.CarbonSession._ 
> val rootPath = "hdfs://mycluster/user/master/carbon" 
> val storeLocation = s"$rootPath/store" 
> val warehouse = s"$rootPath/warehouse" 
> val metastoredb = s"$rootPath/metastore_db" 
> val carbon 
> =SparkSession.builder().enableHiveSupport().getOrCreateCarbonSession(storeLocation,
>  metastoredb) 
> carbon.sql("create table temp.yuhai_carbon(id short, name string, scale 
> decimal, country string, salary double) STORED BY 'carbondata'") 
> carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/master/sample.csv 
> INTO TABLE temp.yuhai_carbon") 
> carbon.sql("select * from temp.yuhai_carbon").show 
> {code} 
> Exception: 
> {code} 
> Caused by: java.io.IOException: File does not exist: 
> hdfs://mycluster/user/master/carbon/store/temp/yuhai_carbon/Metadata/schema 
>   at 
> org.apache.carbondata.hadoop.util.SchemaReader.readCarbonTableFromStore(SchemaReader.java:70)
>  
>   at 
> org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getOrCreateCarbonTable(CarbonTableInputFormat.java:142)
>  
>   at 
> org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getQueryModel(CarbonTableInputFormat.java:441)
>  
>   at 
> org.apache.carbondata.spark.rdd.CarbonScanRDD.internalCompute(CarbonScanRDD.scala:191)
>  
>   at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:50) 
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) 
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) 
>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) 
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) 
>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) 
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) 
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88) 
>   at org.apache.spark.scheduler.Task.run(Task.scala:104) 
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351) 
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  
>   at java.lang.Thread.run(Thread.java:745) 
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (CARBONDATA-1338) Spark can not query data when 'spark.carbon.hive.schema.store' is true

2017-07-30 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai resolved CARBONDATA-1338.
---
   Resolution: Fixed
Fix Version/s: 1.2.0

> Spark can not query data when 'spark.carbon.hive.schema.store' is true
> --
>
> Key: CARBONDATA-1338
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1338
> Project: CarbonData
>  Issue Type: Bug
>Reporter: cen yuhai
>Assignee: cen yuhai
> Fix For: 1.2.0
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> My steps are as below: 
> {code} 
> set spark.carbon.hive.schema.store=true in spark-defaults.conf
> spark-shell --jars 
> carbonlib/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.2.jar,carbonlib/carbondata-hive-1.2.0-SNAPSHOT.jar
> import org.apache.spark.sql.SparkSession 
> import org.apache.spark.sql.CarbonSession._ 
> val rootPath = "hdfs://mycluster/user/master/carbon" 
> val storeLocation = s"$rootPath/store" 
> val warehouse = s"$rootPath/warehouse" 
> val metastoredb = s"$rootPath/metastore_db" 
> val carbon 
> =SparkSession.builder().enableHiveSupport().getOrCreateCarbonSession(storeLocation,
>  metastoredb) 
> carbon.sql("create table temp.yuhai_carbon(id short, name string, scale 
> decimal, country string, salary double) STORED BY 'carbondata'") 
> carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/master/sample.csv 
> INTO TABLE temp.yuhai_carbon") 
> carbon.sql("select * from temp.yuhai_carbon").show 
> {code} 
> Exception: 
> {code} 
> Caused by: java.io.IOException: File does not exist: 
> hdfs://mycluster/user/master/carbon/store/temp/yuhai_carbon/Metadata/schema 
>   at 
> org.apache.carbondata.hadoop.util.SchemaReader.readCarbonTableFromStore(SchemaReader.java:70)
>  
>   at 
> org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getOrCreateCarbonTable(CarbonTableInputFormat.java:142)
>  
>   at 
> org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getQueryModel(CarbonTableInputFormat.java:441)
>  
>   at 
> org.apache.carbondata.spark.rdd.CarbonScanRDD.internalCompute(CarbonScanRDD.scala:191)
>  
>   at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:50) 
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) 
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) 
>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) 
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) 
>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) 
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) 
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88) 
>   at org.apache.spark.scheduler.Task.run(Task.scala:104) 
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351) 
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  
>   at java.lang.Thread.run(Thread.java:745) 
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (CARBONDATA-1338) Can not query data when 'spark.carbon.hive.schema.store' is true

2017-07-29 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai updated CARBONDATA-1338:
--
Description: 
My steps are as below: 
{code} 
set spark.carbon.hive.schema.store=true in spark-defaults.conf
spark-shell --jars 
carbonlib/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.2.jar,carbonlib/carbondata-hive-1.2.0-SNAPSHOT.jar
import org.apache.spark.sql.SparkSession 
import org.apache.spark.sql.CarbonSession._ 
val rootPath = "hdfs://mycluster/user/master/carbon" 
val storeLocation = s"$rootPath/store" 
val warehouse = s"$rootPath/warehouse" 
val metastoredb = s"$rootPath/metastore_db" 

val carbon 
=SparkSession.builder().enableHiveSupport().getOrCreateCarbonSession(storeLocation,
 metastoredb) 
carbon.sql("create table temp.yuhai_carbon(id short, name string, scale 
decimal, country string, salary double) STORED BY 'carbondata'") 
carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/master/sample.csv INTO 
TABLE temp.yuhai_carbon") 
carbon.sql("select * from temp.yuhai_carbon").show 
{code} 
Exception: 
{code} 
Caused by: java.io.IOException: File does not exist: hdfs://mycluster/user/master/carbon/store/temp/yuhai_carbon/Metadata/schema
  at org.apache.carbondata.hadoop.util.SchemaReader.readCarbonTableFromStore(SchemaReader.java:70)
  at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getOrCreateCarbonTable(CarbonTableInputFormat.java:142)
  at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getQueryModel(CarbonTableInputFormat.java:441)
  at org.apache.carbondata.spark.rdd.CarbonScanRDD.internalCompute(CarbonScanRDD.scala:191)
  at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:50)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88)
  at org.apache.spark.scheduler.Task.run(Task.scala:104)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
{code}

  was:
My steps are as below:
{code}
set spark.carbon.hive.schema.store=true in carbon.properties
spark-shell --jars carbonlib/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.2.jar,carbonlib/carbondata-hive-1.2.0-SNAPSHOT.jar --files carbon.properties

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._
val rootPath = "hdfs://mycluster/user/master/carbon"
val storeLocation = s"$rootPath/store"
val warehouse = s"$rootPath/warehouse"
val metastoredb = s"$rootPath/metastore_db"

val carbon = SparkSession.builder().enableHiveSupport().getOrCreateCarbonSession(storeLocation, metastoredb)
carbon.sql("create table temp.yuhai_carbon(id short, name string, scale decimal, country string, salary double) STORED BY 'carbondata'")
carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/master/sample.csv' INTO TABLE temp.yuhai_carbon")
carbon.sql("select * from temp.yuhai_carbon").show
{code}
Exception:
{code}
Caused by: java.io.IOException: File does not exist: hdfs://mycluster/user/master/carbon/store/temp/yuhai_carbon/Metadata/schema
  at org.apache.carbondata.hadoop.util.SchemaReader.readCarbonTableFromStore(SchemaReader.java:70)
  at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getOrCreateCarbonTable(CarbonTableInputFormat.java:142)
  at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getQueryModel(CarbonTableInputFormat.java:441)
  at org.apache.carbondata.spark.rdd.CarbonScanRDD.internalCompute(CarbonScanRDD.scala:191)
  at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:50)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88)
  at org.apache.spark.scheduler.Task.run(Task.scala:104)

[jira] [Commented] (CARBONDATA-1338) Can not query data when 'spark.carbon.hive.schema.store' is true

2017-07-29 Thread cen yuhai (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106044#comment-16106044
 ] 

cen yuhai commented on CARBONDATA-1338:
---

I am working on it

> Can not query data when 'spark.carbon.hive.schema.store' is true
> 
>
> Key: CARBONDATA-1338
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1338
> Project: CarbonData
>  Issue Type: Bug
>Reporter: cen yuhai
>Assignee: cen yuhai
>
> My steps are as below:
> {code}
> set spark.carbon.hive.schema.store=true in carbon.properties
> spark-shell --jars carbonlib/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.2.jar,carbonlib/carbondata-hive-1.2.0-SNAPSHOT.jar --files carbon.properties
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.CarbonSession._
> val rootPath = "hdfs://mycluster/user/master/carbon"
> val storeLocation = s"$rootPath/store"
> val warehouse = s"$rootPath/warehouse"
> val metastoredb = s"$rootPath/metastore_db"
> val carbon = SparkSession.builder().enableHiveSupport().getOrCreateCarbonSession(storeLocation, metastoredb)
> carbon.sql("create table temp.yuhai_carbon(id short, name string, scale decimal, country string, salary double) STORED BY 'carbondata'")
> carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/master/sample.csv' INTO TABLE temp.yuhai_carbon")
> carbon.sql("select * from temp.yuhai_carbon").show
> {code}
> Exception:
> {code}
> Caused by: java.io.IOException: File does not exist: hdfs://mycluster/user/master/carbon/store/temp/yuhai_carbon/Metadata/schema
>   at org.apache.carbondata.hadoop.util.SchemaReader.readCarbonTableFromStore(SchemaReader.java:70)
>   at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getOrCreateCarbonTable(CarbonTableInputFormat.java:142)
>   at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getQueryModel(CarbonTableInputFormat.java:441)
>   at org.apache.carbondata.spark.rdd.CarbonScanRDD.internalCompute(CarbonScanRDD.scala:191)
>   at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:50)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88)
>   at org.apache.spark.scheduler.Task.run(Task.scala:104)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
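
A quick way to narrow this down is to check whether the Metadata/schema file from the exception exists at all. A minimal spark-shell sketch using the Hadoop FileSystem API (the path is copied from the stack trace; this is a diagnostic aid, not a fix):
{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// path copied verbatim from the IOException above
val schemaPath = new Path("hdfs://mycluster/user/master/carbon/store/temp/yuhai_carbon/Metadata/schema")
val fs = FileSystem.get(schemaPath.toUri, new Configuration())
// tells apart "schema file was never written" from "reader computed the wrong path"
println(s"schema exists: ${fs.exists(schemaPath)}")
{code}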



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (CARBONDATA-1338) Can not query data when 'spark.carbon.hive.schema.store' is true

2017-07-29 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai reassigned CARBONDATA-1338:
-

Assignee: cen yuhai

> Can not query data when 'spark.carbon.hive.schema.store' is true
> 
>
> Key: CARBONDATA-1338
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1338
> Project: CarbonData
>  Issue Type: Bug
>Reporter: cen yuhai
>Assignee: cen yuhai
>
> My steps are as below:
> {code}
> set spark.carbon.hive.schema.store=true in carbon.properties
> spark-shell --jars carbonlib/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.2.jar,carbonlib/carbondata-hive-1.2.0-SNAPSHOT.jar --files carbon.properties
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.CarbonSession._
> val rootPath = "hdfs://mycluster/user/master/carbon"
> val storeLocation = s"$rootPath/store"
> val warehouse = s"$rootPath/warehouse"
> val metastoredb = s"$rootPath/metastore_db"
> val carbon = SparkSession.builder().enableHiveSupport().getOrCreateCarbonSession(storeLocation, metastoredb)
> carbon.sql("create table temp.yuhai_carbon(id short, name string, scale decimal, country string, salary double) STORED BY 'carbondata'")
> carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/master/sample.csv' INTO TABLE temp.yuhai_carbon")
> carbon.sql("select * from temp.yuhai_carbon").show
> {code}
> Exception:
> {code}
> Caused by: java.io.IOException: File does not exist: hdfs://mycluster/user/master/carbon/store/temp/yuhai_carbon/Metadata/schema
>   at org.apache.carbondata.hadoop.util.SchemaReader.readCarbonTableFromStore(SchemaReader.java:70)
>   at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getOrCreateCarbonTable(CarbonTableInputFormat.java:142)
>   at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getQueryModel(CarbonTableInputFormat.java:441)
>   at org.apache.carbondata.spark.rdd.CarbonScanRDD.internalCompute(CarbonScanRDD.scala:191)
>   at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:50)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88)
>   at org.apache.spark.scheduler.Task.run(Task.scala:104)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (CARBONDATA-1338) Can not query data when 'spark.carbon.hive.schema.store' is true

2017-07-29 Thread cen yuhai (JIRA)
cen yuhai created CARBONDATA-1338:
-

 Summary: Can not query data when 'spark.carbon.hive.schema.store' 
is true
 Key: CARBONDATA-1338
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1338
 Project: CarbonData
  Issue Type: Bug
Reporter: cen yuhai






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (CARBONDATA-1338) Can not query data when 'spark.carbon.hive.schema.store' is true

2017-07-29 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai updated CARBONDATA-1338:
--
Docs Text:   (was: My steps are as below:
{code}
set spark.carbon.hive.schema.store=true in carbon.properties
spark-shell --jars carbonlib/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.2.jar,carbonlib/carbondata-hive-1.2.0-SNAPSHOT.jar --files carbon.properties

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._
val rootPath = "hdfs://mycluster/user/master/carbon"
val storeLocation = s"$rootPath/store"
val warehouse = s"$rootPath/warehouse"
val metastoredb = s"$rootPath/metastore_db"

val carbon = SparkSession.builder().enableHiveSupport().getOrCreateCarbonSession(storeLocation, metastoredb)
carbon.sql("create table temp.yuhai_carbon(id short, name string, scale decimal, country string, salary double) STORED BY 'carbondata'")
carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/master/sample.csv' INTO TABLE temp.yuhai_carbon")
carbon.sql("select * from temp.yuhai_carbon").show
{code}
Exception:
{code}
Caused by: java.io.IOException: File does not exist: hdfs://mycluster/user/master/carbon/store/temp/yuhai_carbon/Metadata/schema
  at org.apache.carbondata.hadoop.util.SchemaReader.readCarbonTableFromStore(SchemaReader.java:70)
  at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getOrCreateCarbonTable(CarbonTableInputFormat.java:142)
  at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getQueryModel(CarbonTableInputFormat.java:441)
  at org.apache.carbondata.spark.rdd.CarbonScanRDD.internalCompute(CarbonScanRDD.scala:191)
  at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:50)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88)
  at org.apache.spark.scheduler.Task.run(Task.scala:104)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
{code})

> Can not query data when 'spark.carbon.hive.schema.store' is true
> 
>
> Key: CARBONDATA-1338
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1338
> Project: CarbonData
>  Issue Type: Bug
>Reporter: cen yuhai
>
> My steps are as below:
> {code}
> set spark.carbon.hive.schema.store=true in carbon.properties
> spark-shell --jars carbonlib/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.2.jar,carbonlib/carbondata-hive-1.2.0-SNAPSHOT.jar --files carbon.properties
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.CarbonSession._
> val rootPath = "hdfs://mycluster/user/master/carbon"
> val storeLocation = s"$rootPath/store"
> val warehouse = s"$rootPath/warehouse"
> val metastoredb = s"$rootPath/metastore_db"
> val carbon = SparkSession.builder().enableHiveSupport().getOrCreateCarbonSession(storeLocation, metastoredb)
> carbon.sql("create table temp.yuhai_carbon(id short, name string, scale decimal, country string, salary double) STORED BY 'carbondata'")
> carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/master/sample.csv' INTO TABLE temp.yuhai_carbon")
> carbon.sql("select * from temp.yuhai_carbon").show
> {code}
> Exception:
> {code}
> Caused by: java.io.IOException: File does not exist: hdfs://mycluster/user/master/carbon/store/temp/yuhai_carbon/Metadata/schema
>   at org.apache.carbondata.hadoop.util.SchemaReader.readCarbonTableFromStore(SchemaReader.java:70)
>   at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getOrCreateCarbonTable(CarbonTableInputFormat.java:142)
>   at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getQueryModel(CarbonTableInputFormat.java:441)
>   at org.apache.carbondata.spark.rdd.CarbonScanRDD.internalCompute(CarbonScanRDD.scala:191)
>   at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:50)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)

[jira] [Updated] (CARBONDATA-1338) Can not query data when 'spark.carbon.hive.schema.store' is true

2017-07-29 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai updated CARBONDATA-1338:
--
Description: 
My steps are as below:
{code}
set spark.carbon.hive.schema.store=true in carbon.properties
spark-shell --jars carbonlib/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.2.jar,carbonlib/carbondata-hive-1.2.0-SNAPSHOT.jar --files carbon.properties

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._
val rootPath = "hdfs://mycluster/user/master/carbon"
val storeLocation = s"$rootPath/store"
val warehouse = s"$rootPath/warehouse"
val metastoredb = s"$rootPath/metastore_db"

val carbon = SparkSession.builder().enableHiveSupport().getOrCreateCarbonSession(storeLocation, metastoredb)
carbon.sql("create table temp.yuhai_carbon(id short, name string, scale decimal, country string, salary double) STORED BY 'carbondata'")
carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/master/sample.csv' INTO TABLE temp.yuhai_carbon")
carbon.sql("select * from temp.yuhai_carbon").show
{code}
Exception: 
{code} 
Caused by: java.io.IOException: File does not exist: hdfs://mycluster/user/master/carbon/store/temp/yuhai_carbon/Metadata/schema
  at org.apache.carbondata.hadoop.util.SchemaReader.readCarbonTableFromStore(SchemaReader.java:70)
  at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getOrCreateCarbonTable(CarbonTableInputFormat.java:142)
  at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getQueryModel(CarbonTableInputFormat.java:441)
  at org.apache.carbondata.spark.rdd.CarbonScanRDD.internalCompute(CarbonScanRDD.scala:191)
  at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:50)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88)
  at org.apache.spark.scheduler.Task.run(Task.scala:104)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
{code}

> Can not query data when 'spark.carbon.hive.schema.store' is true
> 
>
> Key: CARBONDATA-1338
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1338
> Project: CarbonData
>  Issue Type: Bug
>Reporter: cen yuhai
>
> My steps are as below:
> {code}
> set spark.carbon.hive.schema.store=true in carbon.properties
> spark-shell --jars carbonlib/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.2.jar,carbonlib/carbondata-hive-1.2.0-SNAPSHOT.jar --files carbon.properties
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.CarbonSession._
> val rootPath = "hdfs://mycluster/user/master/carbon"
> val storeLocation = s"$rootPath/store"
> val warehouse = s"$rootPath/warehouse"
> val metastoredb = s"$rootPath/metastore_db"
> val carbon = SparkSession.builder().enableHiveSupport().getOrCreateCarbonSession(storeLocation, metastoredb)
> carbon.sql("create table temp.yuhai_carbon(id short, name string, scale decimal, country string, salary double) STORED BY 'carbondata'")
> carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/master/sample.csv' INTO TABLE temp.yuhai_carbon")
> carbon.sql("select * from temp.yuhai_carbon").show
> {code}
> Exception:
> {code}
> Caused by: java.io.IOException: File does not exist: hdfs://mycluster/user/master/carbon/store/temp/yuhai_carbon/Metadata/schema
>   at org.apache.carbondata.hadoop.util.SchemaReader.readCarbonTableFromStore(SchemaReader.java:70)
>   at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getOrCreateCarbonTable(CarbonTableInputFormat.java:142)
>   at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getQueryModel(CarbonTableInputFormat.java:441)
>   at org.apache.carbondata.spark.rdd.CarbonScanRDD.internalCompute(CarbonScanRDD.scala:191)
>   at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:50)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
>   at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)

[jira] [Closed] (CARBONDATA-1031) spark-sql can't read the carbon table

2017-07-14 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai closed CARBONDATA-1031.
-
Resolution: Cannot Reproduce

> spark-sql can't read the carbon table
> -
>
> Key: CARBONDATA-1031
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1031
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 1.1.0
>Reporter: cen yuhai
>Assignee: anubhav tarar
>
> I create a carbon table with spark-shell.
> And then I use the command "spark-sql --jars carbon*.jar" to start the spark-sql CLI.
> The first time I execute "select * from temp.test_schema", Spark throws the exception below. After I execute another command, it is OK.
> {code}
> 17/05/06 21:43:12 ERROR [org.apache.spark.sql.hive.thriftserver.SparkSQLDriver(91) -- main]: Failed in [select * from temp.test_schema]
> java.lang.AssertionError: assertion failed: No plan for Relation[id#10,name#11,scale#12,country#13,salary#14] CarbonDatasourceHadoopRelation(org.apache.spark.sql.SparkSession@42d9ea3b,[Ljava.lang.String;@70a0e9c6,Map(path -> hdfs:user/hadoop/carbon/store/temp/test_schema, serialization.format -> 1, dbname -> temp, tablepath -> hdfs:user/hadoop/carbon/store/temp/test_schema, tablename -> test_schema),None,ArrayBuffer())
> at scala.Predef$.assert(Predef.scala:170)
> at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:92)
> at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:77)
> at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:74)
> at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
> at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
> at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
> at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336)
> at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:74)
> at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:66)
> at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:92)
> at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:84)
> at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:80)
> at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:89)
> at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:89)
> at org.apache.spark.sql.execution.QueryExecution.hiveResultString(QueryExecution.scala:119)
> at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:67)
> at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:335)
> at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
> at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:247)
> at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:742)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:186)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:211)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> {code}
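
For context on the "No plan for" assertion: Spark's planner throws it when no strategy matches a logical relation, which is consistent with Carbon's planning strategies not being registered in the plain spark-sql session (a CarbonSession normally installs them). A sketch of the Spark 2.1 extension point involved, with a stand-in strategy rather than Carbon's real one:
{code}
import org.apache.spark.sql.{SparkSession, Strategy}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.SparkPlan

// a no-op stand-in; Carbon's strategy would match CarbonDatasourceHadoopRelation here
object DemoStrategy extends Strategy {
  def apply(plan: LogicalPlan): Seq[SparkPlan] = Nil
}

val spark = SparkSession.builder().getOrCreate()
spark.experimental.extraStrategies = Seq(DemoStrategy)
{code}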



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (CARBONDATA-1153) Can not add column

2017-07-09 Thread cen yuhai (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079624#comment-16079624
 ] 

cen yuhai commented on CARBONDATA-1153:
---

I found the root cause: I don't have a carbon.properties file.
{code}
17/07/09 19:08:13 ERROR HdfsFileLock: Executor task launch worker for task 7 Incomplete HDFS URI, no host: hdfs://mycluster../carbon.store/temp/yuhai_carbon/d91e35aa-5f13-499c-adcb-94fc20dcf8fb.lock
java.io.IOException: Incomplete HDFS URI, no host: hdfs://mycluster../carbon.store/temp/yuhai_carbon/d91e35aa-5f13-499c-adcb-94fc20dcf8fb.lock
{code}
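
Since the bad lock path above is fs.defaultFS plus a relative "../carbon.store" default, a carbon.properties that pins the store to an absolute HDFS path should avoid it. A minimal sketch (property names as in the CarbonData 1.x configuration docs; the path is an assumption based on this thread):
{code}
# absolute store location so lock files resolve to a valid HDFS URI
carbon.storelocation=hdfs://mycluster/user/master/carbon/store
# take HDFS-based locks for concurrent operations
carbon.lock.type=HDFSLOCK
{code}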


> Can not add column
> --
>
> Key: CARBONDATA-1153
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1153
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 1.2.0
>Reporter: cen yuhai
>
> Sometimes it throws the exception below. Why can't I add a column? No one is altering the table...
> {code}
> scala> carbon.sql("alter table temp.yuhai_carbon add columns(test1 string)")
> 17/06/11 22:09:13 AUDIT [org.apache.spark.sql.execution.command.AlterTableAddColumns(207) -- main]: [sh-hadoop-datanode-250-104.elenet.me][master][Thread-1]Alter table add columns request has been received for temp.yuhai_carbon
> 17/06/11 22:10:22 ERROR [org.apache.spark.scheduler.TaskSetManager(70) -- task-result-getter-3]: Task 0 in stage 0.0 failed 4 times; aborting job
> 17/06/11 22:10:22 ERROR [org.apache.spark.sql.execution.command.AlterTableAddColumns(141) -- main]: main Alter table add columns failed :Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, sh-hadoop-datanode-368.elenet.me, executor 7): java.lang.RuntimeException: Dictionary file test1 is locked for updation. Please try after some time
> at scala.sys.package$.error(package.scala:27)
> at org.apache.carbondata.spark.util.GlobalDictionaryUtil$.loadDefaultDictionaryValueForNewColumn(GlobalDictionaryUtil.scala:857)
> at org.apache.carbondata.spark.rdd.AlterTableAddColumnRDD$$anon$1.<init>(AlterTableAddColumnRDD.scala:83)
> at org.apache.carbondata.spark.rdd.AlterTableAddColumnRDD.compute(AlterTableAddColumnRDD.scala:68)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88)
> at org.apache.spark.scheduler.Task.run(Task.scala:104)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (CARBONDATA-1153) Can not add column

2017-06-11 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai updated CARBONDATA-1153:
--
Summary: Can not add column  (was: Can not add column because it is aborted)

> Can not add column
> --
>
> Key: CARBONDATA-1153
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1153
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 1.2.0
>Reporter: cen yuhai
>
> Sometimes it throws the exception below. Why can't I add a column? No one is altering the table...
> {code}
> scala> carbon.sql("alter table temp.yuhai_carbon add columns(test1 string)")
> 17/06/11 22:09:13 AUDIT [org.apache.spark.sql.execution.command.AlterTableAddColumns(207) -- main]: [sh-hadoop-datanode-250-104.elenet.me][master][Thread-1]Alter table add columns request has been received for temp.yuhai_carbon
> 17/06/11 22:10:22 ERROR [org.apache.spark.scheduler.TaskSetManager(70) -- task-result-getter-3]: Task 0 in stage 0.0 failed 4 times; aborting job
> 17/06/11 22:10:22 ERROR [org.apache.spark.sql.execution.command.AlterTableAddColumns(141) -- main]: main Alter table add columns failed :Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, sh-hadoop-datanode-368.elenet.me, executor 7): java.lang.RuntimeException: Dictionary file test1 is locked for updation. Please try after some time
> at scala.sys.package$.error(package.scala:27)
> at org.apache.carbondata.spark.util.GlobalDictionaryUtil$.loadDefaultDictionaryValueForNewColumn(GlobalDictionaryUtil.scala:857)
> at org.apache.carbondata.spark.rdd.AlterTableAddColumnRDD$$anon$1.<init>(AlterTableAddColumnRDD.scala:83)
> at org.apache.carbondata.spark.rdd.AlterTableAddColumnRDD.compute(AlterTableAddColumnRDD.scala:68)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88)
> at org.apache.spark.scheduler.Task.run(Task.scala:104)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (CARBONDATA-1153) Can not add column because it is aborted

2017-06-11 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai updated CARBONDATA-1153:
--
Description: 
Sometimes it throws the exception below. Why can't I add a column? No one is altering the table...
{code}
scala> carbon.sql("alter table temp.yuhai_carbon add columns(test1 string)")
17/06/11 22:09:13 AUDIT [org.apache.spark.sql.execution.command.AlterTableAddColumns(207) -- main]: [sh-hadoop-datanode-250-104.elenet.me][master][Thread-1]Alter table add columns request has been received for temp.yuhai_carbon
17/06/11 22:10:22 ERROR [org.apache.spark.scheduler.TaskSetManager(70) -- task-result-getter-3]: Task 0 in stage 0.0 failed 4 times; aborting job
17/06/11 22:10:22 ERROR [org.apache.spark.sql.execution.command.AlterTableAddColumns(141) -- main]: main Alter table add columns failed :Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, sh-hadoop-datanode-368.elenet.me, executor 7): java.lang.RuntimeException: Dictionary file test1 is locked for updation. Please try after some time
at scala.sys.package$.error(package.scala:27)
at org.apache.carbondata.spark.util.GlobalDictionaryUtil$.loadDefaultDictionaryValueForNewColumn(GlobalDictionaryUtil.scala:857)
at org.apache.carbondata.spark.rdd.AlterTableAddColumnRDD$$anon$1.<init>(AlterTableAddColumnRDD.scala:83)
at org.apache.carbondata.spark.rdd.AlterTableAddColumnRDD.compute(AlterTableAddColumnRDD.scala:68)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88)
at org.apache.spark.scheduler.Task.run(Task.scala:104)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{code}

  was:
Why can't I add a column? No one is altering the table...
{code}
scala> carbon.sql("alter table temp.yuhai_carbon add columns(test1 string)")
17/06/11 22:09:13 AUDIT [org.apache.spark.sql.execution.command.AlterTableAddColumns(207) -- main]: [sh-hadoop-datanode-250-104.elenet.me][master][Thread-1]Alter table add columns request has been received for temp.yuhai_carbon
17/06/11 22:10:22 ERROR [org.apache.spark.scheduler.TaskSetManager(70) -- task-result-getter-3]: Task 0 in stage 0.0 failed 4 times; aborting job
17/06/11 22:10:22 ERROR [org.apache.spark.sql.execution.command.AlterTableAddColumns(141) -- main]: main Alter table add columns failed :Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, sh-hadoop-datanode-368.elenet.me, executor 7): java.lang.RuntimeException: Dictionary file test1 is locked for updation. Please try after some time
at scala.sys.package$.error(package.scala:27)
at org.apache.carbondata.spark.util.GlobalDictionaryUtil$.loadDefaultDictionaryValueForNewColumn(GlobalDictionaryUtil.scala:857)
at org.apache.carbondata.spark.rdd.AlterTableAddColumnRDD$$anon$1.<init>(AlterTableAddColumnRDD.scala:83)
at org.apache.carbondata.spark.rdd.AlterTableAddColumnRDD.compute(AlterTableAddColumnRDD.scala:68)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88)
at org.apache.spark.scheduler.Task.run(Task.scala:104)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{code}


> Can not add column because it is aborted
> 
>
> Key: CARBONDATA-1153
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1153
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 1.2.0
>Reporter: cen yuhai
>
> Sometimes it throws the exception below. Why can't I add a column? No one is altering the table...
> {code}
> scala> carbon.sql("alter table temp.yuhai_carbon add columns(test1 string)")
> 17/06/11 22:09:13 AUDIT [org.apache.spark.sql.execution.command.AlterTableAddColumns(207) -- main]: [sh-hadoop-datanode-250-104.elenet.me][master][Thread-1]Alter table add columns request has been received for temp.yuhai_carbon
> 17/06/11 22:10:22 ERROR 

[jira] [Created] (CARBONDATA-1153) Can not add column because it is aborted

2017-06-11 Thread cen yuhai (JIRA)
cen yuhai created CARBONDATA-1153:
-

 Summary: Can not add column because it is aborted
 Key: CARBONDATA-1153
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1153
 Project: CarbonData
  Issue Type: Bug
  Components: spark-integration
Affects Versions: 1.2.0
Reporter: cen yuhai


Why can't I add a column? No one is altering the table...
{code}
scala> carbon.sql("alter table temp.yuhai_carbon add columns(test1 string)")
17/06/11 22:09:13 AUDIT [org.apache.spark.sql.execution.command.AlterTableAddColumns(207) -- main]: [sh-hadoop-datanode-250-104.elenet.me][master][Thread-1]Alter table add columns request has been received for temp.yuhai_carbon
17/06/11 22:10:22 ERROR [org.apache.spark.scheduler.TaskSetManager(70) -- task-result-getter-3]: Task 0 in stage 0.0 failed 4 times; aborting job
17/06/11 22:10:22 ERROR [org.apache.spark.sql.execution.command.AlterTableAddColumns(141) -- main]: main Alter table add columns failed :Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, sh-hadoop-datanode-368.elenet.me, executor 7): java.lang.RuntimeException: Dictionary file test1 is locked for updation. Please try after some time
at scala.sys.package$.error(package.scala:27)
at org.apache.carbondata.spark.util.GlobalDictionaryUtil$.loadDefaultDictionaryValueForNewColumn(GlobalDictionaryUtil.scala:857)
at org.apache.carbondata.spark.rdd.AlterTableAddColumnRDD$$anon$1.<init>(AlterTableAddColumnRDD.scala:83)
at org.apache.carbondata.spark.rdd.AlterTableAddColumnRDD.compute(AlterTableAddColumnRDD.scala:68)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88)
at org.apache.spark.scheduler.Task.run(Task.scala:104)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CARBONDATA-1105) ClassNotFoundException: org.apache.spark.sql.catalyst.CatalystConf

2017-06-11 Thread cen yuhai (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045876#comment-16045876
 ] 

cen yuhai commented on CARBONDATA-1105:
---

I think we should support Spark 2.1.1, right? Spark 2.1.0 is not stable.

> ClassNotFoundException: org.apache.spark.sql.catalyst.CatalystConf
> --
>
> Key: CARBONDATA-1105
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1105
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.2.0
> Environment: spark 2.1.1
>Reporter: cen yuhai
>
> I think it is related to SPARK-19944
> https://github.com/apache/spark/pull/17301
> {code}
> scala> carbon.sql("create table temp.test_carbon(id int, name string, scale decimal, country string, salary double) STORED BY 'carbondata'")
> java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/CatalystConf
>   at org.apache.spark.sql.hive.CarbonSessionState.analyzer$lzycompute(CarbonSessionState.scala:127)
>   at org.apache.spark.sql.hive.CarbonSessionState.analyzer(CarbonSessionState.scala:126)
>   at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:69)
>   at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:67)
>   at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:50)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:593)
>   ... 52 elided
> Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.catalyst.CatalystConf
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>   ... 59 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (CARBONDATA-1105) ClassNotFoundException: org.apache.spark.sql.catalyst.CatalystConf

2017-06-11 Thread cen yuhai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

cen yuhai reassigned CARBONDATA-1105:
-

Assignee: cen yuhai

> ClassNotFoundException: org.apache.spark.sql.catalyst.CatalystConf
> --
>
> Key: CARBONDATA-1105
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1105
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.2.0
> Environment: spark 2.1.1
>Reporter: cen yuhai
>Assignee: cen yuhai
>
> I think it is related to SPARK-19944
> https://github.com/apache/spark/pull/17301
> {code}
> scala> carbon.sql("create table temp.test_carbon(id int, name string, scale decimal, country string, salary double) STORED BY 'carbondata'")
> java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/CatalystConf
>   at org.apache.spark.sql.hive.CarbonSessionState.analyzer$lzycompute(CarbonSessionState.scala:127)
>   at org.apache.spark.sql.hive.CarbonSessionState.analyzer(CarbonSessionState.scala:126)
>   at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:69)
>   at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:67)
>   at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:50)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:593)
>   ... 52 elided
> Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.catalyst.CatalystConf
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>   ... 59 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CARBONDATA-1102) Selecting Int type in hive from carbon table is showing class cast exception

2017-05-30 Thread cen yuhai (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029567#comment-16029567
 ] 

cen yuhai commented on CARBONDATA-1102:
---

I will fix it in CARBON-1008

> Selecting Int type in hive from carbon table is showing class cast exception
> 
>
> Key: CARBONDATA-1102
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1102
> Project: CarbonData
>  Issue Type: Bug
>  Components: hive-integration
>Affects Versions: 1.2.0
> Environment: hive,spark 2.1
>Reporter: anubhav tarar
>Assignee: anubhav tarar
>Priority: Trivial
>
> in carbon
> 0: jdbc:hive2://localhost:1> CREATE TABLE ALLDATATYPETEST(ID INT,NAME STRING,SALARY DECIMAL,MARKS DOUBLE,JOININGDATE DATE,LEAVINGDATE TIMESTAMP) STORED BY 'CARBONDATA' ;
> +---------+--+
> | Result  |
> +---------+--+
> +---------+--+
> No rows selected (3.702 seconds)
> 0: jdbc:hive2://localhost:1> LOAD DATA INPATH 'hdfs://localhost:54310/alldatatypetest.csv' into table alldatatypetest;
> +---------+--+
> | Result  |
> +---------+--+
> +---------+--+
> No rows selected (7.16 seconds)
> 0: jdbc:hive2://localhost:1> SELECT * FROM ALLDATATYPETEST;
> +-----+------------+---------+--------+--------------+------------------------+--+
> | ID  |    NAME    | SALARY  | MARKS  | JOININGDATE  |      LEAVINGDATE       |
> +-----+------------+---------+--------+--------------+------------------------+--+
> | 1   | 'ANUBHAV'  | 20      | 100.0  | 2016-04-14   | 2016-04-14 15:00:09.0  |
> | 2   | 'LIANG'    | 20      | 100.0  | 2016-04-14   | 2016-04-14 15:00:09.0  |
> +-----+------------+---------+--------+--------------+------------------------+--+
> 2 rows selected (1.978 seconds)
> in hive
> hive> CREATE TABLE ALLDATATYPETEST(ID INT,NAME STRING,SALARY DECIMAL,MARKS DOUBLE,JOININGDATE DATE,LEAVINGDATE TIMESTAMP) ROW FORMAT SERDE 'org.apache.carbondata.hive.CarbonHiveSerDe' STORED AS INPUTFORMAT 'org.apache.carbondata.hive.MapredCarbonInputFormat' OUTPUTFORMAT 'org.apache.carbondata.hive.MapredCarbonOutputFormat' TBLPROPERTIES ('spark.sql.sources.provider'='org.apache.spark.sql.CarbonSource');
> OK
> Time taken: 1.934 seconds
> hive> ALTER TABLE ALLDATATYPETEST SET LOCATION 'hdfs://localhost:54310/opt/carbonStore/default/alldatatypetest';
> OK
> Time taken: 1.192 seconds
> hive> SELECT * FROM ALLDATATYPETEST;
> OK
> Failed with exception java.io.IOException:java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.Long
> Time taken: 0.174 seconds
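
The failure mode, distilled: the reader hands Hive a java.lang.Integer for the INT column while the consuming side casts it to java.lang.Long. A minimal Scala illustration of the same runtime error (an illustration only, not CarbonData's actual code):
{code}
// a boxed Integer can never be cast to java.lang.Long at runtime,
// which is exactly the ClassCastException Hive reports above
val value: Any = Integer.valueOf(1)              // what the reader produced
val asLong = value.asInstanceOf[java.lang.Long]  // what the consumer expected -> throws
{code}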



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CARBONDATA-1105) ClassNotFoundException: org.apache.spark.sql.catalyst.CatalystConf

2017-05-30 Thread cen yuhai (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16028810#comment-16028810
 ] 

cen yuhai commented on CARBONDATA-1105:
---

We should rebuild Carbon against Spark 2.1.1; see the command below.
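
A copy-paste version of the rebuild command (taken from this comment; -Pspark-2.1 is the Spark 2.x build profile):
{code}
mvn clean package -Dspark.version=2.1.1 -Pspark-2.1 -DskipTests
{code}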

> ClassNotFoundException: org.apache.spark.sql.catalyst.CatalystConf
> --
>
> Key: CARBONDATA-1105
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1105
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.2.0
> Environment: spark 2.1.1
>Reporter: cen yuhai
>
> I think it is related to SPARK-19944
> https://github.com/apache/spark/pull/17301
> {code}
> scala> carbon.sql("create table temp.test_carbon(id int, name string, scale decimal, country string, salary double) STORED BY 'carbondata'")
> java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/CatalystConf
>   at org.apache.spark.sql.hive.CarbonSessionState.analyzer$lzycompute(CarbonSessionState.scala:127)
>   at org.apache.spark.sql.hive.CarbonSessionState.analyzer(CarbonSessionState.scala:126)
>   at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:69)
>   at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:67)
>   at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:50)
>   at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:593)
>   ... 52 elided
> Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.catalyst.CatalystConf
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>   ... 59 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (CARBONDATA-1105) ClassNotFoundException: org.apache.spark.sql.catalyst.CatalystConf

2017-05-30 Thread cen yuhai (JIRA)
cen yuhai created CARBONDATA-1105:
-

 Summary: ClassNotFoundException: org.apache.spark.sql.catalyst.CatalystConf
 Key: CARBONDATA-1105
 URL: https://issues.apache.org/jira/browse/CARBONDATA-1105
 Project: CarbonData
  Issue Type: Bug
  Components: core
Affects Versions: 1.2.0
 Environment: spark 2.1.1
Reporter: cen yuhai


I think it is related to SPARK-19944
https://github.com/apache/spark/pull/17301
{code}
scala> carbon.sql("create table temp.test_carbon(id int, name string, scale decimal, country string, salary double) STORED BY 'carbondata'")
java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/CatalystConf
  at org.apache.spark.sql.hive.CarbonSessionState.analyzer$lzycompute(CarbonSessionState.scala:127)
  at org.apache.spark.sql.hive.CarbonSessionState.analyzer(CarbonSessionState.scala:126)
  at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:69)
  at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:67)
  at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:50)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:593)
  ... 52 elided
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.catalyst.CatalystConf
  at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
  at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
  ... 59 more
{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)