[jira] [Commented] (CARBONDATA-1657) Partition column is empty when insert from a hive table
[ https://issues.apache.org/jira/browse/CARBONDATA-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233951#comment-16233951 ] cen yuhai commented on CARBONDATA-1657: --- dt's datatype is string -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Comment Edited] (CARBONDATA-1657) Partition column is empty when insert from a hive table
[ https://issues.apache.org/jira/browse/CARBONDATA-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16233951#comment-16233951 ] cen yuhai edited comment on CARBONDATA-1657 at 11/1/17 11:36 AM: - the datatype of dt is string was (Author: cenyuhai): dt's datatype is string
[jira] [Commented] (CARBONDATA-1654) NullPointerException when insert overwrite table
[ https://issues.apache.org/jira/browse/CARBONDATA-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16226131#comment-16226131 ] cen yuhai commented on CARBONDATA-1654: --- I updated the description.
[jira] [Updated] (CARBONDATA-1654) NullPointerException when insert overwrite table
[ https://issues.apache.org/jira/browse/CARBONDATA-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cen yuhai updated CARBONDATA-1654: -- Description:
carbon.sql("insert overwrite table carbondata_table select * from hive_table where dt = '2017-10-10' ").collect
CarbonData wants to find the directory Segment_1, but only Segment_2 exists.
{code}
[Stage 0:> (0 + 504) / 504]17/10/28 19:11:28 WARN [org.glassfish.jersey.internal.Errors(191) -- SparkUI-174]: The following warnings have been detected: WARNING: The (sub)resource method stageData in org.apache.spark.status.api.v1.OneStageResource contains empty path annotation.
17/10/28 19:25:20 ERROR [org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile(141) -- main]: main Exception occurred:File does not exist: hdfs://bipcluster/user/master/carbon/store/dm_test/carbondata_table/Fact/Part0/Segment_1
17/10/28 19:25:22 ERROR [org.apache.spark.sql.execution.command.LoadTable(143) -- main]: main
java.lang.NullPointerException
    at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.isDirectory(AbstractDFSCarbonFile.java:88)
    at org.apache.carbondata.core.util.CarbonUtil.deleteRecursive(CarbonUtil.java:364)
    at org.apache.carbondata.core.util.CarbonUtil.access$100(CarbonUtil.java:93)
    at org.apache.carbondata.core.util.CarbonUtil$2.run(CarbonUtil.java:326)
    at org.apache.carbondata.core.util.CarbonUtil$2.run(CarbonUtil.java:322)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.carbondata.core.util.CarbonUtil.deleteFoldersAndFiles(CarbonUtil.java:322)
    at org.apache.carbondata.spark.load.CarbonLoaderUtil.recordLoadMetadata(CarbonLoaderUtil.java:331)
    at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.updateStatus$1(CarbonDataRDDFactory.scala:595) at
org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:1107)
    at org.apache.spark.sql.execution.command.LoadTable.processData(carbonTableSchema.scala:1046)
    at org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:754)
    at org.apache.spark.sql.execution.command.LoadTableByInsert.processData(carbonTableSchema.scala:651)
    at org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:637)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
    at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:180)
    at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:65)
    at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:619)
    at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:36)
    at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:41)
    at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:43)
    at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:45)
    at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:47)
    at $line23.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:49)
    at $line23.$read$$iw$$iw$$iw$$iw.<init>(<console>:51)
    at $line23.$read$$iw$$iw$$iw.<init>(<console>:53)
    at $line23.$read$$iw$$iw.<init>(<console>:55)
    at $line23.$read$$iw.<init>(<console>:57)
    at $line23.$read.<init>(<console>:59)
    at $line23.$read$.<init>(<console>:63)
    at $line23.$read$.<clinit>(<console>)
    at $line23.$eval$.$print$lzycompute(<console>:7)
    at $line23.$eval$.$print(<console>:6)
    at $line23.$eval.$print()
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
    at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
    at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
    at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
    at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
    at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
    at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
    at
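The NullPointerException above originates inside a recursive delete (`CarbonUtil.deleteRecursive`) that assumes the stale segment directory (Segment_1) still exists on HDFS. A minimal sketch of the missing guard; this is a hypothetical illustration in Python, not CarbonData's actual code or fix, and `delete_segment_dir` is an invented stand-in:

```python
import os
import shutil
import tempfile

# Hypothetical sketch: metadata may reference a segment directory that was
# already removed (here, Segment_1). Dereferencing its file status without
# an existence check is the pattern the stack trace points at; checking
# first turns the failure into a harmless no-op.

def delete_segment_dir(path):
    """Recursively delete a segment directory, tolerating its absence."""
    if not os.path.exists(path):  # guard: stale metadata may point at a gone segment
        return False
    shutil.rmtree(path)
    return True

store = tempfile.mkdtemp()
seg2 = os.path.join(store, "Segment_2")
os.mkdir(seg2)

# Segment_1 is referenced by load metadata but only Segment_2 exists on disk:
print(delete_segment_dir(os.path.join(store, "Segment_1")))  # False, no crash
print(delete_segment_dir(seg2))                              # True
```

The same shape applies to any filesystem API that returns a null/None status for missing paths: check, then act.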
[jira] [Created] (CARBONDATA-1657) Partition column is empty when insert from a hive table
cen yuhai created CARBONDATA-1657:
Summary: Partition column is empty when insert from a hive table
Key: CARBONDATA-1657
URL: https://issues.apache.org/jira/browse/CARBONDATA-1657
Project: CarbonData
Issue Type: Bug
Components: data-load
Affects Versions: 1.2.0
Environment: carbondata 1.2.0, spark 2.1.1
Reporter: cen yuhai
Priority: Critical

I created a carbon table whose schema matches a hive table (dt is the partition column), and then ran:
{code}
insert overwrite table dm_test.dm_trd_wide_carbondata select * from hive_table where dt='2017-10-10';
insert overwrite table dm_test.dm_trd_wide_parquet select * from hive_table where dt='2017-10-10';
{code}
{code}
spark-sql> select dt from dm_test.dm_trd_wide_parquet limit 10;
2017-10-10
2017-10-10
2017-10-10
2017-10-10
2017-10-10
2017-10-10
2017-10-10
2017-10-10
2017-10-10
2017-10-10
Time taken: 1.259 seconds, Fetched 10 row(s)
spark-sql> select dt from dm_test.dm_trd_wide_carbondata limit 10;
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL
NULL
{code}
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
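One plausible mechanism for the NULL `dt` values, offered purely as an assumption for illustration, not a confirmed diagnosis of this bug: `INSERT ... SELECT *` maps columns by position, and Hive emits the partition column (dt) as the trailing column, so a writer that drops or misplaces trailing columns silently loses it. A hypothetical toy model (`map_row_by_position` is invented, not CarbonData code):

```python
# Toy model: positional column mapping in "INSERT ... SELECT *".
# Columns missing from the source row (by position) become None,
# which is how a partition column could surface as NULL in queries.

def map_row_by_position(target_schema, values):
    """Map a list of values onto target columns strictly by position."""
    return {col: (values[i] if i < len(values) else None)
            for i, col in enumerate(target_schema)}

# Hive convention: the partition column dt comes last.
hive_row = [1001, 9.9, "2017-10-10"]

ok = map_row_by_position(["order_id", "amount", "dt"], hive_row)
broken = map_row_by_position(["order_id", "amount", "dt"], hive_row[:2])  # writer dropped the trailing partition value

print(ok["dt"])      # 2017-10-10
print(broken["dt"])  # None -> reads back as NULL
```

This is only one way the symptom could arise; the actual cause would need to be confirmed in the CarbonData load path.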
[jira] [Commented] (CARBONDATA-1654) NullPointerException when insert overwrite table
[ https://issues.apache.org/jira/browse/CARBONDATA-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16224453#comment-16224453 ] cen yuhai commented on CARBONDATA-1654: --- No, I can't. Why should I update the schema?
[jira] [Updated] (CARBONDATA-1655) getSplits function is very slow !!!
[ https://issues.apache.org/jira/browse/CARBONDATA-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cen yuhai updated CARBONDATA-1655: -- Description:
I have a table with 4 billion records, and I find that the getSplits function is too slow: getSplits spent 20s!
{code}
"main" #1 prio=5 os_prio=0 tid=0x7fcc94013000 nid=0x5ed5 runnable [0x7fcc992b6000]
java.lang.Thread.State: RUNNABLE
    at org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getSizeInBytes(UnsafeDataMapRow.java:155)
    at org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getPosition(UnsafeDataMapRow.java:170)
    at org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getLengthInBytes(UnsafeDataMapRow.java:61)
    at org.apache.carbondata.core.indexstore.row.DataMapRow.getSizeInBytes(DataMapRow.java:80)
    at org.apache.carbondata.core.indexstore.row.DataMapRow.getTotalSizeInBytes(DataMapRow.java:70)
    at org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getSizeInBytes(UnsafeDataMapRow.java:161)
    at org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getPosition(UnsafeDataMapRow.java:170)
    at org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getRow(UnsafeDataMapRow.java:89)
    at org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getSizeInBytes(UnsafeDataMapRow.java:161)
    at org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getPosition(UnsafeDataMapRow.java:170)
    at org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getByteArray(UnsafeDataMapRow.java:43)
    at org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMap.createBlocklet(BlockletDataMap.java:310)
    at org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMap.prune(BlockletDataMap.java:268)
    at org.apache.carbondata.core.datamap.TableDataMap.prune(TableDataMap.java:66)
    at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getDataBlocksOfSegment(CarbonTableInputFormat.java:524)
    at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:453)
    at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:324)
    at org.apache.carbondata.spark.rdd.CarbonScanRDD.getPartitions(CarbonScanRDD.scala:84)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:260)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:258)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:258)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:260)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:258)
    at scala.Option.getOrElse(Option.scala:121)
{code}
spark-sql> select dt from dm_test.table_carbondata limit 1;
NULL
Time taken: 20.94 seconds, Fetched 1 row(s)
If the query doesn't contain a sort column, prune should return quickly.

was: I have a table which has 4 billion records, I find that the getSplits function is too slow! getSplit spent 20s!!!
[jira] [Created] (CARBONDATA-1655) getSplits function is very slow !!!
cen yuhai created CARBONDATA-1655: - Summary: getSplits function is very slow !!! Key: CARBONDATA-1655 URL: https://issues.apache.org/jira/browse/CARBONDATA-1655 Project: CarbonData Issue Type: Bug Components: data-query Reporter: cen yuhai

I have a table with 4 billion records, and I find that the getSplits function is too slow: getSplits alone took 20 seconds!

{code}
"main" #1 prio=5 os_prio=0 tid=0x7fcc94013000 nid=0x5ed5 runnable [0x7fcc992b6000]
   java.lang.Thread.State: RUNNABLE
	at org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getSizeInBytes(UnsafeDataMapRow.java:155)
	at org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getPosition(UnsafeDataMapRow.java:170)
	at org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getLengthInBytes(UnsafeDataMapRow.java:61)
	at org.apache.carbondata.core.indexstore.row.DataMapRow.getSizeInBytes(DataMapRow.java:80)
	at org.apache.carbondata.core.indexstore.row.DataMapRow.getTotalSizeInBytes(DataMapRow.java:70)
	at org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getSizeInBytes(UnsafeDataMapRow.java:161)
	at org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getPosition(UnsafeDataMapRow.java:170)
	at org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getRow(UnsafeDataMapRow.java:89)
	at org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getSizeInBytes(UnsafeDataMapRow.java:161)
	at org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getPosition(UnsafeDataMapRow.java:170)
	at org.apache.carbondata.core.indexstore.row.UnsafeDataMapRow.getByteArray(UnsafeDataMapRow.java:43)
	at org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMap.createBlocklet(BlockletDataMap.java:310)
	at org.apache.carbondata.core.indexstore.blockletindex.BlockletDataMap.prune(BlockletDataMap.java:268)
	at org.apache.carbondata.core.datamap.TableDataMap.prune(TableDataMap.java:66)
	at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getDataBlocksOfSegment(CarbonTableInputFormat.java:524)
	at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:453)
	at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getSplits(CarbonTableInputFormat.java:324)
	at org.apache.carbondata.spark.rdd.CarbonScanRDD.getPartitions(CarbonScanRDD.scala:84)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:260)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:258)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:258)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:260)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:258)
	at scala.Option.getOrElse(Option.scala:121)
{code}

{code}
spark-sql> select dt from dm_test.dm_trd_order_wide_carbondata limit 1;
NULL
Time taken: 20.94 seconds, Fetched 1 row(s)
{code}
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
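The trace above loops through UnsafeDataMapRow.getSizeInBytes / getPosition, which suggests the size of each variable-length index row is recomputed by traversal on every access while pruning blocklets. A minimal sketch of that cost pattern and the obvious mitigation, caching the computed size; the class and method names here are illustrative, not CarbonData's actual code:

```java
// Hypothetical sketch: recomputing a variable-length row's size on every
// access walks all columns each time; caching it does the walk once.
public class VarLenRow {
    private final byte[][] columns;  // variable-length column values
    private int cachedSize = -1;     // -1 means "not computed yet"

    public VarLenRow(byte[][] columns) { this.columns = columns; }

    // Recomputed on every call: traverses every column.
    public int sizeRecomputed() {
        int size = 0;
        for (byte[] col : columns) {
            size += 4 + col.length;  // 4-byte length prefix + payload
        }
        return size;
    }

    // Computed once, then served from the cache.
    public int sizeCached() {
        if (cachedSize < 0) {
            cachedSize = sizeRecomputed();
        }
        return cachedSize;
    }

    public static void main(String[] args) {
        VarLenRow row = new VarLenRow(new byte[][]{new byte[3], new byte[5]});
        System.out.println(row.sizeRecomputed()); // 16 = (4+3) + (4+5)
        System.out.println(row.sizeCached());     // 16, traversal done once
    }
}
```

When pruning touches each row many times (as the repeated getSizeInBytes frames imply), memoizing the size turns a quadratic access pattern into a linear one.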
[jira] [Updated] (CARBONDATA-1654) NullPointerException when insert overwrite table
[ https://issues.apache.org/jira/browse/CARBONDATA-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cen yuhai updated CARBONDATA-1654: -- Summary: NullPointerException when insert overwrite table (was: NullPointerException when insert overwrite talbe ) > NullPointerException when insert overwrite table > > > Key: CARBONDATA-1654 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1654 > Project: CarbonData > Issue Type: Bug > Components: data-load >Affects Versions: 1.2.0 > Environment: spark 2.1.1 carbondata 1.2.0 >Reporter: cen yuhai >Priority: Critical > > carbon.sql("insert overwrite table carbondata_table select * from hive_table > where dt = '2017-10-10' ").collect > CarbonData wants to find directory Segment_1, but only Segment_2 exists > {code} > [Stage 0:> (0 + 504) / > 504]17/10/28 19:11:28 WARN [org.glassfish.jersey.internal.Errors(191) -- > SparkUI-174]: The following warnings have been detected: WARNING: The > (sub)resource method stageData in > org.apache.spark.status.api.v1.OneStageResource contains empty path > annotation. 
> 17/10/28 19:25:20 ERROR > [org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile(141) > -- main]: main Exception occurred:File does not exist: > hdfs://bipcluster/user/master/carbon/store/dm_test/dm_trd_order_wide_carbondata/Fact/Part0/Segment_1 > 17/10/28 19:25:22 ERROR > [org.apache.spark.sql.execution.command.LoadTable(143) -- main]: main > java.lang.NullPointerException > at > org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.isDirectory(AbstractDFSCarbonFile.java:88) > at > org.apache.carbondata.core.util.CarbonUtil.deleteRecursive(CarbonUtil.java:364) > at > org.apache.carbondata.core.util.CarbonUtil.access$100(CarbonUtil.java:93) > at > org.apache.carbondata.core.util.CarbonUtil$2.run(CarbonUtil.java:326) > at > org.apache.carbondata.core.util.CarbonUtil$2.run(CarbonUtil.java:322) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693) > at > org.apache.carbondata.core.util.CarbonUtil.deleteFoldersAndFiles(CarbonUtil.java:322) > at > org.apache.carbondata.spark.load.CarbonLoaderUtil.recordLoadMetadata(CarbonLoaderUtil.java:331) > at > org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.updateStatus$1(CarbonDataRDDFactory.scala:595) > at > org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:1107) > at > org.apache.spark.sql.execution.command.LoadTable.processData(carbonTableSchema.scala:1046) > at > org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:754) > at > org.apache.spark.sql.execution.command.LoadTableByInsert.processData(carbonTableSchema.scala:651) > at > org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:637) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) > at > 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67) > at org.apache.spark.sql.Dataset.(Dataset.scala:180) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:65) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:619) > at > $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.(:36) > at > $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.(:41) > at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.(:43) > at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.(:45) > at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw.(:47) > at $line23.$read$$iw$$iw$$iw$$iw$$iw.(:49) > at $line23.$read$$iw$$iw$$iw$$iw.(:51) > at $line23.$read$$iw$$iw$$iw.(:53) > at $line23.$read$$iw$$iw.(:55) > at $line23.$read$$iw.(:57) > at $line23.$read.(:59) > at $line23.$read$.(:63) > at $line23.$read$.() > at $line23.$eval$.$print$lzycompute(:7) > at $line23.$eval$.$print(:6) > at $line23.$eval.$print() > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at
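The NullPointerException originates in AbstractDFSCarbonFile.isDirectory right after the "File does not exist: ... Segment_1" error, i.e. the code dereferences a file status for a path that was never created. A plain-Java sketch of the defensive pattern that avoids this; java.io.File stands in for the HDFS API here, and this is an assumption about the kind of fix needed, not the actual patch:

```java
// Hypothetical sketch: guard existence before asking "is it a directory?",
// so a segment folder that was never materialized reports false instead of
// blowing up with an NPE deeper in the delete path.
import java.io.File;

public class SafeDirCheck {
    // Returns false instead of throwing when the path does not exist.
    public static boolean isDirectory(String path) {
        File f = new File(path);
        return f.exists() && f.isDirectory();  // exists() guards the check
    }

    public static void main(String[] args) {
        // A missing Segment_1 folder is simply "not a directory":
        System.out.println(isDirectory("/nonexistent/Segment_1")); // false
    }
}
```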
[jira] [Updated] (CARBONDATA-1654) NullPointerException when insert overwrite talbe
[ https://issues.apache.org/jira/browse/CARBONDATA-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cen yuhai updated CARBONDATA-1654: -- Description: CarbonData wants to find directory Segment_1, but only Segment_2 exists

{code}
[Stage 0:> (0 + 504) / 504]17/10/28 19:11:28 WARN [org.glassfish.jersey.internal.Errors(191) -- SparkUI-174]: The following warnings have been detected: WARNING: The (sub)resource method stageData in org.apache.spark.status.api.v1.OneStageResource contains empty path annotation.
17/10/28 19:25:20 ERROR [org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile(141) -- main]: main Exception occurred:File does not exist: hdfs://bipcluster/user/master/carbon/store/dm_test/dm_trd_order_wide_carbondata/Fact/Part0/Segment_1
17/10/28 19:25:22 ERROR [org.apache.spark.sql.execution.command.LoadTable(143) -- main]: main
java.lang.NullPointerException
	at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.isDirectory(AbstractDFSCarbonFile.java:88)
	at org.apache.carbondata.core.util.CarbonUtil.deleteRecursive(CarbonUtil.java:364)
	at org.apache.carbondata.core.util.CarbonUtil.access$100(CarbonUtil.java:93)
	at org.apache.carbondata.core.util.CarbonUtil$2.run(CarbonUtil.java:326)
	at org.apache.carbondata.core.util.CarbonUtil$2.run(CarbonUtil.java:322)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
	at org.apache.carbondata.core.util.CarbonUtil.deleteFoldersAndFiles(CarbonUtil.java:322)
	at org.apache.carbondata.spark.load.CarbonLoaderUtil.recordLoadMetadata(CarbonLoaderUtil.java:331)
	at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.updateStatus$1(CarbonDataRDDFactory.scala:595)
	at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:1107)
	at org.apache.spark.sql.execution.command.LoadTable.processData(carbonTableSchema.scala:1046)
	at org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:754)
	at org.apache.spark.sql.execution.command.LoadTableByInsert.processData(carbonTableSchema.scala:651)
	at org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:637)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
	at org.apache.spark.sql.Dataset.(Dataset.scala:180)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:65)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:619)
	at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.(:36)
	at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.(:41)
	at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.(:43)
	at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.(:45)
	at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw.(:47)
	at $line23.$read$$iw$$iw$$iw$$iw$$iw.(:49)
	at $line23.$read$$iw$$iw$$iw$$iw.(:51)
	at $line23.$read$$iw$$iw$$iw.(:53)
	at $line23.$read$$iw$$iw.(:55)
	at $line23.$read$$iw.(:57)
	at $line23.$read.(:59)
	at $line23.$read$.(:63)
	at $line23.$read$.()
	at $line23.$eval$.$print$lzycompute(:7)
	at $line23.$eval$.$print(:6)
	at $line23.$eval.$print()
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
	at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
	at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
	at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
	at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
	at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
	at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
	at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
	at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
{code}
[jira] [Created] (CARBONDATA-1654) NullPointerException when insert overwrite talbe
cen yuhai created CARBONDATA-1654: - Summary: NullPointerException when insert overwrite talbe Key: CARBONDATA-1654 URL: https://issues.apache.org/jira/browse/CARBONDATA-1654 Project: CarbonData Issue Type: Bug Components: data-load Affects Versions: 1.2.0 Environment: spark 2.1.1 carbondata 1.2.0 Reporter: cen yuhai Priority: Critical

{code}
[Stage 0:> (0 + 504) / 504]17/10/28 19:11:28 WARN [org.glassfish.jersey.internal.Errors(191) -- SparkUI-174]: The following warnings have been detected: WARNING: The (sub)resource method stageData in org.apache.spark.status.api.v1.OneStageResource contains empty path annotation.
17/10/28 19:25:20 ERROR [org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile(141) -- main]: main Exception occurred:File does not exist: hdfs://bipcluster/user/master/carbon/store/dm_test/dm_trd_order_wide_carbondata/Fact/Part0/Segment_1
17/10/28 19:25:22 ERROR [org.apache.spark.sql.execution.command.LoadTable(143) -- main]: main
java.lang.NullPointerException
	at org.apache.carbondata.core.datastore.filesystem.AbstractDFSCarbonFile.isDirectory(AbstractDFSCarbonFile.java:88)
	at org.apache.carbondata.core.util.CarbonUtil.deleteRecursive(CarbonUtil.java:364)
	at org.apache.carbondata.core.util.CarbonUtil.access$100(CarbonUtil.java:93)
	at org.apache.carbondata.core.util.CarbonUtil$2.run(CarbonUtil.java:326)
	at org.apache.carbondata.core.util.CarbonUtil$2.run(CarbonUtil.java:322)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
	at org.apache.carbondata.core.util.CarbonUtil.deleteFoldersAndFiles(CarbonUtil.java:322)
	at org.apache.carbondata.spark.load.CarbonLoaderUtil.recordLoadMetadata(CarbonLoaderUtil.java:331)
	at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.updateStatus$1(CarbonDataRDDFactory.scala:595)
	at org.apache.carbondata.spark.rdd.CarbonDataRDDFactory$.loadCarbonData(CarbonDataRDDFactory.scala:1107)
	at org.apache.spark.sql.execution.command.LoadTable.processData(carbonTableSchema.scala:1046)
	at org.apache.spark.sql.execution.command.LoadTable.run(carbonTableSchema.scala:754)
	at org.apache.spark.sql.execution.command.LoadTableByInsert.processData(carbonTableSchema.scala:651)
	at org.apache.spark.sql.execution.command.LoadTableByInsert.run(carbonTableSchema.scala:637)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:67)
	at org.apache.spark.sql.Dataset.(Dataset.scala:180)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:65)
	at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:619)
	at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.(:36)
	at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.(:41)
	at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.(:43)
	at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.(:45)
	at $line23.$read$$iw$$iw$$iw$$iw$$iw$$iw.(:47)
	at $line23.$read$$iw$$iw$$iw$$iw$$iw.(:49)
	at $line23.$read$$iw$$iw$$iw$$iw.(:51)
	at $line23.$read$$iw$$iw$$iw.(:53)
	at $line23.$read$$iw$$iw.(:55)
	at $line23.$read$$iw.(:57)
	at $line23.$read.(:59)
	at $line23.$read$.(:63)
	at $line23.$read$.()
	at $line23.$eval$.$print$lzycompute(:7)
	at $line23.$eval$.$print(:6)
	at $line23.$eval.$print()
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
	at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
	at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
	at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
	at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
	at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
{code}
[jira] [Updated] (CARBONDATA-727) Hive integration
[ https://issues.apache.org/jira/browse/CARBONDATA-727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cen yuhai updated CARBONDATA-727: - Attachment: the future of hive integration.png > Hive integration > > > Key: CARBONDATA-727 > URL: https://issues.apache.org/jira/browse/CARBONDATA-727 > Project: CarbonData > Issue Type: New Feature > Components: hive-integration >Affects Versions: NONE >Reporter: cen yuhai >Assignee: cen yuhai > Attachments: the future of hive integration.png > > Time Spent: 4.5h > Remaining Estimate: 0h > > Hive is now widely used in data warehouses. I think we should support Hive integration. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (CARBONDATA-1377) Implement hive partition
[ https://issues.apache.org/jira/browse/CARBONDATA-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cen yuhai updated CARBONDATA-1377: -- Attachment: the future of hive integration.png > Implement hive partition > > > Key: CARBONDATA-1377 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1377 > Project: CarbonData > Issue Type: Sub-task > Components: hive-integration >Reporter: cen yuhai >Assignee: cen yuhai > Attachments: the future of hive integration.png > > > The current partition implementation is database-like. If we want to use carbon to > replace parquet at scale, we must make the usage of carbon the same as > parquet/orc. > Hive users should be able to switch to CarbonData for all the new partitions > being created. Hive supports specifying the format at the partition level. > Example: > {code:sql} > create table rtestpartition (col1 string, col2 int) partitioned by (col3 int) > stored as parquet; > insert into rtestpartition partition(col3=10) select "pqt", 1; > insert into rtestpartition partition(col3=20) select "pqt", 1; > insert into rtestpartition partition(col3=10) select "pqt", 1; > insert into rtestpartition partition(col3=20) select "pqt", 1; > {code} > {noformat} > hive creates folder like > /db1/table1/col3=10/0001_file.pqt > /db1/table1/col3=10/0002_file.pqt > /db1/table1/col3=20/0001_file.pqt > /db1/table1/col3=20/0002_file.pqt > {noformat} > Hive users can now change new partitions to CarbonData; however, old > partitions will still be in parquet and require migration scripts to move to > the CarbonData format. 
> {code:sql} > alter table rtestpartition set fileformat carbondata; > insert into rtestpartition partition(col3=30) select "cdata", 1; > insert into rtestpartition partition(col3=40) select "cdata", 1; > {code} > {noformat} > hive creates folder like > /db1/table1/col3=10/0001_file.pqt > /db1/table1/col3=10/0002_file.pqt > /db1/table1/col3=20/0001_file.pqt > /db1/table1/col3=20/0002_file.pqt > /db1/table1/col3=30/ > /db1/table1/col3=40/ > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
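The folder layout above implies that once old Parquet partitions coexist with new CarbonData ones, the reader must be chosen per partition rather than per table. A hypothetical sketch of that dispatch; the registry below is purely illustrative, not Hive's actual metastore API:

```java
// Hypothetical sketch: per-partition format routing. Partitions created
// before "alter table ... set fileformat carbondata" keep the table's old
// default format; partitions created afterwards are registered individually.
import java.util.HashMap;
import java.util.Map;

public class PartitionFormatRouter {
    private final Map<String, String> formatByPartition = new HashMap<>();
    private final String defaultFormat;

    public PartitionFormatRouter(String defaultFormat) {
        this.defaultFormat = defaultFormat;
    }

    public void setFormat(String partition, String format) {
        formatByPartition.put(partition, format);
    }

    // Resolve which reader to use for a given partition folder.
    public String formatFor(String partition) {
        return formatByPartition.getOrDefault(partition, defaultFormat);
    }

    public static void main(String[] args) {
        PartitionFormatRouter router = new PartitionFormatRouter("parquet");
        router.setFormat("col3=30", "carbondata");
        router.setFormat("col3=40", "carbondata");
        System.out.println(router.formatFor("col3=10")); // parquet (old data)
        System.out.println(router.formatFor("col3=30")); // carbondata (new data)
    }
}
```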
[jira] [Assigned] (CARBONDATA-1377) Implement hive partition
[ https://issues.apache.org/jira/browse/CARBONDATA-1377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cen yuhai reassigned CARBONDATA-1377: - Assignee: cen yuhai > Implement hive partition > > > Key: CARBONDATA-1377 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1377 > Project: CarbonData > Issue Type: Sub-task > Components: hive-integration >Reporter: cen yuhai >Assignee: cen yuhai > > The current partition implementation is database-like. If we want to use carbon to > replace parquet at scale, we must make the usage of carbon the same as > parquet/orc. > Hive users should be able to switch to CarbonData for all the new partitions > being created. Hive supports specifying the format at the partition level. > Example: > {code:sql} > create table rtestpartition (col1 string, col2 int) partitioned by (col3 int) > stored as parquet; > insert into rtestpartition partition(col3=10) select "pqt", 1; > insert into rtestpartition partition(col3=20) select "pqt", 1; > insert into rtestpartition partition(col3=10) select "pqt", 1; > insert into rtestpartition partition(col3=20) select "pqt", 1; > {code} > {noformat} > hive creates folder like > /db1/table1/col3=10/0001_file.pqt > /db1/table1/col3=10/0002_file.pqt > /db1/table1/col3=20/0001_file.pqt > /db1/table1/col3=20/0002_file.pqt > {noformat} > Hive users can now change new partitions to CarbonData; however, old > partitions will still be in parquet and require migration scripts to move to > the CarbonData format. 
> {code:sql} > alter table rtestpartition set fileformat carbondata; > insert into rtestpartition partition(col3=30) select "cdata", 1; > insert into rtestpartition partition(col3=40) select "cdata", 1; > {code} > {noformat} > hive creates folder like > /db1/table1/col3=10/0001_file.pqt > /db1/table1/col3=10/0002_file.pqt > /db1/table1/col3=20/0001_file.pqt > /db1/table1/col3=20/0002_file.pqt > /db1/table1/col3=30/ > /db1/table1/col3=40/ > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Closed] (CARBONDATA-1362) ArrayIndexOutOfBoundsException when decoding decimal type
[ https://issues.apache.org/jira/browse/CARBONDATA-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cen yuhai closed CARBONDATA-1362. - Resolution: Not A Problem > ArrayIndexOutOfBoundsException when decoding decimal type > > > Key: CARBONDATA-1362 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1362 > Project: CarbonData > Issue Type: Bug > Components: core >Reporter: cen yuhai > > {code} > java.lang.RuntimeException: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: 0 > at > org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator.close(AbstractDataBlockIterator.java:231) > at > org.apache.carbondata.core.scan.result.iterator.AbstractDetailQueryResultIterator.close(AbstractDetailQueryResultIterator.java:306) > at > org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.finish(AbstractQueryExecutor.java:544) > at > org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.close(VectorizedCarbonRecordReader.java:132) > at > org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1$$anonfun$7.apply(CarbonScanRDD.scala:215) > at > org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1$$anonfun$7.apply(CarbonScanRDD.scala:213) > at > org.apache.spark.TaskContext$$anon$1.onTaskCompletion(TaskContext.scala:123) > at > org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:97) > at > org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:95) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at > org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:95) > at org.apache.spark.scheduler.Task.run(Task.scala:117) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351) > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: 0 > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:188) > at > org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator.close(AbstractDataBlockIterator.java:226) > ... 16 more > Caused by: java.lang.RuntimeException: > java.lang.ArrayIndexOutOfBoundsException: 0 > at > org.apache.carbondata.core.datastore.chunk.impl.MeasureRawColumnChunk.convertToMeasureColDataChunks(MeasureRawColumnChunk.java:62) > at > org.apache.carbondata.core.scan.scanner.AbstractBlockletScanner.scanBlocklet(AbstractBlockletScanner.java:100) > at > org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator$1.call(AbstractDataBlockIterator.java:191) > at > org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator$1.call(AbstractDataBlockIterator.java:178) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > ... 
3 more > Caused by: java.lang.ArrayIndexOutOfBoundsException: 0 > at > org.apache.carbondata.core.util.DataTypeUtil.byteToBigDecimal(DataTypeUtil.java:210) > at > org.apache.carbondata.core.metadata.ColumnPageCodecMeta.deserialize(ColumnPageCodecMeta.java:217) > at > org.apache.carbondata.core.datastore.chunk.reader.measure.v3.CompressedMeasureChunkFileBasedReaderV3.decodeMeasure(CompressedMeasureChunkFileBasedReaderV3.java:236) > at > org.apache.carbondata.core.datastore.chunk.reader.measure.v3.CompressedMeasureChunkFileBasedReaderV3.convertToMeasureChunk(CompressedMeasureChunkFileBasedReaderV3.java:219) > at > org.apache.carbondata.core.datastore.chunk.impl.MeasureRawColumnChunk.convertToMeasureColDataChunks(MeasureRawColumnChunk.java:59) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
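The root cause is byteToBigDecimal indexing position 0 of what is presumably an empty buffer handed over by the measure page. A sketch of a scale-prefixed decimal encoding with the empty-buffer guard; this layout is illustrative of the idea, not CarbonData's exact page format:

```java
// Hypothetical sketch: a decimal serialized as [1 scale byte][unscaled bytes].
// Reading raw[0] on an empty array is exactly the
// ArrayIndexOutOfBoundsException: 0 seen above, so decode() guards for it.
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;

public class DecimalCodec {
    public static byte[] encode(BigDecimal value) {
        byte[] unscaled = value.unscaledValue().toByteArray();
        byte[] out = new byte[1 + unscaled.length];
        out[0] = (byte) value.scale();              // first byte: the scale
        System.arraycopy(unscaled, 0, out, 1, unscaled.length);
        return out;
    }

    public static BigDecimal decode(byte[] raw) {
        if (raw == null || raw.length == 0) {       // guard the empty page
            throw new IllegalArgumentException("empty decimal buffer");
        }
        int scale = raw[0];
        BigInteger unscaled = new BigInteger(Arrays.copyOfRange(raw, 1, raw.length));
        return new BigDecimal(unscaled, scale);
    }

    public static void main(String[] args) {
        BigDecimal original = new BigDecimal("123.45");
        System.out.println(DecimalCodec.decode(DecimalCodec.encode(original))); // 123.45
    }
}
```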
[jira] [Assigned] (CARBONDATA-1378) Support create carbon table in Hive
[ https://issues.apache.org/jira/browse/CARBONDATA-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cen yuhai reassigned CARBONDATA-1378: - Assignee: cen yuhai > Support create carbon table in Hive > --- > > Key: CARBONDATA-1378 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1378 > Project: CarbonData > Issue Type: Sub-task > Components: hive-integration >Reporter: cen yuhai >Assignee: cen yuhai > Time Spent: 40m > Remaining Estimate: 0h > > Support create carbon table in Hive -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (CARBONDATA-1477) wrong values shown when fetching date type values in hive
[ https://issues.apache.org/jira/browse/CARBONDATA-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cen yuhai updated CARBONDATA-1477: -- Description:

{code}
import org.apache.spark.sql.CarbonSession._

val carbonSession = SparkSession
  .builder()
  .master("local")
  .appName("HiveExample")
  .config("carbonSession.sql.warehouse.dir", warehouse).enableHiveSupport()
  .getOrCreateCarbonSession(store)
carbonSession.sql("""DROP TABLE IF EXISTS HIVE_CARBON_EXAMPLE""".stripMargin)
carbonSession
  .sql(
    """CREATE TABLE HIVE_CARBON_EXAMPLE (ID int,NAME string,SALARY double,JOININGDATE date) STORED BY
      |'CARBONDATA' """
      .stripMargin)
carbonSession.sql(
  s""" LOAD DATA LOCAL INPATH '$rootPath/integration/hive/src/main/resources/data.csv' INTO TABLE HIVE_CARBON_EXAMPLE """)
carbonSession.sql("SELECT * FROM HIVE_CARBON_EXAMPLE").show()
carbonSession.stop()

try {
  Class.forName(driverName)
} catch {
  case classNotFoundException: ClassNotFoundException => classNotFoundException.printStackTrace()
}
HiveEmbeddedServer.start()
val port = HiveEmbeddedServer.getFreePort
val connection = DriverManager.getConnection(s"jdbc:hive2://localhost:8000/default", "", "")
val statement: Statement = connection.createStatement
logger.info(s"HIVE CLI IS STARTED ON PORT $port ==")
statement.execute("CREATE TABLE IF NOT EXISTS " + "HIVE_CARBON_EXAMPLE " +
  " (ID int, NAME string,SALARY double,JOININGDATE date)")
statement
  .execute(
    "ALTER TABLE HIVE_CARBON_EXAMPLE SET FILEFORMAT INPUTFORMAT \"org.apache.carbondata." +
    "hive.MapredCarbonInputFormat\"OUTPUTFORMAT \"org.apache.carbondata.hive." +
    "MapredCarbonOutputFormat\"SERDE \"org.apache.carbondata.hive." +
    "CarbonHiveSerDe\" ")
statement
  .execute(
    "ALTER TABLE HIVE_CARBON_EXAMPLE SET LOCATION " +
    s"'file:///$store/default/hive_carbon_example' ")
val sql = "SELECT * FROM HIVE_CARBON_EXAMPLE"
val resultSet: ResultSet = statement.executeQuery(sql)
var rowsFetched = 0
while (resultSet.next) {
  println("*" + resultSet.getString("JOININGDATE"))
}
println(s"**Total Number Of Rows Fetched ** $rowsFetched")
logger.info("Fetching the Individual Columns ")
HiveEmbeddedServer.stop()
{code}

actual result:
{noformat}
*null
*1970-01-01
{noformat}

values in my csv are:
{noformat}
ID,NAME,SALARY,JOININGDATE
1,'liang',20,2016-03-14
2,'anubhav',2,2019/03/17
{noformat}
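A date column is commonly stored as an integer of days since the Unix epoch, and a value that never gets decoded (or is decoded as 0, e.g. for the unparseable `2019/03/17` input) surfaces exactly as the null / 1970-01-01 reported above. A sketch of the round trip; the encoding is illustrative, not CarbonData's actual date codec:

```java
// Hypothetical sketch: date <-> days-since-epoch round trip. A column value
// that degenerates to 0 materializes as 1970-01-01 on the Hive side.
import java.time.LocalDate;

public class DateCodec {
    private static final LocalDate EPOCH = LocalDate.of(1970, 1, 1);

    public static int encode(LocalDate date) {
        return (int) date.toEpochDay();            // days since 1970-01-01
    }

    public static LocalDate decode(int daysSinceEpoch) {
        return EPOCH.plusDays(daysSinceEpoch);
    }

    public static void main(String[] args) {
        System.out.println(decode(encode(LocalDate.of(2016, 3, 14)))); // 2016-03-14
        // A value that was never decoded correctly degenerates to the epoch:
        System.out.println(decode(0)); // 1970-01-01
    }
}
```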
[jira] [Created] (CARBONDATA-1378) Support create carbon table in Hive
cen yuhai created CARBONDATA-1378: - Summary: Support create carbon table in Hive Key: CARBONDATA-1378 URL: https://issues.apache.org/jira/browse/CARBONDATA-1378 Project: CarbonData Issue Type: Sub-task Reporter: cen yuhai Support create carbon table in Hive -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1377) Implement hive partition
cen yuhai created CARBONDATA-1377: - Summary: Implement hive partition Key: CARBONDATA-1377 URL: https://issues.apache.org/jira/browse/CARBONDATA-1377 Project: CarbonData Issue Type: Sub-task Reporter: cen yuhai The current partition implementation is database-like. If we want to use carbon to replace parquet at scale, we must make the usage of carbon the same as parquet/orc. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1375) clean hive pom
cen yuhai created CARBONDATA-1375: - Summary: clean hive pom Key: CARBONDATA-1375 URL: https://issues.apache.org/jira/browse/CARBONDATA-1375 Project: CarbonData Issue Type: Bug Components: hive-integration Reporter: cen yuhai The hive pom contains some unnecessary dependencies. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (CARBONDATA-1374) Can't insert carbon if the source table contains array datatype
[ https://issues.apache.org/jira/browse/CARBONDATA-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cen yuhai updated CARBONDATA-1374: -- Summary: Can't insert carbon if the source table contains array datatype (was: Can't insert carbon if the source table contains array data) > Can't insert carbon if the source table contains array datatype > --- > > Key: CARBONDATA-1374 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1374 > Project: CarbonData > Issue Type: Bug >Reporter: cen yuhai >Assignee: cen yuhai > > {code}
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
> at java.lang.AbstractStringBuilder.substring(AbstractStringBuilder.java:872)
> at java.lang.StringBuilder.substring(StringBuilder.java:72)
> at scala.collection.mutable.StringBuilder.substring(StringBuilder.scala:166)
> at org.apache.carbondata.spark.util.CarbonScalaUtil$.getString(CarbonScalaUtil.scala:126)
> at org.apache.carbondata.spark.rdd.CarbonBlockDistinctValuesCombineRDD$$anonfun$internalCompute$1.apply$mcVI$sp(CarbonGlobalDictionaryRDD.scala:295)
> at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
> at org.apache.carbondata.spark.rdd.CarbonBlockDistinctValuesCombineRDD.internalCompute(CarbonGlobalDictionaryRDD.scala:294)
> at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:50)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:97)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
> at org.apache.spark.scheduler.Task.run(Task.scala:104)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
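CarbonScalaUtil.getString fails with substring index -1, which matches the classic join-then-trim pattern applied to an empty array value: append "value + delimiter" per element, then strip the trailing delimiter with substring. For an empty array the end index goes negative. A sketch of that pattern with the missing guard; the `$` delimiter is an assumption for illustration, not necessarily CarbonData's actual separator:

```java
// Hypothetical sketch: join-then-trim. With zero elements the builder is
// empty, so substring(0, 0 - delimiter.length()) throws
// StringIndexOutOfBoundsException: -1; the emptiness guard avoids it.
public class ArrayJoin {
    public static String join(String[] values, String delimiter) {
        if (values.length == 0) {
            return "";                             // guard: nothing to trim
        }
        StringBuilder sb = new StringBuilder();
        for (String v : values) {
            sb.append(v).append(delimiter);
        }
        // Safe now: the builder always ends with exactly one trailing delimiter.
        return sb.substring(0, sb.length() - delimiter.length());
    }

    public static void main(String[] args) {
        System.out.println(join(new String[]{"a", "b", "c"}, "$")); // a$b$c
        System.out.println(join(new String[]{}, "$"));              // (empty)
    }
}
```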
[jira] [Assigned] (CARBONDATA-1374) Can't insert carbon if the source table contains array data
[ https://issues.apache.org/jira/browse/CARBONDATA-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cen yuhai reassigned CARBONDATA-1374: - Assignee: cen yuhai > Can't insert carbon if the source table contains array data > --- > > Key: CARBONDATA-1374 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1374 > Project: CarbonData > Issue Type: Bug >Reporter: cen yuhai >Assignee: cen yuhai > > {code}
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1
> at java.lang.AbstractStringBuilder.substring(AbstractStringBuilder.java:872)
> at java.lang.StringBuilder.substring(StringBuilder.java:72)
> at scala.collection.mutable.StringBuilder.substring(StringBuilder.scala:166)
> at org.apache.carbondata.spark.util.CarbonScalaUtil$.getString(CarbonScalaUtil.scala:126)
> at org.apache.carbondata.spark.rdd.CarbonBlockDistinctValuesCombineRDD$$anonfun$internalCompute$1.apply$mcVI$sp(CarbonGlobalDictionaryRDD.scala:295)
> at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
> at org.apache.carbondata.spark.rdd.CarbonBlockDistinctValuesCombineRDD.internalCompute(CarbonGlobalDictionaryRDD.scala:294)
> at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:50)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:295)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:97)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
> at org.apache.spark.scheduler.Task.run(Task.scala:104)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1374) Can't insert carbon if the source table contains array data
cen yuhai created CARBONDATA-1374: - Summary: Can't insert carbon if the source table contains array data Key: CARBONDATA-1374 URL: https://issues.apache.org/jira/browse/CARBONDATA-1374 Project: CarbonData Issue Type: Bug Reporter: cen yuhai {code} Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -1 at java.lang.AbstractStringBuilder.substring(AbstractStringBuilder.java:872) at java.lang.StringBuilder.substring(StringBuilder.java:72) at scala.collection.mutable.StringBuilder.substring(StringBuilder.scala:166) at org.apache.carbondata.spark.util.CarbonScalaUtil$.getString(CarbonScalaUtil.scala:126) at org.apache.carbondata.spark.rdd.CarbonBlockDistinctValuesCombineRDD$$anonfun$internalCompute$1.apply$mcVI$sp(CarbonGlobalDictionaryRDD.scala:295) at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160) at org.apache.carbondata.spark.rdd.CarbonBlockDistinctValuesCombineRDD.internalCompute(CarbonGlobalDictionaryRDD.scala:294) at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:50) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:97) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53) at org.apache.spark.scheduler.Task.run(Task.scala:104) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
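[Editor's note] The trace above fails inside `StringBuilder.substring` with index -1 while `CarbonScalaUtil.getString` builds a delimited string for the array column. A plausible reconstruction of that failure mode (an assumption for illustration, not CarbonData's actual code: the method names `joinUnsafe`/`joinSafe` and the trailing-delimiter-trim logic are hypothetical) is trimming a trailing delimiter from a builder that is empty because the array had no elements:

```java
import java.util.List;

public class DelimitedJoin {
    // Hypothetical sketch of the failure mode: append each element plus a
    // delimiter, then trim the trailing delimiter with substring. With an
    // empty array the builder length is 0, so length - 1 is -1 and
    // substring throws StringIndexOutOfBoundsException: -1, matching the
    // reported trace.
    static String joinUnsafe(List<String> values, char delimiter) {
        StringBuilder sb = new StringBuilder();
        for (String v : values) {
            sb.append(v).append(delimiter);
        }
        return sb.substring(0, sb.length() - 1); // throws when values is empty
    }

    // Defensive variant: guard the empty case before trimming.
    static String joinSafe(List<String> values, char delimiter) {
        if (values.isEmpty()) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        for (String v : values) {
            sb.append(v).append(delimiter);
        }
        return sb.substring(0, sb.length() - 1);
    }
}
```

The guard in `joinSafe` is the generic shape of a fix; the actual patch would live in the dictionary-generation path shown in the stack trace.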
[jira] [Commented] (CARBONDATA-1362) ArrayIndexOutOfBoundsException when decoding decimal type
[ https://issues.apache.org/jira/browse/CARBONDATA-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16115420#comment-16115420 ] cen yuhai commented on CARBONDATA-1362: --- I used the old code to create the carbon table and load data; after updating my code to master, querying the data throws this exception. If I recreate the table, it works fine. > ArrayIndexOutOfBoundsException when decoding decimal type > > > Key: CARBONDATA-1362 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1362 > Project: CarbonData > Issue Type: Bug > Components: core >Reporter: cen yuhai > > {code} > java.lang.RuntimeException: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: 0 > at > org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator.close(AbstractDataBlockIterator.java:231) > at > org.apache.carbondata.core.scan.result.iterator.AbstractDetailQueryResultIterator.close(AbstractDetailQueryResultIterator.java:306) > at > org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.finish(AbstractQueryExecutor.java:544) > at > org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.close(VectorizedCarbonRecordReader.java:132) > at > org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1$$anonfun$7.apply(CarbonScanRDD.scala:215) > at > org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1$$anonfun$7.apply(CarbonScanRDD.scala:213) > at > org.apache.spark.TaskContext$$anon$1.onTaskCompletion(TaskContext.scala:123) > at > org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:97) > at > org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:95) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) > at > org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:95) > at 
org.apache.spark.scheduler.Task.run(Task.scala:117) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.util.concurrent.ExecutionException: > java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: 0 > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:188) > at > org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator.close(AbstractDataBlockIterator.java:226) > ... 16 more > Caused by: java.lang.RuntimeException: > java.lang.ArrayIndexOutOfBoundsException: 0 > at > org.apache.carbondata.core.datastore.chunk.impl.MeasureRawColumnChunk.convertToMeasureColDataChunks(MeasureRawColumnChunk.java:62) > at > org.apache.carbondata.core.scan.scanner.AbstractBlockletScanner.scanBlocklet(AbstractBlockletScanner.java:100) > at > org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator$1.call(AbstractDataBlockIterator.java:191) > at > org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator$1.call(AbstractDataBlockIterator.java:178) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > ... 
3 more > Caused by: java.lang.ArrayIndexOutOfBoundsException: 0 > at > org.apache.carbondata.core.util.DataTypeUtil.byteToBigDecimal(DataTypeUtil.java:210) > at > org.apache.carbondata.core.metadata.ColumnPageCodecMeta.deserialize(ColumnPageCodecMeta.java:217) > at > org.apache.carbondata.core.datastore.chunk.reader.measure.v3.CompressedMeasureChunkFileBasedReaderV3.decodeMeasure(CompressedMeasureChunkFileBasedReaderV3.java:236) > at > org.apache.carbondata.core.datastore.chunk.reader.measure.v3.CompressedMeasureChunkFileBasedReaderV3.convertToMeasureChunk(CompressedMeasureChunkFileBasedReaderV3.java:219) > at > org.apache.carbondata.core.datastore.chunk.impl.MeasureRawColumnChunk.convertToMeasureColDataChunks(MeasureRawColumnChunk.java:59) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1362) ArrayIndexOutOfBoundsException when decoding decimal type
cen yuhai created CARBONDATA-1362: - Summary: ArrayIndexOutOfBoundsException when decoding decimal type Key: CARBONDATA-1362 URL: https://issues.apache.org/jira/browse/CARBONDATA-1362 Project: CarbonData Issue Type: Bug Components: core Reporter: cen yuhai {code} java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator.close(AbstractDataBlockIterator.java:231) at org.apache.carbondata.core.scan.result.iterator.AbstractDetailQueryResultIterator.close(AbstractDetailQueryResultIterator.java:306) at org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.finish(AbstractQueryExecutor.java:544) at org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.close(VectorizedCarbonRecordReader.java:132) at org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1$$anonfun$7.apply(CarbonScanRDD.scala:215) at org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1$$anonfun$7.apply(CarbonScanRDD.scala:213) at org.apache.spark.TaskContext$$anon$1.onTaskCompletion(TaskContext.scala:123) at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:97) at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:95) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48) at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:95) at org.apache.spark.scheduler.Task.run(Task.scala:117) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
java.lang.ArrayIndexOutOfBoundsException: 0 at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:188) at org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator.close(AbstractDataBlockIterator.java:226) ... 16 more Caused by: java.lang.RuntimeException: java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.carbondata.core.datastore.chunk.impl.MeasureRawColumnChunk.convertToMeasureColDataChunks(MeasureRawColumnChunk.java:62) at org.apache.carbondata.core.scan.scanner.AbstractBlockletScanner.scanBlocklet(AbstractBlockletScanner.java:100) at org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator$1.call(AbstractDataBlockIterator.java:191) at org.apache.carbondata.core.scan.processor.AbstractDataBlockIterator$1.call(AbstractDataBlockIterator.java:178) at java.util.concurrent.FutureTask.run(FutureTask.java:262) ... 3 more Caused by: java.lang.ArrayIndexOutOfBoundsException: 0 at org.apache.carbondata.core.util.DataTypeUtil.byteToBigDecimal(DataTypeUtil.java:210) at org.apache.carbondata.core.metadata.ColumnPageCodecMeta.deserialize(ColumnPageCodecMeta.java:217) at org.apache.carbondata.core.datastore.chunk.reader.measure.v3.CompressedMeasureChunkFileBasedReaderV3.decodeMeasure(CompressedMeasureChunkFileBasedReaderV3.java:236) at org.apache.carbondata.core.datastore.chunk.reader.measure.v3.CompressedMeasureChunkFileBasedReaderV3.convertToMeasureChunk(CompressedMeasureChunkFileBasedReaderV3.java:219) at org.apache.carbondata.core.datastore.chunk.impl.MeasureRawColumnChunk.convertToMeasureColDataChunks(MeasureRawColumnChunk.java:59) {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
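[Editor's note] The root cause above is `DataTypeUtil.byteToBigDecimal` indexing position 0 of a byte buffer, which fails when the buffer is empty (consistent with the comment that data written by older code triggers it, i.e. a format mismatch). A minimal sketch of scale-prefixed decimal serialization with a guard (the one-byte-scale layout here is an assumption for illustration, not CarbonData's actual on-disk format):

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;

public class DecimalBytes {
    // Illustrative encoding: first byte stores the scale, the remaining
    // bytes store the unscaled value (big-endian two's complement).
    static byte[] toBytes(BigDecimal value) {
        byte[] unscaled = value.unscaledValue().toByteArray();
        byte[] out = new byte[unscaled.length + 1];
        out[0] = (byte) value.scale();
        System.arraycopy(unscaled, 0, out, 1, unscaled.length);
        return out;
    }

    // Reading raw[0] from an empty buffer is exactly where an
    // ArrayIndexOutOfBoundsException: 0 would surface, as in the reported
    // trace; a length guard turns it into a clear error instead.
    static BigDecimal fromBytes(byte[] raw) {
        if (raw == null || raw.length < 2) {
            throw new IllegalArgumentException("truncated or empty decimal buffer");
        }
        int scale = raw[0];
        byte[] unscaled = Arrays.copyOfRange(raw, 1, raw.length);
        return new BigDecimal(new BigInteger(unscaled), scale);
    }
}
```

As the reporter notes, recreating the table (so old and new code agree on the layout) makes the exception go away, which fits a reader/writer format mismatch.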
[jira] [Closed] (CARBONDATA-1153) Can not add column
[ https://issues.apache.org/jira/browse/CARBONDATA-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cen yuhai closed CARBONDATA-1153. - Resolution: Not A Problem > Can not add column > -- > > Key: CARBONDATA-1153 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1153 > Project: CarbonData > Issue Type: Bug > Components: spark-integration >Affects Versions: 1.2.0 >Reporter: cen yuhai > > Sometimes it throws the exception below. Why can't I add a column? No one > is altering the table... > {code} > scala> carbon.sql("alter table temp.yuhai_carbon add columns(test1 string)") > 17/06/11 22:09:13 AUDIT > [org.apache.spark.sql.execution.command.AlterTableAddColumns(207) -- main]: > [sh-hadoop-datanode-250-104.elenet.me][master][Thread-1]Alter table add > columns request has been received for temp.yuhai_carbon > 17/06/11 22:10:22 ERROR [org.apache.spark.scheduler.TaskSetManager(70) -- > task-result-getter-3]: Task 0 in stage 0.0 failed 4 times; aborting job > 17/06/11 22:10:22 ERROR > [org.apache.spark.sql.execution.command.AlterTableAddColumns(141) -- main]: > main Alter table add columns failed :Job aborted due to stage failure: Task 0 > in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 > (TID 3, sh-hadoop-datanode-368.elenet.me, executor 7): > java.lang.RuntimeException: Dictionary file test1 is locked for updation. 
> Please try after some time > at scala.sys.package$.error(package.scala:27) > at > org.apache.carbondata.spark.util.GlobalDictionaryUtil$.loadDefaultDictionaryValueForNewColumn(GlobalDictionaryUtil.scala:857) > at > org.apache.carbondata.spark.rdd.AlterTableAddColumnRDD$$anon$1.(AlterTableAddColumnRDD.scala:83) > at > org.apache.carbondata.spark.rdd.AlterTableAddColumnRDD.compute(AlterTableAddColumnRDD.scala:68) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88) > at org.apache.spark.scheduler.Task.run(Task.scala:104) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
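[Editor's note] "Dictionary file ... is locked for updation" means the dictionary lock for the new column could not be acquired, typically because a stale or concurrent holder exists. A generic retry-with-backoff pattern around an exclusive file lock (a sketch only; CarbonData's actual locking goes through its own lock interface, and the method name `acquireWithRetry` is hypothetical):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class LockRetry {
    // Try to take an exclusive lock on the given file, retrying with a
    // fixed backoff before giving up with an error like the one reported.
    static FileLock acquireWithRetry(Path path, int attempts, long backoffMs)
            throws IOException, InterruptedException {
        FileChannel channel = FileChannel.open(
                path, StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        for (int i = 0; i < attempts; i++) {
            FileLock lock = channel.tryLock(); // null if held by another process
            if (lock != null) {
                return lock;
            }
            Thread.sleep(backoffMs);
        }
        channel.close();
        throw new IOException("lock is held; please try after some time: " + path);
    }
}
```

If no other process is actually altering the table, as the reporter says, the usual suspects are a stale lock file left by a crashed job or a lock directory visible to only some nodes.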
[jira] [Updated] (CARBONDATA-1343) Hive can't query data when the carbon table info is stored in the hive metastore
[ https://issues.apache.org/jira/browse/CARBONDATA-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cen yuhai updated CARBONDATA-1343: -- Description: {code} set spark.carbon.hive.schema.store=true in spark-defaults.conf spark-shell --jars carbonlib/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.2.jar,carbonlib/carbondata-hive-1.2.0-SNAPSHOT.jar import org.apache.spark.sql.SparkSession import org.apache.spark.sql.CarbonSession._ val rootPath = "hdfs://mycluster/user/master/carbon" val storeLocation = s"$rootPath/store" val warehouse = s"$rootPath/warehouse" val metastoredb = s"$rootPath/metastore_db" val carbon =SparkSession.builder().enableHiveSupport().getOrCreateCarbonSession(storeLocation, metastoredb) carbon.sql("create table temp.hive_carbon(id short, name string, scale decimal, country string, salary double) STORED BY 'carbondata'") carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/master/sample.csv' INTO TABLE temp.hive_carbon") start hive cli set hive.mapred.supports.subdirectories=true; set mapreduce.input.fileinputformat.input.dir.recursive=true; select * from temp.hive_carbon; {code} {code} 17/07/30 19:33:07 ERROR [CliDriver(1097) -- 53ea0b98-bcf0-4b86-a167-58ce570df284 main]: Failed with exception java.io.IOException:java.io.IOException: File does not exist: hdfs://bipcluster/user/master/carbon/store/temp/hive_carbon/Metadata/schema java.io.IOException: java.io.IOException: File does not exist: hdfs://bipcluster/user/master/carbon/store/temp/yuhai_carbon/Metadata/schema at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:521) at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428) at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2187) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:252) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183) at 
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Caused by: java.io.IOException: File does not exist: hdfs://bipcluster/user/master/carbon/store/temp/hive_carbon/Metadata/schema at org.apache.carbondata.hadoop.util.SchemaReader.readCarbonTableFromStore(SchemaReader.java:70) at org.apache.carbondata.hadoop.CarbonInputFormat.populateCarbonTable(CarbonInputFormat.java:147) at org.apache.carbondata.hadoop.CarbonInputFormat.getCarbonTable(CarbonInputFormat.java:124) at org.apache.carbondata.hadoop.CarbonInputFormat.getAbsoluteTableIdentifier(CarbonInputFormat.java:221) at org.apache.carbondata.hadoop.CarbonInputFormat.getSplits(CarbonInputFormat.java:234) at org.apache.carbondata.hive.MapredCarbonInputFormat.getSplits(MapredCarbonInputFormat.java:51) at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:372) at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:304) at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459) ... 
15 more {code} was: {code} set spark.carbon.hive.schema.store=true in spark-defaults.conf spark-shell --jars carbonlib/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.2.jar,carbonlib/carbondata-hive-1.2.0-SNAPSHOT.jar import org.apache.spark.sql.SparkSession import org.apache.spark.sql.CarbonSession._ val rootPath = "hdfs://mycluster/user/master/carbon" val storeLocation = s"$rootPath/store" val warehouse = s"$rootPath/warehouse" val metastoredb = s"$rootPath/metastore_db" val carbon =SparkSession.builder().enableHiveSupport().getOrCreateCarbonSession(storeLocation, metastoredb) carbon.sql("create table temp.hive_carbon(id short, name string, scale decimal, country string, salary double) STORED BY 'carbondata'") carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/master/sample.csv INTO TABLE temp.hive_carbon") start hive cli set hive.mapred.supports.subdirectories=true; set mapreduce.input.fileinputformat.input.dir.recursive=true; select * from temp.hive_carbon; {code} {code} 17/07/30 19:33:07 ERROR [CliDriver(1097) --
[jira] [Updated] (CARBONDATA-1343) Hive can't query data when the carbon table info is stored in the hive metastore
[ https://issues.apache.org/jira/browse/CARBONDATA-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cen yuhai updated CARBONDATA-1343: -- Description: {code} set spark.carbon.hive.schema.store=true in spark-defaults.conf spark-shell --jars carbonlib/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.2.jar,carbonlib/carbondata-hive-1.2.0-SNAPSHOT.jar import org.apache.spark.sql.SparkSession import org.apache.spark.sql.CarbonSession._ val rootPath = "hdfs://mycluster/user/master/carbon" val storeLocation = s"$rootPath/store" val warehouse = s"$rootPath/warehouse" val metastoredb = s"$rootPath/metastore_db" val carbon =SparkSession.builder().enableHiveSupport().getOrCreateCarbonSession(storeLocation, metastoredb) carbon.sql("create table temp.hive_carbon(id short, name string, scale decimal, country string, salary double) STORED BY 'carbondata'") carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/master/sample.csv' INTO TABLE temp.hive_carbon") start hive cli set hive.mapred.supports.subdirectories=true; set mapreduce.input.fileinputformat.input.dir.recursive=true; select * from temp.hive_carbon; {code} {code} 17/07/30 19:33:07 ERROR [CliDriver(1097) -- 53ea0b98-bcf0-4b86-a167-58ce570df284 main]: Failed with exception java.io.IOException:java.io.IOException: File does not exist: hdfs://bipcluster/user/master/carbon/store/temp/hive_carbon/Metadata/schema java.io.IOException: java.io.IOException: File does not exist: hdfs://bipcluster/user/master/carbon/store/temp/yuhai_carbon/Metadata/schema at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:521) at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428) at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2187) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:252) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183) at 
org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Caused by: java.io.IOException: File does not exist: hdfs://bipcluster/user/master/carbon/store/temp/hive_carbon/Metadata/schema at org.apache.carbondata.hadoop.util.SchemaReader.readCarbonTableFromStore(SchemaReader.java:70) at org.apache.carbondata.hadoop.CarbonInputFormat.populateCarbonTable(CarbonInputFormat.java:147) at org.apache.carbondata.hadoop.CarbonInputFormat.getCarbonTable(CarbonInputFormat.java:124) at org.apache.carbondata.hadoop.CarbonInputFormat.getAbsoluteTableIdentifier(CarbonInputFormat.java:221) at org.apache.carbondata.hadoop.CarbonInputFormat.getSplits(CarbonInputFormat.java:234) at org.apache.carbondata.hive.MapredCarbonInputFormat.getSplits(MapredCarbonInputFormat.java:51) at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:372) at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:304) at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459) ... 
15 more {code} was: set spark.carbon.hive.schema.store=true in spark-defaults.conf spark-shell --jars carbonlib/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.2.jar,carbonlib/carbondata-hive-1.2.0-SNAPSHOT.jar import org.apache.spark.sql.SparkSession import org.apache.spark.sql.CarbonSession._ val rootPath = "hdfs://mycluster/user/master/carbon" val storeLocation = s"$rootPath/store" val warehouse = s"$rootPath/warehouse" val metastoredb = s"$rootPath/metastore_db" val carbon =SparkSession.builder().enableHiveSupport().getOrCreateCarbonSession(storeLocation, metastoredb) carbon.sql("create table temp.hive_carbon(id short, name string, scale decimal, country string, salary double) STORED BY 'carbondata'") carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/master/sample.csv INTO TABLE temp.hive_carbon") start hive cli ``` set hive.mapred.supports.subdirectories=true; set mapreduce.input.fileinputformat.input.dir.recursive=true; select * from temp.hive_carbon; {code} 17/07/30 19:33:07 ERROR [CliDriver(1097) -- 53ea0b98-bcf0-4b86-a167-58ce570df284 main]:
[jira] [Commented] (CARBONDATA-1343) Hive can't query data when the carbon table info is stored in the hive metastore
[ https://issues.apache.org/jira/browse/CARBONDATA-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106484#comment-16106484 ] cen yuhai commented on CARBONDATA-1343: --- I am working on it > Hive can't query data when the carbon table info is stored in the hive metastore > --- > > Key: CARBONDATA-1343 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1343 > Project: CarbonData > Issue Type: Bug >Reporter: cen yuhai >Assignee: cen yuhai > > set spark.carbon.hive.schema.store=true in spark-defaults.conf > spark-shell --jars > carbonlib/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.2.jar,carbonlib/carbondata-hive-1.2.0-SNAPSHOT.jar > import org.apache.spark.sql.SparkSession > import org.apache.spark.sql.CarbonSession._ > val rootPath = "hdfs://mycluster/user/master/carbon" > val storeLocation = s"$rootPath/store" > val warehouse = s"$rootPath/warehouse" > val metastoredb = s"$rootPath/metastore_db" > val carbon > =SparkSession.builder().enableHiveSupport().getOrCreateCarbonSession(storeLocation, > metastoredb) > carbon.sql("create table temp.hive_carbon(id short, name string, scale > decimal, country string, salary double) STORED BY 'carbondata'") > carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/master/sample.csv' > INTO TABLE temp.hive_carbon") > start hive cli > set hive.mapred.supports.subdirectories=true; > set mapreduce.input.fileinputformat.input.dir.recursive=true; > select * from temp.hive_carbon; > {code} > 17/07/30 19:33:07 ERROR [CliDriver(1097) -- > 53ea0b98-bcf0-4b86-a167-58ce570df284 main]: Failed with exception > java.io.IOException:java.io.IOException: File does not exist: > hdfs://bipcluster/user/master/carbon/store/temp/hive_carbon/Metadata/schema > java.io.IOException: java.io.IOException: File does not exist: > hdfs://bipcluster/user/master/carbon/store/temp/yuhai_carbon/Metadata/schema > at > org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:521) > at > 
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428) > at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) > at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2187) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:252) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399) > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > Caused by: java.io.IOException: File does not exist: > hdfs://bipcluster/user/master/carbon/store/temp/hive_carbon/Metadata/schema > at > org.apache.carbondata.hadoop.util.SchemaReader.readCarbonTableFromStore(SchemaReader.java:70) > at > org.apache.carbondata.hadoop.CarbonInputFormat.populateCarbonTable(CarbonInputFormat.java:147) > at > org.apache.carbondata.hadoop.CarbonInputFormat.getCarbonTable(CarbonInputFormat.java:124) > at > org.apache.carbondata.hadoop.CarbonInputFormat.getAbsoluteTableIdentifier(CarbonInputFormat.java:221) > at > org.apache.carbondata.hadoop.CarbonInputFormat.getSplits(CarbonInputFormat.java:234) > at > org.apache.carbondata.hive.MapredCarbonInputFormat.getSplits(MapredCarbonInputFormat.java:51) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:372) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:304) > at > 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459) > ... 15 more > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1343) Hive can't query data when the carbon table info is stored in the hive metastore
cen yuhai created CARBONDATA-1343: - Summary: Hive can't query data when the carbon table info is stored in the hive metastore Key: CARBONDATA-1343 URL: https://issues.apache.org/jira/browse/CARBONDATA-1343 Project: CarbonData Issue Type: Bug Reporter: cen yuhai set spark.carbon.hive.schema.store=true in spark-defaults.conf spark-shell --jars carbonlib/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.2.jar,carbonlib/carbondata-hive-1.2.0-SNAPSHOT.jar import org.apache.spark.sql.SparkSession import org.apache.spark.sql.CarbonSession._ val rootPath = "hdfs://mycluster/user/master/carbon" val storeLocation = s"$rootPath/store" val warehouse = s"$rootPath/warehouse" val metastoredb = s"$rootPath/metastore_db" val carbon =SparkSession.builder().enableHiveSupport().getOrCreateCarbonSession(storeLocation, metastoredb) carbon.sql("create table temp.hive_carbon(id short, name string, scale decimal, country string, salary double) STORED BY 'carbondata'") carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/master/sample.csv' INTO TABLE temp.hive_carbon") start hive cli set hive.mapred.supports.subdirectories=true; set mapreduce.input.fileinputformat.input.dir.recursive=true; select * from temp.hive_carbon; {code} 17/07/30 19:33:07 ERROR [CliDriver(1097) -- 53ea0b98-bcf0-4b86-a167-58ce570df284 main]: Failed with exception java.io.IOException:java.io.IOException: File does not exist: hdfs://bipcluster/user/master/carbon/store/temp/hive_carbon/Metadata/schema java.io.IOException: java.io.IOException: File does not exist: hdfs://bipcluster/user/master/carbon/store/temp/yuhai_carbon/Metadata/schema at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:521) at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428) at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2187) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:252) at 
org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.RunJar.run(RunJar.java:221) at org.apache.hadoop.util.RunJar.main(RunJar.java:136) Caused by: java.io.IOException: File does not exist: hdfs://bipcluster/user/master/carbon/store/temp/hive_carbon/Metadata/schema at org.apache.carbondata.hadoop.util.SchemaReader.readCarbonTableFromStore(SchemaReader.java:70) at org.apache.carbondata.hadoop.CarbonInputFormat.populateCarbonTable(CarbonInputFormat.java:147) at org.apache.carbondata.hadoop.CarbonInputFormat.getCarbonTable(CarbonInputFormat.java:124) at org.apache.carbondata.hadoop.CarbonInputFormat.getAbsoluteTableIdentifier(CarbonInputFormat.java:221) at org.apache.carbondata.hadoop.CarbonInputFormat.getSplits(CarbonInputFormat.java:234) at org.apache.carbondata.hive.MapredCarbonInputFormat.getSplits(MapredCarbonInputFormat.java:51) at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:372) at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:304) at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459) ... 15 more {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
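[Editor's note] The failure above is `SchemaReader.readCarbonTableFromStore` unconditionally opening `<store>/<db>/<table>/Metadata/schema`, a file that is never written when the schema is kept only in the Hive metastore (`spark.carbon.hive.schema.store=true`). A local sketch of the guard a reader would need (the real check would go through Hadoop's `FileSystem` API against HDFS, not `java.nio`; `readSchemaIfPresent` is a hypothetical name):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SchemaCheck {
    // Return the on-disk schema bytes if the Metadata/schema file exists,
    // or null so the caller can fall back to the metastore copy instead of
    // failing with "File does not exist".
    static byte[] readSchemaIfPresent(Path tablePath) throws IOException {
        Path schema = tablePath.resolve("Metadata").resolve("schema");
        if (!Files.exists(schema)) {
            return null; // schema lives in the Hive metastore, not on disk
        }
        return Files.readAllBytes(schema);
    }
}
```

This matches the symptom: Spark (which reads the metastore) can query the table, while the Hive input format, which only knows the file path, cannot.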
[jira] [Assigned] (CARBONDATA-1343) Hive can't query data when the carbon table info is stored in the hive metastore
[ https://issues.apache.org/jira/browse/CARBONDATA-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cen yuhai reassigned CARBONDATA-1343: - Assignee: cen yuhai > Hive can't query data when the carbon table info is stored in the hive metastore > --- > > Key: CARBONDATA-1343 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1343 > Project: CarbonData > Issue Type: Bug >Reporter: cen yuhai >Assignee: cen yuhai > > set spark.carbon.hive.schema.store=true in spark-defaults.conf > spark-shell --jars > carbonlib/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.2.jar,carbonlib/carbondata-hive-1.2.0-SNAPSHOT.jar > import org.apache.spark.sql.SparkSession > import org.apache.spark.sql.CarbonSession._ > val rootPath = "hdfs://mycluster/user/master/carbon" > val storeLocation = s"$rootPath/store" > val warehouse = s"$rootPath/warehouse" > val metastoredb = s"$rootPath/metastore_db" > val carbon > =SparkSession.builder().enableHiveSupport().getOrCreateCarbonSession(storeLocation, > metastoredb) > carbon.sql("create table temp.hive_carbon(id short, name string, scale > decimal, country string, salary double) STORED BY 'carbondata'") > carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/master/sample.csv' > INTO TABLE temp.hive_carbon") > start hive cli > set hive.mapred.supports.subdirectories=true; > set mapreduce.input.fileinputformat.input.dir.recursive=true; > select * from temp.hive_carbon; > {code} > 17/07/30 19:33:07 ERROR [CliDriver(1097) -- > 53ea0b98-bcf0-4b86-a167-58ce570df284 main]: Failed with exception > java.io.IOException:java.io.IOException: File does not exist: > hdfs://bipcluster/user/master/carbon/store/temp/hive_carbon/Metadata/schema > java.io.IOException: java.io.IOException: File does not exist: > hdfs://bipcluster/user/master/carbon/store/temp/yuhai_carbon/Metadata/schema > at > org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:521) > at > 
org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:428) > at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146) > at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:2187) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:252) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399) > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > Caused by: java.io.IOException: File does not exist: > hdfs://bipcluster/user/master/carbon/store/temp/hive_carbon/Metadata/schema > at > org.apache.carbondata.hadoop.util.SchemaReader.readCarbonTableFromStore(SchemaReader.java:70) > at > org.apache.carbondata.hadoop.CarbonInputFormat.populateCarbonTable(CarbonInputFormat.java:147) > at > org.apache.carbondata.hadoop.CarbonInputFormat.getCarbonTable(CarbonInputFormat.java:124) > at > org.apache.carbondata.hadoop.CarbonInputFormat.getAbsoluteTableIdentifier(CarbonInputFormat.java:221) > at > org.apache.carbondata.hadoop.CarbonInputFormat.getSplits(CarbonInputFormat.java:234) > at > org.apache.carbondata.hive.MapredCarbonInputFormat.getSplits(MapredCarbonInputFormat.java:51) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:372) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:304) > at > 
org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:459) > ... 15 more > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
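The stack traces above all fail on the same path. As a sketch of the store layout they imply (the `schema_path` helper is hypothetical, written only for illustration; the concrete store location, database, and table names are taken verbatim from the report):

```python
# Sketch of the CarbonData store layout implied by the stack traces: both the
# Hive and Spark read paths look for a serialized schema file at
# <storeLocation>/<database>/<table>/Metadata/schema. The schema_path helper
# is hypothetical (illustration only); the paths below come from the report.
def schema_path(store_location: str, db: str, table: str) -> str:
    return f"{store_location}/{db}/{table}/Metadata/schema"

print(schema_path("hdfs://mycluster/user/master/carbon/store",
                  "temp", "hive_carbon"))
# hdfs://mycluster/user/master/carbon/store/temp/hive_carbon/Metadata/schema
```

When this file is absent (as in the `File does not exist` errors above), every reader that resolves the schema through this path fails, regardless of whether the query comes from Hive or Spark.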
[jira] [Updated] (CARBONDATA-1338) Spark can not query data when 'spark.carbon.hive.schema.store' is true
[ https://issues.apache.org/jira/browse/CARBONDATA-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cen yuhai updated CARBONDATA-1338: -- Summary: Spark can not query data when 'spark.carbon.hive.schema.store' is true (was: Can not query data when 'spark.carbon.hive.schema.store' is true) > Spark can not query data when 'spark.carbon.hive.schema.store' is true > -- > > Key: CARBONDATA-1338 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1338 > Project: CarbonData > Issue Type: Bug >Reporter: cen yuhai >Assignee: cen yuhai > Fix For: 1.2.0 > > Time Spent: 4h > Remaining Estimate: 0h > > My step is as blow: > {code} > set spark.carbon.hive.schema.store=true in spark-defaults.conf > spark-shell --jars > carbonlib/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.2.jar,carbonlib/carbondata-hive-1.2.0-SNAPSHOT.jar > import org.apache.spark.sql.SparkSession > import org.apache.spark.sql.CarbonSession._ > val rootPath = "hdfs://mycluster/user/master/carbon" > val storeLocation = s"$rootPath/store" > val warehouse = s"$rootPath/warehouse" > val metastoredb = s"$rootPath/metastore_db" > val carbon > =SparkSession.builder().enableHiveSupport().getOrCreateCarbonSession(storeLocation, > metastoredb) > carbon.sql("create table temp.yuhai_carbon(id short, name string, scale > decimal, country string, salary double) STORED BY 'carbondata'") > carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/master/sample.csv > INTO TABLE temp.yuhai_carbon") > carbon.sql("select * from temp.yuhai_carbon").show > {code} > Exception: > {code} > Caused by: java.io.IOException: File does not exist: > hdfs://mycluster/user/master/carbon/store/temp/yuhai_carbon/Metadata/schema > at > org.apache.carbondata.hadoop.util.SchemaReader.readCarbonTableFromStore(SchemaReader.java:70) > > at > org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getOrCreateCarbonTable(CarbonTableInputFormat.java:142) > > at > 
org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getQueryModel(CarbonTableInputFormat.java:441) > > at > org.apache.carbondata.spark.rdd.CarbonScanRDD.internalCompute(CarbonScanRDD.scala:191) > > at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:50) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88) > at org.apache.spark.scheduler.Task.run(Task.scala:104) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
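A minimal sketch of the failure mode suggested by the property name and the stack trace, assuming that with `spark.carbon.hive.schema.store=true` the table schema is kept in the Hive metastore rather than in the `Metadata/schema` file, so a reader that only checks the file path fails. The `read_schema` function and its arguments are illustrative, not CarbonData's code:

```python
# Illustrative failure-mode sketch (NOT CarbonData's code). Assumption: with
# spark.carbon.hive.schema.store=true the table schema lives in the Hive
# metastore instead of the Metadata/schema file. A reader that only checks
# the file store raises the "File does not exist" error seen in the log;
# a tolerant reader would fall back to the metastore.
def read_schema(file_store: dict, metastore: dict, table: str) -> str:
    if table in file_store:   # schema serialized under .../Metadata/schema
        return file_store[table]
    if table in metastore:    # schema kept with the Hive table definition
        return metastore[table]
    raise IOError(f"File does not exist: .../{table}/Metadata/schema")

# The reported bug corresponds to a reader that skips the metastore fallback.
print(read_schema({}, {"temp.yuhai_carbon": "id SHORT, name STRING"},
                  "temp.yuhai_carbon"))
# id SHORT, name STRING
```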
[jira] [Resolved] (CARBONDATA-1338) Spark can not query data when 'spark.carbon.hive.schema.store' is true
[ https://issues.apache.org/jira/browse/CARBONDATA-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cen yuhai resolved CARBONDATA-1338. --- Resolution: Fixed Fix Version/s: 1.2.0 > Spark can not query data when 'spark.carbon.hive.schema.store' is true > -- > > Key: CARBONDATA-1338 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1338 > Project: CarbonData > Issue Type: Bug >Reporter: cen yuhai >Assignee: cen yuhai > Fix For: 1.2.0 > > Time Spent: 4h > Remaining Estimate: 0h > > My step is as blow: > {code} > set spark.carbon.hive.schema.store=true in spark-defaults.conf > spark-shell --jars > carbonlib/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.2.jar,carbonlib/carbondata-hive-1.2.0-SNAPSHOT.jar > import org.apache.spark.sql.SparkSession > import org.apache.spark.sql.CarbonSession._ > val rootPath = "hdfs://mycluster/user/master/carbon" > val storeLocation = s"$rootPath/store" > val warehouse = s"$rootPath/warehouse" > val metastoredb = s"$rootPath/metastore_db" > val carbon > =SparkSession.builder().enableHiveSupport().getOrCreateCarbonSession(storeLocation, > metastoredb) > carbon.sql("create table temp.yuhai_carbon(id short, name string, scale > decimal, country string, salary double) STORED BY 'carbondata'") > carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/master/sample.csv > INTO TABLE temp.yuhai_carbon") > carbon.sql("select * from temp.yuhai_carbon").show > {code} > Exception: > {code} > Caused by: java.io.IOException: File does not exist: > hdfs://mycluster/user/master/carbon/store/temp/yuhai_carbon/Metadata/schema > at > org.apache.carbondata.hadoop.util.SchemaReader.readCarbonTableFromStore(SchemaReader.java:70) > > at > org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getOrCreateCarbonTable(CarbonTableInputFormat.java:142) > > at > org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getQueryModel(CarbonTableInputFormat.java:441) > > at > 
org.apache.carbondata.spark.rdd.CarbonScanRDD.internalCompute(CarbonScanRDD.scala:191) > > at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:50) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88) > at org.apache.spark.scheduler.Task.run(Task.scala:104) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (CARBONDATA-1338) Can not query data when 'spark.carbon.hive.schema.store' is true
[ https://issues.apache.org/jira/browse/CARBONDATA-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cen yuhai updated CARBONDATA-1338: -- Description: My step is as blow: {code} set spark.carbon.hive.schema.store=true in spark-defaults.conf spark-shell --jars carbonlib/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.2.jar,carbonlib/carbondata-hive-1.2.0-SNAPSHOT.jar import org.apache.spark.sql.SparkSession import org.apache.spark.sql.CarbonSession._ val rootPath = "hdfs://mycluster/user/master/carbon" val storeLocation = s"$rootPath/store" val warehouse = s"$rootPath/warehouse" val metastoredb = s"$rootPath/metastore_db" val carbon =SparkSession.builder().enableHiveSupport().getOrCreateCarbonSession(storeLocation, metastoredb) carbon.sql("create table temp.yuhai_carbon(id short, name string, scale decimal, country string, salary double) STORED BY 'carbondata'") carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/master/sample.csv INTO TABLE temp.yuhai_carbon") carbon.sql("select * from temp.yuhai_carbon").show {code} Exception: {code} Caused by: java.io.IOException: File does not exist: hdfs://mycluster/user/master/carbon/store/temp/yuhai_carbon/Metadata/schema at org.apache.carbondata.hadoop.util.SchemaReader.readCarbonTableFromStore(SchemaReader.java:70) at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getOrCreateCarbonTable(CarbonTableInputFormat.java:142) at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getQueryModel(CarbonTableInputFormat.java:441) at org.apache.carbondata.spark.rdd.CarbonScanRDD.internalCompute(CarbonScanRDD.scala:191) at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:50) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) 
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88) at org.apache.spark.scheduler.Task.run(Task.scala:104) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} was: My step is as blow: {code} set spark.carbon.hive.schema.store=true in carbon.properties spark-shell --jars carbonlib/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.2.jar,carbonlib/carbondata-hive-1.2.0-SNAPSHOT.jar --files carbon.properties import org.apache.spark.sql.SparkSession import org.apache.spark.sql.CarbonSession._ val rootPath = "hdfs://mycluster/user/master/carbon" val storeLocation = s"$rootPath/store" val warehouse = s"$rootPath/warehouse" val metastoredb = s"$rootPath/metastore_db" val carbon =SparkSession.builder().enableHiveSupport().getOrCreateCarbonSession(storeLocation, metastoredb) carbon.sql("create table temp.yuhai_carbon(id short, name string, scale decimal, country string, salary double) STORED BY 'carbondata'") carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/master/sample.csv INTO TABLE temp.yuhai_carbon") carbon.sql("select * from temp.yuhai_carbon").show {code} Exception: {code} Caused by: java.io.IOException: File does not exist: hdfs://mycluster/user/master/carbon/store/temp/yuhai_carbon/Metadata/schema at org.apache.carbondata.hadoop.util.SchemaReader.readCarbonTableFromStore(SchemaReader.java:70) at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getOrCreateCarbonTable(CarbonTableInputFormat.java:142) at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getQueryModel(CarbonTableInputFormat.java:441) at 
org.apache.carbondata.spark.rdd.CarbonScanRDD.internalCompute(CarbonScanRDD.scala:191) at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:50) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88) at org.apache.spark.scheduler.Task.run(Task.scala:104)
[jira] [Commented] (CARBONDATA-1338) Can not query data when 'spark.carbon.hive.schema.store' is true
[ https://issues.apache.org/jira/browse/CARBONDATA-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16106044#comment-16106044 ] cen yuhai commented on CARBONDATA-1338: --- I am working on it > Can not query data when 'spark.carbon.hive.schema.store' is true > > > Key: CARBONDATA-1338 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1338 > Project: CarbonData > Issue Type: Bug >Reporter: cen yuhai >Assignee: cen yuhai > > My step is as blow: > {code} > set spark.carbon.hive.schema.store=true in carbon.properties > spark-shell --jars > carbonlib/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.2.jar,carbonlib/carbondata-hive-1.2.0-SNAPSHOT.jar > --files carbon.properties > import org.apache.spark.sql.SparkSession > import org.apache.spark.sql.CarbonSession._ > val rootPath = "hdfs://mycluster/user/master/carbon" > val storeLocation = s"$rootPath/store" > val warehouse = s"$rootPath/warehouse" > val metastoredb = s"$rootPath/metastore_db" > val carbon > =SparkSession.builder().enableHiveSupport().getOrCreateCarbonSession(storeLocation, > metastoredb) > carbon.sql("create table temp.yuhai_carbon(id short, name string, scale > decimal, country string, salary double) STORED BY 'carbondata'") > carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/master/sample.csv > INTO TABLE temp.yuhai_carbon") > carbon.sql("select * from temp.yuhai_carbon").show > {code} > Exception: > {code} > Caused by: java.io.IOException: File does not exist: > hdfs://mycluster/user/master/carbon/store/temp/yuhai_carbon/Metadata/schema > at > org.apache.carbondata.hadoop.util.SchemaReader.readCarbonTableFromStore(SchemaReader.java:70) > > at > org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getOrCreateCarbonTable(CarbonTableInputFormat.java:142) > > at > org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getQueryModel(CarbonTableInputFormat.java:441) > > at > org.apache.carbondata.spark.rdd.CarbonScanRDD.internalCompute(CarbonScanRDD.scala:191) > > at 
org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:50) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88) > at org.apache.spark.scheduler.Task.run(Task.scala:104) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (CARBONDATA-1338) Can not query data when 'spark.carbon.hive.schema.store' is true
[ https://issues.apache.org/jira/browse/CARBONDATA-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cen yuhai reassigned CARBONDATA-1338: - Assignee: cen yuhai > Can not query data when 'spark.carbon.hive.schema.store' is true > > > Key: CARBONDATA-1338 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1338 > Project: CarbonData > Issue Type: Bug >Reporter: cen yuhai >Assignee: cen yuhai > > My step is as blow: > {code} > set spark.carbon.hive.schema.store=true in carbon.properties > spark-shell --jars > carbonlib/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.2.jar,carbonlib/carbondata-hive-1.2.0-SNAPSHOT.jar > --files carbon.properties > import org.apache.spark.sql.SparkSession > import org.apache.spark.sql.CarbonSession._ > val rootPath = "hdfs://mycluster/user/master/carbon" > val storeLocation = s"$rootPath/store" > val warehouse = s"$rootPath/warehouse" > val metastoredb = s"$rootPath/metastore_db" > val carbon > =SparkSession.builder().enableHiveSupport().getOrCreateCarbonSession(storeLocation, > metastoredb) > carbon.sql("create table temp.yuhai_carbon(id short, name string, scale > decimal, country string, salary double) STORED BY 'carbondata'") > carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/master/sample.csv > INTO TABLE temp.yuhai_carbon") > carbon.sql("select * from temp.yuhai_carbon").show > {code} > Exception: > {code} > Caused by: java.io.IOException: File does not exist: > hdfs://mycluster/user/master/carbon/store/temp/yuhai_carbon/Metadata/schema > at > org.apache.carbondata.hadoop.util.SchemaReader.readCarbonTableFromStore(SchemaReader.java:70) > > at > org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getOrCreateCarbonTable(CarbonTableInputFormat.java:142) > > at > org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getQueryModel(CarbonTableInputFormat.java:441) > > at > org.apache.carbondata.spark.rdd.CarbonScanRDD.internalCompute(CarbonScanRDD.scala:191) > > at 
org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:50) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88) > at org.apache.spark.scheduler.Task.run(Task.scala:104) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (CARBONDATA-1338) Can not query data when 'spark.carbon.hive.schema.store' is true
cen yuhai created CARBONDATA-1338: - Summary: Can not query data when 'spark.carbon.hive.schema.store' is true Key: CARBONDATA-1338 URL: https://issues.apache.org/jira/browse/CARBONDATA-1338 Project: CarbonData Issue Type: Bug Reporter: cen yuhai -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (CARBONDATA-1338) Can not query data when 'spark.carbon.hive.schema.store' is true
[ https://issues.apache.org/jira/browse/CARBONDATA-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cen yuhai updated CARBONDATA-1338: -- Docs Text: (was: My step is as blow: {code} set spark.carbon.hive.schema.store=true in carbon.properties spark-shell --jars carbonlib/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.2.jar,carbonlib/carbondata-hive-1.2.0-SNAPSHOT.jar --files carbon.properties import org.apache.spark.sql.SparkSession import org.apache.spark.sql.CarbonSession._ val rootPath = "hdfs://mycluster/user/master/carbon" val storeLocation = s"$rootPath/store" val warehouse = s"$rootPath/warehouse" val metastoredb = s"$rootPath/metastore_db" val carbon =SparkSession.builder().enableHiveSupport().getOrCreateCarbonSession(storeLocation, metastoredb) carbon.sql("create table temp.yuhai_carbon(id short, name string, scale decimal, country string, salary double) STORED BY 'carbondata'") carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/master/sample.csv' INTO TABLE temp.yuhai_carbon") carbon.sql("select * from temp.yuhai_carbon").show {code} Exception: {code} Caused by: java.io.IOException: File does not exist: hdfs://mycluster/user/master/carbon/store/temp/yuhai_carbon/Metadata/schema at org.apache.carbondata.hadoop.util.SchemaReader.readCarbonTableFromStore(SchemaReader.java:70) at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getOrCreateCarbonTable(CarbonTableInputFormat.java:142) at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getQueryModel(CarbonTableInputFormat.java:441) at org.apache.carbondata.spark.rdd.CarbonScanRDD.internalCompute(CarbonScanRDD.scala:191) at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:50) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:295) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88) at org.apache.spark.scheduler.Task.run(Task.scala:104) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code}) > Can not query data when 'spark.carbon.hive.schema.store' is true > > > Key: CARBONDATA-1338 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1338 > Project: CarbonData > Issue Type: Bug >Reporter: cen yuhai > > My step is as blow: > {code} > set spark.carbon.hive.schema.store=true in carbon.properties > spark-shell --jars > carbonlib/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.2.jar,carbonlib/carbondata-hive-1.2.0-SNAPSHOT.jar > --files carbon.properties > import org.apache.spark.sql.SparkSession > import org.apache.spark.sql.CarbonSession._ > val rootPath = "hdfs://mycluster/user/master/carbon" > val storeLocation = s"$rootPath/store" > val warehouse = s"$rootPath/warehouse" > val metastoredb = s"$rootPath/metastore_db" > val carbon > =SparkSession.builder().enableHiveSupport().getOrCreateCarbonSession(storeLocation, > metastoredb) > carbon.sql("create table temp.yuhai_carbon(id short, name string, scale > decimal, country string, salary double) STORED BY 'carbondata'") > carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/master/sample.csv > INTO TABLE temp.yuhai_carbon") > carbon.sql("select * from temp.yuhai_carbon").show > {code} > Exception: > {code} > Caused by: java.io.IOException: File does not exist: > hdfs://mycluster/user/master/carbon/store/temp/yuhai_carbon/Metadata/schema > at > 
org.apache.carbondata.hadoop.util.SchemaReader.readCarbonTableFromStore(SchemaReader.java:70) > > at > org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getOrCreateCarbonTable(CarbonTableInputFormat.java:142) > > at > org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getQueryModel(CarbonTableInputFormat.java:441) > > at > org.apache.carbondata.spark.rdd.CarbonScanRDD.internalCompute(CarbonScanRDD.scala:191) > > at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:50) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at
[jira] [Updated] (CARBONDATA-1338) Can not query data when 'spark.carbon.hive.schema.store' is true
[ https://issues.apache.org/jira/browse/CARBONDATA-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cen yuhai updated CARBONDATA-1338: -- Description: My step is as blow: {code} set spark.carbon.hive.schema.store=true in carbon.properties spark-shell --jars carbonlib/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.2.jar,carbonlib/carbondata-hive-1.2.0-SNAPSHOT.jar --files carbon.properties import org.apache.spark.sql.SparkSession import org.apache.spark.sql.CarbonSession._ val rootPath = "hdfs://mycluster/user/master/carbon" val storeLocation = s"$rootPath/store" val warehouse = s"$rootPath/warehouse" val metastoredb = s"$rootPath/metastore_db" val carbon =SparkSession.builder().enableHiveSupport().getOrCreateCarbonSession(storeLocation, metastoredb) carbon.sql("create table temp.yuhai_carbon(id short, name string, scale decimal, country string, salary double) STORED BY 'carbondata'") carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/master/sample.csv INTO TABLE temp.yuhai_carbon") carbon.sql("select * from temp.yuhai_carbon").show {code} Exception: {code} Caused by: java.io.IOException: File does not exist: hdfs://mycluster/user/master/carbon/store/temp/yuhai_carbon/Metadata/schema at org.apache.carbondata.hadoop.util.SchemaReader.readCarbonTableFromStore(SchemaReader.java:70) at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getOrCreateCarbonTable(CarbonTableInputFormat.java:142) at org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getQueryModel(CarbonTableInputFormat.java:441) at org.apache.carbondata.spark.rdd.CarbonScanRDD.internalCompute(CarbonScanRDD.scala:191) at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:50) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:295) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88) at org.apache.spark.scheduler.Task.run(Task.scala:104) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} > Can not query data when 'spark.carbon.hive.schema.store' is true > > > Key: CARBONDATA-1338 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1338 > Project: CarbonData > Issue Type: Bug >Reporter: cen yuhai > > My step is as blow: > {code} > set spark.carbon.hive.schema.store=true in carbon.properties > spark-shell --jars > carbonlib/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.2.jar,carbonlib/carbondata-hive-1.2.0-SNAPSHOT.jar > --files carbon.properties > import org.apache.spark.sql.SparkSession > import org.apache.spark.sql.CarbonSession._ > val rootPath = "hdfs://mycluster/user/master/carbon" > val storeLocation = s"$rootPath/store" > val warehouse = s"$rootPath/warehouse" > val metastoredb = s"$rootPath/metastore_db" > val carbon > =SparkSession.builder().enableHiveSupport().getOrCreateCarbonSession(storeLocation, > metastoredb) > carbon.sql("create table temp.yuhai_carbon(id short, name string, scale > decimal, country string, salary double) STORED BY 'carbondata'") > carbon.sql("LOAD DATA INPATH 'hdfs://mycluster/user/master/sample.csv > INTO TABLE temp.yuhai_carbon") > carbon.sql("select * from temp.yuhai_carbon").show > {code} > Exception: > {code} > Caused by: java.io.IOException: File does not exist: > hdfs://mycluster/user/master/carbon/store/temp/yuhai_carbon/Metadata/schema > at > 
org.apache.carbondata.hadoop.util.SchemaReader.readCarbonTableFromStore(SchemaReader.java:70) > > at > org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getOrCreateCarbonTable(CarbonTableInputFormat.java:142) > > at > org.apache.carbondata.hadoop.api.CarbonTableInputFormat.getQueryModel(CarbonTableInputFormat.java:441) > > at > org.apache.carbondata.spark.rdd.CarbonScanRDD.internalCompute(CarbonScanRDD.scala:191) > > at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:50) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) > at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at
[jira] [Closed] (CARBONDATA-1031) spark-sql can't read the carbon table
[ https://issues.apache.org/jira/browse/CARBONDATA-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cen yuhai closed CARBONDATA-1031. - Resolution: Cannot Reproduce > spark-sql can't read the carbon table > - > > Key: CARBONDATA-1031 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1031 > Project: CarbonData > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: cen yuhai >Assignee: anubhav tarar > > I create a carbon table by spark-shell > And then I use this command "spark-sql --jars carbon*.jar" to start > spark-sql cli. > When the first time I execute this "select * from temp.test-schema", spark > will throw exception. After I execute another command, It will be ok. > {code} > 17/05/06 21:43:12 ERROR > [org.apache.spark.sql.hive.thriftserver.SparkSQLDriver(91) -- main]: Failed > in [select * from temp.test_schema] > java.lang.AssertionError: assertion failed: No plan for > Relation[id#10,name#11,scale#12,country#13,salary#14] > CarbonDatasourceHadoopRelation(org.apache.spark.sql.SparkSession@42d9ea3b,[Ljava.lang.String;@70a0e9c6,Map(path > -> hdfs:user/hadoop/carbon/store/temp/test_schema, serialization.format > -> 1, dbname -> temp, tablepath -> > hdfs:user/hadoop/carbon/store/temp/test_schema, tablename -> > test_schema),None,ArrayBuffer()) > at scala.Predef$.assert(Predef.scala:170) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:92) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:77) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:74) > at > scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157) > at > scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157) > at scala.collection.Iterator$class.foreach(Iterator.scala:893) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) > at > 
scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157) > at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1336) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:74) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:66) > at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434) > at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440) > at > org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:92) > at > org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:84) > at > org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:80) > at > org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:89) > at > org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:89) > at > org.apache.spark.sql.execution.QueryExecution.hiveResultString(QueryExecution.scala:119) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:67) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:335) > at > org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:247) > at > org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:742) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:186) > at 
org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:211) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (CARBONDATA-1153) Can not add column
[ https://issues.apache.org/jira/browse/CARBONDATA-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079624#comment-16079624 ] cen yuhai commented on CARBONDATA-1153: --- I found the root cause: I don't have a carbon.properties file. {code} 17/07/09 19:08:13 ERROR HdfsFileLock: Executor task launch worker for task 7 Incomplete HDFS URI, no host: hdfs://mycluster../carbon.store/temp/yuhai_carbon/d91e35aa-5f13-499c-adcb-94fc20dcf8fb.lock java.io.IOException: Incomplete HDFS URI, no host: hdfs://mycluster../carbon.store/temp/yuhai_carbon/d91e35aa-5f13-499c-adcb-94fc20dcf8fb.lock {code} > Can not add column > -- > > Key: CARBONDATA-1153 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1153 > Project: CarbonData > Issue Type: Bug > Components: spark-integration >Affects Versions: 1.2.0 >Reporter: cen yuhai > > Sometimes it throws the exception below. Why can't I add a column? No one > is altering the table... > {code} > scala> carbon.sql("alter table temp.yuhai_carbon add columns(test1 string)") > 17/06/11 22:09:13 AUDIT > [org.apache.spark.sql.execution.command.AlterTableAddColumns(207) -- main]: > [sh-hadoop-datanode-250-104.elenet.me][master][Thread-1]Alter table add > columns request has been received for temp.yuhai_carbon > 17/06/11 22:10:22 ERROR [org.apache.spark.scheduler.TaskSetManager(70) -- > task-result-getter-3]: Task 0 in stage 0.0 failed 4 times; aborting job > 17/06/11 22:10:22 ERROR > [org.apache.spark.sql.execution.command.AlterTableAddColumns(141) -- main]: > main Alter table add columns failed :Job aborted due to stage failure: Task 0 > in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 > (TID 3, sh-hadoop-datanode-368.elenet.me, executor 7): > java.lang.RuntimeException: Dictionary file test1 is locked for updation. 
> Please try after some time > at scala.sys.package$.error(package.scala:27) > at > org.apache.carbondata.spark.util.GlobalDictionaryUtil$.loadDefaultDictionaryValueForNewColumn(GlobalDictionaryUtil.scala:857) > at > org.apache.carbondata.spark.rdd.AlterTableAddColumnRDD$$anon$1.(AlterTableAddColumnRDD.scala:83) > at > org.apache.carbondata.spark.rdd.AlterTableAddColumnRDD.compute(AlterTableAddColumnRDD.scala:68) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88) > at org.apache.spark.scheduler.Task.run(Task.scala:104) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
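The malformed URI in the comment above (hdfs://mycluster../carbon.store/...) suggests that, with no carbon.properties present, the store location falls back to a relative default such as ../carbon.store, which then gets appended to the fs.defaultFS authority. A minimal carbon.properties sketch that would pin the store to an absolute location; the path below is illustrative only, not the reporter's actual layout:

```properties
# Hypothetical fix: give CarbonData an absolute store location so lock files
# resolve to a well-formed HDFS URI. The path here is an example only.
carbon.storelocation=hdfs://mycluster/user/hadoop/carbon.store
```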
[jira] [Updated] (CARBONDATA-1153) Can not add column
[ https://issues.apache.org/jira/browse/CARBONDATA-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cen yuhai updated CARBONDATA-1153: -- Summary: Can not add column (was: Can not add column because it is aborted) > Can not add column > -- > > Key: CARBONDATA-1153 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1153 > Project: CarbonData > Issue Type: Bug > Components: spark-integration >Affects Versions: 1.2.0 >Reporter: cen yuhai > > Sometimes it throws the exception below. Why can't I add a column? No one > is altering the table... > {code} > scala> carbon.sql("alter table temp.yuhai_carbon add columns(test1 string)") > 17/06/11 22:09:13 AUDIT > [org.apache.spark.sql.execution.command.AlterTableAddColumns(207) -- main]: > [sh-hadoop-datanode-250-104.elenet.me][master][Thread-1]Alter table add > columns request has been received for temp.yuhai_carbon > 17/06/11 22:10:22 ERROR [org.apache.spark.scheduler.TaskSetManager(70) -- > task-result-getter-3]: Task 0 in stage 0.0 failed 4 times; aborting job > 17/06/11 22:10:22 ERROR > [org.apache.spark.sql.execution.command.AlterTableAddColumns(141) -- main]: > main Alter table add columns failed :Job aborted due to stage failure: Task 0 > in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 > (TID 3, sh-hadoop-datanode-368.elenet.me, executor 7): > java.lang.RuntimeException: Dictionary file test1 is locked for updation. 
> Please try after some time > at scala.sys.package$.error(package.scala:27) > at > org.apache.carbondata.spark.util.GlobalDictionaryUtil$.loadDefaultDictionaryValueForNewColumn(GlobalDictionaryUtil.scala:857) > at > org.apache.carbondata.spark.rdd.AlterTableAddColumnRDD$$anon$1.(AlterTableAddColumnRDD.scala:83) > at > org.apache.carbondata.spark.rdd.AlterTableAddColumnRDD.compute(AlterTableAddColumnRDD.scala:68) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88) > at org.apache.spark.scheduler.Task.run(Task.scala:104) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (CARBONDATA-1153) Can not add column because it is aborted
[ https://issues.apache.org/jira/browse/CARBONDATA-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cen yuhai updated CARBONDATA-1153: -- Description: Sometimes it throws the exception below. Why can't I add a column? No one is altering the table... {code} scala> carbon.sql("alter table temp.yuhai_carbon add columns(test1 string)") 17/06/11 22:09:13 AUDIT [org.apache.spark.sql.execution.command.AlterTableAddColumns(207) -- main]: [sh-hadoop-datanode-250-104.elenet.me][master][Thread-1]Alter table add columns request has been received for temp.yuhai_carbon 17/06/11 22:10:22 ERROR [org.apache.spark.scheduler.TaskSetManager(70) -- task-result-getter-3]: Task 0 in stage 0.0 failed 4 times; aborting job 17/06/11 22:10:22 ERROR [org.apache.spark.sql.execution.command.AlterTableAddColumns(141) -- main]: main Alter table add columns failed :Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, sh-hadoop-datanode-368.elenet.me, executor 7): java.lang.RuntimeException: Dictionary file test1 is locked for updation. 
Please try after some time at scala.sys.package$.error(package.scala:27) at org.apache.carbondata.spark.util.GlobalDictionaryUtil$.loadDefaultDictionaryValueForNewColumn(GlobalDictionaryUtil.scala:857) at org.apache.carbondata.spark.rdd.AlterTableAddColumnRDD$$anon$1.(AlterTableAddColumnRDD.scala:83) at org.apache.carbondata.spark.rdd.AlterTableAddColumnRDD.compute(AlterTableAddColumnRDD.scala:68) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88) at org.apache.spark.scheduler.Task.run(Task.scala:104) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} was: Why can't I add a column? No one is altering the table... {code} scala> carbon.sql("alter table temp.yuhai_carbon add columns(test1 string)") 17/06/11 22:09:13 AUDIT [org.apache.spark.sql.execution.command.AlterTableAddColumns(207) -- main]: [sh-hadoop-datanode-250-104.elenet.me][master][Thread-1]Alter table add columns request has been received for temp.yuhai_carbon 17/06/11 22:10:22 ERROR [org.apache.spark.scheduler.TaskSetManager(70) -- task-result-getter-3]: Task 0 in stage 0.0 failed 4 times; aborting job 17/06/11 22:10:22 ERROR [org.apache.spark.sql.execution.command.AlterTableAddColumns(141) -- main]: main Alter table add columns failed :Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, sh-hadoop-datanode-368.elenet.me, executor 7): java.lang.RuntimeException: Dictionary file test1 is locked for updation. 
Please try after some time at scala.sys.package$.error(package.scala:27) at org.apache.carbondata.spark.util.GlobalDictionaryUtil$.loadDefaultDictionaryValueForNewColumn(GlobalDictionaryUtil.scala:857) at org.apache.carbondata.spark.rdd.AlterTableAddColumnRDD$$anon$1.(AlterTableAddColumnRDD.scala:83) at org.apache.carbondata.spark.rdd.AlterTableAddColumnRDD.compute(AlterTableAddColumnRDD.scala:68) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88) at org.apache.spark.scheduler.Task.run(Task.scala:104) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} > Can not add column because it is aborted > > > Key: CARBONDATA-1153 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1153 > Project: CarbonData > Issue Type: Bug > Components: spark-integration >Affects Versions: 1.2.0 >Reporter: cen yuhai > > Sometimes it throws the exception below. Why can't I add a column? No one > is altering the table... > {code} > scala> carbon.sql("alter table temp.yuhai_carbon add columns(test1 string)") > 17/06/11 22:09:13 AUDIT > [org.apache.spark.sql.execution.command.AlterTableAddColumns(207) -- main]: > [sh-hadoop-datanode-250-104.elenet.me][master][Thread-1]Alter table add > columns request has been received for temp.yuhai_carbon > 17/06/11 22:10:22 ERROR
[jira] [Created] (CARBONDATA-1153) Can not add column because it is aborted
cen yuhai created CARBONDATA-1153: - Summary: Can not add column because it is aborted Key: CARBONDATA-1153 URL: https://issues.apache.org/jira/browse/CARBONDATA-1153 Project: CarbonData Issue Type: Bug Components: spark-integration Affects Versions: 1.2.0 Reporter: cen yuhai Why can't I add a column? No one is altering the table... {code} scala> carbon.sql("alter table temp.yuhai_carbon add columns(test1 string)") 17/06/11 22:09:13 AUDIT [org.apache.spark.sql.execution.command.AlterTableAddColumns(207) -- main]: [sh-hadoop-datanode-250-104.elenet.me][master][Thread-1]Alter table add columns request has been received for temp.yuhai_carbon 17/06/11 22:10:22 ERROR [org.apache.spark.scheduler.TaskSetManager(70) -- task-result-getter-3]: Task 0 in stage 0.0 failed 4 times; aborting job 17/06/11 22:10:22 ERROR [org.apache.spark.sql.execution.command.AlterTableAddColumns(141) -- main]: main Alter table add columns failed :Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, sh-hadoop-datanode-368.elenet.me, executor 7): java.lang.RuntimeException: Dictionary file test1 is locked for updation. 
Please try after some time at scala.sys.package$.error(package.scala:27) at org.apache.carbondata.spark.util.GlobalDictionaryUtil$.loadDefaultDictionaryValueForNewColumn(GlobalDictionaryUtil.scala:857) at org.apache.carbondata.spark.rdd.AlterTableAddColumnRDD$$anon$1.(AlterTableAddColumnRDD.scala:83) at org.apache.carbondata.spark.rdd.AlterTableAddColumnRDD.compute(AlterTableAddColumnRDD.scala:68) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:331) at org.apache.spark.rdd.RDD.iterator(RDD.scala:295) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:88) at org.apache.spark.scheduler.Task.run(Task.scala:104) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:351) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CARBONDATA-1105) ClassNotFoundException: org.apache.spark.sql.catalyst.CatalystConf
[ https://issues.apache.org/jira/browse/CARBONDATA-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16045876#comment-16045876 ] cen yuhai commented on CARBONDATA-1105: --- I think we should support Spark 2.1.1, right? Spark 2.1.0 is not stable. > ClassNotFoundException: org.apache.spark.sql.catalyst.CatalystConf > -- > > Key: CARBONDATA-1105 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1105 > Project: CarbonData > Issue Type: Bug > Components: core >Affects Versions: 1.2.0 > Environment: spark 2.1.1 >Reporter: cen yuhai > > I think it is related to SPARK-19944 > https://github.com/apache/spark/pull/17301 > {code} > scala> carbon.sql("create table temp.test_carbon(id int, name string, scale > decimal, country string, salary double) STORED BY 'carbondata'") > java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/CatalystConf > at > org.apache.spark.sql.hive.CarbonSessionState.analyzer$lzycompute(CarbonSessionState.scala:127) > at > org.apache.spark.sql.hive.CarbonSessionState.analyzer(CarbonSessionState.scala:126) > at > org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:69) > at > org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:67) > at > org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:50) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:593) > ... 52 elided > Caused by: java.lang.ClassNotFoundException: > org.apache.spark.sql.catalyst.CatalystConf > at java.net.URLClassLoader$1.run(URLClassLoader.java:366) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:354) > at java.lang.ClassLoader.loadClass(ClassLoader.java:425) > at java.lang.ClassLoader.loadClass(ClassLoader.java:358) > ... 
59 more > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (CARBONDATA-1105) ClassNotFoundException: org.apache.spark.sql.catalyst.CatalystConf
[ https://issues.apache.org/jira/browse/CARBONDATA-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] cen yuhai reassigned CARBONDATA-1105: - Assignee: cen yuhai > ClassNotFoundException: org.apache.spark.sql.catalyst.CatalystConf > -- > > Key: CARBONDATA-1105 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1105 > Project: CarbonData > Issue Type: Bug > Components: core >Affects Versions: 1.2.0 > Environment: spark 2.1.1 >Reporter: cen yuhai >Assignee: cen yuhai > > I think it is related to SPARK-19944 > https://github.com/apache/spark/pull/17301 > {code} > scala> carbon.sql("create table temp.test_carbon(id int, name string, scale > decimal, country string, salary double) STORED BY 'carbondata'") > java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/CatalystConf > at > org.apache.spark.sql.hive.CarbonSessionState.analyzer$lzycompute(CarbonSessionState.scala:127) > at > org.apache.spark.sql.hive.CarbonSessionState.analyzer(CarbonSessionState.scala:126) > at > org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:69) > at > org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:67) > at > org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:50) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:593) > ... 52 elided > Caused by: java.lang.ClassNotFoundException: > org.apache.spark.sql.catalyst.CatalystConf > at java.net.URLClassLoader$1.run(URLClassLoader.java:366) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:354) > at java.lang.ClassLoader.loadClass(ClassLoader.java:425) > at java.lang.ClassLoader.loadClass(ClassLoader.java:358) > ... 59 more > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (CARBONDATA-1102) Selecting Int type in hive from carbon table is showing class cast exception
[ https://issues.apache.org/jira/browse/CARBONDATA-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16029567#comment-16029567 ] cen yuhai commented on CARBONDATA-1102: --- I will fix it in CARBON-1008 > Selecting Int type in hive from carbon table is showing class cast exception > > > Key: CARBONDATA-1102 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1102 > Project: CarbonData > Issue Type: Bug > Components: hive-integration >Affects Versions: 1.2.0 > Environment: hive,spark 2.1 >Reporter: anubhav tarar >Assignee: anubhav tarar >Priority: Trivial > > in carbon > 0: jdbc:hive2://localhost:1> CREATE TABLE ALLDATATYPETEST(ID INT,NAME > STRING,SALARY DECIMAL,MARKS DOUBLE,JOININGDATE DATE,LEAVINGDATE TIMESTAMP) > STORED BY 'CARBONDATA' ; > +-+--+ > | Result | > +-+--+ > +-+--+ > No rows selected (3.702 seconds) > 0: jdbc:hive2://localhost:1> LOAD DATA INPATH > 'hdfs://localhost:54310/alldatatypetest.csv' into table alldatatypetest; > +-+--+ > | Result | > +-+--+ > +-+--+ > No rows selected (7.16 seconds) > 0: jdbc:hive2://localhost:1> SELECT * FROM ALLDATATYPETEST; > +-++-++--++--+ > | ID |NAME| SALARY | MARKS | JOININGDATE | LEAVINGDATE > | > +-++-++--++--+ > | 1 | 'ANUBHAV' | 20 | 100.0 | 2016-04-14 | 2016-04-14 15:00:09.0 > | > | 2 | 'LIANG'| 20 | 100.0 | 2016-04-14 | 2016-04-14 15:00:09.0 > | > +-++-++--++--+ > 2 rows selected (1.978 seconds) > in hive > hive> CREATE TABLE ALLDATATYPETEST(ID INT,NAME STRING,SALARY DECIMAL,MARKS > DOUBLE,JOININGDATE DATE,LEAVINGDATE TIMESTAMP) ROW FORMAT SERDE > 'org.apache.carbondata.hive.CarbonHiveSerDe' STORED AS INPUTFORMAT > 'org.apache.carbondata.hive.MapredCarbonInputFormat' OUTPUTFORMAT > 'org.apache.carbondata.hive.MapredCarbonOutputFormat' TBLPROPERTIES > ('spark.sql.sources.provider'='org.apache.spark.sql.CarbonSource'); > OK > Time taken: 1.934 seconds > hive> ALTER TABLE ALLDATATYPETEST SET LOCATION > 'hdfs://localhost:54310/opt/carbonStore/default/alldatatypetest'; > OK > Time taken: 
1.192 seconds > hive> SELECT * FROM ALLDATATYPETEST; > OK > Failed with exception java.io.IOException:java.lang.ClassCastException: > java.lang.Integer cannot be cast to java.lang.Long > Time taken: 0.174 seconds -- This message was sent by Atlassian JIRA (v6.3.15#6346)
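The Hive-side failure above is a boxed-type mismatch: the reader hands back a java.lang.Integer for the INT column while the Hive side casts the value to java.lang.Long, and Java never widens between boxed types, only between primitives. A self-contained sketch of the failure mode and the usual remedy (class and method names here are mine, not from the CarbonData source):

```java
// Minimal reproduction of the boxed-type mismatch behind the error above.
// Java widens int -> long only for primitives; a java.lang.Integer can
// never be cast to java.lang.Long, so the value must go through Number.
public class CastDemo {
    // Returns true when the direct cast fails, mirroring the Hive error.
    static boolean directCastFails(Object columnValue) {
        try {
            Long ignored = (Long) columnValue; // throws ClassCastException
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // The safe conversion: every numeric box implements Number.longValue().
    static long asLong(Object columnValue) {
        return ((Number) columnValue).longValue();
    }

    public static void main(String[] args) {
        Object id = Integer.valueOf(1); // what the reader returns for an INT column
        System.out.println(directCastFails(id) + " " + asLong(id));
    }
}
```

Converting through Number works for any numeric box, which is the usual fix when the reader's runtime type and the serde's expected type disagree.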
[jira] [Commented] (CARBONDATA-1105) ClassNotFoundException: org.apache.spark.sql.catalyst.CatalystConf
[ https://issues.apache.org/jira/browse/CARBONDATA-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16028810#comment-16028810 ] cen yuhai commented on CARBONDATA-1105: --- We should rebuild Carbon against Spark 2.1.1: mvn clean package -Dspark.version=2.1.1 -Pspark-2.1 -DskipTests > ClassNotFoundException: org.apache.spark.sql.catalyst.CatalystConf > -- > > Key: CARBONDATA-1105 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1105 > Project: CarbonData > Issue Type: Bug > Components: core >Affects Versions: 1.2.0 > Environment: spark 2.1.1 >Reporter: cen yuhai > > I think it is related to SPARK-19944 > https://github.com/apache/spark/pull/17301 > {code} > scala> carbon.sql("create table temp.test_carbon(id int, name string, scale > decimal, country string, salary double) STORED BY 'carbondata'") > java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/CatalystConf > at > org.apache.spark.sql.hive.CarbonSessionState.analyzer$lzycompute(CarbonSessionState.scala:127) > at > org.apache.spark.sql.hive.CarbonSessionState.analyzer(CarbonSessionState.scala:126) > at > org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:69) > at > org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:67) > at > org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:50) > at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63) > at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:593) > ... 52 elided > Caused by: java.lang.ClassNotFoundException: > org.apache.spark.sql.catalyst.CatalystConf > at java.net.URLClassLoader$1.run(URLClassLoader.java:366) > at java.net.URLClassLoader$1.run(URLClassLoader.java:355) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:354) > at java.lang.ClassLoader.loadClass(ClassLoader.java:425) > at java.lang.ClassLoader.loadClass(ClassLoader.java:358) > ... 
59 more > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (CARBONDATA-1105) ClassNotFoundException: org.apache.spark.sql.catalyst.CatalystConf
cen yuhai created CARBONDATA-1105: - Summary: ClassNotFoundException: org.apache.spark.sql.catalyst.CatalystConf Key: CARBONDATA-1105 URL: https://issues.apache.org/jira/browse/CARBONDATA-1105 Project: CarbonData Issue Type: Bug Components: core Affects Versions: 1.2.0 Environment: spark 2.1.1 Reporter: cen yuhai I think it is related to SPARK-19944 https://github.com/apache/spark/pull/17301 {code} scala> carbon.sql("create table temp.test_carbon(id int, name string, scale decimal, country string, salary double) STORED BY 'carbondata'") java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/CatalystConf at org.apache.spark.sql.hive.CarbonSessionState.analyzer$lzycompute(CarbonSessionState.scala:127) at org.apache.spark.sql.hive.CarbonSessionState.analyzer(CarbonSessionState.scala:126) at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:69) at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:67) at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:50) at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:63) at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:593) ... 52 elided Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.catalyst.CatalystConf at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) ... 59 more {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)