Hi,
I have compiled the latest version of CarbonData, which is compatible with HDP 2.6. I am following the steps below, but the data never ends up in the table.

Start the Spark shell:

/home/ubuntu/carbondata# spark-shell --jars /home/ubuntu/carbondata/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.2.jar
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0.2.6.0.3-8
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_121)
Type in expressions to have them evaluated.
Type :help for more information.

Create the CarbonSession:

scala> import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.SparkSession

scala> import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.CarbonSession._

scala> val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("/test/carbondata/","/test/carbondata/")
17/07/26 14:58:42 WARN SparkContext: Using an existing SparkContext; some configuration may not take effect.
17/07/26 14:58:42 WARN CarbonProperties: main The enable unsafe sort value "null" is invalid. Using the default value "false"
17/07/26 14:58:42 WARN CarbonProperties: main The custom block distribution value "null" is invalid. Using the default value "false"
17/07/26 14:58:42 WARN CarbonProperties: main The enable vector reader value "null" is invalid. Using the default value "true"
17/07/26 14:58:42 WARN CarbonProperties: main The value "null" configured for key "carbon.lock.type" is invalid. Using the default value "HDFSLOCK"
carbon: org.apache.spark.sql.SparkSession = org.apache.spark.sql.CarbonSession@5f7bd970

Create the table:

scala> carbon.sql("CREATE TABLE IF NOT EXISTS test_carbon(id string, name string, city string,age Int) STORED BY 'carbondata'")
17/07/26 15:04:35 AUDIT CreateTable: [gateway-dc1r04n01][hdfs][Thread-1]Creating Table with Database name [default] and Table name [test_carbon]
17/07/26 15:04:36 WARN HiveExternalCatalog: Couldn't find corresponding Hive SerDe for data source provider org.apache.spark.sql.CarbonSource. Persisting data source table `default`.`test_carbon` into Hive metastore in Spark SQL specific format, which is NOT compatible with Hive.
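The CarbonProperties warnings above suggest no carbon.properties file is being picked up. My understanding is that the values can also be set programmatically before the session is created; a minimal sketch of what I mean, using key constants from org.apache.carbondata.core.constants.CarbonCommonConstants (I am assuming these are present in this snapshot, and I am not sure any of this is related to the actual problem):

import org.apache.carbondata.core.util.CarbonProperties
import org.apache.carbondata.core.constants.CarbonCommonConstants

// Set the values the startup warnings complain about, before building the session
val props = CarbonProperties.getInstance()
props.addProperty(CarbonCommonConstants.ENABLE_UNSAFE_SORT, "false")
props.addProperty(CarbonCommonConstants.LOCK_TYPE, "HDFSLOCK")

// I also wonder whether the store path should be a fully qualified HDFS URI
// rather than "/test/carbondata/" (an assumption on my side, not verified):
val carbon = SparkSession.builder().config(sc.getConf)
  .getOrCreateCarbonSession("hdfs://xxxx/test/carbondata", "hdfs://xxxx/test/carbondata")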
17/07/26 15:04:36 AUDIT CreateTable: [gateway-dc1][hdfs][Thread-1]Table created with Database name [default] and Table name [test_carbon]
res7: org.apache.spark.sql.DataFrame = []

scala> carbon.sql("describe test_carbon").show()
+--------+---------+-------+
|col_name|data_type|comment|
+--------+---------+-------+
|      id|   string|   null|
|    name|   string|   null|
|    city|   string|   null|
|     age|      int|   null|
+--------+---------+-------+

Insert a row:

scala> carbon.sql("INSERT INTO test_carbon VALUES(1,'x1','x2',34)")
17/07/26 15:07:25 AUDIT CarbonDataRDDFactory$: [gateway-dc1][hdfs][Thread-1]Data load request has been received for table default.test_carbon
17/07/26 15:07:25 WARN CarbonDataProcessorUtil: main sort scope is set to LOCAL_SORT
17/07/26 15:07:25 WARN CarbonDataProcessorUtil: Executor task launch worker for task 5 sort scope is set to LOCAL_SORT
17/07/26 15:07:25 WARN CarbonDataProcessorUtil: Executor task launch worker for task 5 batch sort size is set to 0
17/07/26 15:07:25 WARN CarbonDataProcessorUtil: Executor task launch worker for task 5 sort scope is set to LOCAL_SORT
17/07/26 15:07:25 WARN CarbonDataProcessorUtil: Executor task launch worker for task 5 sort scope is set to LOCAL_SORT
17/07/26 15:07:25 AUDIT CarbonDataRDDFactory$: [gateway-dc1r04n01][hdfs][Thread-1]Data load is successful for default.test_carbon
res11: org.apache.spark.sql.DataFrame = []

Load a CSV file:

scala> carbon.sql("LOAD DATA INPATH 'hdfs://xxxx/test/carbondata/sample.csv' INTO TABLE test_carbon")
17/07/26 14:59:28 AUDIT CarbonDataRDDFactory$: [gateway-dc1][hdfs][Thread-1]Data load request has been received for table default.test_table
17/07/26 14:59:28 WARN CarbonDataProcessorUtil: main sort scope is set to LOCAL_SORT
17/07/26 14:59:28 WARN CarbonDataProcessorUtil: [Executor task launch worker for task 0][partitionID:default_test_table_8662d5ff-9392-4e23-b37e-9a4485f71f0e] sort scope is set to LOCAL_SORT
17/07/26 14:59:28 WARN CarbonDataProcessorUtil: [Executor task launch worker for task 0][partitionID:default_test_table_8662d5ff-9392-4e23-b37e-9a4485f71f0e] batch sort size is set to 0
17/07/26 14:59:28 WARN CarbonDataProcessorUtil: [Executor task launch worker for task 0][partitionID:default_test_table_8662d5ff-9392-4e23-b37e-9a4485f71f0e] sort scope is set to LOCAL_SORT
17/07/26 14:59:28 WARN CarbonDataProcessorUtil: [Executor task launch worker for task 0][partitionID:default_test_table_8662d5ff-9392-4e23-b37e-9a4485f71f0e] sort scope is set to LOCAL_SORT
17/07/26 14:59:29 AUDIT CarbonDataRDDFactory$: [gateway-dc1][hdfs][Thread-1]Data load is successful for default.test_table
res1: org.apache.spark.sql.DataFrame = []
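Both statements report "Data load is successful", so I would expect the loads to be registered as segments. A minimal sanity check I plan to run, assuming this snapshot supports the SHOW SEGMENTS DDL from the data-management docs:

// Every successful load should be listed here with its status and load time
scala> carbon.sql("SHOW SEGMENTS FOR TABLE default.test_carbon").show()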
Querying the table then fails:

scala> carbon.sql("Select * from test_carbon").show()
java.io.FileNotFoundException: File /test/carbondata/default/test_table/Fact/Part0/Segment_0 does not exist.
  at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1081)
  at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.<init>(DistributedFileSystem.java:1059)
  at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1004)
  at org.apache.hadoop.hdfs.DistributedFileSystem$23.doCall(DistributedFileSystem.java:1000)
  at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
  at org.apache.hadoop.hdfs.DistributedFileSystem.listLocatedStatus(DistributedFileSystem.java:1000)
  at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1735)
  at org.apache.carbondata.hadoop.CarbonInputFormat.getFileStatusInternal(CarbonInputFormat.java:862)
  at org.apache.carbondata.hadoop.CarbonInputFormat.getFileStatus(CarbonInputFormat.java:845)
  at org.apache.carbondata.hadoop.CarbonInputFormat.listStatus(CarbonInputFormat.java:802)
  at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
  at org.apache.carbondata.hadoop.CarbonInputFormat.getSplitsInternal(CarbonInputFormat.java:319)
  at org.apache.carbondata.hadoop.CarbonInputFormat.getTableBlockInfo(CarbonInputFormat.java:523)
  at org.apache.carbondata.hadoop.CarbonInputFormat.getSegmentAbstractIndexs(CarbonInputFormat.java:616)
  at org.apache.carbondata.hadoop.CarbonInputFormat.getDataBlocksOfSegment(CarbonInputFormat.java:441)
  at org.apache.carbondata.hadoop.CarbonInputFormat.getSplits(CarbonInputFormat.java:379)
  at org.apache.carbondata.hadoop.CarbonInputFormat.getSplits(CarbonInputFormat.java:302)
  at org.apache.carbondata.spark.rdd.CarbonScanRDD.getPartitions(CarbonScanRDD.scala:81)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
  at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
  at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:311)
  at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
  at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2378)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
  at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2780)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2377)
  at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2384)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2120)
  at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2119)
  at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2810)
  at org.apache.spark.sql.Dataset.head(Dataset.scala:2119)
  at org.apache.spark.sql.Dataset.take(Dataset.scala:2334)
  at org.apache.spark.sql.Dataset.showString(Dataset.scala:248)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:638)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:597)
  at org.apache.spark.sql.Dataset.show(Dataset.scala:606)
  ... 50 elided

I have checked the folder on HDFS: the structure /test/carbondata/default/test_carbon/ exists, but the folder is empty.
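For completeness, the same check can be scripted from the Spark shell with the plain Hadoop FileSystem API (a minimal sketch, nothing CarbonData-specific, with the path hard-coded for this table):

import org.apache.hadoop.fs.{FileSystem, Path}

// Recursively list whatever actually landed under the table directory,
// to compare with the Segment_0 path the exception complains about
val fs = FileSystem.get(sc.hadoopConfiguration)
val files = fs.listFiles(new Path("/test/carbondata/default/test_carbon"), true)
while (files.hasNext) println(files.next().getPath)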
I’m pretty sure I’m missing something silly, but I have not been able to find a way to get data into the table.

On another subject, I’m also trying to access this store through Presto, but there the error is always:

Query 20170726_145207_00005_ytsnk failed: line 1:1: Schema 'default' does not exist

I’m also a little bit lost here: from Spark it seems that the tables are created in the Hive metastore, but the Presto plugin doesn’t seem to refer to it.
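My understanding from the integration/presto README (an assumption on my part, not verified) is that the connector lists schemas directly from the carbon store directory rather than from the Hive metastore, so an empty or wrong store path might explain the missing schema. For reference, my etc/catalog/carbondata.properties looks like this; the exact property key may differ in this snapshot:

connector.name=carbondata
carbondata-store=hdfs://xxxx/test/carbondata

Thanks for reading!
AG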