discussion about the benchmark standard that carbondata uses
Hi all,

A benchmark test can measure the performance of a system. Since carbondata is a data store, it may be better to benchmark it against a universal standard such as TPC-DS. So, which benchmark standard does carbondata use?
Re: Problem while copying file from local store to carbon store
thx, I've solved the problem. Here is my record:

First, I found that the Spark job failed when loading data, with the error "CarbonDataWriterException: Problem while copying file from local store to carbon store". Locating the source code at ./processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter, it shows:

  private void copyCarbonDataFileToCarbonStorePath(String localFileName)
      throws CarbonDataWriterException {
    long copyStartTime = System.currentTimeMillis();
    LOGGER.info("Copying " + localFileName + " --> "
        + dataWriterVo.getCarbonDataDirectoryPath());
    try {
      CarbonFile localCarbonFile =
          FileFactory.getCarbonFile(localFileName, FileFactory.getFileType(localFileName));
      String carbonFilePath = dataWriterVo.getCarbonDataDirectoryPath()
          + localFileName.substring(localFileName.lastIndexOf(File.separator));
      copyLocalFileToCarbonStore(carbonFilePath, localFileName,
          CarbonCommonConstants.BYTEBUFFER_SIZE,
          getMaxOfBlockAndFileSize(fileSizeInBytes, localCarbonFile.getSize()));
    } catch (IOException e) {
      throw new CarbonDataWriterException(
          "Problem while copying file from local store to carbon store");
    }
    LOGGER.info("Total copy time (ms) to copy file " + localFileName + " is "
        + (System.currentTimeMillis() - copyStartTime));
  }

The main reason is that the method copyLocalFileToCarbonStore throws an IOException, but the catch block doesn't tell me the real reason that caused the error (at this moment, I really like technical logs more than business logs). So I added a few lines of code: ...
  catch (IOException e) {
    LOGGER.info("---logs print by liyinwei start-");
    LOGGER.error(e, "");
    LOGGER.info("---logs print by liyinwei end -");
    throw new CarbonDataWriterException(
        "Problem while copying file from local store to carbon store");

Then I rebuilt the source code and it logged as follows:

INFO  10-01 10:29:59,546 - [test_table: Graph - MDKeyGentest_table][partitionID:0] ---logs print by liyinwei start-
ERROR 10-01 10:29:59,547 - [test_table: Graph - MDKeyGentest_table][partitionID:0]
java.io.FileNotFoundException: /home/hadoop/carbondata/bin/carbonshellstore/default/test_table/Fact/Part0/Segment_0/part-0-0-1484015398000.carbondata (No such file or directory)
    at java.io.FileOutputStream.open0(Native Method)
    ...
INFO  10-01 10:29:59,547 - [test_table: Graph - MDKeyGentest_table][partitionID:0] ---logs print by liyinwei end -
ERROR 10-01 10:29:59,547 - [test_table: Graph - MDKeyGentest_table][partitionID:0] Problem while copying file from local store to carbon store

Second, as you can see, the root cause of the error is a FileNotFoundException, which means the target store path is not found.
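As a side note, the extra log lines would not be needed if the wrapper exception carried the original one as its cause; the stack trace would then show the FileNotFoundException directly. A minimal sketch of that pattern — the class and method names here are illustrative stand-ins, not CarbonData's actual code:

```java
import java.io.IOException;

// Hypothetical stand-in for CarbonDataWriterException, with a cause-taking constructor.
class DataWriterException extends RuntimeException {
    DataWriterException(String message, Throwable cause) {
        super(message, cause);
    }
}

public class CauseChainDemo {
    static void copyToStore() {
        try {
            // stand-in for the copyLocalFileToCarbonStore(...) call that fails
            throw new IOException("part-0-0.carbondata (No such file or directory)");
        } catch (IOException e) {
            // passing 'e' as the cause keeps the original stack trace attached
            throw new DataWriterException(
                "Problem while copying file from local store to carbon store", e);
        }
    }

    public static void main(String[] args) {
        try {
            copyToStore();
        } catch (DataWriterException ex) {
            System.out.println(ex.getMessage());
            System.out.println("caused by: " + ex.getCause());
        }
    }
}
```

With the cause chained, a plain LOGGER.error(e) or printStackTrace() at any level up the call stack reveals the underlying file-not-found problem without patching the source.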
With the help of Liang Chen & Brave heart, I found that the default carbondata storePath is as below if we start the spark-shell using carbon-spark-shell:

scala> print(cc.storePath)
/home/hadoop/carbondata/bin/carbonshellstore

So I added a parameter when starting carbon-spark-shell:

./bin/carbon-spark-shell --conf spark.carbon.storepath=hdfs://master:9000/home/hadoop/carbondata/bin/carbonshellstore

and then printed the storePath:

scala> print(cc.storePath)
hdfs://master:9000/home/hadoop/carbondata/bin/carbonshellstore

Finally, I ran the command

cc.sql(s"load data inpath 'hdfs://master:9000/home/hadoop/sample.csv' into table test_table")

again and it succeeded, followed by:

cc.sql("select * from test_table").show

------------------ Original ------------------
From: "Liang Chen"
Date: Tue, Jan 10, 2017 12:11 PM
To: "dev"
Subject: Re: Problem while copying file from local store to carbon store

Hi

Please use spark-shell to create the carboncontext; you can refer to these articles:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=67635497

Regards
Liang

--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/load-data-error-from-csv-file-at-hdfs-error-in-standalone-spark-cluster-tp5783p5844.html
Sent from the Apache CarbonData Mailing List archive at Nabble.com.
Problem while copying file from local store to carbon store
Hi all,

when I load data from HDFS into a table:

cc.sql(s"load data inpath 'hdfs://master:9000/home/hadoop/sample.csv' into table test_table")

two errors occurred, at slave1:

INFO 09-01 16:17:58,611 - test_table: Graph - CSV Input *Started all csv reading***
INFO 09-01 16:17:58,611 - [pool-20-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-20-thread-1] *started csv reading by thread***
INFO 09-01 16:17:58,635 - [pool-20-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-20-thread-1] Total Number of records processed by this thread is: 3
INFO 09-01 16:17:58,635 - [pool-20-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-20-thread-1] Time taken to processed 3 Number of records: 24
INFO 09-01 16:17:58,636 - [pool-20-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-20-thread-1] *Completed csv reading by thread***
INFO 09-01 16:17:58,636 - test_table: Graph - CSV Input *Completed all csv reading***
INFO 09-01 16:17:58,642 - [test_table: Graph - Carbon Surrogate Key Generator][partitionID:0] Column cache size not configured.
Therefore default behavior will be considered and no LRU based eviction of columns will be done
ERROR 09-01 16:17:58,645 - [test_table: Graph - Carbon Surrogate Key Generator][partitionID:0] org.apache.carbondata.core.util.CarbonUtilException: Either dictionary or its metadata does not exist for column identifier :: ColumnIdentifier [columnId=c70480f9-4336-4186-8bd0-a3bebb50ea6a]
ERROR 09-01 16:17:58,646 - [test_table: Graph - Carbon Surrogate Key Generator][partitionID:0] org.pentaho.di.core.exception.KettleException: org.apache.carbondata.core.util.CarbonUtilException: Either dictionary or its metadata does not exist for column identifier :: ColumnIdentifier [columnId=c70480f9-4336-4186-8bd0-a3bebb50ea6a]
    at org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.FileStoreSurrogateKeyGenForCSV.initDictionaryCacheInfo(FileStoreSurrogateKeyGenForCSV.java:297)
    at org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.FileStoreSurrogateKeyGenForCSV.populateCache(FileStoreSurrogateKeyGenForCSV.java:270)
    at org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.FileStoreSurrogateKeyGenForCSV.<init>(FileStoreSurrogateKeyGenForCSV.java:144)
    at org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.CarbonCSVBasedSeqGenStep.processRow(CarbonCSVBasedSeqGenStep.java:385)
    at org.pentaho.di.trans.step.RunThread.run(RunThread.java:50)
    at java.lang.Thread.run(Thread.java:745)
INFO 09-01 16:17:58,647 - [test_table: Graph - Carbon Slice Mergertest_table][partitionID:table] Record Procerssed For table: test_table
INFO 09-01 16:17:58,647 - [test_table: Graph - Carbon Slice Mergertest_table][partitionID:table] Summary: Carbon Slice Merger Step: Read: 0: Write: 0
INFO 09-01 16:17:58,647 - [test_table: Graph - Sort Key: Sort keystest_table][partitionID:0] Record Processed For table: test_table
INFO 09-01 16:17:58,647 - [test_table: Graph - Sort Key: Sort keystest_table][partitionID:0] Number of Records was Zero
INFO 09-01 16:17:58,647 - [test_table: Graph - Sort Key: Sort keystest_table][partitionID:0] Summary: Carbon Sort Key Step: Read: 0: Write: 0
INFO 09-01 16:17:58,747 - [Executor task launch worker-0][partitionID:default_test_table_632e80a6-77ef-44b2-aed7-2e5bbf56610e] Graph execution is finished.
ERROR 09-01 16:17:58,748 - [Executor task launch worker-0][partitionID:default_test_table_632e80a6-77ef-44b2-aed7-2e5bbf56610e] Graph Execution had errors
INFO 09-01 16:17:58,749 - [Executor task launch worker-0][partitionID:default_test_table_632e80a6-77ef-44b2-aed7-2e5bbf56610e] Deleted the local store location /tmp/259202084415620/0
INFO 09-01 16:17:58,749 - DataLoad complete
INFO 09-01 16:17:58,749 - Data Loaded successfully with LoadCount:0
INFO 09-01 16:17:58,749 - DataLoad failure
ERROR 09-01 16:17:58,749 - [Executor task launch worker-0][partitionID:default_test_table_632e80a6-77ef-44b2-aed7-2e5bbf56610e] org.apache.carbondata.processing.etl.DataLoadingException: Due to internal errors, please check logs for more details.
    at org.apache.carbondata.processing.csvload.DataGraphExecuter.execute(DataGraphExecuter.java:212)
    at org.apache.carbondata.processing.csvload.DataGraphExecuter.executeGraph(DataGraphExecuter.java:144)
    at org.apache.carbondata.spark.load.CarbonLoaderUtil.executeGraph(CarbonLoaderUtil.java:212)
    at org.apache.carbondata.spark.rdd.SparkPartitionLoader.run(CarbonDataLoadRDD.scala:125)
    at org.apache.carbondata.spark.rdd.DataFileLoaderRDD$$anon$1.<init>(CarbonDataLoadRDD.scala:255)
    at org.apache.carbondata.spark.rdd.DataFileLoaderRDD.compute(CarbonDataLoadRDD.scala:232)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at
load data error from csv file at hdfs in standalone spark cluster
Hi all,

when I load data from an HDFS CSV file, a stage of the Spark job fails with the following error. Where can I find a more detailed error message that could help me find the solution? Or does anyone know why this happens and how to solve it?

Command:

cc.sql(s"load data inpath 'hdfs://master:9000/opt/sample.csv' into table test_table")

Error log:

Job aborted due to stage failure: Task 0 in stage 7.0 failed 4 times, most recent failure: Lost task 0.3 in stage 7.0 (TID 17, slave2): org.apache.carbondata.processing.etl.DataLoadingException: Due to internal errors, please check logs for more details.
    at org.apache.carbondata.processing.csvload.DataGraphExecuter.execute(DataGraphExecuter.java:212)
    at org.apache.carbondata.processing.csvload.DataGraphExecuter.executeGraph(DataGraphExecuter.java:144)
    at org.apache.carbondata.spark.load.CarbonLoaderUtil.executeGraph(CarbonLoaderUtil.java:212)
    at org.apache.carbondata.spark.rdd.SparkPartitionLoader.run(CarbonDataLoadRDD.scala:125)
    at org.apache.carbondata.spark.rdd.DataFileLoaderRDD$$anon$1.<init>(CarbonDataLoadRDD.scala:255)
    at org.apache.carbondata.spark.rdd.DataFileLoaderRDD.compute(CarbonDataLoadRDD.scala:232)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
ERROR csvbased.CarbonCSVBasedSeqGenStep: [test_table: Graph - Carbon Surrogate Key Generator][partitionID:0]
Hi all,

when I load a CSV file into a table, an error occurred in the Spark jobs.

Version & Environment: Spark 1.6.0 + latest version of Carbondata from GitHub + cluster mode

Commands:

cc.sql("create table if not exists test_table (id string, name string, city string, age Int) STORED BY 'carbondata'")
cc.sql(s"load data inpath 'hdfs://master:9000/carbondata/sample.csv' into table test_table")

CSV file data:

cat > sample.csv << EOF
id,name,city,age
1,david,shenzhen,31
2,eason,shenzhen,27
3,jarry,wuhan,35
EOF

Error Description:

collect at CarbonDataRDDFactory.scala:623
Failure Reason: Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most recent failure: Lost task 0.3 in stage 3.0 (TID 8, slave3): org.apache.carbondata.processing.etl.DataLoadingException: Due to internal errors, please check logs for more details.

Spark Worker Log:

16/12/28 14:18:40 INFO csvreaderstep.CsvInput: test_table: Graph - CSV Input *Started all csv reading***
16/12/28 14:18:40 INFO csvreaderstep.CsvInput: [pool-20-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-20-thread-1] *started csv reading by thread***
16/12/28 14:18:40 INFO csvreaderstep.CsvInput: [pool-20-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-20-thread-1] Total Number of records processed by this thread is: 3
16/12/28 14:18:40 INFO csvreaderstep.CsvInput: [pool-20-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-20-thread-1] Time taken to processed 3 Number of records: 15
16/12/28 14:18:40 INFO csvreaderstep.CsvInput: [pool-20-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-20-thread-1] *Completed csv reading by thread***
16/12/28 14:18:40 INFO csvreaderstep.CsvInput: test_table: Graph - CSV Input *Completed all csv reading***
16/12/28 14:18:40 INFO cache.CarbonLRUCache: [test_table: Graph - Carbon Surrogate Key Generator][partitionID:0] Column cache size not configured.
Therefore default behavior will be considered and no LRU based eviction of columns will be done
16/12/28 14:18:40 ERROR csvbased.CarbonCSVBasedSeqGenStep: [test_table: Graph - Carbon Surrogate Key Generator][partitionID:0]
java.lang.RuntimeException: java.lang.NullPointerException
    at org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.CarbonCSVBasedSeqGenStep.process(CarbonCSVBasedSeqGenStep.java:940)
    at org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.CarbonCSVBasedSeqGenStep.processRow(CarbonCSVBasedSeqGenStep.java:515)
    at org.pentaho.di.trans.step.RunThread.run(RunThread.java:50)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
    at org.apache.carbondata.core.cache.dictionary.ColumnReverseDictionaryInfo.getSurrogateKey(ColumnReverseDictionaryInfo.java:73)
    at org.apache.carbondata.core.cache.dictionary.AbstractColumnDictionaryInfo.getSurrogateKey(AbstractColumnDictionaryInfo.java:289)
    at org.apache.carbondata.core.cache.dictionary.ReverseDictionary.getSurrogateKey(ReverseDictionary.java:50)
    at org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.CarbonCSVBasedDimSurrogateKeyGen.generateSurrogateKeys(CarbonCSVBasedDimSurrogateKeyGen.java:150)
    at org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.CarbonCSVBasedSeqGenStep.populateOutputRow(CarbonCSVBasedSeqGenStep.java:1233)
    at org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.CarbonCSVBasedSeqGenStep.process(CarbonCSVBasedSeqGenStep.java:929)
    ... 3 more
16/12/28 14:18:40 INFO sortdatastep.SortKeyStep: [test_table: Graph - Sort Key: Sort keystest_table][partitionID:0] Record Processed For table: test_table
16/12/28 14:18:40 INFO step.CarbonSliceMergerStep: [test_table: Graph - Carbon Slice Mergertest_table][partitionID:table] Record Procerssed For table: test_table

Does anyone have any idea? thx~
Re: Dictionary file is locked for updation
thx QiangCai, the problem is solved. So maybe it's better to correct the document at https://cwiki.apache.org/confluence/display/CARBONDATA/Cluster+deployment+guide: change the value of spark.executor.extraJavaOptions from

-Dcarbon.properties.filepath=carbon.properties

to

-Dcarbon.properties.filepath="/conf/carbon.properties

------------------ Original ------------------
From: "QiangCai"
Date: Tue, Dec 27, 2016 05:40 PM
To: "dev"
Subject: Re: Dictionary file is locked for updation

please correct the path of the carbon.properties file.

spark.executor.extraJavaOptions -Dcarbon.properties.filepath=carbon.properties
Re: Dictionary file is locked for updation
I'm sorry, carbon.storelocation has already been configured in my cluster; I just didn't copy it. The configuration is:

carbon.storelocation=hdfs://master:9000/carbondata

------------------ Original ------------------
From: "QiangCai"
Date: Tue, Dec 27, 2016 05:29 PM
To: "dev"
Subject: Re: Dictionary file is locked for updation

Please try to add carbon.storelocation to the carbon.properties file, e.g.

carbon.storelocation=hdfs://master:9000/carbondata/store

You can have a look at the following guide and pay attention to the carbon.properties file:
https://cwiki.apache.org/confluence/display/CARBONDATA/Cluster+deployment+guide
Re: Dictionary file is locked for updation
I'm using Spark 1.6.0, and the carbondata is the latest master branch on GitHub.

My carbon.properties is configured as:

carbon.ddl.base.hdfs.url=hdfs://master:9000/carbondata/data
carbon.badRecords.location=/opt/Carbon/Spark/badrecords
carbon.kettle.home=/opt/spark-1.6.0/carbonlib/carbonplugins
carbon.lock.type=HDFSLOCK

My spark-default.conf is configured as:

spark.master spark://master:7077
spark.yarn.dist.files /opt/spark-1.6.0/conf/carbon.properties
spark.yarn.dist.archives /opt/spark-1.6.0/carbonlib/carbondata_2.10-1.0.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar
spark.executor.extraJavaOptions -Dcarbon.properties.filepath=carbon.properties
#spark.executor.extraClassPath /opt/spark-1.6.0/carbonlib/carbondata_2.10-1.0.0-incubating-SNAPSHOT-shade-hadoop2.2.0.jar
#spark.driver.extraClassPath /opt/spark-1.6.0/carbonlib/carbondata_2.10-1.0.0-incubating-SNAPSHOT-shade-hadoop2.2.0.jar
spark.driver.extraJavaOptions -Dcarbon.properties.filepath=/opt/spark-1.6.0/conf/carbon.properties
carbon.kettle.home /opt/spark-1.6.0/carbonlib/carbonplugins

------------------ Original ------------------
From: "Ravindra Pesala" <ravi.pes...@gmail.com>
Date: Tue, Dec 27, 2016 4:15 PM
To: "dev" <dev@carbondata.incubator.apache.org>
Subject: Re: Dictionary file is locked for updation

Hi,

It seems the store path location is taking the default location. Did you set the store location properly? Which Spark version are you using?
Regards,
Ravindra

On Tue, Dec 27, 2016, 1:38 PM 251469031 <251469...@qq.com> wrote:
> Hi Kumar,
>
> thx for your reply, the full logs are as follows:
>
> 16/12/27 12:30:17 INFO locks.HdfsFileLock: Executor task launch worker-0 HDFS lock path:hdfs://master:9000../carbon.store/default/test_table/2e9b7efa-2934-463a-9280-ff50c5129268.lock
> 16/12/27 12:30:17 INFO storage.ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
> 16/12/27 12:30:17 INFO storage.ShuffleBlockFetcherIterator: Started 1 remote fetches in 1 ms
> 16/12/27 12:30:32 ERROR rdd.CarbonGlobalDictionaryGenerateRDD: Executor task launch worker-0
> java.lang.RuntimeException: Dictionary file name is locked for updation. Please try after some time
>     at scala.sys.package$.error(package.scala:27)
>     at org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD$$anon$1.<init>(CarbonGlobalDictionaryRDD.scala:364)
>     at org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD.compute(CarbonGlobalDictionaryRDD.scala:302)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>     at org.apache.spark.scheduler.Task.run(Task.scala:89)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
>
> as you see, the lock file path is:
> hdfs://master:9000../carbon.store/default/test_table/2e9b7efa-2934-463a-9280-ff50c5129268.lock
>
> ------------------ Original ------------------
> From: "Kumar Vishal" <kumarvishal1...@gmail.com>
> Date: Tue, Dec 27, 2016 3:25 PM
> To: "dev" <dev@carbondata.incubator.apache.org>
> Subject: Re: Dictionary file is locked for updation
>
> Hi,
> can you please find the *"HDFS lock path"* string in the executor log and let me
> know the complete log message.
>
> -Regards
> Kumar Vishal
>
> On Tue, Dec 27, 2016 at 12:45 PM, 251469031 <251469...@qq.com> wrote:
> > Hi all,
> >
> > when I run the following script:
> > scala> cc.sql(s"load data inpath 'hdfs://master:9000/carbondata/sample.csv'
> > into table test_table")
> >
> > it turns out that:
> > WARN 27-12 12:37:58,044 - Lost task 1.3 in stage 2.0 (TID 13, slave1):
> > java.lang.RuntimeException: Dictionary file name is locked for updation.
> > Please try after some time
> >
> > what I have done:
> > 1. in carbon.properties, set carbon.lock.type=HDFSLOCK
> > 2. sent carbon.properties & spark-defaults.conf to all nodes of the cluster
> >
> > if any of you have any idea, looking forward to your reply, thx~
Re: Dictionary file is locked for updation
Hi Kumar,

thx for your reply, the full logs are as follows:

16/12/27 12:30:17 INFO locks.HdfsFileLock: Executor task launch worker-0 HDFS lock path:hdfs://master:9000../carbon.store/default/test_table/2e9b7efa-2934-463a-9280-ff50c5129268.lock
16/12/27 12:30:17 INFO storage.ShuffleBlockFetcherIterator: Getting 1 non-empty blocks out of 1 blocks
16/12/27 12:30:17 INFO storage.ShuffleBlockFetcherIterator: Started 1 remote fetches in 1 ms
16/12/27 12:30:32 ERROR rdd.CarbonGlobalDictionaryGenerateRDD: Executor task launch worker-0
java.lang.RuntimeException: Dictionary file name is locked for updation. Please try after some time
    at scala.sys.package$.error(package.scala:27)
    at org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD$$anon$1.<init>(CarbonGlobalDictionaryRDD.scala:364)
    at org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD.compute(CarbonGlobalDictionaryRDD.scala:302)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

as you see, the lock file path is:
hdfs://master:9000../carbon.store/default/test_table/2e9b7efa-2934-463a-9280-ff50c5129268.lock

------------------ Original ------------------
From: "Kumar Vishal" <kumarvishal1...@gmail.com>
Date: Tue, Dec 27, 2016 3:25 PM
To: "dev" <dev@carbondata.incubator.apache.org>
Subject: Re: Dictionary file is locked for updation

Hi,
can you please find the *"HDFS lock path"* string in the executor log and let me know the complete log message.

-Regards
Kumar Vishal

On Tue, Dec 27, 2016 at 12:45 PM, 251469031 <251469...@qq.com> wrote:
> Hi all,
>
> when I run the following script:
> scala> cc.sql(s"load data inpath 'hdfs://master:9000/carbondata/sample.csv' into table test_table")
>
> it turns out that:
> WARN 27-12 12:37:58,044 - Lost task 1.3 in stage 2.0 (TID 13, slave1): java.lang.RuntimeException: Dictionary file name is locked for updation. Please try after some time
>
> what I have done:
> 1. in carbon.properties, set carbon.lock.type=HDFSLOCK
> 2. sent carbon.properties & spark-defaults.conf to all nodes of the cluster
>
> if any of you have any idea, looking forward to your reply, thx~
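The malformed prefix in that lock path ("hdfs://master:9000../carbon.store") looks like a relative default store location ("../carbon.store") concatenated directly onto the filesystem URL. A minimal sketch of the effect — the class name and the concatenation are an illustrative assumption about how the path ends up malformed, not CarbonData's actual code:

```java
public class LockPathDemo {
    public static void main(String[] args) {
        String defaultFs = "hdfs://master:9000";   // namenode URL
        String defaultStore = "../carbon.store";   // relative default store location
        // Naive string concatenation reproduces the malformed path seen in the log
        String lockDir = defaultFs + defaultStore + "/default/test_table";
        System.out.println(lockDir);
    }
}
```

This is why the replies in the thread suggest setting an absolute carbon.storelocation (e.g. hdfs://master:9000/carbondata/store) in carbon.properties: an absolute store path leaves nothing relative to be glued onto the filesystem URL.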
Dictionary file is locked for updation
Hi all,

when I run the following script:

scala> cc.sql(s"load data inpath 'hdfs://master:9000/carbondata/sample.csv' into table test_table")

it turns out that:

WARN 27-12 12:37:58,044 - Lost task 1.3 in stage 2.0 (TID 13, slave1): java.lang.RuntimeException: Dictionary file name is locked for updation. Please try after some time

What I have done:
1. in carbon.properties, set carbon.lock.type=HDFSLOCK
2. sent carbon.properties & spark-defaults.conf to all nodes of the cluster

if any of you have any idea, looking forward to your reply, thx~
the storepath in carbon.properties seems not to work
Hi all:

I'm now configuring carbondata in cluster mode, and some configurations in the file carbon.properties are as below:

carbon.storelocation=hdfs://master:9000/carbondata
carbon.ddl.base.hdfs.url=hdfs://master:9000/carbondata/data
carbon.kettle.home=/opt/spark-1.6.0/carbonlib/carbonplugins

but when I create a table using the command:

cc.sql("create table if not exists test_table (id string, name string, city string, age Int) STORED BY 'carbondata'")

the output in the spark shell says the tablePath is a local one:

tablePath=/home/hadoop/carbondata/bin/carbonshellstore/default/test_table

and the storePath is also shown as:

scala> print(cc.storePath)
/home/hadoop/carbondata/bin/carbonshellstore

The file carbon.properties has been sent to all the nodes in the cluster. I wonder where I can modify this configuration. Looking forward to your help, thx~
Re: etl.DataLoadingException: The input file does not exist
Oh I see, I've solved it, thx very much to Manish & QiangCai~~ Here is my DML script:

cc.sql(s"load data inpath 'hdfs://master:9000/carbondata/pt/sample.csv' into table test_table")

------------------ Original ------------------
From: "manish gupta" <tomanishgupt...@gmail.com>
Date: Fri, Dec 23, 2016 2:32 PM
To: "dev" <dev@carbondata.incubator.apache.org>
Subject: Re: etl.DataLoadingException: The input file does not exist

Hi 251469031,

Thanks for showing interest in carbon. For your question please refer to the explanation below.

scala> val dataFilePath = new File("hdfs://master:9000/carbondata/sample.csv").getCanonicalPath
dataFilePath: String = /home/hadoop/carbondata/hdfs:/master:9000/carbondata/sample.csv

If you use new File, it will always return a pointer to a path on the local file system. So in case you are not appending the HDFS URL to the file/folder path in the LOAD DATA DDL command, you can configure *carbon.ddl.base.hdfs.url* in the carbon.properties file as suggested by QiangCai.

*carbon.ddl.base.hdfs.url=hdfs://<host>:<port>*
example *carbon.ddl.base.hdfs.url=hdfs://9.82.101.42:54310*

Regards
Manish Gupta

On Fri, Dec 23, 2016 at 10:09 AM, QiangCai <qiang...@qq.com> wrote:
> Please find the following item in the carbon.properties file, and give it a proper
> path (hdfs://master:9000/):
> carbon.ddl.base.hdfs.url
>
> During loading, it will combine this url and the data file path.
>
> BTW, better to provide the version number.
Re: etl.DataLoadingException: The input file does not exist
Well, in the source code of carbondata, the file type is determined as:

if (property.startsWith(CarbonUtil.HDFS_PREFIX)) {
  storeDefaultFileType = FileType.HDFS;
}

and CarbonUtil.HDFS_PREFIX = "hdfs://". But when I run the following script, the dataFilePath is still local:

scala> val dataFilePath = new File("hdfs://master:9000/carbondata/sample.csv").getCanonicalPath
dataFilePath: String = /home/hadoop/carbondata/hdfs:/master:9000/carbondata/sample.csv

------------------ Original ------------------
From: "Liang Chen"
Date: Thu, Dec 22, 2016 8:47
To: "dev"
Subject: Re: etl.DataLoadingException: The input file does not exist

Hi

This is because you use cluster mode, but the input file is a local file.
1. If you use cluster mode, please load Hadoop files.
2. If you just want to load local files, please use local mode.

> Hi,
>
> when i run the following script:
>
> scala> val dataFilePath = new File("/carbondata/pt/sample.csv").getCanonicalPath
> scala> cc.sql(s"load data inpath '$dataFilePath' into table test_table")
>
> it turns out:
>
> org.apache.carbondata.processing.etl.DataLoadingException: The input file
> does not exist: hdfs://master:9000hdfs://master/opt/data/carbondata/pt/sample.csv
>     at org.apache.spark.util.FileUtils$$anonfun$getPaths$1.apply$mcVI$sp(FileUtils.scala:66)
>     at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
>
> It confused me why there is a string "hdfs://master:9000" before
> "hdfs://master/opt/data/carbondata/pt/sample.csv"; I can't find any
> configuration that contains "hdfs://master:9000". Could anyone help me~
etl.DataLoadingException: The input file does not exist
Hi,

when I run the following script:

scala> val dataFilePath = new File("/carbondata/pt/sample.csv").getCanonicalPath
scala> cc.sql(s"load data inpath '$dataFilePath' into table test_table")

it turns out:

org.apache.carbondata.processing.etl.DataLoadingException: The input file does not exist: hdfs://master:9000hdfs://master/opt/data/carbondata/pt/sample.csv
    at org.apache.spark.util.FileUtils$$anonfun$getPaths$1.apply$mcVI$sp(FileUtils.scala:66)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)

It confuses me why there is a string "hdfs://master:9000" before "hdfs://master/opt/data/carbondata/pt/sample.csv"; I can't find any configuration that contains "hdfs://master:9000". Could anyone help me~
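The underlying pitfall in this thread is that java.io.File knows nothing about URI schemes: an "hdfs://..." string is treated as a relative local path, resolved against the current working directory, with "//" collapsed to "/". A minimal sketch of that behavior (the printed prefix depends on where you run it):

```java
import java.io.File;
import java.io.IOException;

public class HdfsPathDemo {
    public static void main(String[] args) throws IOException {
        // java.io.File treats "hdfs://..." as a relative local path:
        // it is resolved against the working directory and "//" collapses to "/".
        String canonical =
            new File("hdfs://master:9000/carbondata/sample.csv").getCanonicalPath();
        System.out.println(canonical);
        // For HDFS input, pass the URL string directly to the LOAD DATA
        // command instead of going through java.io.File.
    }
}
```

This matches the mangled path reported above (".../hdfs:/master:9000/carbondata/sample.csv"), and explains why the "hdfs://" prefix check in the source never fires for a path that has passed through new File(...).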
Re: InvalidInputException when loading data to table
OK, thx~ It's a local path. In the error log, it shows that dataFilePath is set to /home/hadoop/carbondata/sample.csv, which is where my test file is located. @see the log: Input path does not exist: /home/hadoop/carbondata/sample.csv
In the following command, is the class File the one from the package java.io?
scala> val dataFilePath = new File("../carbondata/sample.csv").getCanonicalPath
------------------ Original ------------------
From: "Liang Chen" <chenliang6...@gmail.com>
Date: Dec 20, 2016, 8:35
To: "dev" <dev@carbondata.incubator.apache.org>
Subject: Re: InvalidInputException when loading data to table

Hi
1. Is your input path on HDFS or local? Please double-check that your input path is correct.
2. As a new starter, I suggest you open CarbonData in IntelliJ IDEA and run all the examples.
Regards
Liang
--
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: /home/hadoop/carbondata/sample.csv
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:285)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:340)
        at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:113)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
2016-12-19 20:24 GMT+08:00 251469031 <251469...@qq.com>:
> Hi all,
>
> I'm now learning how to get started with carbondata according to the
> tutorial: https://cwiki.apache.org/confluence/display/CARBONDATA/Quick+Start
>
> I created a file named sample.csv under the path /home/hadoop/carbondata
> on the master node, and when I run the script:
>
> scala> val dataFilePath = new File("../carbondata/sample.csv").getCanonicalPath
> scala> cc.sql(s"load data inpath '$dataFilePath' into table test_table")
>
> it turns out an "InvalidInputException" while the file actually exists;
> here are the scripts and logs:
>
> scala> val dataFilePath = new File("../carbondata/sample.csv").getCanonicalPath
> dataFilePath: String = /home/hadoop/carbondata/sample.csv
>
> scala> cc.sql(s"load data inpath '$dataFilePath' into table test_table")
> INFO 19-12 20:18:22,991 - main Query [LOAD DATA INPATH '/HOME/HADOOP/CARBONDATA/SAMPLE.CSV' INTO TABLE TEST_TABLE]
> INFO 19-12 20:18:23,271 - Successfully able to get the table metadata file lock
> INFO 19-12 20:18:23,276 - main Initiating Direct Load for the Table : (default.test_table)
> INFO 19-12 20:18:23,279 - main Generate global dictionary from source data files!
> INFO 19-12 20:18:23,296 - main [Block Distribution]
> INFO 19-12 20:18:23,297 - main totalInputSpaceConsumed: 74 , defaultParallelism: 28
> INFO 19-12 20:18:23,297 - main mapreduce.input.fileinputformat.split.maxsize: 16777216
> INFO 19-12 20:18:23,380 - Block broadcast_0 stored as values in memory (estimated size 137.1 KB, free 137.1 KB)
> INFO 19-12 20:18:23,397 - Block broadcast_0_piece0 stored as bytes in memory (estimated size 15.0 KB, free 152.1 KB)
> INFO 19-12 20:18:23,398 - Added broadcast_0_piece0 in memory on 172.17.195.12:46335 (size: 15.0 KB, free: 511.1 MB)
> INFO 19-12 20:18:23,399 - Created broadcast 0 from NewHadoopRDD at CarbonTextFile.scala:73
> ERROR 19-12 20:18:23,431 - main generate global dictionary failed
> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: /home/hadoop/carbondata/sample.csv
>         at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:285)
>         at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:340)
>         at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:113)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
>         at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
>         at scala.Option.getOrElse(Option.scala:120)
>         at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> ...
>
> If any of you have met the same problem, would you tell me why this
> happens? Looking forward to your reply, thx~
--
Regards
Liang
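A likely cause in this thread (worth checking, not confirmed here): `load data inpath` with a plain local path is resolved against the filesystem that each worker sees, so a CSV that exists only on the driver node is reported as missing by the executors. A minimal sketch of the mismatch, using hypothetical /tmp directories to stand in for the two machines:

```shell
# Simulate the driver and an executor as two separate directories.
mkdir -p /tmp/driver-node
echo "id,name" > /tmp/driver-node/sample.csv

# On the driver node the file exists, so getCanonicalPath succeeds:
test -f /tmp/driver-node/sample.csv && echo "driver: file found"

# An executor checking the same path on its own filesystem finds nothing,
# which surfaces as "Input path does not exist":
test -f /tmp/executor-node/sample.csv || echo "executor: Input path does not exist"
```

If the cluster has HDFS, putting the file there first (`hdfs dfs -put /home/hadoop/carbondata/sample.csv /user/hadoop/`) and loading the `hdfs://...` path avoids the problem; alternatively, run Spark in local mode so the driver and executors share one filesystem.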
Re: How to compile the latest source code of carbondata
Thx Liang, I solved the problem. In the file carbon-spark-shell, FWDIR is set from $SPARK_HOME. I had configured $SPARK_HOME in /etc/profile, and the output of the command "echo $SPARK_HOME" was correct, which means $SPARK_HOME had been set. But unless I run "export SPARK_HOME=" before running "./bin/carbon-spark-shell", the variable FWDIR cannot be set. I wonder why that is.
------------------ Original ------------------
From: <251469...@qq.com>
Date: Dec 19, 2016, 4:02
To: "dev" <dev@carbondata.incubator.apache.org>
Subject: Re: How to compile the latest source code of carbondata

I can visit the spark web-ui at http://master:8080/. Are there any other environment settings that I should configure?

------------------ Original ------------------
From: "Liang Chen" <chenliang6...@gmail.com>
Date: Dec 19, 2016, 3:40
To: "dev" <dev@carbondata.incubator.apache.org>
Subject: Re: How to compile the latest source code of carbondata

Hi
Please check whether your spark environment is ready.

2016-12-19 15:34 GMT+08:00 251469031 <251469...@qq.com>:
> The privileges of the folder "carbondata" are:
>
> drwxr-xr-x 18 hadoop hadoop 4096 Dec 19 14:56 carbondata
>
> and hadoop is the user who ran maven.
>
> Well, after running the mvn command, I get the following info on the console:
>
> [INFO] Reactor Summary:
> [INFO]
> [INFO] Apache CarbonData :: Parent ............... SUCCESS [  1.012 s]
> [INFO] Apache CarbonData :: Common ............... SUCCESS [  2.066 s]
> [INFO] Apache CarbonData :: Core ................. SUCCESS [  5.512 s]
> [INFO] Apache CarbonData :: Processing ........... SUCCESS [  1.892 s]
> [INFO] Apache CarbonData :: Hadoop ............... SUCCESS [  0.789 s]
> [INFO] Apache CarbonData :: Spark Common ......... SUCCESS [ 17.121 s]
> [INFO] Apache CarbonData :: Spark ................ SUCCESS [ 33.269 s]
> [INFO] Apache CarbonData :: Assembly ............. SUCCESS [ 17.700 s]
> [INFO] Apache CarbonData :: Spark Examples ....... SUCCESS [  7.741 s]
> [INFO]
> [INFO] BUILD SUCCESS
> [INFO]
> [INFO] Total time: 01:27 min
> [INFO] Finished at: 2016-12-19T14:57:26+08:00
> [INFO] Final Memory: 83M/1623M
> [INFO]
>
> But I didn't find a file named spark-submit under the path carbondata/bin/:
>
> [hadoop@master ~]$ cd carbondata/bin/
> [hadoop@master bin]$ ll
> total 8
> -rwxrwxr-x 1 hadoop hadoop 3879 Dec 19 14:54 carbon-spark-shell
> -rwxrwxr-x 1 hadoop hadoop 2820 Dec 19 14:54 carbon-spark-sql
>
> Is this phenomenon normal?
>
> ------------------ Original ------------------
> From: "Liang Chen" <chenliang6...@gmail.com>
> Date: Dec 19, 2016, 3:19
> To: "dev" <dev@carbondata.incubator.apache.org>
> Subject: Re: How to compile the latest source code of carbondata
>
> Hi
>
> Please check whether you have granted enough rights on the folder "carbondata".
>
> ---
> For spark 1.5, the compile process has no issue, but carbon-spark-shell
> cannot run correctly:
> step 1: git clone https://github.com/apache/incubator-carbondata.git carbondata
> step 2: mvn clean package -DskipTests -Pspark-1.5
> step 3: ./bin/carbon-spark-shell, and it turns out:
>
> [hadoop@master carbondata]$ ./bin/carbon-spark-shell
> ./bin/carbon-spark-shell: line 78: /bin/spark-submit: No such file or directory
>
> 2016-12-19 15:05 GMT+08:00 251469031 <251469...@qq.com>:
>
> > thx liang.
> >
> > I've tried spark 2.0.0 and spark 1.5.0; my steps & scripts are:
> >
> > For spark 2.0, the compile process has no issue, but carbon-spark-shell
> > cannot run correctly:
> >
> > step 1: git clone https://github.com/apache/incubator-carbondata.git carbondata
> > step 2: mvn clean package -DskipTests -Pspark-2.0
> > step 3: ./bin/carbon-spark-shell, and it turns out:
> >
> > [hadoop@master carbondata]$ ./bin/carbon-spark-shell
> > ls: cannot access /home/hadoop/carbondata/assembly/target/scala-2.10: No
> > such file or directory
> > ls: cannot access /home/hadoop/carbond
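The behaviour puzzled over above is standard shell semantics rather than anything CarbonData-specific: a variable assigned in /etc/profile without `export` is visible to `echo` in the login shell, but it is not placed in the environment, so a child process such as ./bin/carbon-spark-shell starts without it. A small sketch (SPARK_HOME_DEMO and /opt/spark are illustrative stand-ins, not real settings from this thread):

```shell
# Assigned but not exported: visible in this shell only.
SPARK_HOME_DEMO=/opt/spark
echo "parent sees: $SPARK_HOME_DEMO"
sh -c 'echo "child sees: [$SPARK_HOME_DEMO]"'   # child prints empty brackets

# Exported: the variable enters the environment and children inherit it.
export SPARK_HOME_DEMO
sh -c 'echo "child sees: [$SPARK_HOME_DEMO]"'   # child now sees /opt/spark
```

This is why `echo $SPARK_HOME` can look correct while carbon-spark-shell still fails: the fix is to write `export SPARK_HOME=...` in the profile (and re-login or `source` it), not just the bare assignment.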
Re: How to compile the latest source code of carbondata
The privileges of the folder "carbondata" are:

drwxr-xr-x 18 hadoop hadoop 4096 Dec 19 14:56 carbondata

and hadoop is the user who ran maven.

Well, after running the mvn command, I get the following info on the console:

[INFO] Reactor Summary:
[INFO]
[INFO] Apache CarbonData :: Parent ............... SUCCESS [  1.012 s]
[INFO] Apache CarbonData :: Common ............... SUCCESS [  2.066 s]
[INFO] Apache CarbonData :: Core ................. SUCCESS [  5.512 s]
[INFO] Apache CarbonData :: Processing ........... SUCCESS [  1.892 s]
[INFO] Apache CarbonData :: Hadoop ............... SUCCESS [  0.789 s]
[INFO] Apache CarbonData :: Spark Common ......... SUCCESS [ 17.121 s]
[INFO] Apache CarbonData :: Spark ................ SUCCESS [ 33.269 s]
[INFO] Apache CarbonData :: Assembly ............. SUCCESS [ 17.700 s]
[INFO] Apache CarbonData :: Spark Examples ....... SUCCESS [  7.741 s]
[INFO]
[INFO] BUILD SUCCESS
[INFO]
[INFO] Total time: 01:27 min
[INFO] Finished at: 2016-12-19T14:57:26+08:00
[INFO] Final Memory: 83M/1623M
[INFO]

But I didn't find a file named spark-submit under the path carbondata/bin/:

[hadoop@master ~]$ cd carbondata/bin/
[hadoop@master bin]$ ll
total 8
-rwxrwxr-x 1 hadoop hadoop 3879 Dec 19 14:54 carbon-spark-shell
-rwxrwxr-x 1 hadoop hadoop 2820 Dec 19 14:54 carbon-spark-sql

Is this phenomenon normal?

------------------ Original ------------------
From: "Liang Chen" <chenliang6...@gmail.com>
Date: Dec 19, 2016, 3:19
To: "dev" <dev@carbondata.incubator.apache.org>
Subject: Re: How to compile the latest source code of carbondata

Hi

Please check whether you have granted enough rights on the folder "carbondata".

---
For spark 1.5, the compile process has no issue, but carbon-spark-shell cannot run correctly:
step 1: git clone https://github.com/apache/incubator-carbondata.git carbondata
step 2: mvn clean package -DskipTests -Pspark-1.5
step 3: ./bin/carbon-spark-shell, and it turns out:

[hadoop@master carbondata]$ ./bin/carbon-spark-shell
./bin/carbon-spark-shell: line 78: /bin/spark-submit: No such file or directory

2016-12-19 15:05 GMT+08:00 251469031 <251469...@qq.com>:
> thx liang.
>
> I've tried spark 2.0.0 and spark 1.5.0; my steps & scripts are:
>
> For spark 2.0, the compile process has no issue, but carbon-spark-shell
> cannot run correctly:
>
> step 1: git clone https://github.com/apache/incubator-carbondata.git carbondata
> step 2: mvn clean package -DskipTests -Pspark-2.0
> step 3: ./bin/carbon-spark-shell, and it turns out:
>
> [hadoop@master carbondata]$ ./bin/carbon-spark-shell
> ls: cannot access /home/hadoop/carbondata/assembly/target/scala-2.10: No
> such file or directory
> ls: cannot access /home/hadoop/carbondata/assembly/target/scala-2.10: No
> such file or directory
> ./bin/carbon-spark-shell: line 78: /bin/spark-submit: No such file or directory
>
> For spark 1.5, the compile process has no issue, but carbon-spark-shell
> cannot run correctly:
> step 1: git clone https://github.com/apache/incubator-carbondata.git carbondata
> step 2: mvn clean package -DskipTests -Pspark-1.5
> step 3: ./bin/carbon-spark-shell, and it turns out:
>
> [hadoop@master carbondata]$ ./bin/carbon-spark-shell
> ./bin/carbon-spark-shell: line 78: /bin/spark-submit: No such file or directory
>
> ------------------ Original ------------------
> From: "Liang Chen" <chenliang6...@gmail.com>
> Date: Dec 19, 2016, 2:37
> To: "dev" <dev@carbondata.incubator.apache.org>
> Subject: Re: How to compile the latest source code of carbondata
>
> Hi
>
> Can you share: what errors do you get, and which compile command are you using?
>
> Regards
> Liang
>
> 2016-12-19 14:32 GMT+08:00 251469031 <251469...@qq.com>:
>
> > Hi all:
> >
> > I've tried to compile the latest source code following the tutorial:
> > https://cwiki.apache.org/confluence/display/CARBONDATA/Quick+Start, but
> > it doesn't work on the latest source code on GitHub.
> >
> > Would you send me some tutorial about how to do this or tell me how to
> > use carbondata, thx~
>
> --
> Regards
> Liang
--
Regards
Liang
Re: How to compile the latest source code of carbondata
thx liang.

I've tried spark 2.0.0 and spark 1.5.0; my steps & scripts are:

For spark 2.0, the compile process has no issue, but carbon-spark-shell cannot run correctly:

step 1: git clone https://github.com/apache/incubator-carbondata.git carbondata
step 2: mvn clean package -DskipTests -Pspark-2.0
step 3: ./bin/carbon-spark-shell, and it turns out:

[hadoop@master carbondata]$ ./bin/carbon-spark-shell
ls: cannot access /home/hadoop/carbondata/assembly/target/scala-2.10: No such file or directory
ls: cannot access /home/hadoop/carbondata/assembly/target/scala-2.10: No such file or directory
./bin/carbon-spark-shell: line 78: /bin/spark-submit: No such file or directory

For spark 1.5, the compile process has no issue, but carbon-spark-shell cannot run correctly:
step 1: git clone https://github.com/apache/incubator-carbondata.git carbondata
step 2: mvn clean package -DskipTests -Pspark-1.5
step 3: ./bin/carbon-spark-shell, and it turns out:

[hadoop@master carbondata]$ ./bin/carbon-spark-shell
./bin/carbon-spark-shell: line 78: /bin/spark-submit: No such file or directory

------------------ Original ------------------
From: "Liang Chen" <chenliang6...@gmail.com>
Date: Dec 19, 2016, 2:37
To: "dev" <dev@carbondata.incubator.apache.org>
Subject: Re: How to compile the latest source code of carbondata

Hi

Can you share: what errors do you get, and which compile command are you using?

Regards
Liang

2016-12-19 14:32 GMT+08:00 251469031 <251469...@qq.com>:
> Hi all:
>
> I've tried to compile the latest source code following the tutorial:
> https://cwiki.apache.org/confluence/display/CARBONDATA/Quick+Start, but
> it doesn't work on the latest source code on GitHub.
>
> Would you send me some tutorial about how to do this or tell me how to
> use carbondata, thx~

--
Regards
Liang
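The `/bin/spark-submit: No such file or directory` message above is consistent with SPARK_HOME being empty when the script runs: the expansion of `"$SPARK_HOME"/bin/spark-submit` collapses to exactly the `/bin/spark-submit` path in the error. A quick sketch of the expansion (/opt/spark is a hypothetical install location):

```shell
# With SPARK_HOME unset or empty, the path the script builds loses its prefix:
unset SPARK_HOME
echo "${SPARK_HOME}/bin/spark-submit"    # prints: /bin/spark-submit

# With SPARK_HOME exported to a real Spark install, the path is complete:
export SPARK_HOME=/opt/spark             # illustrative install location
echo "${SPARK_HOME}/bin/spark-submit"    # prints: /opt/spark/bin/spark-submit
```

So the absence of spark-submit under carbondata/bin/ is expected; spark-submit ships with Spark itself, and the wrapper script only works once SPARK_HOME is exported in the environment that launches it.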
How to compile the latest source code of carbondata
Hi all: I've tried to compile the latest source code following the tutorial: https://cwiki.apache.org/confluence/display/CARBONDATA/Quick+Start, but it doesn't work on the latest source code on GitHub. Would you send me some tutorial about how to do this or tell me how to use carbondata, thx~