discussion about the benchmark standard that carbondata uses

2017-01-15 Thread 251469031
Hi all,


A benchmark test can measure the performance of a system. Since CarbonData is 
a data store, maybe it's better to have a benchmark test that uses a universal 
benchmark standard such as TPC-DS.


So, which benchmark standard does CarbonData use?

Re: Problem while copying file from local store to carbon store

2017-01-09 Thread 251469031
Thanks.


I've solved the problem; here is my record:


First,


I found the Spark job failed when loading data, with the error 
"CarbonDataWriterException: Problem while copying file from local store to 
carbon store". Locating the source code at 
./processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter,
it shows:


private void copyCarbonDataFileToCarbonStorePath(String localFileName)
    throws CarbonDataWriterException {
  long copyStartTime = System.currentTimeMillis();
  LOGGER.info("Copying " + localFileName + " --> " + dataWriterVo.getCarbonDataDirectoryPath());
  try {
    CarbonFile localCarbonFile =
        FileFactory.getCarbonFile(localFileName, FileFactory.getFileType(localFileName));
    String carbonFilePath = dataWriterVo.getCarbonDataDirectoryPath() + localFileName
        .substring(localFileName.lastIndexOf(File.separator));
    copyLocalFileToCarbonStore(carbonFilePath, localFileName,
        CarbonCommonConstants.BYTEBUFFER_SIZE,
        getMaxOfBlockAndFileSize(fileSizeInBytes, localCarbonFile.getSize()));
  } catch (IOException e) {
    throw new CarbonDataWriterException(
        "Problem while copying file from local store to carbon store");
  }
  LOGGER.info("Total copy time (ms) to copy file " + localFileName + " is "
      + (System.currentTimeMillis() - copyStartTime));
}



The main reason is that the method copyLocalFileToCarbonStore causes an 
IOException, but the catch block doesn't tell me the real reason that 
caused the error (at this moment, I really like technical logs more than 
business logs). So I added a few lines of code:
...
catch (IOException e) {
  LOGGER.info("---------logs print by liyinwei start---------");
  LOGGER.error(e, "");
  LOGGER.info("---------logs print by liyinwei end  ---------");
  throw new CarbonDataWriterException(
      "Problem while copying file from local store to carbon store");
}


Then I rebuilt the source code, and it logged as follows:


INFO  10-01 10:29:59,546 - [test_table: Graph - MDKeyGentest_table][partitionID:0] ---------logs print by liyinwei start---------
ERROR 10-01 10:29:59,547 - [test_table: Graph - MDKeyGentest_table][partitionID:0]
java.io.FileNotFoundException: /home/hadoop/carbondata/bin/carbonshellstore/default/test_table/Fact/Part0/Segment_0/part-0-0-1484015398000.carbondata (No such file or directory)
at java.io.FileOutputStream.open0(Native Method)
...
INFO  10-01 10:29:59,547 - [test_table: Graph - MDKeyGentest_table][partitionID:0] ---------logs print by liyinwei end  ---------
ERROR 10-01 10:29:59,547 - [test_table: Graph - MDKeyGentest_table][partitionID:0] Problem while copying file from local store to carbon store
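As an aside, the longer-term Java fix for this kind of swallowed error is to chain the cause into the wrapper exception instead of relying on temporary log lines, so the FileNotFoundException above would survive without a rebuild. A minimal sketch of the pattern; the (String, Throwable) wrapper constructor is an assumption (the CarbonDataWriterException of this version may not offer one), so a plain RuntimeException stands in, and the helper below is a hypothetical stand-in:

import java.io.FileNotFoundException;
import java.io.IOException;

public class ChainedCopyDemo {
  // Hypothetical stand-in for copyLocalFileToCarbonStore.
  static void copyLocalFileToCarbonStore(String file) throws IOException {
    throw new FileNotFoundException(file + " (No such file or directory)");
  }

  static void copyCarbonDataFileToCarbonStorePath(String localFileName) {
    try {
      copyLocalFileToCarbonStore(localFileName);
    } catch (IOException e) {
      // Chaining "e" preserves the original stack trace: the executor log
      // then shows "Caused by: java.io.FileNotFoundException: ..." with no
      // extra LOGGER calls needed.
      throw new RuntimeException(
          "Problem while copying file from local store to carbon store", e);
    }
  }

  public static void main(String[] args) {
    copyCarbonDataFileToCarbonStorePath("/tmp/part-0-0.carbondata");
  }
}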



Second,


As you can see, the main reason that causes the error is a FileNotFoundException, 
which means the data file is not found under the store path. With the help of 
Liang Chen & Brave heart, I found that the default CarbonData storePath is as 
below if we start the spark-shell by using carbon-spark-shell:
scala> print(cc.storePath)
/home/hadoop/carbondata/bin/carbonshellstore



So I added a parameter when starting carbon-spark-shell:
./bin/carbon-spark-shell --conf 
spark.carbon.storepath=hdfs://master:9000/home/hadoop/carbondata/bin/carbonshellstore


and then printed the storePath:
scala> print(cc.storePath)
hdfs://master:9000/home/hadoop/carbondata/bin/carbonshellstore





Finally,


I ran the command


cc.sql(s"load data inpath 'hdfs://master:9000/home/hadoop/sample.csv' into 
table test_table")


again and it succeeded, as the following verifies:


cc.sql("select * from test_table").show






------ Original ------
From: "Liang Chen";
Date: Tue, Jan 10, 2017 12:11 PM
To: "dev";

Subject: Re: Problem while copying file from local store to carbon store



Hi

Please use spark-shell to create a CarbonContext; you can refer to this
article:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=67635497

Regards
Liang




Problem while copying file from local store to carbon store

2017-01-09 Thread 251469031
Hi all,

when I load data from hdfs to a table:

cc.sql(s"load data inpath 'hdfs://master:9000/home/hadoop/sample.csv' into 
table test_table")

two errors occurred, at slave1:


INFO  09-01 16:17:58,611 - test_table: Graph - CSV Input *Started all csv reading***
INFO  09-01 16:17:58,611 - [pool-20-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-20-thread-1] *started csv reading by thread***
INFO  09-01 16:17:58,635 - [pool-20-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-20-thread-1] Total Number of records processed by this thread is: 3
INFO  09-01 16:17:58,635 - [pool-20-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-20-thread-1] Time taken to processed 3 Number of records: 24
INFO  09-01 16:17:58,636 - [pool-20-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-20-thread-1] *Completed csv reading by thread***
INFO  09-01 16:17:58,636 - test_table: Graph - CSV Input *Completed all csv reading***
INFO  09-01 16:17:58,642 - [test_table: Graph - Carbon Surrogate Key Generator][partitionID:0] Column cache size not configured. Therefore default behavior will be considered and no LRU based eviction of columns will be done
ERROR 09-01 16:17:58,645 - [test_table: Graph - Carbon Surrogate Key Generator][partitionID:0] org.apache.carbondata.core.util.CarbonUtilException: Either dictionary or its metadata does not exist for column identifier :: ColumnIdentifier [columnId=c70480f9-4336-4186-8bd0-a3bebb50ea6a]
ERROR 09-01 16:17:58,646 - [test_table: Graph - Carbon Surrogate Key Generator][partitionID:0] org.pentaho.di.core.exception.KettleException: org.apache.carbondata.core.util.CarbonUtilException: Either dictionary or its metadata does not exist for column identifier :: ColumnIdentifier [columnId=c70480f9-4336-4186-8bd0-a3bebb50ea6a]
at org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.FileStoreSurrogateKeyGenForCSV.initDictionaryCacheInfo(FileStoreSurrogateKeyGenForCSV.java:297)
at org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.FileStoreSurrogateKeyGenForCSV.populateCache(FileStoreSurrogateKeyGenForCSV.java:270)
at org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.FileStoreSurrogateKeyGenForCSV.<init>(FileStoreSurrogateKeyGenForCSV.java:144)
at org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.CarbonCSVBasedSeqGenStep.processRow(CarbonCSVBasedSeqGenStep.java:385)
at org.pentaho.di.trans.step.RunThread.run(RunThread.java:50)
at java.lang.Thread.run(Thread.java:745)
INFO  09-01 16:17:58,647 - [test_table: Graph - Carbon Slice Mergertest_table][partitionID:table] Record Procerssed For table: test_table
INFO  09-01 16:17:58,647 - [test_table: Graph - Carbon Slice Mergertest_table][partitionID:table] Summary: Carbon Slice Merger Step: Read: 0: Write: 0
INFO  09-01 16:17:58,647 - [test_table: Graph - Sort Key: Sort keystest_table][partitionID:0] Record Processed For table: test_table
INFO  09-01 16:17:58,647 - [test_table: Graph - Sort Key: Sort keystest_table][partitionID:0] Number of Records was Zero
INFO  09-01 16:17:58,647 - [test_table: Graph - Sort Key: Sort keystest_table][partitionID:0] Summary: Carbon Sort Key Step: Read: 0: Write: 0
INFO  09-01 16:17:58,747 - [Executor task launch worker-0][partitionID:default_test_table_632e80a6-77ef-44b2-aed7-2e5bbf56610e] Graph execution is finished.
ERROR 09-01 16:17:58,748 - [Executor task launch worker-0][partitionID:default_test_table_632e80a6-77ef-44b2-aed7-2e5bbf56610e] Graph Execution had errors
INFO  09-01 16:17:58,749 - [Executor task launch worker-0][partitionID:default_test_table_632e80a6-77ef-44b2-aed7-2e5bbf56610e] Deleted the local store location/tmp/259202084415620/0
INFO  09-01 16:17:58,749 - DataLoad complete
INFO  09-01 16:17:58,749 - Data Loaded successfully with LoadCount:0
INFO  09-01 16:17:58,749 - DataLoad failure
ERROR 09-01 16:17:58,749 - [Executor task launch worker-0][partitionID:default_test_table_632e80a6-77ef-44b2-aed7-2e5bbf56610e] org.apache.carbondata.processing.etl.DataLoadingException: Due to internal errors, please check logs for more details.
at org.apache.carbondata.processing.csvload.DataGraphExecuter.execute(DataGraphExecuter.java:212)
at org.apache.carbondata.processing.csvload.DataGraphExecuter.executeGraph(DataGraphExecuter.java:144)
at org.apache.carbondata.spark.load.CarbonLoaderUtil.executeGraph(CarbonLoaderUtil.java:212)
at org.apache.carbondata.spark.rdd.SparkPartitionLoader.run(CarbonDataLoadRDD.scala:125)
at org.apache.carbondata.spark.rdd.DataFileLoaderRDD$$anon$1.<init>(CarbonDataLoadRDD.scala:255)
at org.apache.carbondata.spark.rdd.DataFileLoaderRDD.compute(CarbonDataLoadRDD.scala:232)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at ...

load data error from csv file at hdfs error in standalone spark cluster

2017-01-08 Thread 251469031
Hi all,


when I load data from a CSV file on HDFS, a stage of the Spark job failed with 
the following error. Where can I find a more detailed error that can help me 
find the solution? Or does someone know why this happens and how to solve it?


command:


cc.sql(s"load data inpath 'hdfs://master:9000/opt/sample.csv' into table 
test_table")


error log:


Job aborted due to stage failure: Task 0 in stage 7.0 failed 4 times, most 
recent failure: Lost task 0.3 in stage 7.0 (TID 17, slave2): 
org.apache.carbondata.processing.etl.DataLoadingException: Due to internal 
errors, please check logs for more details.


Job aborted due to stage failure: Task 0 in stage 7.0 failed 4 times, most recent failure: Lost task 0.3 in stage 7.0 (TID 17, slave2): org.apache.carbondata.processing.etl.DataLoadingException: Due to internal errors, please check logs for more details.
at org.apache.carbondata.processing.csvload.DataGraphExecuter.execute(DataGraphExecuter.java:212)
at org.apache.carbondata.processing.csvload.DataGraphExecuter.executeGraph(DataGraphExecuter.java:144)
at org.apache.carbondata.spark.load.CarbonLoaderUtil.executeGraph(CarbonLoaderUtil.java:212)
at org.apache.carbondata.spark.rdd.SparkPartitionLoader.run(CarbonDataLoadRDD.scala:125)
at org.apache.carbondata.spark.rdd.DataFileLoaderRDD$$anon$1.<init>(CarbonDataLoadRDD.scala:255)
at org.apache.carbondata.spark.rdd.DataFileLoaderRDD.compute(CarbonDataLoadRDD.scala:232)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:

ERROR csvbased.CarbonCSVBasedSeqGenStep: [test_table: Graph - Carbon Surrogate Key Generator][partitionID:0]

2016-12-27 Thread 251469031
Hi all,


when I load a CSV file into a table, an error occurred in the Spark jobs:


Version & Environment:
Spark 1.6.0 + latest version of CarbonData at GitHub + cluster mode


commands:
cc.sql("create table if not exists test_table (id string, name string, city 
string, age Int) STORED BY 'carbondata'")
cc.sql(s"load data inpath 'hdfs://master:9000/carbondata/sample.csv' into table 
test_table")


CSV file data:
cat > sample.csv << EOF
id,name,city,age
1,david,shenzhen,31
2,eason,shenzhen,27
3,jarry,wuhan,35
EOF



Error Description:
collect at CarbonDataRDDFactory.scala:623


Failure Reason:
Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most 
recent failure: Lost task 0.3 in stage 3.0 (TID 8, slave3): 
org.apache.carbondata.processing.etl.DataLoadingException: Due to internal 
errors, please check logs for more details.


Spark Worker Log:


16/12/28 14:18:40 INFO csvreaderstep.CsvInput: test_table: Graph - CSV Input 
*Started all csv reading***
16/12/28 14:18:40 INFO csvreaderstep.CsvInput: 
[pool-20-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-20-thread-1] 
*started csv reading by thread***
16/12/28 14:18:40 INFO csvreaderstep.CsvInput: 
[pool-20-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-20-thread-1] Total 
Number of records processed by this thread is: 3
16/12/28 14:18:40 INFO csvreaderstep.CsvInput: 
[pool-20-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-20-thread-1] Time 
taken to processed 3 Number of records: 15
16/12/28 14:18:40 INFO csvreaderstep.CsvInput: 
[pool-20-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-20-thread-1] 
*Completed csv reading by thread***
16/12/28 14:18:40 INFO csvreaderstep.CsvInput: test_table: Graph - CSV Input 
*Completed all csv reading***
16/12/28 14:18:40 INFO cache.CarbonLRUCache: [test_table: Graph - Carbon 
Surrogate Key Generator][partitionID:0] Column cache size not configured. 
Therefore default behavior will be considered and no LRU based eviction of 
columns will be done
16/12/28 14:18:40 ERROR csvbased.CarbonCSVBasedSeqGenStep: [test_table: Graph - 
Carbon Surrogate Key Generator][partitionID:0]
java.lang.RuntimeException: java.lang.NullPointerException
at 
org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.CarbonCSVBasedSeqGenStep.process(CarbonCSVBasedSeqGenStep.java:940)
at 
org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.CarbonCSVBasedSeqGenStep.processRow(CarbonCSVBasedSeqGenStep.java:515)
at org.pentaho.di.trans.step.RunThread.run(RunThread.java:50)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
at 
org.apache.carbondata.core.cache.dictionary.ColumnReverseDictionaryInfo.getSurrogateKey(ColumnReverseDictionaryInfo.java:73)
at 
org.apache.carbondata.core.cache.dictionary.AbstractColumnDictionaryInfo.getSurrogateKey(AbstractColumnDictionaryInfo.java:289)
at 
org.apache.carbondata.core.cache.dictionary.ReverseDictionary.getSurrogateKey(ReverseDictionary.java:50)
at 
org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.CarbonCSVBasedDimSurrogateKeyGen.generateSurrogateKeys(CarbonCSVBasedDimSurrogateKeyGen.java:150)
at 
org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.CarbonCSVBasedSeqGenStep.populateOutputRow(CarbonCSVBasedSeqGenStep.java:1233)
at 
org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.CarbonCSVBasedSeqGenStep.process(CarbonCSVBasedSeqGenStep.java:929)
... 3 more
16/12/28 14:18:40 INFO sortdatastep.SortKeyStep: [test_table: Graph - Sort Key: 
Sort keystest_table][partitionID:0] Record Processed For table: test_table
16/12/28 14:18:40 INFO step.CarbonSliceMergerStep: [test_table: Graph - Carbon 
Slice Mergertest_table][partitionID:table] Record Procerssed For table: 
test_table



Does anyone have any idea? Thanks~

Re: Re: Dictionary file is locked for updation

2016-12-27 Thread 251469031
Thanks QiangCai, the problem is solved.


So maybe it's better to correct the document at 
https://cwiki.apache.org/confluence/display/CARBONDATA/Cluster+deployment+guide,
changing the value of spark.executor.extraJavaOptions 


from 
-Dcarbon.properties.filepath=carbon.properties 


to an absolute path, e.g.:
-Dcarbon.properties.filepath=/opt/spark-1.6.0/conf/carbon.properties





------ Original ------
From: "QiangCai";
Date: Tue, Dec 27, 2016 05:40 PM
To: "dev";

Subject: Re: Re: Dictionary file is locked for updation



Please correct the path of the carbon.properties file.

spark.executor.extraJavaOptions
-Dcarbon.properties.filepath=carbon.properties 






Re: Re: Dictionary file is locked for updation

2016-12-27 Thread 251469031
I'm sorry, carbon.storelocation has already been configured in my cluster; I 
just didn't copy it here. The configuration is:


carbon.storelocation=hdfs://master:9000/carbondata




------ Original ------
From: "QiangCai";
Date: Tue, Dec 27, 2016 05:29 PM
To: "dev";

Subject: Re: Re: Dictionary file is locked for updation



Please try to add carbon.storelocation to the carbon.properties file.
e.g.
carbon.storelocation=hdfs://master:9000/carbondata/store

You can have a look at the following guide and pay attention to the
carbon.properties file.

https://cwiki.apache.org/confluence/display/CARBONDATA/Cluster+deployment+guide




Re: Dictionary file is locked for updation

2016-12-27 Thread 251469031
I'm using Spark 1.6.0, and CarbonData is the latest master branch on GitHub.


My carbon.properties is configured as:


carbon.ddl.base.hdfs.url=hdfs://master:9000/carbondata/data
carbon.badRecords.location=/opt/Carbon/Spark/badrecords
carbon.kettle.home=/opt/spark-1.6.0/carbonlib/carbonplugins

carbon.lock.type=HDFSLOCK



My spark-default.conf is configured as:


spark.master                     spark://master:7077
spark.yarn.dist.files            /opt/spark-1.6.0/conf/carbon.properties
spark.yarn.dist.archives         /opt/spark-1.6.0/carbonlib/carbondata_2.10-1.0.0-incubating-SNAPSHOT-shade-hadoop2.7.2.jar
spark.executor.extraJavaOptions  -Dcarbon.properties.filepath=carbon.properties
#spark.executor.extraClassPath   /opt/spark-1.6.0/carbonlib/carbondata_2.10-1.0.0-incubating-SNAPSHOT-shade-hadoop2.2.0.jar
#spark.driver.extraClassPath     /opt/spark-1.6.0/carbonlib/carbondata_2.10-1.0.0-incubating-SNAPSHOT-shade-hadoop2.2.0.jar
spark.driver.extraJavaOptions    -Dcarbon.properties.filepath=/opt/spark-1.6.0/conf/carbon.properties
carbon.kettle.home               /opt/spark-1.6.0/carbonlib/carbonplugins





------ Original ------
From: "Ravindra Pesala";<ravi.pes...@gmail.com>;
Date: Tue, Dec 27, 2016 4:15
To: "dev"<dev@carbondata.incubator.apache.org>;

Subject: Re: Dictionary file is locked for updation



Hi,

It seems the store path is taking the default location. Did you set
the store location properly? Which Spark version are you using?

Regards,
Ravindra

On Tue, Dec 27, 2016, 1:38 PM 251469031 <251469...@qq.com> wrote:

> Hi Kumar,
>
>
>   thanks for your reply, the full log is as follows:
>
>
> 16/12/27 12:30:17 INFO locks.HdfsFileLock: Executor task launch worker-0
> HDFS lock
> path:hdfs://master:9000../carbon.store/default/test_table/2e9b7efa-2934-463a-9280-ff50c5129268.lock
> 16/12/27 12:30:17 INFO storage.ShuffleBlockFetcherIterator: Getting 1
> non-empty blocks out of 1 blocks
> 16/12/27 12:30:17 INFO storage.ShuffleBlockFetcherIterator: Started 1
> remote fetches in 1 ms
> 16/12/27 12:30:32 ERROR rdd.CarbonGlobalDictionaryGenerateRDD: Executor
> task launch worker-0
> java.lang.RuntimeException: Dictionary file name is locked for updation.
> Please try after some time
> at scala.sys.package$.error(package.scala:27)
> at
> org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD$$anon$1.<init>(CarbonGlobalDictionaryRDD.scala:364)
> at
> org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD.compute(CarbonGlobalDictionaryRDD.scala:302)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
> at
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
>
>
> as you can see, the lock file path is:
> hdfs://master:9000../carbon.store/default/test_table/2e9b7efa-2934-463a-9280-ff50c5129268.lock
>
>
>
>
> ------ Original ------
> From: "Kumar Vishal";<kumarvishal1...@gmail.com>;
> Date: Tue, Dec 27, 2016 3:25
> To: "dev"<dev@carbondata.incubator.apache.org>;
>
> Subject: Re: Dictionary file is locked for updation
>
>
>
> Hi,
> can you please find the *"HDFS lock path"* string in the executor log and let me
> know the complete log message.
>
> -Regards
> Kumar Vishal
>
> On Tue, Dec 27, 2016 at 12:45 PM, 251469031 <251469...@qq.com> wrote:
>
> > Hi all,
> >
> >
> > when I run the following script:
> > scala> cc.sql(s"load data inpath
> 'hdfs://master:9000/carbondata/sample.csv'
> > into table test_table")
> >
> >
> > it turns out that:
> > WARN  27-12 12:37:58,044 - Lost task 1.3 in stage 2.0 (TID 13, slave1):
> > java.lang.RuntimeException: Dictionary file name is locked for updation.
> > Please try after some time
> >
> >
> > what I have done are:
> > 1.in carbon.properties, set carbon.lock.type=HDFSLOCK
> > 2.send carbon.properties & spark-defaults.conf to all nodes of the
> clusters
> >
> >
> > if any of you have any idea, looking forward to your reply, thx~

Re: Dictionary file is locked for updation

2016-12-27 Thread 251469031
Hi Kumar,


  thanks for your reply, the full log is as follows:


16/12/27 12:30:17 INFO locks.HdfsFileLock: Executor task launch worker-0 HDFS 
lock 
path:hdfs://master:9000../carbon.store/default/test_table/2e9b7efa-2934-463a-9280-ff50c5129268.lock
16/12/27 12:30:17 INFO storage.ShuffleBlockFetcherIterator: Getting 1 non-empty 
blocks out of 1 blocks
16/12/27 12:30:17 INFO storage.ShuffleBlockFetcherIterator: Started 1 remote 
fetches in 1 ms
16/12/27 12:30:32 ERROR rdd.CarbonGlobalDictionaryGenerateRDD: Executor task 
launch worker-0
java.lang.RuntimeException: Dictionary file name is locked for updation. Please 
try after some time
at scala.sys.package$.error(package.scala:27)
at 
org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD$$anon$1.<init>(CarbonGlobalDictionaryRDD.scala:364)
at 
org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD.compute(CarbonGlobalDictionaryRDD.scala:302)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)



As you can see, the lock file path is:
hdfs://master:9000../carbon.store/default/test_table/2e9b7efa-2934-463a-9280-ff50c5129268.lock
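For illustration only (this is not CarbonData's actual code, and the relative default is inferred from the log above): the malformed path looks like a relative default store location being prefixed with the filesystem URL, which is why configuring an absolute carbon.storelocation fixes it:

public class LockPathDemo {
  public static void main(String[] args) {
    String defaultFs = "hdfs://master:9000";
    String storeLocation = "../carbon.store";  // assumed unconfigured default
    String lockFile = "/default/test_table/2e9b7efa-2934-463a-9280-ff50c5129268.lock";
    // Naive concatenation reproduces the malformed lock path from the log:
    System.out.println(defaultFs + storeLocation + lockFile);
    // -> hdfs://master:9000../carbon.store/default/test_table/2e9b7efa-...lock
  }
}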




------ Original ------
From: "Kumar Vishal";<kumarvishal1...@gmail.com>;
Date: Tue, Dec 27, 2016 3:25
To: "dev"<dev@carbondata.incubator.apache.org>;

Subject: Re: Dictionary file is locked for updation



Hi,
can you please find the *"HDFS lock path"* string in the executor log and let me
know the complete log message.

-Regards
Kumar Vishal

On Tue, Dec 27, 2016 at 12:45 PM, 251469031 <251469...@qq.com> wrote:

> Hi all,
>
>
> when I run the following script:
> scala> cc.sql(s"load data inpath 'hdfs://master:9000/carbondata/sample.csv'
> into table test_table")
>
>
> it turns out that:
> WARN  27-12 12:37:58,044 - Lost task 1.3 in stage 2.0 (TID 13, slave1):
> java.lang.RuntimeException: Dictionary file name is locked for updation.
> Please try after some time
>
>
> what I have done are:
> 1.in carbon.properties, set carbon.lock.type=HDFSLOCK
> 2.send carbon.properties & spark-defaults.conf to all nodes of the clusters
>
>
> if any of you have any idea, looking forward to your reply, thx~

Dictionary file is locked for updation

2016-12-26 Thread 251469031
Hi all,


when I run the following script:
scala> cc.sql(s"load data inpath 'hdfs://master:9000/carbondata/sample.csv' 
into table test_table")


it turns out that:
WARN  27-12 12:37:58,044 - Lost task 1.3 in stage 2.0 (TID 13, slave1): 
java.lang.RuntimeException: Dictionary file name is locked for updation. Please 
try after some time


What I have done:
1. In carbon.properties, set carbon.lock.type=HDFSLOCK.
2. Sent carbon.properties & spark-defaults.conf to all nodes of the cluster.


If any of you have any idea, I'm looking forward to your reply, thx~

the storepath in carbon.properties seems not to work

2016-12-26 Thread 251469031
Hi all:


I'm now configuring CarbonData in cluster mode, and some configurations in 
the file carbon.properties are as below:


carbon.storelocation=hdfs://master:9000/carbondata
carbon.ddl.base.hdfs.url=hdfs://master:9000/carbondata/data
carbon.kettle.home=/opt/spark-1.6.0/carbonlib/carbonplugins



but when I create a table using the command:
cc.sql("create table if not exists test_table (id string, name string, city 
string, age Int) STORED BY 'carbondata'")


the output in the spark shell says the tablePath is a local one:
tablePath=/home/hadoop/carbondata/bin/carbonshellstore/default/test_table


and the storePath is also shown as:


scala> print(cc.storePath)
/home/hadoop/carbondata/bin/carbonshellstore



The file carbon.properties has been sent to all the nodes in the cluster. I am 
not sure where I can modify the config; looking forward to your help, thx~

Re: etl.DataLoadingException: The input file does not exist

2016-12-23 Thread 251469031
Oh I see, I've solved it; thanks very much to Manish & QiangCai~~


Here is my DML script:
cc.sql(s"load data inpath 'hdfs://master:9000/carbondata/pt/sample.csv' into 
table test_table")
 




------ Original ------
From: "manish gupta";<tomanishgupt...@gmail.com>;
Date: Fri, Dec 23, 2016 2:32
To: "dev"<dev@carbondata.incubator.apache.org>;

Subject: Re: Re: etl.DataLoadingException: The input file does not exist



Hi 251469031,

Thanks for showing interest in carbon. For your question please refer the
explanation below.

scala> val dataFilePath = new File("hdfs://master:9000/carbondata/sample.csv").getCanonicalPath
dataFilePath: String = /home/hadoop/carbondata/hdfs:/master:9000/carbondata/sample.csv

If you use new File, it will always return a path on the local file system. So
in case you are not appending the HDFS URL to the file/folder path in the LOAD
DATA DDL command, you can configure *carbon.ddl.base.hdfs.url* in the
carbon.properties file as suggested by QiangCai.

*carbon.ddl.base.hdfs.url=hdfs://<host_name>:<port>*

example:
*carbon.ddl.base.hdfs.url=hdfs://9.82.101.42:54310*
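
To make the combination concrete, here is a toy sketch (not CarbonData's actual code; class and variable names are made up, the values are taken from this archive). A store-relative load path combines cleanly with the base URL, while a path that already carries its own scheme yields the doubled prefix seen in the "input file does not exist" error elsewhere in this archive:

public class BaseUrlDemo {
  public static void main(String[] args) {
    String baseUrl = "hdfs://master:9000";  // carbon.ddl.base.hdfs.url

    // A store-relative path combines cleanly:
    System.out.println(baseUrl + "/carbondata/pt/sample.csv");
    // -> hdfs://master:9000/carbondata/pt/sample.csv

    // A path that already has a scheme still gets the prefix, producing the
    // malformed value reported in the earlier error message:
    System.out.println(baseUrl + "hdfs://master/opt/data/carbondata/pt/sample.csv");
    // -> hdfs://master:9000hdfs://master/opt/data/carbondata/pt/sample.csv
  }
}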

Regards
Manish Gupta

On Fri, Dec 23, 2016 at 10:09 AM, QiangCai <qiang...@qq.com> wrote:

> Please find the following item in carbon.properties file, give a proper
> path(hdfs://master:9000/)
> carbon.ddl.base.hdfs.url
>
> During loading, will combine this url and data file path.
>
> BTW, better to provide the version number.
>
>
>

Re: etl.DataLoadingException: The input file does not exist

2016-12-22 Thread 251469031
Well, in the source code of CarbonData, the file type is determined as:


if (property.startsWith(CarbonUtil.HDFS_PREFIX)) {
  storeDefaultFileType = FileType.HDFS;
}


and  CarbonUtil.HDFS_PREFIX="hdfs://"


but when I run the following script, the dataFilePath is still local:


scala> val dataFilePath = new File("hdfs://master:9000/carbondata/sample.csv").getCanonicalPath
dataFilePath: String = /home/hadoop/carbondata/hdfs:/master:9000/carbondata/sample.csv
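
A self-contained illustration of that behavior (the class name here is made up, nothing CarbonData-specific): java.io.File knows nothing about URL schemes, so it treats the whole string as a relative local path, collapses the double slash, and resolves it against the current working directory.

import java.io.File;

public class CanonicalPathDemo {
  public static void main(String[] args) throws Exception {
    File f = new File("hdfs://master:9000/carbondata/sample.csv");
    // Prints something like
    //   /home/hadoop/carbondata/hdfs:/master:9000/carbondata/sample.csv
    // i.e. the mangled dataFilePath shown above. The HDFS URL string should
    // be passed straight to LOAD DATA INPATH instead of wrapped in a File.
    System.out.println(f.getCanonicalPath());
  }
}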





------ Original ------
From: "Liang Chen";
Date: Thu, Dec 22, 2016 8:47
To: "dev";

Subject: Re: etl.DataLoadingException: The input file does not exist



Hi

This is because you use cluster mode, but the input file is a local file.
1. If you use cluster mode, please load Hadoop files.
2. If you just want to load local files, please use local mode.


251469031 wrote
> Hi,
> 
> when i run the following script:
> 
> 
> scala>val dataFilePath = new
> File("/carbondata/pt/sample.csv").getCanonicalPath
> scala>cc.sql(s"load data inpath '$dataFilePath' into table test_table")
> 
> 
> it turns out:
> 
> 
> org.apache.carbondata.processing.etl.DataLoadingException: The input file
> does not exist:
> hdfs://master:9000hdfs://master/opt/data/carbondata/pt/sample.csv
>   at
> org.apache.spark.util.FileUtils$$anonfun$getPaths$1.apply$mcVI$sp(FileUtils.scala:66)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
> 
> 
> It confused me why there is a string "hdfs://master:9000" before
> "hdfs://master/opt/data/carbondata/pt/sample.csv"; I can't find any
> configuration that contains "hdfs://master:9000". Could anyone help me~






etl.DataLoadingException: The input file does not exist

2016-12-22 Thread 251469031
Hi,

when I run the following script:


scala>val dataFilePath = new File("/carbondata/pt/sample.csv").getCanonicalPath
scala>cc.sql(s"load data inpath '$dataFilePath' into table test_table")


it turns out:


org.apache.carbondata.processing.etl.DataLoadingException: The input file does not exist: hdfs://master:9000hdfs://master/opt/data/carbondata/pt/sample.csv
at org.apache.spark.util.FileUtils$$anonfun$getPaths$1.apply$mcVI$sp(FileUtils.scala:66)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)


It confused me why there is a string "hdfs://master:9000" before 
"hdfs://master/opt/data/carbondata/pt/sample.csv"; I can't find any 
configuration that contains "hdfs://master:9000". Could anyone help me~

Re: InvalidInputException when loading data to table

2016-12-19 Thread 251469031
OK, thanks~


It's a local path. Well, the error log shows that the dataFilePath is set to 
/home/hadoop/carbondata/sample.csv, and that is where my test file is located. 
See the log:


Input path does not exist: /home/hadoop/carbondata/sample.csv


In the following command, is the package of class File java.io.File?
scala>val dataFilePath = new File("../carbondata/sample.csv").getCanonicalPath






------ Original ------
From: "Liang Chen";<chenliang6...@gmail.com>;
Date: Tue, Dec 20, 2016 8:35
To: "dev"<dev@carbondata.incubator.apache.org>;

Subject: Re: InvalidInputException when loading data to table



Hi

1. Is your input path on Hadoop or local? Please double-check that your input
path is correct.
2. As a new starter, I suggest you use IntelliJ IDEA to open CarbonData and
run all the examples.

Regards
Liang
--
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: /home/hadoop/carbondata/sample.csv
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:285)
at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:340)
at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:113)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)

2016-12-19 20:24 GMT+08:00 251469031 <251469...@qq.com>:

> Hi all,
>
> I'm now learning how to get started with carbondata according to
> the tutorial: https://cwiki.apache.org/confluence/display/CARBONDATA/
> Quick+Start.
>
>
> I created a file named sample.csv under the path
> /home/hadoop/carbondata at the master node, and when I run the script:
>
>
> scala>val dataFilePath = new File("../carbondata/sample.
> csv").getCanonicalPath
> scala>cc.sql(s"load data inpath '$dataFilePath' into table test_table")
>
>
> it turns out an "InvalidInputException" while the file actually exists;
> here is the scripts and logs:
>
>
> scala> val dataFilePath = new File("../carbondata/sample.
> csv").getCanonicalPath
> dataFilePath: String = /home/hadoop/carbondata/sample.csv
>
>
> scala> cc.sql(s"load data inpath '$dataFilePath' into table test_table")
> INFO  19-12 20:18:22,991 - main Query [LOAD DATA INPATH
> '/HOME/HADOOP/CARBONDATA/SAMPLE.CSV' INTO TABLE TEST_TABLE]
> INFO  19-12 20:18:23,271 - Successfully able to get the table metadata
> file lock
> INFO  19-12 20:18:23,276 - main Initiating Direct Load for the Table :
> (default.test_table)
> INFO  19-12 20:18:23,279 - main Generate global dictionary from source
> data files!
> INFO  19-12 20:18:23,296 - main [Block Distribution]
> INFO  19-12 20:18:23,297 - main totalInputSpaceConsumed: 74 ,
> defaultParallelism: 28
> INFO  19-12 20:18:23,297 - main mapreduce.input.fileinputformat.split.maxsize:
> 16777216
> INFO  19-12 20:18:23,380 - Block broadcast_0 stored as values in memory
> (estimated size 137.1 KB, free 137.1 KB)
> INFO  19-12 20:18:23,397 - Block broadcast_0_piece0 stored as bytes in
> memory (estimated size 15.0 KB, free 152.1 KB)
> INFO  19-12 20:18:23,398 - Added broadcast_0_piece0 in memory on
> 172.17.195.12:46335 (size: 15.0 KB, free: 511.1 MB)
> INFO  19-12 20:18:23,399 - Created broadcast 0 from NewHadoopRDD at
> CarbonTextFile.scala:73
> ERROR 19-12 20:18:23,431 - main generate global dictionary failed
> org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
> does not exist: /home/hadoop/carbondata/sample.csv
> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.
> listStatus(FileInputFormat.java:285)
> at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.
> getSplits(FileInputFormat.java:340)
> at org.apache.spark.rdd.NewHadoopRDD.getPartitions(
> NewHadoopRDD.scala:113)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(
> RDD.scala:239)
> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(
> RDD.scala:237)
> at scala.Option.getOrElse(Option.scala:120)
> at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
> ...
>
>
> If any of you have met the same problem, would you tell me why this
> happens? Looking forward to your reply, thx~




-- 
Regards
Liang

Re: How to compile the latest source code of carbondata

2016-12-19 Thread 251469031
Thanks Liang.


I solved the problem. 
In the file carbon-spark-shell, FWDIR is set to $SPARK_HOME. I have configured 
$SPARK_HOME in /etc/profile, and the output of the command "echo $SPARK_HOME" is 
correct, which means $SPARK_HOME has been set. 


But if I don't run the command "export SPARK_HOME=" before running the 
command "./bin/carbon-spark-shell", the variable FWDIR can't be set. I wonder 
why that is.
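
One common cause, offered as an assumption rather than a diagnosis of this exact setup: an assignment in /etc/profile without export (or a shell that never sources /etc/profile because it is not a login shell) means child processes never inherit the variable. A tiny probe shows what a spawned JVM actually sees:

public class EnvProbe {
  public static void main(String[] args) {
    // A child process only inherits variables the parent shell exported;
    // "echo $SPARK_HOME" proves the variable exists in the current shell,
    // not that spawned programs (such as spark-submit) will see it.
    System.out.println("SPARK_HOME = " + System.getenv("SPARK_HOME"));
  }
}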




------ Original ------
From: "251469031";<251469...@qq.com>;
Date: Mon, Dec 19, 2016 4:02
To: "dev"<dev@carbondata.incubator.apache.org>;

Subject: Re: How to compile the latest source code of carbondata



I can visit the Spark web UI at http://master:8080/.
Are there any other environment settings that I should configure?




------ Original ------
From: "Liang Chen";<chenliang6...@gmail.com>;
Date: Mon, Dec 19, 2016 3:40
To: "dev"<dev@carbondata.incubator.apache.org>;

Subject: Re: How to compile the latest source code of carbondata



Hi

Please check whether your Spark environment is ready.



2016-12-19 15:34 GMT+08:00 251469031 <251469...@qq.com>:

> the privileges of the folder "carbondata" is:
>
>
> drwxr-xr-x 18 hadoop hadoop  4096 Dec 19 14:56 carbondata
>
>
> and hadoop is the user who run maven.
>
>
> well, after running the mvn command, I get the info from the console as follows:
>
>
> [INFO] Reactor Summary:
> [INFO]
> [INFO] Apache CarbonData :: Parent  SUCCESS [
> 1.012 s]
> [INFO] Apache CarbonData :: Common  SUCCESS [
> 2.066 s]
> [INFO] Apache CarbonData :: Core .. SUCCESS [
> 5.512 s]
> [INFO] Apache CarbonData :: Processing  SUCCESS [
> 1.892 s]
> [INFO] Apache CarbonData :: Hadoop  SUCCESS [
> 0.789 s]
> [INFO] Apache CarbonData :: Spark Common .. SUCCESS [
> 17.121 s]
> [INFO] Apache CarbonData :: Spark . SUCCESS [
> 33.269 s]
> [INFO] Apache CarbonData :: Assembly .. SUCCESS [
> 17.700 s]
> [INFO] Apache CarbonData :: Spark Examples  SUCCESS [
> 7.741 s]
> [INFO] 
> 
> [INFO] BUILD SUCCESS
> [INFO] 
> 
> [INFO] Total time: 01:27 min
> [INFO] Finished at: 2016-12-19T14:57:26+08:00
> [INFO] Final Memory: 83M/1623M
> [INFO] 
> 
>
>
>
> but I didn't find a file named spark-submit under the path carbondata/bin/:
>
>
> [hadoop@master ~]$ cd carbondata/bin/
> [hadoop@master bin]$ ll
> total 8
> -rwxrwxr-x 1 hadoop hadoop 3879 Dec 19 14:54 carbon-spark-shell
> -rwxrwxr-x 1 hadoop hadoop 2820 Dec 19 14:54 carbon-spark-sql
>
>
>
> is this phenomenon normal?
>
>
>
>
>
> ------ Original ------
> From: "Liang Chen";<chenliang6...@gmail.com>;
> Date: Mon, Dec 19, 2016 3:19
> To: "dev"<dev@carbondata.incubator.apache.org>;
>
> Subject: Re: How to compile the latest source code of carbondata
>
>
>
> Hi
>
> Please check whether you have sufficient rights on the folder "carbondata".
> 
> ---
> For spark 1.5,  the compile process has no issue, but  carbon-spark-shell
> can not run correctly:
> step 1: git clone https://github.com/apache/incubator-carbondata.git
>  carbondata
> step 2: mvn clean package -DskipTests -Pspark-1.5
> step 3: ./bin/carbon-spark-shell, and it turns out:
>
>
> [hadoop@master carbondata]$ ./bin/carbon-spark-shell
> ./bin/carbon-spark-shell: line 78: /bin/spark-submit: No such file or
> directory
>
> 2016-12-19 15:05 GMT+08:00 251469031 <251469...@qq.com>:
>
> > thx liang.
> >
> >
> > I've tried spark 2.0.0 and spark 1.5.0, my step & script is:
> >
> >
> > For spark 2.0, the compile process has no issue, but  carbon-spark-shell
> > can not run correctly:
> >
> >
> > step 1: git clone https://github.com/apache/incubator-carbondata.git
> > carbondata
> > step 2: mvn clean package -DskipTests -Pspark-2.0
> > step 3: ./bin/carbon-spark-shell, and it turns out:
> >
> >
> > [hadoop@master carbondata]$ ./bin/carbon-spark-shell
> > ls: cannot access /home/hadoop/carbondata/assembly/target/scala-2.10: No
> > such file or directory
> > ls: cannot access /home/hadoop/carbond

Re: How to compile the latest source code of carbondata

2016-12-18 Thread 251469031
the privileges of the folder "carbondata" is:


drwxr-xr-x 18 hadoop hadoop  4096 Dec 19 14:56 carbondata


and hadoop is the user who run maven.


Well, after running the mvn command, I get the info from the console as follows:


[INFO] Reactor Summary:
[INFO] 
[INFO] Apache CarbonData :: Parent  SUCCESS [  1.012 s]
[INFO] Apache CarbonData :: Common  SUCCESS [  2.066 s]
[INFO] Apache CarbonData :: Core .. SUCCESS [  5.512 s]
[INFO] Apache CarbonData :: Processing  SUCCESS [  1.892 s]
[INFO] Apache CarbonData :: Hadoop  SUCCESS [  0.789 s]
[INFO] Apache CarbonData :: Spark Common .. SUCCESS [ 17.121 s]
[INFO] Apache CarbonData :: Spark . SUCCESS [ 33.269 s]
[INFO] Apache CarbonData :: Assembly .. SUCCESS [ 17.700 s]
[INFO] Apache CarbonData :: Spark Examples  SUCCESS [  7.741 s]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 01:27 min
[INFO] Finished at: 2016-12-19T14:57:26+08:00
[INFO] Final Memory: 83M/1623M
[INFO] 



But I didn't find a file named spark-submit under the path carbondata/bin/:


[hadoop@master ~]$ cd carbondata/bin/
[hadoop@master bin]$ ll
total 8
-rwxrwxr-x 1 hadoop hadoop 3879 Dec 19 14:54 carbon-spark-shell
-rwxrwxr-x 1 hadoop hadoop 2820 Dec 19 14:54 carbon-spark-sql



Is this phenomenon normal?
 




------ Original ------
From: "Liang Chen";<chenliang6...@gmail.com>;
Date: Mon, Dec 19, 2016 3:19
To: "dev"<dev@carbondata.incubator.apache.org>;

Subject: Re: How to compile the latest source code of carbondata



Hi

Please check whether you have sufficient rights on the folder "carbondata".
---
For spark 1.5,  the compile process has no issue, but  carbon-spark-shell
can not run correctly:
step 1: git clone https://github.com/apache/incubator-carbondata.git
 carbondata
step 2: mvn clean package -DskipTests -Pspark-1.5
step 3: ./bin/carbon-spark-shell, and it turns out:


[hadoop@master carbondata]$ ./bin/carbon-spark-shell
./bin/carbon-spark-shell: line 78: /bin/spark-submit: No such file or
directory

2016-12-19 15:05 GMT+08:00 251469031 <251469...@qq.com>:

> thx liang.
>
>
> I've tried spark 2.0.0 and spark 1.5.0, my step & script is:
>
>
> For spark 2.0, the compile process has no issue, but  carbon-spark-shell
> can not run correctly:
>
>
> step 1: git clone https://github.com/apache/incubator-carbondata.git
> carbondata
> step 2: mvn clean package -DskipTests -Pspark-2.0
> step 3: ./bin/carbon-spark-shell, and is turns out:
>
>
> [hadoop@master carbondata]$ ./bin/carbon-spark-shell
> ls: cannot access /home/hadoop/carbondata/assembly/target/scala-2.10: No
> such file or directory
> ls: cannot access /home/hadoop/carbondata/assembly/target/scala-2.10: No
> such file or directory
> ./bin/carbon-spark-shell: line 78: /bin/spark-submit: No such file or
> directory
>
>
>
> For spark 1.5,  the compile process has no issue, but  carbon-spark-shell
> can not run correctly:
> step 1: git clone https://github.com/apache/incubator-carbondata.git
> carbondata
> step 2: mvn clean package -DskipTests -Pspark-1.5
> step 3: ./bin/carbon-spark-shell, and it turns out:
>
>
> [hadoop@master carbondata]$ ./bin/carbon-spark-shell
> ./bin/carbon-spark-shell: line 78: /bin/spark-submit: No such file or
> directory
>
>
>
>
>
>
>
>
>
>
> ------ Original ------
> From: "Liang Chen";<chenliang6...@gmail.com>;
> Date: Mon, Dec 19, 2016 2:37
> To: "dev"<dev@carbondata.incubator.apache.org>;
>
> Subject: Re: How to compile the latest source code of carbondata
>
>
>
> Hi
>
> Can you share what errors you get and which compile command you used?
>
> Regards
> Liang
>
> 2016-12-19 14:32 GMT+08:00 251469031 <251469...@qq.com>:
>
> > Hi all:
> >
> > I've tried to compile the latest source code following the tutorial:
> > https://cwiki.apache.org/confluence/display/CARBONDATA/Quick+Start , but
> > it doesn't work on the latest source code on GitHub.
> >
> >
> > Would you send me a tutorial about how to do this or tell me how to
> > use carbondata, thx~
>
>
>
>
> --
> Regards
> Liang




-- 
Regards
Liang

Re: How to compile the latest source code of carbondata

2016-12-18 Thread 251469031
Thanks Liang.


I've tried Spark 2.0.0 and Spark 1.5.0; my steps & scripts are:


For Spark 2.0, the compile process has no issue, but carbon-spark-shell cannot 
run correctly:


step 1: git clone https://github.com/apache/incubator-carbondata.git carbondata
step 2: mvn clean package -DskipTests -Pspark-2.0
step 3: ./bin/carbon-spark-shell, and it turns out:


[hadoop@master carbondata]$ ./bin/carbon-spark-shell 
ls: cannot access /home/hadoop/carbondata/assembly/target/scala-2.10: No such 
file or directory
ls: cannot access /home/hadoop/carbondata/assembly/target/scala-2.10: No such 
file or directory
./bin/carbon-spark-shell: line 78: /bin/spark-submit: No such file or directory



For Spark 1.5, the compile process has no issue, but carbon-spark-shell cannot 
run correctly:
step 1: git clone https://github.com/apache/incubator-carbondata.git carbondata
step 2: mvn clean package -DskipTests -Pspark-1.5
step 3: ./bin/carbon-spark-shell, and it turns out:


[hadoop@master carbondata]$ ./bin/carbon-spark-shell 
./bin/carbon-spark-shell: line 78: /bin/spark-submit: No such file or directory





 




------ Original ------
From: "Liang Chen";<chenliang6...@gmail.com>;
Date: Mon, Dec 19, 2016 2:37
To: "dev"<dev@carbondata.incubator.apache.org>;

Subject: Re: How to compile the latest source code of carbondata



Hi

Can you share what errors you get and which compile command you used?

Regards
Liang

2016-12-19 14:32 GMT+08:00 251469031 <251469...@qq.com>:

> Hi all:
>
> I've tried to compile the latest source code following the tutorial:
> https://cwiki.apache.org/confluence/display/CARBONDATA/Quick+Start , but
> it doesn't work on the latest source code on GitHub.
>
>
> Would you send me a tutorial about how to do this or tell me how to
> use carbondata, thx~




-- 
Regards
Liang

How to compile the latest source code of carbondata

2016-12-18 Thread 251469031
Hi all:

I've tried to compile the latest source code following the tutorial: 
https://cwiki.apache.org/confluence/display/CARBONDATA/Quick+Start , but it 
doesn't work on the latest source code on GitHub.


Would you send me a tutorial about how to do this or tell me how to use 
carbondata, thx~