Hi all,

  when I load a CSV file into a table, an error occurs in the Spark jobs:


Version & Environment:
Spark 1.6.0 + latest version of CarbonData from GitHub + cluster mode


Commands:
cc.sql("create table if not exists test_table (id string, name string, city string, age Int) STORED BY 'carbondata'")
cc.sql(s"load data inpath 'hdfs://master:9000/carbondata/sample.csv' into table test_table")


CSV file data:
cat > sample.csv << EOF
id,name,city,age
1,david,shenzhen,31
2,eason,shenzhen,27
3,jarry,wuhan,35
EOF
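In case it helps anyone reproduce this, the sample file can be recreated and sanity-checked locally before the HDFS upload. The awk field-count check and the commented upload command are just illustrative (the HDFS path is the one from the load command above), not output from the failing run:

```shell
# recreate the sample file exactly as in the post
cat > sample.csv << 'EOF'
id,name,city,age
1,david,shenzhen,31
2,eason,shenzhen,27
3,jarry,wuhan,35
EOF

# sanity-check: every data row should have the same number of
# comma-separated fields as the header, so a malformed CSV can be
# ruled out before suspecting CarbonData itself
awk -F',' 'NR==1 { n = NF }
           NF != n { print "bad row: " $0; exit 1 }
           END { print "data rows:", NR-1 }' sample.csv

# upload used by the LOAD DATA command (path from the post):
# hdfs dfs -put sample.csv hdfs://master:9000/carbondata/sample.csv
```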



Error Description:
collect at CarbonDataRDDFactory.scala:623


Failure Reason:
Job aborted due to stage failure: Task 0 in stage 3.0 failed 4 times, most 
recent failure: Lost task 0.3 in stage 3.0 (TID 8, slave3): 
org.apache.carbondata.processing.etl.DataLoadingException: Due to internal 
errors, please check logs for more details.


Spark Worker Log:


16/12/28 14:18:40 INFO csvreaderstep.CsvInput: test_table: Graph - CSV Input 
*****************Started all csv reading***********
16/12/28 14:18:40 INFO csvreaderstep.CsvInput: 
[pool-20-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-20-thread-1] 
*****************started csv reading by thread***********
16/12/28 14:18:40 INFO csvreaderstep.CsvInput: 
[pool-20-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-20-thread-1] Total 
Number of records processed by this thread is: 3
16/12/28 14:18:40 INFO csvreaderstep.CsvInput: 
[pool-20-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-20-thread-1] Time 
taken to processed 3 Number of records: 15
16/12/28 14:18:40 INFO csvreaderstep.CsvInput: 
[pool-20-thread-1][partitionID:PROCESS_BLOCKS;queryID:pool-20-thread-1] 
*****************Completed csv reading by thread***********
16/12/28 14:18:40 INFO csvreaderstep.CsvInput: test_table: Graph - CSV Input 
*****************Completed all csv reading***********
16/12/28 14:18:40 INFO cache.CarbonLRUCache: [test_table: Graph - Carbon 
Surrogate Key Generator][partitionID:0] Column cache size not configured. 
Therefore default behavior will be considered and no LRU based eviction of 
columns will be done
16/12/28 14:18:40 ERROR csvbased.CarbonCSVBasedSeqGenStep: [test_table: Graph - 
Carbon Surrogate Key Generator][partitionID:0]
java.lang.RuntimeException: java.lang.NullPointerException
        at 
org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.CarbonCSVBasedSeqGenStep.process(CarbonCSVBasedSeqGenStep.java:940)
        at 
org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.CarbonCSVBasedSeqGenStep.processRow(CarbonCSVBasedSeqGenStep.java:515)
        at org.pentaho.di.trans.step.RunThread.run(RunThread.java:50)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
        at 
org.apache.carbondata.core.cache.dictionary.ColumnReverseDictionaryInfo.getSurrogateKey(ColumnReverseDictionaryInfo.java:73)
        at 
org.apache.carbondata.core.cache.dictionary.AbstractColumnDictionaryInfo.getSurrogateKey(AbstractColumnDictionaryInfo.java:289)
        at 
org.apache.carbondata.core.cache.dictionary.ReverseDictionary.getSurrogateKey(ReverseDictionary.java:50)
        at 
org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.CarbonCSVBasedDimSurrogateKeyGen.generateSurrogateKeys(CarbonCSVBasedDimSurrogateKeyGen.java:150)
        at 
org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.CarbonCSVBasedSeqGenStep.populateOutputRow(CarbonCSVBasedSeqGenStep.java:1233)
        at 
org.apache.carbondata.processing.surrogatekeysgenerator.csvbased.CarbonCSVBasedSeqGenStep.process(CarbonCSVBasedSeqGenStep.java:929)
        ... 3 more
16/12/28 14:18:40 INFO sortdatastep.SortKeyStep: [test_table: Graph - Sort Key: 
Sort keystest_table][partitionID:0] Record Processed For table: test_table
16/12/28 14:18:40 INFO step.CarbonSliceMergerStep: [test_table: Graph - Carbon 
Slice Mergertest_table][partitionID:table] Record Procerssed For table: 
test_table



Does anyone have any idea? thx~
