[ 
https://issues.apache.org/jira/browse/CARBONDATA-2877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16617521#comment-16617521
 ] 

Naman Rastogi commented on CARBONDATA-2877:
-------------------------------------------

Loading data from large files requires a large "Unsafe Working Memory", far 
more than the default 512 MB. Increasing it to something like 10 GB should fix 
this problem. Please make this change and retry the load; it should then work.
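
For reference, a minimal sketch of that change, assuming the setting meant 
here is the CarbonData property carbon.unsafe.working.memory.in.mb (value in 
MB, so 10 GB is 10240) and that it is set in the carbon.properties file read 
by both the driver and the executors. The file location and the way it is 
shipped with spark-submit depend on the deployment and are not shown.

```properties
# Assumption: this is the "Unsafe Working Memory" setting referred to above.
# The value is in MB; the default is 512. 10240 MB = 10 GB.
carbon.unsafe.working.memory.in.mb=10240
```

This memory is allocated off-heap, so on YARN the container memory overhead 
may also need enough headroom for it.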

> CarbonDataWriterException when loading data to carbon table with large number 
> of rows/columns from Spark-Submit
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-2877
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2877
>             Project: CarbonData
>          Issue Type: Bug
>          Components: data-load
>    Affects Versions: 1.4.1
>         Environment: Spark 2.1
>            Reporter: Chetan Bhat
>            Assignee: Naman Rastogi
>            Priority: Major
>
> Steps :
> From Spark-submit, the user creates a table with a large number of columns 
> (around 100) and tries to load around 3 lakh (300,000) records into the table.
> Spark-submit command - spark-submit --master yarn --num-executors 3 
> --executor-memory 75g --driver-memory 10g --executor-cores 12 --class
> Actual Issue : Data loading fails with CarbonDataWriterException.
> Executor yarn UI log-
> org.apache.spark.util.TaskCompletionListenerException: org.apache.carbondata.core.datastore.exception.CarbonDataWriterException
> Previous exception in task: Error while initializing data handler : 
>  org.apache.carbondata.processing.loading.steps.DataWriterProcessorStepImpl.execute(DataWriterProcessorStepImpl.java:141)
>  org.apache.carbondata.processing.loading.DataLoadExecutor.execute(DataLoadExecutor.java:51)
>  org.apache.carbondata.spark.rdd.NewCarbonDataLoadRDD$$anon$1.<init>(NewCarbonDataLoadRDD.scala:221)
>  org.apache.carbondata.spark.rdd.NewCarbonDataLoadRDD.internalCompute(NewCarbonDataLoadRDD.scala:197)
>  org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:78)
>  org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>  org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>  org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>  org.apache.spark.scheduler.Task.run(Task.scala:99)
>  org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325)
>  java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  java.lang.Thread.run(Thread.java:748)
>  at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:138)
>  at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:116)
>  at org.apache.spark.scheduler.Task.run(Task.scala:109)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:325)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
>  
> Expected : Data loading from Spark-submit should succeed, as it does from 
> Beeline.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
