Joe Mudd created SPARK-5435:
-------------------------------

             Summary: saveAsNewAPIHadoopDataset is not setting up the local configuration
                 Key: SPARK-5435
                 URL: https://issues.apache.org/jira/browse/SPARK-5435
             Project: Spark
          Issue Type: Bug
          Components: Input/Output
    Affects Versions: 1.2.0
         Environment: Cloudera 5.3.0
            Reporter: Joe Mudd
The HCatOutputFormat utilizes FileOutputFormatContainer, which refers to the MRv1 FileOutputFormat.getUniqueName() method. Since the local configuration has not been set up, getUniqueName() ends up throwing an IllegalArgumentException. It appears the writeShard closure inside saveAsNewAPIHadoopDataset() needs to record Job information in the local Hadoop configuration, similar to HadoopRDD.addLocalConfiguration(). In a test build, I ended up setting both the MRv1 and MRv2 names, since setting just the MRv2 names did not work.

Here's the traceback:

java.lang.IllegalArgumentException: This method can only be called from within a Job
	at org.apache.hadoop.mapred.FileOutputFormat.getUniqueName(FileOutputFormat.java:286)
	at org.apache.hive.hcatalog.mapreduce.FileOutputFormatContainer.getRecordWriter(FileOutputFormatContainer.java:101)
	at org.apache.hive.hcatalog.mapreduce.HCatOutputFormat.getRecordWriter(HCatOutputFormat.java:260)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:984)
	at org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:965)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
	at org.apache.spark.scheduler.Task.run(Task.scala:56)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
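As a rough sketch of the workaround described above (a hypothetical helper modeled on HadoopRDD.addLocalConfiguration, not the actual Spark patch): before the record writer is obtained in writeShard, the task identity would be recorded in the task-local Hadoop configuration under both the MRv1 and MRv2 property names. A plain Map stands in for org.apache.hadoop.conf.Configuration so the sketch is self-contained; the ID formats follow the standard Hadoop job/task/attempt naming scheme.

```java
import java.util.HashMap;
import java.util.Map;

public class LocalConfSketch {
    // Hypothetical helper: record job/task identity in the task-local
    // configuration so MRv1 code paths such as
    // FileOutputFormat.getUniqueName() can find it.
    static void addLocalConfiguration(String jobTrackerId, int jobId,
                                      int splitId, int attemptId,
                                      Map<String, String> conf) {
        String jobIdStr = String.format("job_%s_%04d", jobTrackerId, jobId);
        String taskIdStr = String.format("task_%s_%04d_m_%06d",
                jobTrackerId, jobId, splitId);
        String attemptIdStr = String.format("attempt_%s_%04d_m_%06d_%d",
                jobTrackerId, jobId, splitId, attemptId);

        // MRv1 names -- these are what the MRv1 code paths read.
        conf.put("mapred.job.id", jobIdStr);
        conf.put("mapred.tip.id", taskIdStr);
        conf.put("mapred.task.id", attemptIdStr);
        conf.put("mapred.task.is.map", "true");
        conf.put("mapred.task.partition", String.valueOf(splitId));

        // MRv2 equivalents -- per the report, setting only these was
        // not sufficient, so both families are set.
        conf.put("mapreduce.job.id", jobIdStr);
        conf.put("mapreduce.task.id", taskIdStr);
        conf.put("mapreduce.task.attempt.id", attemptIdStr);
        conf.put("mapreduce.task.ismap", "true");
        conf.put("mapreduce.task.partition", String.valueOf(splitId));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        addLocalConfiguration("20150127", 3, 0, 0, conf);
        System.out.println(conf.get("mapred.task.id"));
        System.out.println(conf.get("mapreduce.task.attempt.id"));
    }
}
```

With both name families populated, getRecordWriter() would no longer hit the "This method can only be called from within a Job" check, since getUniqueName() can resolve the task attempt ID from the configuration.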