GitHub user watermen opened a pull request:

    https://github.com/apache/spark/pull/4532

    [SPARK-5741][SQL] Support comma in path in HiveContext

    When running `select * from nzhang_part where hr = 'file,';`, it throws `java.lang.IllegalArgumentException: Can not create a Path from an empty string` because the HDFS path name contains a comma. Steps to reproduce:
    ```
    set hive.merge.mapfiles=true; 
    set hive.merge.mapredfiles=true;
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    set hive.exec.dynamic.partition=true;
    set hive.exec.dynamic.partition.mode=nonstrict;
    create table nzhang_part like srcpart;
    insert overwrite table nzhang_part partition (ds='2010-08-15', hr) select key, value, hr from srcpart where ds='2008-04-08';
    insert overwrite table nzhang_part partition (ds='2010-08-15', hr=11) select key, value from srcpart where ds='2008-04-08';
    insert overwrite table nzhang_part partition (ds='2010-08-15', hr) 
    select * from (
    select key, value, hr from srcpart where ds='2008-04-08'
    union all
    select '1' as key, '1' as value, 'file,' as hr from src limit 1) s;
    select * from nzhang_part where hr = 'file,';
    ```
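    For context, here is a minimal Scala sketch (not part of the patch; the warehouse path below is made up) of why the trailing comma is fatal: the old-API `org.apache.hadoop.mapred.FileInputFormat.setInputPaths(JobConf, String)` treats its string argument as a comma-separated list of paths, so a dynamic-partition directory ending in `hr=file,` is split into a real path plus an empty string, and constructing a `Path` from that empty string throws.
    ```scala
    import org.apache.hadoop.mapred.{FileInputFormat, JobConf}

    object CommaPathRepro {
      def main(args: Array[String]): Unit = {
        val jobConf = new JobConf()
        // Hypothetical partition directory for hr='file,' -- note the trailing comma.
        val partitionDir = "/user/hive/warehouse/nzhang_part/ds=2010-08-15/hr=file,"
        // setInputPaths(JobConf, String) splits the string on commas before turning
        // each piece into a Path; the empty piece after the trailing comma triggers
        // java.lang.IllegalArgumentException: Can not create a Path from an empty string
        FileInputFormat.setInputPaths(jobConf, partitionDir)
      }
    }
    ```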
    ###############################
    Error log
    ###############################
    15/02/10 14:33:16 ERROR SparkSQLDriver: Failed in [select * from nzhang_part where hr = 'file,']
    java.lang.IllegalArgumentException: Can not create a Path from an empty string
    at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
    at org.apache.hadoop.fs.Path.<init>(Path.java:135)
    at org.apache.hadoop.util.StringUtils.stringToPath(StringUtils.java:241)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:400)
    at org.apache.spark.sql.hive.HadoopTableReader$.initializeLocalJobConfFunc(TableReader.scala:251)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$11.apply(TableReader.scala:229)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$11.apply(TableReader.scala:229)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:172)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:172)
    at scala.Option.map(Option.scala:145)
    at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:172)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:196)
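    The commits listed below change the `FileInputFormat.setInputPaths` call to a direct `jobConf.set`. A rough sketch of that idea follows; the property name `mapred.input.dir` and the use of `org.apache.hadoop.util.StringUtils.escapeString` are assumptions for illustration, not lifted from the patch.
    ```scala
    import org.apache.hadoop.mapred.JobConf
    import org.apache.hadoop.util.StringUtils

    object CommaSafeInputPath {
      // Hypothetical replacement for the FileInputFormat.setInputPaths call in
      // HadoopTableReader.initializeLocalJobConfFunc: write the escaped path straight
      // into the input-dir property instead of letting setInputPaths re-split it.
      def setInputPath(path: String, jobConf: JobConf): Unit = {
        // escapeString backslash-escapes embedded commas; "mapred.input.dir" is the
        // old-API property that FileInputFormat.getInputPaths reads back.
        jobConf.set("mapred.input.dir", StringUtils.escapeString(path))
      }
    }
    ```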

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/watermen/spark SPARK-5741

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/4532.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #4532
    
----
commit 2eedacc06b36dd582155da1709985e6d262a1e57
Author: q00251598 <[email protected]>
Date:   2015-02-11T09:16:06Z

    change setInputPaths to set

commit dc83c8893dc43f4efa6631f6d0aef925ab84c4dc
Author: q00251598 <[email protected]>
Date:   2015-02-11T09:41:18Z

    change setInputPaths to set

commit ae41e55c93745c8eaa3f0a5bd50131917271481d
Author: q00251598 <[email protected]>
Date:   2015-02-11T09:52:37Z

    change setInputPaths to set

commit 358ba4d8614536b68b938a71f67011911d158c8d
Author: q00251598 <[email protected]>
Date:   2015-02-11T10:17:32Z

    change FileInputFormat.setInputPaths to jobConf.set

commit 0ab9fac1ba9410b1a632c4d63357c3e7437b031b
Author: q00251598 <[email protected]>
Date:   2015-02-11T10:48:52Z

    change FileInputFormat.setInputPaths to jobConf.set

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
