GitHub user tdas opened a pull request:

    https://github.com/apache/spark/pull/13727

    [SPARK-15982][SPARK-16009] Harmonize the behavior of DataFrameReader.text/csv/json/parquet/orc

    ## What changes were proposed in this pull request?
    
    Issues with the current reader behavior:
    - `text()` without args returns an empty DF with no columns. This is
      inconsistent: `text()` is expected to always return a DF with a `value`
      string field.
    - `textFile()` without args fails with an exception for the same reason:
      it expects the DF returned by `text()` to have a `value` field.
    - `orc()` does not take varargs, which is inconsistent with the other
      formats.
    - `json(single-arg)` was removed, but that caused source compatibility
      issues (SPARK-16009).
    
    The solution implemented here does the following:
    - For each format, there will be a single-argument method and a varargs
      method. For json, parquet, csv, and text, this means adding json(string),
      etc. For orc, this means adding orc(varargs).
    - Remove the special handling in text(), csv(), etc. that returns an empty
      DataFrame with no fields. Instead, pass the empty sequence of paths on to
      the data source and let each data source handle it correctly. For
      example, the text data source should return an empty DF with schema
      (value: string).
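    
    The two points above can be sketched in plain Java. This is a minimal,
    self-contained model and not the Spark code from this patch: `Frame`,
    `Source`, `TextSource` and `Reader` are hypothetical names made up for
    illustration, and nothing here reads files. It only shows the overload pair
    (single argument plus varargs) and the idea that an empty path list is
    passed through so that the format itself decides the schema.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a DataFrame: only the column names matter here.
final class Frame {
    final List<String> columns;

    Frame(List<String> columns) {
        this.columns = columns;
    }
}

// Hypothetical data source contract: each format decides what zero paths means.
interface Source {
    Frame load(List<String> paths);
}

// Models the text format: the schema is always (value: string), even with no paths.
final class TextSource implements Source {
    @Override
    public Frame load(List<String> paths) {
        // A real source would read the given files; this sketch ignores them.
        return new Frame(Collections.singletonList("value"));
    }
}

// Hypothetical reader exposing the overload pair proposed for every format.
final class Reader {
    private final Source textSource = new TextSource();

    // Single-argument form, so callers passing one path keep compiling.
    Frame text(String path) {
        return text(new String[] {path});
    }

    // Varargs form: no special case for zero paths, the source handles it.
    Frame text(String... paths) {
        return textSource.load(Arrays.asList(paths));
    }
}
```

    In this model, `new Reader().text().columns` is `[value]`, the same as for
    `text("a.txt")` or `text("a.txt", "b.txt")`, so a caller that expects a
    `value` field (as `textFile()` does) would not fail on the empty case.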
    
    ## How was this patch tested?
    Added new unit tests for Scala and Java.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tdas/spark SPARK-15982

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13727.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13727
    
----
commit dcc4655225b27a4bc544ce38580949fb3fe60121
Author: Tathagata Das <[email protected]>
Date:   2016-06-17T03:37:44Z

    Fixed and added tests

----

