GitHub user Luwein opened a pull request:
https://github.com/apache/spark/pull/13256
Master
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/13256.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #13256
----
commit 7f5922aa4a810a0b9cc783956a8b7aa3dad86a0a
Author: Andrew Or <[email protected]>
Date: 2016-05-05T23:51:06Z
[HOTFIX] Fix MLUtils compile
commit 157a49aa410dc1870cd171148d317084c5a90d23
Author: Sun Rui <[email protected]>
Date: 2016-05-06T01:49:43Z
[SPARK-11395][SPARKR] Support over and window specification in SparkR.
This PR:
1. Implement WindowSpec S4 class.
2. Implement Window.partitionBy() and Window.orderBy() as utility functions
to create WindowSpec objects.
3. Implement over() of Column class.
Author: Sun Rui <[email protected]>
Author: Sun Rui <[email protected]>
Closes #10094 from sun-rui/SPARK-11395.
commit a03c5e68abd8c066c97ebd388883070d59dce1a7
Author: Luciano Resende <[email protected]>
Date: 2016-05-06T11:25:45Z
[SPARK-14738][BUILD] Separate docker integration tests from main build
## What changes were proposed in this pull request?
Create a maven profile for executing the docker integration tests using
maven
Remove docker integration tests from main sbt build
Update documentation on how to run docker integration tests from sbt
## How was this patch tested?
Manual test of the docker integration tests as in :
mvn -Pdocker-integration-tests -pl :spark-docker-integration-tests_2.11
compile test
## Other comments
Note that the DB2 Docker tests are still disabled, as there is a kernel
version issue on the AMPLab Jenkins slaves and we would need to get them on the
right level before enabling those tests. They do run OK locally with the
updates from PR #12348.
Author: Luciano Resende <[email protected]>
Closes #12508 from lresende/docker.
commit fa928ff9a3c1de5d5aff9d14e6bc1bd03fcca087
Author: hyukjinkwon <[email protected]>
Date: 2016-05-06T17:46:45Z
[SPARK-14962][SQL] Do not push down isnotnull/isnull on unsupported types
in ORC
## What changes were proposed in this pull request?
https://issues.apache.org/jira/browse/SPARK-14962
ORC filters were being pushed down for all types for both `IsNull` and
`IsNotNull`. This appears harmless because neither `IsNull` nor `IsNotNull`
takes a type as an argument (Hive 1.2.x) when filters (`SearchArgument`) are
built on the Spark side, but they do not filter correctly: the stored
statistics always produce `null` for unsupported types (e.g. `ArrayType`) on
the ORC side, so `IsNull` always evaluates to `true`, which means `IsNotNull`
always evaluates to `false`.
(Please see
[RecordReaderImpl.java#L296-L318](https://github.com/apache/hive/blob/branch-1.2/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java#L296-L318)
and
[RecordReaderImpl.java#L359-L365](https://github.com/apache/hive/blob/branch-1.2/ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java#L359-L365)
in Hive 1.2)
This appears to be prevented in Hive 1.3.x and later by requiring a type
([`PredicateLeaf.Type`](https://github.com/apache/hive/blob/e085b7e9bd059d91aaf013df0db4d71dca90ec6f/storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/PredicateLeaf.java#L50-L56))
when building a filter
([`SearchArgument`](https://github.com/apache/hive/blob/26b5c7b56a4f28ce3eabc0207566cce46b29b558/storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgument.java#L260)),
but Hive 1.2.x does not seem to do this.
This PR prevents ORC filter creation for `IsNull` and `IsNotNull` on
unsupported types. `OrcFilters` resembles `ParquetFilters`.
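To make the failure mode concrete, here is a hedged Scala sketch of the kind of
query affected; the path, session setup, and data are illustrative and assume
the ORC data source from the Hive module with filter pushdown enabled:
```
val spark = org.apache.spark.sql.SparkSession.builder()
  .master("local[*]")
  .enableHiveSupport()        // ORC support lives in the Hive module in this era
  .getOrCreate()
import spark.implicits._
spark.conf.set("spark.sql.orc.filterPushdown", "true")

// An ArrayType column, which ORC column statistics cannot describe.
val df = Seq(Tuple1(Seq(1, 2)), Tuple1(Seq(3))).toDF("a")
df.write.mode("overwrite").orc("/tmp/orc_array_demo")

// Before the fix, the pushed-down IsNotNull filter always evaluated to false
// for such columns, dropping valid rows; with the fix, no ORC filter is created
// for IsNull/IsNotNull on unsupported types and all rows come back.
spark.read.orc("/tmp/orc_array_demo").filter($"a".isNotNull).count()
```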
## How was this patch tested?
Unit tests in `OrcQuerySuite` and `OrcFilterSuite`, plus `sbt scalastyle`.
Author: hyukjinkwon <[email protected]>
Author: Hyukjin Kwon <[email protected]>
Closes #12777 from HyukjinKwon/SPARK-14962.
commit 76ad04d9a0a7d4dfb762318d9c7be0d7720f4e1a
Author: Zheng RuiFeng <[email protected]>
Date: 2016-05-06T17:47:13Z
[SPARK-14512] [DOC] Add python example for QuantileDiscretizer
## What changes were proposed in this pull request?
Add the missing python example for QuantileDiscretizer
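For context, a hedged Scala sketch of the QuantileDiscretizer usage the new
Python example mirrors (the input data, column names, and bucket count are
illustrative):
```
val spark = org.apache.spark.sql.SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._
import org.apache.spark.ml.feature.QuantileDiscretizer

val df = Seq((0, 18.0), (1, 19.0), (2, 8.0), (3, 5.0), (4, 2.2)).toDF("id", "hour")
val discretizer = new QuantileDiscretizer()
  .setInputCol("hour")
  .setOutputCol("result")
  .setNumBuckets(3)          // split the column into 3 quantile-based buckets
discretizer.fit(df).transform(df).show()
```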
## How was this patch tested?
manual tests
Author: Zheng RuiFeng <[email protected]>
Closes #12281 from zhengruifeng/discret_pe.
commit 5c8fad7b9bfd6677111a8e27e2574f82b04ec479
Author: gatorsmile <[email protected]>
Date: 2016-05-06T18:43:07Z
[SPARK-15108][SQL] Describe Permanent UDTF
#### What changes were proposed in this pull request?
When describing a UDTF, the command returns a wrong result: it is unable to
find the function, which has been created and cataloged in the catalog but is
not in the functionRegistry.
This PR corrects it. If the function is not in the functionRegistry, we
check the catalog to collect the information about the UDTF.
#### How was this patch tested?
Added test cases to verify the results
Author: gatorsmile <[email protected]>
Closes #12885 from gatorsmile/showFunction.
commit e20cd9f4ce977739ce80a2c39f8ebae5e53f72f6
Author: Burak Köse <[email protected]>
Date: 2016-05-06T20:58:12Z
[SPARK-14050][ML] Add multiple languages support and additional methods for
Stop Words Remover
## What changes were proposed in this pull request?
This PR continues the work from #11871 with the following changes:
* load English stop words as the default
* convert stop words to a list in Python
* update some tests and docs
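A hedged Scala sketch of how the added default-stop-words loading is expected
to be used; the data and the chosen language are illustrative:
```
val spark = org.apache.spark.sql.SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._
import org.apache.spark.ml.feature.StopWordsRemover

val df = Seq((0, Seq("I", "saw", "the", "red", "balloon"))).toDF("id", "raw")
val remover = new StopWordsRemover()
  .setInputCol("raw")
  .setOutputCol("filtered")
  // English stop words are the default; other languages can be loaded explicitly.
  .setStopWords(StopWordsRemover.loadDefaultStopWords("english"))
remover.transform(df).show(truncate = false)
```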
## How was this patch tested?
Unit tests.
Closes #11871
cc: burakkose srowen
Author: Burak Köse <[email protected]>
Author: Xiangrui Meng <[email protected]>
Author: Burak KOSE <[email protected]>
Closes #12843 from mengxr/SPARK-14050.
commit f7b7ef41662d7d02fc4f834f3c6c4ee8802e949c
Author: Tathagata Das <[email protected]>
Date: 2016-05-06T22:04:16Z
[SPARK-14997][SQL] Fixed FileCatalog to return correct set of files when
there is no partitioning scheme in the given paths
## What changes were proposed in this pull request?
Let's say there are JSON files in the following directory structure:
```
xyz/file0.json
xyz/subdir1/file1.json
xyz/subdir2/file2.json
xyz/subdir1/subsubdir1/file3.json
```
`sqlContext.read.json("xyz")` should read only file0.json, matching the
behavior in Spark 1.6.1. However, in the current master all 4 files are read.
The fix is to make FileCatalog return only the immediate child files of the
given path when no partitioning is detected (instead of the full recursive
listing of files).
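A hedged sketch of the expected behavior after the fix, using the directory
layout above (the session setup is illustrative):
```
val spark = org.apache.spark.sql.SparkSession.builder().master("local[*]").getOrCreate()
// Reads only xyz/file0.json: since no partitioning scheme (e.g. xyz/key=value/...)
// is detected, files under the subdirectories are not listed.
val df = spark.read.json("xyz")
```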
Closes #12774
## How was this patch tested?
unit tests
Author: Tathagata Das <[email protected]>
Closes #12856 from tdas/SPARK-14997.
commit cc95f1ed5fdf2566bcefe8d10116eee544cf9184
Author: Thomas Graves <[email protected]>
Date: 2016-05-07T02:31:26Z
[SPARK-1239] Improve fetching of map output statuses
The main issue we are trying to solve is the memory bloat of the driver
when tasks request the map output statuses. With a large number of tasks you
either need a huge amount of memory on the driver or you have to repartition
to a smaller number, which makes it really difficult to run, say, over
50,000 tasks.
The main issues that cause the memory bloat are:
1) There is no flow control on sending the map output status responses. We
serialize the map output statuses and then hand them off to Netty to send.
Netty sends asynchronously and can't keep up with the incoming requests, so we
end up with lots of copies of the serialized map output statuses sitting
around, which causes huge bloat when you have tens of thousands of tasks and
the map output statuses are tens of MB.
2) When the initial reduce tasks start up, they all request the map output
statuses from the driver. These requests are handled by multiple threads in
parallel, so even though we check for a cached version, before one exists many
of the initial requests can end up serializing the exact same map output
statuses.
This patch does a couple of things:
- When the map output status size is over a threshold (default 512K), it
uses a broadcast to send the map statuses. This means a large serialized map
output status no longer sits in each outgoing reply, so the memory bloat goes
away; the message sizes are now in the 300-400 byte range and the map output
statuses themselves are broadcast. If the size is under the threshold it is
sent as before, except that the message now carries a DIRECT indicator.
- Synchronize the incoming requests so that one thread caches the
serialized output and creates the broadcast of the map output statuses, which
everyone else can then reuse. This ensures we don't create multiple broadcast
variables when we don't need to. To make this possible I added a second thread
pool that the Dispatcher hands the requests to, so those threads can block
without blocking the main dispatcher threads (which would cause things like
heartbeats not to come through).
Note that some of the design and code was contributed by mridulm.
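A hedged sketch of the tuning knobs described above, expressed as SparkConf
settings; the property names are assumed from the description and should be
checked against the merged patch:
```
import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Statuses larger than this are broadcast instead of sent in the direct reply.
  .set("spark.shuffle.mapOutput.minSizeForBroadcast", "512k")
  // Size of the dedicated pool serving map-output requests, so these threads
  // can block without stalling the main dispatcher threads.
  .set("spark.shuffle.mapOutput.dispatcher.numThreads", "8")
```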
## How was this patch tested?
Unit tests and a lot of manual testing.
Ran with both the Akka and Netty RPC backends, and with dynamic allocation
on and off.
One of the large jobs I used to test this was a join of 15TB of data. It
had 200,000 map tasks and 20,000 reduce tasks. Executors ranged from 200 to
2000. This job ran successfully with 5GB of memory on the driver with these
changes. Without these changes I was using 20GB and only had 500 reduce tasks.
The job has 50MB of serialized map output statuses and took roughly the same
amount of time for the executors to get the map output statuses as before.
Ran a variety of other jobs, from large wordcounts to small ones not using
broadcasts.
Author: Thomas Graves <[email protected]>
Closes #12113 from tgravescs/SPARK-1239.
commit a21a3bbe6931e162c53a61daff1ef428fb802b8a
Author: Sandeep Singh <[email protected]>
Date: 2016-05-07T03:10:14Z
[SPARK-15087][MINOR][DOC] Follow Up: Fix the Comments
## What changes were proposed in this pull request?
Remove the comment, since it no longer applies. See the discussion here:
https://github.com/apache/spark/pull/12865#discussion-diff-61946906
Author: Sandeep Singh <[email protected]>
Closes #12953 from techaddict/SPARK-15087-FOLLOW-UP.
commit 607a27a0d149be049091bcf274a73b8476b36c90
Author: Kevin Yu <[email protected]>
Date: 2016-05-07T03:13:48Z
[SPARK-15051][SQL] Create a TypedColumn alias
## What changes were proposed in this pull request?
Currently, when we create an alias against a TypedColumn from a user-defined
Aggregator (for example: `agg(aggSum.toColumn as "a")`), Spark uses the alias
function from `Column` (`as`), which returns a `Column` containing a
`TypedAggregateExpression` that is unresolved because the `inputDeserializer`
is not defined. Later, the aggregate function (`agg`) injects the
`inputDeserializer` back into the `TypedAggregateExpression`, but only if the
aggregate columns are `TypedColumn`s. In the case above, the
`TypedAggregateExpression` remains unresolved because it sits under a plain
`Column`, which causes the problem reported in
[SPARK-15051](https://issues.apache.org/jira/browse/SPARK-15051?jql=project%20%3D%20SPARK).
This PR proposes to add an alias function for TypedColumn that returns a
TypedColumn, using a code path similar to Column's alias function.
Spark's built-in aggregate functions, such as `max`, already work with an
alias, for example:
val df1 = Seq(1 -> "a", 2 -> "b", 3 -> "b").toDF("i", "j")
checkAnswer(df1.agg(max("j") as "b"), Row(3) :: Nil)
Thanks for comments.
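And a hedged sketch of the typed-aggregator case this PR targets, written
against the Spark 2.0 API; the Aggregator definition is illustrative and the
alias method on `TypedColumn` is assumed to be `name`:
```
val spark = org.apache.spark.sql.SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

val aggSum = new Aggregator[Int, Long, Long] {
  def zero: Long = 0L
  def reduce(b: Long, a: Int): Long = b + a
  def merge(b1: Long, b2: Long): Long = b1 + b2
  def finish(b: Long): Long = b
  def bufferEncoder: Encoder[Long] = Encoders.scalaLong
  def outputEncoder: Encoder[Long] = Encoders.scalaLong
}

val ds = Seq(1, 2, 3).toDS()
// Before this change, aliasing went through Column and left the underlying
// TypedAggregateExpression unresolved; aliasing on TypedColumn keeps it typed.
ds.select(aggSum.toColumn.name("sum")).show()
```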
## How was this patch tested?
Added test cases in DatasetAggregatorSuite.scala and ran the SQL-related
queries against this patch.
Author: Kevin Yu <[email protected]>
Closes #12893 from kevinyu98/spark-15051.
commit df89f1d43d4eaa1dd8a439a8e48bca16b67d5b48
Author: Herman van Hovell <[email protected]>
Date: 2016-05-07T04:06:03Z
[SPARK-15122] [SQL] Fix TPC-DS 41 - Normalize predicates before pulling
them out
## What changes were proposed in this pull request?
The official TPC-DS 41 query currently fails because it contains a scalar
subquery with a disjunctive correlated predicate (the correlated predicates
were nested inside ORs). This makes the `Analyzer` pull out the entire
predicate, which is wrong and causes the following (correct) analysis
exception: `The correlated scalar subquery can only contain equality predicates`
This PR fixes this by first simplifying (or normalizing) the correlated
predicates before pulling them out of the subquery.
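A hedged sketch of the query shape being described; the table, columns, and
data are illustrative, not the actual TPC-DS schema:
```
val spark = org.apache.spark.sql.SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

Seq(("shirt", "red", 10.0), ("shirt", "blue", 20.0), ("hat", "red", 5.0))
  .toDF("i_category", "i_color", "i_current_price")
  .createOrReplaceTempView("item")

// The correlated equality predicate is nested inside an OR. Previously the
// whole disjunction was pulled out and analysis failed; after normalization
// only the common equality (i2.i_category = i1.i_category) needs pulling out.
spark.sql("""
  SELECT i1.i_category, i1.i_current_price
  FROM item i1
  WHERE i1.i_current_price >
        (SELECT avg(i2.i_current_price) FROM item i2
         WHERE (i2.i_category = i1.i_category AND i2.i_color = 'red')
            OR (i2.i_category = i1.i_category AND i2.i_color = 'blue'))
""").show()
```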
## How was this patch tested?
Manual testing on TPC-DS 41, and added a test to SubquerySuite.
Author: Herman van Hovell <[email protected]>
Closes #12954 from hvanhovell/SPARK-15122.
commit b0cafdb6ccff9add89dc31c45adf87c8fa906aac
Author: Nick Pentreath <[email protected]>
Date: 2016-05-07T08:57:40Z
[MINOR][ML][PYSPARK] ALS example cleanup
Cleans up ALS examples by removing unnecessary casts to double for `rating`
and `prediction` columns, since `RegressionEvaluator` now supports `Double` &
`Float` input types.
## How was this patch tested?
Manual compile and run with `run-example ml.ALSExample` and `spark-submit
examples/src/main/python/ml/als_example.py`.
Author: Nick Pentreath <[email protected]>
Closes #12892 from MLnick/als-examples-cleanup.
commit 5d188a6970ef97d11656ab39255109fefc42203d
Author: Bryan Cutler <[email protected]>
Date: 2016-05-07T09:20:38Z
[DOC][MINOR] Fixed minor errors in feature.ml user guide doc
## What changes were proposed in this pull request?
Fixed some minor errors found when reviewing feature.ml user guide
## How was this patch tested?
built docs locally
Author: Bryan Cutler <[email protected]>
Closes #12940 from BryanCutler/feature.ml-doc_fixes-DOCS-MINOR.
commit 6e268b9ee32eb93d1a757c6073dd69e86b9df617
Author: Sandeep Singh <[email protected]>
Date: 2016-05-07T19:36:43Z
[SPARK-15178][CORE] Remove LazyFileRegion instead use netty's
DefaultFileRegion
## What changes were proposed in this pull request?
Remove LazyFileRegion and use Netty's DefaultFileRegion instead;
LazyFileRegion only existed so that we did not create a file descriptor before
having to send the file.
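A hedged sketch of why this works, assuming Netty 4's
`DefaultFileRegion(File, long, long)` constructor, which only opens the
underlying file descriptor when the transfer starts (the path is illustrative):
```
import java.io.File
import io.netty.channel.DefaultFileRegion

val file = new File("/tmp/some-shuffle-block")
// No file descriptor is opened here; it is opened lazily when the region is
// actually transferred, which is what LazyFileRegion used to provide.
val region = new DefaultFileRegion(file, 0L, file.length())
```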
## How was this patch tested?
Existing tests
Author: Sandeep Singh <[email protected]>
Closes #12977 from techaddict/SPARK-15178.
commit 454ba4d67e782369627dfe60261e6648a27b91a0
Author: Sun Rui <[email protected]>
Date: 2016-05-08T07:17:36Z
[SPARK-12479][SPARKR] sparkR collect on GroupedData throws R error "missing
value where TRUE/FALSE needed"
## What changes were proposed in this pull request?
This PR is a workaround for NA handling in hash code computation.
This PR is on behalf of paulomagalhaes whose PR is
https://github.com/apache/spark/pull/10436
## How was this patch tested?
SparkR unit tests.
Author: Sun Rui <[email protected]>
Author: ray <[email protected]>
Closes #12976 from sun-rui/SPARK-12479.
commit e9131ec277731de4a73026f2fb4559182c236f84
Author: gatorsmile <[email protected]>
Date: 2016-05-09T04:40:30Z
[SPARK-15185][SQL] InMemoryCatalog: Silent Removal of an Existent
Table/Function/Partitions by Rename
#### What changes were proposed in this pull request?
So far, the implementation of InMemoryCatalog does not check whether the
new/destination table/function/partition already exists. Thus, we just
silently remove the existing table/function/partition.
This PR is to detect them and issue an appropriate exception.
This PR is to detect them and issue an appropriate exception.
#### How was this patch tested?
Added the related test cases. They also verify that HiveExternalCatalog
detects these errors.
Author: gatorsmile <[email protected]>
Closes #12960 from gatorsmile/renameInMemoryCatalog.
commit a59ab594cac5189ecf4158fc0ada200eaa874158
Author: gatorsmile <[email protected]>
Date: 2016-05-09T05:05:18Z
[SPARK-15184][SQL] Fix Silent Removal of An Existent Temp Table by Rename
Table
#### What changes were proposed in this pull request?
Currently, if we rename a temp table `Tab1` to another existing temp table
`Tab2`, `Tab2` is silently removed. This PR detects this case and issues an
exception message.
In addition, this PR detects another issue in the rename table command:
when the destination table identifier includes a database name, we should not
ignore it, because the user might be trying to rename a regular table.
#### How was this patch tested?
Added two related test cases
Author: gatorsmile <[email protected]>
Closes #12959 from gatorsmile/rewriteTable.
commit 635ef407e11dec41ae9bc428935fb8fdaa482f7e
Author: Liang-Chi Hsieh <[email protected]>
Date: 2016-05-09T07:05:06Z
[SPARK-15211][SQL] Select features column from LibSVMRelation causes failure
## What changes were proposed in this pull request?
We need to use `requiredSchema` in `LibSVMRelation` to project only the
required columns when loading data from this data source. Otherwise, when
users try to select the `features` column, it causes a failure.
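A hedged sketch of the failing pattern; the path points at the sample data set
shipped with Spark and may differ in your checkout:
```
val spark = org.apache.spark.sql.SparkSession.builder().master("local[*]").getOrCreate()
val data = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
// Selecting only the features column hit the failure before this fix, because
// the relation ignored the required (projected) schema.
data.select("features").show()
```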
## How was this patch tested?
`LibSVMRelationSuite`.
Author: Liang-Chi Hsieh <[email protected]>
Closes #12986 from viirya/fix-libsvmrelation.
commit 68abc1b4e9afbb6c2a87689221a46b835dded102
Author: Yuhao Yang <[email protected]>
Date: 2016-05-09T08:08:54Z
[SPARK-14814][MLLIB] API: Java compatibility, docs
## What changes were proposed in this pull request?
jira: https://issues.apache.org/jira/browse/SPARK-14814
Fix a Java compatibility function in MLlib's DecisionTreeModel. As synced in
the JIRA, other compatibility issues don't need fixes.
## How was this patch tested?
Existing unit tests.
Author: Yuhao Yang <[email protected]>
Closes #12971 from hhbyyh/javacompatibility.
commit 12fe2ecd1998a8b01667aa1ab910a604b2aec4c8
Author: Holden Karau <[email protected]>
Date: 2016-05-09T08:11:17Z
[SPARK-15136][PYSPARK][DOC] Fix links to sphinx style and add a default
param doc note
## What changes were proposed in this pull request?
PyDoc links in ml are in a non-standard format. Switch to the standard Sphinx
link format for better-formatted documentation. Also add a note about a
default value in one place, and copy some extended docs from Scala for GBT.
## How was this patch tested?
Built docs locally.
Author: Holden Karau <[email protected]>
Closes #12918 from holdenk/SPARK-15137-linkify-pyspark-ml-classification.
commit a78fbfa619a13421b294328b80c82510ca7efed0
Author: dding3 <[email protected]>
Date: 2016-05-09T08:43:07Z
[SPARK-15172][ML] Explicitly tell the user that initial coefficients are
ignored when a size mismatch happens in LogisticRegression
## What changes were proposed in this pull request?
Explicitly tell the user that the initial coefficients are ignored if their
size doesn't match the expected size in LogisticRegression.
## How was this patch tested?
local build
Author: dding3 <[email protected]>
Closes #12948 from dding3/master.
commit 16a503cf0af3e7c703d56a1a730e4f3a534f6b3c
Author: mwws <[email protected]>
Date: 2016-05-09T08:44:37Z
[MINOR][TEST][STREAMING] Make "testDir" able to be cleaned after the test.
This is a minor bug in a test case. `val testDir = null` stays `null` because
it is immutable, so nothing is cleaned up in the finally block. The other
`testDir` variable, created in the try block, is only visible inside the try
block.
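A hedged sketch of the bug pattern being described (not the actual test code):
```
import java.io.File
import java.nio.file.Files

val testDir: File = null                 // a val, so this binding never changes
try {
  val testDir = Files.createTempDirectory("test").toFile  // shadows the outer val
  // ... run the test against this testDir ...
} finally {
  // Refers to the outer binding, which is still null, so nothing is cleaned up.
  if (testDir != null) testDir.delete()
}
```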
## How was this patch tested?
Run existing test case and passed.
Author: mwws <[email protected]>
Closes #12999 from mwws/SPARK_MINOR.
commit 652bbb1bf62722b08a062c7a2bf72019f85e179e
Author: Ryan Blue <[email protected]>
Date: 2016-05-09T09:01:23Z
[SPARK-14459][SQL] Detect relation partitioning and adjust the logical plan
## What changes were proposed in this pull request?
This detects a relation's partitioning and adds checks to the analyzer.
If an InsertIntoTable node has no partitioning, it is replaced by the
relation's partition scheme and input columns are correctly adjusted,
placing the partition columns at the end in partition order. If an
InsertIntoTable node has partitioning, it is checked against the table's
reported partitions.
These changes required adding a PartitionedRelation trait to the catalog
interface because Hive's MetastoreRelation doesn't extend
CatalogRelation.
This commit also includes a fix to InsertIntoTable's resolved logic,
which now detects that all expected columns are present, including
dynamic partition columns. Previously, the number of expected columns
was not checked and resolved was true if there were missing columns.
## How was this patch tested?
This adds new tests to the InsertIntoTableSuite that are fixed by this PR.
Author: Ryan Blue <[email protected]>
Closes #12239 from rdblue/SPARK-14459-detect-hive-partitioning.
commit ee3b1715620d48b8d22d086ddeef49ad7ff249d2
Author: Yanbo Liang <[email protected]>
Date: 2016-05-09T16:58:36Z
[MINOR] [SPARKR] Update data-manipulation.R to use native csv reader
## What changes were proposed in this pull request?
* Since Spark now supports a native CSV reader, it is not necessary to use
the third-party ```spark-csv``` package in
```examples/src/main/r/data-manipulation.R```. Meanwhile, remove all
```spark-csv``` usage in SparkR.
* Running R applications through ```sparkR``` is not supported as of Spark
2.0, so we switch to ```./bin/spark-submit``` to run the example.
## How was this patch tested?
Offline test.
Author: Yanbo Liang <[email protected]>
Closes #13005 from yanboliang/r-df-examples.
commit beb16ec556c3b7a23fe0ac7bda66f71abd5c61e9
Author: Wenchen Fan <[email protected]>
Date: 2016-05-09T17:47:45Z
[SPARK-15093][SQL] create/delete/rename directory for InMemoryCatalog
operations if needed
## What changes were proposed in this pull request?
The following operations now perform a file system operation:
1. CREATE DATABASE: create a dir
2. DROP DATABASE: delete the dir
3. CREATE TABLE: create a dir
4. DROP TABLE: delete the dir
5. RENAME TABLE: rename the dir
6. CREATE PARTITIONS: create a dir
7. RENAME PARTITIONS: rename the dir
8. DROP PARTITIONS: drop the dir
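A hedged sketch mapping a few of the operations above onto SQL statements (the
statements themselves are illustrative; the directory side effects are what
this PR adds to InMemoryCatalog):
```
val spark = org.apache.spark.sql.SparkSession.builder().master("local[*]").getOrCreate()

spark.sql("CREATE DATABASE db1")                        // 1. creates db1's dir
spark.sql("CREATE TABLE db1.t1 (i INT) USING parquet")  // 3. creates t1's dir
spark.sql("ALTER TABLE db1.t1 RENAME TO db1.t2")        // 5. renames the dir
spark.sql("DROP TABLE db1.t2")                          // 4. deletes the dir
spark.sql("DROP DATABASE db1")                          // 2. deletes the dir
```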
## How was this patch tested?
new tests in `ExternalCatalogSuite`
Author: Wenchen Fan <[email protected]>
Closes #12871 from cloud-fan/catalog.
commit b1e01fd519d4d1bc6d9bd2270f9504d757dbd0d2
Author: gatorsmile <[email protected]>
Date: 2016-05-09T17:49:54Z
[SPARK-15199][SQL] Disallow Dropping Built-in Functions
#### What changes were proposed in this pull request?
As in Hive and the major RDBMSs, built-in functions should not be allowed to
be dropped. In the current implementation, users can drop built-in functions;
however, after dropping them, users are unable to add them back.
#### How was this patch tested?
Added a test case.
Author: gatorsmile <[email protected]>
Closes #12975 from gatorsmile/dropBuildInFunction.
commit 671b382a80bc789d50f609783c7ba88fafc0c251
Author: Cheng Lian <[email protected]>
Date: 2016-05-09T17:53:32Z
[SPARK-14127][SQL] Makes 'DESC [EXTENDED|FORMATTED] <table>' support data
source tables
## What changes were proposed in this pull request?
This is a follow-up of PR #12844. It makes the newly updated
`DescribeTableCommand` support data source tables.
## How was this patch tested?
A test case is added to check `DESC [EXTENDED | FORMATTED] <table>` output.
Author: Cheng Lian <[email protected]>
Closes #12934 from liancheng/spark-14127-desc-table-follow-up.
commit 2992a215c9cd95a2be986b254f4e27d18e248b7d
Author: hyukjinkwon <[email protected]>
Date: 2016-05-09T17:54:56Z
[MINOR][DOCS] Remove remaining sqlContext in documentation at examples
This PR removes remaining `sqlContext` references in examples. Actual usages
were all replaced in https://github.com/apache/spark/pull/12809, but some
remain in comments.
Tested by manual style checking.
Author: hyukjinkwon <[email protected]>
Closes #13006 from HyukjinKwon/minor-docs.
commit 65b4ab281efd170c9fad7152629f68eaef7f7088
Author: Philipp Hoffmann <[email protected]>
Date: 2016-05-09T18:02:13Z
[SPARK-15223][DOCS] fix wrongly named config reference
## What changes were proposed in this pull request?
The configuration setting `spark.executor.logs.rolling.size.maxBytes` was
changed to `spark.executor.logs.rolling.maxSize` in 1.4 or so.
This commit fixes a remaining reference to the old name in the
documentation.
Also the description for `spark.executor.logs.rolling.maxSize` was edited
to clearly state that the unit for the size is bytes.
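For reference, a hedged sketch of setting the renamed property; the value is
illustrative (10 MiB, in bytes):
```
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.executor.logs.rolling.strategy", "size")    // roll logs by size
  .set("spark.executor.logs.rolling.maxSize", "10485760") // size in bytes
```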
## How was this patch tested?
no tests
Author: Philipp Hoffmann <[email protected]>
Closes #13001 from philipphoffmann/patch-3.
----