GitHub user bupt2012 opened a pull request:

    https://github.com/apache/spark/pull/16502

    Branch 2.1

    ## What changes were proposed in this pull request?
    
    (Please fill in changes proposed in this fix)
    
    ## How was this patch tested?
    
    (Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
    (If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)
    
    Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark branch-2.1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16502.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16502
    
----
commit 3be2d1e0b52bf15ac28a9f96b03ae048e680b035
Author: Shixiong Zhu <[email protected]>
Date:   2016-11-23T00:49:15Z

    [SPARK-18530][SS][KAFKA] Change Kafka timestamp column type to TimestampType
    
    ## What changes were proposed in this pull request?
    
    Changed Kafka timestamp column type to TimestampType.
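
    A minimal sketch of the resulting schema (bootstrap servers and topic are placeholders):

    ```scala
    val kafka = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "host:9092")
      .option("subscribe", "topic")
      .load()
    // the "timestamp" column is now TimestampType rather than a long of epoch milliseconds
    kafka.printSchema()
    ```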
    
    ## How was this patch tested?
    
    `test("Kafka column types")`.
    
    Author: Shixiong Zhu <[email protected]>
    
    Closes #15969 from zsxwing/SPARK-18530.
    
    (cherry picked from commit d0212eb0f22473ee5482fe98dafc24e16ffcfc63)
    Signed-off-by: Shixiong Zhu <[email protected]>

commit fc5fee83e363bc6df22459a9b1ba2ba11bfdfa20
Author: Yanbo Liang <[email protected]>
Date:   2016-11-23T03:17:48Z

    [SPARK-18501][ML][SPARKR] Fix spark.glm errors when fitting on collinear 
data
    
    ## What changes were proposed in this pull request?
    * Fix SparkR ```spark.glm``` errors when fitting on collinear data, since ```standard error of coefficients, t value and p value``` are not available in this condition.
    * Scala/Python GLM summary should throw an exception if users access ```standard error of coefficients, t value and p value``` when the underlying WLS was solved by the local "l-bfgs" solver (see the sketch after this list).
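
    A minimal sketch of the affected accessors (the collinear training data `collinearDF` is assumed, not part of this PR):

    ```scala
    import org.apache.spark.ml.regression.GeneralizedLinearRegression

    val model = new GeneralizedLinearRegression().setFamily("gaussian").fit(collinearDF)
    val summary = model.summary
    // throws here when the underlying WLS was solved by l-bfgs,
    // since these statistics are not available in that case
    summary.coefficientStandardErrors
    ```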
    
    ## How was this patch tested?
    Add unit tests.
    
    Author: Yanbo Liang <[email protected]>
    
    Closes #15930 from yanboliang/spark-18501.
    
    (cherry picked from commit 982b82e32e0fc7d30c5d557944a79eb3e6d2da59)
    Signed-off-by: Yanbo Liang <[email protected]>

commit fabb5aeaf62e5c18d5d489e769e998e52379ba20
Author: hyukjinkwon <[email protected]>
Date:   2016-11-23T06:25:27Z

    [SPARK-18179][SQL] Throws analysis exception with a proper message for 
unsupported argument types in reflect/java_method function
    
    ## What changes were proposed in this pull request?
    
    This PR proposes throwing an `AnalysisException` with a proper message 
rather than `NoSuchElementException` with the message ` key not found: 
TimestampType` when unsupported types are given to `reflect` and `java_method` 
functions.
    
    ```scala
    spark.range(1).selectExpr("reflect('java.lang.String', 'valueOf', 
cast('1990-01-01' as timestamp))")
    ```
    
    produces
    
    **Before**
    
    ```
    java.util.NoSuchElementException: key not found: TimestampType
      at scala.collection.MapLike$class.default(MapLike.scala:228)
      at scala.collection.AbstractMap.default(Map.scala:59)
      at scala.collection.MapLike$class.apply(MapLike.scala:141)
      at scala.collection.AbstractMap.apply(Map.scala:59)
      at 
org.apache.spark.sql.catalyst.expressions.CallMethodViaReflection$$anonfun$findMethod$1$$anonfun$apply$1.apply(CallMethodViaReflection.scala:159)
    ...
    ```
    
    **After**
    
    ```
    cannot resolve 'reflect('java.lang.String', 'valueOf', CAST('1990-01-01' AS 
TIMESTAMP))' due to data type mismatch: arguments from the third require 
boolean, byte, short, integer, long, float, double or string expressions; line 
1 pos 0;
    'Project [unresolvedalias(reflect(java.lang.String, valueOf, 
cast(1990-01-01 as timestamp)), Some(<function1>))]
    +- Range (0, 1, step=1, splits=Some(2))
    ...
    ```
    
    The added message is:
    
    ```
    arguments from the third require boolean, byte, short, integer, long, 
float, double or string expressions
    ```
    
    ## How was this patch tested?
    
    Tests added in `CallMethodViaReflection`.
    
    Author: hyukjinkwon <[email protected]>
    
    Closes #15694 from HyukjinKwon/SPARK-18179.
    
    (cherry picked from commit 2559fb4b40c9f42f7b3ed2b77de14461f68b6fa5)
    Signed-off-by: Reynold Xin <[email protected]>

commit 5f198d200d47703f6ab770e592c0a1d9f8d7b0dc
Author: Sean Owen <[email protected]>
Date:   2016-11-23T11:25:47Z

    [SPARK-18073][DOCS][WIP] Migrate wiki to spark.apache.org web site
    
    ## What changes were proposed in this pull request?
    
    Updates wiki links to point to the new location of the content on spark.apache.org.
    
    ## How was this patch tested?
    
    Doc builds
    
    Author: Sean Owen <[email protected]>
    
    Closes #15967 from srowen/SPARK-18073.1.
    
    (cherry picked from commit 7e0cd1d9b168286386f15e9b55988733476ae2bb)
    Signed-off-by: Sean Owen <[email protected]>

commit ebeb051405b84cb4abafbb6929ddcfadf59672db
Author: Wenchen Fan <[email protected]>
Date:   2016-11-23T12:15:19Z

    [SPARK-18053][SQL] compare unsafe and safe complex-type values correctly
    
    ## What changes were proposed in this pull request?
    
    In Spark SQL, some expressions may output safe-format values, e.g. `CreateArray`, `CreateStruct`, `Cast`, etc. When we compare two values, we should be able to compare safe and unsafe formats.
    
    `GreaterThan`, `LessThan`, etc. in Spark SQL already handle this, but `EqualTo` doesn't. This PR fixes it.
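
    A minimal sketch of the kind of comparison involved (requires `spark.implicits._`; not the exact regression test from this PR):

    ```scala
    import spark.implicits._

    val df = Seq((Seq(1, 2), 1)).toDF("arr", "n")
    // "arr" is read in the unsafe format at runtime, while array(1, 2) is built as a
    // safe-format value; EqualTo now compares the two formats correctly
    df.where("arr = array(1, 2)").count()
    ```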
    
    ## How was this patch tested?
    
    new unit test and regression test
    
    Author: Wenchen Fan <[email protected]>
    
    Closes #15929 from cloud-fan/type-aware.
    
    (cherry picked from commit 84284e8c82542d80dad94e458a0c0210bf803db3)
    Signed-off-by: Herman van Hovell <[email protected]>

commit 539c193af7e3e08e9b48df15e94eafcc3532105c
Author: Eric Liang <[email protected]>
Date:   2016-11-23T12:14:08Z

    [SPARK-18545][SQL] Verify number of hive client RPCs in 
PartitionedTablePerfStatsSuite
    
    ## What changes were proposed in this pull request?
    
    This would help catch accidental O(n) calls to the hive client as in 
https://issues.apache.org/jira/browse/SPARK-18507
    
    ## How was this patch tested?
    
    Checked that the test fails before 
https://issues.apache.org/jira/browse/SPARK-18507 was patched. cc cloud-fan
    
    Author: Eric Liang <[email protected]>
    
    Closes #15985 from ericl/spark-18545.
    
    (cherry picked from commit 85235ed6c600270e3fa434738bd50dce3564440a)
    Signed-off-by: Wenchen Fan <[email protected]>

commit e11d7c6874debfbbe44be4a2b0983d6b6763fff8
Author: Reynold Xin <[email protected]>
Date:   2016-11-23T12:22:26Z

    [SPARK-18557] Downgrade confusing memory leak warning message
    
    ## What changes were proposed in this pull request?
    TaskMemoryManager has a memory leak detector that gets called in the task completion callback and checks whether any memory has not been released. If any allocations have not been released by the time the callback is invoked, TaskMemoryManager releases them.
    
    The current error message says something like the following:
    ```
    WARN  [Executor task launch worker-0]
    org.apache.spark.memory.TaskMemoryManager - leak 16.3 MB memory from
    org.apache.spark.unsafe.map.BytesToBytesMap33fb6a15
    ```
    
    In practice, there are multiple reasons why these can be triggered in the normal code path (e.g. limit, or task failures), and the fact that these messages are logged means the "leak" has already been fixed by TaskMemoryManager.
    
    To avoid confusing users, this patch downgrades the message from warning to debug level, and avoids using the word "leak" since it is not actually a leak.
    
    ## How was this patch tested?
    N/A - this is a simple logging improvement.
    
    Author: Reynold Xin <[email protected]>
    
    Closes #15989 from rxin/SPARK-18557.
    
    (cherry picked from commit 9785ed40d7fe4e1fcd440e55706519c6e5f8d6b1)
    Signed-off-by: Herman van Hovell <[email protected]>

commit 599dac1594ed52934dd483e12d2e39d514793dd9
Author: Reynold Xin <[email protected]>
Date:   2016-11-23T12:48:41Z

    [SPARK-18522][SQL] Explicit contract for column stats serialization
    
    ## What changes were proposed in this pull request?
    The current implementation of column stats uses the base64 encoding of the internal UnsafeRow format to persist statistics (in table properties in the Hive metastore). This is an internal format that is not stable across different versions of Spark and should NOT be used for persistence. In addition, it would be better if the statistics stored in the catalog were human readable.
    
    This pull request introduces the following changes:
    
    1. Created a single ColumnStat class for all data types. All data types track the same set of statistics.
    2. Updated the implementation for stats collection to get rid of the 
dependency on internal data structures (e.g. InternalRow, or storing DateType 
as an int32). For example, previously dates were stored as a single integer, 
but are now stored as java.sql.Date. When we implement the next steps of CBO, 
we can add code to convert those back into internal types again.
    3. Documented clearly what JVM data types are being used to store what data.
    4. Defined a simple Map[String, String] interface for serializing and deserializing column stats into/from the catalog (a minimal sketch of the idea follows this list).
    5. Rearranged the method/function structure so it is more clear what the 
supported data types are, and also moved how stats are generated into 
ColumnStat class so they are easy to find.
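
    A minimal, self-contained sketch of that serialization contract (the class and key names below are illustrative, not Spark's actual ColumnStat implementation):

    ```scala
    case class ColumnStatSketch(distinctCount: Long, nullCount: Long, min: Option[String], max: Option[String])

    def toMap(col: String, s: ColumnStatSketch): Map[String, String] = {
      val prefix = s"colStats.$col."
      Map(prefix + "distinctCount" -> s.distinctCount.toString,
          prefix + "nullCount" -> s.nullCount.toString) ++
        s.min.map(v => prefix + "min" -> v).toList ++
        s.max.map(v => prefix + "max" -> v).toList
    }

    def fromMap(col: String, props: Map[String, String]): ColumnStatSketch = {
      val prefix = s"colStats.$col."
      ColumnStatSketch(
        distinctCount = props(prefix + "distinctCount").toLong,
        nullCount = props(prefix + "nullCount").toLong,
        min = props.get(prefix + "min"),
        max = props.get(prefix + "max"))
    }
    ```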
    
    ## How was this patch tested?
    Removed most of the original test cases created for column statistics, and 
added three very simple ones to cover all the cases. The three test cases 
validate:
    1. Roundtrip serialization works.
    2. Behavior when analyzing non-existent column or unsupported data type 
column.
    3. Result for stats collection for all valid data types.
    
    Also moved parser related tests into a parser test suite and added an 
explicit serialization test for the Hive external catalog.
    
    Author: Reynold Xin <[email protected]>
    
    Closes #15959 from rxin/SPARK-18522.
    
    (cherry picked from commit 70ad07a9d20586ae182c4e60ed97bdddbcbceff3)
    Signed-off-by: Wenchen Fan <[email protected]>

commit 835f03f344f2dea2134409d09e06b34feaae09f9
Author: Wenchen Fan <[email protected]>
Date:   2016-11-23T17:54:18Z

    [SPARK-18050][SQL] do not create default database if it already exists
    
    ## What changes were proposed in this pull request?
    
    When we try to create the default database, we ask hive to do nothing if it 
already exists. However, Hive will log an error message instead of doing 
nothing, and the error message is quite annoying and confusing.
    
    In this PR, we only create the default database if it doesn't exist.
    
    ## How was this patch tested?
    
    N/A
    
    Author: Wenchen Fan <[email protected]>
    
    Closes #15993 from cloud-fan/default-db.
    
    (cherry picked from commit f129ebcd302168b628f47705f4a7d6b7e7b057b0)
    Signed-off-by: Andrew Or <[email protected]>

commit 15d2cf26427084c0398f8d9303c218f360c52bb7
Author: Burak Yavuz <[email protected]>
Date:   2016-11-23T19:48:59Z

    [SPARK-18510] Fix data corruption from inferred partition column dataTypes
    
    ## What changes were proposed in this pull request?
    
    ### The Issue
    
    If I specify my schema when doing
    ```scala
    spark.read
      .schema(someSchemaWherePartitionColumnsAreStrings)
    ```
    but partition inference infers the partition column as IntegerType (or presumably LongType or DoubleType, basically fixed-size types), then once UnsafeRows are generated, the data will be corrupted.
    
    ### Proposed solution
    
    The partition handling code path is kind of a mess. In my fix I'm probably 
adding to the mess, but at least trying to standardize the code path.
    
    The real issue is that a user that uses the `spark.read` code path can 
never clearly specify what the partition columns are. If you try to specify the 
fields in `schema`, we practically ignore what the user provides, and fall back 
to our inferred data types. What happens in the end is data corruption.
    
    My solution tries to fix this by always trying to infer partition columns 
the first time you specify the table. Once we find what the partition columns 
are, we try to find them in the user specified schema and use the dataType 
provided there, or fall back to the smallest common data type.
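
    A minimal sketch of the behavior this enables (the path and column names are illustrative):

    ```scala
    import org.apache.spark.sql.types._

    val userSchema = new StructType()
      .add("value", LongType)
      .add("part", StringType)  // keep the partition column as a string, even though
                                // directory names like part=1 would be inferred as integers
    val df = spark.read.schema(userSchema).parquet("/data/partitioned-table")
    ```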
    
    We will ALWAYS append partition columns to the user's schema, even if they 
didn't ask for it. We will only use the data type they provided if they 
specified it. While this is confusing, this has been the behavior since Spark 
1.6, and I didn't want to change this behavior in the QA period of Spark 2.1. 
We may revisit this decision later.
    
    A side effect of this PR is that we won't need 
https://github.com/apache/spark/pull/15942 if this PR goes in.
    
    ## How was this patch tested?
    
    Regression tests
    
    Author: Burak Yavuz <[email protected]>
    
    Closes #15951 from brkyvz/partition-corruption.
    
    (cherry picked from commit 0d1bf2b6c8ac4d4141d7cef0552c22e586843c57)
    Signed-off-by: Tathagata Das <[email protected]>

commit 27d81d0007f4358480148fa6f3f6b079a5431a81
Author: Shixiong Zhu <[email protected]>
Date:   2016-11-24T00:15:35Z

    [SPARK-18510][SQL] Follow up to address comments in #15951
    
    ## What changes were proposed in this pull request?
    
    This PR addresses the remaining review comments on #15951.
    
    ## How was this patch tested?
    
    Jenkins
    
    Author: Shixiong Zhu <[email protected]>
    
    Closes #15997 from zsxwing/SPARK-18510-follow-up.
    
    (cherry picked from commit 223fa218e1f637f0d62332785a3bee225b65b990)
    Signed-off-by: Tathagata Das <[email protected]>

commit 04ec74f1274a164b2f72b31e2c147e042bf41bd9
Author: Zheng RuiFeng <[email protected]>
Date:   2016-11-24T13:46:05Z

    [SPARK-18520][ML] Add missing setXXXCol methods for BisectingKMeansModel 
and GaussianMixtureModel
    
    ## What changes were proposed in this pull request?
    Add `setFeaturesCol` and `setPredictionCol` for BisectingKMeansModel and GaussianMixtureModel.
    Add `setProbabilityCol` for GaussianMixtureModel (see the sketch below).
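
    A minimal sketch of the new setters (the training dataset is assumed):

    ```scala
    import org.apache.spark.ml.clustering.GaussianMixture

    val gmModel = new GaussianMixture().setK(2).fit(dataset)
    // the fitted model's input/output columns can now be redirected after training
    gmModel.setFeaturesCol("feats").setPredictionCol("cluster").setProbabilityCol("clusterProbs")
    ```
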
    ## How was this patch tested?
    existing tests
    
    Author: Zheng RuiFeng <[email protected]>
    
    Closes #15957 from zhengruifeng/bikm_set.
    
    (cherry picked from commit 2dfabec38c24174e7f747c27c7144f7738483ec1)
    Signed-off-by: Yanbo Liang <[email protected]>

commit a7f414561325a7140557562d45fecc5ccbc8d7ff
Author: Nattavut Sutyanyong <[email protected]>
Date:   2016-11-24T20:07:55Z

    [SPARK-18578][SQL] Full outer join in correlated subquery returns incorrect 
results
    
    ## What changes were proposed in this pull request?
    
    - Raise an AnalysisException when correlated predicates exist in the descendant operators of either operand of a Full Outer Join in a subquery, as well as in the FOJ operator itself (see the sketch after this list)
    - Raise an AnalysisException when correlated predicates exist in a Window operator (a side effect inadvertently introduced by SPARK-17348)
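
    A minimal sketch of a query that is now rejected at analysis time (tables and columns are illustrative):

    ```scala
    sql("""
      SELECT * FROM t1
      WHERE t1.a IN (SELECT t2.b
                     FROM t2 FULL OUTER JOIN t3 ON t2.b = t3.c AND t2.b = t1.a)
    """)
    // throws AnalysisException instead of silently returning incorrect results
    ```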
    
    ## How was this patch tested?
    
    Ran sql/test and catalyst/test, and added new test cases to SubquerySuite that reproduce the reported incorrect results.
    
    Author: Nattavut Sutyanyong <[email protected]>
    
    Closes #16005 from nsyca/FOJ-incorrect.1.
    
    (cherry picked from commit a367d5ff005884322fb8bb43a1cfa4d4bf54b31a)
    Signed-off-by: Herman van Hovell <[email protected]>

commit 57dbc682dfafc87076dcaafd29c637cb16ace91a
Author: uncleGen <[email protected]>
Date:   2016-11-25T09:10:17Z

    [SPARK-18575][WEB] Keep same style: adjust the position of driver log links
    
    ## What changes were proposed in this pull request?
    
    Not a bug; just adjust the position of the driver log link to keep the same style as the other executors' log links.
    
    
![image](https://cloud.githubusercontent.com/assets/7402327/20590092/f8bddbb8-b25b-11e6-9aaf-3b5b3073df10.png)
    
    ## How was this patch tested?
     no
    
    Author: uncleGen <[email protected]>
    
    Closes #16001 from uncleGen/SPARK-18575.
    
    (cherry picked from commit f58a8aa20106ea36386db79a8a66f529a8da75c9)
    Signed-off-by: Sean Owen <[email protected]>

commit a49dfa93e160d63e806f35cb6b6953367916f44b
Author: n.fraison <[email protected]>
Date:   2016-11-25T09:45:51Z

    [SPARK-18119][SPARK-CORE] Namenode safemode check is only performed on one namenode, which can stall the startup of the Spark History server
    
    ## What changes were proposed in this pull request?
    
    Instead of using the setSafeMode method that checks the first namenode, use the one that permits checking only the active namenodes.
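
    A minimal sketch of the idea (not the actual history-server code):

    ```scala
    import org.apache.hadoop.hdfs.DistributedFileSystem
    import org.apache.hadoop.hdfs.protocol.HdfsConstants.SafeModeAction

    // with isChecked = true, only the active namenode is queried for the safemode flag,
    // so a standby or unreachable namenode no longer blocks startup
    def isHdfsInSafeMode(dfs: DistributedFileSystem): Boolean =
      dfs.setSafeMode(SafeModeAction.SAFEMODE_GET, true)
    ```
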
    ## How was this patch tested?
    
    manual tests
    
    Please review 
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before 
opening a pull request.
    
    This commit is contributed by Criteo SA under the Apache v2 licence.
    
    Author: n.fraison <[email protected]>
    
    Closes #15648 from ashangit/SPARK-18119.
    
    (cherry picked from commit f42db0c0c1434bfcccaa70d0db55e16c4396af04)
    Signed-off-by: Sean Owen <[email protected]>

commit 69856f28361022812d2af83128d8591694bcef4b
Author: hyukjinkwon <[email protected]>
Date:   2016-11-25T11:27:07Z

    [SPARK-3359][BUILD][DOCS] More changes to resolve javadoc 8 errors that 
will help unidoc/genjavadoc compatibility
    
    ## What changes were proposed in this pull request?
    
    This PR only tries to fix things that look pretty straightforward and were already fixed in previous PRs.
    
    This PR roughly fixes several things as below:
    
    - Fix unrecognisable class and method links in javadoc by changing them from `[[..]]` to `` `...` ``
    
      ```
      [error] 
.../spark/sql/core/target/java/org/apache/spark/sql/streaming/DataStreamReader.java:226:
 error: reference not found
      [error]    * Loads text files and returns a {link DataFrame} whose schema 
starts with a string column named
      ```
    
    - Fix an exception annotation and remove code backticks in `throws` 
annotation
    
      Currently, sbt unidoc with Java 8 complains as below:
    
      ```
      [error] .../java/org/apache/spark/sql/streaming/StreamingQuery.java:72: 
error: unexpected text
      [error]    * throws StreamingQueryException, if <code>this</code> query 
has terminated with an exception.
      ```
    
      `throws` should specify the correct class name, i.e. `StreamingQueryException` without the trailing comma or backticks (see [JDK-8007644](https://bugs.openjdk.java.net/browse/JDK-8007644)).
    
    - Fix `[[http..]]` to `<a href="http..."></a>`.
    
      ```diff
      -   * 
[[https://blogs.oracle.com/java-platform-group/entry/diagnosing_tls_ssl_and_https
 Oracle
      -   * blog page]].
      +   * <a 
href="https://blogs.oracle.com/java-platform-group/entry/diagnosing_tls_ssl_and_https";>
      +   * Oracle blog page</a>.
      ```
    
       `[[http...]]` link markdown in scaladoc is unrecognisable in javadoc.
    
    - It seems a class can't have a `return` annotation, so two such cases were removed.
    
      ```
      [error] 
.../java/org/apache/spark/mllib/regression/IsotonicRegression.java:27: error: 
invalid use of return
      [error]    * return New instance of IsotonicRegression.
      ```
    
    - Fix < to `&lt;` and > to `&gt;` according to HTML rules.
    
    - Fix `</p>` complaint
    
    - Exclude tags unrecognisable in javadoc: `constructor`, `todo` and `groupname`.
    
    ## How was this patch tested?
    
    Manually tested by `jekyll build` with Java 7 and 8
    
    ```
    java version "1.7.0_80"
    Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
    Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)
    ```
    
    ```
    java version "1.8.0_45"
    Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
    Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
    ```
    
    Note: this does not yet make sbt unidoc succeed with Java 8, but it reduces the number of errors seen with Java 8.
    
    Author: hyukjinkwon <[email protected]>
    
    Closes #15999 from HyukjinKwon/SPARK-3359-errors.
    
    (cherry picked from commit 51b1c1551d3a7147403b9e821fcc7c8f57b4824c)
    Signed-off-by: Sean Owen <[email protected]>

commit b5afdaca33996eb8af5927bf6e0cff291ed97c7f
Author: Zhenhua Wang <[email protected]>
Date:   2016-11-25T13:02:48Z

    [SPARK-18559][SQL] Fix HLL++ with small relative error
    
    ## What changes were proposed in this pull request?
    
    In `HyperLogLogPlusPlus`, if the relative error is so small that p >= 19, it causes an ArrayIndexOutOfBoundsException in `THRESHOLDS(p-4)`. We should check `p`, and when p >= 19, fall back to the original HLL result and use its small-range correction.
    
    The PR also fixes the upper bound in the message logged by `require()`.
    The upper bound is computed by:
    ```
    val relativeSD = 1.106d / Math.pow(Math.E, p * Math.log(2.0d) / 2.0d)
    ```
    which is derived from the equation for computing `p`:
    ```
    val p = 2.0d * Math.log(1.106d / relativeSD) / Math.log(2.0d)
    ```
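
    A minimal sketch of how the relative error maps to the precision p (the helper below just evaluates the formula quoted above):

    ```scala
    def precision(relativeSD: Double): Int =
      math.ceil(2.0d * math.log(1.106d / relativeSD) / math.log(2.0d)).toInt

    precision(0.01)    // 14
    precision(0.0003)  // 24 -> p >= 19, which previously overflowed THRESHOLDS(p - 4)
    ```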
    
    ## How was this patch tested?
    
    Add test cases for:
    1. checking the validity of the parameter relativeSD
    2. estimation with a smaller relative error so that p >= 19
    
    Author: Zhenhua Wang <[email protected]>
    Author: wangzhenhua <[email protected]>
    
    Closes #15990 from wzhfy/hllppRsd.
    
    (cherry picked from commit 5ecdc7c5c019acc6b1f9c2e6c5b7d35957eadb88)
    Signed-off-by: Herman van Hovell <[email protected]>

commit 906d82c4ca28c5f54d2c3f7fa58006a89472c78b
Author: jiangxingbo <[email protected]>
Date:   2016-11-25T20:44:34Z

    [SPARK-18436][SQL] isin causing SQL syntax error with JDBC
    
    ## What changes were proposed in this pull request?
    
    The expression `in(empty seq)` is invalid in some data sources. Since `in(empty seq)` is always false, the optimizer should rewrite it to a false literal.
    The SQL `SELECT * FROM t WHERE a IN ()` throws a `ParseException`, which is consistent with Hive, so that behavior doesn't need to change.
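
    A minimal sketch of the symptom (`jdbcDF` is assumed to be a JDBC-backed DataFrame):

    ```scala
    import org.apache.spark.sql.functions.col

    // an empty isin() used to be pushed down as "a IN ()", which some databases reject;
    // the optimizer now folds it to a false literal, so no invalid SQL reaches the database
    jdbcDF.where(col("a").isin()).count()  // 0
    ```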
    
    ## How was this patch tested?
    Add new test case in `OptimizeInSuite`.
    
    Author: jiangxingbo <[email protected]>
    
    Closes #15977 from jiangxb1987/isin-empty.
    
    (cherry picked from commit e2fb9fd365466da888ab8b3a2a0836049a65f8c8)
    Signed-off-by: Herman van Hovell <[email protected]>

commit da66b9742eabb2654b369f634eb05910220a6441
Author: Takuya UESHIN <[email protected]>
Date:   2016-11-26T04:25:29Z

    [SPARK-18583][SQL] Fix nullability of InputFileName.
    
    ## What changes were proposed in this pull request?
    
    The nullability of `InputFileName` should be `false`.
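
    A minimal sketch of the observable effect (the path is illustrative):

    ```scala
    import org.apache.spark.sql.functions.input_file_name

    val files = spark.read.text("/data/logs").select(input_file_name().as("file"))
    // input_file_name() never produces null, so the column is now marked non-nullable
    files.schema("file").nullable  // false
    ```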
    
    ## How was this patch tested?
    
    Existing tests.
    
    Author: Takuya UESHIN <[email protected]>
    
    Closes #16007 from ueshin/issues/SPARK-18583.
    
    (cherry picked from commit a88329d4553b40c45ebf9eacf229db7839d46769)
    Signed-off-by: Reynold Xin <[email protected]>

commit 830ee1345b491bf10fd089d931ef22e28f98e615
Author: Yanbo Liang <[email protected]>
Date:   2016-11-26T13:28:41Z

    [SPARK-18481][ML] ML 2.1 QA: Remove deprecated methods for ML
    
    ## What changes were proposed in this pull request?
    Remove deprecated methods for ML.
    
    ## How was this patch tested?
    Existing tests.
    
    Author: Yanbo Liang <[email protected]>
    
    Closes #15913 from yanboliang/spark-18481.
    
    (cherry picked from commit c4a7eef0ce2d305c5c90a0a9a73b5a32eccfba95)
    Signed-off-by: Yanbo Liang <[email protected]>

commit ff699332c113e21b942f5a62f475ae79ac6c0ee5
Author: Weiqing Yang <[email protected]>
Date:   2016-11-26T15:41:37Z

    [WIP][SQL][DOC] Fix incorrect `code` tag
    
    ## What changes were proposed in this pull request?
    This PR fixes an incorrect `code` tag in `sql-programming-guide.md`.
    
    ## How was this patch tested?
    Manually.
    
    Author: Weiqing Yang <[email protected]>
    
    Closes #15941 from weiqingy/fixtag.
    
    (cherry picked from commit f4a98e421e14434fddc3f9f1018a17124d660ef0)
    Signed-off-by: Sean Owen <[email protected]>

commit 9c5495728aac1693ddac96421f8a6181a595e775
Author: Dongjoon Hyun <[email protected]>
Date:   2016-11-26T22:57:48Z

    [SPARK-17251][SQL] Improve `OuterReference` to be `NamedExpression`
    
    ## What changes were proposed in this pull request?
    
    Currently, `OuterReference` is not a `NamedExpression`, so it raises a `ClassCastException` when used in the projection list of an IN correlated subquery. This PR aims to fix that by making `OuterReference` a `NamedExpression`, so that correct error messages are shown.
    
    ```scala
    scala> sql("CREATE TEMPORARY VIEW t1 AS SELECT * FROM VALUES 1, 2 AS t1(a)")
    scala> sql("CREATE TEMPORARY VIEW t2 AS SELECT * FROM VALUES 1 AS t2(b)")
    scala> sql("SELECT a FROM t1 WHERE a IN (SELECT a FROM t2)").show
    java.lang.ClassCastException: 
org.apache.spark.sql.catalyst.expressions.OuterReference cannot be cast to 
org.apache.spark.sql.catalyst.expressions.NamedExpression
    ```
    
    ## How was this patch tested?
    
    Pass the Jenkins test with new test cases.
    
    Author: Dongjoon Hyun <[email protected]>
    
    Closes #16015 from dongjoon-hyun/SPARK-17251-2.
    
    (cherry picked from commit 9c03c564605783d8e94f6795432bb59c33933e52)
    Signed-off-by: Herman van Hovell <[email protected]>

commit 1e8fbefa3b61e2deb3dc7d7d3467e4cec69e54ce
Author: gatorsmile <[email protected]>
Date:   2016-11-28T03:43:24Z

    [SPARK-18594][SQL] Name Validation of Databases/Tables
    
    ### What changes were proposed in this pull request?
    Currently, the name validation checks are limited to table creation. They are enforced by the Analyzer rule `PreWriteCheck`.
    
    However, table renaming and database creation have the same issues. It makes more sense to do the checks in `SessionCatalog`, so this PR adds them there.
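
    A minimal sketch of what is now rejected (the names are illustrative; the exact rule is whatever `SessionCatalog` validates):

    ```scala
    sql("CREATE DATABASE `bad-db`")            // now fails name validation with an AnalysisException
    sql("ALTER TABLE t1 RENAME TO `bad-name`") // likewise rejected
    ```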
    
    ### How was this patch tested?
    Added test cases
    
    Author: gatorsmile <[email protected]>
    
    Closes #16018 from gatorsmile/nameValidate.
    
    (cherry picked from commit 07f32c2283e26e86474ba8c9b50125831009a1ea)
    Signed-off-by: gatorsmile <[email protected]>

commit 6b77889e8aea86322e90f0013d45872f867ba905
Author: Wenchen Fan <[email protected]>
Date:   2016-11-28T05:45:50Z

    [SPARK-18482][SQL] make sure Spark can access the table metadata created by 
older version of spark
    
    ## What changes were proposed in this pull request?
    
    In Spark 2.1, we did a lot of refactoring of `HiveExternalCatalog` and related code paths. This refactoring may introduce external behavior changes and break backward compatibility, e.g. http://issues.apache.org/jira/browse/SPARK-18464
    
    To avoid future compatibility problems with `HiveExternalCatalog`, this PR dumps some typical table metadata from tables created by 2.0 and tests whether it can be recognized by the current version of Spark.
    
    ## How was this patch tested?
    
    test only change
    
    Author: Wenchen Fan <[email protected]>
    
    Closes #16003 from cloud-fan/test.
    
    (cherry picked from commit fc2c13bdf0be5e349539b2ab90087c34b2d3faab)
    Signed-off-by: Reynold Xin <[email protected]>

commit 886f880df42b3b2d64377b2e9a236dda180d610d
Author: Takuya UESHIN <[email protected]>
Date:   2016-11-28T07:30:18Z

    [SPARK-18585][SQL] Use `ev.isNull = "false"` if possible for Janino to have 
a chance to optimize.
    
    ## What changes were proposed in this pull request?
    
    Janino can optimize `true ? a : b` into `a`, `false ? a : b` into `b`, and if/else with a literal condition, so we should use a literal for `ev.isNull` whenever possible.
    
    ## How was this patch tested?
    
    Existing tests.
    
    Author: Takuya UESHIN <[email protected]>
    
    Closes #16008 from ueshin/issues/SPARK-18585.
    
    (cherry picked from commit 87141622ee6b11ac177f68f58d0dc5f8b9a9f948)
    Signed-off-by: Reynold Xin <[email protected]>

commit d6e027e610bdff0123e71925735ecedcf4787b83
Author: Herman van Hovell <[email protected]>
Date:   2016-11-28T10:56:26Z

    [SPARK-18604][SQL] Make sure CollapseWindow returns the attributes in the 
same order.
    
    ## What changes were proposed in this pull request?
    The `CollapseWindow` optimizer rule changes the order of output attributes. This modifies the output of the plan, which the optimizer is not allowed to do. It also breaks things like `collect()`, for which we use a `RowEncoder` that assumes the output attributes of the executed plan are equal to those produced by the logical plan.
    
    ## How was this patch tested?
    I have updated an incorrect test in `CollapseWindowSuite`.
    
    Author: Herman van Hovell <[email protected]>
    
    Closes #16027 from hvanhovell/SPARK-18604.
    
    (cherry picked from commit 454b8049916a0353772a0ea5cfe14b62cbd81df4)
    Signed-off-by: Herman van Hovell <[email protected]>

commit 712bd5abc827c4eaf3f53bfc9155c8535584ca96
Author: Kazuaki Ishizaki <[email protected]>
Date:   2016-11-28T12:18:35Z

    [SPARK-18118][SQL] fix a compilation error due to nested JavaBeans
    
    ## What changes were proposed in this pull request?
    
    This PR avoids a compilation error caused by generated Java bytecode exceeding the 64KB per-method limit. The error occurs because the generated Java code of `SpecificSafeProjection.apply()` for nested JavaBeans is too big. This PR avoids the error by splitting a big code chunk into multiple methods, by calling `CodegenContext.splitExpression` in `InitializeJavaBean.doGenCode`.
    An object reference for the JavaBean is stored in an instance variable `javaBean...`; the split methods then reference that instance variable.
    
    Generated code with this PR
    ````
    /* 22098 */   private void apply130_0(InternalRow i) {
    ...
    /* 22125 */     boolean isNull238 = i.isNullAt(2);
    /* 22126 */     InternalRow value238 = isNull238 ? null : (i.getStruct(2, 
3));
    /* 22127 */     boolean isNull236 = false;
    /* 22128 */     test.org.apache.spark.sql.JavaDatasetSuite$Nesting1 
value236 = null;
    /* 22129 */     if (!false && isNull238) {
    /* 22130 */
    /* 22131 */       final test.org.apache.spark.sql.JavaDatasetSuite$Nesting1 
value239 = null;
    /* 22132 */       isNull236 = true;
    /* 22133 */       value236 = value239;
    /* 22134 */     } else {
    /* 22135 */
    /* 22136 */       final test.org.apache.spark.sql.JavaDatasetSuite$Nesting1 
value241 = false ? null : new 
test.org.apache.spark.sql.JavaDatasetSuite$Nesting1();
    /* 22137 */       this.javaBean14 = value241;
    /* 22138 */       if (!false) {
    /* 22139 */         apply25_0(i);
    /* 22140 */         apply25_1(i);
    /* 22141 */         apply25_2(i);
    /* 22142 */       }
    /* 22143 */       isNull236 = false;
    /* 22144 */       value236 = value241;
    /* 22145 */     }
    /* 22146 */     this.javaBean.setField2(value236);
    /* 22147 */
    /* 22148 */   }
    ...
    /* 22928 */   public java.lang.Object apply(java.lang.Object _i) {
    /* 22929 */     InternalRow i = (InternalRow) _i;
    /* 22930 */
    /* 22931 */     final 
test.org.apache.spark.sql.JavaDatasetSuite$NestedComplicatedJavaBean value1 = 
false ? null : new 
test.org.apache.spark.sql.JavaDatasetSuite$NestedComplicatedJavaBean();
    /* 22932 */     this.javaBean = value1;
    /* 22933 */     if (!false) {
    /* 22934 */       apply130_0(i);
    /* 22935 */       apply130_1(i);
    /* 22936 */       apply130_2(i);
    /* 22937 */       apply130_3(i);
    /* 22938 */       apply130_4(i);
    /* 22939 */     }
    /* 22940 */     if (false) {
    /* 22941 */       mutableRow.setNullAt(0);
    /* 22942 */     } else {
    /* 22943 */
    /* 22944 */       mutableRow.update(0, value1);
    /* 22945 */     }
    /* 22946 */
    /* 22947 */     return mutableRow;
    /* 22948 */   }
    ````
    
    ## How was this patch tested?
    
    added a test suite into `JavaDatasetSuite.java`
    
    Author: Kazuaki Ishizaki <[email protected]>
    
    Closes #16032 from kiszk/SPARK-18118.
    
    (cherry picked from commit f075cd9cb7157819df9aec67baee8913c4ed5c53)
    Signed-off-by: Herman van Hovell <[email protected]>

commit e449f7546897c5f29075e6a0913a5a6106bcbb5f
Author: Herman van Hovell <[email protected]>
Date:   2016-11-28T12:41:43Z

    [SPARK-18118][SQL] fix a compilation error due to nested JavaBeans
    
    Remove this reference.
    
    (cherry picked from commit 70dfdcbbf11c9c3174abc111afa2250236e31af2)
    Signed-off-by: Herman van Hovell <[email protected]>

commit a9d4febe900aa3eb9c595089e7283a64a24c8761
Author: gatorsmile <[email protected]>
Date:   2016-11-28T15:04:38Z

    [SPARK-17783][SQL] Hide Credentials in CREATE and DESC FORMATTED/EXTENDED a 
PERSISTENT/TEMP Table for JDBC
    
    ### What changes were proposed in this pull request?
    
    We should never expose credentials in the EXPLAIN and DESC FORMATTED/EXTENDED commands. However, the commands below exposed them.
    
    In the related PR: https://github.com/apache/spark/pull/10452
    
    > URL patterns to specify credential seems to be vary between different 
databases.
    
    Thus, we hide the whole `url` value if it contains the keyword `password`. 
We also hide the `password` property.
    
    Before the fix, the command outputs look like:
    
    ``` SQL
    CREATE TABLE tab1
    USING org.apache.spark.sql.jdbc
    OPTIONS (
     url 'jdbc:h2:mem:testdb0;user=testUser;password=testPass',
     dbtable 'TEST.PEOPLE',
     user 'testUser',
     password '$password')
    
    DESC FORMATTED tab1
    DESC EXTENDED tab1
    ```
    
    Before the fix,
    - The output of SQL statement EXPLAIN
    ```
    == Physical Plan ==
    ExecutedCommand
       +- CreateDataSourceTableCommand CatalogTable(
        Table: `tab1`
        Created: Wed Nov 16 23:00:10 PST 2016
        Last Access: Wed Dec 31 15:59:59 PST 1969
        Type: MANAGED
        Provider: org.apache.spark.sql.jdbc
        Storage(Properties: [url=jdbc:h2:mem:testdb0;user=testUser;password=testPass, dbtable=TEST.PEOPLE, user=testUser, password=testPass])), false
    ```
    
    - The output of `DESC FORMATTED`
    ```
    ...
    |Storage Desc Parameters:    |                                                                  |       |
    |  url                       |jdbc:h2:mem:testdb0;user=testUser;password=testPass              |       |
    |  dbtable                   |TEST.PEOPLE                                                       |       |
    |  user                      |testUser                                                          |       |
    |  password                  |testPass                                                          |       |
    +----------------------------+------------------------------------------------------------------+-------+
    ```
    
    - The output of `DESC EXTENDED`
    ```
    |# Detailed Table Information|CatalogTable(
        Table: `default`.`tab1`
        Created: Wed Nov 16 23:00:10 PST 2016
        Last Access: Wed Dec 31 15:59:59 PST 1969
        Type: MANAGED
        Schema: [StructField(NAME,StringType,false), StructField(THEID,IntegerType,false)]
        Provider: org.apache.spark.sql.jdbc
        Storage(Location: file:/Users/xiaoli/IdeaProjects/sparkDelivery/spark-warehouse/tab1, Properties: [url=jdbc:h2:mem:testdb0;user=testUser;password=testPass, dbtable=TEST.PEOPLE, user=testUser, password=testPass]))|       |
    ```
    
    After the fix,
    - The output of SQL statement EXPLAIN
    ```
    == Physical Plan ==
    ExecutedCommand
       +- CreateDataSourceTableCommand CatalogTable(
        Table: `tab1`
        Created: Wed Nov 16 22:43:49 PST 2016
        Last Access: Wed Dec 31 15:59:59 PST 1969
        Type: MANAGED
        Provider: org.apache.spark.sql.jdbc
        Storage(Properties: [url=###, dbtable=TEST.PEOPLE, user=testUser, password=###])), false
    ```
    - The output of `DESC FORMATTED`
    ```
    ...
    |Storage Desc Parameters:    |                                                                  |       |
    |  url                       |###                                                               |       |
    |  dbtable                   |TEST.PEOPLE                                                       |       |
    |  user                      |testUser                                                          |       |
    |  password                  |###                                                               |       |
    +----------------------------+------------------------------------------------------------------+-------+
    ```
    
    - The output of `DESC EXTENDED`
    ```
    |# Detailed Table Information|CatalogTable(
        Table: `default`.`tab1`
        Created: Wed Nov 16 22:43:49 PST 2016
        Last Access: Wed Dec 31 15:59:59 PST 1969
        Type: MANAGED
        Schema: [StructField(NAME,StringType,false), StructField(THEID,IntegerType,false)]
        Provider: org.apache.spark.sql.jdbc
        Storage(Location: file:/Users/xiaoli/IdeaProjects/sparkDelivery/spark-warehouse/tab1, Properties: [url=###, dbtable=TEST.PEOPLE, user=testUser, password=###]))|       |
    ```
    
    ### How was this patch tested?
    
    Added test cases
    
    Author: gatorsmile <[email protected]>
    
    Closes #15358 from gatorsmile/maskCredentials.
    
    (cherry picked from commit 9f273c5173c05017c3009faaf3e10f2f70a842d0)
    Signed-off-by: Herman van Hovell <[email protected]>

commit 32b259faed7e0573c0f465954205cbd3b94ee440
Author: Herman van Hovell <[email protected]>
Date:   2016-11-28T15:10:52Z

    [SPARK-18597][SQL] Do not push-down join conditions to the right side of a 
LEFT ANTI join
    
    ## What changes were proposed in this pull request?
    We currently push down join conditions of a Left Anti join to both sides of the join. This is similar to Inner, Left Semi and Existence (a specialized left semi) joins. The problem is that this changes the semantics of the join; a left anti join filters out rows that match the join condition.
    
    This PR fixes this by only pushing down conditions to the left hand side of 
the join. This is similar to the behavior of left outer join.
    
    ## How was this patch tested?
    Added tests to `FilterPushdownSuite.scala` and created a SQLQueryTestSuite 
file for left anti joins with a regression test.
    
    Author: Herman van Hovell <[email protected]>
    
    Closes #16026 from hvanhovell/SPARK-18597.
    
    (cherry picked from commit 38e29824d9a50464daa397c28e89610ed0aed4b6)
    Signed-off-by: Herman van Hovell <[email protected]>

----

