GitHub user bigdatatraining opened a pull request:

    https://github.com/apache/spark/pull/15020

    Spark 2.0 error in IntelliJ

    
    If I run the Twitter code in the console it works fine, but if I run the
same code against Spark 2.0 in IntelliJ, I get this error:
    
    Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/spark/Logging
    
    It is not just this program; most of my programs fail with the same error.
Please let me know why. The import
    
    import org.apache.spark.Logging
    
    is no longer available in Spark 2.0. How do I resolve this issue?
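
    For reference, in Spark 2.0 the `Logging` trait was moved to
`org.apache.spark.internal` and made private to Spark, so application code
should log through a logging library directly. A minimal sketch of that
approach, assuming SLF4J is on the classpath (the trait and object names are
illustrative):

    ```scala
    import org.slf4j.{Logger, LoggerFactory}

    // Hypothetical stand-in for the removed org.apache.spark.Logging trait.
    trait AppLogging {
      @transient lazy val log: Logger = LoggerFactory.getLogger(getClass)
    }

    object TwitterJob extends AppLogging { // illustrative object name
      def main(args: Array[String]): Unit = {
        log.info("logging without org.apache.spark.Logging")
      }
    }
    ```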


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark branch-2.0

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15020.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15020
    
----
commit fb944a1e85a4d0e618cf7485afb0d0b39367fbda
Author: Tom Graves <tgra...@yahoo-inc.com>
Date:   2016-07-22T11:41:38Z

    [SPARK-16650] Improve documentation of spark.task.maxFailures
    
    Clarify documentation on spark.task.maxFailures
    
    No tests were run, as this is a documentation-only change
    
    Author: Tom Graves <tgra...@yahoo-inc.com>
    
    Closes #14287 from tgravescs/SPARK-16650.
    
    (cherry picked from commit 6c56fff118ff2380c661456755db17976040de66)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit 28bb2b0447e9b47c4c568de983adde4a49b29263
Author: Dongjoon Hyun <dongj...@apache.org>
Date:   2016-07-22T12:20:06Z

    [SPARK-16651][PYSPARK][DOC] Make `withColumnRenamed/drop` description more 
consistent with Scala API
    
    ## What changes were proposed in this pull request?
    
    `withColumnRenamed` and `drop` are no-ops if the given column name does
not exist. The Python documentation already describes that, but this PR adds a
more explicit line, consistent with the Scala docs, to reduce ambiguity.
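
    A quick illustration of the documented behavior (a sketch against the
Scala API, assuming a `spark` session as in `spark-shell`):

    ```scala
    val df = spark.range(3).toDF("id")

    // Neither call fails when the column is absent; the result is unchanged.
    df.drop("no_such_column").printSchema()                   // still just: id
    df.withColumnRenamed("no_such_column", "x").printSchema() // still just: id
    ```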
    
    ## How was this patch tested?
    
    It's about docs.
    
    Author: Dongjoon Hyun <dongj...@apache.org>
    
    Closes #14288 from dongjoon-hyun/SPARK-16651.
    
    (cherry picked from commit 47f5b88db4d65f1870b16745d3c93d01051ba20b)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit da34e8e8faaf7239f6dfe248812c83e1b2e2c1fd
Author: Cheng Lian <l...@databricks.com>
Date:   2016-07-23T18:41:24Z

    [SPARK-16380][EXAMPLES] Update SQL examples and programming guide for 
Python language binding
    
    This PR is based on PR #14098 authored by wangmiao1981.
    
    ## What changes were proposed in this pull request?
    
    This PR replaces the original Python Spark SQL example file with the 
following three files:
    
    - `sql/basic.py`
    
      Demonstrates basic Spark SQL features.
    
    - `sql/datasource.py`
    
      Demonstrates various Spark SQL data sources.
    
    - `sql/hive.py`
    
      Demonstrates Spark SQL Hive interaction.
    
    This PR also removes hard-coded Python example snippets in the SQL 
programming guide by extracting snippets from the above files using the 
`include_example` Liquid template tag.
    
    ## How was this patch tested?
    
    Manually tested.
    
    Author: wm...@hotmail.com <wm...@hotmail.com>
    Author: Cheng Lian <l...@databricks.com>
    
    Closes #14317 from liancheng/py-examples-update.
    
    (cherry picked from commit 53b2456d1de38b9d4f18509e7b36eb3fbe09e050)
    Signed-off-by: Reynold Xin <r...@databricks.com>

commit 31c3bcb46cb56b57d3cdcb8c42e7056dab0f7601
Author: Wenchen Fan <wenc...@databricks.com>
Date:   2016-07-23T18:39:48Z

    [SPARK-16690][TEST] rename SQLTestUtils.withTempTable to withTempView
    
    After https://github.com/apache/spark/pull/12945 renamed
`registerTempTable` to `createTempView` (since it actually creates a view),
this PR renames `SQLTestUtils.withTempTable` to `withTempView` to reflect that
change.
    
    N/A
    
    Author: Wenchen Fan <wenc...@databricks.com>
    
    Closes #14318 from cloud-fan/minor4.
    
    (cherry picked from commit 86c275206605c44e1ebca2f166d62868e44bf029)
    Signed-off-by: Reynold Xin <r...@databricks.com>

commit 198b0426e07f3d4b1fbbef21d39daa32a75da36c
Author: Liwei Lin <lwl...@gmail.com>
Date:   2016-07-24T07:35:57Z

    [SPARK-16515][SQL][FOLLOW-UP] Fix test `script` on OS X/Windows...
    
    The current `sed` in `test_script.sh` is missing a `$`, leading to the
failure of the `script` test on OS X:
    ```
    == Results ==
    !== Correct Answer - 2 ==   == Spark Answer - 2 ==
    ![x1_y1]                    [x1]
    ![x2_y2]                    [x2]
    ```
    
    In addition, this `script` test would also fail on systems like Windows,
where we cannot invoke `bash` or `echo | sed`.
    
    This patch
    - fixes `sed` in `test_script.sh`
    - adds command guards so that the `script` test would pass on systems like 
Windows
    
    - Jenkins
    - Manually verified tests pass on OS X
    
    Author: Liwei Lin <lwl...@gmail.com>
    
    Closes #14280 from lw-lin/osx-sed.
    
    (cherry picked from commit d6795c7a254b83d4ae4785f3add74981e5273c91)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit d226dce12babcd9f30db033417b2b9ce79f44312
Author: Qifan Pu <qifan...@gmail.com>
Date:   2016-07-25T04:53:21Z

    [SPARK-16699][SQL] Fix performance bug in hash aggregate on long string keys
    
    ## What changes were proposed in this pull request?
    
    In the following code in `VectorizedHashMapGenerator.scala`:
    ```
        def hashBytes(b: String): String = {
          val hash = ctx.freshName("hash")
          s"""
             |int $result = 0;
             |for (int i = 0; i < $b.length; i++) {
             |  ${genComputeHash(ctx, s"$b[i]", ByteType, hash)}
             |  $result = ($result ^ (0x9e3779b9)) + $hash + ($result << 6) + ($result >>> 2);
             |}
           """.stripMargin
        }
    
    ```
    when `b = input.getBytes()`, the current 2.0 code results in `getBytes()`
being called n times, where n is the length of the input. `getBytes()`
involves a memory copy and is thus expensive, causing a performance
degradation. The fix is to evaluate `getBytes()` once, before the for loop.
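
    A sketch of the hoisting (identifier names are illustrative; the actual
patch may differ in detail): bind the byte array to a fresh variable once,
and index that variable inside the loop.

    ```scala
    def hashBytes(b: String): String = {
      val hash = ctx.freshName("hash")
      val bytes = ctx.freshName("bytes")  // holds the hoisted byte array
      s"""
         |int $result = 0;
         |byte[] $bytes = $b;
         |for (int i = 0; i < $bytes.length; i++) {
         |  ${genComputeHash(ctx, s"$bytes[i]", ByteType, hash)}
         |  $result = ($result ^ (0x9e3779b9)) + $hash + ($result << 6) + ($result >>> 2);
         |}
       """.stripMargin
    }
    ```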
    
    ## How was this patch tested?
    
    Performance bug, no additional test added.
    
    Author: Qifan Pu <qifan...@gmail.com>
    
    Closes #14337 from ooq/SPARK-16699.

commit fcbb7f653df11d923a208c5af03c0a6b9a472376
Author: Cheng Lian <l...@databricks.com>
Date:   2016-07-25T09:22:29Z

    [SPARK-16648][SQL] Make ignoreNullsExpr a child expression of First and Last
    
    ## What changes were proposed in this pull request?
    
    The default `TreeNode.withNewChildren` implementation doesn't work for
`Last` when both constructor arguments are the same, e.g.:
    
    ```sql
    LAST_VALUE(FALSE) -- The 2nd argument defaults to FALSE
    LAST_VALUE(FALSE, FALSE)
    LAST_VALUE(TRUE, TRUE)
    ```
    
    This is because although `Last` is a unary expression, both of its 
constructor arguments, `child` and `ignoreNullsExpr`, are `Expression`s. When 
they have the same value, `TreeNode.withNewChildren` treats both of them as 
child nodes by mistake. `First` is also affected by this issue in exactly the 
same way.
    
    This PR fixes this issue by making `ignoreNullsExpr` a child expression of 
`First` and `Last`.
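
    A self-contained toy illustration of the pitfall (simplified; this is not
Spark's actual `TreeNode` code): a generic rewrite that matches constructor
arguments to children by value equality also clobbers a non-child argument
that happens to equal the child.

    ```scala
    case class Leaf(value: Int)

    // `flag` is a constructor argument but not a logical child.
    case class Unary(child: Leaf, flag: Leaf) {
      // Naive generic rewrite, analogous to TreeNode.withNewChildren:
      // replace every constructor argument equal to the old child.
      def withNewChild(newChild: Leaf): Unary = {
        def replace(arg: Leaf): Leaf = if (arg == child) newChild else arg
        Unary(replace(child), replace(flag))
      }
    }

    val n = Unary(Leaf(0), Leaf(0))  // both arguments are equal
    assert(n.withNewChild(Leaf(1)) == Unary(Leaf(1), Leaf(1))) // flag clobbered
    ```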
    
    ## How was this patch tested?
    
    New test case added in `WindowQuerySuite`.
    
    Author: Cheng Lian <l...@databricks.com>
    
    Closes #14295 from liancheng/spark-16648-last-value.
    
    (cherry picked from commit 68b4020d0c0d4f063facfbf4639ef4251dcfda8b)
    Signed-off-by: Wenchen Fan <wenc...@databricks.com>

commit b52e639a84a851e0b9159a0f6dae92664425042e
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-07-25T14:51:30Z

    [SPARK-16698][SQL] Field names having dots should be allowed for 
datasources based on FileFormat
    
    ## What changes were proposed in this pull request?
    
    This appears to be a regression, judging from
https://issues.apache.org/jira/browse/SPARK-16698.
    
    A field name containing dots throws an exception. For example, the code below:
    
    ```scala
    val path = "/tmp/path"
    val json =""" {"a.b":"data"}"""
    spark.sparkContext
      .parallelize(json :: Nil)
      .saveAsTextFile(path)
    spark.read.json(path).collect()
    ```
    
    throws an exception as below:
    
    ```
    Unable to resolve a.b given [a.b];
    org.apache.spark.sql.AnalysisException: Unable to resolve a.b given [a.b];
        at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1$$anonfun$apply$5.apply(LogicalPlan.scala:134)
        at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolve$1$$anonfun$apply$5.apply(LogicalPlan.scala:134)
        at scala.Option.getOrElse(Option.scala:121)
    ```
    
    This problem was introduced in 
https://github.com/apache/spark/commit/17eec0a71ba8713c559d641e3f43a1be726b037c#diff-27c76f96a7b2733ecfd6f46a1716e153R121
    
    When extracting the data columns, it does not account for dots in field
names. Field names are not expected to be quoted when defining a schema, so
there is no need to consider whether a name is wrapped in quotes: the actual
schema (inferred or user-given) would not contain quotes in its field names.
    
    For example, this throws an exception (**loading JSON from an RDD is fine**):
    
    ```scala
    val json =""" {"a.b":"data"}"""
    val rdd = spark.sparkContext.parallelize(json :: Nil)
    spark.read.schema(StructType(Seq(StructField("`a.b`", StringType, true))))
      .json(rdd).select("`a.b`").printSchema()
    ```
    
    with the following error:
    
    ```
    cannot resolve '```a.b```' given input columns: [`a.b`];
    org.apache.spark.sql.AnalysisException: cannot resolve '```a.b```' given 
input columns: [`a.b`];
        at 
org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
    ```
    
    ## How was this patch tested?
    
    Unit tests in `FileSourceStrategySuite`.
    
    Author: hyukjinkwon <gurwls...@gmail.com>
    
    Closes #14339 from HyukjinKwon/SPARK-16698-regression.
    
    (cherry picked from commit 79826f3c7936ee27457d030c7115d5cac69befd7)
    Signed-off-by: Cheng Lian <l...@databricks.com>

commit 57d65e5111e281d3d5224c5ea11005c89718f791
Author: Cheng Lian <l...@databricks.com>
Date:   2016-07-25T16:42:39Z

    [SPARK-16703][SQL] Remove extra whitespace in SQL generation for window 
functions
    
    ## What changes were proposed in this pull request?
    
    This PR fixes a minor formatting issue of `WindowSpecDefinition.sql` when 
no partitioning expressions are present.
    
    Before:
    
    ```sql
    ( ORDER BY `a` ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    ```
    
    After:
    
    ```sql
    (ORDER BY `a` ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    ```
    
    ## How was this patch tested?
    
    New test case added in `ExpressionSQLBuilderSuite`.
    
    Author: Cheng Lian <l...@databricks.com>
    
    Closes #14334 from liancheng/window-spec-sql-format.
    
    (cherry picked from commit 7ea6d282b925819ddb3874a67b3c9da8cc41f131)
    Signed-off-by: Reynold Xin <r...@databricks.com>

commit d9bd066b9f37cfd18037b9a600371d0342703c0f
Author: Felix Cheung <felixcheun...@hotmail.com>
Date:   2016-07-25T18:25:41Z

    [SPARKR][DOCS] fix broken url in doc
    
    ## What changes were proposed in this pull request?
    
    Fix a broken URL.
    
    Also, the sparkR.session.stop doc page should show "sparkR.session.stop"
in the header instead of saying "sparkR.stop":
    
![image](https://cloud.githubusercontent.com/assets/8969467/17080129/26d41308-50d9-11e6-8967-79d6c920313f.png)
    
    The data type section currently sits in the middle of the list of
gapply/gapplyCollect subsections:
    
![image](https://cloud.githubusercontent.com/assets/8969467/17080122/f992d00a-50d8-11e6-8f2c-fd5786213920.png)
    
    ## How was this patch tested?
    
    manual test
    
    Author: Felix Cheung <felixcheun...@hotmail.com>
    
    Closes #14329 from felixcheung/rdoclinkfix.
    
    (cherry picked from commit b73defdd790cb823a4f9958ca89cec06fd198051)
    Signed-off-by: Shivaram Venkataraman <shiva...@cs.berkeley.edu>

commit f0d05f669b4e7be017d8d0cfba33c3a61a1eef8f
Author: Shuai Lin <linshuai2...@gmail.com>
Date:   2016-07-25T19:26:55Z

    [SPARK-16485][DOC][ML] Fixed several inline formatting in ml features doc
    
    ## What changes were proposed in this pull request?
    
    Fixed several inline formatting in ml features doc.
    
    Before:
    
    <img width="475" alt="screen shot 2016-07-14 at 12 24 57 pm" 
src="https://cloud.githubusercontent.com/assets/717363/16827974/1e1b6e04-49be-11e6-8aa9-4a0cb6cd3b4e.png";>
    
    After:
    
    <img width="404" alt="screen shot 2016-07-14 at 12 25 48 pm" 
src="https://cloud.githubusercontent.com/assets/717363/16827976/2576510a-49be-11e6-96dd-92a1fa464d36.png";>
    
    ## How was this patch tested?
    
    Generate the docs locally with `SKIP_API=1 jekyll build` and view them in
the browser.
    
    Author: Shuai Lin <linshuai2...@gmail.com>
    
    Closes #14194 from lins05/fix-docs-formatting.
    
    (cherry picked from commit 3b6e1d094e153599e158331b10d33d74a667be5a)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit 1b4f7cf135eebc46f07649509a027b6d422dcfdf
Author: Takeshi YAMAMURO <linguin....@gmail.com>
Date:   2016-07-25T22:08:58Z

    [SQL][DOC] Fix a default name for parquet compression
    
    ## What changes were proposed in this pull request?
    This PR fixes a wrong description of the default Parquet compression codec.
    
    Author: Takeshi YAMAMURO <linguin....@gmail.com>
    
    Closes #14351 from maropu/FixParquetDoc.
    
    (cherry picked from commit cda4603de340d533c49feac1b244ddfd291f9bcf)
    Signed-off-by: Reynold Xin <r...@databricks.com>

commit 41e72f65929c345aa21ebd4e00dadfbfb5acfdf3
Author: Shixiong Zhu <shixi...@databricks.com>
Date:   2016-07-25T23:08:29Z

    [SPARK-16715][TESTS] Fix a potential ExprId conflict for 
SubexpressionEliminationSuite."Semantic equals and hash"
    
    ## What changes were proposed in this pull request?
    
    SubexpressionEliminationSuite."Semantic equals and hash" assumes the
default AttributeReference's exprId won't be "ExprId(1)". However, that
depends on when this test runs, and it may happen to be "ExprId(1)".
    
    This PR detects the conflict and makes sure we create a different ExprId 
when the conflict happens.
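
    A sketch of the conflict-avoidance idea (not the suite's exact code;
catalyst API names as of 2.0):

    ```scala
    import org.apache.spark.sql.catalyst.expressions.{AttributeReference, ExprId}
    import org.apache.spark.sql.types.IntegerType

    val a = AttributeReference("a", IntegerType)() // auto-generated exprId
    // Guaranteed to differ from a.exprId, whatever order the tests ran in:
    val otherId = if (a.exprId == ExprId(1)) ExprId(2) else ExprId(1)
    val b = a.withExprId(otherId)
    ```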
    
    ## How was this patch tested?
    
    Jenkins unit tests.
    
    Author: Shixiong Zhu <shixi...@databricks.com>
    
    Closes #14350 from zsxwing/SPARK-16715.
    
    (cherry picked from commit 12f490b5c85cdee26d47eb70ad1a1edd00504f21)
    Signed-off-by: Shixiong Zhu <shixi...@databricks.com>

commit b17fe4e412d27a4f3e8ad86ac5d8c2c108654eb3
Author: Tathagata Das <tathagata.das1...@gmail.com>
Date:   2016-07-25T23:09:22Z

    [SPARK-14131][STREAMING] SQL Improved fix for avoiding potential deadlocks 
in HDFSMetadataLog
    
    ## What changes were proposed in this pull request?
    The current fix for the deadlock disables interrupts in StreamExecution
while getting offsets for all sources and while writing to any metadata log,
to avoid potential deadlocks in HDFSMetadataLog (see the JIRA for more
details). However, disabling interrupts can have unintended consequences in
other sources. So this change narrows the fix by disabling interrupts only in
HDFSMetadataLog. This is a narrower fix for something as risky as disabling
interrupts.
    
    ## How was this patch tested?
    Existing tests.
    
    Author: Tathagata Das <tathagata.das1...@gmail.com>
    
    Closes #14292 from tdas/SPARK-14131.
    
    (cherry picked from commit c979c8bba02bc89cb9ad81b212f085a8a5490a07)
    Signed-off-by: Tathagata Das <tathagata.das1...@gmail.com>

commit 9d581dc61951eccf0f06868e0d3f10134f433e82
Author: Shixiong Zhu <shixi...@databricks.com>
Date:   2016-07-26T01:26:29Z

    [SPARK-16722][TESTS] Fix a StreamingContext leak in StreamingContextSuite 
when eventually fails
    
    ## What changes were proposed in this pull request?
    
    This PR moves `ssc.stop()` into a `finally` block in
`StreamingContextSuite.createValidCheckpoint` to avoid leaking a
StreamingContext, since a leaked StreamingContext fails a lot of tests and
makes it hard to find the real failing one.
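
    A minimal sketch of the pattern (the helper shown here is illustrative,
not the suite's exact code):

    ```scala
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    def withStreamingContext(conf: SparkConf)(body: StreamingContext => Unit): Unit = {
      val ssc = new StreamingContext(conf, Seconds(1))
      try body(ssc)
      finally ssc.stop(stopSparkContext = true) // runs even if `body` throws
    }
    ```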
    
    ## How was this patch tested?
    
    Jenkins unit tests
    
    Author: Shixiong Zhu <shixi...@databricks.com>
    
    Closes #14354 from zsxwing/ssc-leak.
    
    (cherry picked from commit e164a04b2ba3503e5c14cd1cd4beb40e0b79925a)
    Signed-off-by: Tathagata Das <tathagata.das1...@gmail.com>

commit 3d35474872d3b117abc3fc7debcb1eb6409769d6
Author: Nicholas Brown <nbr...@adroitdigital.com>
Date:   2016-07-26T02:18:27Z

    Fix description of spark.speculation.quantile
    
    ## What changes were proposed in this pull request?
    
    Minor doc fix regarding the spark.speculation.quantile configuration
parameter. It incorrectly states that the value is a percentage, when it is in
fact a fraction.
    
    ## How was this patch tested?
    
    I tried building the documentation but got some unidoc errors.  I also got 
them when building off origin/master, so I don't think I caused that problem.  
I did run the web app and saw the changes reflected as expected.
    
    Author: Nicholas Brown <nbr...@adroitdigital.com>
    
    Closes #14352 from nwbvt/master.
    
    (cherry picked from commit ba0aade6d517364363e07ed09278c2b44110c33b)
    Signed-off-by: Reynold Xin <r...@databricks.com>

commit aeb6d5c053d4e848df0e7842a3994154df464647
Author: Dongjoon Hyun <dongj...@apache.org>
Date:   2016-07-26T02:52:17Z

    [SPARK-16672][SQL] SQLBuilder should not raise exceptions on EXISTS queries
    
    ## What changes were proposed in this pull request?
    
    Currently, `SQLBuilder` raises `empty.reduceLeft` exceptions on 
*unoptimized* `EXISTS` queries. We had better prevent this.
    ```scala
    scala> sql("CREATE TABLE t1(a int)")
    scala> val df = sql("select * from t1 b where exists (select * from t1 a)")
    scala> new org.apache.spark.sql.catalyst.SQLBuilder(df).toSQL
    java.lang.UnsupportedOperationException: empty.reduceLeft
    ```
    
    ## How was this patch tested?
    
    Pass the Jenkins tests with a new test suite.
    
    Author: Dongjoon Hyun <dongj...@apache.org>
    
    Closes #14307 from dongjoon-hyun/SPARK-16672.
    
    (cherry picked from commit 8a8d26f1e27db5c2228307b1c3609b4713b9d0db)
    Signed-off-by: Reynold Xin <r...@databricks.com>

commit 4b38a6a534d93b1eab3b19f62a2f78474be1d8bc
Author: Michael Armbrust <mich...@databricks.com>
Date:   2016-07-26T03:41:24Z

    [SPARK-16724] Expose DefinedByConstructorParams
    
    We don't generally make things in catalyst/execution private.  Instead they 
are just undocumented due to their lack of stability guarantees.
    
    Author: Michael Armbrust <mich...@databricks.com>
    
    Closes #14356 from marmbrus/patch-1.
    
    (cherry picked from commit f99e34e8e58c97ff30c6e054875533350d99fe5b)
    Signed-off-by: Reynold Xin <r...@databricks.com>

commit 4391d4a3c60d59df625cbfdb918aa67c51ebcbc1
Author: Yin Huai <yh...@databricks.com>
Date:   2016-07-26T03:58:07Z

    [SPARK-16633][SPARK-16642][SPARK-16721][SQL] Fixes three issues related to 
lead and lag functions
    
    ## What changes were proposed in this pull request?
    This PR contains three changes.
    
    First, this PR changes the behavior of lead/lag back to Spark 1.6's
behavior, which is described below (an illustrative query follows the list):
    1. lead/lag respect null input values, which means that if the offset row 
exists and the input value is null, the result will be null instead of the 
default value.
    2. If the offset row does not exist, the default value will be used.
    3. OffsetWindowFunction's nullable setting also considers the nullability 
of its input (because of the first change).
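
    An illustrative query for the restored semantics (the table and column
names here are hypothetical):

    ```scala
    // Assuming an `events` table with an ordering column `t` and a value `v`:
    spark.sql("""
      SELECT v, LAG(v, 1, -1) OVER (ORDER BY t) AS lagged
      FROM events
    """)
    // Offset row exists but its v is NULL   -> lagged is NULL (rule 1 above)
    // Offset row does not exist (first row) -> lagged is -1   (rule 2 above)
    ```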
    
    Second, this PR fixes the evaluation of lead/lag when the input expression 
is a literal. This fix is a result of the first change. In current master, if a 
literal is used as the input expression of a lead or lag function, the result 
will be this literal even if the offset row does not exist.
    
    Third, this PR makes ResolveWindowFrame not fire if a window function is 
not resolved.
    
    ## How was this patch tested?
    New tests in SQLWindowFunctionSuite
    
    Author: Yin Huai <yh...@databricks.com>
    
    Closes #14284 from yhuai/lead-lag.
    
    (cherry picked from commit 815f3eece5f095919a329af8cbd762b9ed71c7a8)
    Signed-off-by: Yin Huai <yh...@databricks.com>

commit 44234b1c4266ac7be56892817d043fe6d9ea62f7
Author: Tathagata Das <tathagata.das1...@gmail.com>
Date:   2016-07-26T07:41:46Z

    [TEST][STREAMING] Fix flaky Kafka rate controlling test
    
    ## What changes were proposed in this pull request?
    
    The current test is incorrect, because
    - The expected number of messages does not take into account that the topic 
has 2 partitions, and rate is set per partition.
    - Also in some cases, the test ran out of data in Kafka while waiting for 
the right amount of data per batch.
    
    The PR
    - Reduces the number of partitions to 1
    - Adds more data to Kafka
    - Runs with a 0.5 second batch interval so that batches are created slowly
    
    ## How was this patch tested?
    Ran many times locally, going to run it many times in Jenkins
    
    Author: Tathagata Das <tathagata.das1...@gmail.com>
    
    Closes #14361 from tdas/kafka-rate-test-fix.
    
    (cherry picked from commit 03c27435aee4e319abe290771ba96e69469109ac)
    Signed-off-by: Tathagata Das <tathagata.das1...@gmail.com>

commit be9965b077cded3d30a2d35342f3440f4708c357
Author: Dongjoon Hyun <dongj...@apache.org>
Date:   2016-07-27T05:23:59Z

    [SPARK-16621][SQL] Generate stable SQLs in SQLBuilder
    
    Currently, the generated SQL strings have unstable IDs for generated
attributes. Stable generated SQL makes the queries easier to understand and
test. This PR provides stable SQL generation through the following:
    
     - Provide unique ids for generated subqueries, `gen_subquery_xxx`.
     - Provide unique and stable ids for generated attributes, `gen_attr_xxx`.
    
    **Before**
    ```scala
    scala> new org.apache.spark.sql.catalyst.SQLBuilder(sql("select 1")).toSQL
    res0: String = SELECT `gen_attr_0` AS `1` FROM (SELECT 1 AS `gen_attr_0`) 
AS gen_subquery_0
    scala> new org.apache.spark.sql.catalyst.SQLBuilder(sql("select 1")).toSQL
    res1: String = SELECT `gen_attr_4` AS `1` FROM (SELECT 1 AS `gen_attr_4`) 
AS gen_subquery_0
    ```
    
    **After**
    ```scala
    scala> new org.apache.spark.sql.catalyst.SQLBuilder(sql("select 1")).toSQL
    res1: String = SELECT `gen_attr_0` AS `1` FROM (SELECT 1 AS `gen_attr_0`) 
AS gen_subquery_0
    scala> new org.apache.spark.sql.catalyst.SQLBuilder(sql("select 1")).toSQL
    res2: String = SELECT `gen_attr_0` AS `1` FROM (SELECT 1 AS `gen_attr_0`) 
AS gen_subquery_0
    ```
    
    Pass the existing Jenkins tests.
    
    Author: Dongjoon Hyun <dongj...@apache.org>
    
    Closes #14257 from dongjoon-hyun/SPARK-16621.
    
    (cherry picked from commit 5b8e848bbfbc0c99a5faf758e40b188b0bbebb7b)
    Signed-off-by: Reynold Xin <r...@databricks.com>

commit 4e98e6905f0dd35902207d40e68befc0b0040b7d
Author: Yanbo Liang <yblia...@gmail.com>
Date:   2016-07-27T10:24:28Z

    [MINOR][ML] Fix some mistake in LinearRegression formula.
    
    ## What changes were proposed in this pull request?
    Fix some mistakes in the `LinearRegression` formula.
    
    ## How was this patch tested?
    Documentation change, no tests.
    
    Author: Yanbo Liang <yblia...@gmail.com>
    
    Closes #14369 from yanboliang/LiR-formula.
    
    (cherry picked from commit 3c3371bbd6361011b138cce88f6396a2aa4e2cb9)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit 8bc2877d8c7cad6831de73a3f7c032b7dd73ae78
Author: petermaxlee <petermax...@gmail.com>
Date:   2016-07-27T08:04:43Z

    [SPARK-16729][SQL] Throw analysis exception for invalid date casts
    
    Spark currently throws exceptions for invalid casts for all data types
except the date type, which somehow returns null instead. Casting should be
consistent and throw an analysis exception for dates as well.
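
    Illustrative behavior after this change (the exact error message may
differ):

    ```scala
    // An int cannot be cast to a date:
    spark.sql("SELECT CAST(1 AS DATE)")
    // Before: the cast silently evaluated to NULL.
    // After:  org.apache.spark.sql.AnalysisException is thrown at analysis time.
    ```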
    
    Added a unit test case in CastSuite.
    
    Author: petermaxlee <petermax...@gmail.com>
    
    Closes #14358 from petermaxlee/SPARK-16729.
    
    (cherry picked from commit ef0ccbcb07252db0ead8509e70d1a9a670d41616)
    Signed-off-by: Reynold Xin <r...@databricks.com>

commit 2f4e06e381465289680348a78006a2e24be86e62
Author: Bartek Wiśniewski <wedi@ava.local>
Date:   2016-07-27T17:53:22Z

    [MINOR][DOC] missing keyword new
    
    ## What changes were proposed in this pull request?
    
    Added the missing `new` keyword to a Java example.
    
    ## How was this patch tested?
    
    wasn't
    
    Author: Bartek Wiśniewski <wedi@Ava.local>
    
    Closes #14381 from wedi-dev/quickfix/missing_keyword.
    
    (cherry picked from commit bc4851adeb386edc5bef47027a12ca44eda82b09)
    Signed-off-by: Reynold Xin <r...@databricks.com>

commit 2d56a213622f699dd6c65b1c79621178a597bbf7
Author: petermaxlee <petermax...@gmail.com>
Date:   2016-07-28T05:13:17Z

    [SPARK-16730][SQL] Implement function aliases for type casts
    
    ## What changes were proposed in this pull request?
    Spark 1.x supports using Hive type names as function names to perform
casts, e.g.
    ```sql
    SELECT int(1.0);
    SELECT string(2.0);
    ```
    
    The above queries work in Spark 1.x because Spark 1.x falls back to Hive
for unimplemented functions, and they break in Spark 2.0 because that fallback
was removed.
    
    This patch implements function aliases using an analyzer rule for the
following cast functions (a usage sketch follows the list):
    - boolean
    - tinyint
    - smallint
    - int
    - bigint
    - float
    - double
    - decimal
    - date
    - timestamp
    - binary
    - string
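
    A usage sketch of the restored aliases:

    ```scala
    // Each alias is sugar for the corresponding CAST:
    spark.sql("SELECT int(1.0), string(2.0), date('2016-07-28')").show()
    // equivalent to:
    spark.sql(
      "SELECT CAST(1.0 AS INT), CAST(2.0 AS STRING), CAST('2016-07-28' AS DATE)"
    ).show()
    ```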
    
    ## How was this patch tested?
    Added end-to-end tests in SQLCompatibilityFunctionSuite.
    
    Author: petermaxlee <petermax...@gmail.com>
    
    Closes #14364 from petermaxlee/SPARK-16730-2.
    
    (cherry picked from commit 11d427c924d303e20af90c0179a105f6ff4d89e2)
    Signed-off-by: Wenchen Fan <wenc...@databricks.com>

commit 0fd2dfb6dee9d7eaa277d6806e56f1b0531afa51
Author: Dongjoon Hyun <dongj...@apache.org>
Date:   2016-07-28T06:29:26Z

    [SPARK-15232][SQL] Add subquery SQL building tests to LogicalPlanToSQLSuite
    
    ## What changes were proposed in this pull request?
    
    We currently test subquery SQL building using the `HiveCompatibilitySuite`.
This is not desirable, since SQL building is actually part of `sql/core` and
because we are slowly reducing our dependency on Hive. This PR adds the same
tests from the whitelist of `HiveCompatibilitySuite` into
`LogicalPlanToSQLSuite`.
    
    ## How was this patch tested?
    
    This adds more test cases. Passes the Jenkins tests.
    
    Author: Dongjoon Hyun <dongj...@apache.org>
    
    Closes #14383 from dongjoon-hyun/SPARK-15232.
    
    (cherry picked from commit 5c2ae79bfcf448d8dc9217efafa1409997c739de)
    Signed-off-by: Reynold Xin <r...@databricks.com>

commit 825c8371784468ff976526deffd97ad7df997738
Author: Liang-Chi Hsieh <sim...@tw.ibm.com>
Date:   2016-07-28T14:33:33Z

    [SPARK-16639][SQL] The query with having condition that contains grouping 
by column should work
    
    ## What changes were proposed in this pull request?
    
    A query with a HAVING condition that references a grouping expression
currently fails during analysis. E.g.,
    
        create table tbl(a int, b string);
        select count(b) from tbl group by a + 1 having a + 1 = 2;
    
    A HAVING condition should be able to reference grouping expressions.
    
    ## How was this patch tested?
    
    Jenkins tests.
    
    Author: Liang-Chi Hsieh <sim...@tw.ibm.com>
    
    Closes #14296 from viirya/having-contains-grouping-column.
    
    (cherry picked from commit 9ade77c3fa2e1bf436b79368a97d5980c12fe215)
    Signed-off-by: Wenchen Fan <wenc...@databricks.com>

commit f46a074510e47206de9d3b3ac6902af321923ce8
Author: Sylvain Zimmer <sylv...@sylvainzimmer.com>
Date:   2016-07-28T16:51:45Z

    [SPARK-16740][SQL] Fix Long overflow in LongToUnsafeRowMap
    
    Avoid a Long overflow that causes a NegativeArraySizeException a few
lines later.
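
    A self-contained illustration of the failure mode (toy numbers, not the
fixed code): a key range wider than Int.MaxValue wraps to a negative value
when narrowed to an array size.

    ```scala
    val minKey = 0L
    val maxKey = Int.MaxValue.toLong + 10
    val span = maxKey - minKey + 1  // 2147483658: still correct as a Long
    val badSize = span.toInt        // wraps to -2147483638
    // new Array[Long](badSize)     // would throw NegativeArraySizeException

    // Safe pattern: validate the Long value before narrowing it.
    assert(span > Int.MaxValue && badSize < 0)
    ```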
    
    Unit tests for HashedRelationSuite still pass.
    
    I can confirm the python script I included in 
https://issues.apache.org/jira/browse/SPARK-16740 works fine with this patch. 
Unfortunately I don't have the knowledge/time to write a Scala test case for 
HashedRelationSuite right now. As the patch is pretty obvious I hope it can be 
included without this.
    
    Thanks!
    
    Author: Sylvain Zimmer <sylv...@sylvainzimmer.com>
    
    Closes #14373 from sylvinus/master.
    
    (cherry picked from commit 1178d61ede816bf1c8d5bb3dbb3b965c9b944407)
    Signed-off-by: Reynold Xin <r...@databricks.com>

commit fb09a693d6f58d71ec042224b8ea66b972c1adc2
Author: Sameer Agarwal <samee...@cs.berkeley.edu>
Date:   2016-07-28T20:04:19Z

    [SPARK-16764][SQL] Recommend disabling vectorized parquet reader on 
OutOfMemoryError
    
    ## What changes were proposed in this pull request?
    
    We currently don't bound or manage the data array size used by column
vectors in the vectorized reader (they're just bounded by Int.MaxValue), which
may lead to OOMs while reading data. As a short-term fix, this patch
intercepts the OutOfMemoryError exception and suggests that the user disable
the vectorized Parquet reader.
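
    For reference, the suggested mitigation as a user would apply it:

    ```scala
    // Disable the vectorized Parquet reader for the current session:
    spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")
    ```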
    
    ## How was this patch tested?
    
    Existing Tests
    
    Author: Sameer Agarwal <samee...@cs.berkeley.edu>
    
    Closes #14387 from sameeragarwal/oom.
    
    (cherry picked from commit 3fd39b87bda77f3c3a4622d854f23d4234683571)
    Signed-off-by: Reynold Xin <r...@databricks.com>

commit 5cd79c396f98660e12b02c0151a084b4d1599b6b
Author: Nicholas Chammas <nicholas.cham...@gmail.com>
Date:   2016-07-28T21:57:15Z

    [SPARK-16772] Correct API doc references to PySpark classes + formatting 
fixes
    
    ## What's Been Changed
    
    The PR corrects several broken or missing class references in the Python
API docs. It also corrects formatting problems.
    
    For example, you can see 
[here](http://spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html#pyspark.sql.SQLContext.registerFunction)
 how Sphinx is not picking up the reference to `DataType`. That's because the 
reference is relative to the current module, whereas `DataType` is in a 
different module.
    
    You can also see 
[here](http://spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html#pyspark.sql.SQLContext.createDataFrame)
 how the formatting for byte, tinyint, and so on is italic instead of 
monospace. That's because in ReST single backticks just make things italic, 
unlike in Markdown.
    
    ## Testing
    
    I tested this PR by [building the Python 
docs](https://github.com/apache/spark/tree/master/docs#generating-the-documentation-html)
 and reviewing the results locally in my browser. I confirmed that the broken 
or missing class references were resolved, and that the formatting was 
corrected.
    
    Author: Nicholas Chammas <nicholas.cham...@gmail.com>
    
    Closes #14393 from nchammas/python-docstring-fixes.
    
    (cherry picked from commit 274f3b9ec86e4109c7678eef60f990d41dc3899f)
    Signed-off-by: Reynold Xin <r...@databricks.com>

----

