GitHub user paulzwu opened a pull request:

    https://github.com/apache/spark/pull/15183

    https://issues.apache.org/jira/browse/SPARK-17614 

    ## What changes were proposed in this pull request?
    
    Use JdbcDialect's getTableExistsQuery rather than the hard-coded
"SELECT * FROM $table WHERE 1=0" to get the table's schema.
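    
    For illustration, a minimal sketch of the proposed lookup (the JDBC URL and
table name are made up; `JdbcDialects.get` and `getTableExistsQuery` are the
existing Spark APIs mentioned above):
    
    ```scala
    import org.apache.spark.sql.jdbc.JdbcDialects
    
    // Resolve the dialect registered for this JDBC URL (URL is illustrative only).
    val url = "jdbc:postgresql://host:5432/db"
    val table = "people"
    
    // Let the dialect produce the probe query instead of hard-coding it;
    // the default dialect still returns "SELECT * FROM people WHERE 1=0".
    val schemaQuery = JdbcDialects.get(url).getTableExistsQuery(table)
    ```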
    
    
    ## How was this patch tested?
    
    A unit test using DataFrameReader.read() with Cassandra can cover this,
since Cassandra does not support "SELECT * FROM $table WHERE 1=0".
    
    
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15183.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #15183
    
----
commit f2093107196b9af62908ecf15bac043f3b1e64c4
Author: Michael Allman <mich...@videoamp.com>
Date:   2016-08-25T18:57:38Z

    [SPARK-17231][CORE] Avoid building debug or trace log messages unless the 
respective log level is enabled
    
    (This PR addresses https://issues.apache.org/jira/browse/SPARK-17231)
    
    ## What changes were proposed in this pull request?
    
    While debugging the performance of a large GraphX connected components
computation, we found several places in the `network-common` and
`network-shuffle` code bases where trace or debug log messages are constructed
even when the respective log level is disabled. According to YourKit, these
constructions were creating substantial churn in the eden region. Refactoring
the code so that these messages are only built when the corresponding log level
is enabled led to a modest but measurable reduction in our job's task time, GC
time and the ratio of the two.
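    
    As a rough sketch of the pattern applied (logger name and message are
illustrative, not the exact patch):
    
    ```scala
    import org.slf4j.LoggerFactory
    
    val logger = LoggerFactory.getLogger("network-common")
    
    // Before: the interpolated message (and any boxing) was built even when TRACE was off.
    // After: build the message only if it will actually be logged.
    def onChunkReceived(chunkIndex: Long, sizeBytes: Long): Unit = {
      if (logger.isTraceEnabled) {
        logger.trace(s"Received chunk $chunkIndex ($sizeBytes bytes)")
      }
    }
    ```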
    
    ## How was this patch tested?
    
    We computed the connected components of a graph with about 2.6 billion 
vertices and 1.7 billion edges four times. We used four different EC2 clusters 
each with 8 r3.8xl worker nodes. Two test runs used Spark master. Two used 
Spark master + this PR. The results from the first test run, master and 
master+PR:
    
![master](https://cloud.githubusercontent.com/assets/833693/17951634/7471cbca-6a18-11e6-9c26-78afe9319685.jpg)
    
![logging_perf_improvements](https://cloud.githubusercontent.com/assets/833693/17951632/7467844e-6a18-11e6-9a0e-053dc7650413.jpg)
    
    The results from the second test run, master and master+PR:
    ![master 2](https://cloud.githubusercontent.com/assets/833693/17951633/746dd6aa-6a18-11e6-8e27-606680b3f105.jpg)
    ![logging_perf_improvements 2](https://cloud.githubusercontent.com/assets/833693/17951631/74488710-6a18-11e6-8a32-08692f373386.jpg)
    
    Though modest, I believe these results are significant.
    
    Author: Michael Allman <mich...@videoamp.com>
    
    Closes #14798 from mallman/spark-17231-logging_perf_improvements.

commit 9958ac0ce2b9e451d400604767bef2fe12a3399d
Author: wm...@hotmail.com <wm...@hotmail.com>
Date:   2016-08-25T19:11:27Z

    [SPARKR][BUILD] ignore cran-check.out under R folder
    
    ## What changes were proposed in this pull request?
    
    The R CRAN check generates a cran-check.out file. This file should be
ignored in git.
    
    ## How was this patch tested?
    
    Tested manually: ran a clean test and checked `git status` to make sure the
file is not included in git.
    
    Author: wm...@hotmail.com <wm...@hotmail.com>
    
    Closes #14774 from wangmiao1981/ignore.

commit a133057ce5817f834babe9f25023092aec3c321d
Author: Josh Rosen <joshro...@databricks.com>
Date:   2016-08-25T21:22:40Z

    [SPARK-17229][SQL] PostgresDialect shouldn't widen float and short types 
during reads
    
    ## What changes were proposed in this pull request?
    
    When reading float4 and smallint columns from PostgreSQL, Spark's 
`PostgresDialect` widens these types to Decimal and Integer rather than using 
the narrower Float and Short types. According to 
https://www.postgresql.org/docs/7.1/static/datatype.html#DATATYPE-TABLE, 
Postgres maps the `smallint` type to a signed two-byte integer and the `real` / 
`float4` types to single precision floating point numbers.
    
    This patch fixes this by adding more special cases to `getCatalystType`,
similar to what was done for the Derby JDBC dialect. I also fixed a similar
problem in the write path which caused Spark to create integer columns in
Postgres for what should have been ShortType columns.
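    
    A simplified sketch of the kind of special-casing described (the helper name
is made up; this is not the actual `PostgresDialect` source):
    
    ```scala
    import java.sql.Types
    import org.apache.spark.sql.types.{DataType, FloatType, ShortType}
    
    // Map the narrow Postgres types onto the narrower Catalyst types instead of widening them.
    def narrowPostgresType(sqlType: Int): Option[DataType] = sqlType match {
      case Types.REAL     => Some(FloatType)  // real / float4: single-precision float
      case Types.SMALLINT => Some(ShortType)  // smallint: signed two-byte integer
      case _              => None             // fall back to the default JDBC mapping
    }
    ```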
    
    ## How was this patch tested?
    
    New test cases in `PostgresIntegrationSuite` (which I ran manually because 
Jenkins can't run it right now).
    
    Author: Josh Rosen <joshro...@databricks.com>
    
    Closes #14796 from JoshRosen/postgres-jdbc-type-fixes.

commit 3e4c7db4d11c474457e7886a5501108ebab0cf6d
Author: Josh Rosen <joshro...@databricks.com>
Date:   2016-08-25T22:15:01Z

    [SPARK-17205] Literal.sql should handle Infinity and NaN
    
    This patch updates `Literal.sql` to properly generate SQL for `NaN` and 
`Infinity` float and double literals: these special values need to be handled 
differently from regular values, since simply appending a suffix to the value's 
`toString()` representation will not work for these values.
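    
    A minimal sketch of the idea (illustrative only, not the actual `Literal.sql`
code):
    
    ```scala
    // Non-finite doubles cannot be rendered as "<value>D", so emit an explicit CAST instead.
    def doubleLiteralSql(v: Double): String = v match {
      case d if d.isNaN            => "CAST('NaN' AS DOUBLE)"
      case Double.PositiveInfinity => "CAST('Infinity' AS DOUBLE)"
      case Double.NegativeInfinity => "CAST('-Infinity' AS DOUBLE)"
      case d                       => d.toString + "D"  // regular values keep the suffix form
    }
    ```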
    
    Author: Josh Rosen <joshro...@databricks.com>
    
    Closes #14777 from JoshRosen/SPARK-17205.

commit 9b5a1d1d53bc4412de3cbc86dc819b0c213229a8
Author: Marcelo Vanzin <van...@cloudera.com>
Date:   2016-08-25T23:11:42Z

    [SPARK-17240][CORE] Make SparkConf serializable again.
    
    Make the config reader transient, and initialize it lazily so that
    serialization works with both java and kryo (and hopefully any other
    custom serializer).
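    
    A small sketch of the idea (class and field names are illustrative, not
    SparkConf's actual internals):
    
    ```scala
    class SettingsHolder(entries: Map[String, String]) extends Serializable {
    
      // Skipped during Java/Kryo serialization and rebuilt lazily on first
      // access after deserialization, so the holder itself stays serializable.
      @transient private lazy val reader: Map[String, String] =
        entries.map { case (k, v) => k -> v.trim }
    
      def get(key: String): Option[String] = reader.get(key)
    }
    ```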
    
    Added a unit test to make sure SparkConf remains serializable and that the
    reader works with both built-in serializers.
    
    Author: Marcelo Vanzin <van...@cloudera.com>
    
    Closes #14813 from vanzin/SPARK-17240.

commit d96d1515638da20b594f7bfe3cfdb50088f25a04
Author: Sean Zhong <seanzh...@databricks.com>
Date:   2016-08-25T23:36:16Z

    [SPARK-17187][SQL] Supports using arbitrary Java object as internal 
aggregation buffer object
    
    ## What changes were proposed in this pull request?
    
    This PR introduces an abstract class `TypedImperativeAggregate` so that an
aggregation function extending `TypedImperativeAggregate` can use an **arbitrary**
user-defined Java object as its intermediate aggregation buffer.
    
    **This has advantages like:**
    1. It can now support a larger category of aggregation functions. For
example, it will be much easier to implement an aggregation function such as
`percentile_approx`, which has a complex aggregation buffer definition.
    2. It can be used to avoid serialization/deserialization on every call of
`update` or `merge` when converting a domain-specific aggregation object to the
internal Spark SQL storage format.
    3. It is easier to integrate with other existing monoid libraries such as
Algebird, and supports more aggregation functions with high performance.
    
    Please see
`org.apache.spark.sql.TypedImperativeAggregateSuite.TypedMaxAggregate` for an
example of how to define a `TypedImperativeAggregate` aggregation function.
    Please see the Java doc of `TypedImperativeAggregate` and Jira ticket
SPARK-17187 for more information.
    
    ## How was this patch tested?
    
    Unit tests.
    
    Author: Sean Zhong <seanzh...@databricks.com>
    Author: Yin Huai <yh...@databricks.com>
    
    Closes #14753 from clockfly/object_aggregation_buffer_try_2.

commit b964a172a8c075486189cc9be09a51b8446f0da4
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-08-26T00:58:43Z

    [SPARK-17212][SQL] TypeCoercion supports widening conversion between 
DateType and TimestampType
    
    ## What changes were proposed in this pull request?
    
    Currently, type-widening does not work between `TimestampType` and 
`DateType`.
    
    This applies to `SetOperation`, `Union`, `In`, `CaseWhen`, `Greatest`,
`Least`, `CreateArray`, `CreateMap`, `Coalesce`, `NullIf`, `IfNull`, `Nvl`
and `Nvl2`.
    
    This PR adds support for widening `DateType` to `TimestampType` for
them.
    
    For a simple example,
    
    **Before**
    
    ```scala
    Seq(Tuple2(new Timestamp(0), new Date(0))).toDF("a", "b").selectExpr("greatest(a, b)").show()
    ```
    
    shows below:
    
    ```
    cannot resolve 'greatest(`a`, `b`)' due to data type mismatch: The 
expressions should all have the same type, got GREATEST(timestamp, date)
    ```
    
    or union as below:
    
    ```scala
    val a = Seq(Tuple1(new Timestamp(0))).toDF()
    val b = Seq(Tuple1(new Date(0))).toDF()
    a.union(b).show()
    ```
    
    shows below:
    
    ```
    Union can only be performed on tables with the compatible column types. 
DateType <> TimestampType at the first column of the second table;
    ```
    
    **After**
    
    ```scala
    Seq(Tuple2(new Timestamp(0), new Date(0))).toDF("a", "b").selectExpr("greatest(a, b)").show()
    ```
    
    shows below:
    
    ```
    +----------------------------------------------------+
    |greatest(CAST(a AS TIMESTAMP), CAST(b AS TIMESTAMP))|
    +----------------------------------------------------+
    |                                1969-12-31 16:00:...|
    +----------------------------------------------------+
    ```
    
    or union as below:
    
    ```scala
    val a = Seq(Tuple1(new Timestamp(0))).toDF()
    val b = Seq(Tuple1(new Date(0))).toDF()
    a.union(b).show()
    ```
    
    shows below:
    
    ```
    +--------------------+
    |                  _1|
    +--------------------+
    |1969-12-31 16:00:...|
    |1969-12-31 00:00:...|
    +--------------------+
    ```
    
    ## How was this patch tested?
    
    Unit tests in `TypeCoercionSuite`.
    
    Author: hyukjinkwon <gurwls...@gmail.com>
    Author: HyukjinKwon <gurwls...@gmail.com>
    
    Closes #14786 from HyukjinKwon/SPARK-17212.

commit 341e0e778dff8c404b47d34ee7661b658bb91880
Author: Shixiong Zhu <shixi...@databricks.com>
Date:   2016-08-26T04:08:42Z

    [SPARK-17242][DOCUMENT] Update links of external dstream projects
    
    ## What changes were proposed in this pull request?
    
    Updated links of external dstream projects.
    
    ## How was this patch tested?
    
    Just document changes.
    
    Author: Shixiong Zhu <shixi...@databricks.com>
    
    Closes #14814 from zsxwing/dstream-link.

commit 6063d5963fcf01768570c1a9b542be6175a3bcbc
Author: hyukjinkwon <gurwls...@gmail.com>
Date:   2016-08-26T15:29:37Z

    [SPARK-16216][SQL][FOLLOWUP] Enable timestamp type tests for JSON and 
verify all unsupported types in CSV
    
    ## What changes were proposed in this pull request?
    
    This PR enables the tests for `TimestampType` for JSON and unifies the
logic for verifying the schema when writing in CSV.
    
    In more details, this PR,
    
    - Enables the tests for `TimestampType` for JSON and
    
      This was disabled due to an issue in `DatatypeConverter.parseDateTime` 
which parses dates incorrectly, for example as below:
    
      ```scala
      val d = javax.xml.bind.DatatypeConverter.parseDateTime("0900-01-01T00:00:00.000").getTime
      println(d.toString)
      ```
      ```
      Fri Dec 28 00:00:00 KST 899
      ```
    
      However, since we use `FastDateFormat`, it seems we are safe now.
    
      ```scala
      val d = FastDateFormat.getInstance("yyyy-MM-dd'T'HH:mm:ss.SSS").parse("0900-01-01T00:00:00.000")
      println(d)
      ```
      ```
      Tue Jan 01 00:00:00 PST 900
      ```
    
    - Verifies all unsupported types in CSV
    
      There is separate logic to verify the schema in `CSVFileFormat`. It is
not quite complete because we do not support `NullType` and
`CalendarIntervalType`, as well as `StructType`, `ArrayType` and `MapType`. So,
this PR adds checks for these types.
    
    ## How was this patch tested?
    
    Tests in `JsonHadoopFsRelation` and `CSVSuite`
    
    Author: hyukjinkwon <gurwls...@gmail.com>
    
    Closes #14829 from HyukjinKwon/SPARK-16216-followup.

commit 28ab17922a227e8d93654d3478c0d493bfb599d5
Author: Wenchen Fan <wenc...@databricks.com>
Date:   2016-08-26T15:52:10Z

    [SPARK-17260][MINOR] move CreateTables to HiveStrategies
    
    ## What changes were proposed in this pull request?
    
    The `CreateTables` rule turns a general `CreateTable` plan into
`CreateHiveTableAsSelectCommand` for Hive serde tables. However, this rule is
logically a planner strategy, so we should move it to `HiveStrategies` to be
consistent with other DDL commands.
    
    ## How was this patch tested?
    
    existing tests.
    
    Author: Wenchen Fan <wenc...@databricks.com>
    
    Closes #14825 from cloud-fan/ctas.

commit 970ab8f6ddc66401ad1cf4b2d1050dd0c8876224
Author: Wenchen Fan <wenc...@databricks.com>
Date:   2016-08-26T17:56:57Z

    [SPARK-17187][SQL][FOLLOW-UP] improve document of TypedImperativeAggregate
    
    ## What changes were proposed in this pull request?
    
    Improve the documentation to make it easier to understand, and also mention
the window operator.
    
    ## How was this patch tested?
    
    N/A
    
    Author: Wenchen Fan <wenc...@databricks.com>
    
    Closes #14822 from cloud-fan/object-agg.

commit 18832162357282ec81515b5b2ba93747be3ad18b
Author: Junyang Qian <junya...@databricks.com>
Date:   2016-08-26T18:01:48Z

    [SPARKR][MINOR] Fix example of spark.naiveBayes
    
    ## What changes were proposed in this pull request?
    
    The original example doesn't work because the features are not categorical. 
This PR fixes this by changing to another dataset.
    
    ## How was this patch tested?
    
    Manual test.
    
    Author: Junyang Qian <junya...@databricks.com>
    
    Closes #14820 from junyangq/SPARK-FixNaiveBayes.

commit fd4ba3f626f49d7d616a2a334d45b1c736e1db1c
Author: gatorsmile <gatorsm...@gmail.com>
Date:   2016-08-26T18:13:38Z

    [SPARK-17192][SQL] Issue Exception when Users Specify the Partitioning 
Columns without a Given Schema
    
    ### What changes were proposed in this pull request?
    Address the comments by yhuai in the original PR: 
https://github.com/apache/spark/pull/14207
    
    First, issue an exception instead of logging a warning when users specify 
the partitioning columns without a given schema.
    
    Second, refactor the code a little.
    
    ### How was this patch tested?
    Fixed the test cases.
    
    Author: gatorsmile <gatorsm...@gmail.com>
    
    Closes #14572 from gatorsmile/followup16552.

commit 261c55dd8808502fb7f3384eb537d26a4a8123d7
Author: gatorsmile <gatorsm...@gmail.com>
Date:   2016-08-26T18:19:03Z

    [SPARK-17250][SQL] Remove HiveClient and setCurrentDatabase from 
HiveSessionCatalog
    
    ### What changes were proposed in this pull request?
    This is the first step to remove `HiveClient` from `HiveSessionState`. In
the metastore interaction, we always use the fully qualified table name when
accessing/operating on a table. That means we always specify the database. Thus,
it is not necessary to use `HiveClient` to change the active database in the Hive
metastore.
    
    In `HiveSessionCatalog`, `setCurrentDatabase` is the only function that
uses `HiveClient`. Thus, we can remove it after removing `setCurrentDatabase`.
    
    ### How was this patch tested?
    The existing test cases.
    
    Author: gatorsmile <gatorsm...@gmail.com>
    
    Closes #14821 from gatorsmile/setCurrentDB.

commit 9812f7d5381f7cd8112fd30c7e45ae4f0eab6e88
Author: petermaxlee <petermax...@gmail.com>
Date:   2016-08-26T18:30:23Z

    [SPARK-17165][SQL] FileStreamSource should not track the list of seen files 
indefinitely
    
    ## What changes were proposed in this pull request?
    Before this change, FileStreamSource uses an in-memory hash set to track 
the list of files processed by the engine. The list can grow indefinitely, 
leading to OOM or overflow of the hash set.
    
    This patch introduces a new user-defined option called "maxFileAge",
defaulting to 24 hours. If a file is older than this age, FileStreamSource will
purge it from the in-memory map that was used to track the list of files that
have been processed.
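    
    A usage sketch (assumes an active `SparkSession` named `spark`; the path and
the "7d" value are illustrative):
    
    ```scala
    // Files older than maxFileAge are eventually dropped from the in-memory map of seen files.
    val lines = spark.readStream
      .format("text")
      .option("maxFileAge", "7d")   // default is 24 hours when not set
      .load("/data/incoming")
    ```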
    
    ## How was this patch tested?
    Added unit tests for the underlying utility, and also added an end-to-end 
test to validate the purge in FileStreamSourceSuite. Also verified the new test 
cases would fail when the timeout was set to a very large number.
    
    Author: petermaxlee <petermax...@gmail.com>
    
    Closes #14728 from petermaxlee/SPARK-17165.

commit c0949dc944b7e2fc8a4465acc68a8f2713b3fa13
Author: Peng, Meng <peng.m...@intel.com>
Date:   2016-08-26T18:54:10Z

    [SPARK-17207][MLLIB] fix comparing Vector bug in TestingUtils
    
    ## What changes were proposed in this pull request?
    
    Fix a bug in comparing Vectors in TestingUtils.
    The same bug exists for Matrix comparison. How to check the length of a
Matrix should be discussed first.
    
    ## How was this patch tested?
    
    
    Author: Peng, Meng <peng.m...@intel.com>
    
    Closes #14785 from mpjlu/testUtils.

commit 8e5475be3c9a620f18f6712631b093464a7d0ee7
Author: Michael Gummelt <mgumm...@mesosphere.io>
Date:   2016-08-26T19:25:22Z

    [SPARK-16967] move mesos to module
    
    ## What changes were proposed in this pull request?
    
    Move Mesos code into a mvn module
    
    ## How was this patch tested?
    
    unit tests
    manually submitting a client mode and cluster mode job
    spark/mesos integration test suite
    
    Author: Michael Gummelt <mgumm...@mesosphere.io>
    
    Closes #14637 from mgummelt/mesos-module.

commit a11d10f1826b578ff721c4738224eef2b3c3b9f3
Author: Herman van Hovell <hvanhov...@databricks.com>
Date:   2016-08-26T20:29:22Z

    [SPARK-17246][SQL] Add BigDecimal literal
    
    ## What changes were proposed in this pull request?
    This PR adds parser support for `BigDecimal` literals. If you append the
suffix `BD` to a valid number then it will be interpreted as a `BigDecimal`;
for example `12.0E10BD` will be interpreted as a BigDecimal with scale -9 and
precision 3. This is useful in situations where you need exact values.
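    
    A quick illustration (assumes a `SparkSession` named `spark`):
    
    ```scala
    // The BD suffix makes the parser produce a decimal literal rather than a double.
    val df = spark.sql("SELECT 12.0E10BD AS x")
    df.printSchema()
    // Expected, per the description above: x is a decimal with precision 3 and scale -9.
    ```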
    
    ## How was this patch tested?
    Added tests to `ExpressionParserSuite`, `ExpressionSQLBuilderSuite` and 
`SQLQueryTestSuite`.
    
    Author: Herman van Hovell <hvanhov...@databricks.com>
    
    Closes #14819 from hvanhovell/SPARK-17246.

commit f64a1ddd09a34d5d867ccbaba46204d75fad038d
Author: petermaxlee <petermax...@gmail.com>
Date:   2016-08-26T23:05:34Z

    [SPARK-17235][SQL] Support purging of old logs in MetadataLog
    
    ## What changes were proposed in this pull request?
    This patch adds a purge interface to MetadataLog, and an implementation in 
HDFSMetadataLog. The purge function is currently unused, but I will use it to 
purge old execution and file source logs in follow-up patches. These changes 
are required in a production structured streaming job that runs for a long 
period of time.
    
    ## How was this patch tested?
    Added a unit test case in HDFSMetadataLogSuite.
    
    Author: petermaxlee <petermax...@gmail.com>
    
    Closes #14802 from petermaxlee/SPARK-17235.

commit 540e91280147a61727f99592a66c0cbb12328fac
Author: Sameer Agarwal <samee...@cs.berkeley.edu>
Date:   2016-08-26T23:40:59Z

    [SPARK-17244] Catalyst should not pushdown non-deterministic join conditions
    
    ## What changes were proposed in this pull request?
    
    Given that non-deterministic expressions can be stateful, pushing them down
the query plan during the optimization phase can cause incorrect behavior. This
patch fixes that issue by explicitly disabling such pushdowns.
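    
    For illustration (the DataFrames here are hypothetical), a join condition
containing a non-deterministic expression that must not be pushed below the
join:
    
    ```scala
    import org.apache.spark.sql.functions.rand
    
    // rand() is non-deterministic: evaluating it before the join, against a different
    // row stream, could change the result, so Catalyst now leaves it at the join.
    val joined = ordersDf.join(usersDf,
      ordersDf("userId") === usersDf("id") && rand() > 0.5)
    ```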
    
    ## How was this patch tested?
    
    A new test in `FilterPushdownSuite` that checks catalyst behavior for both 
deterministic and non-deterministic join conditions.
    
    Author: Sameer Agarwal <samee...@cs.berkeley.edu>
    
    Closes #14815 from sameeragarwal/constraint-inputfile.

commit a6bca3ad02bd896e7637dec37ed8ba1a7306b58c
Author: Yin Huai <yh...@databricks.com>
Date:   2016-08-27T02:38:52Z

    [SPARK-17266][TEST] Add empty strings to the regressionTests of 
PrefixComparatorsSuite
    
    ## What changes were proposed in this pull request?
    This PR adds a regression test to PrefixComparatorsSuite's "String prefix 
comparator" because this test failed on jenkins once 
(https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-sbt-hadoop-2.4/1620/testReport/junit/org.apache.spark.util.collection.unsafe.sort/PrefixComparatorsSuite/String_prefix_comparator/).
    
    I could not reproduce it locally, but let's add this test case to the
regressionTests.
    
    Author: Yin Huai <yh...@databricks.com>
    
    Closes #14837 from yhuai/SPARK-17266.

commit cc0caa690b32246b076c699ea3f8d8a84797fb94
Author: Reynold Xin <r...@databricks.com>
Date:   2016-08-27T04:41:58Z

    [SPARK-17270][SQL] Move object optimization rules into its own file
    
    ## What changes were proposed in this pull request?
    As part of breaking Optimizer.scala apart, this patch moves various Dataset 
object optimization rules into a single file. I'm submitting separate pull 
requests so we can more easily merge this in branch-2.0 to simplify optimizer 
backports.
    
    ## How was this patch tested?
    This should be covered by existing tests.
    
    Author: Reynold Xin <r...@databricks.com>
    
    Closes #14839 from rxin/SPARK-17270.

commit dcefac438788c51d84641bfbc505efe095731a39
Author: Reynold Xin <r...@databricks.com>
Date:   2016-08-27T05:10:28Z

    [SPARK-17269][SQL] Move finish analysis optimization stage into its own file
    
    ## What changes were proposed in this pull request?
    As part of breaking Optimizer.scala apart, this patch moves various finish 
analysis optimization stage rules into a single file. I'm submitting separate 
pull requests so we can more easily merge this in branch-2.0 to simplify 
optimizer backports.
    
    ## How was this patch tested?
    This should be covered by existing tests.
    
    Author: Reynold Xin <r...@databricks.com>
    
    Closes #14838 from rxin/SPARK-17269.

commit 0243b328736f83faea5f83d18c4d331890ed8e81
Author: Reynold Xin <r...@databricks.com>
Date:   2016-08-27T07:32:57Z

    [SPARK-17272][SQL] Move subquery optimizer rules into its own file
    
    ## What changes were proposed in this pull request?
    As part of breaking Optimizer.scala apart, this patch moves various 
subquery rules into a single file.
    
    ## How was this patch tested?
    This should be covered by existing tests.
    
    Author: Reynold Xin <r...@databricks.com>
    
    Closes #14844 from rxin/SPARK-17272.

commit 5aad4509c15e131948d387157ecf56af1a705e19
Author: Reynold Xin <r...@databricks.com>
Date:   2016-08-27T07:34:35Z

    [SPARK-17273][SQL] Move expression optimizer rules into a separate file
    
    ## What changes were proposed in this pull request?
    As part of breaking Optimizer.scala apart, this patch moves various 
expression optimization rules into a single file.
    
    ## How was this patch tested?
    This should be covered by existing tests.
    
    Author: Reynold Xin <r...@databricks.com>
    
    Closes #14845 from rxin/SPARK-17273.

commit 718b6bad2d698b76be6906d51da13626e9f3890e
Author: Reynold Xin <r...@databricks.com>
Date:   2016-08-27T07:36:18Z

    [SPARK-17274][SQL] Move join optimizer rules into a separate file
    
    ## What changes were proposed in this pull request?
    As part of breaking Optimizer.scala apart, this patch moves various join 
rules into a single file.
    
    ## How was this patch tested?
    This should be covered by existing tests.
    
    Author: Reynold Xin <r...@databricks.com>
    
    Closes #14846 from rxin/SPARK-17274.

commit cd0ed31ea9965563a9b1ea3e8bfbeaf8347cacd9
Author: Takeshi YAMAMURO <linguin....@gmail.com>
Date:   2016-08-27T07:42:41Z

    [SPARK-15382][SQL] Fix a bug in sampling with replacement
    
    ## What changes were proposed in this pull request?
    This PR fixes the following bug in sampling with replacement:
    ```
    val df = Seq((1, 0), (2, 0), (3, 0)).toDF("a", "b")
    df.sample(true, 2.0).withColumn("c", monotonically_increasing_id).select($"c").show
    +---+
    |  c|
    +---+
    |  0|
    |  1|
    |  1|
    |  1|
    |  2|
    +---+
    ```
    
    ## How was this patch tested?
    Added a test in `DataFrameSuite`.
    
    Author: Takeshi YAMAMURO <linguin....@gmail.com>
    
    Closes #14800 from maropu/FixSampleBug.

commit 40168dbe771ae662ed61851a1f3c677dd14fe344
Author: Peng, Meng <peng.m...@intel.com>
Date:   2016-08-27T07:46:01Z

    [ML][MLLIB] The require condition and message doesn't match in SparseMatrix.
    
    ## What changes were proposed in this pull request?
    The require condition and message don't match, and the condition should
also be optimized.
    Small change. Please kindly let me know if a JIRA is required.
    
    ## How was this patch tested?
    No additional test required.
    
    Author: Peng, Meng <peng.m...@intel.com>
    
    Closes #14824 from mpjlu/smallChangeForMatrixRequire.

commit 9fbced5b25c2f24d50c50516b4b7737f7e3eaf86
Author: Robert Kruszewski <robe...@palantir.com>
Date:   2016-08-27T07:47:15Z

    [SPARK-17216][UI] fix event timeline bars length
    
    ## What changes were proposed in this pull request?
    
    Make the event timeline bar expand to the full length of the bar (which is
the total time).
    
    This issue occurs only in Chrome; Firefox looks fine. I haven't tested other
browsers.
    
    ## How was this patch tested?
    Inspection in browsers
    
    Before
    ![screen shot 2016-08-24 at 3 38 24 pm](https://cloud.githubusercontent.com/assets/512084/17935104/0d6cda74-6a12-11e6-9c66-e00cfa855606.png)
    
    After
    ![screen shot 2016-08-24 at 3 36 39 pm](https://cloud.githubusercontent.com/assets/512084/17935114/15740ea4-6a12-11e6-83a1-7c06eef6abb8.png)
    
    Author: Robert Kruszewski <robe...@palantir.com>
    
    Closes #14791 from robert3005/robertk/event-timeline.

commit e07baf14120bc94b783649dabf5fffea58bff0de
Author: Sean Owen <so...@cloudera.com>
Date:   2016-08-27T07:48:56Z

    [SPARK-17001][ML] Enable standardScaler to standardize sparse vectors when 
withMean=True
    
    ## What changes were proposed in this pull request?
    
    Allow centering / mean scaling of sparse vectors in StandardScaler, if 
requested. This is for compatibility with `VectorAssembler` in common usages.
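    
    A usage sketch (column names and the `trainingDf` DataFrame are assumptions):
    
    ```scala
    import org.apache.spark.ml.feature.StandardScaler
    
    val scaler = new StandardScaler()
      .setInputCol("features")
      .setOutputCol("scaledFeatures")
      .setWithMean(true)   // previously rejected when the input column held sparse vectors
      .setWithStd(true)
    
    // Centering densifies the vectors, so expect dense output even for sparse input.
    val model = scaler.fit(trainingDf)
    val scaled = model.transform(trainingDf)
    ```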
    
    ## How was this patch tested?
    
    Jenkins tests, including new cases to reflect the new behavior.
    
    Author: Sean Owen <so...@cloudera.com>
    
    Closes #14663 from srowen/SPARK-17001.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
