GitHub user avinashkolla opened a pull request:
https://github.com/apache/spark/pull/15118
Branch 2.0
## What changes were proposed in this pull request?
(Please fill in changes proposed in this fix)
## How was this patch tested?
(Please explain how this patch was tested. E.g. unit tests, integration
tests, manual tests)
(If this patch involves UI changes, please attach a screenshot; otherwise,
remove this)
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/spark branch-2.0
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/15118.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #15118
commit f46a074510e47206de9d3b3ac6902af321923ce8
Author: Sylvain Zimmer
Date: 2016-07-28T16:51:45Z
[SPARK-16740][SQL] Fix Long overflow in LongToUnsafeRowMap
Avoid an overflow of the Long type that causes a NegativeArraySizeException a
few lines later.
Unit tests for HashedRelationSuite still pass.
I can confirm the python script I included in
https://issues.apache.org/jira/browse/SPARK-16740 works fine with this patch.
Unfortunately I don't have the knowledge/time to write a Scala test case for
HashedRelationSuite right now. As the patch is pretty obvious I hope it can be
included without this.
Thanks!
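The commit message above doesn't show the failing code, but the failure mode it describes is easy to reproduce in isolation. The sketch below (hypothetical values, not the actual LongToUnsafeRowMap code) shows how a Long-typed size computation, once truncated to Int, wraps negative and triggers exactly this exception:

```scala
// Minimal sketch of the overflow class described above: a capacity
// computed in Long arithmetic wraps to a negative Int when truncated,
// and the later array allocation throws NegativeArraySizeException.
object OverflowDemo {
  def main(args: Array[String]): Unit = {
    val numKeys: Long = 2000000000L          // fine as a Long
    val capacity: Int = (numKeys * 2).toInt  // 4e9 wraps past Int.MaxValue
    println(capacity)                        // prints a negative number
    try {
      val buffer = new Array[Byte](capacity) // "a few lines later"...
    } catch {
      case _: NegativeArraySizeException =>
        println("NegativeArraySizeException")
    }
  }
}
```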
Author: Sylvain Zimmer
Closes #14373 from sylvinus/master.
(cherry picked from commit 1178d61ede816bf1c8d5bb3dbb3b965c9b944407)
Signed-off-by: Reynold Xin
commit fb09a693d6f58d71ec042224b8ea66b972c1adc2
Author: Sameer Agarwal
Date: 2016-07-28T20:04:19Z
[SPARK-16764][SQL] Recommend disabling vectorized parquet reader on
OutOfMemoryError
## What changes were proposed in this pull request?
We currently don't bound or manage the data array size used by column
vectors in the vectorized reader (they're just bounded by INT.MAX), which may
lead to OOMs while reading data. As a short-term fix, this patch intercepts the
OutOfMemoryError exception and suggests that the user disable the vectorized
parquet reader.
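For reference, the workaround the patch's error message points users to is a single configuration flag (shown here with the standard Spark SQL config key; the exact wording of the suggestion is in the patch itself):

```sql
-- Fall back to the row-based Parquet reader to avoid the
-- unbounded column-vector allocations described above.
SET spark.sql.parquet.enableVectorizedReader=false;
```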
## How was this patch tested?
Existing Tests
Author: Sameer Agarwal
Closes #14387 from sameeragarwal/oom.
(cherry picked from commit 3fd39b87bda77f3c3a4622d854f23d4234683571)
Signed-off-by: Reynold Xin
commit 5cd79c396f98660e12b02c0151a084b4d1599b6b
Author: Nicholas Chammas
Date: 2016-07-28T21:57:15Z
[SPARK-16772] Correct API doc references to PySpark classes + formatting
fixes
## What's Been Changed
The PR corrects several broken or missing class references in the Python
API docs. It also corrects formatting problems.
For example, you can see
[here](http://spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html#pyspark.sql.SQLContext.registerFunction)
how Sphinx is not picking up the reference to `DataType`. That's because the
reference is relative to the current module, whereas `DataType` is in a
different module.
You can also see
[here](http://spark.apache.org/docs/2.0.0/api/python/pyspark.sql.html#pyspark.sql.SQLContext.createDataFrame)
how the formatting for byte, tinyint, and so on is italic instead of
monospace. That's because in ReST single backticks just make things italic,
unlike in Markdown.
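The two problems described above come down to ReST markup conventions. An illustrative snippet (generic examples, not the exact docstrings changed in the PR):

```rst
`DataType`                           -- single backticks: italics in ReST
``byte``, ``tinyint``                -- double backticks: monospace literal
:class:`pyspark.sql.types.DataType`  -- fully qualified role, so Sphinx can
                                        resolve the reference across modules
```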
## Testing
I tested this PR by [building the Python
docs](https://github.com/apache/spark/tree/master/docs#generating-the-documentation-html)
and reviewing the results locally in my browser. I confirmed that the broken
or missing class references were resolved, and that the formatting was
corrected.
Author: Nicholas Chammas
Closes #14393 from nchammas/python-docstring-fixes.
(cherry picked from commit 274f3b9ec86e4109c7678eef60f990d41dc3899f)
Signed-off-by: Reynold Xin
commit ed03d0a690c9a7920a21c858df7f42f9a41f28d7
Author: Wesley Tang
Date: 2016-07-29T11:26:05Z
[SPARK-16664][SQL] Fix persist call on Data frames with more than 200…
## What changes were proposed in this pull request?
f12f11e578169b47e3f8b18b299948c0670ba585 introduced this bug; it missed
changing a `foreach` that should have been a `map`.
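The commit message is terse, but the bug class it names is a common one: `foreach` runs a side effect and discards results, while `map` keeps them. A generic sketch of the distinction (hypothetical values, not the actual Spark code):

```scala
// foreach returns Unit, silently dropping the computed values;
// map returns the transformed collection.
object ForeachVsMap {
  def main(args: Array[String]): Unit = {
    val ids = Seq(1, 2, 3)

    ids.foreach(i => i * 2)        // results discarded, returns Unit

    val doubled = ids.map(_ * 2)   // results kept
    println(doubled)               // List(2, 4, 6)
  }
}
```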
## How was this patch tested?
Test added
Author: Wesley Tang
Closes #14324 from breakdawn/master.
(cherry picked from commit d1d5069aa3744d46abd3889abab5f15e9067382a)
Signed-off-by: Sean Owen
commit efad4aa1468867b36cffb1e8c91f9731c48eca81
Author: Yanbo Liang
Date: 2016-07-29T11:40:20Z
[SPARK-16750][ML] Fix GaussianMixture training failed due to feature column
type mistake
## What changes were proposed in this pull request?
ML ```GaussianMixture``` training failed due to feature column type
mistake. The feature column type sh